  1. Aug 2025
    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: The authors focused on medaka retinal organoids to investigate the mechanism underlying eye cup morphogenesis. The authors succeeded in inducing lens formation in fish retinal organoids using 3D suspension culture in minimal growth factor-containing medium supplemented with HEPES. At day 1, Rx3:H2B-GFP+ cells appear in the surface region of the organoids. At day 1.5, Prox1+ cells appear at the interface between the organoid surface and the core of the central cell mass, which later develops into a spherical lens. Thus, Prox1+ cells cover the surface of the internal lens cell core. At day 2, foxe3:GFP+ cells appear in the Prox1+ area, where the early lens fiber marker LFC starts to be expressed. In addition, foxe3:GFP+ cells incorporate EdU, indicating that they have lens epithelial cell characteristics. At day 4, cry:EGFP+ cells differentiate inside the spherical lens core, whose surface consists of LFC+ and Prox1+ cells. Furthermore, at day 4, the lens core moves towards the surface of the retinal organoids to form an eye cup-like structure, although this "inside-out" morphogenetic mechanism differs from the "outside-in" cellular mechanism of eye cup formation in vivo. From these data, the authors conclude that optic cup formation, especially the positioning of the lens, is established in retinal organoids through a mechanism different from in vivo morphogenesis.

      Overall, the manuscript is well presented. However, some points regarding the underlying mechanism remain unclear. My comments are shown below.

      Major comments

      1) At the initial stage of retinal organoid morphogenesis, a spherical lens is positioned centrally inside the retinal organoids, with the outer cell sheet of retinal precursor cells covering a central lens core. I wonder whether the formation of this structure can be explained by differential cell adhesion or mechanical tension between lens core cells and the retinal cell sheet, as in the previous study by the Heisenberg lab on the spatial patterning of endoderm, mesoderm and ectoderm (Nat. Cell Biol. 10, 429-436 (2008)). Lens core cells may become integrated inside the retinal cell mass by cell sorting, through direct interactions between retinal cells and lens cells, or between lens cells and the culture medium. After day 1, the movement of the lens core towards the surface of the retinal organoids might also be explained if the adhesive/tensile state of the lens core cells changes through secretion of extracellular matrix. I wonder whether the authors could measure the physical properties, such as adhesive activity and stiffness, of retinal precursor cells and lens core cells. If retinal organoids at day 1 are dissociated and cultured again, do they re-establish the same pattern, with an internal lens core covered by the outer retinal cell sheet?

      Response: The question of whether differential adhesive activity is involved in cell sorting and lens formation is indeed very intriguing. To address this point, we will include an additional experiment (see Revision Plan, experiment 1) based on the dissociation and re-aggregation of lens-forming organoids, as suggested by the reviewer. To monitor cell type-specific sorting, we will employ the lens progenitor reporter line Foxe3::GFP and the retina-specific Rx2::H2B-RFP. If differential adhesive activities of lens and retinal progenitor cells drive cell sorting, dissociation and re-aggregation will result in cells sorting according to their identity.

      2) The optic cup derives from an evagination of the lateral wall of the diencephalic neuroepithelium. In zebrafish, cells move from the pigment epithelium to the neural retina during eye morphogenesis in an FGF-dependent manner. How is optic cup morphogenesis coordinated in medaka? I also wonder whether the authors could track cell migration during optic cup morphogenesis to reveal how cell migration and cell division are regulated in the lens of the medaka retinal organoids. It would also be interesting to examine how retinal cell movement is coordinated during medaka retinal organoid formation.

      Response: Looking into the details of how the optic cup-like tissue arrangement of ocular organoids is achieved at the cellular level is of course interesting. Our previous study showed that optic vesicles of medaka retinal organoids do not form optic cups (for details, please see Zilova et al., 2021, eLife). We assume that the formation of the cup-like structure of the ocular organoids is mediated by the following processes. First, retina and lens domains are established in specific regions of the organoid, with the retina on the surface and the lens in the center (see Figure S2d, Figure 3e and Figure 4). Then, the centrally formed lens is displaced through the retinal layer towards the organoid periphery, while the retinal cells remain static. We assume that the “cup-like” shape is acquired by extrusion of the lens from the center of the organoid. To clarify this process with respect to tissue rearrangements and cell movements, we will include additional experiments (see Revision Plan, experiment 2) and follow lens- and retina-fated cells (using the lens-specific Foxe3::GFP and retina-specific Rx2::H2B-RFP reporter lines) through the process of lens extrusion, to dissect the individual contributions of retinal and lens cells to this process (cross-reference with Reviewer #2).

      3) The authors showed that blockade of FGF signaling from day 1 to 2 affects lens fiber differentiation, whereas lens formation seems to be intact in the presence of an FGF receptor inhibitor from day 0 to 1. I suggest that the authors examine which tissue is a target of FGF signaling in retinal organoids, using markers such as pea3, a downstream target of the ERK branch of FGF signaling. Since FGF signaling promotes cell proliferation, is the lens core size normal in organoids treated with SU5402 from day 0 to day 1?

      Response: Assessing the activity of FGF signaling in the organoids (cross-reference to Reviewer #3) is indeed an important point. To address which tissue is the target of FGF signaling, we will include additional experiments and assess the phosphorylation status of ERK (pERK) and the expression of the ERK downstream target pea3, as suggested by the reviewer (see Revision Plan, experiment 3). This will allow us to identify the tissue within the organoid that responds to Fgf signaling.

      The lens core size of organoids treated with SU5402 from day 0 to day 1 is fully comparable to that of the control (please see Figure 6b).


      4) Fig. 3f and 3g indicate that there is a cell population located between foxe3:GFP+ cells and rx2:H2B-RFP+ cells. Which cell type occupies the interface between foxe3:GFP+ cells and rx2:H2B-RFP+ cells?

      Response: This is certainly an interesting question, and we are aware of this population of cells. We currently do not have data that would clarify the fate of these cells with certainty. We are following up on this question using scRNA sequencing; however, we will not be able to address it in the current manuscript.

      5) Fig. 5e indicates the depth of Rx3 expression at day 1. Is the depth the thickness of the Rx3-expressing cell sheet that covers the central lens core in the organoids? If so, I wonder whether the total number of cells in the Rx3-expressing sheet differs with the number of seeded cells: the thickness is the same across seeding numbers, but the surface area may differ depending on the size of the underlying lens core. Please clarify this point.

      Response: Yes. Figure 5e indicates the thickness of the Rx3-expressing cell sheet that lies on the surface of the organoid. Indeed, the number of Rx3-expressing cells (and lens cells) scales with the size of the organoid, as stated in the submitted manuscript.


      6) Noggin application from day 0 to 1 inhibits lens formation. BMP signaling regulates formation of the lens placode and olfactory placode at early stages of development. It would be interesting to examine whether Noggin-treated organoids show an expanded olfactory placode area. Please check forebrain territory markers.

      Response: Which tissue differentiates at the expense of the lens in BMP inhibitor-treated organoids is of course an intriguing question. To address the identity of cells differentiating under this condition, we will include an additional experiment, as suggested by the reviewer (see Revision Plan, experiment 4). We will check for the expression of Lhx2, Otx2 and HuC/D to address this point.

      I have no minor comments

      **Referees cross-commenting**

      I agree that all reviewers have similar suggestions, which are reasonable, and that we provided the same estimated time for revision.

      Reviewer #1 (Significance (Required)):

      Strength: This study is unique. The authors examined eye cup morphogenesis using fish retinal organoids. The eye cup normally consists of the lens, the neural retina, the pigment epithelium and the optic stalk. However, retinal organoids seem to be simpler and consist of two cell types, lens and retina. Interestingly, a similar optic cup-like structure is achieved in both cases, although the underlying mechanism is different. It is interesting to investigate how eye morphogenesis is regulated in retinal organoids, in an unconstrained, embryo-free environment.

      Limitation: The description is adequate, but the analysis lacks depth. Somewhat more molecular and cellular analysis is necessary, such as tracking of cell movement and visualization of FGF signaling in organoid tissues.

      Advancement: The current study is descriptive. It needs a conceptual advance that would impact the cell biology field or medical science.

      Audience: The target audience of the current study is still within the ophthalmology and neuroscience communities, perhaps translational/clinical rather than basic biology. To reach beyond these specific fields, the authors need to formulate a general principle for cell and developmental biology.



      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this study from Stahl et al., the authors demonstrate that medaka pluripotent embryonic cells can self-organise into eye organoids containing both retina and lens tissues. While these organoids self-organise into an eye structure that resembles the vertebrate eye, they are built through a fundamentally different morphogenetic process: an "inside-out" mechanism in which the lens forms centrally and moves outward, rather than the normal "outside-in" embryonic process. This is a very interesting discovery, both for our understanding of developmental biology and for the potential of tissue engineering applications. The study would benefit from some additional experiments and a few clarifications.

      The authors suggest that the lens cells are the ones that move from a central to a more superficial position. Is this an active movement of lens cells or just a passive consequence of the retina cells acquiring a cup shape? Are the retina cells migrating behind the lens, or are the lens cells pushing outwards? High-resolution imaging of organoid cup formation, tracking retina cells in combination with membrane labeling of all cells, would help elucidate the morphogenetic processes occurring in the organoids. Membrane labeling would also be useful, as Prox1-positive lens cells appear elongated in embryos, while in the organoids cell shapes seem less organised, less compact and not elongated (for example, as shown in Fig. 3f,g).

      Response: Looking into the details of how the optic cup-like tissue arrangement of ocular organoids is achieved at the cellular level is of course interesting. We assume that the formation of cup-like structures of the ocular organoids is mediated by the following processes. First, retina and lens domains are established in specific regions of the organoid, with the retina on the surface and the lens in the center (see Figure S2d, Figure 3e and Figure 4). Then, the centrally formed lens is displaced through the retinal layer towards the organoid periphery, while the retinal cells remain static. We assume that the “cup-like” shape is acquired by extrusion of the lens. To clarify this process with respect to tissue rearrangements and cell movements, we will include additional experiments (see Revision Plan, experiment 2). We will follow lens- and retina-fated cells (using the lens-specific Foxe3::GFP and retina-specific Rx2::H2B-RFP reporter lines) through the process of lens extrusion to dissect the individual contributions of retinal and lens cells to this process (cross-reference with Reviewer #1).

      The organoids could be a useful tool to address how cell fate is linked to cell shape acquisition. In the forming organoids, retinal tissue initially forms on the outside, while non-retinal tissue is located in the centre; this central tissue later expresses lens markers. Do the authors have any insights into why fate acquisition occurs in this pattern? Is there a difference in proliferation rates between the centrally located cells and the external ones? Could it be that highly proliferative cells give rise to neural retina (NR), while less proliferative cells become lens?

      Response: The question of how the retinal and lens domains are established in this specific manner is indeed intriguing. We dedicated a part of the discussion to this topic, where we discuss the role of the diffusion limit and the potential contribution of BMP and FGF signaling to this arrangement. Additional experiments (see Revision Plan, experiment 3) addressing the source and target tissues of FGF and BMP signaling in the organoid will ultimately bring more clarity to our understanding of the tissue arrangements in the organoid.

      Although analysis of the proliferation rates of cells at the surface and in the central region of the organoid might reveal some differences between lens and retinal cells, we have no indication that the proliferation rate itself is instructive for, or takes precedence over, cell fate decisions.

      What happens in organoids that do not form lenses? Do these organoids still generate foxe3 positive cells that fail to develop into a proper lens structure? And in the absence of lens formation, does the retina still acquire a cup shape?

      Response: Lens formation is primarily dependent on the acquisition/specification of Foxe3-expressing lens placode progenitors. If these are not present, a lens does not develop. Once Foxe3-expressing progenitors are established, a lens forms under unperturbed conditions (as assessed by the expression of crystallin proteins). Under such conditions, organoids that do not have a lens do not carry Foxe3-expressing cells.

      In the absence of the lens, the organoid is composed of retinal neuroepithelium that does not form an optic cup (for details of such phenotypes, please see Zilova et al., 2021, eLife).

      The authors suggest that lens formation occurs even in the absence of Matrigel. Is the process slower under these conditions? Are the resulting organoids smaller? While there are indeed some LFC-expressing cells by day 2, these cells are not very well organised and the expression pattern seems dotty. Moreover, LFC staining seems to localise posterior to the LFC-negative, lens-like structure (e.g. Fig. S1, 3 o'clock position). How do these organoids develop beyond day 4? Do they maintain their structural integrity at later stages? The role of HEPES in promoting organoid formation is intriguing. Do the authors have any insights into why it is important in this context? Have the authors tried other culture conditions, and do culture conditions influence the morphogenetic pathways occurring within the organoids?

      Response: We thank the reviewer for pointing this out. We were not clear in describing our observation. Indeed, Matrigel is not required for the acquisition of lens fate, as demonstrated by the expression of lens-specific markers. However, the presence of Matrigel has a profound impact on the structural aspects of organoid formation. Matrigel is essential for the organization of retina-committed cells into the retinal epithelium (Zilova et al., 2021, eLife). The absence of a structured retinal epithelium can indeed negatively affect cellular organization and the overall lens structure. To clarify the contribution of Matrigel to the speed of organoid lens development and to the overall structure of the organoid lens, we will perform additional experiments (see Revision Plan, experiment 5). Using the Foxe3::GFP reporter line, we will measure the onset of lens-specific gene expression. In addition, we will use immunohistochemistry to assess the gross morphology and size of organoids grown without Matrigel (cross-reference with Reviewer #3).

      The role of HEPES in lens formation is indeed very intriguing and is currently under investigation. As HEPES is mainly used to buffer the pH of the culture medium, and pH might affect multiple cellular processes, dissecting the molecular mechanism underlying the effect of HEPES on lens formation will require a significant time investment (cross-reference with Reviewer #3). It therefore cannot be addressed in the current manuscript.

      **Referees cross-commenting**

      Pleased to see that all the other reviewers are positive about the study and raise similar concerns and comments.

      Reviewer #2 (Significance (Required)):

      This is a very interesting paper, and it will be important to determine whether this alternative morphogenetic process is specific to medaka or if similar developmental routes can be recapitulated in organoid cultures from other vertebrate species.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: The manuscript by Stahl and colleagues reports an approach to generate ocular organoids composed of retinal and lens structures, derived from Medaka blastula cells. The authors present a comprehensive characterisation of the timeline followed by lens and retinal progenitors, showing these have distinct origins, and that they recapitulate the expression of differentiation markers found in vivo. Despite this molecular recapitulation, morphogenesis is strikingly different, with lens progenitors arising at the centre of the organoid, and subsequently translocating to the outside.

      Comments:

      -The manuscript presents a beautiful set of high-quality images showing expression of lens differentiation markers over time in the organoids. The set of experiments is very robust, with high numbers of organoids analysed and reproducible data. The mechanism by which lens specification is promoted in these organoids is, however, poorly analysed, and the reader does not get a clear understanding of what is different in these experiments, compared to previous attempts, that supports lens differentiation. There is a mention of HEPES supplementation, but no further analysis is provided, and the fact that the process is independent of ECM contradicts previous reports, as the authors point out. The manuscript would benefit from a more detailed analysis of the mechanisms that lead to lens differentiation in this setting.

      Response: The role of HEPES in lens formation is indeed very intriguing and is currently under investigation. As HEPES is mainly used to buffer the pH of the culture medium, and pH might affect multiple cellular processes, dissecting the molecular mechanism underlying the effect of HEPES on lens formation will require a significant time investment (cross-reference with Reviewer #2). Unfortunately, it therefore cannot be addressed in the current manuscript.

      To clarify the contribution of Matrigel to organoid lens development, we will perform additional experiments (see Revision Plan, experiment 5). Using the Foxe3::GFP reporter line, we will measure the onset of lens-specific gene expression. In addition, we will use immunohistochemistry to assess the gross morphology and size of organoids grown without Matrigel (cross-reference with Reviewer #2).

      -The markers analysed to show the onset of lens differentiation in the organoids seem to start being expressed, in vivo, when the lens placode starts invaginating. An analysis of earlier stages is not presented. This would be very informative, allowing the authors to determine whether progenitors first differentiate as placode and neuroepithelium, and subsequently continue differentiating into lens and retina, respectively. Could early placodal and anterior neural plate markers be analysed in the organoids? This would provide a more complete sequence of lens vs retina differentiation in this model.

      Response: Yes. The figures show the expression of lens and retinal markers in the embryo at later developmental stages, and the timing of their expression can be documented with higher temporal resolution. In the revised version of the manuscript, we will provide information about the onset of expression of Rx3::H2B-GFP (retina) and Foxe3::GFP (lens) (see attached figure). Rx3 is one of the earliest markers, labeling the presumptive eye field within the anterior neural plate (stage 16, late gastrula). Foxe3::GFP expression can be detected within the head surface ectoderm before the lens placode forms, showing that Foxe3 is a suitable marker of placodal progenitors in medaka.

      We are convinced that the onset of Rx3- and Foxe3-driven reporter expression is early enough to support the claim that the lens (placodal) and retinal (anterior neuroectoderm) tissues within the ocular organoids have separate origins.

      -The analysis of BMP and Fgf requirement for lens formation and differentiation is suggestive, but the source of these signals is not resolved or mentioned in the manuscript. Are BMP4 and Fgf8 expressed by the organoids? Where are they coming from?

      Response: Indeed, identifying the source of BMP and FGF activation would bring more clarity to our understanding of the mechanism of retina/lens specification within the ocular organoids (cross-reference with Reviewer #1). To address this point, we will include additional experiments (see Revision Plan, experiment 3). We will analyze the expression of the respective ligands (Bmp4 and Fgf8) and the activation of downstream effectors of the BMP and FGF signaling pathways within the ocular organoids, as suggested by Reviewer #1 and Reviewer #3.


      -The fact that the lens becomes specified in the centre of the organoid is striking, but it is difficult for me to visualise how it ends up being extruded from the organoid. Did the authors try to follow this process in movies? I understand that this may be technically challenging, but it would certainly help to understand the process that leads to the final organisation of retinal and lens tissues in the organoid. There is no discussion of why the morphogenetic mechanism is so different from the in vivo situation. The manuscript would benefit from explicitly discussing this.

      Response: Following the extruding lens by live imaging is indeed a very relevant suggestion. To clarify the process of ocular organoid formation with respect to tissue rearrangements and cell movements, we will include an additional experiment (see Revision Plan, experiment 2). We will follow lens- and retina-fated cells (using the lens-specific Foxe3::GFP and retina-specific Rx2::H2B-RFP reporter lines) through the process of lens extrusion (cross-reference with Reviewer #1 and Reviewer #2).

      **Referees cross-commenting**

      We all seem to have similar comments and concerns. I think overall the suggestions are feasible and realistic for the timeframe provided.

      Reviewer #3 (Significance (Required)):

      This study describes a reproducible approach to differentiate ocular organoids composed of lens and retinal tissues. The characterisation of lens differentiation in this model is very detailed, and despite the morphogenetic differences, the molecular mechanisms show many similarities to the in vivo situation. The manuscript however does not highlight, in my opinion, why this model may be relevant. Clearly articulating this relevance, particularly in the discussion, will enhance the study and provide more clarity to the readers regarding the significance of the study for the field of organoid research, ocular research and regenerative studies.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      Summary: The manuscript by Stahl and colleagues reports an approach to generate ocular organoids composed of retinal and lens structures, derived from Medaka blastula cells. The authors present a comprehensive characterisation of the timeline followed by lens and retinal progenitors, showing these have distinct origins, and that they recapitulate the expression of differentiation markers found in vivo. Despite this molecular recapitulation, morphogenesis is strikingly different, with lens progenitors arising at the centre of the organoid, and subsequently translocating to the outside.

      Comments:

      • The manuscript presents a beautiful set of high-quality images showing expression of lens differentiation markers over time in the organoids. The set of experiments is very robust, with high numbers of organoids analysed and reproducible data. The mechanism by which lens specification is promoted in these organoids is, however, poorly analysed, and the reader does not get a clear understanding of what is different in these experiments, compared to previous attempts, that supports lens differentiation. There is a mention of HEPES supplementation, but no further analysis is provided, and the fact that the process is independent of ECM contradicts previous reports, as the authors point out. The manuscript would benefit from a more detailed analysis of the mechanisms that lead to lens differentiation in this setting.
      • The markers analysed to show the onset of lens differentiation in the organoids seem to start being expressed, in vivo, when the lens placode starts invaginating. An analysis of earlier stages is not presented. This would be very informative, allowing the authors to determine whether progenitors first differentiate as placode and neuroepithelium, and subsequently continue differentiating into lens and retina, respectively. Could early placodal and anterior neural plate markers be analysed in the organoids? This would provide a more complete sequence of lens vs retina differentiation in this model.
      • The analysis of BMP and Fgf requirement for lens formation and differentiation is suggestive, but the source of these signals is not resolved or mentioned in the manuscript. Are BMP4 and Fgf8 expressed by the organoids? Where are they coming from?
      • The fact that the lens becomes specified in the centre of the organoid is striking, but it is for me difficult to visualise how it ends up being extruded from the organoid. Did the authors try to follow this process in movies? I understand that this may be technically challenging, but it would certainly help to understand the process that leads to the final organisation of retinal and lens tissues in the organoid. There is no discussion of why the morphogenetic mechanism is so different from the in vivo situation. The manuscript would benefit from explicitly discussing this.

      Referees cross-commenting

      We all seem to have similar comments and concerns. I think overall the suggestions are feasible and realistic for the timeframe provided.

      Significance

      This study describes a reproducible approach to differentiate ocular organoids composed of lens and retinal tissues. The characterisation of lens differentiation in this model is very detailed, and despite the morphogenetic differences, the molecular mechanisms show many similarities to the in vivo situation. The manuscript however does not highlight, in my opinion, why this model may be relevant. Clearly articulating this relevance, particularly in the discussion, will enhance the study and provide more clarity to the readers regarding the significance of the study for the field of organoid research, ocular research and regenerative studies.

  2. Jul 2025
    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We thank the reviewers for the supportive suggestions and comments. We have addressed all comments in red underneath the original text. As suggested, we added line numbers to the text and use these numbers to refer to the changes made.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript is well written and presents solid data, most of which are statistically analyzed and sound. Given the authors' previous comprehensive publications on seipin organization and interactions, it might be beneficial (particularly in the title and abstract) to emphasize that this manuscript focuses on the metabolic regulation of lipid droplet assembly by Ldb16, to distinguish it from previous work. One potentially interesting consideration involves changes in lipid droplet formation under the growth conditions used for galactose-mediated gene induction.

      We thank the reviewer for the supportive comments and suggestions.

      Comments: (1) Fig. 3 and 4. The galactose induction of lipid droplet biogenesis in are1∆/2∆ dga1∆ lro1∆ cells through activation of a GAL1 promoter fusion to DGA1 is a sound approach for regulating lipid droplet formation. Although unlikely, carbon sources can impact lipid droplet proliferation, and (potentially interesting) metabolic changes during growth on non-fermentable carbon sources may impact lipid droplet biogenesis; in fact, oleate has significant effects (e.g. PMID: 21422231; PMID: 21820081). The GAL1 promoter is a very strong promoter, and overexpression of DGA1 via this heterologous promoter might itself cause unforeseen changes. Confirming the results using another induction system might be beneficial.

      We thank the reviewer for these suggestions. In this study we focused on the organisation of the yeast seipin complex during the process of LD formation. We chose to use galactose-based induction of Dga1 because this is a well-established and widely used assay in the field, extensively characterized by many groups over the years. The tight control it provides, enabling synchronous and rapid LD induction, makes it the method of choice for many researchers. Importantly, the LDs formed using this assay are morphologically normal and involve the same components as LDs formed under other conditions.

      Regarding the role of metabolism in LD formation, it is worth noting that galactose is metabolized by yeast primarily through fermentation, following its conversion to UDP-glucose. Therefore, its use does not involve drastic metabolic changes. The impact of metabolism in LD biogenesis is an interesting question but it falls beyond the scope of the current study.

      (2) Fig. 3B. Although only representative images are shown, the panel convincingly shows that lipid droplets do form upon galactose induction of DGA1 in are1∆/2∆ dga1∆ lro1∆ cells. However, it does not show to what extent. Are lipid droplets synthesized at WT levels? How many cells were counted? How many lipid droplets per cell? Is there a statistical difference with respect to WT cells?

      We did not assess these parameters in this study. The aim of the study was to assess the relationships between components of the seipin complex with and without lipid droplets. For this purpose, inducing lipid droplet formation over a 4-hour period was sufficient to address that specific question. As mentioned above, LDs formed using this assay are morphologically normal and involve the same components as LDs formed under other conditions. That said, it is known that prolonged overexpression of Dga1 (>12 hours) can lead to enlarged LDs.

      (3) Fig. 2D. It is not clear how standard deviation can be meaningfully applied to two data points, let alone providing a p-value. For some of these experiments, triplicate trials might provide a more robust statistical sampling.

      We thank the reviewer for this suggestion. We have added 2 more repeats to the Co-IP in figure 2.

      Reviewer #1 (Significance (Required)):

      Klug and Carvalho report on the lipid droplet architecture of the yeast seipin complex. Specifically, the mechanism of yeast seipin Sei1 binding to Ldo16 and the subsequent recruitment of Ldo45 is analyzed. These results follow from a recent publication (PMID: 34625558) from the same authors and aim to define a more precise role for the components of the seipin complex. Using photo-crosslinking, Ldo45 and Ldo16 interactions are analyzed in the context of lipid droplet assembly.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      Klug and Carvalho apply a photo-crosslinking approach, which has been extensively used in the Carvalho group, to investigate the subunit interactions of the seipin complex in yeast. The authors apply this approach to further study possible changes within the seipin complex following induction of neutral lipid synthesis and lipid droplet (LD) formation. The authors propose that Ldo45 makes contact with Ldb16 and that the seipin complex subunits assemble even in the absence of LDs.

      Major comments:

      Overall, this is a focused and well-executed study on one of the fundamental structural components of LDs. The study addresses the subunit interactions of the seipin complex but does not look into their functional consequences, for example how the mutations in Ldb16 that affect its interaction with Ldo45 influence LD formation; similarly, the authors make the interesting observation that Ldo16 may be differentially affected by the lack of neutral lipids (Fig. 3A) but this observation is not explored.

      We thank the reviewer for this comment. The Ldb16 mutations analyzed in this study have been previously characterized by us (see Klug et al., 2021 – Figure 3) and exhibit a mild defect in lipid droplet (LD) formation. This phenotype is unlikely to result from impaired Ldo16/45 recruitment, as deletion of Ldo proteins causes only a very mild effect on LD formation (as shown in Teixeira et al., 2018 and Eisenberg-Bord et al., 2018).

      We agree that the differential effect on Ldo proteins by the absence of neutral lipids is particularly interesting. However, its exploration falls outside of the scope of the current study and should be thoroughly investigated in the future.

      1. For the crosslinking pull-downs (Fig. 1), it seems that the authors significantly overexpress (ADH1 promoter) the Ldb16 subunit that carries the various photoreactive amino acid residues, while keeping the other (tagged) seipin complex members at endogenous levels. Would not this imbalance affect the assembly of the complex and therefore the association of the different subunits with each other?

      We thank the reviewer for this comment. In vivo site-specific crosslinking is a highly sensitive method for detecting protein-protein interactions in a position-dependent manner. However, one caveat of the approach is the low efficiency of amber stop codon suppression and BPA incorporation. To mitigate this limitation, we (and others) express the amber-containing protein (in this case Ldb16) from a strong constitutive promoter such as ADH1. Thus, despite the strong promoter, the overall levels of Ldb16 remain comparable to endogenous levels due to the inherently low efficiency of amber suppression. Moreover, Ldb16 that is not bound to Sei1 is rapidly degraded in a proteasome-dependent manner (Wang, C.W. 2014), further preventing its accumulation.

      Although the authors do show delta4 cells with no LDs (Fig. 3B, 0h), galactose-inducible systems in yeast are known to be leaky. Given the authors' conclusion that the complex is "pre-assembled" irrespective of the addition of galactose, I think it would be important to confirm biochemically that there is no neutral lipid at time point 0. Alternatively, it may be better to simply compare wt vs dga1 lro1 or are1 are2 mutants; there is no need for GAL induction since the authors look at one time point only.

      Among the various regulable promoters, GAL1 shows a superior level of control. For example, expression of essential genes from the GAL1 promoter frequently leads to cell death in glucose-containing media, a condition that represses the GAL1 promoter. Having said this, we cannot exclude that minute amounts of DGA1 are expressed prior to galactose induction. However, if this is the case, the resulting levels of TAG are insufficient to be detected by sensitive lipid dyes or to induce LDs, as noted by the reviewer. Therefore, we believe our conclusions remain valid. This is consistent with the wording we use in the text, where we refer to LD formation rather than complete loss of neutral lipids. To make this absolutely clear, we replaced the word “presence” with “abundance” in line 236.

      Lastly, we do not agree with the reviewer that using double mutants (are1/2 or dga1/lro1 mutants) would be sufficient, since these mutations are not sufficient to abolish LD formation – a key aspect of this study. The GAL1 system allows us to monitor two time points in the same cells: no LDs (0 h) and with LDs (4 h). The system proposed by the reviewer would only provide a snapshot of steady-state levels in different cells rather than within the same cell culture.

      Some methodological issues could be better detailed. For example, which of the three delta4 strains was used to induce neutral lipid in Fig. 4B? How exactly were the quantifications in Fig. 4D performed (I assume they were done under non-saturating band intensity conditions, as for some residues it is difficult to conclude whether the blot aligns with the quantification results).

      We thank the reviewer for these comments. We have clarified the strain number in the figure legend of figure 4B (strain yPC12630).

      We have also added the following text in lines 437-441 of the methods section: “Reactive bands were detected by ECL (Western Lightning ECL Pro, Perkin Elmer #NEL121001EA), and visualized using an Amersham Imager 600 (GE Healthcare Life Sciences). Data quantification was performed using Image Studio software (Li-Cor) to measure line intensity under non-saturating conditions.”

      "our findings support the notion that Ldo45 is important for early steps of LD formation as previously proposed" I find this statement confusing given that the authors claim that Ldo45 is already bound to the complex before LD formation.

      We thank the reviewer for raising this important point. We believe that our findings support previous hypotheses on the role of Ldo45. It has been suggested that Ldo45 is important for the early stages of lipid droplet (LD) formation (Teixeira et al., 2018; Eisenberg-Bord et al., 2018). As such, Ldo45 would need to be recruited to the seipin complex before or at the onset of LD formation. The observation that Ldo45 is present at the complex prior to LD formation provides strong support for its role in the initial steps of this process.

      To clarify this idea in the manuscript, we have revised the sentence on line 310 as follows:

      “Irrespective of the mechanism, our findings support the notion that Ldo45 plays a role in the early steps of LD formation, as previously proposed…”

      The model in Fig. 5 is essentially the same as the one shown in Fig. 1G.

      To aid the reader and avoid confusion, we intentionally used a similar color scheme throughout the manuscript. This may contribute to the perception that the figures are very similar. However, there are clear distinctions between them. In Figure 1G, we summarize our findings regarding the positioning of Ldo45 within the complex and note that we do not yet have data on Ldo16. Building upon these findings, in Figure 5 we speculate where Ldo16 might interact with Ldb16 and highlight that recruitment of both Ldo16 and Ldo45 increases with neutral lipid availability.

      Therefore, we believe that both figures serve distinct and complementary purposes, and that each is useful for communicating our overall message.

      Minor comments

      In the pull-downs in Fig. 2C, it seems that full-length Ldb16 is not enriched after the FLAG IP. What is the reason for this?

      We thank the reviewer for raising this interesting aspect. We do not know why this occurs, but it is clear that full length Ldb16 is not efficiently pulled down. We could speculate that this has to do with access to the FLAG moiety at the C terminus that may become inaccessible due to interactions or folding in the long unstructured C-terminus of Ldb16. This might explain why when we truncate the C terminus in the 1-133 mutant we achieve a more efficient IP.

      In the blots in Fig. 2C and 3A, the anti-Dpm1 Ab seems to recognize in the IP fractions a band labelled as non-specific; however, this band is absent from the input.

      We thank the reviewer for raising this. This non-specific band is the light chain of the antibody used in the pull down that detaches from the matrix during elution – thus not found in the input. This is a common non-specific band that appears in Co-IP blots.

      Reviewer #2 (Significance (Required)):

      Regulation of seipin function is essential for proper LD biogenesis in eukaryotes, so this study addresses a fundamental question in the field. As stated above some functional analysis that goes beyond the biochemistry would be beneficial. There is some overlap with a recently published paper from the Wang group that also examines the assembly of seipin in yeast.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Klug and Carvalho investigates the interaction of the yeast seipin complex (Sei1 and Ldb16) with Ldo45 and Ldo16. Using a site-specific photocrosslinking approach, the authors map some residues of the seipin complex in contact with Ldo45, demonstrating that Ldo45 likely binds to Ldb16 in the center of the Sei1-Ldb16 complex. They find that both Ldo45 and Ldo16 copurify with Ldb16. Complex assembly is demonstrated to occur independently of the presence of neutral lipids. An Ldb16 mutant, harbouring the transmembrane domain (1-133) but lacking the cytosolic region (previously shown to allow normal LD formation and still bind to Sei1) showed photocrosslinks with Ldo45, but not Ldo16. No crosslinks between Sei1 and either Ldo45 or Ldo16 were detected.

      Major: 1. Figure 2 shows CoIPs using different Ldb16 mutants/truncations to test for binding of Ldo45 and Ldo16. Both Ldo16 and Ldo45 copurify with full length Ldb16. Loss of the cytosolic part of Ldb16 strongly reduced binding of both Ldo45 and Ldo16, indicating that the TM-Helix-TM domain of Ldb16 (1-133) alone is not sufficient for proper binding of Ldo45 or Ldo16. The quantifications (2D and 2E) presented for this CoIP represent n=2 with mean, standard deviation and statistics. For the statistical analysis to be meaningful, the authors need to increase their n to at least n=3. In addition, they refer to the statistics they use here as "two-sided Fischer's T-test" in the respective figure legend. To my knowledge, there is no such test; is it Student's t-test or Fisher's exact test? Can the authors please clarify?

      We thank the reviewer for this comment and suggestions. We have now included 2 additional repeats for this experiment and the results essentially support our conclusion.

      “Two-sided Fischer’s T-test” is the name of the test in GraphPad Prism. We wanted to report the test name as it appears so that the reader can trace the exact test we used in the program.

      2. Figure 2E shows the same data as 2D with different normalization to highlight the differences between binding to the domain 1-133 per se and binding to this domain when the linker helix is mutated. These mutations seem to cause a further decrease in binding of both Ldo45 and Ldo16. Still, effects are rather small, and n=2 does not allow any meaningful statistical tests. To make this point, the authors should increase their sample number (at least n=3) to show that this difference is indeed meaningful and to allow statistical analysis.

      We thank the reviewer for this comment and suggestions. We have now included 2 additional repeats for this experiment and the results essentially support our conclusion.

      For Ldo16, no crosslinks were detected with the Ldb16 TM-Helix-TM domain (Figure 1). In line with this, CoIP demonstrated that the interaction between Ldo16 and Ldb16 was strongly reduced when the Ldb16 domain 1-133 was used for IP. Still, additional mutation of the linker helix in this 1-133 domain further reduced this interaction (to a similar extent as for Ldo45). Could the authors please clarify why the additional mutations in the linker helix region also decreased the binding of Ldo16, though the authors conclude from their crosslinking approach in Fig. 1 that Ldo16 does not interact with this region?

      We thank the reviewer for raising this point. Our negative crosslinking results for Ldo16 do not exclude the possibility of binding to that region; rather, they indicate that we were unable to detect Ldo16 there. Additionally, mutations in the linker helix may influence how Ldb16 interacts with seipin, including its positioning within the seipin ring and the membrane bilayer. These structural changes could, in turn, affect Ldo16 recruitment in ways that we do not fully understand.

      Similarly, in Fig. 4D, a quantification with n=2 is presented, showing that some of the crosslinks are more prominently detectable when LD biogenesis is induced. The findings of this manuscript are completely based on results obtained with CoIP and photocrosslinking, and quantification of a sufficient n to allow statistical analysis will be essential.

      While we agree that additional experiments are useful for the Co-IP because of variability between experiments, this is less of a concern for the photocrosslinking experiments. In the case of photocrosslinking, we typically see much less variability and normally, for a given position, the effects are much more “black and white”- either there is a crosslink or not.

      Why is no blot showing crosslinked Ldb16 bands presented anywhere (only non-crosslinked Ldb16, e.g. Fig. 1C)?

      We thank the reviewer for this comment. In all cases the amount of crosslinked product is very minor. This is particularly obvious in the case of Ldb16, where the non-crosslinked species dominates in the blots (as can be observed in figure S1B).

      Figure 3: The authors conclude that galactose-induced expression of either Dga1, Lro1 or Are1 in cells lacking all four enzymes for neutral lipid synthesis (quadruple deletion mutant) increases the levels of Ldb16. However, I do not see any difference on the FLAG-Ldb16 blot when comparing Ldb16 levels in the quadruple deletion mutant with or without Dga1, Lro1 or Are1, and no quantification is presented that might reveal very subtle differences not visible on the blot.

      We agree with the reviewer and modified the text to more accurately describe our results.

      OPTIONAL: Have the authors considered assessing which sites/domains of Ldo45 and Ldo16 are employed to bind to Ldb16?

      This is a logical next step that will be undertaken in a future study.

      Minor: 1. Page numbers would have been helpful to refer to specific text sections.

      Page numbers have been added.

      2. Figure 3C: It is unclear to me why the authors labeled as OSW5 a part of their immunoblot in which they detected HA.

      This was a mistake and has been corrected.

      3. Figure 4D and the corresponding figure legend could be improved with respect to labeling for clarity.

      We have added an X-axis label and made additional clarifications in the legend.

      4. Please correct this sentence: "These variants we expressed in cells where the other subunits of the Sei1 complex were epitope tagged to facilitate detection and expressed their endogenous loci."

      This sentence has been corrected.

      Reviewer #3 (Significance (Required)):

      This is a short and interesting study completely based on UV-induced site-specific photocrosslinking and CoIPs that provides some new insights into the interaction surface between the Seipin complex and Ldo45 and the interaction between Ldo16 and Ldb16. Though in parts still premature, these findings will likely be of interest to the large community interested in lipid metabolism, expanding the role of Ldb16 from neutral lipid binding to regulator recruitment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the authors):

      Thank you for your thorough review of our manuscript and your valuable suggestions. Here are our responses to each point you raised:

      (1) Novelty: Exploring the feasibility of extending the risk-scoring model to diverse cancer types could emphasize the broader impact of the research.

      Thank you so much for your thoughtful and insightful feedback. Your suggestion to explore extending the risk-scoring model to diverse cancer types is truly valuable and demonstrates your broad vision in this field. We deeply appreciate your interest in our research and the effort you put into providing such constructive input.

      After careful consideration, we have decided to focus our current study on the specific cancer type(s) we initially set out to explore. This decision was made to ensure that we can thoroughly address the research questions at hand, given our current resources, time constraints, and the complexity of the topic. By maintaining this focused approach, we aim to achieve more in-depth and reliable results that can contribute meaningfully to the understanding of this particular area.

      However, we fully recognize the potential significance of your proposed direction and firmly believe that it could be an excellent avenue for future research. We will definitely keep your suggestion in mind and may explore it in subsequent studies as our research progresses and evolves.

      (2) Improvement in Figure Presentation: The inconsistency in font formatting across figures, particularly in Figure 2 (A-D, E, F-H, I), Figure 3 (A-C, D-J, H, K), and the distinct style change in Figure 5, raises concerns about the professionalism of the visual presentation. It is recommended to standardize font sizes and styles for a more cohesive and visually appealing layout. This ensures that readers can easily follow and comprehend the graphical data presented in the article.

      The text in the figures has been revised as requested.

      (3) Enhancing Reliability of Immune Cell Infiltration Data: Address the potential limitations associated with relying solely on RNASeq data for immune cell infiltration analysis between the ICD-low and ICD-high groups in Figure 2. It is advisable to discuss the inherent challenges and potential biases in this methodology. To strengthen the evidence, consider incorporating bladder cancer single-cell sequencing data, which could provide a more comprehensive and reliable understanding of immune cell dynamics within the tumor microenvironment.

      Thank you very much for your meticulous review and the highly constructive suggestions. Your insight regarding the limitations of relying on RNASeq data for immune cell infiltration analysis and the proposal to incorporate bladder cancer single-cell sequencing data truly reflect your profound understanding of the field. We deeply appreciate your efforts in guiding our research and the valuable perspectives you've offered.

      After careful deliberation, given our current research scope, timeline, and available resources, we've decided to focus on further discussing and addressing the challenges and biases inherent in RNASeq-based immune cell infiltration analysis. By delving deeper into the methodological limitations and conducting more in-depth statistical validations, we aim to provide a comprehensive and reliable interpretation of the data within our study framework. This focused approach allows us to maintain the integrity of our original research design and deliver robust findings on the relationship between immune cell infiltration and ICD in the current context.

      However, we fully acknowledge the significant value of your proposed single-cell sequencing approach. It is indeed a powerful method that could offer more detailed insights into immune cell dynamics, and we believe it holds great promise for future research in this area. We will keep your suggestion in mind as an important direction for potential future studies, especially when we plan to expand and deepen our exploration of the tumor microenvironment.

      (4) Clarity in Data Sources and Interpretation of Figure 5: In the results section, provide a detailed and transparent explanation of the sources of data used in Figure 5. This includes specifying the databases or platforms from which the chemotherapy, targeted therapy, and immunotherapy data were obtained. Additionally, elucidate the rationale behind the chosen data sources and how they contribute to the overall interpretation of the study's findings. Strangely, these immune-related genes are also associated with cancer sensitivities to different targeted therapies.

      Thank you very much for your detailed and valuable feedback on Figure 5. We sincerely appreciate your careful review and insightful suggestions, which have provided us with important directions for improvement.

      Regarding the data sources in Figure 5, we used the pRRophetic algorithm to conduct a drug sensitivity analysis on the TCGA database. The reason for choosing these data sources is multifaceted. Firstly, these databases and platforms are well-established and widely recognized in the field. They have strict data collection and verification processes, ensuring the accuracy and reliability of the data. For example, TCGA has a large-scale chemotherapy case database accumulated over the long term, which can comprehensively reflect the clinical application and treatment effects of various chemotherapeutic drugs.

      Secondly, these data sources cover a wide range of cancer types and patient information, which meets the requirements of our study for sample size and diversity. This comprehensiveness enables us to conduct a more in-depth and representative analysis of the relationships between different therapies and immune-related genes.

      In terms of the overall interpretation of the study's findings, the use of these data sources provides a solid foundation. The accurate chemotherapy, targeted therapy, and immunotherapy data help us clearly demonstrate the associations between immune-related genes and cancer sensitivities to different treatments. This allows us to draw more reliable conclusions and provides a scientific basis for understanding the complex mechanisms of cancer treatment from the perspective of immune-gene-therapy interactions.

      As for the unexpected association between immune-related genes and cancer sensitivities to different targeted therapies, this is indeed a fascinating discovery. In our analysis, we hypothesized that immune-related genes may affect the tumor microenvironment, thereby influencing the response of cancer cells to targeted therapies. Although this finding is currently beyond our initial expectations, it has opened up a new research direction for us. We will further explore and verify the underlying mechanisms in future research.

      Once again, thank you for your guidance. We will make corresponding revisions and improvements according to your suggestions to make our research more rigorous and complete.

      (5) Legends and Methods: Address the brevity and lack of crucial details in the figure legends and methods section. Expand the figure legends to include essential information, such as the number of samples represented in each figure. In the methods section, provide comprehensive details, including the release dates of databases used, versions of coding packages, and any other pertinent information that is crucial for the reproducibility and reliability of the study.

      We would like to express our sincere gratitude for your valuable feedback on the figure legends and methods section of our study. We highly appreciate your sharp observation of the issues regarding the brevity and lack of key details, which are crucial for further improving our research.

      We have supplemented the methods section with data including the number of samples, the release dates of the databases used, and the versions of the coding packages, etc. For TCGA samples: 421 tumor samples and 19 normal samples. Database release date: March 29, 2022 (v36). Coding package version: R version 4.1.1. We will immediately proceed to supplement these key details, making the research process and methods transparent. This will allow other researchers to reproduce our study more accurately and enhance the persuasiveness of our research conclusions.

      (6) Evidence Supporting Immunotherapy Response Rates: Provide a robust foundation for the conclusion regarding lower immunotherapy response rates. Strengthen this section by offering a more detailed description of sample parameters, specifying patient demographics, and presenting any statistical measures that validate the observed trends in Figure 5Q-T. More survival data are required to draw this conclusion. Avoid overinterpretation of the results and emphasize the need for further investigation to solidify this aspect of the study.

      Thank you very much for your professional and meticulous feedback on the content related to immunotherapy response rates in our study! Your suggestions, such as providing a solid foundation for the conclusions and supplementing key information, are of great value in enhancing the quality of our research, and we sincerely appreciate them.

      The data in Figures 5Q to T are from the TCGA database, which has already been provided. The statistical measure used for Figures 5Q to T is the P-value, which has been marked in the figures. The survival data have been provided in Figure 3D.

      Reviewer #2 (Recommendations for the authors):

      Thank you for your thorough review of our manuscript and your valuable suggestions. Here are our responses to each point you raised:

      (1) There is no information on the samples studied. Are all TCGA bladder cancer samples studied? Are these samples all treatment naïve? Were any excluded? Even simply, how many samples were studied?

      Thank you so much for pointing out the lack of sample-related information. Your attention to these details has been extremely helpful in identifying areas for improvement in our study.

      All the samples in our study were sourced from the TCGA (The Cancer Genome Atlas) and TCIA (The Cancer Immunome Atlas) databases. It should be noted that the patient data in the TCIA database are originally from the TCGA database. Regarding whether the patients received prior treatment, this information was not specifically mentioned in our current report. Instead, we mainly relied on the scores of the prediction model for evaluation. Since all samples were obtained from publicly available databases, we understand the importance of clarifying their origin and characteristics.

      We sincerely apologize for the omission of the sample size and other relevant details. We will promptly supplement this crucial information in the revised version, including a detailed description of the sample sources and any relevant characteristics. This will ensure greater transparency and help readers better understand the basis of our research.

      For TCGA samples: 421 tumor samples and 19 normal samples. Database release date: March 29, 2022 (v36). Coding package version: R version 4.1.1.

      (2) What clustering method was used to divide patients into ICD high/low? The authors selected two clusters from their "unsupervised" clustering of samples with respect to the 34 gene signatures. A Delta area curve showing the relative change in area under the cumulative distribution function (CDF) for k clusters is omitted, but looking at the heatmap one could argue there are more than k=2 groups in that data. Why was k=2 chosen? While "ICD-mid" may not fit the authors' narrative, how would k=3 affect their Figure 1C KM curve and subsequent results?

      Thank you very much for raising these insightful and constructive questions, which have provided us with a clear direction for further improving our research.

      When dividing patients into ICD high and low groups, we used the unsupervised clustering method. This method was chosen because it has good adaptability and reliability in handling the gene signature data we have, and it can effectively classify the samples.

      Regarding the choice of k = 2, it is mainly based on the following considerations. Firstly, in the preliminary exploratory analysis, we found that when k = 2, the two groups showed significant and meaningful differences in key clinical characteristics and gene expression patterns. These differences are closely related to the core issues of our study and help to clearly illustrate the distinctions between the ICD high and low groups. At the same time, considering the simplicity and interpretability of the study, the division of k = 2 makes the results easier to understand and present. Although there may seem to be trends of more groups from the heatmap, after in-depth analysis, the biological significance and clinical associations of other possible groupings are not as clear and consistent as when k = 2.

      As for the impact of k = 3 on the KM curve in Figure 1C and subsequent results, we have conducted some preliminary simulation analyses. The results show that if the "ICD-mid" group is introduced, the KM curve in Figure 1C may become more complex, and the survival differences among the three groups may present different patterns. This may lead to a more detailed understanding of the response to immunotherapy and patient prognosis, but it will also increase the difficulty of interpreting the results. Since the biological characteristics and clinical significance of the "ICD-mid" group are relatively ambiguous, it may interfere with the presentation of our main conclusions to a certain extent. Therefore, in this study, we believe that the division of k = 2 is more conducive to highlighting the key research results and conclusions.

      Thank you again for your valuable comments. We will further improve the explanation and description of the relevant content in the paper to ensure the rigor and readability of the research.

      (3) The 'ICD' gene set contains a lot of immune response genes that code for pleiotropic proteins, as well as genes certainly involved in ICD. It is not convincing that the gene expression differences, and thus the DEGs, between the two groups are not simply "immune-response high" vs "immune-response low". For the DEG analysis, how many of the 34 ICD genes are DEGs between the two groups? Of those, which markers of ICD are DEGs vs. those that are related to immune activation?

      a. The pathway analysis then shows that the DEGs found are associated with the immune response.

      b. Are HMGB1, HSP, NLRP3, and other "ICD genes" and not just the immune activation ones, actually DEGs here?

      c. Figures D, I-J are not legible in the manuscript.

      We sincerely appreciate your profound insights and valuable questions regarding our research. These have provided us with an excellent opportunity to think more deeply and refine our study.

      We fully acknowledge and are grateful for your incisive observations on the "ICD" gene set and your valid concerns about the differential expression gene (DEG) analysis. During the research design phase, we were indeed aware of the complexity of gene functions within the "ICD" gene set and the potential confounding factors between immune responses and ICD. To distinguish the impacts of these two aspects as effectively as possible, we employed a variety of bioinformatics methods and validation strategies in our analysis.

      Regarding the DEG analysis, 30 of the 34 genes in the ICD gene set showed significant differential expression between the groups; the exceptions were HMGB1, HSP90AA1, ATG5, and PIK3CA. We further performed detailed classification and functional annotation analyses on these DEGs. The ICD gene set was taken from a previous study of the ICD process, and the relevant literature is cited in the Materials section. HMGB1 is a damage-associated molecular pattern (DAMP) that activates immune cells (e.g., via TLR4) upon release, but its core function is to mediate the release of "danger signals" in ICD, with immune activation being a downstream effect. HSP90AA1 is a heat shock protein involved in antigen presentation and the regulation of immune cell function, although its primary role is to assist protein folding, with immune-related effects being auxiliary. NLRP3 is a member of the NOD-like receptor family that forms an inflammasome, activating CASP1 and promoting the maturation and release of IL-1β and IL-18. Among these DEGs, the majority are associated with immune activation, such as IL1B, IL6, IL17A/IL17RA, and IFNG/IFNGR1.
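      The reviewer's question (how many ICD-set genes are DEGs, and which ones) amounts to intersecting the gene set with the DEG table. The following is a minimal Python sketch, assuming a per-gene table of log2 fold changes and adjusted p-values; the `GeneResult` and `classify_gene_set` names and the cut-offs (adjusted p < 0.05, |log2FC| >= 1) are hypothetical and are not the thresholds used in the study.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str
    log2_fc: float  # log2 fold change between the two groups
    padj: float     # multiple-testing adjusted p-value


def classify_gene_set(results, gene_set, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Split a gene set into DEGs, non-DEGs and genes absent from the results.

    The cut-offs are illustrative defaults, not the thresholds used in the study.
    """
    by_gene = {r.gene: r for r in results}
    degs, non_degs, missing = [], [], []
    for gene in gene_set:
        r = by_gene.get(gene)
        if r is None:
            missing.append(gene)
        elif r.padj < padj_cutoff and abs(r.log2_fc) >= lfc_cutoff:
            degs.append(gene)
        else:
            non_degs.append(gene)
    return degs, non_degs, missing


# Toy input with placeholder gene names (not data from the study).
results = [
    GeneResult("GENE_A", 1.5, 0.001),
    GeneResult("GENE_B", 0.2, 0.40),
    GeneResult("GENE_C", -2.0, 0.01),
]
print(classify_gene_set(results, ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]))
# (['GENE_A', 'GENE_C'], ['GENE_B'], ['GENE_D'])
```

      Reporting all three lists, rather than only the DEG count, makes it explicit which members of the set (such as HMGB1 or HSP90AA1 here) were not differentially expressed.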

      (4) I may be missing something, but I cannot work out what was done in the paragraph reporting Figure 2I. Where is the ICB data from? How has this been analysed? What is the cohort? Where are the methods?

      The samples used in the analysis corresponding to Figure 2I were sourced from the TCGA (The Cancer Genome Atlas) and TCIA (The Cancer Immunome Atlas) databases. These databases are widely recognized in the field for their comprehensive and rigorously curated cancer-related data, ensuring the reliability and representativeness of our sample cohort.

      Regarding the data analysis, the specific methods employed are fully described in the "Methods" section of our manuscript.

      (5) How were the four genes for your risk model selected? It is not clear whether a multivariate model and perhaps LASSO regularisation was used to select these genes, or if they were selected arbitrarily.

      Regarding how the four genes for our risk model were selected, we would like to elaborate on the previous analysis steps. In the univariate Cox analysis, we systematically examined a series of ICD-related genes in relation to the overall survival (OS) of patients. This analysis identified four ICD-related genes significantly associated with OS, namely CALR (p = 0.003), IFNB1 (p = 0.037), IFNG (p = 0.022), and IL1R1 (p = 0.047), as illustrated in Figure 3A.

      Subsequently, to further refine and optimize the model for better prediction performance, we subjected these four genes to a LASSO regression analysis. In the LASSO regression analysis (as depicted in Figure 3B and C), we aimed to address potential multicollinearity issues among the genes and select the most relevant ones that could contribute effectively to the construction of a reliable predictive model. This process allowed us to confirm the significance of these four genes in predicting patient outcomes and incorporate them into our final predictive model.

      (6) How related are the high-risk and ICD-high groups? It is not clear. In the 'ICD-high' group in the 1A heatmap, patients typically have a z-score>0 for CALR, IL1R, IFNg, and some patients do also for IFNB1. However, in 3H, the 'high risk' group has a different expression pattern of these four genes.

      Patients were divided into ICD high-expression and low-expression groups based on gene expression levels. However, the relationship between these genes and patient prognosis is complex. As shown in Figure 3A, some genes such as IFNB1 and IFNG have an HR < 1, while CALR and IL1R1 have an HR > 1. Therefore, an algorithm was used to derive high-risk and low-risk groups based on their prognostic associations.
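      To illustrate why the ICD-high grouping and the risk grouping need not coincide, here is a minimal Python sketch of a standard Cox-model risk score (the weighted sum of gene expression with model coefficients) split at the cohort median. The coefficient magnitudes, the median cutoff and the function names are hypothetical; only the signs follow the hazard-ratio directions described above (HR > 1 gives a positive coefficient, HR < 1 a negative one).

```python
import statistics

# Hypothetical LASSO-Cox coefficients. Only the signs follow Figure 3A
# (HR > 1 -> beta > 0, HR < 1 -> beta < 0); the magnitudes are made up.
COEFS = {"CALR": 0.4, "IL1R1": 0.3, "IFNB1": -0.2, "IFNG": -0.5}


def risk_score(expr):
    """Cox linear predictor: sum of beta_i * expression_i over the model genes."""
    return sum(beta * expr[gene] for gene, beta in COEFS.items())


def assign_risk_groups(patients):
    """Label each patient 'high' or 'low' risk relative to the cohort median score."""
    scores = {pid: risk_score(expr) for pid, expr in patients.items()}
    cutoff = statistics.median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Toy, standardized expression values (not patient data).
patients = {
    "P1": {"CALR": 2.0, "IL1R1": 2.0, "IFNB1": 0.0, "IFNG": 0.0},
    "P2": {"CALR": 0.0, "IL1R1": 0.0, "IFNB1": 2.0, "IFNG": 2.0},
}
print(assign_risk_groups(patients))  # {'P1': 'high', 'P2': 'low'}
```

      Because IFNB1 and IFNG enter with negative weights, a patient with high expression of all four genes (and therefore likely "ICD-high") can receive an intermediate risk score, which is one way the two groupings can diverge.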

      (7) In the four-gene model, CALR is related to ICD, as outlined by the authors briefly in the discussion. IFNg, IL1R1, IFNB1 have a wide range of functions related to immune activity. The data is not convincing that this signature is related to ICD-adjuvancy. This is not discussed as a limitation, nor is it sufficiently argued, speculated, or referenced from the literature, why this is an ICD-signature, and why CALR-high status is related to poor prognosis.

      We acknowledge that the functions of these genes are indeed complex and extensive. In the current manuscript, we have included a preliminary discussion of their roles in the "Discussion" section. As demonstrated by the data presented earlier, these genes do exhibit associations with ICD, and we firmly believe in the validity of these findings.

      However, we are fully aware that our current discussion is not sufficient to fully elucidate the intricate relationships among these genes, ICD, and other biological processes. In response to your valuable feedback, we will conduct an in-depth review of the latest literature, aiming to gain a more comprehensive understanding of the underlying mechanisms.

      (8) Score is spelt incorrectly in Figures 3F-J.

      Figures 3F-J have been revised as requested.

      (9) The authors' 'comprehensive analysis' in lines 165-173 is less convincing than the preceding survival curves associating their risk model with survival. Their 'correlations' have no statistics.

      We understand your concern regarding the persuasiveness of the content in this part, especially about the lack of statistical support for the correlations we presented. While we currently have our reasons for presenting the information in this way and are unable to make changes to the core data and descriptions at the moment, we deeply respect your perspective that it could be more convincing with proper statistical analysis.

      (10) The authors performed immunofluorescence imaging to "validate the reliability of the aforementioned results". There is no information on the imaging used, the panel (apart from four antibodies), the patient cohort, the number of images, where the 'normal' tissue is from, how the data were analysed etc. This data is not interpretable without this information.

      a. Is CD39 in the panel? CD8, LAG3? It's not clear what this analysis is.

      The color of each antibody has been marked in Fig 2B. The cohort information and its source have been supplemented. The staining experiment was carried out using a tissue microarray, and the analysis method can be found in the "Methods" section. Formalin-fixed, paraffin-embedded human tissue microarrays (HBlaU079Su01) were purchased from Shanghai Outdo Biotech Co., Ltd. (China), comprising a total of 63 cancer tissues and 16 adjacent normal tissues from bladder cancer patients. Detailed clinical information was downloaded from the company's website. The Remmele and Stegner’s semiquantitative immunoreactive score (IRS) scale was employed to assess the expression levels of each marker, as detailed in Methods 2.5. CD39, CD8, and LAG3 were also stained, but the results were not presented.
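      For readers unfamiliar with the IRS, the scale is usually described as the product of a staining-intensity score (0 = none, 1 = weak, 2 = moderate, 3 = strong) and a score for the percentage of positive cells (0 = none, 1 = <10%, 2 = 10-50%, 3 = 51-80%, 4 = >80%), giving values from 0 to 12. The Python sketch below encodes these commonly cited categories; the function names are hypothetical, and the exact scoring rules applied in the study are those in Methods 2.5.

```python
def percentage_score(percent_positive: float) -> int:
    """Map the percentage of positive cells to the 0-4 IRS category."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_positive == 0:
        return 0
    if percent_positive < 10:
        return 1
    if percent_positive <= 50:
        return 2
    if percent_positive <= 80:
        return 3
    return 4


def immunoreactive_score(intensity: int, percent_positive: float) -> int:
    """IRS = staining intensity (0-3) x percentage score (0-4), range 0-12."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2 or 3")
    return intensity * percentage_score(percent_positive)


print(immunoreactive_score(2, 60))  # 6  (moderate staining, 51-80% positive)
print(immunoreactive_score(3, 90))  # 12 (strong staining, >80% positive)
```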

      (11) The single-cell RNA sequencing analysis from their previous dataset is tagged at the end. CALR expression in most identified cells is interesting. Not clear what this adds to the work beyond 'we did scRNA-seq'. How were these data analysed? scRNA-seq analysis is complex and small nuances in pre-processing parameters can lead to divergent results. The details of such analysis are required!

      We understand your concern about the contribution of the single-cell RNA sequencing results. The main purpose of this analysis was to examine changes in the expression of the four genes at the single-cell level. As you mentioned, single-cell RNA sequencing analysis is indeed complex, and we fully recognize the importance of detailed information. We performed the analysis using standard single-cell sequencing analysis methods, and a description has now been added to the Methods section.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Below is a point-by-point response to reviewers concerns.

      Main changes are colored in red in the revised manuscript.

      Reviewer #1 (Significance (Required)):

      General assessment:

      This study provides a valuable computational framework for investigating the dynamic interplay between DNA replication and 3D genome architecture. The current implementation, however, focuses on Saccharomyces cerevisiae, whose genome organization differs significantly from that of mammalian systems.

      Advance: providing the first in vivo experimental evidence in investigating the role(s) of Cohesin and Ctf4 in the coupling of sister replication forks.

      Audience: broad interests; including DNA replication, 3D genome structure, and basic research

      Expertise: DNA replication and DNA damage repair within the chromatin environment.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      By developing a new genome-wide 3D polymer simulation framework, D'Asaro et al. investigated the spatiotemporal interplay between DNA replication and chromatin organization in budding yeast: (1) The simulations recapitulate fountain-like chromatin patterns around early replication origins, driven by colocalized sister replication forks. These findings align with Repli-HiC observations in human and mouse cells, yet the authors advance the field by demonstrating that these patterns are independent of Cohesin and Ctf4, underscoring replication itself as the primary driver. (2) Simulations reveal a replication "wave" where forks initially cluster near the spindle pole body (SPB) and redistribute during S-phase. While this spatial reorganization mirrors microscopy-derived replication foci (RFis), discrepancies in cluster sizes compared to super-resolution data suggest unresolved mechanistic nuances. (3) Replication transiently reduces chromatin mobility, attributed to sister chromatid intertwining rather than active forks.

      This work bridges replication timing, 3D genome architecture, and chromatin dynamics, offering a quantitative framework to dissect replication-driven structural changes. This work provides additional insights into how replication shapes nuclear organization and vice versa, with implications for genome stability and regulation.

      We thank Reviewer 1 for her/his enthusiasm and her/his comments that help us to greatly improve the manuscript.

      However, the following revisions could strengthen the manuscript:

      Major:

      Generalizability to Other Species

      While the model successfully recapitulates yeast replication, its applicability to larger genomes (e.g., mammals) remains unclear. Testing the model against Repli-HiC, in situ Hi-C, and Repli-seq data from other eukaryotes (particularly mammalian cells) could enhance its broader relevance.

      We agree with the reviewer that testing the model in higher eukaryotes would be highly informative. The availability of Repli-HiC on one hand and higher resolution microscopy on the other could enable insightful quantitative analyses. With our formalism, it is in principle already possible to capture realistic 1D replication dynamics as the integrated mathematical formalism (by Arbona et al. ref. [63]) was already used to model human genome S-phase. In addition, the formalism developed for chain duplication is generic and can be contextualized to any species. However, when addressing the problem in 3D, we would likely require including other crucial structural features such as TADs or compartments. Such a model would require an extensive characterization worthy of its own publication. These considerations are now mentioned in the Discussion as exciting future perspectives (Page 17).

      On the other hand, we would like to highlight that, while very minimal in many aspects, our model includes many layers of complexity (explicit replication, different fork interactions, stochastic 1D replication dynamics, physical constraints at the nuclear level). In addition, addressing this problem in budding yeast offers the great advantage of capturing both the local and global spatio-temporal properties of DNA replication simultaneously, and of focusing first on those aspects alone rather than on their interplay with other mechanisms, such as A/B compartmentalization (absent in yeast), that may complicate the data analysis and the comparison with experimental data. Studying such an interplay is a very important and challenging question that, we believe, goes beyond the scope of the present work.

      Validation with Repli-HiC or Time-Resolved Techniques

      The Hi-C data in early S-phase supports the model, but the intensity of replication-specific chromatin interactions is faint, which could be further validated using Repli-HiC, which captures interactions around replication forks. Alternatively, ChIA-PET or HiChIP targeting core component(s) (eg. PCNA or GINS) of replisomes may also solidify the coupling of sister replication forks.

      We thank the reviewer for the suggestion. Unfortunately, corroborating our HiC results using Repli-HiC or HiChIP would require developing and adapting the protocols to budding yeast which is well beyond the scope of this work mainly focused on computational modelling. In addition, we believe that the signature found in our Hi-C data is clear and significant enough to demonstrate the effect.

      However, we included in the Discussion (Page 15) a more detailed description on how our work compares with the Repli-HiC study in mammals. In particular, we added a new supplementary figure (new Fig. S23) where we discuss our prediction on how Repli-HiC maps would appear in yeast in both scenarios of sister-forks interaction. Interestingly, we find that:

      1) Fountain signals are strongly enhanced when sister forks interact.

      2) Only mild replication-dependent enrichment is detected when diverging forks do not interact.

      These two results imply that disrupting putative sister-forks interaction would have a drastic effect on Repli-HiC if compared to HiC.

      Interactions Between Convergent Forks

      The study focuses on sister-forks but overlooks convergent forks (forks moving toward each other from adjacent origins), whose coupling has been observed in Repli-HiC. Could the simulation detect the coupling of convergent fork dynamics?

      We thank the reviewer for this suggestion. We included in our Hi-C analysis aggregate plots around termination sites. Interestingly, no clear signature of coupling between convergent forks was detected (such as type II fountains in mammals) in vivo and in silico. Similarly, from visual inspection of individual termination sites, no fountains were clearly observed. These results can be found in the new Fig. S24 and possible mechanistic explanations are described more in detail in the Discussion (Page 15).

      Unexpected Increase in Fountain Intensity in Cohesin/Ctf4 Knockouts.

      In Fig. 3A, a schematic illustrating the cell treatment would improve clarity. In Scc1- and Ctf4-depleted cells, fountain signals persist or even intensify (Fig. 3A). This counterintuitive result warrants deeper investigation. Could the authors provide any suggestions or discussion? Potential explanations may include:

      Compensatory mechanisms (e.g., other replisome proteins stabilizing sister-forks).

      Altered chromatin mobility in mutants, enhancing Hi-C signal resolution.

      Artifacts from incomplete depletion (western blots for Scc1/Ctf4 levels should be included).

      A scheme illustrating the experimental protocol for degron systems (CDC45-miniAID & SCC1-V5-AID) with the corresponding western blots and cell-cycle progression are shown in Fig. S26. Note that for Ctf4, we are using a KO cell line where the gene was deleted.

      We agree with the reviewer that there are several possible explanations for the differences between WT fountains and those observed in mutants. In the revised manuscript, we discuss some of them in Section 2 II B (Page 8):

      (1) As already suggested in the paper, asynchrony among cells may affect the intensity of the fountains through a dilution effect mediated by the cells still in G1. Therefore, possible differences in the fractions of replicating and non-replicating cells between the different experiments (new Fig. S7C) would also result in differences in the signal. Moreover, it is important to highlight that aggregate plots are normalized (Observed/Expected) by the average signal (P(s)). Since Scc1-depleted cells do not exhibit cohesin-mediated loop extrusion (see aggregate plots around CARs in new Fig. S7B), we may expect an enhancement of the signal at origins, because each pixel is divided by a lower contact frequency than in WT.

      (2) In the new Fig. S10, we plotted the relative enrichment of Hi-C reads around origins. We already used this approach to compare replicon sizes between simulations and experiments (see Fig. S7A and our response to comment n°9 of Reviewer 3), but it is also instructive when comparing different experimental conditions. While the experiments in WT and Scc1-depleted cells show very similar replicon sizes, we do observe a small increase in peak height for the cohesin mutant, which may also partially explain the differences in fountain intensity. For ctf4Δ, we observe significantly smaller replicons. We speculate that this mutant might exhibit slower replication and consequently be enriched in sister-fork contacts.

      (3) Compensatory mechanisms: we now briefly discuss this in the Discussion (Page 15).
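      Point (1) relies on how Observed/Expected normalization works: each pixel is divided by the average contact frequency P(s) at its genomic distance, so a globally lower P(s) at a given distance raises the O/E value of the same raw counts. Below is a minimal NumPy sketch of that normalization for a single symmetric, binned intra-chromosomal map. The function names are hypothetical, and it omits steps a real Hi-C pipeline would include, such as matrix balancing and masking of low-coverage bins.

```python
import numpy as np


def expected_by_distance(contacts):
    """P(s): mean contact frequency along each diagonal (s in bins).

    Assumes a symmetric contact matrix, so the upper diagonals suffice.
    """
    n = contacts.shape[0]
    return np.array([np.diagonal(contacts, offset=s).mean() for s in range(n)])


def observed_over_expected(contacts):
    """Divide every pixel (i, j) by P(|i - j|); pixels with P = 0 become NaN."""
    n = contacts.shape[0]
    p_s = expected_by_distance(contacts)
    i, j = np.indices((n, n))
    expected = p_s[np.abs(i - j)]
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(expected > 0, contacts / expected, np.nan)


# Toy symmetric map: contacts decay with distance, plus one enriched pair (0, 1).
m = np.array([[4.0, 3.0, 1.0],
              [3.0, 4.0, 2.0],
              [1.0, 2.0, 4.0]])
print(observed_over_expected(m)[0, 1])  # 3 / mean(3, 2) = 1.2
```

      Because the expected value is computed from the same map, any condition that lowers contacts at a given distance (for example, the loss of extrusion-driven contacts) lowers the denominator for every pixel at that distance.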

      Inconsistent Figure References

      Several figure citations are mismatched. For instance, Fig. S1A has not been cited in the manuscript. Moreover, there is no Fig.1E in figure 1, while it has been cited in the text. All figure/panel references must be cross-checked and corrected.

      We thank the reviewer for this observation. We have now corrected the mismatches.

      Minor:

      Page2: "While G1 chromosomes lack of structural features such as TADs or loops [3]" However, Micro-C captures chromatin loops, although much smaller than those in mammalian cells, within budding yeast.

      Loops of approximately 20-40 kb are found in interphase in budding yeast, but only after the onset of S-phase (refs. [52-61]). For this reason, our G1 model of yeast without loops captures the experimental P(s) curves well (Fig. S2). See also our answer to point 12 of Reviewer 2.

      In figure 2E, chromatin fountain signals can be readily observed in the fork coupling situation and movement can also be observed. However, the authors should indicate the location of DNA replication termination sites and show some examples at certain loci but not only the aggregated analysis.

      The initial use of aggregate plots was motivated by the fact that fountains are quite difficult to observe at the single-origin level in the experimental Hi-C due to the strong intensity of surrounding contacts (along the diagonal). However, when dividing early-S phase maps by the corresponding G1 map, we can now observe a clear correlation between origin and fountain positions on such normalized maps. We have now added an example for chromosome 7 in Fig. 3 indicating early and late origins.

      In Fig. S8 and S9 (where we also included termination sites), we show that fountains are prominently found at origins during S-phase and are lost in G2/M.
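      Dividing an early-S map by the G1 map can be sketched as a per-pixel log2 ratio. In the minimal NumPy example below, scaling both maps to the same total count and adding a small pseudocount are illustrative assumptions and are not necessarily the exact procedure used for Fig. 3; the function name is hypothetical. Pixels with positive values are enriched in early S relative to G1.

```python
import numpy as np


def log2_ratio_map(s_phase, g1, pseudocount=1e-9):
    """Per-pixel log2(S / G1) after scaling both maps to the same total.

    Total-count scaling and the pseudocount are illustrative assumptions.
    """
    s = s_phase / s_phase.sum()
    g = g1 / g1.sum()
    return np.log2((s + pseudocount) / (g + pseudocount))


g1 = np.array([[4.0, 2.0],
               [2.0, 4.0]])
early_s = np.array([[4.0, 4.0],
                    [4.0, 4.0]])  # toy map with extra off-diagonal contacts
ratio = log2_ratio_map(early_s, g1)
print(np.round(ratio, 2))  # off-diagonal pixels positive, diagonal pixels negative
```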

      Reviewer #2 (Significance (Required)):

      The topic is relevant and the problem being addressed is very interesting. While there has been some earlier work in this area, the polymer simulation approach used here is novel. The simulation methodology is technically sound and appropriate for the problem. Results are novel. The authors compare their simulations with experimental data and explore both interacting and non-interacting replication forks. Most conclusions are supported by the data presented.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The manuscript by D'Asaro et al. investigates the relationship between DNA replication and chromatin organization using polymer simulations. While this is primarily a simulation-based study, the authors also present relevant comparisons with experimental data and explore mechanistic aspects of replication fork interactions.

      We thank Reviewer 2 for her/his positive evaluation of our work and her/his suggestions that help us to clarify many aspects in our manuscript.

      The primary weakness is that many aspects are not clear from the manuscript. Below is a list of questions that the authors must clarify:

      In the Model and Methods section, it is written "Arbitrarily, we choose the backbone to be divided into two equally long arms, in random directions." It is unclear what is meant by "backbone to be divided" and "two equally long arms." Does this refer to replication?

      We agree with the reviewer that the term backbone may be ambiguous. In the context of the initialization of the polymer, it refers to the L/4 initial bonds used to recursively build an unknotted polymer chain of final size L using the Hedgehog algorithm (see refs [101,109]). As shown in the Fig S1A, these initial L/4 bonds define the initial backbone of each chromosome before they are recursively grown to their final size. We chose to divide them into two branches (called “arms” in the old version of the manuscript) of equal length (L/8) and with random orientations. To avoid any ambiguity between the term arm used in that context and the chromosome arms in a biological sense (sequences on the left and right with respect to centromeres), we changed it to “linear branches” to improve clarity. We highlighted in Fig. S1A two examples of such a “V-shaped” backbone.

      As stated in the text, these initial configurations are artificial and just aim to generate unknotted, random structures. After initiating the structures, we then added the geometrical constraints to the centromeric, telomeric and rDNA beads. This, combined with the tendency of the polymer to explore and fill the spherical volume, determines the relaxed G1-like state (see Fig. S2) obtained after an equilibration stage (corresponding to 10^7 MCS). Only after this initialization protocol is DNA replication activated.

      In chromosome 12, since the length inside the nucleolus (rDNA) is finite, the entry and exit points should be constrained. Have the authors applied any relevant constraint in the model?

      We did not introduce any specific constraint on the relative distance between rDNA boundary monomers in our model; they can therefore diffuse freely, independently of each other, on the nucleolus surface. This point is now clarified in the text. Note that, in this paper, we did not aim to finely describe rDNA organization and its interactions with the rest of the genome, which is why we did not explicitly model rDNA. Moreover, to the best of our knowledge, no experimental data are available to tune such additional restraints.

      Previous models such as Tjong et al. (ref. [66]) and Di Stefano et al. (ref. [67]) have used approximations very similar to ours. In the works of Wong et al. (ref. [61]) and Arbona et al. (ref. [63]), rDNA is explicitly modelled via larger/thicker beads/segments, and thus accounts for some generic polymer-based constraints between rDNA boundary elements.

      However, note that all these different models, including ours, still correctly predict the strong depletion of contacts between rDNA boundaries, indicating that there exists a spatial separation between the two boundary elements that is qualitatively well captured by our model (See Fig. S1 D and Fig. 1B).

      What is the rationale for normalizing the experimental and simulation results by dividing by the respective P_intra(s = 10 kb)?

      This normalization was used in Fig. 1 to obtain a rescaling between experiments and simulations. This approach assumes that simulated and experimental Hi-C maps are proportional by a factor that, in Fig 1B, was set to P_exp(s=16kb)/P_sim(s=16kb). Similar strategies are used in a number of modeling studies (for example ref. [103,106]).

      We use the average contact frequency (P_intra) at this genomic scale (s in the order of tens of kb) because our polymer simulations capture the experimental P(s) decay well above this scale. This method allows us to plot the two signals on the same color scale and gives a qualitative, visual intuition of the quality of the modeling. Note that normalization has no impact on the Pearson correlation given in the text. More generally, it allows us to compare predicted and experimental Hi-C data semi-quantitatively.
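      The rescaling described above multiplies the whole simulated map by the single factor P_exp(s0)/P_sim(s0), which is why it cannot change a Pearson correlation. A minimal NumPy sketch, using toy random symmetric maps; the function names and the choice of s0 = 2 bins are hypothetical.

```python
import numpy as np


def contact_probability_at(contacts, s_bins):
    """Average contact frequency at genomic distance s (in bins), i.e. P(s)."""
    return np.diagonal(contacts, offset=s_bins).mean()


def rescale_to_experiment(sim_map, exp_map, s_bins):
    """Scale the simulated map by P_exp(s0) / P_sim(s0) so both share a color scale."""
    factor = contact_probability_at(exp_map, s_bins) / contact_probability_at(sim_map, s_bins)
    return sim_map * factor


rng = np.random.default_rng(0)
exp_map = rng.random((6, 6))
exp_map = exp_map + exp_map.T  # toy symmetric "experimental" map
sim_map = rng.random((6, 6))
sim_map = sim_map + sim_map.T  # toy symmetric "simulated" map
scaled = rescale_to_experiment(sim_map, exp_map, s_bins=2)

r_before = np.corrcoef(exp_map.ravel(), sim_map.ravel())[0, 1]
r_after = np.corrcoef(exp_map.ravel(), scaled.ravel())[0, 1]
print(np.isclose(r_before, r_after))  # True: a global factor leaves Pearson's r unchanged
```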

      In Fig. 1D, we instead normalize the average signal between pairs of centromeres (inter-chromosomal aggregate plot off-diagonal) by the average P_intra(s=10kb). This method allows estimating how frequently centromeres of different chromosomes are in contact relative to intra-chromosomal contacts at the chosen scale (10 kb). In the new paragraph “Comparison with in vivo HiC maps in G1” (Page 22), we describe in more detail the quantitative insights that can be recovered from such an analysis.

      By comparison, such normalization is not required when computing Observed/Expected maps (Fig. 1C or aggregate plots in Fig. 2 and Fig. 3), as simulated and experimental maps are normalized by their own P(s) curves. We now clarify this aspect in the Materials and Methods under the paragraph “Comparison between on diagonal aggregate plots” (Page 22).

      In the sentence "For instance, chromosomes are strictly bound by the strong potential to localize between 250 and 320 nm from the SPB," is it 320 or 325 nm? Is there a typo?

      We confirm that the upper bound is indeed 325 nm as stated in Eq.2 and not 320 nm.

      Please list the number of beads in each chromosome and the location of the centromere beads.

      A new table (Table S2) was included to highlight beads number and centromere positions.

      In Eq. 7, when the Euclidean distance between the sister forks d_ij > 50 nm, the energy becomes more and more negative. This implies that the preferred state of sister forks is at distances much greater than 50 nm. Then how is "co-localization of sister forks" maintained?

      We corrected the sign typo in Eq. 7. The corrected equation, without the minus sign and consistent with what was simulated, implies that sister forks tend to minimize their 3D distance. The term goes to zero when their distance is within 40 nm (2 nearest-neighbouring sites).

      The section on "non-specific fork interactions" is unclear. You state that the interaction is between "all the replication forks in the system," but f_ij is non-zero only for second nearest-neighbors. The whole subsection needs clarification.

      We corrected the text, specifying that the energy is non-zero for both first and second neighbours. In practice, two given forks do not experience any attractive energy unless their 3D distance is less than 2 nearest-neighbour distances. To clarify this aspect, we now describe in more detail in the Methods how non-specific fork interactions are implemented on the lattice during the KMC algorithm. We also included a new supplementary figure (Fig. S15), where we schematize how forks move in 3D and how changes in their position update the table that tracks the number of forks around each lattice site.

      Eq. 6 has no H_{sister-forks}. Is this a typo?

      We confirm that it is a typo and the formula was corrected to H_{sister-forks}.

      While discussing the published work, the authors may cite the recent paper [https://doi.org/10.1103/PhysRevE.111.054413].

      The reference is now included when discussing previous polymer models of DNA replication.

      It is not clear how the authors actually increase the length of new DNA in a time-dependent manner. For example, when a new monomer is added near the replication origin (green bead in Fig. 3C), what happens to the red and blue polymer segments? Do they get shifted? How do the authors take into account self-avoidance while adding a new monomer? These details are not clear.

      The detailed description of the chain duplication algorithm and its systematic analysis was performed in our previous study (ref. [25]).

      However, we agree with the reviewer that, to improve self-consistency, more details must be included in the present manuscript (see also our answer to comment 1 of Reviewer 3). In particular, we now highlight in Materials and Methods that self-avoidance is indeed temporarily broken when we add a newly replicated monomer on top of the site where the fork is. Such double occupancy of the lattice rapidly vanishes due to 3D local moves. We refer to our PRX work (ref. [25]), in particular to Fig. S1 of ref. [25], which illustrates how the bonds/segments of the two sister chromatids are consistently maintained.

      How do the authors ensure that monomers get added at a rate corresponding to velocity v? The manuscript mentions "1 MCS = 0.075 msec," but in how many MC steps is a new monomer added? How is it decided?

      Similarly to origin firing, replication by fork movement along the genome occurs stochastically, with a rate that we derive by converting the physiological fork speed in yeast, 2.2 kb/min (ref. [41]), into a rate in units of monomers/MCS. In practice, we generate a random number and, if it is smaller than this rate, the fork advances by duplicating the next monomer. We clarify this aspect in the Materials and Methods, also referring to our previous work for a more detailed summary.
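      The rate conversion and the random-number test described above can be sketched as follows. The fork speed (2.2 kb/min) and the time mapping (1 MCS = 0.075 ms) are the values quoted in this response; the monomer size of 250 bp is a HYPOTHETICAL placeholder, because the genomic length per monomer is not stated here, and the function names are illustrative only.

```python
import random

FORK_SPEED_BP_PER_MIN = 2200.0  # 2.2 kb/min, physiological fork speed quoted in the text
MCS_IN_MS = 0.075               # time mapping quoted in the text: 1 MCS = 0.075 ms
BP_PER_MONOMER = 250.0          # HYPOTHETICAL monomer size, for illustration only


def fork_step_probability(speed_bp_per_min=FORK_SPEED_BP_PER_MIN,
                          mcs_ms=MCS_IN_MS,
                          bp_per_monomer=BP_PER_MONOMER):
    """Probability that a fork replicates one monomer during a single MCS."""
    bp_per_ms = speed_bp_per_min / 60_000.0
    return bp_per_ms * mcs_ms / bp_per_monomer


def fork_advances(rng, p):
    """One stochastic trial: draw u ~ U(0, 1) and advance the fork if u < p."""
    return rng.random() < p


p = fork_step_probability()
print(f"{p:.2e} monomers/MCS")                      # 1.10e-05 monomers/MCS
print(f"{MCS_IN_MS / p / 1000:.2f} s per monomer")  # 6.82 s per monomer

# Sanity check with a large p: the fraction of successful trials approaches p.
rng = random.Random(1)
steps = sum(fork_advances(rng, 0.3) for _ in range(100_000))
print(steps / 100_000)
```

      On average a fork then waits 1/p MCS per monomer, which with these placeholder values corresponds to the 250 bp / 2.2 kb/min ≈ 6.8 s expected from the fork speed alone.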

      The authors stress the relevance of loop extrusion. However, in their polymer simulation, the newly replicated chromatin does not form any loops. Is this consistent with what is known?

      Indeed, our simulations do not have any concurrent extrusion mechanism such as cohesin-mediated loops. This choice was purposely made to isolate and characterize replication-dependent effects.

      That is why we compare our predictions on chromatin fountain patterns (Fig. 3) with data obtained for the Scc1 mutant strain, where cohesin is absent, in order to disentangle possible interference from loop-extruding cohesin. For subsection C, where microscopy data are available only in the WT condition, we cannot rule out that the observed discrepancies between experiments and predictions are due to missing mechanisms, including loop extrusion. This was already mentioned in the Discussion (Page 16). It is however unclear whether sparse and small loops between CARs (see Fig. S7B) in S-phase could be sufficient to recapitulate the microscopy estimates of the sizes of replication foci, and no clear signature of inter-origin loops (possibly mediated by loop extrusion) is observed in Hi-C data in WT and Scc1-deficient conditions.

      Moreover, as mentioned in the Discussion, the poorly characterized mechanisms behind encounters between forks and extruding cohesin do not allow a straightforward modelling of these processes, whose accurate description and simulation would require a dedicated study.

      Please add a color bar to Fig. 4B.

      The color bar was included.

      In the MSD plot (Fig. 6), even though it appears to be a log-log plot, the exponents are not computed. Typically, exponents define the dynamics.

      We now plot the expected exponent of 0.5 at short time scales in Fig. 6, as mentioned in the main text; previously, it appeared only in the new Fig. S19A.
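      To illustrate the reviewer's point that exponents characterize the dynamics, the sketch below estimates an apparent exponent α from the slope of a log-log linear fit, optionally restricted to short lags. The data are synthetic (a t^0.5 regime that flattens after an arbitrary crossover), not the simulated MSDs of Fig. 6, and `msd_exponent` is a hypothetical helper.

```python
import numpy as np


def msd_exponent(lags, msd, t_max=np.inf):
    """Slope of log(MSD) vs log(lag), fitted only over lags <= t_max."""
    lags = np.asarray(lags, dtype=float)
    msd = np.asarray(msd, dtype=float)
    mask = lags <= t_max
    slope, _intercept = np.polyfit(np.log(lags[mask]), np.log(msd[mask]), 1)
    return slope


# Synthetic curve (illustration only): Rouse-like t**0.5 growth at short lags
# that flattens beyond a crossover at t = 1e3, plus 2% log-normal noise.
rng = np.random.default_rng(1)
lags = np.logspace(0, 4, 41)
msd = 0.3 * np.sqrt(lags / (1.0 + lags / 1e3)) * np.exp(rng.normal(0.0, 0.02, lags.size))

alpha_short = msd_exponent(lags, msd, t_max=1e2)
alpha_full = msd_exponent(lags, msd)
print(f"alpha (lags <= 100): {alpha_short:.3f}; alpha (all lags): {alpha_full:.3f}")
```

      The fit window matters: including lags beyond the crossover lowers the apparent exponent, so a single exponent is only meaningful within a stated time range.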

      The dynamics will depend on the precise nature of interactions, such as the presence or absence of loop extrusion. If the authors present dynamics without extrusion, is it likely to be correct?

      The reviewer is correct in highlighting that our model does not capture the potential decrease in dynamics due to cohesin-mediated loop extrusion. However, our model does capture the expected Rouse regime (see Fig. 6A, S19A and ref. [83]), which justifies our time-mapping strategy. In our response to comment 16 of Reviewer 3, we discuss in more detail the robustness of our results to variations in this mapping. In the specific context of Fig. 6A, we predict a gradual decrease in dynamics due to sister-chromatid intertwining, independently of any cohesin-associated activity (both loop-extruding and cohesive). Since loop extrusion also decreases chromatin mobility overall (ref. [87]), if such a decrease in mobility is observed in WT cells in vivo, it may indeed be difficult to attribute it to replication rather than to loop extrusion. This is why, in the Discussion (Page 16), we propose to compare our prediction with experiments in cohesin-depleted cells. In the context of Fig. 6B&C, we do not expect loop extrusion to be a confounding effect, as the predicted decrease in dynamics is specific to forks.

      Reviewer #3 (Significance (Required)):

      The work has been conducted thoroughly, and in general the paper is well written with good attention to detail. As far as I am aware, this is the first study where replication is simulated in a whole nucleus context, and the scale of the simulations is impressive. This allows the authors to address questions on replication foci and the spatiotemporal organisation of replication which would not be possible with more limited simulations, and to compare the model with previous experimental work. Together with the new HiC data, I think this makes this a strong paper which will be of interest to biophysics and molecular biology researchers; the manuscript is written such that it would suit an interdisciplinary basic research audience.

      We thank Reviewer 3 for her/his enthusiasm and for comments that helped us greatly improve the manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The paper "Genome-wide modelling of DNA replication in space and time confirms the emergence of replication specific patterns in vivo in eukaryotes" by D'Asaro et al. presents new computational and experimental results on the dynamics of genome replication in yeast. The authors present whole-nucleus scale simulations using a kinetic Monte Carlo polymer physics model. New HiC data for synchronised yeast samples with different protein knock-downs are also presented.

      The main questions which the paper addresses are whether sister forks remain associated during replication, whether there is more general clustering of replication forks, and whether replication occurs in a 'spatial wave' through the nucleus. While the authors' model data are not able to conclusively show whether sister forks remain co-localised, the work provides some important insights which will be of high interest to the field.

      I have no major issues with the paper, only some minor comments and suggestions to improve the readability of the manuscript or provide additional detail which will be of interest to readers. I list these here in the order in which they appear in the paper. There are also a number of typos and grammatical issues through the text, so I recommend thorough proofreading.

      The paper seems to be aimed at a broad interdisciplinary audience of biophysicists and molecular biologists. For this reason, the introduction could be expanded slightly to include some more background on DNA replication, the key players and terminology. Also, it seems that this work builds on previous modelling work (Ref. 19), so a bit more detail of what was done there, and what is new here would be helpful. The final paragraph of the introduction mentions chromosome features such as TADs and loops, which should be explained in more detail.

      We have now expanded the introduction to address some of these aspects. In particular, also in response to comment 1 of Reviewer 4, we included additional background on the eukaryotic replication timing program. We address in more detail its known interplay and correlation with crucial 3D structural features such as compartments and TADs. Finally, we added a sentence clarifying how the current work differs from the prior implementation and what novelty is introduced here.

      In the first results section, end of p2, the "typical brush-like architecture" is mentioned. This is not well explained, some additional detail or a diagram might help.

      As briefly summarized in the mentioned paragraph, the yeast genome adopts the so-called Rabl organization, in which all chromosome arms are connected via their centromeres to the Spindle Pole Body (SPB). This is analogous to a polymer brush, in which several branches (here, the arms) are grafted to a surface or to another polymer (see the new inset in Fig. S1B). In the main text we refer to the scheme in Fig. S1B, where we also include a snapshot of a single chromosome and the physical constraints that characterize this large-scale organization, and we extended the caption to clarify the analogy. A typical emerging feature at the single-chromosome level is described in Fig. 1B and C.

      On p3-4, some previous work is described, and Pearson correlations of 0.86 and 0.94 are mentioned. What cases these two different values correspond to is not clear.

      These Pearson correlations were obtained with our own model. We corrected the values in the main text and now indicate more clearly which maps each value corresponds to. We now describe in the Materials and Methods (new paragraph “Comparison with in vivo HiC maps in G1” and Table S2) how these values were obtained.
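      The exact procedure is the one given in the Materials and Methods and Table S2. As a hedged illustration of one common way to score agreement between two Hi-C maps, the sketch below computes a Pearson correlation over log-transformed upper-triangle entries; the log transform, the exclusion of the main diagonal, the helper `map_pearson` and the toy maps are all assumptions, not necessarily the choices made in the manuscript.

```python
import numpy as np


def map_pearson(a, b, min_offset=1):
    """Pearson r between two square contact maps over the upper triangle."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    iu = np.triu_indices_from(a, k=min_offset)   # skip the main diagonal
    x, y = a[iu], b[iu]
    keep = (x > 0) & (y > 0)                      # the log requires positive counts
    return np.corrcoef(np.log10(x[keep]), np.log10(y[keep]))[0, 1]


# Toy example: two noisy versions of the same power-law decay with genomic distance.
rng = np.random.default_rng(2)
n = 60
dist = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) + 1
base = dist.astype(float) ** -1.0
sim = base * np.exp(rng.normal(0.0, 0.1, (n, n)))
obs = base * np.exp(rng.normal(0.0, 0.1, (n, n)))
r = map_pearson(sim, obs)
print(f"Pearson r = {r:.2f}")
```

      Because contact frequencies decay steeply with distance, correlations computed this way are dominated by the distance decay itself; distance-normalized (observed-over-expected) maps are a stricter test of finer features.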

      In section II-A-2, on the modelling details, it should be made clearer that the nucleus volume is kept constant, and that this is an approximation since typically the nucleus grows during S-phase. This is discussed in the Methods section, but it would be useful to also mention it here (and give some justification why it will not likely change the results).

      We now state more clearly in the main text the limitation of our model regarding the doubling of DNA content without any increase of nuclear size. As mentioned in the Discussion, we do not expect this approximation to strongly impact our results, which mainly focus on early S-phase.

      We have also added to the Discussion why the detection of the “replication wave” should be qualitatively independent of the density regime. Even for growing nuclei at constant density, the polarity induced by the Rabl organization and the replication timing remain the main drivers of this fork redistribution.

      Regarding the slowdown in diffusion due to sister-chromatid intertwining (see response to comment 13), we verified that this effect is indeed density-independent (new Fig. S21).

      Fig 2. The text in Fig 2B is much smaller than other panels and difficult to read. Also Fig 3B, Fig 6.

      This is now corrected.

      In 2E, are the times given above each map the range which is averaged over? This could be clearer in the caption. In the caption it stated that these are 'observed over expected'; what the 'expected' is could be clearer.

      We reformulated the caption to make clear that the times indicated above the plots correspond to the time windows used for the computation. As detailed in the response to comment 17 below (and to comment 3 of Reviewer 2), we included in the Materials and Methods a more precise description of the normalization used for on-diagonal aggregate plots (observed-over-expected).
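      In Hi-C analysis (for example, in cooltools), the “expected” is commonly the average contact frequency at each genomic separation, i.e. the mean of each diagonal of the map. The sketch below assumes that convention for a single square map; `observed_over_expected` is a hypothetical helper, and how windows around origins are then aggregated is described in the Materials and Methods rather than reproduced here.

```python
import numpy as np


def observed_over_expected(mat):
    """Divide every entry of a square contact map by the mean of its diagonal."""
    mat = np.asarray(mat, dtype=float)
    n = mat.shape[0]
    out = np.full_like(mat, np.nan)          # stays NaN where the expected is zero
    for k in range(-(n - 1), n):             # k = signed genomic separation in bins
        diag = np.diagonal(mat, offset=k)
        expected = diag.mean()
        if expected > 0:
            rows = np.arange(diag.size) + max(0, -k)
            out[rows, rows + k] = diag / expected
    return out


# Toy map: pure distance decay plus one extra contact between bins 10 and 20.
n = 40
sep = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
base = 1.0 / (sep + 1.0)
enriched = base.copy()
enriched[10, 20] += 1.0
enriched[20, 10] += 1.0

oe = observed_over_expected(enriched)
print(f"O/E at (10, 20): {oe[10, 20]:.2f}")
```

      A map that only follows the distance decay gives O/E = 1 everywhere, so any value above 1 marks contacts in excess of what the genomic separation alone predicts.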

      In section II-B-2, the authors state that the cells are fixed 20 mins after release from S-phase. Can they comment on the rationale behind this choice, since from Fig 2 their simulations predict that the fountain pattern will no-longer be visible by that time.

      In the experimental setup, cells are arrested in G1 with alpha-factor and then released into S-phase (see Fig. S26 for the corresponding scheme). The release from G1 synchronisation is not immediate, and staging of cells by flow cytometry every 5 minutes for 30 minutes after release (data not shown in the main text but provided below) showed 20 minutes to be an adequate early S-phase time point (Page 17 in the Materials and Methods). As a consequence, the times indicated when describing the in vivo experiment do not correspond to those indicated for our in silico system, in which the onset of replication is well defined. For these reasons, we have to determine which of the time windows used in Fig. 2E is the most appropriate for comparison with the experiment (see our response to comment 9 for more details).

      Fig. R1: Cell cycle progression monitored by flow cytometry after the release. For the first 15 minutes, cells are still mainly in G1 and only start replicating ~20 minutes after the release.

      Section II-B-2(b) could be clearer. I don't understand what the conclusion the authors take from the metaphase arrest maps is. I'm not sure why they discuss again the Cdc45-depleted cells here, since this was already covered in the previous section.

      Taken together, the G1, Cdc20 (metaphase-arrested cells), and Cdc45-depleted (early S cells but not replicated) conditions suggest that fountains reflect ongoing replication. Namely, G1 arrest shows that fountains require S-phase entry; Cdc45 depletion shows that fountains require origin firing and are not due to another S-phase event; and metaphase-arrested cells show that fountains are not permanent structures established by replication, but transient replication-dependent structures.

      This demonstrates that the emerging signal does not trivially depend on (1) the presence of the second sister chromatid, or (2) potential overlaps between origin positions and barriers (CARs) to loop extrusion (see also comment 12 of Reviewer 2). A sentence was added at the end of II-a to clarify the different information gained from the two strains.

      We discuss the cdc20 and cdc45 mutants again in II-b to highlight that the results in II-a do not exclude a potential interplay between cohesin-mediated loop extrusion and fork progression. These considerations motivated our experiment in Scc1-depleted cells during early S-phase.

      At the start of p8 (II-B-3) there is a discussion of the mapping to times to the early-S stage experiments. This could have more explanation. I don't follow what the issue is, or the process which has been used to do the mapping. From Fig 2B, it seems that the simulation time is already mapped well to real time.

      As mentioned above in our response to comment 7, we cannot clearly define a “t=0” when replication starts in vivo, as the release from the G1 arrest is neither immediate nor perfectly synchronous. In contrast, the times indicated in the text are measured from the onset of polymer self-duplication in our simulations. Note that the mean replication time (MRT) shown in Fig. 2B does not represent an absolute time, but rather an average relative timing along S-phase (signal rescaled between 0 and 1).

      For these reasons, we think that the most reliable strategy to compare fountains in vivo and in silico is to look at the replicon size via the enrichment in raw contacts around early origins, as illustrated in Fig. S7A. In practice, the relative contact counts around early origins provide a proxy for the average replicon size, which we can match by performing the same analysis on simulated signals (Fig. S7A). As a result, we find that the best simulated time window is between 5 and 7.5 minutes, compatible with early S-phase and with an approximate G1 duration of 15 minutes after release, as observed in other studies (ref. [61]).

      Note that our conclusions are robust to modifications of this mapping method. In particular, in Fig. S7 we thoroughly investigated how several confounding factors (such as the time window used or partial synchronization) may affect the quantitative aspects of our predictions without altering the qualitative insights.

      We included a more precise reference to the Supplementary Materials, where the approach is described and clarified.

      In Fig 4A above each plot there is a cartoon showing the fork scenario. The left-hand cartoon is rendered properly, but the right-hand one has overlapping black boxes which I don't think should be there. These black boxes are present in many other figures (4B, 3B, 2E etc).

      This issue seems to appear using the default PDF viewer on Mac OS. We have corrected the problem and no more black boxes should appear in the main text and in the Supplementary Material.

      In II-C-2(b) it is mentioned that the number of forks within RFis is always assumed to be even. This discussion could be clearer. In particular, the authors state that under both fork scenarios, in the simulations they can detect odd numbers of forks within RFis - how can this happen in the case where sister forks are held together?

      We included a more accurate description in the main text of why Saner et al. (ref. [20]) make these assumptions in their estimates. We highlight possible inconsistencies, such as termination events, which in our formalism break sister-fork interactions and lead to single forks being detected. We also clarify the latter point when describing Fig. 5B, and we describe replication-bubble merging events in more detail in the Materials and Methods.

      Fig 6B and C, it would be useful if the same scale was used on both plots.

      We now use the same scale when plotting Fig 6B and C.

      Section II-D-1. There is a discussion on the presence of catenated chains; I did not understand how the replicated DNA becomes catenated, and what this actually means in this context. The way the process is described and the snapshots in Fig2C do not suggest that the chains are catenated. Some further discussion or a diagram would be useful here.

      We included a short paragraph to better explain how intertwining of sister chromatids occurs, and we refer more clearly to a snapshot in supplementary Fig. S19D (Page 14). As correctly noted by the reviewer, replication bubbles are by construction always unknotted during their growth (see the example in Fig. 2C). As we thoroughly characterized in our previous work (ref. [25]), when several replication bubbles merge, the random orientation of sister chromatids can lead to catenation points and intertwined structures. We show below a scheme from our previous work (ref. [25]). In that work, we demonstrated that the center of mass of the two sister chromatids shows subdiffusive behaviour due to the additional topological constraints of their intertwining; the new analysis in the present work suggests that related effects may also be observed when tracking the MSD (mean square displacement at the locus level) in a more realistic scenario that includes correct replication timing, chromosome sizes and Rabl organization.

      On p14 (section III) there is a section discussing possible mechanisms for sister fork interactions, and that result that Ctf4 might not play a role in this, as previously suggested. Are there any other candidate proteins which could be tested in the future?

      To the best of our knowledge, no replisome component other than Ctf4 has been directly associated with sister-fork pairing in previous studies. However, components of the replisome such as Cdt1, which can oligomerize/self-interact, could be good candidates. We now mention this possibility in the Discussion (Page 15).

      As on p14, second paragraph: there is a sentence "replication wave [51] cannot be easily visualised at the single cell level.", which seems to contradict the discussion on p9 "such a "wave" can also be observed at the level of an individual trajectory (Video S3,4) even if much more stochastic." I think more explanation is needed here.

      We rephrased the mentioned passages to clarify the differences in detecting such a “replication wave” at the population versus the single-cell level. In Videos S3 and S4, we can still observe an enrichment of forks at the SPB and, later in S-phase, a shift towards the equatorial plane. However, the stochasticity of polymer dynamics and of 1D replication strongly hinders a clear visualization of this redistribution.

      In the methods section, p18, it is mentioned that the volume fraction is 3%. I assume this is before replication, and so after replication is complete this will increase to 6%. This should be stated more explicitly, with also a comment on the 5% volume fraction used in the time-scale mapping discussed on p17.

      Indeed, we chose to map the experimental MSD measured in ref. [83] by simulating a homopolymer at 5% volume fraction in periodic boundary conditions, for consistency with previous work in the group (refs. [102-106]) and with our previous replication model (ref. [25]). Moreover, this intermediate density lies between the minimal (3%, before replication) and maximal (6%, after complete replication) densities present in our system. When redoing the time mapping with the G1 MSD plotted in Fig. 6A and new Fig. S19A, we obtain a very similar value of approximately 1 MCS = 0.6 ms. Note that the time mapping aims to provide a rough estimate of real times, as several factors, such as active processes, non-constant density and cell-cycle progression, may all contribute to chromatin diffusion in vivo (see also comment 15 of Reviewer 2). In our formalism, differences in time mapping do not affect the 1D replication dynamics, as all the parameters of the 1D process are rescaled by the same factor. Moreover, as we characterized in more depth in our previous work (ref. [25]), a crucial aspect that defines self-replicating polymers is the relationship between fork progression and polymer relaxation dynamics. In physiological conditions, we remain in the regime where forks progress quasi-statically, allowing the bubbles to re-equilibrate. Therefore, small discrepancies in the time mapping will not modify this regime, and our results should remain robust.
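      The logic of mapping Monte Carlo time to physical time in the Rouse regime can be sketched as follows, assuming both MSD curves scale as t^0.5 and are expressed in the same length unit. If MSD_sim = A_sim n^0.5 for n MCS and MSD_exp = A_exp t^0.5, matching the two at t = τn gives τ = (A_sim/A_exp)^2. The helpers and the synthetic data below are hypothetical, and 0.6 ms/MCS is used only as a known target to recover.

```python
import numpy as np


def rouse_prefactor(t, msd, alpha=0.5):
    """Best-fit A in MSD = A * t**alpha (least squares in log space, alpha fixed)."""
    t = np.asarray(t, dtype=float)
    msd = np.asarray(msd, dtype=float)
    return float(np.exp(np.mean(np.log(msd) - alpha * np.log(t))))


def ms_per_mcs(mcs, msd_sim, t_ms, msd_exp):
    """Time step tau (ms per MCS) such that A_sim * n**0.5 = A_exp * (tau * n)**0.5."""
    a_sim = rouse_prefactor(mcs, msd_sim)
    a_exp = rouse_prefactor(t_ms, msd_exp)
    return (a_sim / a_exp) ** 2


# Synthetic check: build "experimental" data from a known tau and recover it.
true_tau = 0.6                      # ms per MCS (illustrative target only)
mcs = np.logspace(2, 5, 20)         # simulated lag times in MCS
msd_sim = 50.0 * mcs ** 0.5         # simulated MSD, already in physical length units
t_ms = true_tau * mcs               # the same lags expressed in ms
msd_exp = 50.0 * mcs ** 0.5         # the same curve, now indexed by physical time
tau = ms_per_mcs(mcs, msd_sim, t_ms, msd_exp)
print(f"recovered tau = {tau:.3f} ms/MCS")
```

      Because τ depends on the square of the prefactor ratio, a 5% error in the fitted simulation prefactor shifts τ by about 10%, which is consistent with treating the mapping as a rough estimate.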

      On p20, processing of simulated HiC using cooltools is discussed. For readers unfamiliar with this software, a bit more detail should be given. Specifically, how does the normalisation account for having some segments which have been replicated and some which have not. Later on the same page (IV-C-2) two different strategies for comparing HiC maps are given; why are two different methods required, and what is the reasoning in each case?

      In the raw (unbalanced) data, we observe an artificial increase in contacts around origins in S-phase in both simulations and experiments. This is simply due to the presence of the second sister chromatid and to the fact that contacts involving distinct copies of the same DNA segment are mapped to a single bin.

      In the new Fig. S25, we illustrate this effect by computing aggregate plots around early origins using single-chromosome simulations. We demonstrate that ICE normalization corrects for the variations in copy number due to replication, and thus for such artificial increases in contacts during S-phase. We show that this normalization is equivalent to explicitly dividing each bin by the average copy number of the corresponding segments.
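      The equivalence stated above can be checked on a toy example. The sketch is a plain-NumPy version of iterative correction (ICE), not the cooler/cooltools implementation, and it assumes a purely multiplicative copy-number model, raw_ij = c_i c_j m_ij, with c = 2 in replicated bins. Under that assumption, balancing the raw map and balancing the map after explicit division by copy number give the same result, because a positive symmetric matrix has a unique symmetric scaling with equal row sums.

```python
import numpy as np


def ice_balance(mat, max_iter=500, tol=1e-10):
    """Iterative correction (symmetric matrix balancing) of a contact matrix.

    Repeatedly divides entry (i, j) by s_i * s_j, where s is the row-sum
    vector rescaled to mean 1, until all rows have the same sum. The result
    is rescaled so that every row sums to 1.
    """
    balanced = np.array(mat, dtype=float)
    for _ in range(max_iter):
        s = balanced.sum(axis=1)
        s /= s.mean()
        balanced /= np.outer(s, s)
        if np.abs(s - 1.0).max() < tol:
            break
    return balanced / balanced.sum(axis=1).mean()


# Toy model (ASSUMPTION): raw_ij = c_i * c_j * m_ij, with copy number c = 2 in
# replicated bins and c = 1 elsewhere, and m the per-copy contact matrix.
rng = np.random.default_rng(3)
n = 30
m = rng.uniform(0.5, 1.5, (n, n))
m = (m + m.T) / 2                                    # contact maps are symmetric
c = np.where(np.arange(n) < 12, 2.0, 1.0)
raw = np.outer(c, c) * m

balanced_raw = ice_balance(raw)
balanced_explicit = ice_balance(raw / np.outer(c, c))  # divide by copy number first
print(f"max difference: {np.abs(balanced_raw - balanced_explicit).max():.2e}")
```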

      We have now included a sentence in the Materials and Methods to clarify this. Moreover, a detailed description of the alternative strategies used to compare experiments and simulations is given in our response to comment 3 of Reviewer 2, and two new paragraphs were added to the Materials and Methods.

      The references section has an unusual formatting with journal names underlined.

      We updated the formatting.

      Reviewer #4 (Significance (Required)):

      D’Asaro et al focus on the problem of how genome structure is altered by the progression of replisomes through S-phase in the budding yeast S. cerevisiae. The authors employ computational polymer modeling of G1 chromosomes, then implement a hierarchical model of replication origin firing along these polymers to examine how the G1 chromosome structural state is perturbed by replisome progression. Their results indicate that replication origins create 'fountains' - Hi-C map features that other groups have demonstrated are likely to originate from symmetric extrusion by condensin / cohesin complexes originating at a fixed point. These 'fountains' appear to be cohesin-independent, as revealed by depletion Hi-C experiments. Finally, the authors provide evidence from their model of a 'replication wave' that emanates from the spindle pole body. This is an interesting manuscript that raises some exciting questions for the field to follow up on.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      In their manuscript, "Genome-wide modeling of DNA replication in space and time confirms the emergence of replication specific patterns in vivo in eukaryotes," authors D’Asaro et al perform computational modeling analyses to address an important open question in the chromatin field: how is DNA replication timing coupled to 3D genome architecture? Over the past ten years, the convergence of high-resolution replication timing (RT) analysis with high-resolution 3D genome mapping (e.g. 'Hi-C' technology) has resulted in the discovery that replication timing domains overlap considerably with 3D genomic domains such as topologically associating domains (TADs). How and why this happens both remain unknown, and advances in 3D genome mapping technology have provided even more data to model the problem of both 1) scheduling replication from distinct series of origins / initiation zones, and 2) modeling how 3D genome architecture is altered by the progression of replication forks, which inherently destroy chromatin structure before faithfully reforming G1 structures on daughter chromatids. As such, the problem being tackled by this computational manuscript is interesting.

      We thank Reviewer 4 for her/his positive evaluation of our work and for comments that helped us greatly improve the manuscript.

      Major Comments

      There is a tremendous amount of work coupling RT domains to 3D genome architecture, especially deriving from the ENCODE and 4D Nucleome consortia. These studies are not adequately highlighted in the introduction and discussion of this manuscript, and this treatment of the literature would ideally be amended in any revised manuscript.

      We added new sentences to the introduction discussing in more detail the correlation between 3D genome architecture and the replication timing program, and the advances in this field over the last decades. We also included additional citations to reviews and publications (refs. [8-16]). These references were also included at the end of the Discussion, where we address the exciting perspective of applying our model to higher eukaryotes and potentially tackling the complex interplay between 3D nuclear compartmentalization and replication dynamics (see also response 1 to Reviewer 1).

      S. cerevisiae origins of replication differ from metazoan origins of replication in that they are sequence-defined and are known to fire in a largely deterministic pattern (see classic study PMID11588253). From the methods of the authors it is not clear that the known deterministic firing pattern is being used here, but instead a stochastic sampling method? Please clarify in the manuscript. Specifically, it would be good to understand how the Initiation Probability Landscape Signal correlates with what is already known about origin firing timing.

      In our model, origin positions are stochastically sampled in proportion to the IPLS, which was inferred directly from experimental MRT (ref. [63]) and RFD (ref. [44]) data. This modeling approach reproduces the known replication timing data (correlation of 0.96) and fork directionality data (correlation of 0.91) with very high accuracy (see ref. [71]). Origins were defined as the peaks of the IPLS signal. In Fig. S3, we extensively compare these origins with the known ARS positions from the OriDB database. For example, most of our early origins (96%) are located close to known, confirmed ARSs. Moreover, even though origin firing is stochastic in our algorithm, each early origin fires in 90% of the simulations, consistent with the quasi-deterministic pattern of origin firing and with experimental MRT and RFD data. We have now added these firing statistics to the revised manuscript (Page 4).
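      Sampling origins in proportion to the IPLS can be sketched as weighted sampling without replacement. This is only an illustration: the toy IPLS, the number of origins drawn per simulation and `sample_origins` are hypothetical, and the full model of Arbona et al. (ref. [63]) includes firing kinetics not reproduced here. The loop also shows how a per-origin firing frequency across simulations, such as the 90% quoted above, would be counted.

```python
import numpy as np


def sample_origins(ipls, n_origins, rng):
    """Draw distinct origin bins with probability proportional to a non-negative IPLS."""
    weights = np.clip(np.asarray(ipls, dtype=float), 0.0, None)
    return rng.choice(weights.size, size=n_origins, replace=False, p=weights / weights.sum())


# HYPOTHETICAL toy IPLS: two strong peaks (bins 50 and 150) on a weak flat background.
bins = np.arange(200)
ipls = (0.01
        + np.exp(-0.5 * ((bins - 50) / 3.0) ** 2)
        + np.exp(-0.5 * ((bins - 150) / 3.0) ** 2))

rng = np.random.default_rng(4)
n_sims = 2000
hits_peak = 0  # simulations with an origin within 5 bins of the peak at bin 50
hits_flat = 0  # same, around bin 100, where the IPLS is background only
for _ in range(n_sims):
    origins = sample_origins(ipls, 5, rng)
    hits_peak += int(np.any(np.abs(origins - 50) <= 5))
    hits_flat += int(np.any(np.abs(origins - 100) <= 5))
print(f"firing frequency near the peak: {hits_peak / n_sims:.2f}; "
      f"in the background: {hits_flat / n_sims:.2f}")
```

      Even though every draw is random, bins under a strong IPLS peak are selected in most simulations, which is how a stochastic scheme can still yield a quasi-deterministic firing pattern.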

      It seems possible that experimental sister chromatid Hi-C data (PMID32968250) and nanopore replicon data (PMID35240057) could be used to further ascertain the validity of some of the findings of this paper. Specifically, could the authors demonstrate evidence in sister chromatid Hi-C data that the replisome is in fact extruding sister chromatids? Moreover, are the interactions being measured specifically in cis (as opposed to trans sister contacts)? For the nanopore replicon data, how do replicon length, replication timing, and position along the replication 'wave' correlate?

      We thank the reviewer for the suggestions.

      Unfortunately, no Sister-C data are currently available for S-phase. In the seminal study (PMID32968250), cells were arrested in G2/M via nocodazole treatment. For a different, unpublished work, we have already analysed the Sister-C dataset in detail and did not observe a clear fountain-like signature, consistent with our own G2/M Hi-C maps (cdc20), where fountains were absent. Note that, in the present work, to compare our predictions with standard Hi-C data, we included all contacts (cis and trans between chromatids), mapping pairwise contacts from distinct replicated sequences/monomers to a single bin (see also response to comment 17 of Reviewer 3 and new Fig. S25).
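      Mapping contacts from both sister chromatids onto a single bin, as described above, can be sketched as follows. The monomer-copy indexing and `bin_contacts` are hypothetical; the point is only that cis contacts and trans (sister-sister) contacts between copies of the same segments accumulate in the same bin pair, as in a standard Hi-C map.

```python
import numpy as np


def bin_contacts(contact_pairs, copy_to_bin, n_bins):
    """Accumulate monomer-monomer contacts into a genomic contact map.

    contact_pairs: iterable of (copy_a, copy_b) indices of monomers in contact.
    copy_to_bin:   maps every monomer copy (original or sister) to its genomic bin.
    Cis and trans (sister-sister) contacts land in the same bin pair.
    """
    mat = np.zeros((n_bins, n_bins))
    for a, b in contact_pairs:
        i, j = copy_to_bin[a], copy_to_bin[b]
        mat[i, j] += 1
        if i != j:
            mat[j, i] += 1  # keep the map symmetric
    return mat


# Four genomic bins; bins 1 and 2 are replicated, so copies 4 and 5 are their sisters.
copy_to_bin = np.array([0, 1, 2, 3, 1, 2])
contacts = [(0, 1), (1, 4), (2, 4), (3, 5)]  # cis, sister-sister, trans, trans
mat = bin_contacts(contacts, copy_to_bin, 4)
print(mat)
```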

      We now mention in the Discussion that Sister-C data during S-phase could help to monitor the role of replisomes in the relative organization of sister chromatids (Page 15).

      The main results of the nanopore replicon study include the high symmetry observed between sister forks and their linear progression, as the density of replicons appears to be uniform with respect to their length. Since these two constraints are already present in the framework of Arbona et al. (ref. [63]), our model reproduces these features of DNA replication captured by the nanopore data.

      Moreover, as we model replication timing data (see response to comment 2) and fork positioning with very high accuracy, we can assume that our formalism captures well the replicon positions and lengths observed in vivo.

      As this study does not include any additional exploration or variation of the parameters inferred by Arbona et al. (ref. [63]), we consider a quantitative comparison with the nanopore replicon data to be beyond the scope of this paper.

      Minor Comments:

      The paper is in most places easy to follow. However, Section C bucked this trend and in general was quite difficult to follow. We would recommend that the authors try to revise this section to make clearer the actual physical parameters that govern a 'replication wave' and the formation of replication foci - how many forks, the extent to which the sisters are coordinated, etc for early vs. late replicating regions.

      We now state more clearly, in a sentence in the main text, the driving forces behind the formation of such a “replication wave”. We believe that the many additions and clarifications made in response to the various comments have improved the clarity of the manuscript.

    1. Reviewer #1 (Public review):

      Munday, Rosello, and colleagues compared predictions from a group of experts in epidemiology with predictions from two mathematical models on the question of how many Ebola cases would be reported in different geographical zones over the next month. Their study ran from November 2019 to March 2020 during the Ebola virus outbreak in Democratic Republic of the Congo. Their key result concerned predicted numbers of cases in a defined set of zones. They found that neither the ensemble of models nor the group of experts produced consistently better predictions. Similarly, neither model performed consistently better than the other, and no expert's predictions were consistently better than the others'. Experts were also able to specify other zones in which they expected to see cases in the next month. For this part of the analysis, experts consistently outperformed the models. In March, the final month of the analysis, the models' accuracy was lower than in other months, and consistently poorer than the experts' predictions.

      A strength of the analysis is the use of consistent methodology to elicit predictions from experts during an outbreak that can be compared to observations, and that are comparable to predictions from the models. Results were elicited for a specified group of zones, and experts were also able to suggest other zones that were expected to have diagnosed cases. This likely replicates the type of advice being sought by policymakers during an outbreak.

      A potential weakness is that the authors included only two models in their ensemble. Ensembles of greater numbers of models might tend to produce better predictions. The authors do not address whether a greater number of models could outperform the experts.

      The elicitation was performed in four months near the end of the outbreak. The authors address some of the implications of this. A potential challenge for the transferability of this result is that the experts' understanding of local idiosyncrasies in transmission may have improved over the course of the outbreak. The model did not have this improvement over time. The comparison of models to experts may therefore not be applicable to early stages of an outbreak when expert opinions may be less well-tuned.

      This research has important implications for both researchers and policy-makers. Mathematical models produce clearly described predictions that will later be compared to observed outcomes. When model predictions differ greatly from observations, this harms trust in the models, but alternative forms of prediction are seldom so clearly articulated or accurately assessed. If models are discredited without proper assessment of alternatives, then we risk losing a valuable source of information that can help guide public health responses. From an academic perspective, this research can help to guide methods for combining expert opinion with model outputs, such as considering how experts can inform models' prior distributions and how model outputs can inform experts' opinions.

      Comments on revisions:

      I am grateful to the authors for their responses to my previous comments. I think their updates have made the paper much clearer. I do not think the updates change the opinions already given in the public review so I have not modified it.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Munday, Rosello, and colleagues compared predictions from a group of experts in epidemiology with predictions from two mathematical models on the question of how many Ebola cases would be reported in different geographical zones over the next month. Their study ran from November 2019 to March 2020 during the Ebola virus outbreak in the Democratic Republic of the Congo. Their key result concerned predicted numbers of cases in a defined set of zones. They found that neither the ensemble of models nor the group of experts produced consistently better predictions. Similarly, neither model performed consistently better than the other, and no expert's predictions were consistently better than the others'. Experts were also able to specify other zones in which they expected to see cases in the next month. For this part of the analysis, experts consistently outperformed the models. In March, the final month of the analysis, the models' accuracy was lower than in other months and consistently poorer than the experts' predictions. 

      A strength of the analysis is the use of consistent methodology to elicit predictions from experts during an outbreak that can be compared to observations, and that are comparable to predictions from the models. Results were elicited for a specified group of zones, and experts were also able to suggest other zones that were expected to have diagnosed cases. This likely replicates the type of advice being sought by policymakers during an outbreak. 

      A potential weakness is that the authors included only two models in their ensemble. Ensembles of greater numbers of models might tend to produce better predictions. The authors do not address whether a greater number of models could outperform the experts. 

      The elicitation was performed in four months near the end of the outbreak. The authors address some of the implications of this. A potential challenge to the transferability of this result is that the experts' understanding of local idiosyncrasies in transmission may have improved over the course of the outbreak. The model did not have this improvement over time. The comparison of models to experts may therefore not be applicable to the early stages of an outbreak when expert opinions may be less well-tuned. 

      This research has important implications for both researchers and policy-makers. Mathematical models produce clearly described predictions that will later be compared to observed outcomes. When model predictions differ greatly from observations, this harms trust in the models, but alternative forms of prediction are seldom so clearly articulated or accurately assessed. If models are discredited without proper assessment of alternatives, then we risk losing a valuable source of information that can help guide public health responses. From an academic perspective, this research can help to guide methods for combining expert opinion with model outputs, such as considering how experts can inform models' prior distributions and how model outputs can inform experts' opinions. 

      Reviewer #2 (Public review):

      Summary: 

      The manuscript by Munday et al. presents real-time predictions of geographic spread during an Ebola epidemic in north-eastern DRC. Predictions were elicited from individual experts engaged in outbreak response and from two mathematical models. The authors found comparable performance between experts and models overall, although the models outperformed experts in a few dimensions. 

      Strengths: 

      Both individual experts and mathematical models are commonly used to support outbreak response but rarely used together. The manuscript presents an in-depth analysis of the accuracy and decision-relevance of the information provided by each source individually and in combination. 

      Weaknesses: 

      A few minor methodological details are currently missing.

      We thank the reviewers for taking the time to consider our paper and for their positive reflections and suggestions for our study. We recognise and endorse their characterisation of the study in the public reviews and are grateful for their interest and support for this work. 

      Reviewer #1 (Recommendations For The Authors): 

      I initially found Table 1 difficult to interpret. In the final two columns, the rows relate to each other but in the other columns, rows within months don't relate to each other. Could this be made clearer? 

      Thank you for your helpful suggestion. We agree that this is a little confusing and have now added vertical dividers to the table to indicate which parts of the table relate to each other.

      In Figure 1A, the colours are the same as in the colour-bar for Figure 1B but don't have the same meaning. Could different colours be used or could Figure 1A have its own colour-bar to aid clarity? 

      Thank you for your query. The colours are not from the same palette, but we appreciate that they look very similar. To help the reader, we have changed the colour palette of panel A and added a legend to the left.  

      In Figure 3, can labels for each expert be aligned horizontally, rather than moving above and below the timeline each month? 

      Thank you for your perspective on this. We made the conscious decision to display the experts in this way as it allows the timeline to be presented in a shorter horizontal space. We appreciate that others may prefer a different design, but we are happy with this one. 

      On lines 292 and 293, the authors state that experts were less confident that case numbers would cross higher thresholds. It seems that this would be inevitable given the number of cases is cumulative. Could this be clarified, please? 

      Thank you for raising this point. We agree that this wording is confusing. We have now reworked the entire section in response to another reviewer. The equivalent section now reads: 

      Experts correctly identified Mabalako as the highest-risk HZ in December. They attributed an average 82% probability of exceeding 2 cases; Mabalako reported 38 cases that month, exceeding all thresholds, although the probability assigned to exceeding the higher thresholds was similar to that of Beni (3 cases).

      Reviewer #2 (Recommendations For The Authors): 

      (1) Some methodological details seem to be missing. Most importantly, the results present multiple ensembles (experts, models, and both), but I can't seem to find anywhere in the Methods that details how these ensembles are calculated. Also, I think it would be useful to define the variables in each equation. It would have been easier to connect the equations to the description if the variables were cited explicitly in the text. 

      Thank you for pointing out these omissions. We have included the following paragraph to detail how ensemble forecasts were calculated. 

      “Ensemble forecasts

      Ensemble forecasts were calculated as an average of the probabilities attributed by the members of the ensemble. For the expert ensemble, the arithmetic mean was calculated across all experts with equal weighting. Similarly, the model ensemble used the unweighted mean of the model forecasts. For the mixed (model and expert) ensemble, the mean was weighted such that the combined weight of the experts' forecasts and the combined weight of the models' forecasts were equal.”
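      To make the weighting concrete, here is a minimal Python sketch for a single forecast target (one health zone, month and threshold). It is an illustration rather than the authors' code; the function names and the probabilities are hypothetical. It contrasts the mixed ensemble described above with a naive pooled mean over all members.

```python
from statistics import fmean


def equal_weight_ensemble(probs: list[float]) -> float:
    """Unweighted arithmetic mean of the members' probabilities."""
    return fmean(probs)


def mixed_ensemble(expert_probs: list[float], model_probs: list[float]) -> float:
    """Experts (as a group) and models (as a group) each receive half of the
    total weight, regardless of how many members each group contains."""
    return 0.5 * equal_weight_ensemble(expert_probs) + 0.5 * equal_weight_ensemble(model_probs)


# Hypothetical probabilities of exceeding one case threshold in one health zone
experts = [0.25, 0.5, 0.75]
models = [0.5, 1.0]
print(equal_weight_ensemble(experts))           # expert ensemble: 0.5
print(equal_weight_ensemble(models))            # model ensemble: 0.75
print(mixed_ensemble(experts, models))          # mixed ensemble: 0.625
print(equal_weight_ensemble(experts + models))  # naive pooled mean, for contrast: 0.6
```

      With three experts and two models, pooling all five members would give the models only 40% of the weight (0.6), whereas the mixed ensemble gives each group 50% (0.625).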

      (2) Overall, I think the results provide a strong analysis of model vs. expert performance. However, some sections were highly detailed (e.g., the text usually discusses results for every month and all health zones), which clouded my ability to see the salient points. For example, I found it difficult to follow all the details about expert/model predictions vs. observations in the "Expert panel and health zones..." subsection; instead, the graphical illustration of predictions vs. observations in Figure 4 was much easier to interpret. Perhaps some of these details could be trimmed or moved to the supplementary material. 

      Thank you for your honest feedback on this point. We have shortened this section to highlight the key points that we feel are the most important. We have also simplified the text where we discuss the health zones nominated by experts. 

      (3) Figure 5C is a nice visualization of the fallibility of relying on a single individual expert (or model). I wonder if it would be useful to summarize these results into the probability that a randomly selected expert outperforms a single model. Is it the case that a single expert is more unreliable than a single model? The discussion emphasizes the importance of ensembles and compares a single model to an ensemble of experts, but eliciting predictions from multiple experts may not always be possible. 

      Thank you for raising this. We agree that this is an important point: eliciting expert opinions is not a trivial task and should not be taken for granted. We agree with the principle of your suggestion that it would be useful to understand how the models compare to individual experts. We do not, however, believe that an additional analysis would add sufficient information beyond what is already shown in Figure 5, which displays the full distribution of individual experts for each month and threshold. If you would like to try this analysis yourself, the relevant data (the individual score for each combination of expert, threshold, health zone and month) are included in the GitHub repo (https://github.com/epiforecasts/Ebola-Expert-Elicitation/blob/main/outputs/indevidual_results_with_scores.csv).
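      For readers who want to compute the summary the reviewer proposes, the following is a minimal Python sketch. It assumes a score for which lower is better (as for a Brier-type score) and a hypothetical in-memory layout keyed by (month, threshold, health zone); it does not reproduce the column names of the linked CSV, and the scores below are toy values, not study results. When every expert forecasts the same targets, the returned fraction equals the probability that a randomly chosen expert outperforms the model on a randomly chosen target.

```python
from math import nan
from typing import Mapping, Tuple

Target = Tuple[str, int, str]  # hypothetical key: (month, threshold, health zone)


def prob_expert_beats_model(
    expert_scores: Mapping[str, Mapping[Target, float]],
    model_scores: Mapping[Target, float],
) -> float:
    """Fraction of (expert, target) pairs in which the expert scores strictly
    better than the model. Assumes lower scores are better; ties count as losses."""
    wins = total = 0
    for scores in expert_scores.values():
        for target, score in scores.items():
            if target in model_scores:  # compare only targets forecast by both
                total += 1
                if score < model_scores[target]:
                    wins += 1
    return wins / total if total else nan


# Toy scores for illustration only (not taken from the study)
experts = {
    "E1": {("Dec", 2, "Beni"): 0.1, ("Dec", 5, "Beni"): 0.4},
    "E2": {("Dec", 2, "Beni"): 0.3, ("Dec", 5, "Beni"): 0.2},
}
model = {("Dec", 2, "Beni"): 0.2, ("Dec", 5, "Beni"): 0.3}
print(prob_expert_beats_model(experts, model))  # 0.5
```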

      Minor comments: 

      (1) Figure 2: the color scales in each panel are meant to represent different places, correct? The figure might be easier to interpret if the colors used were different.  

      Thank you for bringing this to our attention. We have now changed the palette of panel A to differ from panel B.  

      (2) Equation 7: is o(c>c_thresh) meant to be the indicator function (i.e., 1 if c>c_thresh and 0 otherwise)? 

      Thanks for raising this. The function o is the same as in the previous equation – an observation count function. We appreciate that this is not immediately clear so have added a sentence to explain the notation after the equation.

      (3) Table 1: a brief description of the column headers would be useful.  

      Thank you for the suggestion. We have now extended the table caption to include more description of the columns. 

      “Table 1: Experts and health zones included in each round of the survey. The left part of the table details the experts interviewed (highlighted in green) and the health zones included in the main survey in each month. In addition, the right part of the table details the health zones nominated by experts and the number of experts that nominated each one.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This study investigates how ant group demographics influence nest structures and group behaviors of Camponotus fellah ants, a ground-dwelling carpenter ant species (found locally in Israel) that build subterranean nest structures. Using a quasi-2D cell filled with artificial sand, the authors perform two complementary sets of experiments to try to link group behavior and nest structure: first, the authors place a mated queen and several pupae into their cell and observe the structures that emerge both before and after the pupae eclose (i.e., "colony maturation" experiments); second, the authors create small groups (of 5, 10, or 15 ants, each including a queen) within a narrow age range (i.e., "fixed demographic" experiments) to explore the dependence of age on construction. Some of the fixed demographic instantiations included a manually induced catastrophic collapse event; the authors then compared emergency repair behavior to natural nest creation. Finally, the authors introduce a modified logistic growth model to describe the time-dependent nest area. The modification introduces parameters that allow for age-dependent behavior, and the authors use their fixed demographic experiments to set these parameters, and then apply the model to interpret the behavior of the colony maturation experiments. The main results of this paper are that for natural nest construction, nest areas and morphologies depend on the age demographics of ants in the experiments: younger ants create larger nests and angled tunnels, while older ants tend to dig less and build predominantly vertical tunnels; in contrast, emergency response seems to elicit digging in ants of all ages to repair the nest.

      We sincerely thank Reviewer #1 for the time and effort dedicated to our manuscript's detailed review and assessment. The revision suggestions were constructive, and we have provided a point-by-point response to address them.

      Reviewer #2 (Public review):

      I enjoyed this paper and the approach to examining an accepted wisdom of ants determining overall density by employing age polyethism that would reduce the computational complexity required to match nest size with population (although I have some questions about the requirement that growth is infinite in such a solution). Moreover, I appreciated the realization that models of collective behaviour may be inappropriate in many systems in which agents (or individuals) differ in the behavioural rules they employ, according to age, location, or information state. This is especially important in a system like social insects, typically held as a classic example of individual-as-subservient to whole, and therefore most likely to employ universal rules of behaviour. The current paper demonstrates a potentially continuous age-related change in target behaviour (excavation), and suggests an elegant and minimal solution to the requirement for building according to need in ants, avoiding the invocation of potentially complex cognitive mechanisms, or information states that all individuals must have access to in order to have an adaptive excavation output.

      We sincerely thank reviewer #2 for the time and effort dedicated to our manuscript's detailed review and assessment. We have provided a point-by-point response to the reviewer's comments, which we have incorporated into the revised version of the manuscript.

      The only real reservation I have is in the question of how this relationship could hold in properly mature colonies in which there is (presumably) a balance between the birth and death of older workers. Would the prediction be that the young ants still dig, or would there be a cessation of digging by young ants because the area is already sufficient? Another way of asking this is to ask whether the innate amount of digging that young ants do is in any way affected by the overall spatial size of the colony. If it is, then we are back to a problem of perfect information - how do the young ants know how big the overall colony is? Perhaps using density as a proxy? Alternatively, if the young ants do not modify their digging, wouldn't the colony become continuously larger? As a non-expert in social insects, I may be misunderstanding and it may be already addressed in the citations used.

      We thank the reviewer for this interesting question. We find that nest excavation is predominantly performed by the younger ants in the nest, and the increase in nest area is followed by an increase in the population. However, if the young ants dug without restriction, this could result in unnecessary nest growth, as suggested by reviewer #2. Therefore, we believe that the innate digging behavior of ants could potentially be regulated by various cues, such as:

      (a) Density-based: If the colony becomes less dense as its area expands, this could serve as a feedback signal for young ants to reduce or stop digging, as described in references (25, 29, 30).

      (b) Pheromone deposition: if the colony reaches a certain population density, pheromone signals could inhibit further digging by young ants (references 25, 29); alternatively, space usage could serve as a proxy for the nest area. 

      Thus, rather than relying on perfect information, decentralized control based on local cues probably regulates the level of age-dependent digging, without the ants needing to estimate the overall colony size or nest area.
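      To illustrate how a density-based cue such as (a) could keep nest growth bounded, here is a toy Python sketch. It is not the model used in the manuscript: the per-ant target area (`area_per_ant`), the rate `r` and the population trajectory are all hypothetical. Digging is proportional to the shortfall between the current area and a target that scales with population, so excavation levels off whenever the population stops growing.

```python
def simulate_density_feedback(populations, area_per_ant=1.0, r=0.5, area=0.0):
    """Toy discrete-time model (dt = 1). Each step the colony digs
    r * N * max(0, 1 - A / (area_per_ant * N)), i.e. in proportion to how far
    the current area is below a density-based target. With r <= area_per_ant
    the area approaches the target without overshooting.
    Returns the area after each step."""
    history = []
    for n in populations:
        target = area_per_ant * n
        deficit = max(0.0, 1.0 - area / target) if target > 0 else 0.0
        area += r * n * deficit
        history.append(area)
    return history


# Hypothetical population that doubles once and then stays constant
pops = [5] * 20 + [10] * 20
areas = simulate_density_feedback(pops)
print(round(areas[19], 3), round(areas[-1], 3))  # 5.0 10.0: area tracks population
```

      In this toy setting, continued digging by young ants does not make the nest grow indefinitely, because their drive to dig vanishes once the available area per ant reaches the target.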

      In any case, this is an excellent paper. The modelling approach is excellent and compelling, also allowing extrapolation to other group sizes and even other species. This to me is the main strength of the paper, as the answer to the question of whether it is younger or older ants that primarily excavate nests could have been answered by an individual tracking approach (albeit there are practical limitations to this, especially in the observation nest setup, as the authors point out). The analysis of the tunnel structure is also an important piece of the puzzle, and I really like the overall study.

      We thank the reviewer for the comments. We completely agree that individual tracking of ants within our experimental setup would have been the ideal approach, but we were constrained by technical and practical limitations of the setup, as pointed out by the reviewer, such as:

      (a) Continuous tracking of ants in our nests would have required a camera to be positioned at all times in front of the nest, which necessitates a light background. Since Camponotus fellah ants are subterranean, we aimed to allow them to perform nest excavation in conditions as close to their natural dark environment as possible. Additionally, implementing such a system in front of each nest would have reduced the sample sizes for our treatments.

      (b) The experimental duration of our colony maturation and fixed demographics experiments extended for up to six months (unprecedented durations in these kinds of measurements). These naturally limited our ability to conduct individual tracking while maintaining the identity of each ant based on the current design.

      These points are described in detail in the revised version of the manuscript.

      Reviewer #3 (Public review):

      Summary:

      In this study, Harikrishnan Rajendran, Roi Weinberger, Ehud Fonio, and Ofer Feinerman measured the digging behaviours of queens and workers for the first 6 months of colony development, as well as groups of young or old ants. They also provide a quantitative model describing the digging behaviours and allowing predictions. They found that young ants dig more slanted tunnels, while older ants dig more vertically (straight down). This finding is important, as it describes a new form of age polyethism (a division of labour based on age). Age polyethism is described as a "yes or no" mechanism, where individuals either perform a task or not according to their age (usually young individuals perform in-nest tasks, and older ones forage). Here, the way of performing the task is modified, not only the propensity to carry it out. These data therefore add in an interesting way to the field of collective behaviours and division of labour.

      The conclusions of the paper are well supported by the data. Measurements of the same individuals over time would have strengthened the claims.

      We sincerely thank reviewer #3 for the time and effort dedicated to our manuscript's detailed review and assessment. We completely agree with the reviewer’s comments on the measurements of the same individuals over time; however, we were limited by the technical and experimental limitations described above and pointed out by reviewer #2.

      Strengths:

      I find that the measure of behaviour through development is of great value, as those studies are usually done at a specific time point with mature colonies. The description of a behaviour that is modified with age is a notable finding in the world of social insects. The sample sizes are adequate and all the information is clearly provided either in the methods or supplementary.

      We thank reviewer #3 for this assessment.

      Weaknesses:

      I think the paper is failing to take into consideration or at least discuss the role of inter-individual variabilities. Tasks have been known to be undertaken by only a few hyper-active individuals for example. Comments on the choice to use averages and the potential roles of variations between individuals are in my opinion lacking. Throughout the paper wording should be modified to refer to the group and not the individuals, as it was the collective digging that was measured. Another issue I had was the use of "mature colony" for colonies with very few individuals and only 6 months of age. Comments on the low number of workers used compared to natural mature colonies would be welcome.

      Regarding main comment 1:

      We completely agree with the reviewer’s comment on considering inter-individual variability based on activity levels. We have discussed how individual morphological variability could influence digging behavior (references: 28, 31), and we will elaborate further on this aspect in future revisions.

      Regarding main comment 2:

      The term ‘colony maturation’ in our study refers to the progressive development of colonies from a single queen, distinguishing it from experiments that begin with pre-established, demographically stable colonies. We provide a detailed explanation for this terminology in the revised version of the manuscript. In practice, we were unable to continue the experiments beyond 6 months, predominantly because of the limited stability of the nests, which were made with a sand-soil mix. We also acknowledge that the colony sizes attained in our maturation experiments may be smaller than those of naturally matured colonies. This trend is generally observed in lab-reared colonies and could be attributed to differences in microclimatic conditions, foraging opportunities, space availability, and other factors. We have explicitly described these details in the revised version of the manuscript.

      Reviewer #1 (Recommendations for the authors):

      The experimental design is fantastic. The large quasi-2D cell should allow for the direct visualization of the movements of individuals and the creation of the nest, and the inclusion of non-workers (specifically, a mated queen and pupae) is new and important. However, I have some questions and concerns about the results, as outlined below. Also, I found the paper difficult to read, and the connections between the various experiments and the model were not always clear. 

      We thank the reviewer for the time and effort dedicated to reviewing our manuscript. We have modified the manuscript substantially to address the comments and readability. 

      The assumption that the digging rate is constant across ants may be a strong one. Previous work (see, for instance, Aguilar, et al, Science 2018) has demonstrated a very heterogeneous workload distribution among ants. I am not sure what implications that may have for the results here, but the authors should comment on this choice. Related to the point above, given a constant digging rate, the variation in digging is attributed to an age-dependent "desired target area". Can the authors comment on the implications of this, specifically in contrast to a variable digging rate? The distinction between digging rate differences and target area differences seems to be important for the authors. However, the way this is presented, it is difficult to fully understand or appreciate this importance and its implications. What is the consequence of this difference, and why is this important?

      We apologize to the reviewer for the confusion.

      Our model does not assume that the digging rate (da/dt, Equation 1) remains constant throughout the experiment. Instead, we only treat the basal digging rate (r) as a constant.

      The variable digging rate (da/dt, Equation 1) is derived by multiplying the basal rate constant (r) by the term (1 - a/a_age), which accounts for deviations from the age-dependent target area that the ants aim to achieve. This makes the actual digging rate dynamic, as it responds to changes in excavated area (e.g., expansion or rapid collapse).

      For example, according to our model (Equation 1), two ants with the same basal digging rate (r) may exhibit markedly different actual digging rates at a given time if they differ in age. This occurs because the variable digging rate (da/dt) depends not only on ‘r’ but also on the age-dependent term (1 - a/a_age). Also, we emphasize that the use of a basal digging rate constant aligns with prior studies (refs. 24, 29, 30).
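      The two-ant example above can be checked with a few lines of arithmetic. The Python sketch below simply evaluates da/dt = r(1 - a/a_age) as described in the text; the numerical values of r, a and the two targets are hypothetical, and giving the younger ant the larger target follows the finding that younger ants create larger nests.

```python
def digging_rate(a: float, a_age: float, r: float) -> float:
    """Actual digging rate da/dt = r * (1 - a / a_age), as described for Equation 1."""
    return r * (1.0 - a / a_age)


r = 2.0   # hypothetical basal digging rate, identical for both ants
a = 10.0  # hypothetical current excavated area
print(digging_rate(a, a_age=40.0, r=r))  # younger ant, larger target: 1.5
print(digging_rate(a, a_age=20.0, r=r))  # older ant, smaller target: 1.0
```

      Both ants share the same r, yet their actual rates differ because they are at different distances from their age-dependent targets. The sketch deliberately does not interpret the regime a > a_age, where the expression becomes negative.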

      In our work, we demonstrate that after a collapse event, ants of all ages dig at rates comparable to those observed in the initial (pre-collapse) phase of the experiment. This occurs because the ants are far from their age-dependent target area, effectively resetting their digging behavior. By comparing maximum digging rates pre- and post-collapse, we provide strong empirical evidence that this rate is age-independent (SI Fig. 6A, 6B), supporting the conclusion that the basal digging rate constant (r) is a fundamental property of the ants' behavior, unaffected by age.
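      The logic of the pre- versus post-collapse comparison can be illustrated with a toy forward-Euler integration of da/dt = r(1 - a/a_age). This is a sketch under stated assumptions, not the authors' simulation code: the collapse is assumed to remove all excavated area, and the step size, durations and target values are hypothetical.

```python
def simulate_collapse(r, a_age, steps=1000, collapse_step=500, dt=0.1):
    """Forward-Euler integration of da/dt = r * (1 - a / a_age).
    At collapse_step the excavated area is reset to zero (hypothetical total
    collapse). Returns the digging rate at every step."""
    area = 0.0
    rates = []
    for step in range(steps):
        if step == collapse_step:
            area = 0.0  # collapse: the ants are again far from their target
        rate = r * (1.0 - area / a_age)
        rates.append(rate)
        area += rate * dt
    return rates


for a_age in (10.0, 3.0):  # two hypothetical age-dependent targets
    rates = simulate_collapse(r=1.0, a_age=a_age)
    pre, post = rates[:500], rates[500:]
    # maximum rate before and after collapse, and the rate just before collapse
    print(a_age, max(pre), max(post), round(rates[499], 4))
```

      In this toy setting, the maximum rate after the collapse equals r for either target, because the rate is evaluated far from the target (at a = 0); only how quickly digging slows afterwards depends on a_age. This is why the maximum post-collapse rate is informative about r rather than about the age-dependent target.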

      We agree with the reviewer that individual tracking of ants within our experimental setup would have been the ideal approach, as it would have allowed us to take the inter-individual variability of digging activity into account. However, we were prevented from doing so by technical and practical limitations of the setup, such as:

      (a) Continuous tracking of ants in our nests would have required a camera to be positioned at all times in front of the nest, which necessitates a light background. Since Camponotus fellah ants are subterranean, we aimed to allow them to perform nest excavation in conditions as close to their natural dark environment as possible. Additionally, implementing such a system in front of each nest would have reduced the sample sizes for our treatments.

      (b) The experimental duration of our colony maturation experiments extended for up to six months (unprecedented durations in these kinds of measurements). These naturally limited our ability to conduct individual tracking while maintaining the identity of each ant based on the current design.

      In light of these points, we have added the following lines to the Discussion (lines 283-295):

      “Our age-dependent model demonstrates that the digging behavior in Camponotus fellah is governed by a basal digging rate constant (r) modulated by the age-dependent feedback (1 − a/a_age). Crucially, we show that after a collapse, the maximum digging rates return to their pre-collapse levels, suggesting that this basal rate ’r’ represents an age-independent ceiling on how fast ants can dig, regardless of age or context (SI Fig. 6 A, B). Previous studies have demonstrated both homogeneous and heterogeneous workload distribution, with varying digging rates among ants (24, 29, 30, 35). Studies showing heterogeneous workload distribution relied on continuous individual tracking of ants to quantify digging rates (35). However, this approach was not feasible in our current design due to the experimental durations of both our colony maturation and fixed demographics experiments. Additionally, sample size requirements naturally limited our ability to conduct continuous individual tracking during nest construction in our study. Thus, based on empirical measurements from our fixed-demographics experiments and supported by the age-independent post-collapse digging rates, we adopted a constant basal digging rate for simulating our age-dependent model—an assumption aligned with both prior literature and the collective dynamics observed in our system (24,29,30)”.

      Model: as presented, the model seems to lack independent validation. The model seems to have built-in that there is an age-dependent target area, and this is what is recovered from the model. I am failing to see what is learned from the model that the experiments do not already show. Also, the model has no ant interactions, though ants are eusocial and group size is known to have a large effect on behavior (this is acknowledged by the authors at the beginning of the discussion). Can the authors comment on this? My recommendation would be to remove the model from this paper or improve the text to address the above comments.

      We did not draw the conclusion of the age-dependent target area from our model. We used the fixed demographics experiments to quantify the age-dependent area target as a function of the age of individuals. We then used this age-dependent area target in our model to quantify the excavation dynamics of the colony maturation experiments, where ants span a variety of ages, as the nest population changes over time, resulting in natural variation in the ages of individuals within the nest. These results could not have been obtained by performing any of the individual experiments, whether colony maturation or the fixed demographics, young or old, on their own. The need for different age demographics was crucial to quantify the age-dependent effects in nest excavation, which were lacking in previous studies. 

      First, the age-dependent model provides a very good estimate of the natural growth of the nest. More importantly, after fixing an age threshold of 56 days (mean + standard deviation of the young ant age), the model provides an estimate of which ants are doing the majority of the digging during natural nest expansion. This shows that during natural expansion, the excavated area already exceeds the (smaller) area target of the older ants, which therefore do not engage in any substantial digging, as shown in Figure 4C. 

      On the other hand, the younger ants are close to their area targets and induced to dig. Indeed, the target area fitted for the age-independent model closely approximates the empirically measured age-dependent target when extrapolated to very young ants. This provides further support for the idea that, in the colony maturation experiments, the youngest ants are responsible for most of the digging.

      Our model is a simple analytical model, inspired by earlier models that used a fixed area target (such as density models) for nest construction. However, because we knew the precise age of workers in our experiments, we were able to obtain age-dependent area targets, thereby challenging the use of a constant area target (as employed in prior studies) in light of our findings from the fixed demographics of young and old colonies.

      Empirically Quantifiable Parameters: We wanted our model to have empirically quantifiable parameters. Because we did not record the experiments continuously, we could not quantify agent-agent interactions, pheromone deposition, or similar factors.

      Minimal Model Design: We aimed to keep the model as minimal as possible, which is why we did not include complex interactions such as those found in continuous tracking experiments.

      However, the model does set up some interesting hypotheses that could easily be tested with the experimental setup (e.g., marking the ants / tracking individual activity levels). For instance, it is hypothesized that older ants dig less often, but when they do dig, they do so at the same rate. Given the 2D setup, the authors could track individual ants and test this hypothesis. Also, if the desired target area does decrease with age, the authors could verify this hypothesis by placing older ants into arenas with different-sized pre-formed nests to observe how structure is changed to achieve the desired area/ant.

      We thank the reviewer for this comment.

      We believe that the confusion regarding the constant basal digging rate has now been resolved. To briefly reiterate, ants dig at variable rates that can be decomposed into a basal rate r (constant on short time scales) multiplied by the (variable, age-dependent) distance from the area target. The suggested experiments are beyond the scope of the current study; further studies could use the suggested experimental design, with better time-resolved imaging for individual ant tracking, to test the predictions of our model.
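      To make the decomposition concrete, the sketch below integrates the single-ant rule da/dt = r(1 − a/a_target) with a forward-Euler step and compares the result with the closed-form solution. It is a minimal illustration, not the authors' simulation code: the function names, the basal rate r = 1.0 cm²/day, the target of 10 cm², and the step size are hypothetical values chosen only for illustration, and the ant starts from a(0) = 0 so the rate never becomes negative.

```python
import math


def digging_rate(r, a, a_target):
    """Instantaneous digging rate: basal rate r scaled by the distance from the target."""
    return r * (1.0 - a / a_target)


def simulate_single_ant(r, a_target, days, dt=0.01):
    """Forward-Euler integration of da/dt = r * (1 - a / a_target), starting from a(0) = 0."""
    a = 0.0
    for _ in range(int(round(days / dt))):
        a += dt * digging_rate(r, a, a_target)
    return a


def analytic_single_ant(r, a_target, t):
    """Closed-form solution of the same equation with a(0) = 0."""
    return a_target * (1.0 - math.exp(-r * t / a_target))


# Hypothetical parameters, for illustration only.
r, a_target = 1.0, 10.0  # cm^2/day, cm^2
for t in (1, 7, 30):
    numeric = simulate_single_ant(r, a_target, t)
    exact = analytic_single_ant(r, a_target, t)
    print(t, round(numeric, 3), round(exact, 3))
```

      In this form the rate equals r when a = 0 and falls to zero as a approaches a_target, so r caps how fast an ant can dig while the target (age-dependent in the age-dependent model) sets where digging stops.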

      Specific comments:

      Title:

      The title suggests a broad result, yet the study focuses on one ant species. Please modify the title to more accurately reflect the scope of the work.

      We thank the reviewer for the comment.

      The title has been changed to “Colony demographics shape nest construction in Camponotus fellah ants.”

      Introduction:

      Important information and context are missing about this ant species. For instance, please add the following about this species in the introduction:

      What is their natural habitat and substrate? How does the artificial soil compare?

      What is their (rough) colony size? [later, discuss experiment group size choice and potential insights/limitations of results when applied to the natural system].

      The details have been added to the introduction (line numbers: 49-55) and to the Materials and Methods section (“Study species”).

      “Camponotus fellah ants are native to the Near East and North Africa, particularly Israel, Egypt, and the surrounding arid and semi-arid regions, where they prefer to nest in moist, decaying wood, including tree trunks, branches, or stumps (49,50). The species lives in monogynous colonies of tens to thousands of individuals. Nests are commonly found in a sand-loamy mix, a combination of sand, soil, clay, or gravel that provides structural stability and moisture retention (51). They are typically found under rocks, in crevices of dried vegetation, or in dry, sandy soils, sometimes in areas with loose gravel.”

      What is the natural life expectancy of a worker? A queen? [later, discuss fixed demographic age choices in this context and/or why were age ranges chosen for experiments?].

      The lifespan of ants, including both queens and workers, varies significantly based on caste, species, and environmental conditions.

      (1) Queen Longevity: From the literature, Camponotus fellah queens can live up to 20 years, with one documented case reaching 26 years (50). 

      (2) Worker Longevity: In contrast to queens, workers have much shorter lifespans. Lab studies on Camponotus fellah (82) and other Camponotus species (83) suggest that workers can live for several months, depending on environmental conditions, colony health, and caste-specific roles (e.g., minor vs. major workers).

      (3) Laboratory vs. Natural Conditions: Worker longevity differs considerably between laboratory and natural conditions.

      Therefore, we believe that the age of the old workers in our experiments, ~200 days (roughly 6–7 months), represents a substantial portion of a worker's expected lifespan. While exact figures for C. fellah workers are unavailable, inferences from related species suggest that workers nearing 200 days are approaching the later stages of their lifespan, making them meaningfully "old".

      The details have been added to the main text (line numbers: 124-127) and the discussion (line numbers: 278-282).

      Why was this species chosen? Convenience, or is there something special about this species that the readers should know? Specifically, is there something that might make the results more general or of broader interest?

      Camponotus fellah was chosen for this study because it is native to Israel, making it convenient to collect and maintain in the lab. Additionally, its nuptial flights occur close to the study location, ensuring a steady supply of colonies. We were able to provide them with a nesting substrate similar to what they naturally use, as their nests are typically found in a sand-loamy mix, similar to the sand-soil mix in our artificial nests. This was possible because we had the opportunity to observe their habitat and nesting behavior in the wild, allowing us to gather preliminary information on their natural nesting conditions.

      Results:

      Line 60: "several brood items" - how many exactly? Was this consistent across experiments? Do mated queens ever produce more pupae during the experiments?

      Yes, the same number of brood items (five) was added in every experiment. Additionally, the mated queen produced new brood during the experiments, as evidenced by the increase in the number of workers in the nest, which came to exceed the number of brood items present at the start of the study.

      These points have been added to the manuscript (line numbers: 68-69).

      Figure 1: Panel A - The food ports are never mentioned in the text. Are the ants fed during the experiments? If so, what? With what frequency? Is the water column replenished/maintained? If so, how and how often? panel C - how long did this experiment last?

      We thank the reviewer for pointing this out. We have updated the nest maintenance section in the Materials and Methods (line numbers: 349-354) to include all the necessary details and clarifications.

      “We provided food to the ants ad libitum through three separate tubes containing water, 20% sucrose solution, and protein food. The protein mixture included egg powder, tuna, prawns, honey, agar, and vitamins. Each tube was filled with 5 ml of its respective content and sealed with a cotton stopper to prevent overflow. The tubes were positioned at a slight angle and connected using a custom-made plexiglass adapter to facilitate the flow of liquids. The tubes were refilled when depleted and replaced regularly during the bi-weekly nest maintenance.”

      Line 76: "...excavation was commenced by the founding queen". How were the queen and pupae introduced into the system?

      We initiated colony maturation experiments by introducing a single mated queen and several brood items (pupae) at random positions on the soil layer of the nest (line numbers: 68-69).

      Line 87: Please provide bounds for 11cm2/ant value. Is there any biological or physical justification for this number?

      We thank the reviewer for the suggestion. We have now provided the bounds as requested (line numbers: 97-101).

      We were unable to pinpoint a specific biological justification based solely on this treatment. However, when we extrapolated the age-dependent area fit derived from the fixed-demographics experiments, we found that a 1-day-old ant has a target area of approximately 11.17 cm², which is the largest age-dependent area target possible within our experimental setup.

      From the colony maturation experiments, we obtained an area per ant of 11.6 (±1.15) cm². Thus, two completely different treatments, performed on different colonies, yielded similar values. We propose that under standardized conditions, a 1-day-old ant has a theoretical maximum target area of 11.17 cm², the highest value observed in our experimental framework.

      Lines 98-99: "one straightforward possibility would be that newborn ants are the ones that dig". This statement contradicts the results presented in Figures 1 and S1 - the population increase seems to occur at least a few days before increased excavation in nearly all cases.

      We apologize for any confusion caused by our initial phrasing. To clarify, we proposed that a lag likely exists between population growth and nest area expansion. This lag could arise from two sequential processes: (1) newborn ants require time to mature and become active (first delay), and (2) digging to expand the nest takes additional time (second delay; estimated at ~10 days from the cross-correlation analysis). Thus, our results suggest that it is not the population that lags behind the area, but rather the area that lags behind the population, as shown in Figure 2D and SI Figure S1.

      The sentence “one straightforward possibility would be that newborn ants are the ones that dig” has been modified as follows (line numbers: 112-119) to prevent further confusion:

      “One possible explanation is that, although all ants are capable of digging, it is primarily the newly emerged ants who perform this task. In this case, nest expansion would lag behind colony growth due to two delays: first, the time needed for young ants to mature enough to begin digging, and second, the physical time required to excavate additional space (e.g., around 10 days). This mechanism could eliminate the need for ants to assess overall colony density, as each new group of active workers simply enlarges the nest as they become ready. An alternative possibility is that all ants, regardless of age, respond to increased density by initiating excavation. In that scenario, nest expansion would follow more immediately after the emergence of new individuals, making delays less prominent (24, 29, 30)”.
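      One common way to estimate such a delay is to shift the area time series relative to the population time series and find the shift that maximizes their correlation. The sketch below does this with NumPy; it is not necessarily the procedure used for the ~10-day estimate mentioned above. It assumes both series have already been resampled onto a common daily grid (photographs were taken every 1-3 days), and the function name best_lag and the synthetic data are hypothetical.

```python
import numpy as np


def best_lag(population, area, max_lag):
    """Return (k, r): the shift k (in samples, 0 <= k <= max_lag) that maximizes the
    Pearson correlation between population[t] and area[t + k], and that correlation.
    A positive k means that the area lags behind the population."""
    pop = np.asarray(population, dtype=float)
    ar = np.asarray(area, dtype=float)
    if pop.shape != ar.shape:
        raise ValueError("series must have the same length")
    best_k, best_r = 0, -np.inf
    for k in range(max_lag + 1):
        x = pop[: len(pop) - k]  # population at time t
        y = ar[k:]               # area at time t + k
        r = np.corrcoef(x, y)[0, 1]
        if r > best_r:
            best_k, best_r = k, r
    return best_k, best_r


# Synthetic check: the area follows the population with a built-in 10-day delay.
rng = np.random.default_rng(0)
population = np.cumsum(rng.poisson(1.0, size=120)) + 1
area = 11.0 * np.concatenate([np.full(10, population[0]), population[:-10]])
print(best_lag(population, area, max_lag=30))
```

      On real data the correlation peak is broader and noisier than in this synthetic example, so the recovered lag should be read as an estimate rather than an exact delay.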

      Line 105: How do group sizes compare to natural colony size? Line 106: How do "young" and "old" classifications compare to natural life expectancy?

      We have addressed this question in our response to an earlier comment. The details have been added to the main text (line numbers: 124-127) and the discussion (line numbers: 278-282).

      Line 118-119: How are nests artificially collapsed?

      We have added a new section to the Materials and Methods that describes the nest collapse procedure (“Nest artificial collapse”, line numbers: 386-399).

      Figure 2 Panel A: The white dotted line is nearly impossible to see. Please use a more visible color.

      We thank the reviewer for the comment.

      We changed the solid circles to violet and replaced the white dotted line with a continuous white line.

      Figure 3: The use of circle markers as post-collapse recovery in young and old as well as old pre-collapse is confusing. Use different symbols for old pre-collapse vs young and old post-collapse.

      We thank the reviewer for pointing out the confusion. We have revised the figure markers as suggested and modified the main text accordingly.

      • Young, pre-collapse: star

      • Young, post-collapse: diamond

      • Old, pre-collapse: circle

      • Old, post-collapse: triangle

      Figure 3 Panel C: Indicate that fixed demographic values here are pre-collapse. Also, as presented, it appears that there is a large group-size dependence that is not commented on. Previous results (Line 87 and Figure 2C) suggest a constant excavation area per ant of 11cm2/ant. Figure 3, panel C appears to suggest a group-size dependence. If these values are divided by group size, is excavated area per ant nearly constant across groups? How does the numerical value compare to the slope from Figure 2C?

      We thank the reviewer for their insightful comments.

      First, we would like to clarify that the area target of 11.1 (±1) cm²/ant, as described in Line 87, was obtained from the colony maturation experiments. In these experiments, we were unable to track the age of each individual ant, so the area target was calculated by normalizing the total excavated area by the number of ants.

      We normalized the excavated area by group size for both young and old colonies as suggested, and found that the area per ant did not differ significantly across group sizes (see new SI Fig. 5A). This indicates that the excavated area per ant remains relatively constant within each demographic group. Moreover, it shows that the total excavated area is proportional to group size, in agreement with previous studies (24, 29, 30).

      We now describe this explicitly in the main text (line numbers: 142-146).

      Regarding the slope comparisons, the slope of Figure 2C (10.71 cm²/ant), from the colony maturation experiments, is the largest, followed by the area per ant in the short-term young experiments (8.79 ± 0.98 cm²/ant) and the short-term old experiments (5.16 ± 0.44 cm²/ant).

      Lines 128-129: "...younger ants aim to approach a higher target area". Seems hard to know what they "aim" to do... rephrase to report what they are observed to do.

      We thank the reviewer for the comment. The sentence has been rephrased as suggested (line numbers: 158-161):

      “In the previous sections, we showed that in fixed-demographics experiments, younger ants excavated a significantly larger nest area than older ants (Fig. 3C). This difference emerged despite similar temporal patterns in digging rates across age groups, with excavation activity peaking within the first 7 days before decaying asymptotically as nest expansion approached saturation (SI Fig. 8).”

      Lines 133-141: The model description is not clear. Specifically, what parameters are ant-dependent? How does A relate to a?

      We appreciate the reviewer's request for clarification. In our model:

      (1) Equation 1 describes the change in the excavated area due to the digging activity of a single ant. Here, the variable 'a' represents the area excavated by one ant. This formulation allows us to capture the individual digging behavior and its impact on the excavation process.

      (2) Equation 2 extends this concept to the total area excavated in the nest, denoted by 'A'. Specifically, 'A' is the sum of the areas excavated by all ants present in the nest. In other words, it aggregates the individual contributions of each ant, linking the microscopic digging behavior to the macroscopic excavation dynamics.

      Therefore, the relationship between 'a' and 'A' is as follows:

      ● 'a' = area excavated by a single ant.

      ● 'A' = ∑ 'a' (summed over all ants in the nest).

      We now state this explicitly in line numbers 161-179, where we also describe the model assumptions and parameters in detail.
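      The relation A = ∑a can be illustrated at steady state, where each ant's contribution has settled at its own age-dependent target. This is a minimal sketch under stated assumptions: it uses the Figure 4A linear fit (y = −0.032x + 11.22, target area in cm² per ant, age x in days) for every ant, ignores aging during digging, and treats ages outside the fitted range as extrapolations. The function names and the example ages are hypothetical.

```python
def area_target(age_days, intercept=11.22, slope=-0.032):
    """Age-dependent area target a_age (cm^2 per ant) from the Fig. 4A linear fit."""
    return intercept + slope * age_days


def steady_state_total_area(ages_days):
    """A = sum of a over all ants, with each ant's a at its own target a_age."""
    return sum(area_target(age) for age in ages_days)


# Hypothetical nest with one 1-day-old, one 40-day-old, and one 200-day-old worker.
ages = [1, 40, 200]
print([round(area_target(x), 3) for x in ages])  # per-ant targets a_age
print(round(steady_state_total_area(ages), 3))   # total area A
```

      Summing per-ant targets is what lets the same age-dependent rule produce different nest areas for colonies that have the same size but different age structures.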

      Figure 4:

      Figure 4, Panel A: The equation quoted in the caption does not match the data in the figure. The equation has a positive slope and negative intercept, while the figure has a negative slope and a positive intercept. Please provide the correct equation and bounds on fit parameters.

      We thank the reviewer for spotting this typing mistake.

      The equation had already been corrected in the reviewed preprint published online. The correct equation and the confidence bounds of the fit are provided in the figure caption:

      “Target areas decrease linearly with the ant age (y = −0.032x + 11.22; 95% CI: slope (−0.035, −0.027), intercept (10.53, 11.91); R² = 0.96).”

      Figure 4, Panel A: There seem to be three "fixed target area per ant values" in the paper: around 11cm2/ant (line 87), 11.6 cm2/ant (SI Figure 2), and linearly dependent value from fit to Figure 4A. The distinctions between these values and their significance are hard to keep track of. Can the authors add a discussion somewhere that helps the reader better understand? Is there a way to connect/rationalize/explain these different values in terms of demographics?

      We thank the reviewer for the suggestion. We have added a paragraph to the discussion (line numbers: 270-277) describing the area targets:

      “In our colony maturation experiments, we found that the area per ant was highest when the workers were youngest, with values of around 11.1–11.6 (±1–1.15) cm²/ant. This aligns with observations from naturally growing nests, where newly eclosed ants dominate the population and nest volumes are relatively large. Supporting this, fixed-demographics experiments showed that the area excavated per ant declines linearly with worker age, indicating that the youngest ants contribute most to excavation. Notably, the target area we fit for the age-independent model (11.6 ± 1.15 cm²/ant) closely matches the extrapolated value for very young workers (Fig. 4A), reinforcing the idea that young ants are the primary excavators during early colony growth. In contrast, during events such as collapses or displacement, when space is urgently needed, ants of all ages participate in excavation.”

      Figure 4, Panel A: What are various symbols and colors for data with error bars? If consistent with Figure 3, then this panel and subsequent model confound two factors: (1) the age dependence and (2) the behavioral differences pre- and post-collapse (structures are different pre-and post-collapse, according to SI Figure 6; line 120: "...colonies ceased digging when they recovered 93±3% of the area lost by the manual collapse..."; lines 201-202: "We find significant quantitative and qualitative differences between nests constructed within this natural context and nests constructed in the context of an emergency") and behavior is different (according to SI Figure 7 and line 119: "...all ants dig after collapse...")). Therefore, without further supporting evidence, it does not seem that these data should be used to fit a single line that defines a model parameter a_age for each ant in equation 2.

      The symbols show the area per ant quantified from the fixed-demographics experiments with young and old ants, as follows:

      A.  Star - Young, pre-collapse

      B.  Diamond - Young, post-collapse 

      C.  Circle - Old, pre-collapse

      D.  Triangle - Old, post-collapse.

      The details are clearly described in the figure caption. 

      We apologize to the reviewer for the confusion. We argue that fitting a single line to these data to quantify the parameter a_age is justified, for the following reasons.

      A. All data presented in Figure 4A were obtained from the same fixed-demographics experiments (each containing only young or only old ants), before and after experimental collapse. These results therefore reflect nest building in emergency scenarios only and do not include any observations from natural colony maturation.

      B. Age-dependent excavation differences: As the reviewer correctly notes, the excavated area differs before and after collapse. We attribute this difference to the natural aging of ants in our experimental colonies. While colonies recovered >90% of the lost area after collapse, the residual variation was not negligible; instead, it correlated systematically with colony age structure. By tracking colonies across this demographic transition, we obtained additional data points spanning a broader age range. This extended range strengthened our ability to detect and quantify the linear relationship between worker age and excavation output.

      C. The quoted sentence (lines 201-202, submitted version) refers to comparisons across all three experimental cases: (1) fixed-demographics young ants, (2) fixed-demographics old ants, and (3) the natural scenario (mixed-age colonies). Importantly, these comparisons are based on pre-collapse steady-state excavation areas, ensuring a consistent baseline across treatments. We highlight quantitative and qualitative differences between these distinct experimental groups, not between pre- and post-collapse phases within the same treatment. The pre- and post-collapse data within fixed-demographics groups were analyzed separately to avoid conflating aging effects with emergency responses.

      To avoid confusion, the whole paragraph in the discussion (line numbers: 253-260) has been rephrased.

      In lines 201-202: “We find significant quantitative and qualitative differences between nests constructed within this natural context and nests constructed in the context of an emergency”.

      By “natural context”, we mean the nests excavated in the colony maturation experiments. We recognize that this wording may have been confusing, and the sentence has been modified as described in our response to the previous point.

      Figure 4, Panel B: This uses the model with a_age determined from Figure 4A and the life table (as shown in the supplemental), whereas the supplemental Figure SI 8 uses the fixed blue line a_age value for the model, which comes from the colony maturation experiments. The age-independent model in the supplemental fits the data better, yet the authors claim the supplemental model cannot be applied to the data because of their experimentally determined age-dependent target area. Given the age-independent target area model fits better, additional evidence/justification is needed to support the choice of the model.

      We agree with the reviewer that the age-independent model fits the data well. However, we believe that the fixed area target cannot be used to explain the excavation dynamics for the following reasons.

      We make an important assumption in our model: ants rely on local cues, and individual ants cannot distinguish between the fixed-demographics and colony maturation experiments (line numbers: 161-166). Given this assumption, the ants cannot change their behavior between experiments, so the same model should fit all of our results. However, the fixed-demographics experiments revealed a significant difference between the areas excavated by young and old cohorts of the same group size. If the ants regulated the excavated area according to an age-independent, constant density target, the excavated areas in the young and old fixed-demographics colonies would have been similar. This discrepancy indicates that the target area per ant is not constant, as assumed in the age-independent density model (SI Fig. 8). We emphasize that while the age-independent model provides a better fit for the excavated area in the colony maturation experiments, the age dependence of excavation is empirically supported by the fixed-demographics experiments. Therefore, we implemented this age dependence through a variable target area within the age-dependent model framework to explain the excavation dynamics in the colony maturation experiments.

      These details are now stated explicitly in the main text (line numbers: 187-198).
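      The argument above can be checked with back-of-the-envelope arithmetic: at steady state, a constant-target model predicts the same total area for two groups of equal size, whereas an age-dependent target predicts a smaller area for the older group. The sketch uses the age-independent target of 11.6 cm²/ant and the Figure 4A linear fit (y = −0.032x + 11.22); the representative ages (40 days for young, 200 days for old), the group sizes, and the function names are illustrative assumptions, and the outputs are model predictions, not measurements.

```python
CONSTANT_TARGET = 11.6  # cm^2 per ant, age-independent model


def area_target(age_days, intercept=11.22, slope=-0.032):
    """Age-dependent target a_age (cm^2 per ant) from the Fig. 4A linear fit."""
    return intercept + slope * age_days


def predicted_area(n_ants, age_days, age_dependent):
    """Steady-state excavated area (cm^2) for n_ants ants of the same age."""
    per_ant = area_target(age_days) if age_dependent else CONSTANT_TARGET
    return n_ants * per_ant


YOUNG_AGE, OLD_AGE = 40, 200  # days; illustrative representative ages
for n in (5, 10, 20):         # illustrative group sizes
    const_young = predicted_area(n, YOUNG_AGE, age_dependent=False)
    const_old = predicted_area(n, OLD_AGE, age_dependent=False)
    dep_young = predicted_area(n, YOUNG_AGE, age_dependent=True)
    dep_old = predicted_area(n, OLD_AGE, age_dependent=True)
    print(f"n={n}: constant young/old = {const_young:.1f}/{const_old:.1f} cm^2, "
          f"age-dependent young/old = {dep_young:.1f}/{dep_old:.1f} cm^2")
```

      Only the age-dependent version reproduces the qualitative observation that equally sized young cohorts excavate more than old ones; these illustrative numbers are not expected to match the measured per-ant values exactly.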

      Figure 4, Panel C: Is this plot entirely from the model, or are the data points measured from experiments? Please label this more clearly.

      We apologize to the reviewer for the confusion.

      Figure 4C is based on the age-dependent digging model. We applied the model to population data from the long-term experiments (n = 22). Using an age threshold of 56 days (the mean plus one standard deviation of the age of ants in the short-term young experiment, 40 ± 16 days), we categorized the ants into young and old groups. We then quantified the area dug by the young ants, the queen, and the old ants as percentages of the total excavated area. We hypothesized that, because young ants have a lower digging threshold, they would perform the majority of the digging, and Figure 4C confirms this.

      This information has been added to the main text and described in detail (line numbers: 200-208).
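      The age split used for Figure 4C amounts to a simple threshold rule. The sketch below encodes it, assuming that each modeled digging contribution is tagged with the digger's identity and age; the record format, the function name, and the choice that an ant exactly at 56 days counts as young are assumptions made for illustration.

```python
YOUNG_MEAN_AGE, YOUNG_SD_AGE = 40, 16          # days, short-term young experiment
AGE_THRESHOLD = YOUNG_MEAN_AGE + YOUNG_SD_AGE  # 56 days


def percent_by_group(contributions):
    """contributions: iterable of (kind, age_days, area_cm2), kind being 'queen' or 'worker'.
    Returns the percentage of the total modeled area dug by young workers, old workers,
    and the queen. Workers at or below AGE_THRESHOLD count as young (assumed boundary).
    The queen's age is not used."""
    totals = {"young": 0.0, "old": 0.0, "queen": 0.0}
    for kind, age, area in contributions:
        if kind == "queen":
            totals["queen"] += area
        elif age <= AGE_THRESHOLD:
            totals["young"] += area
        else:
            totals["old"] += area
    grand_total = sum(totals.values())
    return {group: 100.0 * value / grand_total for group, value in totals.items()}


# Hypothetical records, for illustration only.
records = [("queen", 0, 10.0), ("worker", 20, 45.0), ("worker", 50, 15.0), ("worker", 120, 30.0)]
print(percent_by_group(records))
```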

      Lines 162-165: "...Furthermore, we quantified the area dug by each ant in the normal colony growth experiment as estimated from the age-dependent model and found that all ants excavated more or less the same amount...". Figure 4D shows a distribution with significant values ranges from 1-16 cm2... how is this interpreted as "more or less the same amount" and what is the significance of this?

      We apologize to the reviewer for the confusion.

      We quantified the percentage contribution of each histogram bin to the total excavated area (provided in the new SI Table 4) and found that ants whose individual contributions lie between 5 cm² and 13 cm² account for 73.76% of the total excavated area. This indicates that most of the excavation was done by ants within this range rather than by ants with extreme values. Additionally, the mean excavation amount is 7.84 cm² with a standard deviation of 3.44 cm², so the interval of mean ± one standard deviation (4.40 cm² to 11.28 cm²) aligns well with the 5–13 cm² range. Because most of the excavation is concentrated within this narrow interval, and the mean lies well within it, ants excavated more or less the same amount rather than forming distinct groups with very different excavation behaviors.

      We have modified the main text (line numbers: 209-216) to include these points.
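      The two summaries used above, the share of the total area contributed by each histogram bin and the mean ± SD interval, can be computed as follows. This is a minimal sketch with hypothetical function names and toy data; the per-ant areas in the manuscript come from the age-dependent model and are not reproduced here. Applied to the reported values, the interval is 7.84 ± 3.44 cm², i.e. 4.40 to 11.28 cm².

```python
import statistics


def percent_area_per_bin(areas, edges):
    """Percentage of the total excavated area contributed by ants whose individual
    area lies in each half-open bin [edges[i], edges[i + 1])."""
    total = sum(areas)
    shares = []
    for low, high in zip(edges[:-1], edges[1:]):
        in_bin = sum(a for a in areas if low <= a < high)
        shares.append(100.0 * in_bin / total)
    return shares


def mean_sd_interval(areas):
    """Return (mean - SD, mean + SD) using the sample standard deviation."""
    m = statistics.mean(areas)
    s = statistics.stdev(areas)
    return m - s, m + s


toy_areas = [2.0, 6.0, 8.0, 10.0, 14.0]  # cm^2 per ant, toy values only
print(percent_area_per_bin(toy_areas, [0, 5, 13, 20]))  # below, within, above 5-13 cm^2
print(mean_sd_interval(toy_areas))
```

      Weighting each bin by area rather than by the number of ants is what makes the percentage answer the question of where most of the excavation came from.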

      The biological significance of this finding is as follows. Because all ants in the colony maturation experiments were born inside the nest, we hypothesized that they would excavate similar amounts. To test this, we used the age-dependent digging model described above to quantify the area contributed by each ant over the entire duration of the experiment, and found that they indeed excavated more or less the same amount. Our analysis of the fixed-demographics experiments showed that the youngest ants excavate the largest area. Since the majority of the youngest ants participated in the colony maturation experiments, this further supports our hypothesis.

      Figure 5.

      Figure 5, Panels A-C: Please provide a scale bar. 

      A scale bar has been added to the figure as suggested. The algorithm used to set the cutoffs between tunnels and wide tunnels is described in detail in the section “Nest skeletonization, segmentation, and orientation.”

      Figure 5, Panel E: Why does the chamber error bar for 5 ants go to zero?

      In Figure 5E, we plot the standard error, as described in the figure caption. In the four experiments, the chamber area contributions were 0, 0, 39.94, and 0, respectively. The mean of these four values is 9.985, the standard deviation is 19.97, and the standard error is 9.985. Because the mean and the standard error are equal, the lower error bar reaches zero and the upper error bar reaches 19.97. This reflects the fact that the chamber area was zero in three of the four experiments.
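      The arithmetic behind the zero lower error bar can be reproduced directly. With a single nonzero value x among four, the mean is x/4, the sample standard deviation is x/2, and the standard error is (x/2)/√4 = x/4, which equals the mean. The snippet below checks this with the values quoted above using Python's standard library; the variable names are illustrative.

```python
import math
import statistics

chamber_areas = [0.0, 0.0, 39.94, 0.0]  # chamber values for the 5-ant colonies, as quoted above

mean = statistics.mean(chamber_areas)
sd = statistics.stdev(chamber_areas)     # sample SD (n - 1 in the denominator)
se = sd / math.sqrt(len(chamber_areas))  # standard error of the mean
lower, upper = mean - se, mean + se      # ends of the plotted error bar

print(f"mean={mean:.3f}, sd={sd:.2f}, se={se:.3f}")
print("lower end at zero:", math.isclose(mean, se))
print(f"upper end={upper:.2f}")
```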

      Figure 5, Panel I: Why are there no chambers for young colonies in I when they are in the histogram in E?

      We apologize to the reviewer for the confusion. We initially missed adding the chamber orientation data of the young colonies to Panel I, but it has now been included.

      Line 212: "...densities of ants never become too high...". What is too high? Is there some connection to biological or physical constraints?

      Under normal growth conditions, nest volume is kept proportional to the number of ants, so that density remains within a specific range. This prevents overcrowding, in which densities would become excessively high.

      Yes, we believe there is likely a connection to both biological and physical constraints. The proportional relationship between nest volume and the number of ants is likely driven by factors such as:

      (1) Biological Constraints:

      Ant Colony Size: Ants typically adjust their behavior and social structure to maintain an optimal population size relative to the available resources and space. Overcrowding could potentially lead to a breakdown in colony function.

      Colony Health: High densities can accelerate the spread of disease, with negative effects on reproduction, foraging efficiency, and overall colony health. By keeping density within a specific range, the colony can thrive without these adverse effects.

      (2) Physical Constraints:

      Spatial Limitations: The physical space within the nest limits how many ants can occupy it before space becomes constrained. The nest’s structure and size must physically accommodate the ants, and its volume must be large enough to prevent overcrowding and allow efficient resource distribution.

      Lines 272 and 302: How often were photos taken? These two statements seem to suggest different data collection rates.

      As stated in line 272, photos were taken every 1 to 3 days. During each photo session, four photos were taken, with each photo separated by 2 seconds, as mentioned in line 302. To avoid confusion, we rephrased the sentence (line numbers: 359-361).

      “We photographed the nest development every 1-3 days. During each photography session, four pictures of the nest were taken, with a 2-second interval between each.”

      Reviewer #2 (Recommendations for the authors):

      Some more minor points/questions/clarifications:

      This might be pedantic, but I don't think the nest serves as the skeleton of the superorganism, while it does change and grow, the analogy becomes weak beyond that point. The skeleton serves to protect the internal organs of the organism, facilitates movement and muscle attachment, and creates new blood cells. I would be more comfortable with a statement that the nest can grow or shrink according to need.

      We sincerely thank the reviewer for their time and effort in providing a detailed review and assessment of our manuscript. A point-by-point response to the comments is provided below.

      The analogy between the nest structure and the skeleton of a superorganism was based on the following points:

      (a) Protection: A nest protects the colony on a collective scale, analogous to the way a skeletal framework protects the "organs".

      (b) Organization and Division of Space: The skeletal structure organizes the body's internal layout, just as nest structures are organized into various spatial compartments for various colony functions, with specific regions designated for brood chambers, food storage, and waste disposal.

      Thus, we believe that the analogy can still be valid in a metaphorical way.

      Does this statement need justification with a citation, or is that information contained in the subsequent clause? "However, for more complex structures where ants congregate in specific chambers, workers are less likely to assess the overall nest density." The idea that workers do (or do not) assess overall density touches on many issues, including that of perfect information and adaptive responses, that it seems it needs to be well founded in previous work to be stated in such unequivocal terms.

      We thank the reviewer for this comment. The references for this argument were provided in the next sentence. We have now moved them to the relevant sentence (references 24, 29, 30; lines 30–31).

      Can you give some more information on this statement? "Experiments were terminated either when the queen died or when she became irreversibly trapped after a structural collapse." Why was this collapse irreversible and therefore unlike treatment 2? Did the queen die in these instances? Was this event more likely than in natural colonies? And if so, was there something inherently different about your experiments that limit interpretation under natural conditions (e.g. the narrow nature of the observation setup? The consistency of the sand?)

      Our nest excavation experiments were terminated under two primary scenarios: (1) the queen died of natural causes, reflecting the baseline mortality expected when queens are brought into laboratory conditions, or (2) the nest experienced a structural collapse that left the queen irreversibly trapped. The second scenario is further elaborated below:

      Irreversible Collapses: These collapses were classified as irreversible because the queen could not be rescued alive. This occurred when the structural stability of the nest failed, burying the queen in a manner that prevented recovery. In some cases, the collapse resulted in the queen's immediate death, while in others, she was trapped beyond reach, and any rescue attempt risked further structural damage.

      Collapse and Experimental Context: These collapses were not uniquely associated with natural colonies or fixed-demographic experiments; rather, they occurred across various experimental setups.

      The sentence has been modified as below to improve clarity (lines 70–72).

      “In all instances where a collapse resulted in the queen's death or her being irreversibly trapped in the nest, the experiment was excluded from analysis starting from the point of the collapse, as such events did not reflect normal colony dynamics.”

      I want to make sure I understand the following statement: "Moreover, the area excavated by the young cohorts was similar to that excavated by naturally maturing colonies at the point in which they reached the same population size (Tukey's HSD; group size: 5; p = 0.61, group size: 10; p = 0.46, group size: 15; p = 0.20)." Do I have it right that this means a group of (e.g. 10) young ants excavates an area similar to that of a group of 10 naturally maturing ants at the same age as the young ants?

      Yes, this interpretation is correct, and we apologize for the confusion. We have rephrased the sentence for better readability (lines 146–148).

      “Furthermore, the area excavated by the young cohorts was comparable to that excavated by naturally maturing colonies when they reached the same population size (Tukey's HSD; group size: 5, p = 0.61; group size: 10, p = 0.46; group size: 15, p = 0.20).”

      How old do ants get? Is the 'old' demographic (~200 days) meaningfully old in the context of the overall worker lifespan? While the results certainly demonstrate there is an age effect, I would like to understand how rapid this is in terms of overall lifespan.

      The lifespan of ants, including both queens and workers, varies significantly based on caste, species, and environmental conditions.

      (1) Queen Longevity: According to the literature, Camponotus fellah queens can live up to 20 years, with one documented case reaching 26 years. This remarkable longevity underscores the queen's central role in maintaining the colony.

      (2) Worker Longevity: Workers have much shorter lifespans than queens. However, specific data on worker longevity in Camponotus fellah colonies are lacking. Studies on other Camponotus species (50, 82) suggest that workers can live for several months depending on environmental conditions, colony health, and caste-specific roles (e.g., minor vs. major workers).

      (3) Laboratory vs. Natural Conditions: Worker longevity varies considerably between laboratory and natural conditions.

      Therefore, we believe that the age of the old workers in our experiments (~200 days, roughly 6–7 months) represents a substantial portion of a worker's expected life. While exact figures for C. fellah workers are unavailable, inferences from related species suggest that workers nearing 200 days are approaching the latter stages of their lifespan, making them meaningfully "old."

      These details have been added to the main text (lines 124–127) and to the Discussion (lines 278–282).

      Reviewer #3 (Recommendations for the authors):

      We sincerely thank the reviewer for their time and effort in providing a detailed review and assessment of our manuscript. A point-by-point response to the comments is provided below.

      L10: "fixed demographics": I find this term unclear. What does it mean? It should specify whether the groups are with or without a queen.

      We thank the reviewer for the comment. We have modified the sentence in the abstract and added detailed definitions later in the Introduction (lines 8–10) and in the Materials and Methods section (Fixed demographics colonies).

      “We experimentally compared nest excavation in colonies seeded from a single mated queen and allowed to grow for six months to excavation triggered by a catastrophic event in colonies with fixed demographics, where the age of each individual worker, including the queen, is known”.

      The details of the “fixed demographics” treatments are explained later in the text (lines 58–61).

      L36: I think it is documented that younger individuals are the ones involved in nest construction in many species.

      Previous studies on nest construction were predominantly performed on mature colonies with specific or mixed age demographics, in which age was not considered as a factor influencing nest construction. Some studies have speculated that young ants are the most likely to dig, but to the best of our knowledge this has not been experimentally verified.

      L50: I do not think the colony should be called mature after only 6 months, given that colonies reach thousands of workers.

      The sentence has been changed as suggested (lines 56–57).

      “The "Colony-Maturation" experiment observed the development of colonies up to six months, starting from a single fertile queen and progressing to colonies with established worker populations.” 

      L60: Where was the queen introduced? It is specified in the Methods but a word here would be helpful.

      The detail has been added as suggested (lines 68–69).

      “We initiated colony maturation experiments by introducing a single mated queen and several brood items (n = 5, across all experiments) at random positions on the soil layer of the nest.”

      L106: Young vs. old workers: 40 vs. 171 days. Maybe cite a reference or provide a reason for the selection of those ages?

      Previous studies have shown that Camponotus fellah queens can live up to 20 years, with one documented case reaching 26 years (50). To the best of our knowledge, specific data on worker longevity in Camponotus fellah colonies under natural conditions are lacking. Laboratory studies on Camponotus fellah (82) and other Camponotus species (50) suggest that workers can live for several months depending on environmental conditions, colony health, and caste-specific roles (e.g., minor vs. major workers).

      We intentionally selected workers from two distinct age groups: younger ants (40 ± 16 days old) and older ants (171.56 ± 20 days old). These ages represent functionally different life stages: the younger group had completed about 25% of their expected lifespan at the start of the experiment, while the older group had lived through most of theirs (50, 82). This roughly 4-fold age difference allowed us to compare excavation behaviors across fundamentally different phases of adult life.

      Our experiments lasted for 60-90 days, during which all participating workers continued to age. To ensure all ants remained alive throughout the experiments, and given the constraints of the experimental timeline, we selected young and old workers within the specified age range. 

      These details have been added to the main text (lines 124–127) and to the Discussion (lines 278–282).

      L122-123: Ants usually vary considerably in their behaviours. Can the authors comment on their choice to consider an average, which implies that all ants of the same age had the same digging rates?

      We thank the reviewer for the comment.

      In our experiments, we could not track each worker's activity over time. As described in the Methods, we took snapshots of the nest structure over days and recorded the population size of the nest. Thus, we could not capture the activity of individual ants in the nest, as described in our response to the major comments in the reviewed preprint.

      We agree that individual tracking of ants within our experimental setup would have been the ideal approach, as it would have allowed us to account for inter-individual variability in digging activity. However, technical and practical limitations of the setup prevented this:

      (a) Continuous tracking of ants in our nests would have required a camera positioned in front of each nest at all times, which would have necessitated a light background. Since Camponotus fellah ants are subterranean, we aimed to let them excavate under conditions as close as possible to their natural dark environment. Additionally, implementing such a system for each nest would have reduced the sample sizes for our treatments.

      (b) Our colony maturation and fixed demographics experiments lasted up to six months (unprecedented durations for these kinds of measurements), which limited our ability to conduct individual tracking while maintaining the identity of each ant with the current design.

      To clarify this, we have added the following to the Discussion (lines 286–292):

      “Previous studies have demonstrated both homogeneous and heterogeneous workload distribution, with varying digging rates among ants (24,29,30,35). Studies showing heterogeneous workload distribution relied on continuous individual tracking of ants to quantify digging rates (35). However, this approach was not feasible in our current design due to the experimental durations of both our colony maturation and fixed demographics experiments. Additionally, sample size requirements naturally limited our ability to conduct continuous individual tracking during nest construction in our study.”

      L171: A line on how the nest structure was acquired and data extracted would be welcome here.

      The algorithm for nest structure segmentation, data extraction, and analysis has been added in detail to the SI section "Nest Skeletonization, Segmentation, and Orientation". We have also modified the corresponding line in the main text (lines 221–224) as suggested.

      “We compared nest architectures by segmenting raw nest images into chambers and tunnels (see SI Section: Nest Skeletonization, Segmentation, and Orientation). Chambers were identified as flat, horizontal structures, while tunnels were narrower and more vertical in orientation (see SI Fig. 9, SI Section: Nest Skeletonization, Segmentation, and Orientation)”.  
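For readers less familiar with this kind of segmentation, the sketch below illustrates the chamber/tunnel criterion quoted above (flat and wide versus narrower and more vertical). It is a hypothetical example, not the algorithm in the SI: the `Segment` structure, the 30° inclination cutoff, and the 10-pixel width cutoff are assumptions chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical skeleton segment: endpoints and mean width, all in pixels."""
    x0: float
    y0: float
    x1: float
    y1: float
    mean_width: float


def inclination_deg(seg: Segment) -> float:
    """Inclination from the horizontal in degrees (0 = horizontal, 90 = vertical)."""
    dx = abs(seg.x1 - seg.x0)
    dy = abs(seg.y1 - seg.y0)
    return math.degrees(math.atan2(dy, dx))


def classify_segment(seg: Segment,
                     max_chamber_angle: float = 30.0,  # assumed cutoff, degrees
                     min_chamber_width: float = 10.0,  # assumed cutoff, pixels
                     ) -> str:
    """Label a segment 'chamber' if it is both flat and wide, otherwise 'tunnel'."""
    is_flat = inclination_deg(seg) <= max_chamber_angle
    is_wide = seg.mean_width >= min_chamber_width
    return "chamber" if is_flat and is_wide else "tunnel"


# Usage with made-up geometry (not measured nest data):
flat_wide = Segment(0, 0, 100, 5, mean_width=20)
steep_narrow = Segment(0, 0, 5, 100, mean_width=4)
print(classify_segment(flat_wide))     # chamber
print(classify_segment(steep_narrow))  # tunnel
```

Requiring both conditions means a wide but steep segment is still treated as a tunnel; whether the SI method weighs width and orientation in this way is an assumption of the sketch.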

      Figure 3: Where does the data of the mean in panel C come from: is it the mean of the first 30 days, before the collapse? How is it comparable with the rest?

      We apologize to the reviewer for the confusion.

      In panel C, the mean values (solid stars and circles) for fixed-demography colonies (young/old groups) represent pre-collapse excavation areas. For colony maturation experiments (where no collapses were induced), we instead plot the mean saturated excavation area for each group size. This allows direct comparison of mean excavated areas across experimental conditions at equivalent colony sizes.

      To improve readability, the following sentences have been added to the main text (lines 139–146):

      “We compared the saturated excavation areas (pre-collapse) from fixed-demographics experiments (young and old groups) with those from colony maturation experiments of the same colony sizes (Fig. 3C). We find that, for a given age cohort (young or old), the saturation areas increase linearly with the colony size (GLMM, F(35,37); p < 0.0001) (Fig. 3C; SI Fig. 7A). The observed proportional scaling between excavated area and group size aligns with previous studies, even though those studies did not explicitly account for age demographics (24, 29, 30). After normalizing the pre-collapse excavated area by group size for both young and old colonies, we found no significant difference in area per ant across group sizes (SI Fig. 5A). This indicates that the excavated area per ant remains relatively constant within each demographic group”.
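The per-ant normalization described above can be illustrated with a minimal sketch: each replicate's pre-collapse area is divided by its group size, and a least-squares slope through the origin summarizes the proportional scaling. This is only an illustration; it is not the GLMM used in the manuscript, the helper names are hypothetical, and the input values are made up.

```python
from statistics import mean


def area_per_ant(areas_by_group_size: dict[int, list[float]]) -> dict[int, float]:
    """Mean pre-collapse area per ant for each group size (area units as recorded)."""
    return {n: mean(a / n for a in areas) for n, areas in areas_by_group_size.items()}


def slope_through_origin(areas_by_group_size: dict[int, list[float]]) -> float:
    """Least-squares k for the proportional model area = k * group_size over all replicates."""
    pairs = [(n, a) for n, areas in areas_by_group_size.items() for a in areas]
    return sum(n * a for n, a in pairs) / sum(n * n for n, _ in pairs)


# Hypothetical replicate areas keyed by group size (illustrative numbers only).
toy_areas = {5: [50.0, 60.0], 10: [100.0, 120.0], 15: [150.0, 180.0]}
print(area_per_ant(toy_areas))          # {5: 11.0, 10: 11.0, 15: 11.0}
print(slope_through_origin(toy_areas))  # 11.0
```

A constant area per ant across group sizes is the same statement as a proportional (linear through the origin) relationship between excavated area and group size, which is why the two summaries agree here.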

      L209-210: I would be more parsimonious in saying that the results presented prove that the target area decreases with age, as the individual behaviour of the ants was not monitored. Suggestion: rephrase to "the target of the group decreases with age".

      The sentence has been rephrased as suggested (lines 265–266).

      “Our results reveal that this target area of the group decreases linearly with age, such that young ants are more sensitive to shortages in space.”

      L246: Are C.fellah colonies really found with such few workers?

      Previous studies suggest that Camponotus fellah is a monogynous species whose colonies are typically founded by a single queen following nuptial flights (50, 51, 82), and that mature colonies can range from tens to thousands of workers. However, during the founding stage (as in our experiments), colonies naturally pass through small developmental sizes comparable to those used in our experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This work addresses an important question in the field of Drosophila aggression and mating: prior social isolation is known to increase aggression in males through increased lunging, which is suppressed by group housing (GH). However, it is also known that single-housed (SH) males, despite more frequent attempts to court females, are less successful. Here, Gao et al. developed a modified aggression assay to address this issue by recording aggression in Drosophila males for 2 hours over a virgin female that is immobilized by burying its head in the food. They found that while SH males frequently lunge in this assay, GH males switch to higher-intensity but very low-frequency tussling. Constitutive neuronal silencing and activation experiments implicate cVA-sensing Or67d neurons in promoting high-frequency lunging, similar to earlier studies, whereas Or47b neurons promote low-frequency but higher-intensity tussling. Using optogenetic activation, they found that three pairs of pC1 neurons (pC1SS2) increase tussling. While P1a neurons, previously implicated in promoting aggression and courtship, did not increase tussling upon optogenetic activation (in the dark), they could promote aggressive tussling upon thermogenetic activation carried out in the presence of visible light. Using a further modified aggression assay, it was also suggested that GH males use increased tussling and are able to maintain territorial control, providing them a mating advantage over SH males, and that this may partially overcome the effect of aging in GH males.

      Strengths

      Using a series of clever neurogenetic and behavioral approaches, subsets of ORNs and pC1 neurons were implicated in promoting tussling behaviors. The authors devised a new paradigm to assay territory control, which appears better than earlier paradigms that used a food cup (Chen et al., 2002), as this new assay is relatively clutter-free and can eventually be automated using computer vision approaches. The manuscript is generally well-written, and the claims made are largely supported by the data.

      Thank you for your precise summary of our study and for your positive assessment of its novelty and significance.

      Weaknesses

      I have a few concerns regarding some of the evidence presented and claims made, as well as the description of the methodology, which need to be clarified and extended further.

      (1) Typical paradigms for assaying aggression in Drosophila males last for 20-30 minutes in the presence of nutritious food, yeast paste, females, or all of these (Chen et al. 2002, Nilsen et al., 2004, Dierick et al. 2007, Dankert et al., 2009, Certel & Kravitz 2012). The paradigm described in Figure 1A, while important and more amenable to video recording and computational analysis, seems to be a modification of the assay from the Kravitz lab (Chen et al., 2002), which involved a female over which males fight on a food cup. The modifications include a flat surface with a central food patch, a female with its head buried in the food (fixed female), and much longer adaptation and recording times (30 minutes and 2 hours, respectively). In that sense, this is not a 'new' paradigm but a modification of an existing one, and its description as new should be appropriately toned down. It would also be important to cite these earlier studies appropriately while describing the assay.

      We have now toned down the description of the paradigm and cited additional related references.

      (2) Lunging is described as 'low-intensity' aggression (line 111 and associated text); however, it is considered a mid- to high-intensity aggressive behavior compared with lower-intensity behaviors such as wing flicks, chasing, and fencing. Lunging is therefore lower in intensity relative to tussling, but not in absolute terms, and this should be stated clearly.

      We have modified the description as suggested.

      (3) It is often difficult to distinguish faithfully between boxing and tussling; for this reason, these behaviors were grouped together as 'box, tussle' by Nilsen et al., 2004 in their Markov chain analysis, as well as in a more detailed recent study of male aggression (Simon & Heberlein, 2020). Therefore, the authors can either reconsider describing the behavior as 'box, tussle' or provide a video representation or computational classifier to distinguish between box and tussle behaviors.

      Indeed, we could not faithfully distinguish boxing from tussling. To address this concern, we have made the following textual change in the Results section: "We occasionally observed high-intensity boxing and tussling behaviors in male flies, which are difficult to distinguish and are hereafter simply referred to as tussling."

      We have also added this information to the Materials and Methods section: "Tussling is often mixed with boxing, in which both flies rear up and strike the opponent with forelegs. Since boxing is often transient and difficult to distinguish from tussling, we referred to the mixed boxing and tussling behavior simply as tussling."

      (4) Simon & Heberlein, 2020 showed that increased boxing & tussling precede the formation of a dominance hierarchy in males, and lunges are used subsequently to maintain this dominant status. This study should be cited and discussed appropriately while introducing the paradigm.

      We now cite this important study in both the Introduction and Discussion sections.

      (5) It would be helpful to provide more methodological details about the assay, for instance, a video can be helpful showing how the males are introduced in the assay chamber, are they simply dropped to the floor when the film is removed after 30 minutes (Figures 1-2)?

      We have now provided a more detailed description of the behavioral assays and how we analyzed them. For example: "All testers were loaded by cold anesthesia. After a 30-minute adaptation, the film was gently removed to allow the two males to fall into the behavioral chamber, and the aggressive behavior was recorded for 2 hours."

      (6) The strain of Canton-S (CS) flies used should be mentioned as different strains of CS can have varying levels of aggression, for instance, CS from Martin Heisenberg lab shows very high levels of aggressive lunges. Are the CS lines used in this study isogenized? Are various genetic lines outcrossed into this CS background? In the methods, it is not clear how the white gene levels were controlled for various aggression experiments as it is known to affect aggression (Hoyer et al. 2008).

      We used the wtcs flies from the Baker lab at Janelia Research Campus; we are not certain of their origin. We appreciate your concern that different wild-type strains may show different levels of fighting, but this study mainly used wild-type flies to compare behavioral differences between SH and GH males. All flies tested in this study are in a w+ background, generated using w+ balancer flies, but have not been backcrossed. We have listed the detailed genotypes of all tested flies in Table S1 of the revised manuscript.

      (7) How important it is to use a fixed female for the assay to induce tussling? Do these females remain active throughout the assay period of 2.5 hours? Is it possible to use decapitated virgin females for the assay? How will that affect male behaviors?

      We used a fixed female to keep her at the center of the food. These females remain active throughout the assay, as their legs and abdomens can still move. This design is intended to combine the attractive effects of both the female and the food. Decapitated females could also be used, but in that case males can push the decapitated female anywhere in the behavioral chamber. The rationale for using fixed females has now been added to the Materials and Methods section of the revised manuscript.

      (8) Raster plots in Figure 2 suggest a complete lack of tussling in SH males in the first 60 minutes of the encounter, which is surprising given the longer duration of the assay compared with earlier studies (Nilsen et al. 2004, Simon & Heberlein, 2020 and others), which were able to pick up tussling in a shorter recording time. Also, tussling bouts are much longer in this study than the shorter tussles shown in earlier studies. Is this due to differences in the paradigm used, the strain of flies, or some other factor? While the bar plots in Figure 2D show some tussling in SH males, an analysis of raster plots of various videos could be provided in the main text and included as a supplementary figure to address this.

      Indeed, tussling is very low in SH males in our paradigm, which may be due to differences in genetic background and behavioral assays. Since tussling is a rare fighting form, it is not surprising to see variation between studies from different labs. Nevertheless, this study compared tussling in SH and GH males, and our finding that GH males show much more tussling is convincing. The longer duration of tussling in our paradigm may also be due to the modified behavioral paradigm, which also supports the view that tussling is a high-level fighting form.

      (9) Neuronal activation experiments suggesting the involvement of pC1SS2 neurons are quite interesting. Further, P1a neurons were shown to increase tussling upon thermogenetic activation in the presence of light (Figure 4, Supplement 1), which is quite important, as the role of vision in optogenetic activation experiments, which must be carried out in the dark, is often not mentioned. However, in the Discussion (lines 309-310) it is stated that pC1SS2 neurons are 'necessary and sufficient' for inducing tussling. Given that P1a neurons were shown to be involved in promoting tussling, this statement should be toned down.

      Thank you for this important comment. We have now toned down the statement on pC1SS2 function.

      (10) Are Or47b neurons connected to pC1SS2 or P1a neurons?

      We conducted a pathway analysis in the FlyWire electron microscopy database to investigate the connection between Or47b neurons and pC1 neurons. The results indicate that at least three levels of interneurons are required to connect Or47b neurons to pC1 neurons. Although the FlyWire database currently contains neuronal data only from the female brain, it provides a reference for circuit connectivity in males.

      (11) The paradigm for territory control is quite interesting and subsequent mating advantage experiments are an important addition to the eventual outcome of the aggressive strategy deployed by the males as per their prior housing conditions. It would be important to comment on the 'fitness outcome' of these encounters. For instance, is there any fitness advantage of using tussling by GH males as compared to lunging by SH males? The authors may consider analyzing the number of eggs laid and eclosed progenies from these encounters to address this.

      Thank you for this suggestion. We agree with you and the other reviewers that increased tussling correlates with better mating competition, but it is difficult for us to establish a direct link between them. Thus, in the revised manuscript, we have chosen to tone down this statement rather than expand on this part.

      Reviewer #2 (Public review):

      Summary

      Gao et al. investigated how social experience changes aggression strategies, and the biological significance of this change, using Drosophila. Two modes of inter-male aggression in Drosophila are known: lunging, a high-frequency but weak mode, and tussling, a low-frequency but more vigorous mode. Previous studies have mainly focused on lunging. In this paper, the authors developed a new behavioral experiment system for observing tussling and found that tussling is enhanced by group rearing while lunging is suppressed. They then searched for neurons involved in the generation of tussling. Although olfactory receptors named Or67d and Or65a have previously been reported to function in the control of lunging, the authors found that these neurons do not function in the execution of tussling, and that another olfactory receptor, Or47b, is required for tussling, as shown by the inhibition of neuronal activity and gene knockdown experiments. Further optogenetic experiments identified a small number of central neurons, pC1[SS2], that specifically induce tussling. To further explore the ecological significance of the aggression mode change in group rearing, a new behavioral experiment was performed to examine territorial control and mating competition. Finally, the authors found that differences in social experience (group vs. solitary rearing) are important in these biologically significant competitions. These results add a new perspective to the study of aggressive behavior in Drosophila. Furthermore, this study proposes an interesting general model in which behavioral changes induced by social experience play a role in reproductive success.

      Strengths

      A behavioral experiment system that allows stable observation of tussling, which could not be easily analyzed due to its low frequency, would be very useful. The experimental setup itself is relatively simple, just the addition of a female to the platform, so it should be applicable to future research. The finding about the relationship between social experience and the change in aggression mode is quite novel. Although changes in aggression intensity with social experience have already been reported in several papers (Liu et al., 2011, etc.), the fact that the behavioral mode itself changes significantly has rarely been addressed and is extremely interesting. The identification of sensory and central neurons required for tussling makes appropriate use of genetic tools, and the results are clear. A major strength of the neurobiology in this study is the finding that another group of neurons (Or47b-expressing olfactory neurons and pC1[SS2] neurons), distinct from the group of neurons previously thought to be involved in low-intensity aggression (i.e. lunging), functions in tussling behavior. Further detailed circuit analysis is expected to elucidate the neural substrate of the conflict between the two aggression modes.

      Thank you for the acknowledgment of the novelty and significance of the study, and your suggestions for improving the manuscript.

      Weaknesses

      The experimental systems examining the territory control and the reproductive competition in Figure 5 are novel and have advantages in exploring their biological significance. However, at this stage, the authors' claim is weak since they only show the effects of age and social experience on territorial and mating behaviors, but do not experimentally demonstrate the influence of aggression mode change itself. In the Abstract, the authors state that these findings reveal how social experience shapes fighting strategies to optimize reproductive success. This is the most important perspective of the present study, and it would be necessary to show directly that the change of aggression mode by social experience contributes to reproductive success.

      We agree that our data do not directly show that it is the change in aggression mode that results in territory and reproductive advantages in GH males. To address this concern, we have toned down the statement throughout the manuscript. For example, we made the following textual changes in the Abstract:

      Moreover, shifting from lunging to tussling in socially enriched males is accompanied by better territory control and mating success, mitigating the disadvantages associated with aging. Our findings identify distinct sensory and central neurons for two fighting forms and suggest how social experience shapes fighting strategies to optimize reproductive success.

      In addition, a detailed description of the tussling is lacking. For example, the authors state that the tussling is less frequent but more vigorous than lunging, but while experimental data are presented on the frequency, the intensity seems to be subjective. The intensity is certainly clear from the supplementary video, but it would be necessary to evaluate the intensity itself using some index. Another problem is that there is no clear explanation of how to determine the tussling. A detailed method is required for the reproducibility of the experiment.

      Thank you for this important suggestion. We have now analyzed the duration of tussling and lunging and found that a lunging event is usually very short (less than 0.2 s), whereas a tussling event may last from seconds to minutes. These new data are shown in Figure 2G. In addition, we have provided more detailed methods regarding tussling behavior.

      Reviewer #3 (Public review):

      In this manuscript, Gao et al. presented a series of intriguing data that collectively suggest that tussling, a form of high-intensity fighting among male fruit flies (Drosophila melanogaster), has a unique function and is controlled by a dedicated neural circuit. Based on the results of behavioral assays, they argue that increased tussling among socially experienced males promotes access to resources. They also concluded that tussling is controlled by a class of olfactory sensory neurons and sexually dimorphic central neurons that are distinct from pathways known to control lunges, a common male-type attack behavior.

      A major strength of this work is that it is the first attempt to characterize the behavioral function and neural circuit associated with Drosophila tussling. Many animal species use both low-intensity and high-intensity tactics to resolve conflicts. High-intensity tactics are mostly reserved for escalated fights, which are relatively rare. Because of this, tussling in flies, like high-intensity fights in other animal species, has not been systematically investigated. Previous studies on fly aggressive behavior have often used socially isolated, relatively young flies and short observation periods. The authors' discoveries that 1) older (14-day-old) flies tend to tussle more often than younger (2-day-old) flies, 2) group-reared flies tend to tussle more often than socially isolated flies, and 3) flies tend to tussle at a later stage (mostly ~15 minutes after the onset of fighting) result from their creativity in looking beyond conventional experimental settings. These new findings are key to quantitatively characterizing this interesting yet under-studied behavior.

      Precisely because their initial approach was creative, it is regrettable that the authors missed the opportunity to effectively integrate preceding studies in their rationale or conclusions, which sometimes led to premature claims. Also, while each experiment contains an intriguing finding, these are poorly related to each other. This obscures the central conclusion of this work. The perceived weaknesses are discussed in detail below.

      Thank you for the precise summary of the key findings and novelty of the study, and your insightful suggestions.

      Most importantly, the authors' definition of "tussling" is unclear because they did not explain how they quantified lunges and tussling, even though the central focus of the manuscript is behavior. Supplemental movies S1 and S2 appear to include "tussling" bouts in which 2 flies lunge at each other in rapid succession, and supplemental movie S3 appears to include bouts of "holding", in which one fly holds the opponent's wings and shakes vigorously. These cases raise a concern that their behavior classification is arbitrary. Specifically, lunges and tussling should be objectively distinguished because one of their conclusions is that these two actions are controlled by separate neural circuits. It is impossible to evaluate the credibility of their behavioral data without a clear description of the criteria for each behavior.

      Thank you for this very important suggestion. We have now provided a more detailed description of the two fighting forms in the Materials and Methods section (see below):

      Lunging is characterized by a male raising its forelegs and quickly striking the opponent; based on our detailed analysis, each lunge typically lasts less than 0.2 s. Tussling is characterized by both males using their forelegs and bodies to tumble over each other, and it may last from seconds to minutes. Tussling is often mixed with boxing, in which both flies rear up and strike each other with their forelegs. Because boxing is often transient and difficult to distinguish from tussling, we refer to mixed boxing and tussling behavior simply as tussling. As tussling was scored manually over 2 hours for each pair of males, some tussling events, especially brief ones, may have been missed.

      It is also confusing that the authors completely skipped the characterization of the tussling-controlling neurons they claimed to have identified. These neurons (a subset of so-called pC1 neurons labeled by previously described split-GAL4 line pC1SS2) are central to this manuscript, but the only information the authors have provided is its gross morphology in a low-resolution image (Figure 4D, E) and a statement that "only 3 pairs of pC1SS2 neurons whose function is both necessary and sufficient for inducing tussling in males" (lines 310-311). The evidence that supports this claim isn't provided. The expression pattern of pC1SS2 neurons in males has been only briefly described in reference 46. It is possible that these neurons overlap with previously characterized dsx+ and/or fru+ neurons that are important for male aggressions (measured by lunges), such as in Koganezawa et al., Curr. Biol. 2016 and Chiu et al., Cell 2020. This adds to the concern that lunge and tussling are not as clearly separated as the authors claim.

      Thank you very much for this important question. Indeed, many experiments could be done to better understand the function of pC1SS2 neurons, and we provide only an initial characterization of them given the limited scope of this study. My lab has focused on studying P1/pC1 function in both male and female flies and will continue to do so.

      To partially address your concern, we made the following revisions:

      (1) We provided higher-resolution images of P1a and pC1SS2 (Figure 4C-4E). While their cell bodies are very close, they project to distinct brain regions, in addition to some shared ones.

      (2) By staining these neurons with GFP and co-staining with anti-FruM or anti-DsxM antibodies, we showed that P1a neurons are partially FruM-positive and partially DsxM-positive, while pC1SS2 neurons are DsxM-positive and FruM-negative (Figure 5A-5D).

      (3) As pC1SS2 neurons are DsxM-positive and FruM-negative, we also examined how DsxM regulates the development of these neurons. We found that knocking down DsxM in pC1SS2 neurons using RNAi significantly affected the development of these pC1 neurons, in terms of both cell number (Figure 5G) and projections (Figure 5H).

      (4) We further found that DsxM in pC1SS2 neurons is crucial for their tussling-promoting function: optogenetic activation of these neurons with DsxM knocked down failed to induce tussling during the first activation period and induced much less tussling than in control males during the second activation period (Figure 5I-5K).

      (5) Although it is very difficult to identify the upstream and downstream neurons of P1a and pC1SS2 neurons, we took a first step by using trans-Tango and retro-Tango to visualize potential downstream and upstream neurons of P1a and pC1SS2 (Figure 4-figure supplement 2); this certainly needs future investigation.

      While their characterizations of tussling behaviors in wild-type males (Figures 1 and 2) are intriguing, the remaining data have little link with each other, making it difficult to understand what their main conclusion is. Figure 3 suggests that one class of olfactory sensory neurons (OSN) that express Or47b is necessary for tussling behavior. While the authors acknowledged that Or47b-expressing OSNs promote male courtship toward females presumably by detecting cuticular compounds, they provided little discussion on how a class of OSN can promote two different types of innate behavior. No evidence of a functional or circuitry relationship between the Or47b pathway and the pC1SS2 neurons was provided. It is unclear how these two components are relevant to each other.

      It has been previously found that Or47b-expressing ORNs respond to fly pheromones common to both sexes, and that group housing enhances their sensitivity. As for how Or47b ORNs promote two different types of innate behavior, a simple explanation is that they act on multiple second-order and further downstream neurons to regulate both courtship and aggression, not to mention that the neural circuits for courtship and aggression are partially shared. We did not include this in the Discussion because we wanted to focus on aggression modes and on how different ORNs (Or47b and Or67d) mediate distinct aggression modes.

      The relationship between Or47b ORNs and pC1<sub>SS2</sub> neurons, or between ORNs and P1/pC1 neurons more generally, is interesting and important to explore, but probably in a separate study. We performed a pathway connection analysis from Or47b to pC1 using the FlyWire database and found that Or47b neurons can act on pC1 neurons via three layers of interneurons. Although the FlyWire database currently contains neuronal data only from female brains, it can still serve as a reference to some degree. We hope the editor and reviewers will agree that identifying the intermediate neurons involved in this connection is beyond the scope of this study.

      Lastly, the rationale of the experiment in Figure 5 and the interpretation of the results are confusing. The authors attributed a higher mating success rate of older, socially experienced males over younger, socially isolated males to their tendency to tussle, but tussling cannot happen when one of the two flies is not engaged. If, for instance, a socially isolated 14-day-old male does not engage in tussling as indicated in Figure 2, how can it tussle with a group-housed 14-day-old male? Because aggressive interactions in Figure 5 were not quantified, it is impossible to conclude that tussling plays a role in copulation advantage among pairs as the authors argue (lines 282-288).

      Indeed, we do not have direct evidence that tussling is what allows socially experienced males to dominate socially isolated males. To address your concern, we have made the following revisions:

      (1) We toned down the statements about the relationship between fighting strategies and reproductive success throughout the manuscript. For example, the abstract now reads: "Moreover, shifting from lunging to tussling in socially enriched males is accompanied by better territory control and mating success."

      (2) Regarding whether an SH male can tussle with a GH male, we found that while two SH males rarely tussle, pairs consisting of one SH and one GH male displayed levels of tussling similar to those of two GH males, although tussling duration was significantly shorter in SH-GH pairs than in GH-GH pairs (Figure 6-figure supplement 2).

      (3) To support the potential role of tussling in territory control and mating competition, we performed additional experiments in which we silenced Or47b or pC1SS2 neurons, which almost abolished tussling, and paired these males with control males. We found that males with silenced Or47b or pC1SS2 neurons could not outcompete control males, further suggesting the involvement of tussling in territory control and mating competition.

      Despite these weaknesses, it is important to acknowledge the authors' courage to initiate an investigation into a less characterized, high-intensity fighting behavior. Tussling requires the simultaneous engagement of two flies. Even if there is confusion over the distinction between lunges and tussling, the authors' conclusion that socially experienced flies and socially isolated flies employ distinct fighting strategies is convincing. Questions that require more rigorous studies are 1) whether such differences are encoded by separate circuits, and 2) whether the different fighting strategies are causally responsible for gaining ethologically relevant resources among socially experienced flies. Enhanced transparency of behavioral data will help readers understand the impact of this study. Lastly, the manuscript often mentions previous works and results without citing relevant references. For readers to grasp the context of this work, it is important to provide information about methods, reagents, and other key resources.

      Thank you very much for this comment, with which we largely agree.

      (1) Our results suggest the involvement of distinct sensory neurons and central neurons for lunging and tussling, but do not exclude the possibility that they may also utilize shared neurons. For example, activation of P1a neurons promotes both lunging and tussling in the presence of light.

      (2) We have now toned down the statements about the relationship between fighting strategies and reproductive success throughout the manuscript.

      (3) We have provided more detailed methods and fly genotypes to improve the transparency of the manuscript.

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1 Supplement 1 shows that increasing age has a linear, inverse relationship with the number of lunges; this is in contrast to a previous study from the Dierick lab (Chowdhury, 2021), which used Divider assays to show that aggressive lunges increased up to day 10 and subsequently decreased in 30-day-old flies. Given that that study did not use 14-day-old flies, it might be useful to comment on this.

      Thank you for this comment. Indeed, Chowdhury et al. reported a decline in lunging after 10 days, which does not contradict our finding that lunging in 14d-old males is lower than that in 7d-old males. Ideally, a time-series experiment would reveal the detailed relationship between age and aggression (lunging or tussling) levels, but given our initial finding that 14d-old males showed stable tussling behavior, we chose to use this time point for the rest of this study.

      (2) For Figure 3, do various manipulations also affect the duration of tussling and boxing besides frequency and latency?

      Thank you for this comment. We analyzed only latency and frequency, not duration, because behavior was scored manually rather than automatically, for about 2 hours per fly pair, which is very labor-intensive. We hope you agree that these two parameters (frequency and latency) are representative measures of tussling.

      (3) For Figure 3 A-F, the housing status of the males is not clearly mentioned either in the main text or the figure. What happens to tussling and lunging when this housing condition is reversed and Or47b neurons are silenced, or the gene is knocked down? Do these manipulations overcome the effect of housing conditions, similar to what is seen in NaChBac-mediated activation experiments?

      Figure 3A-F used group-housed males, and we have now added this information to the figure legends as well as Table S1.

      We appreciate your suggestion on using different housing conditions. As silencing Or47b neurons or knocking down Or47b reduced tussling, it is reasonable to use GH males (as we did in Figure 3A-F) that performed stable tussling behavior, but not SH males that rarely tussle.

      (4) The connections between Or47b neurons and pC1SS2 or P1a neurons can be addressed by available connectomic datasets or TransTango/GRASP approaches.

      Thank you for this important suggestion. We used the FlyWire electron microscopy database to analyze the pathway connections between these two types of neurons. The results indicated that at least three levels of interneurons connect Or47b and pC1 neurons. Although the FlyWire database currently contains neuronal data only from female brains, it can still serve as a reference for males to some degree.

      The lack of direct synaptic connection also suggests that it is challenging to resolve the connection between these two neuronal types using methods like trans-Tango/GRASP. To partially address this question, we utilized trans-Tango and retro-Tango techniques to visualize potential downstream and upstream neurons of P1a and pC1SS2 (Figure 4-figure supplement 2). Future investigations are certainly needed for clarifying functional connections between Or47b/Or67d and P1a/pC1SS2 neurons.

      (5) Figure 5, 'Winning index' and 'Copulation advance index' while described in Material and Methods, should be referred to in the main text.

      We now describe these two indices briefly in the main text and in more detail in the Discussion section.

      (6) Figure 6 shows comparisons for territorial control and mating outcomes where four different housing and aging conditions are organized in a hierarchical sequence. It is not clear from the data in Figure 5 how this conclusion was arrived at. A supplementary table with various outcomes with statistical analysis would help with this.

      We have now added a supplementary table (Table S2) summarizing the various outcomes with statistical analyses.

      Minor Comments

      (1) Line 26 says that the courtship levels in SH and GH males are not different; however, unilateral wing extension is higher in SH males than in GH males (Pan & Baker, 2014; Inagaki et al., 2014), and courtship attempts were also shown to be higher in D. paulistorum (Kim & Ehrman, 1998). It would be better to clarify this statement.

      Indeed, it is found in some cases that SH males court more vigorously than GH males. We have added more references on this matter in the introduction.

      (2) Figure 4, correct 'Tussing' to 'Tussling' or 'Box, Tussling' as appropriate.

      Corrected.

      (3) Duistermars, 2018 should be cited while discussing the role of vision in aggression (Figure 4). [A Brain Module for Scalable Control of Complex, Multi-motor Threat Displays]

      We now cited this reference and added more discussion in the revised manuscript.

      (4) Reviews on Drosophila aggression and social isolation can be cited in the introduction/discussion to incorporate recent literature, e.g., Palavicino-Maggio, 2022 [The Neuromodulatory Basis of Aggression: Lessons From the Humble Fruit Fly]; Yadav et al., 2024 [Lessons from lonely flies: Molecular and neuronal mechanisms underlying social isolation], etc.

      We now cited these references in both the introduction and discussion sections.

      (5) The concentration of apple juice agar should be mentioned in the methods.

      We added this and other necessary information for materials in the Materials and Methods section of the study.

      (6) Source of the LifeSongX software and, if available, a Github link would be helpful to include in the materials and methods section.

      We have now provided the source of the LifesongY software (https://sourceforge.net/projects/lifesongy/), a Windows version of LifesongX (Bernstein, Adam S. et al., 1992).

      Reviewer #2 (Recommendations for the authors):

      (1) Major comment 1

      As pointed out in the public review, the weakness of this study is that the relationship between the aggression strategy and reproductive success is an inference that is not based on experimental facts. I understand that the frequency of tussling is not so high, but at least tussling-like behavior can be observed in the territory control experiment shown in Video 3. Wouldn't it be possible to re-analyse the data and examine the correlation between aggressive behavior and territory control? Even if the analysis of tussling itself in this setup is difficult, additional experiments using, for example, Or47b knock-out flies or pC1[SS2]-inactivated flies could provide stronger support.

      Indeed, we can only establish a correlation between the type of aggressive behavior and territory control. We have now toned down this statement throughout the manuscript. For example, we changed the conclusions in the abstract as follows:

      "Moreover, shifting from lunging to tussling in socially enriched males is accompanied by better territory control and mating success. Our findings identify distinct sensory and central neurons for two fighting forms and suggest how social experience shapes fighting strategies to optimize reproductive success."

      To further address the concern, we performed additional experiments in which we silenced Or47b or pC1SS2 neurons, which almost abolished tussling, and paired these males with control males. We found that males with silenced Or47b or pC1SS2 neurons could not outcompete control males (Figure 6-figure supplement 3), further suggesting the involvement of tussling in territory control and mating competition.

      In relation to the above, some of the text in the Abstract should be changed. Line 28: These findings "reveal" how social experience shapes fighting strategies to optimise reproductive success.

      "suggest" is more accurate at this stage.

      Changed as suggested.

      (2) Major comment 2

      The tussling is the central subject of this paper. However, neither the main text nor Materials and Methods section provides a clear explanation of how this aggression mode was detected. Did the authors determine this behavior manually? Or was it automatically detected by some kind of image analysis? In either case, the criteria and method for detecting the tussling should be clearly described.

      The behavioral data in this study were analyzed manually. We have now provided a more detailed description of the two fighting forms in the Materials and Methods section (see below):

      Lunging is characterized by a male raising its forelegs and quickly striking the opponent; based on our detailed analysis, each lunge typically lasts less than 0.2 s. Tussling is characterized by both males using their forelegs and bodies to tumble over each other, and it may last from seconds to minutes. Tussling is often mixed with boxing, in which both flies rear up and strike each other with their forelegs. Because boxing is often transient and difficult to distinguish from tussling, we refer to mixed boxing and tussling behavior simply as tussling. As tussling was scored manually over 2 hours for each pair of males, some tussling events, especially brief ones, may have been missed.

      For the experimental groups where tussling cannot be observed, the latency is regarded as 120 min, but this is a value depending on the observation time. While it is reasonable to use the latency to evaluate the behavior such as the lunging that is observed at relatively early times, care should be taken when using it to evaluate the tussling. Since similar trends to those obtained for the latency are observed for Number of tussles and % of males performing tussling, it may be better to focus on these two indices.

      We initially intended to provide all three statistical metrics. However, we found that using the "% of males performing tussling" would require a significantly larger sample size for the subsequent statistical analysis (chi-square tests), greatly increasing the workload. At the same time, we believe that the trend observed with "% of males performing tussling" is consistent with the other two indices, and the percentage can also be derived from the individual data points shown in the scatter plots of the other two metrics. Therefore, we opted to use "latency" and "number" as the statistical metrics, despite the caveat you mentioned.
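      As an illustration of how the percentage can be recovered from the per-pair data, the sketch below assumes, following the reviewer's description, that a pair in which no tussling was observed is assigned the censored latency of 120 min and a tussle count of zero. The function name percent_tussling and the example values are hypothetical and are not data or code from this study.

```python
# Hypothetical helper for illustration only; not the authors' analysis code.
OBSERVATION_MIN = 120.0  # assumed observation window; latency is set to this when no tussle occurs


def percent_tussling(latencies_min, counts):
    """Return the percentage of pairs that tussled at least once.

    A pair counts as tussling if its tussle count is nonzero. As a
    consistency check, a pair with zero tussles must carry the censored
    latency OBSERVATION_MIN.
    """
    if len(latencies_min) != len(counts):
        raise ValueError("latencies and counts must describe the same pairs")
    if not counts:
        raise ValueError("no pairs supplied")
    for latency, n in zip(latencies_min, counts):
        if n == 0 and latency != OBSERVATION_MIN:
            raise ValueError("a pair without tussling must have the censored latency")
    tussled = sum(1 for n in counts if n > 0)
    return 100.0 * tussled / len(counts)


# Hypothetical example: 5 pairs, 3 of which tussled at least once.
latencies = [18.5, 42.0, 120.0, 95.0, 120.0]
counts = [4, 2, 0, 1, 0]
print(percent_tussling(latencies, counts))  # 60.0
```

      Reading the percentage from the count data alone avoids relying on the censored latency values, which, as the reviewer notes, depend on the length of the observation window.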

      The authors repeatedly mention that tussling is less frequent but more vigorous. The low frequency can be understood from the data in Fig. 1 and Fig. 2, but there are no measured data on the intensity. As the authors mention in line 125, each tussling event appears to be sustained for a relatively long period, as can be seen from the ethogram in Fig. 2. For example, it would be possible to evaluate the intensity by measuring the duration of the tussling event.

      Thank you for your valuable suggestion. We have now analyzed the duration of tussling and lunging events and found that a lunge is usually very short (less than 0.2 s), whereas a tussling event may last from seconds to minutes, further supporting the greater intensity of tussling. These new data are shown in Figure 2G.

      (3) Minor comments

      a) Line 117: How many flies were placed in one vial for group-rearing (GH)? Were males and females grouped together? Please specify in the Materials and Methods section.

      We have added this information in the Materials and Methods section. In brief, 30-40 virgin males were collected after eclosion and group-housed in each food vial.

      b) Line 174: The trans-Tango is basically a postsynaptic cell labeling technique. It is unlikely that the labeling intensity changes depending on neuronal activity. Do the authors want to say in this text the high activity of Or47b-expressing neurons under GH conditions? Or are they trying to show that the expression level of the Or47b gene, which is supposedly monitored by the expression of GAL4, is increased by GH conditions? The authors should clarify which is the case.

      Although the primary function of the trans-Tango technique is to label downstream neurons, the original study indicates that the signal strength in downstream neurons depends on the usage of upstream neurons, as evidenced by age-dependent trans-Tango signals. Therefore, trans-Tango can indirectly reflect the usage of upstream neurons. Our finding that GH males showed broader Or47b trans-Tango signals than SH males indirectly suggests that group-housing experience acts on Or47b neurons. We have revised the text to clarify this.

      c) Line 178: Which fly line labels the mushroom body; R19B03-GAL4?

      Yes. We have now provided the detailed genotypes of all tested flies in Table S1.

      d) Line 184: It was reported in Koganezawa et al., 2016 that some dsx-expressing pC1 neurons are involved in aggressive behavior. The authors should also refer to this paper as they include tussling in the observed aggressive behavior.

      Thank you for this comment, and we now cited this reference in the revised manuscript.

      e) Line 339: I think you misspelled fruM RNAi.

      Thank you for pointing this out. fruMi refers to microRNAi targeting fruM, and we have now clearly stated this information in the main text.

      f) Line 681: Is tussling time (%) the total duration of tussling occurrences during the observation time? Or is it the percentage of individuals observed tussling during the observation time? This needs to be clarified.

      It is the former. We have now clearly stated this definition in the Materials and Methods section.

      Reviewer #3 (Recommendations for the authors):

      For authors to support their conclusion that enhanced tussling among socially experienced flies allows them to better retain resources, it is necessary to quantify aggressive behaviors (mainly tussling and lunging) in Figure 5.

      We agree that we can only establish a correlation between enhanced tussling behavior and mating competition. We have now toned down this statement throughout the manuscript. For example, we changed the conclusions in the abstract as follows: "Moreover, shifting from lunging to tussling in socially enriched males is accompanied by better territory control and mating success. Our findings identify distinct sensory and central neurons for two fighting forms and suggest how social experience shapes fighting strategies to optimize reproductive success."

      To further address the concern, we performed additional experiments in which we silenced Or47b or pC1SS2 neurons, which almost abolished tussling, and paired these males with control males. We found that males with silenced Or47b or pC1SS2 neurons could not outcompete control males (Figure 6-figure supplement 3), further suggesting the involvement of tussling in territory control and mating competition.

      In contrast to the authors' data in Figure 4, movies in ref 36 clearly show instances of 2 flies exchanging lunges after the optogenetic activation of P1a neurons, like the examples shown in supplementary movies S1-S3. It is a clear discrepancy that requires discussion (and raises a concern about the lack of transparency about behavioral quantification).

      In our study, optogenetic activation of P1<sup>a</sup> neurons failed to induce obvious tussling behavior, and temperature-dependent activation of P1<sup>a</sup> neurons induced tussling only in the presence of light. These data differ from those of Hoopfer et al. (2015) but are generally consistent with a recent study (Sten et al., Cell, 2025), in which pC1SS2 neurons, but not P1a neurons, promote aggression. This discrepancy is now discussed in the revised manuscript.

      The authors often fail to cite relevant references while discussing previous results, which compromises the scholarship of the manuscript. Examples include (but are not limited to):

      (1) Lines 85-86: Simon and Heberlein, J. Exp. Biol. 223: jeb232439 (2020) suggested that tussling is an important factor for flies to establish a dominance hierarchy.

      Reference added.

      (2) Lines 142-143: Cuticular compounds such as palmitoleic acid are characterized to be the ligands of Or47b by ref #18.

      Reference added.

      (3) Lines 185-187: pC1SS1 and pC1SS2 are first characterized by ref #46. Expression data of this paper also implies that pC1SS1 and pC1SS2 label different neurons in the male brain.

      We have now added this reference at the appropriate place in the revised manuscript. In addition, we have clarified that these two drivers exhibit sexually dimorphic expression patterns in the brain.

      (4) Lines 196-199: Cite ref #36, which describes the behavior induced by the optogenetic activation of P1a neurons.

      Reference added.

      (5) Lines 233-235: The authors' observation that control males do not form a clear dominance relationship directly contradicts previous observations by others (Nilsen et al., PNAS 101:12342 (2002); Yurkovic et al., PNAS 103:17519 (2006); also see Trannoy et al., PNAS 113:4818 (2016) and Simon and Heberlein above). The authors must at least discuss why their results are different.

      There is a misunderstanding here. We clearly state that there is a ‘winner takes all’ phenomenon. However, for wild-type males of the same age and housing condition, we calculated the winning index as (number of wins by unmarked males − number of wins by marked males) / 10 encounters × 100%, which is close to zero because marking was assigned randomly.

      (6) Lines 251-254: The authors' observation that aged males are less competitive than younger males contradicts the conclusion in ref #18. Discussion is required.

      We have now added a discussion on this matter. In brief, Lin et al. showed that 7d-old males are more competitive than 2d-old males, which probably reflects differences in sexual maturity rather than aging per se, whereas our study examined males up to 21 days old.

      (7) Lines 274-275: It is unclear which "previous studies" "have found that social isolation generally enhances aggression but decreases mating competition in animal models". Cite relevant references.
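      The winning index defined above can be written as a short function. This is a minimal sketch that assumes each of the 10 encounters yields at most one winner; the function name winning_index and the example counts are hypothetical and do not come from the study's data.

```python
def winning_index(unmarked_wins, marked_wins, encounters=10):
    """Winning index (%) = (wins by unmarked - wins by marked) / encounters * 100.

    Assumes each encounter has at most one winner, so the two win counts
    cannot sum to more than the number of encounters.
    """
    if encounters <= 0:
        raise ValueError("encounters must be positive")
    if unmarked_wins < 0 or marked_wins < 0:
        raise ValueError("win counts must be non-negative")
    if unmarked_wins + marked_wins > encounters:
        raise ValueError("win counts cannot exceed the number of encounters")
    return (unmarked_wins - marked_wins) / encounters * 100.0


# Hypothetical counts for three pairs.
print(winning_index(5, 5))   # 0.0    (wins split evenly)
print(winning_index(10, 0))  # 100.0  (unmarked male wins every encounter)
print(winning_index(2, 7))   # -50.0  (marked male wins more often)
```

      Under this definition, a strong "winner takes all" outcome within each pair can still average to about zero across pairs when the dominant male is equally likely to be marked or unmarked.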

      Reference added.

      (8) Lines 309-310: Please provide the evidence supporting the statement that "there are only three pairs of pC1SS2 neurons". If there is a reference, cite it. If it is based on the authors' observation, data is required.

      We have now provided additional data on the number of pC1SS2 neurons in Figure 5G of the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      The manuscript by Feng et al. reported that the Endothelin B receptor (ETBR) expressed by the satellite glial cells (SGCs) in the dorsal root ganglia (DRG) acted to inhibit sensory axon regeneration in both adult and aged mice. Thus, pharmacological inhibition of ETBR with specific inhibitors resulted in enhanced sensory axon regeneration in vitro and in vivo. In addition, sensory axon regeneration is significantly reduced in aged mice, and inhibition of ETBR could restore this defect. Moreover, the study provided some evidence that the reduced level of the gap junction protein connexin 43 might act downstream of ETBR to suppress axon regeneration in aged mice. Overall, the study revealed an interesting SGC-derived signal in the DRG microenvironment that regulates sensory axon regeneration. It provided additional evidence that non-neuronal cell types in the microenvironment regulate axon regeneration via cell-cell interactions.

      However, the molecular mechanisms by which ETBR regulates axon regeneration are unclear, and the manuscript's structure is not well organized, especially in the last section. Some discussion and explanation about the data interpretation are needed to improve the manuscript. 

      We thank the reviewer for the positive comments. We agree that the mechanisms by which ETBR signaling functions as a brake on axon growth and regeneration remain to be elucidated. We believe that unraveling the detailed molecular pathways downstream of ETBR signaling in SGCs that promote axon regeneration is beyond the scope of this manuscript. Answering these questions would first require cell specific KO of ETBR and Cx43 to confirm that this pathway is operating in SGCs to control axon regeneration. We would also need to identify how SGCs communicate with neurons to regulate axon regeneration, which is a large area of ongoing research that remains poorly understood. Our data showing that pharmacological inhibition of ETBR with specific FDA-approved inhibitors enhances sensory axon regeneration provide not only new evidence for non-neuronal mechanisms in nerve repair, but also a new potential clinical avenue for therapeutic intervention.

      As suggested by the reviewer, we have extensively revised the organization of the manuscript, especially the last section of the Results. We have performed additional snRNA-seq experiments to establish the impact of aging on the DRG. We have also performed additional experiments to determine whether blocking ETBR improves target tissue reinnervation. Following the reviewer's suggestion, we have also expanded the Discussion section to consider alternative mechanisms and offer additional interpretations of our data. Below we describe how we addressed each point in detail.

      (1) The result showed that the level of ETBR did not change after the peripheral nerve injury. Does this mean that its endogenous function is to limit spontaneous sensory axon regeneration? In other words, the results suggest that SGCs expressing ETBR or vascular endothelial cells expressing its ligand ET-1 act to suppress sensory axon regeneration. Some explanation or discussion about this is necessary. Moreover, does the protein level of ETBR or its ligand change during aging?  

      We thank the reviewer for this point. Our results indeed indicate that one endogenous function of ETBR is to limit the extent of sensory axon regeneration. This may be part of a mechanism that limits spontaneous sensory axon growth or plasticity and prevents maladaptive neural rewiring after nerve injury. While the increased growth capacity of damaged peripheral axons can lead to reconnection with their targets and functional recovery, it can also lead to sprouting of the central axon terminals of injured neurons in the spinal cord, and to pain (see for example Costigan et al 2010, PMID: 19400724). In the context of aging that we describe here, this protective mechanism may hinder beneficial recovery. Other mechanisms that slow axon regeneration have been reported. For example, axonally synthesized proteins typically support nerve regeneration through retrograde signaling and local growth mechanisms, a process that requires RNA binding proteins (RBPs). One such RBP, KHSRP, is locally translated following nerve injury; rather than promoting axon regeneration, KHSRP promotes the decay of other axonal mRNAs and slows axon regeneration. Another example is the Rho signaling pathway, which was shown to function as an inhibitory mechanism that slows the growth of spiral ganglion neurites in culture. We have now included these examples in the Discussion section.

      To address the reviewer’s second question, we have checked protein levels of ETBR and ET-1 in adult and aged DRG tissue. We observed a robust increase in ET-1 in aged DRG, while the levels of ETBR did not appear to change significantly. These results are now presented in Figure 4- Figure Supplement 1, and further support the notion that in aging, activation of the ETBR signaling hinders axon regeneration.

      (2) In ex vivo experiments, NGF was added to the culture medium. Previous studies have shown that adult sensory neurons could initiate fast axon growth in response to NGF within 24 hours. In addition, dissociated sensory neurons could also initiate spontaneous regenerative axon growth without NGF after 48 hours. Some discussion or rationale is needed to explain the difference between NGF-induced or spontaneous axon growth of cultured adult sensory neurons and the roles of ETBR and SGCs. 

      We appreciate the reviewer's suggestion. In adult DRG explant or dissociated cultures, NGF is not typically required for survival or axon outgrowth. However, in dissociated culture, the addition of NGF to the medium stimulates growth from more neurons compared to controls (Smith and Skene 1997). In the DRG explant, NGF does not have significant effects on axon growth, but stimulates glial cell migration (Klimovich et al 2020). We opted to include NGF in our explant assay to increase the potential of stimulating axon regeneration with pharmacological manipulations of ETBR. We have now clarified these considerations in the Methods section.

      (3) In cultured dissociated sensory neurons, inhibiting ETBR also enhanced axon growth, which implies the presence of SGCs surrounding the sensory neurons. Some direct evidence is needed to show the cellular relationship between them in culture.  

      We thank the reviewer for raising this point and have added new data, now presented in Figure 2B, to show that in mixed DRG cultures, SGCs labeled with Fabp7 are present in the culture in proximity to neurons labeled with TUJ1, but they do not fully wrap the neuronal soma. These results are consistent with prior findings reporting that as time in culture progresses, SGCs lose their adhesive contacts with neuronal soma and adhere to the coverslip (PMID: 22032231, PMID: 27606776).  While in some cases SGCs can maintain their association with neuronal soma in the first day in culture after plating, in our hands, most SGCs have left the soma at the 24h time point we examined. 

      (4) In Figure 3, the in vivo regeneration experiments first showed enhanced axon regeneration either 1 day or 3 days after the nerve injury. The study then showed that inhibiting ETBR could enhance sensory axon growth in vitro from uninjured naïve neurons or conditioning lesioned neurons. To my knowledge, in vivo sensory axon regeneration is relatively slow during the first 2 days after the nerve injury and then enters the fast regeneration mode on the 3rd day, representing the conditioning lesion effect in vivo. Some discussion is needed to compare the in vitro and the in vivo model of axon regeneration. 

      We agree that axon growth is relatively slow during the first 2 days and enters a fast growth mode on day 3. This has been elegantly demonstrated in Shin et al Neuron 2012 (PMID: 22726832), where an in vivo conditioning injury 3 days prior increases axon growth one day after injury. Similar effects have been described in vitro: a prior in vivo injury accelerates growth capacity within the first day in culture, but a similar growth mode occurs in naive adult neurons after 2-3 days in vitro (Smith and Skene 1996). We also know that neurite growth in culture is stimulated by higher cell density, likely because non-neuronal cells can secrete trophic factors (Smith and Skene 1996). Our in vitro results thus suggest that blocking ETBR in SGCs in these mixed cultures may shift the medium towards a more growth-promoting state. In vivo, our data show that Bosentan treatment for 3 days partially mimics the conditioning injury and potentiates the effect of the conditioning injury. One possible interpretation is that inhibition of ETBR alters the release of trophic factors from SGCs. Future studies will be required to unravel how ETBR signaling influences the SGC secretome and its influence on axon growth. We have now included these discussion points in the Results and Discussion sections.

      (5) In Figure 5, the study showed that the level of connexin 43 increased after ETBR inhibition in either adult or aged mice, proposing an important role of connexin 43 in mediating the enhancing effect of ETBR inhibition on axon regeneration. However, in the study, there was no direct evidence supporting that ETBR directly regulates connexin 43 expression in SGCs. Moreover, there was no functional evidence that connexin 43 acted downstream of ETBR to regulate axon regeneration.  

      We thank the reviewer for this point and agree that we do not provide direct evidence that connexin 43 acts downstream of ETBR to regulate axon regeneration. Obtaining such functional evidence would require selective KO of ETBR and Cx43 in SGCs, which we believe is beyond the scope of the current study. We have revised the Results and Discussion sections to emphasize that, while we observe that ETBR inhibition increases Cx43 levels and that Cx43 levels correlate with axon regeneration, whether Cx43 directly mediates the effect on axon regeneration remains to be established. We also discuss potential alternative mechanisms downstream of ETBR in SGCs that could contribute to the observed effects on axon regeneration. Specifically, we discuss the possibility that ETBR signaling may limit axon regeneration by regulating SGC glutamate reuptake, for the following reasons: 1) similarly to astrocytes, glutamate uptake by SGCs is important to regulate neuronal function; 2) exposure of cultured cortical astrocytes to endothelin results in a decrease in glutamate uptake that correlates with a major loss of basal glutamate transporter expression (GLT-1 and GLAST); 3) both glutamate transporters are expressed in SGCs in sensory ganglia; 4) GLAST and glutamate reuptake function are important for lesion-induced plasticity in the developing somatosensory cortex. 

      Reviewer #2 (Public Review): 

      Summary: 

      In this interesting and original study, Feng and colleagues set out to address the effect of manipulating endothelin signaling on nerve regeneration, focusing on the crosstalk between endothelial cells (ECs) in dorsal root ganglia (DRG), which secrete ET-1, and satellite glial cells (SGCs) expressing the ETBR receptor. The main finding is that ETBR signaling is a default brake on axon growth, and inhibiting this pathway promotes axon regeneration after nerve injury and counters the decline in regenerative capacity that occurs during aging. ET-1 and ETBR are mapped in ECs and SGCs, respectively, using scRNA-seq of DRGs from adult or aged mice. Although their expression does not change upon injury, it is modulated during aging, with a reported increase in plasma levels of ET-1 (a potent vasoconstrictive signal). Using in vitro explant assays coupled with pharmacological inhibition in mouse models of nerve injury, the authors demonstrate that ET-1/ETBR curbs axonal growth, and the ETAR/ETBR antagonist Bosentan boosts regrowth during the early phase of repair. In addition, Bosentan restores the ability of aged DRG neurons to regrow after nerve lesions. Although Bosentan inhibits both endothelin receptors A and B, comparison with an ETAR-specific antagonist indicates that the effects can be attributed to the ET-1/ETBR pathway. In the DRGs, ETBR is mostly expressed by SGCs (and a subset of Schwann cells), a cell type that previous studies, including work from this group, have implicated in nerve regeneration. SGCs ensheath and couple with DRG neurons through gap junctions formed by Cx43. Based on their own findings and evidence from the literature, the pro-regenerative effects of ETBR inhibition are in part attributed to an increase in Cx43 levels, which are expected to enhance neuron-SGC coupling. 
Finally, gene expression analysis in adult vs aged DRGs predicts a decrease in fatty acid and cholesterol metabolism, for which previous work by the authors has shown a requirement in SGCs to promote axon regeneration. 

      Strengths: 

      The study is well-executed and the main conclusion that "ETBR signaling inhibits axon regeneration after nerve injury and plays a role in age-related decline in regenerative capacity" (line 77) is supported by the data. Given that Bosentan is an FDA-approved drug, the findings may have therapeutic value in clinical settings where peripheral nerve regeneration is suboptimal or largely impaired, as it often happens in aged individuals. In addition, the study highlights the importance of vascular signals in nerve regeneration, a topic that has gained traction in recent years. Importantly, these results further emphasize the contribution of long-neglected SGCs to nerve tissue homeostasis and repair. Although the study does not reach a complete mechanistic understanding, the results are robust and are expected to attract the interest of a broader readership. 

      We thank the reviewer for the positive comments, especially in regard to the rigor and originality of our study.

      Weaknesses: 

      Despite these positive comments provided above, the following points should be considered: 

      (1) This study examines the contribution of the ET-1 pathway in the ganglia, and in vitro assays are consistent with the idea that important signaling events take place there. Nevertheless, it remains to be determined whether the accelerated axon regrowth observed in vivo depends also on cellular crosstalk mediated by ET-1 at the lesion site. Are ECs along the nerve secreting ET-1? What cells are present in the nerve stroma that could respond and participate in the repair process? Would these interactions be sensitive to Bosentan? It may be difficult to dissect this contribution, but it should at least be discussed.  

      We thank the reviewer for this important point and agree that the in vivo effects observed cannot rule out the contribution of ECs or SCs at the lesion site in the nerve. Dissecting the contribution of ETBR-expressing cells in the nerve would require cell-specific manipulations that go beyond the scope of this manuscript. We have revised the Discussion section to highlight the potential contribution of ECs, fibroblasts and SCs in the nerve.  

      (2) It is suggested that the permeability of DRG vessels may facilitate the release of "vascular-derived signals" (lines 82-84). Is it possible that the ET-1/ETBR pathway modulates vascular permeability, and that this, in turn, contributes to the observed effects on regeneration?  

      We thank the reviewer for raising this interesting point. ET-1 can affect vascular permeability. Indeed, it was shown that in high glucose conditions, increased trans-endothelial permeability is associated with increased Edn1, Ednra and Ednrb expression and augmented ET-1 immunoreactivity (PMID: 10950122). It is thus possible that part of the effects observed results from altered vascular permeability. We have included this point in the Discussion section. Future experiments will be required to test how injury and age affect vascular permeability in the DRG.

      (3) Is the affinity of ET-3 for ETBR similar to that of ET-1? Can it be excluded that ET-3 expressed by fibroblasts is relevant for controlling SGC responses upon injury/aging?  

      We thank the reviewer for raising this point. ET-1 binds to ETAR and ETBR with the same affinity, but ET-3 shows a higher affinity for ETBR than for ETAR (Davenport et al. Pharmacol. Rev 2016 PMID: 26956245). We attempted to examine ET-3 levels in adult and aged DRG by western blot, but in our hands the antibody did not work well enough, and we could not obtain clear results. We thus cannot exclude the possibility that ET-3 released by fibroblasts contributes to the effects we observe on axon regeneration. Indeed, in cultured cortical astrocytes, application of either ET-1 or ET-3 leads to inhibition of Cx43 expression. We have revised the Discussion section to highlight the possibility that both ET-1 and ET-3 could participate in the ETBR-dependent effect on axon regeneration.

      (4) ETBR inhibition in dissociated (mixed) cultures uncovers the restraining activity of endothelin signaling on axon growth (Figure 2C). Since neurons do not express ET-1 receptors, based on scRNA-seq analysis, these results are interpreted as an indication that basal ETBR signaling in SGC curbs the axon growth potential of sensory neurons. For this to occur in dissociated cultures, however, one should assume that SGC-neuron association is present, similar to in vivo, or to whole DRG cultures (Figure 2C). Has this been tested?

      We thank the reviewer for this point. In dissociated DRG cultures, neurons, SGCs and other non-neuronal cells are present, but SGCs do not retain the ensheathing morphology they have in vivo. Within 24 hours in culture, SGCs lose their adhesive contacts with the neuronal soma and adhere to the coverslip (PMID: 22032231, PMID: 27606776). We have included new data in Figure 2B showing that, in our culture conditions, SGCs are present but do not wrap neuronal somata as they do in vivo. We also know from prior studies that culture density affects axon growth, an effect that was attributed to trophic factors released from non-neuronal cells (Smith and Skene 1997). Therefore, although SGCs do not surround neurons, the signaling pathway downstream of ETBR may be present in culture and contribute to the release of trophic factors that influence axon growth. We have revised the Results section to better explain our in vitro results and their interpretation.

      In both in vitro experimental settings (dissociated and whole DRG cultures) how is ETBR stimulated over up to 7 days of culture? In other words, where does endothelin come from in these cultures (which are unlikely to support EC/blood vessel growth)? Is it possible that the relevant ligand here derives from fibroblasts (see point #6)? Or does it suggest that ETBR can be constitutively active (i.e., endothelin-independent signaling)? Is there any chance that endothelin is present in the culture media or Matrigel? 

      We thank the reviewer for raising this point. Our single-cell data indicate that ET-1 is expressed by endothelial cells and ET-3 by fibroblasts. In dissociated DRG cultures at the 24h time point, all DRG cell types are present, including endothelial cells and fibroblasts, which could represent the source of ET-1 or ET-3. In the explant setting, it is also possible that ET-1 and ET-3 are released by endothelial cells and fibroblasts during the 7 days in culture. According to the suppliers, endothelin is present in neither the culture media nor the Matrigel. While mutations can facilitate constitutive activity of the ETBR receptor, we are not aware of data showing that endogenous ETBR can be constitutively active. Because the molecular mechanisms governing ETBR-mediated signaling remain incompletely understood (see for example PMID: 39043181, PMID: 39414992), future studies will be required to elucidate the detailed mechanisms activating ETBR in SGCs and its downstream signaling. We have now expanded the Results and Discussion sections to clarify these points. 

      (5) The discovery that ET-1/ETBR signaling in SGC curtails the growth capacity of axons at baseline raises questions about the physiological role of this pathway. What happens when ETBR signaling is prevented over a longer period of time? This could be addressed with pharmacological inhibitors, or better, with cell-specific knock-out mice. The experiments would certainly be of general interest, although not within the scope of this story. Nevertheless, it could be worth discussing the possibilities. 

      We agree that this is an interesting point. As mentioned above in response to point #1 of reviewer 1, the physiological role of this pathway could be to limit plasticity and prevent maladaptive neural rewiring that can happen after injury (Costigan et al 2009, PMID: 19400724), but can also hinder beneficial recovery after injury. Other mechanisms that limit axon regeneration capacity have been described and involve local mRNA translation and Rho signaling. We have revised the Discussion section to include these points. We agree that understanding the consequence of blocking ETBR over longer time periods is beyond the scope of the current study, but we now discuss the possibility that blocking ETBR with a cell specific KO approach could unravel its physiological function on target innervation and behavior. 

      (6) Assessing Cx43 levels by measuring the immunofluorescence signal (Figure 5E-F) is acceptable, particularly when the aim is to restrict the analysis to SGCs. The modulation of Cx43 expression by ET-1/ETBR plays an important part in the proposed model. Therefore, a complementary analysis of Cx43 expression by quantitative RT-PCR on sorted SGCs would be a valuable addition to the immunofluorescence data. Is this attainable? 

      We agree and have attempted to perform these types of experiments but encountered technical difficulties. We attempted to sort SGCs from transgenic mice in which SGCs are fluorescently labeled. However, the cells did not survive the sorting process and died in culture. We think that increasing cell viability after sorting would require capillary-free fluorescent sorting approaches, to which we do not currently have access. We also attempted this experiment with cultured SGCs, following a previously published protocol (Tonello et al. 2023 PMID: 38156033). In these experiments, SGCs are cultured for 8 days to obtain purity. We did not observe any difference in Cx43 protein or mRNA levels upon treatment with ET-1 with or without BQ788. However, in these SGC cultures, Cx43 displayed a diffuse localization, rather than the puncta observed in vivo. Therefore, despite our multiple attempts, quantifying Cx43 in sorted or purified SGCs was not attainable.

      (7) The conclusions "We thus hypothesize that ETBR inhibition in SGCs contributes to axonal regeneration by increasing Cx43 levels, gap junction coupling or hemichannels and facilitating SGC-neuron communication" (lines 303-305) are consistent with the findings but seem in contrast with the effect of aging on gap junction coupling reported by others and cited in line 210: "the number of gap junctions and the dye coupling between these cells increases (Huang et al., 2006)". I am confused by what distinguishes a potential, and supposedly beneficial, increase in coupling after ETBR inhibition, from what is observed in aging. 

      We agree that the impact of aging on Cx43 levels and gap junction number appears contradictory. Procacci et al 2008 reported that Cx43 expression in SGCs decreases in aged mice. Huang et al 2006 reported that both the number of gap junctions and the dye coupling between these cells increase with aging. Procacci et al suggested, as a possible explanation for this apparent discrepancy, that connexin types other than Cx43 may contribute to the gap junctions between SGCs in aged mice. Our snRNA-seq data did not allow us to verify this hypothesis, because there were fewer SGCs in aged mice compared to adult mice, and connexin genes were detected in only 20% or less of SGCs. Furthermore, our quantification did not look specifically at gap junctions, but only at Cx43 puncta. In addition to gap junctions, Cx43 can form hemichannels and can also perform non-channel functions, such as protein interaction, cell adhesion, and intracellular signaling. Thus, more research examining the role of Cx43 in SGCs is necessary to address this discrepancy in the literature. We have expanded the Discussion section to include these points. 

      (8) I find it difficult to reconcile the results in Figure 5F with the proposed model since (1) injury increases Cx43 levels in both adult and aged mice, (2) the injured aged/vehicle group has a similar level to the uninjured adult group, (3) upon injury, aged+Bosentan is much lower than adult+Bosentan (significance not tested). It seems hard to explain the effect of Bosentan only through the modulation of Cx43 levels. Whether the increase in Cx43 levels following ETBR inhibition actually results in higher SGC-neuron coupling has not been assessed experimentally. 

      We thank the reviewer for this point and agree that the effect of Bosentan is likely not exclusively through the modulation of Cx43 levels in SGCs, and that Cx43 levels may simply correlate with axon regenerative capacity. We have revised the manuscript to clarify this point. We have also added the missing significance test in Figure 5F.

      Cell-specific KO of Cx43 and ETBR would allow us to test this hypothesis directly but is beyond the scope of the current study. We have not tested SGC-neuron coupling, as these experiments are currently beyond our area of expertise. Cx43 also has functions beyond gap junction coupling, such as protein interaction, cell adhesion, and intracellular signaling. Investigating the precise function of Cx43 would require in-depth biochemical and cell-specific experiments that are beyond the scope of this study. Furthermore, as we now mention in response to reviewer #1 point 5, ETBR signaling may also have other downstream effects in SGCs, such as on glutamate transporter expression, or may affect other cells in the nerve during the regeneration process. We have revised the Discussion section to include these alternative mechanisms.

      Reviewer #3 (Public Review): 

      Summary: 

      This manuscript suggests that inhibiting ETBR via the FDA-approved compound Bosentan can disrupt ET-1-ETBR signalling that they found detrimental to nerve regeneration, thus promoting repair after nerve injury in adult and aged mice. 

      Strengths: 

      (1) The clinical need to identify molecular and cellular mechanisms that can be targeted to improve repair after nerve injury. 

      (2) The proposed mechanism is interesting. 

      (3) The methodology is sound. 

      We thank the reviewer for highlighting the strengths of our study.

      Weaknesses: 

      (1) The data appear preliminary and the story appears incomplete. 

      We appreciate the reviewer's point. We would like to emphasize that our results provide compelling evidence that ETBR signaling is a default brake on axon growth, and that inhibiting this pathway promotes axon regeneration after nerve injury and counters the decline in regenerative capacity that occurs during aging. We also provide evidence that ETBR signaling regulates the levels of Cx43 in SGCs. Furthermore, our results documenting the use of an FDA-approved compound to increase axon regeneration may be of interest to the broader readership, as there are currently no therapies to improve or accelerate nerve repair after injury. We agree that the detailed mechanisms operating downstream of ETBR will need to be elucidated. Answering these questions would first require cell-specific KO of ETBR and Cx43 to confirm that this pathway is operating in SGCs to control axon regeneration. We would also need to identify how SGCs communicate with neurons to regulate axon regeneration, which is a large area of ongoing research that remains poorly understood. This extensive and highly complex set of experiments is beyond the scope of the current study. As discussed in our responses to reviewers #1 and #2, we attempted numerous additional experiments to better define the role of ETBR signaling in SGCs in aging and have included additional results in Fig. 2B, Fig. 3G-H, Fig. 5A-E, Figure 4- Figure Supplement 1 and Figure 5- Figure Supplement 1. We have expanded the Discussion to acknowledge the limitations of our study and to discuss possible mechanisms.  

      (2) Lack of causality and clear cellular and molecular mechanism. There are also some loose ends such as the role of connexin 43 in SGCs: how is it related to ET-1- ETBR signalling?  

      We thank the reviewer for this point and agree that the molecular mechanisms downstream of ETBR remain to be elucidated. However, we believe that our manuscript reports an interesting potential of an FDA-approved compound in promoting nerve repair. We focused on Cx43 downstream of ETBR signaling because decreased Cx43 expression in SGCs in aging was previously established, but the mechanisms were not elucidated. Furthermore, it was reported that ET-1 signaling in cultured astrocytes, which share functional similarities with SGCs, leads to the closure of gap junctions and a reduction in Cx43 expression. Our study thus provides a mechanism by which ETBR signaling in SGCs regulates Cx43 expression. Whether Cx43 directly impacts axon regeneration remains to be tested; cell-specific KO of Cx43 and ETBR would be required to answer this question. We have extensively revised the Introduction and Discussion sections to provide a link between ETBR and Cx43, to acknowledge the lack of evidence for a causal role of Cx43 in SGCs, and to present additional potential mechanisms by which ETBR inhibition may promote nerve repair.

      Reviewer #2 (Recommendations For The Authors): 

      In addition to the points listed in the Public Review section, please consider the following comments: 

      (1) ETAR, which is high in mural cells, does not seem to be implicated in the reported pro-regenerative effects. Even so, can vasoconstriction be ruled out as an underlying cause of the age-dependent decline in axon regrowth potential and, more generally, in the effects of ET-1 inhibition on regeneration? This could be discussed. 

      We agree that we cannot exclude a role for vasoconstriction or effects on vascular permeability in the age-dependent decline in axon regrowth potential. However, our in vitro and ex vivo experiments, in which vascular-related mechanisms are unlikely, suggest that vasoconstriction may not be a major contributor to the effects we observed.

      (2) The manuscript (e.g. line 287-288) would benefit from a discussion of the role that blood vessels play in the peripheral nervous system, and possibly CNS, repair. Vessels were shown to accompany regenerating fibers and instruct the reorganization of the nerve tissue to favor repair potentially through the release of pro-regenerative signals acting on stromal cells, glia, and other cellular components. Highlighting these processes will help put the current findings into perspective. 

      We agree and have revised the Discussion section to better explain the role of blood vessels in orienting Schwann cell migration and guiding axon regeneration.

      (3) The vast majority of the cells that are sequenced and shown in the UMAP in Figure 1C are from adult (3-month-old) mice [16,923 out of 18,098]. It would be useful to include the UMAP split (or color-coded) by timepoint to appreciate changes in cell clustering that may occur with aging.  

      We apologize for this misunderstanding; Figure 1C included cells from all ages. However, the number of cells we obtained from the aged group was insufficient to perform in-depth analysis of each cell type. We have thus revised this section and Figure 1, which now presents only the data from adult mice.  

      It is not discussed why fewer cells were sequenced at later stages. Additionally, I do not know how to interpret the double asterisks next to the labeling "18,098 samples" in Figure 1C. 

      Since our original 10x sequencing of adult and aged mice yielded so few cells from the aged DRG, we tested and optimized a new technology for single cell preparation of DRG using Illumina Single Cell 3' RNA Prep. This preparation creates templated emulsions using a vortex mixer, rather than a microfluidics system, to capture and barcode single-cell mRNA. This method yielded much better nuclei recovery from aged DRG, with more nuclei of better quality. Thus, we now present in Figure 5 and Figure 5- Figure Supplement 1 the results from snRNA-sequencing of aged and adult DRG using the Illumina single cell kit. The snRNA-sequencing results show a decreased abundance of SGCs in aged mice, consistent with the results of our EM morphology analysis. Because more nuclei were captured from aged SGCs, we were also able to perform SGC-specific pathway analysis, which we have included in the manuscript.

      (4) The in vivo studies are designed to examine the effects of ETBR inhibition during the first phase of axon regrowth after nerve injury (1-3 days post-injury, dpi). Is there a reason why later stages have not been studied? It would be interesting to understand whether ETBR inhibition improves long-term recovery or is only effective at boosting the initial growth of axons through the lesion. It is possible that early inhibition will be enough for long-term recovery. If so, these experiments would define a sensitivity window with therapeutic value. 

      We agree that assessing functional recovery requires proper behavioral tests or morphological evaluation of reinnervation. To determine whether Bosentan treatment has long-term effects on recovery, we administered Bosentan or vehicle for 3 weeks (daily for 1 week, and then once a week for the subsequent 2 weeks) after sciatic nerve crush (SNC). At 24 days after SNC, we assessed intraepidermal nerve fiber density (IENFD) in the injured paw and observed a trend towards increased fibers/mm in the treated animals (new Figure 3G,H). Future studies will examine how long-term Bosentan treatment affects functional recovery and innervation at later time points, and behavioral assays will be needed to determine whether these morphological changes translate into behavioral improvements.

      (5) I am unsure if the gene expression analysis shown in Figure 6 fits well into this story. It is interesting per se and in line with previous work from this group showing the relevance of fatty acid metabolism in SGCs for axon regeneration. Nevertheless, without a mechanistic link to endothelin signaling and Cx43/gap junction modulation, the observations derived from DEG analysis are not well integrated with the rest and may be more distracting than helpful. One limitation is that there is no cell-type information for the DEGs due to the small number of cells recovered from aged mice. For instance, if ETBR inhibition rescued gene downregulation associated with fatty acid/cholesterol metabolism, then the DGE results would become more relevant for understanding the cellular basis of the pro-regenerative effect, which at this point remains quite speculative (lines 264-265; lines 318-319).  

      We agree and have added new snRNA sequencing data to replace these findings (see our response above describing the new snRNA-seq data; new Figure 5 and Figure 5- Figure Supplement 1). The new data show a decreased abundance of SGCs in aged mice, consistent with our TEM results. Pathway analysis revealed that aging triggers extensive transcriptional reprogramming in SGCs, reflecting heightened demands for structural integrity, cell junction remodeling, and glia–neuron interactions within the aged DRG microenvironment.  

      (6) It would be interesting to determine whether Bosentan increases SGC coverage of neuronal cell bodies in aged mice (Figures 6A-C). 

      We agree that this would be very interesting, but it would require extensive EM analysis at different time points and is beyond the scope of the current manuscript.

      (7) Finally, adding a summary model would help the readers. 

      We agree and have made a summary model, now presented in Figure 6F.

      Reviewer #3 (Recommendations For The Authors): 

      Longer time points post-injury and assessment of functional recovery after Bosentan would be of great value here. 

      We agree that assessing functional recovery requires proper behavioral tests or morphological evaluations of reinnervation. To determine if Bosentan treatment has long-term effects on recovery, we administered Bosentan or vehicle for 3 weeks (daily for 1 week, and then once a week for the subsequent 2 weeks) after sciatic nerve crush. At 24 days after SNC, we assessed intraepidermal nerve fiber density in the injured paw and saw a trend towards increased fibers/mm in the treated animals (Fig 3). While the results do not reach significance, we decided to include these new data as they suggest that Bosentan treatment may also improve long-term recovery. Future studies will be required to examine how long-term Bosentan treatment affects functional recovery and innervation at later time points. Additionally, behavioral assays will be needed to determine whether these morphological changes relate to behavioral improvements.

      It would be important to know how the ET-1-ETBR signalling axis promotes the regeneration of axons: this remains unaddressed. What are the cells that are specifically involved? Endothelial cells-SGCs-neurons-SCs? There are no experiments addressing the role of any of these. 

      We agree that the molecular and cellular mechanisms by which ETBR signaling in SGCs promotes axon regeneration remain to be elucidated. Answering these questions would first require cell-specific KO of ETBR and Cx43 to confirm that this pathway operates in SGCs to control axon regeneration. We would also need to identify how SGCs communicate with neurons to regulate axon regeneration, which is a large area of ongoing research that remains poorly understood. While these are important experiments, because of numerous technical and temporal constraints, we believe they are beyond the scope of the current manuscript. 

      How does connexin 43 in SGCs relate to ET-1-ETBR signalling? 

      The relation between connexin 43 and ETBR signaling stems from observations made in astrocytes. ET1 signaling in cultured astrocytes, which share functional similarities with SGCs, was shown to lead to the closure of gap junctions and the reduction of Cx43 expression. Because Cx43, a major connexin expressed in SGCs as in astrocytes, was previously shown to be reduced at the protein level in SGCs from aged mice, we decided to explore whether this ETBR-Cx43 mechanism also operates in SGCs. We have revised the Introduction and Discussion sections extensively to acknowledge that a causal role for Cx43 expression in SGCs has not been established and to provide additional potential mechanisms by which ETBR inhibition may promote nerve repair.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The manuscript proposes that 5mC modifications to DNA, despite being ancient and widespread throughout life, represent a vulnerability, making cells more susceptible to both chemical alkylation and, of more general importance, reactive oxygen species. Sarkies et al. take the innovative approach of introducing an enzymatic, genome-wide cytosine methylation system (DNA methyltransferases, DNMTs) into E. coli, which normally lacks such a system. They provide compelling evidence that the introduction of DNMTs increases the sensitivity of E. coli to chemical alkylation damage. Surprisingly, they also show that DNMTs increase the sensitivity to reactive oxygen species and propose that the DNMT-generated 5mC presents a target for reactive oxygen species that is especially damaging to cells. Evidence is presented that DNMT activity directly or indirectly produces reactive oxygen species in vivo, which is an important discovery if correct, though the mechanism for this remains obscure.

      Strengths:

      This work is based on an interesting initial premise, it is well-motivated in the introduction and the manuscript is clearly written. The results themselves are compelling.

      We thank the reviewer for their positive response to our study.  We also really appreciate the thoughtful comments raised.  We have addressed the comments raised as detailed below. 

      Weaknesses:

      I am not currently convinced by the principal interpretations and think that other explanations based on known phenomena could account for key results. Specific points below.

      (1) As noted in the manuscript, AlkB repairs alkylation damage by direct reversal (DNA strands are not cut). In the absence of AlkB, repair of alkylation damage/modification is likely through BER or other processes involving strand excision and resulting in single-stranded DNA. It has previously been shown that 3mC modification from MMS exposure is highly specific to single-stranded DNA (PMID:20663718), occurring at ~20,000 times the rate seen in double-stranded DNA. Consequently, the introduction of DNMTs is expected to introduce many methylation adducts genome-wide that will generate single-stranded DNA tracts when repaired in an AlkB-deficient background (but not in an AlkB WT background), which are then hyper-susceptible to attack by MMS. Such ssDNA tracts are also vulnerable to generating double-strand breaks, especially when they contain DNA polymerase stalling adducts such as 3mC. The generation of ssDNA during repair is similarly expected to follow the H2O2- or TET-based conversion of 5mC to 5hmC or 5fC, neither of which can be directly repaired and which depend on single-strand excision for their removal. The potential importance of ssDNA generation in the experiments has not been considered.

      We thank the reviewer for this interesting and insightful suggestion.  Our interpretation of our findings is that a subset of MMS-induced DNA damage, specifically 3mC, overlaps with the damage introduced by DNMTs and this accounts for increased sensitivity to MMS when DNMTs are expressed.  However, the idea that the introduction of 3mC by DNMT actually makes the DNA more liable to damage by MMS, potentially through increasing the level of ssDNA, is also a potential explanation, which could operate in addition to the mechanism that we propose.

      (2) The authors emphasise the non-additivity of the MMS + DNMT + alkB experiment but the interpretation of the result is essentially an additive one: that both MMS and DNMT are introducing similar/same damage and AlkB acts to remove it. The non-additivity noted would seem to be more consistent with the ssDNA model proposed in #1. More generally non-additivity would also be seen if the survival to DNA methylation rate is non-linear over the range of the experiment, for example if there is a threshold effect where some repair process is overwhelmed. The linearity of MMS (and H2O2) exposure to survival could be directly tested with a dilution series of MMS (H2O2).

      We thank the reviewer for this point. As in the response to point #1, the reviewer’s hypothesis of increased potency of MMS, potentially through increased ssDNA, downstream of 3mC induction by DNMT, is a good one. We have added a curve showing the dose-response of DNMT-expressing cells to MMS to the revised version of the manuscript. This shows that there is a non-linear response to MMS in the WT background. Sensitivity is exacerbated by expression of DNMT and alkB mutation individually, but there is also a strong non-additive effect that is particularly marked at low MMS concentrations, where sensitivity is much higher in the double mutant than predicted from the two single mutants. This is consistent with induction of DNA damage by DNMT that is repaired by alkB, because alkB can be ‘overwhelmed’ even in WT backgrounds as the reviewer suggests. However, it is also perfectly possible that the effect is due to increased levels of DNA damage induction in DNMT-expressing cells. Both of these explanations are compatible with our central hypothesis, namely that DNMT expression induces 3mC. We have included these results along with discussion of them in the revised text in the results section:

      To examine the non-additivity between DNMT expression and alkB mutation further, we measured the effect of MMS over a range of concentrations for the different strains (Supplemental Figure 1A). We quantified the non-additivity by comparing the survival of alkB mutants expressing DNMT with the survival predicted from the combined effects of alkB mutation alone and DNMT expression alone (Supplemental Figure 1B). Survival was significantly lower than expected, most notably at low concentrations of MMS; this could be due to saturation of the effect at high MMS concentrations in alkB mutants expressing DNMT, where extremely high levels of sensitivity were observed. The non-linear shape of the curve observed for WT cells expressing DNMTs further suggests that the ability of AlkB to repair the DNA is overwhelmed at high MMS concentrations even in the WT background. These results are consistent with the idea that AlkB repairs a form of DNA damage from MMS that is more prevalent when DNMT is expressed. This could be because DNMT induces 3mC, which is repaired by AlkB, and MMS induces further 3mC, leading to much higher 3mC levels in the absence of AlkB activity. Alternatively, 3mC induction by DNMT may lead to increased levels of ssDNA, particularly in alkB mutants, which could increase the risk of further DNA damage by MMS exposure and heighten sensitivity. Either of these mechanisms is consistent with induction of 3mC by DNMT, and both indicate that the induction of DNA damage by DNMT expression has a fitness cost for cells when exposed to genotoxic stress in their environment. 

      (3) The substantial transcriptional changes induced by DNMT expression (Supplemental Figure 4) are a cause for concern and highlight that the ectopic introduction of methylation into a complex system is potentially more confounded than it may at first seem. Though the expression analysis shows bulk transcription properties, my concern is that the disruptive influence of methylation in a system not evolved with it adds not just consistent transcriptional changes but transcriptional heterogeneity between cells which could influence net survival in a stressed environment. In practice I don't think this can be controlled for, possibly quantified by single-cell RNA-seq but that is beyond the reasonable scope of this paper.

      We fully agree with the reviewer and, indeed, we are very interested in what is driving the transcriptional changes that we observed. Work is currently underway in the lab to investigate this further but, as the reviewer suggests, is beyond the scope of this paper. Importantly, by investigating the expression of oxyR-regulated genes, we have used the transcriptional data to determine that the effect of DNMTs on ROS is unlikely to be due to failure of ROS-induced detoxification mechanisms. Nevertheless, we have explicitly mentioned the concern raised by the reviewer in the revised manuscript as follows:

      “The substantial transcriptional responses could potentially affect how individual cells respond to genotoxic stress and thus could be contributing to some of the excess sensitivity to MMS and H2O2 in cells expressing DNMTs. However, the induction of oxyR regulated genes such as catalase was unaffected by 5mC (Supplementary Figure 4B).  Thus, the increased sensitivity to H2O2 is unlikely to be caused by failure of detoxification gene induction by DNMT expression.”

      (4) Figure 4 represents a striking result. From its current presentation it could be inferred that DNMTs are actively promoting ROS generation from H2O2 and also to a lesser extent in the absence of exogenous H2O2. That would be very surprising and a major finding with far-reaching implications. It would need to be further validated, for example by in vitro reconstitution of the reaction and monitoring ROS production. Rather, I think the authors are proposing that some currently undefined, indirect consequence of DNMT activity promotes ROS generation, especially when exogenous H2O2 is available. It would help if this were clarified.

      We thank the reviewer for picking this up. In the discussion, we raise two possible explanations for why DNMT (even without H2O2) increases ROS levels. One is direct activity of DNMT, and the other is that the product of DNMT activity (5mC) acts as a platform to generate more ROS from endogenous or exogenous sources. Whilst we attempted to measure ROS from mSSSI activity in vitro, this experiment gave inconsistent results and therefore we cannot distinguish between these two possibilities. However, we argued that direct activity is less likely, exactly as the reviewer points out. We have clarified our discussion in the revised version, rewriting the entire section titled “Oxidative stress as a new source of DNA damage induction by DNMT expression” to more clearly set out these possibilities. 

      Reviewer #2 (Public review):

      5-methylcytosine (5mC) is a key epigenetic mark in DNA and plays a crucial role in regulating gene expression in many eukaryotes including humans. The DNA methyltransferases (DNMTs) that establish and maintain 5mC, are conserved in many species across eukaryotes, including animals, plants, and fungi, mainly in a CpG context. Interestingly, 5mC levels and distributions are quite variable across phylogenies with some species even appearing to have no such DNA methylation.

      This interesting and well-written paper continues some of the authors' work published several years ago. In that previous paper, the laboratory demonstrated that DNA methylation pathways coevolved with DNA repair mechanisms, in particular with the alkylation repair system. Specifically, they discovered that DNMTs can introduce alkylation damage into DNA in the form of 3-methylcytosine (3mC). (This appears to be an error in the DNMT enzymatic mechanism in which the generation of 3mC, as opposed to the preferred product 5-methylcytosine (5mC), is caused by the flipped target cytosine binding to the active-site pocket of the DNMT in an inverted orientation.) The presence of 3mC is potentially toxic and can cause replication stress, which this paper suggests may explain the loss of DNA methylation in different species. They further showed that the ALKB2 enzyme plays a crucial role in repairing this alkylation damage, further emphasizing the link between DNA methylation and DNA repair.

      The co-evolution of DNMTs with DNA repair mechanisms suggests there can be distinct advantages and disadvantages of DNA methylation to different species, which might depend on their environmental niche. In environments that expose species to high levels of DNA damage, high levels of 5mC in their genome may be disadvantageous. This present paper sets out to examine the sensitivity of an organism to genotoxic stresses such as alkylating and oxidizing agents as a consequence of DNMT activity. Since such a study in eukaryotes would be complicated by DNA methylation controlling gene regulation, the authors cleverly utilize Escherichia coli (E. coli) and incorporate into it DNMTs from other bacteria that methylate the cytosines of DNA in a CpG context like that observed in eukaryotes; the active sites of these enzymes are very similar to those of eukaryotic DNMTs and use basically the same catalytic mechanism (also, this strain of E. coli does not specifically degrade the methylated DNA).

      The experiments in this paper more than adequately show that E. coli expressing these DNMTs (compared with the same strain without the DNMTs) do indeed show increased sensitivity to alkylating agents, and this sensitivity was even greater than expected when a DNA repair mechanism was inactivated. Moreover, they show that E. coli expressing this DNMT is more sensitive to oxidizing agents such as H2O2 and has exacerbated sensitivity when a DNA repair glycosylase is inactivated. Both propensities suggest that DNMT activity itself may generate additional genotoxic stress. Intrigued that DNMT expression itself might induce sensitivity to oxidative stress, the experimenters used a fluorescent sensor to show that H2O2-induced reactive oxygen species (ROS) are markedly enhanced with DNMT expression. Importantly, they show that DNMT expression alone gave rise to increased ROS amounts and that H2O2 addition and DNMT expression together have a greater effect than the linear combination of the two separately. They also carefully checked that the increased sensitivity to H2O2 was not caused by some effect of DNMT expression and activity on the expression of detoxification genes. Finally, by using mass spectrometry, they show that DNMT expression led to production of the 5mC oxidation derivatives 5-hydroxymethylcytosine (5hmC) and 5-formylcytosine (5fC) in DNA. 5fC is a substrate for base excision repair while 5hmC is not; more 5fC was observed. Introduction of non-bacterial enzymes that produce 5hmC and 5fC into the DNMT-expressing bacteria again showed a greater sensitivity than expected. Remarkably, in their assay with addition of H2O2, bacteria showed no growth with this dual expression of DNMT and these enzymes.

      Overall, the authors conduct well thought-out and simple experiments to show that a disadvantageous consequence of DNMT expression leading to 5mC in DNA is increased sensitivity to oxidative stress as well as alkylating agents.

      Again, the paper is well-written and organized. The hypotheses are well-examined by simple experiments. The results are interesting and can impact many scientific areas, ranging from our understanding of the evolutionary pressures an environment exerts on an organism to our understanding of how the environment of a malignant cell in the human body may lead to cancer.

      We thank the reviewer for their response to our study, and value the time taken to produce a public review that will aid readers in understanding the key results of our study. 

      Reviewer #3 (Public review):

      Summary:

      Krwawicz et al., present evidence that expression of DNMTs in E. coli results in (1) introduction of alkylation damage that is repaired by AlkB; (2) confers hypersensitivity to alkylating agents such as MMS (an effect exacerbated by loss of AlkB); (3) confers hypersensitivity to oxidative stress (H2O2 exposure); (4) results in a modest increase in ROS in the absence of exogenous H2O2 exposure; and (5) results in the production of oxidation products of 5mC, namely 5hmC and 5fC, leading to cellular toxicity. The findings reported here have interesting implications for the concept that such genotoxic and potentially mutagenic consequences of DNMT expression (resulting in 5mC) could be selectively disadvantageous for certain organisms. The other aspect of this work which is important for understanding the biological endpoints of genotoxic stress is the notion that DNA damage per se somehow induces elevated levels of ROS.

      Strengths:

      The manuscript is well-written, and the experiments have been carefully executed providing data that support the authors' proposed model presented in Fig. 7 (Discussion, sources of DNA damage due to DNMT expression).

      Weaknesses:

      (1) The authors have established an informative system relying on expression of DNMTs to gauge the effects of such expression and subsequent induction of 3mC and 5mC on cell survival and sensitivity to an alkylating agent (MMS) and exogenous oxidative stress (H2O2 exposure). The authors state (p4) that Fig. 2 shows that "Cells expressing either M.SssI or M.MpeI showed increased sensitivity to MMS treatment compared to WT C2523, supporting the conclusion that the expression of DNMTs increased the levels of alkylation damage." This is a confusing statement and requires revision, as ALL cells shown in Fig. 2 are expressing DNMTs and have been treated with MMS. It is the absence of AlkB and the expression of DNMTs that causes the MMS sensitivity.

      We thank the reviewer for this and agree that this statement needed to be clarified with regard to the figure presented. The key comparison is between the active and inactive mSSSI, which shows increased sensitivity when active methyltransferases are expressed. We have clarified this in the revised version of the manuscript as follows:

      “Cells expressing either M.SssI or M.MpeI showed increased sensitivity to MMS treatment compared to cells expressing inactive M.SssI”

      (2) It would be important to know whether the increased sensitivity (toxicity) to DNMT expression and MMS is also accompanied by substantial increases in mutagenicity. The authors should explain in the text why mutation frequencies were not also measured in these experiments.

      This is an important point because it is not immediately obvious that increased sensitivity would be associated with increased mutagenicity (if, for example, 3mC were never a cause of inaccurate DNA repair even in the absence of AlkB). We have now added a Rif resistance assay, which demonstrates increased mutagenesis in the presence of DNMT and shows that this is exacerbated by loss of AlkB. This is now added as Supplemental Figure 2 and described in the manuscript as follows:

      “One potential consequence of DNMT activity in inducing DNA damage might be increased mutagenesis. To test this, we performed a rifampicin resistance mutagenesis assay in the absence of MMS to determine whether DNMT-induced damage was sufficient to increase the mutation rate. Mutation rate was increased separately by DNMT expression (p = 1.6e-12; two-way ANOVA; Supplemental Figure 2) and by alkB mutation (p < 1e-16; two-way ANOVA). Moreover, there was a significant interaction such that combined alkB mutation and DNMT expression led to a further increased mutation rate compared to the expectation from alkB mutation and DNMT expression separately (p = 7.9e-10; Supplemental Figure 2). Importantly, DNMT induction alone would be expected to lead to increased mutations due to cytosine deamination (Sarkies, 2022a); however, there is a synergistic effect on mutations when this is combined with loss of AlkB function in alkB mutants. This is consistent with 3mC induction by DNMTs, which is repaired by AlkB in WT cells but leads to mutations in alkB mutant cells.”

      (3) Materials and Methods. ROS production monitoring. The "Total Reactive Oxygen Species (ROS) Assay Kit" has not been adequately described. Who is the Vendor? What is the nature of the ROS probes employed in this assay? Which specific ROS correspond to "total ROS"?

      The ROS measurement was with a kit from ThermoFisher: https://www.thermofisher.com/order/catalog/product/88-5930-74.  The probe is DCFH-DA.  This is a general ROS sensor that is oxidised by a large number of cellular reactive oxygen species hence we cannot attribute the signal to a single species.  Use of a technique with the potential to more precisely identify the species involved is something we plan to do in future, but is beyond what we can do as part of this study.  We have added a comment as to the specificity of the ROS sensor in the revised version as follows:

      “The ROS detection reagent in this system is DCFH-DA, a generalised ROS sensor that is not specific to any particular ROS molecule.”     

      (4) The demonstration (Fig. 4) that DNMT expression results in elevated ROS and its further synergistic increase when cells are also exposed to H2O2 is the basis for the authors' discussion of DNA damage-induced increases in cellular ROS. S. cerevisiae does not possess DNMTs/5mC, yet exposure to MMS also results in substantial increases in intracellular ROS (Rowe et al, (2008) Free Rad. Biol. Med. 45:1167-1177. PMC2643028). The authors should be aware of previous studies that have linked DNA damage to intracellular increases in ROS in other organisms and should comment on this in the text.

      We thank the reviewer for this point.  We note that the increased ROS that we observed occur in the presence of DNMTs alone and in the presence of H2O2, not in the presence of MMS; however, the point that DNA damage in general can promote increased ROS in some circumstances is well taken.  We have included a comment on this in the revised version as follows:

      “We believe this is a plausible mechanism to explain both increased ROS and increased sensitivity to oxidative stress when DNMT is expressed. However, other explanations are possible, and it is notable that DNA damaging agents such as MMS can lead to ROS generation (Rowe et al., 2008). A more detailed chemical and kinetic study of the ROS formation in DNMT-expressing cells would be needed to resolve these questions.”

      This op-ed addresses the issue with the exponential increase in publications and how this is leading to a lower quality of peer review which, in turn, is resulting in more bad science being published. It is a well-written article that tackles a seemingly eternal topic. This piece focuses more on the positives and potential actions, which is nice to see, as this is a topic that can become stuck on the problems. There are places throughout that would benefit from more clarity, and at times there appears to be a bias towards publishers, almost placing blame on researchers. Very simple word changes or headings could immediately resolve any doubt here, as I don't believe this is the intention of the article at all.

      Additionally, this article is very focussed on peer review (a positive) but I think that it would benefit from small additions throughout that zoom out from this and place the discussion in the context of the wider issues - for example you cannot change peer review incentives without changing the entire incentives around "service" activities including teaching, admin etc. This occurs to a degree with the discussion on other outputs, including preprints and data. Moreover, when discussing service type activities, there is data that reveals certain demographics deliberately avoid this work. Adding this element into the article would provide a much stronger argument for change (and do some good in the current political climate).

      Overall, I thought this was a great piece when it was first posted online and does exactly what a good op-ed should - provoke thought and discussion. Below are some specific comments, in reading order. I do not believe that there are any substantial or essential changes required, particularly given that this is an op-ed article.

      -----

      Quote: "Academia is undergoing a rapid transformation characterized by exponential growth of scholarly outputs."

      Comment: There's an excellent paper providing evidence to this: https://direct.mit.edu/qss/article/5/4/823/124269/The-strain-on-scientific-publishing which would be a very positive addition

      Quote: "it’s challenging to keep up with the volume at which research publications are produced"

      Comment: Might be nice to add that this complaint dates back almost to the beginning of sharing research via print media, just to reinforce that this is a very old point.

      Quote: "submissions of poor-quality manuscripts"

      Comment: The use of "poor quality" here is unnecessary. Just because a submission is not accepted, it has no reflection on "quality". As such this does seem to needlessly diminish work rejected by one journal

      Quote: "Maybe there are too many poor quality journals too - responding to an underlying demand to publish low quality papers."

      Comment: This misses the flip side - poor quality journals encourage and actively drive low quality & outright fraudulent submissions due to the publisher dominance in the assessment of research and academics.

      Quote: "even after accounting for quality,"

      Comment: Quality is mentioned here but has yet to be clearly defined. What is "quality"? - how many articles a journal publishes? The "prestige" of a journal? How many people are citing the articles?

      Quote: "Researchers can – and do – respond to the availability by slicing up their work (and their data) into minimally publishable units"

      Comment: I fully agree that some researchers do exactly this. However, again, this seems to be blaming researchers for creating this firehose problem. I think this point could be reworded to not place so much blame, or be substantiated with evidence that this is a widespread practice - my experience has been very mixed in that I've worked for people who do this almost to the extreme (and have very high self-citations) and also worked for people who focus on the science and making it as high quality and robust as possible. I agree many respond to the explosion of journals and varied quality in a negative manner, but the journals, not researchers, are the drivers here.

      Quote: "least important aspect of the expected contributions of scholars."

      Comment: I think it may be worth highlighting here that sometimes specific demographics (white males) actively avoid these kinds of service activities - there's a good study on this providing data in support of this. It adds an extra dimension into the argument for appropriate incentives and the importance & challenges of addressing this.

      Quote: "high quality peer review"

      Comment: Just another comment on the use of "quality". This is not defined and I think when discussing these topics it is vital to be clear what one means by "high quality". For example, a high quality peer review that is designed as quality control would be detecting gross defects and fraud, preventing such work from being published (peer review does not reliably achieve this). In contrast, a high quality peer review designed to help authors improve their work and avoid hyperbole would be very detailed and collegial, not requesting large numbers of additional experiments.

      Quote: "conferring public trust in the oversight of science"

      Comment: I'm not convinced of this. Conveying peer review as a stamp of approval or QC leads to reduced trust when examples of peer review failures regularly emerge - just look at hydroxychloroquine and how peer review was used to justify it during COVID, or the MMR/autism issues that are still ongoing even after the work was retracted. I think this should be much more carefully worded, removed, or expanded on to provide this perspective - this occurs slightly in the following sentence, but it is very important to be clear on this point.

      Quote: "Researchers hold an incredible amount of market power in scholarly publishing"

      Comment: I like the next few paragraphs but, again, this seems to be blaming researchers when they in fact hold little or no power. I agree that researchers *could* use market pressure, but this is entirely unrealistic when their careers depend on publishing X papers in X journal. That is perhaps an argument for why science feels increasingly non-collaborative. Funders can make immediate and significant changes. Institutions adopting reward structures that recognise activities such as teaching would have significant impacts on researcher behaviour. Researchers are adapting to the demands the publication system creates (more journals, greater quantity and reduced quality) while publishers maintain control over the assessment; eLife being removed from WoS/Scopus is a prime example of publishers (via their parent companies) preventing innovation or even rather basic improvements.

      Quote: "With preprint review, authors participate in a system that views peer review not as a gatekeeping hurdle to overcome to reach publication but as a participatory exercise to improve scholarship."

      Comment: This is framing that I really like; improving scholarship, not quality control.

      Quote: "buy"

      Comment: typo

      Quote: "adoption of preprint review can shift the inaccurate belief that all preprints lack review"

      Comment: Is this the right direction for preprints though? If we force all preprints to be reviewed and only value reviewed-preprints, then we effectively dismantle the benefits of preprints and their potential that we've been working so hard to build. A recent op-ed by Alice Fleerackers et al provided an excellent argument to this effect. More a question than a suggestion for anything to change.

      Quote: "between all of those stakeholders to work together without polarization"

      Comment: I disagree here - publishers have repeatedly shown that their only real interest is money. Working with them risks undermining all of the effort (financial, careers, reputation, time) that advocates for change put in. The OA movement should also highlight perfectly why this is such a bad route to go down (again). Publishers' grip on preprint servers is a great example - those servers are hard to use as a reader, lack APIs and access to data, and are neither innovative nor interacting with independent services. The community should make the rules and then publishers abide by and within them. Currently the publishers make all of the rules and dominate. Indeed, this is possibly the biggest omission from this article - the total dominance of publishers across the entire ecosystem. You can't talk about change without highlighting that the publishers don't just own journals but also the reference managers, the assessment systems, the databases etc. I may be an outlier on this point, but for all of the people I interact with (often those at the bottom of the ladder) this is a strong feeling. Again, this is not a suggestion for anything to change, and indeed the point of an op-ed is to stimulate thought and discussion, so dissent is positive.

      Note that these annotations were made in hypothes.is and are available here, linked in-text for ease - comments are duplicated in this review.

    2. Summary of the essay

      In this essay, the author seeks to explain the ‘firehose’ problem in academic research, namely the rapid growth in the number of articles but also the seemingly concurrent decline in quality. The explanation, he concludes, lies in the ‘superstructure’ of misaligned incentives and feedback loops that primarily drive publisher and researcher behaviour, with the current publish-or-perish evaluation system at the core. On the publisher side, these include commercial incentives driving both higher acceptance rates in existing journals and the launch of new journals with higher acceptance rates. At the same time, publishers seek to retain reputational currency by maintaining the consistency, and therefore the brand power, of scarcer, legacy-prestige journals. The emergence of journal cascades (automatic referrals from one journal to another journal within the same publisher) and the introduction of APCs (especially for special issues) also contribute to commercial incentives driving article growth. On the researcher side, he argues that there is an apparent demand from researchers for more publishing outlets, alongside salami slicing by researchers, because authors feel they have to distribute relatively more publications among journals that are perceived to be of lower quality (higher acceptance rates) in order to gain prestige equivalent to that of a higher impact paper. The state of peer review also impacts the firehose. The drain of PhD-qualified scientists out of academia, compounded by a lack of recognition for peer review, further contributes to the firehose problem because there are insufficient reviewers in the system, especially for legitimate journals. Moreover, what peer review is done is no guarantee of quality (in highly selective journals as well as ‘predatory’ ones). One of his conclusions is that there is not just a crisis in scholarly publishing but in peer review specifically, and it is this crisis that will undermine science the most. Add AI into the mix of this publish-or-perish culture, and he predicts the firehose will burst.

      He suggests that the solution lies in researchers taking back power themselves by writing more but ‘publishing’ less. By writing more he means outputs beyond traditional journal publications, such as policy briefs, blogs, preprints, data, code and so on, and that these should count as much as peer-reviewed publications. He places special emphasis on the potential role of preprints and on open and more collegiate preprint review acting as a filter upstream of the publishing firehose. He ends with a call for more collegiality across all stakeholders to align the incentives and thus alleviate the pressure causing the firehose in the first place.

      General Comment

      I enjoyed reading the essay and think the author does a good job of exposing multiple incentives and competing interests in the system. Although discussion of perverse incentives has been raised in many articles and blog posts, the author specifically focuses on some of the key commercial drivers impacting publishing and the responses of researchers to those drivers. I found the essay compellingly written and thought-provoking, although it took me a while to work through the various layers of incentives. In general, I agree with the incentives and drivers he has identified, and especially with his call for stakeholders to avoid polarization and work together to repair the system. Although I appreciate the need to have a focused argument, I did miss a more in-depth discussion of the equally complex layers of incentives for institutions, funders and other organisations (such as Clarivate) that also feed the firehose.

      I note that my perspective comes from a position of being deeply embedded in publishing for most of my career. This will have also impacted what I took away from the essay and the focus of my comments below.

      Main comments

      1. I especially liked the idea of a ‘superstructure’ of incentives as I think that gives a sense of the size and complexity of the problem. At the same time, by focusing on publisher incentives and researchers’ response to them he has missed out important parts of the superstructure contributing to the firehose, namely the role of institutions and funders in the system. Although this is implicit, I think it would have been worth noting more, in particular:

        • He mentions institutions and the role of tenure and promotion towards the end but not the extent of the immense and immobilizing power this wields across the system (despite initiatives such as DORA and CoARA).

        • Most review panels (researchers) assessing grants for funders are also still using journal publications as a proxy for quality, even if the funder policy states that journal name and rank should not be used.

        • Many institutions/universities still rely on the number and venue of publications. Although some notable institutions are moving away from this, the impact factor/journal rank is still largely relied on. This seems especially the case in China and India, for example, which have shown huge growth in research output. Although the author discusses the firehose, it would have been interesting to see a regional breakdown of it.

        • Libraries also often negotiate with publishers based on volume of articles – i.e. they want evidence that they are getting more articles as they renegotiate a specific contract (e.g. transformative agreements), rather than also considering, for example, the quality of service.

        • Institutions are also driven by rankings in a parallel way to researchers being assessed based on journal rank (or impact factor). How university rankings are calculated is also often opaque (apart from the Leiden rankings), but publications form a core part. This further incentivises institutions to select researchers/faculty based on the number and venue of their publications in order to promote their own position in the rankings (and obtain funding).

      2. The essay is also about power dynamics and where power in the system lies. The implication in the essay is that power lies with the publishers and this can be taken back by researchers. Publishers do have power, especially those in possession of high prestige journals and yet publishers are also subject to the power of other parts of the system, such as funder and institutional evaluation policies. Crucially, other infrastructure organisations, such as Clarivate, that provide indexing services and citation metrics also exert a strong controlling force on the system, for example:

        • Only a subset of journals are ever indexed by Clarivate, and funders and institutions also use the indexing status of a journal as a proxy for quality. A huge number of journals are thus excluded from the evaluation system (primarily in the arts and humanities, but also many scholar-led journals from low- and middle-income countries, and new journals). This further exacerbates the firehose problem because researchers often target only indexed journals. I’d be interested to see if the firehose problem also exists in journals that are not traditionally indexed (although I appreciate this is also likely to be skewed by discipline).

        • Indexers also take on the role of arbiters of journal quality and can choose to delist or list journals accordingly. Listing or delisting has a huge impact on the submission rates to journals that can be worth millions of dollars to a publisher, but it is often unclear how quality is assessed and there seems to be a large variance in who gets listed or not.

        • Clarivate are also paid large fees by publishers to use their products, which creates a potential conflict of interest for the indexer, as delisting journals from major publishers could cause a substantial loss of revenue if those publishers withdrew their fees. Clarivate also relies on publishers to create the journals on which its products are based, which may create a further conflict if Clarivate wishes to retain the in-principle support of those publishers.

        • The recent delisting of eLife, even though it is an innovator of established quality, shows the precariousness of journal indexing.

      3. All the stakeholders in the system seem to be essentially ‘following the money’ in one way or another – it’s just that the currency for researchers, institutions, publishers and others varies. Publishers – both commercial and indeed most not-for-profit – follow the requirements of the majority of their ‘customers’ (and that’s what authors, institutions, subscribers etc. are in this system) in order to ensure both sustainability and revenue growth. This may be a legacy of the commercialisation of research in the 20th century, but we should not be surprised that growth is a key objective for any company. It is likely that commercial players will continue to play an important role in science and science communication; what needs to be changed are the requirements of the customers.

      4. The root of the problem, as the author notes, is what is valued in the system, which is still largely journal publications. The author’s solution is for researchers to write more – and for value to be placed on this greater range of outputs by all stakeholders. I agree with this sentiment – I am an ardent advocate for Open Science. And yet, I also think the focus on outputs per se and not practice or services is always going to lead to the system being gamed in some way in order to increase the net worth of a specific actor in the system. Preprints and preprint review itself could be subject to such gaming if value is placed on e.g. the preprint server or the preprint-review platform as a proxy of preprint and then researcher quality.

      5. I think the only way to start to change the system is to start placing much more value on both the practices of researchers (as well as outputs) and on the services provided by publishers. Of course saying this is much easier than implementing it.

      Other comments

      1. A key argument is that higher acceptance rates actually create a perverse incentive for researchers to submit as many manuscripts as possible because they are more likely to get accepted in journals with higher acceptance rates. I disagree that higher acceptance rates per se are the main incentive for researchers to publish more. More powerful is the fact that those responsible for grants and promotion continue to use quantity of journal articles as a proxy for research quality.

      2. Higher acceptance rates are not necessarily an indicator of low quality, or a bad thing, if they mean that null, negative and inconclusive results are also published.

      3. The author states that Journal Impact Factors might have been an effective measure of quality in the past. I take issue with this because the JIF has, as far as I know, always been driven by relatively few outliers (papers with very high citations), and I don’t know of evidence to show that this wasn’t also true in the past. It also makes the assumption that citations = quality.

      4. The author asks at one point, “Why would field specialization need a lower threshold for publication if the merits of peer review are constant?” I can see a case for lower thresholds, however, when the purpose of peer review is primarily to select for high impact, rather than rigour, of the science conducted. A similar case might be made for multidisciplinary research, where peer reviewers tend to assess an article from their discipline’s perspective and reject it because the part that is relevant to them is not interesting enough… Of course, this all points to the inherent problems with peer review (on which I agree with the author).

      5. The author puts his essay in appropriate context, drawing on a range of sources to support his argument. I particularly like that he tried to find source material that was openly available.

      6. He cites 2 papers by Bjoern Brembs to substantiate the claim that there is potentially poorer review in higher prestige journals than in lower ranked journals. These papers were published in 2013 and 2018, and the conclusions relied, in part, on the fact that higher ranked journals had more retractions. Apart from a potential reporting bias, given the flood of retractions across multiple journals in more recent years, I doubt this correlation still exists.

      7. The author works out submission rates from the published acceptance rates of journals. The author acknowledges this is only approximate and discusses several factors that could inflate or deflate it. I can add a few more variables that could impact the estimate, including: 1) the number of articles a publisher/journal rejects before articles are assigned to any editor (e.g. because of plagiarism, reporting issues or other research integrity issues); 2) the extent to which articles are triaged and rejected by editors before peer review (e.g. because they are out of scope or not sufficiently interesting to peer review); 3) the number of articles rejected after peer review; and 4) the extent to which authors independently withdraw an article at any stage of the process. When publishers publish acceptance rates, they don’t make it clear what goes into the numerator or the denominator, and there are no community standards around this. The author rightly notes this process is too opaque.

      Catriona J. MacCallum

      As is my practice, I do not wish to remain anonymous. Please also note that I work for a large commercial publisher and am writing this review in an independent capacity, such that this review reflects my own opinions, which are not necessarily those of my employer.

    3. This is a well written and clear enough piece that may be helpful for a reader new to the topic. To people familiar with the field there is not so much which is new here. The final recommendation is not well expressed. As currently put it is, I think, wrong. But it is a provocative idea. I comment section by section below.

      The first paragraphs repeat well-established facts that there are too many papers. Seppelt et al.’s contribution is missing here. It also reproduces the disingenuous claim, by a publisher’s employee, that publishers ‘only’ respond to demand. I do not think that is true. They create demand. They encourage authors to write and submit papers, as anyone who has been emailed by MDPI recently can testify. Why repeat something which is so inaccurate?

      The section on ‘upstream of the nozzle’ is rather confusing. I think the author is trying to establish if more work is being submitted. But this cannot be deduced from the data presented. No trends are given. Rejection rates will be a poor guide if the same paper is being rejected by several journals. I was also confused by the sources used to track growth in papers – why not just use Dimensions data? The final paragraph again repeats well known facts about the proliferation of outlets and salami slicing. Thus far the article has not introduced new arguments.

      Minor points in this section:

      • there are some unsupported claims. Eg ‘This is a practice that is often couched within the seemingly innocuous guise of field specialty journals.’

      • I also do not understand the logic of this rather long sentence: ‘The expansion of journals with higher acceptance rates alters the rational calculus for researchers - all things being equal higher acceptance rates create a perverse incentive to submit as many manuscripts as possible since the underlying probability of acceptance is simply higher than if those same publications were submitted to a journal with a lower acceptance rate, and hence higher prestige.’ I suggest it be rephrased.

      The section on peer review (Who’s testing the water) is mostly a useful review of the issues. But there are some problems which need addressing. Bizarrely, when discussing whether there are enough scientists, it fails to mention Hanson et al.’s global study, despite linking to its preprint in the opening lines. Instead the author adopts a parochial North American approach and refers only to PhDs coming from the US. Trends in one country cannot explain an international publishing scene. These are not the ‘good data’ the author claims. Likewise, the value of data on doctorates not going on to a post-doc hinges on how many post-docs there are. That trend is not supplied. The statement ‘Almost everyone getting a doctorate goes into a non-university position after graduation’ may be true, but no supporting data are supplied to justify it. Nor do we know what country, or countries, the author is referring to.

      The section ‘A Sip from the Spring’ makes the mistaken claim that researchers hold market power. This is not true. Researchers' institutions, their libraries and governments are the main source of publisher income. It is here that the key proposal for improvement is made: researchers can write more and publish less. But if the problem is that there is too much poorly reviewed literature, then this cannot be the solution. Removing all peer review, would mean there is even more material to read whose appearance is not slowed up by peer review at all. If peer review is becoming inadequate, evading it entirely is hardly a solution.

      This does not mean we should not release pre-prints. The author is right to advocate them, but the author is mistaken to think that this will reduce publishing pressures. The clue is in their name ‘pre-print’. Publication is intended.

      Missing from the author’s argument is recognition of the important role that communities of researchers play, and the roles that journals play in providing venues for conversation, disagreement and discussion. They provide a filter. Yes, researchers produce other material than publications, as the author states: ‘grant proposals, editorials, policy briefs, blog posts, teaching curricula and lectures, software code and documentation, dataset curation, and labnotes and codebooks.’ I would add email and WhatsApp messages to that list. But adding all that to our reading lists will not reduce the volume of things to be read. It must increase it. And it would make it harder to marshal and search all those words.

      But the idea is provocative nonetheless. Running through this paper, and occasionally made explicit, is the fact that publishers earn billions from their ‘service’ to academia. They have a strong commercial interest in our publishing more, and in competing with each other to produce a larger share of the market. If writing more, and publishing less, means we need to find ways of directing our thoughts so that they earn less money for publishers, then that could bring real change to the system.

      A minor point: the fire hose analogy is fully exploited and rather laboured in this paper. But it is a North American term and image that does not travel so easily.

    4. A few months back, Upstream editor Martin Fenner suggested that I submit my Upstream blog post titled, Drinking from the Firehose? Write More and Publish Less, for peer-review as a sort of experiment for Upstream through MetaROR. MetaROR, a relative newcomer to the scholarly communication community, provides the review and curate steps in the "publish-review-curate" model for meta-research.

      While I do not consider myself a meta-researcher (a scholar who conducts research on research), many of my positions on science policy have implications for the field (especially those on transparency, openness, and reproducibility). I think the main call in my blog post for reform in scholarly communication – namely, to stop publishing in traditional journals as much and start rewarding a broader swath of scholarly activities like data sharing – is particularly appealing to meta-researchers who rely on non-publication outputs for their work. So, I submitted. The article was openly reviewed, and MetaROR provided an editorial assessment. Here, I reply to the reviewers and contribute to the curation of the original post.

      The reviews are very high-quality - in fact, they are some of the most well-reasoned reviews I've received in the 20 years I've been a scholar. If MetaROR represents the future of peer-review through the publish-review-curate model, scholarly communication is about to get a whole lot better. You can read the open reviews of my blog post here. The revised version of the editorial is here.

      Like traditional peer-review, each individual reviewer provided their feedback independently of the others, and the handling editor did not curate the reviews. I prefer when editors do such curation since it helps to organize the response in a way that reduces redundancy. This is one of the main benefits of group-based peer review systems - such as PREreview's Live Review. Also, there was no easy way (or at least not an obvious one) to export the reviews in plaintext from MetaROR so I could respond point-by-point in software of my choice. Below is an attempt to organize my response roughly around the major criticisms and suggestions in the reviews. Because this was an opinion piece and not research, I'm not going to respond to every point, though I would accept nearly all of them and revise accordingly had this been a research article.

      Too Easy on the Publishers, Too Hard on Researchers

      All three reviewers expressed some dismay over how light my criticism of the publishers was in my blog piece. I do not disagree. The reviewers rightfully point out that the publishers play an outsized role in the inequity created in the scholarly communication space. However, I am choosing not to revise much here, as the essay was already too long - it would have taken a tome to articulate my criticism of the publishers. That's out of scope. I did, however, revise the first paragraph in the conclusion to state:

      The publishers are incentivized to avoid any other form of reform - this is the rational option that publishers choose in response to the apparent demand from researchers - as Ciavarella rightly pointed out.

      Two of the reviewers also thought I was too harsh on researchers. I don't think that I was overly harsh. All three agree with me that researchers have some market role here but disagree about the extent to which they can exert influence. One reviewer claims researchers have no market power (with which I respectfully disagree). I've clarified in the paper that: 'the power any individual researcher has here is small. Collective action is needed.' I reject that researchers are blameless for the status quo - complacency empowers the publishers. Unfortunately, it's also baked into the superstructure of the reward system that is perpetuated by publisher-controlled market forces. I also added the following sentiment along these lines when discussing the market power of researchers:

      It's free to share and read research without the need for costly, anticompetitive gatekeeping. Leveraging that freedom is an untapped source of market power.

      Focus More on Institutions and Funders and Communities

      Two of the three reviewers thought I needed to draw more attention to the roles, demands, and influence that academic institutions, publisher consortia, libraries, indexing services, scholarly societies, and grassroots research organizations have in this ecosystem. I agree with all these points - and had Clarivate's irresponsible delisting of eLife in the Web of Science happened before I wrote the original piece, I would have highlighted that as one reviewer suggested.

      No New Arguments or Analysis

      The reviewers felt that, while well-articulated, the arguments I was espousing are not novel. First, I think it is worthwhile to renew the idea that we should be more selective in choosing what to publish in journals. Focusing on quality over quantity and valuing activities beyond journal publications should be repeated often until it's common practice.

      One comment called for more data and analysis, and another wanted some additional research cited. I think that's a great idea and I hope the reviewers can do that work or perhaps the open review will inspire others to do so.

      In response to the criticism that preprints themselves both presuppose an eventual traditional publication and that they could be gamed, I revised that section accordingly:

      There is risk of gaming preprints and preprint review just as there is in traditional publishing, such as by placing value on a paper for where it appears or how it was reviewed without considering its quality or contribution to science.

      One reviewer misunderstood my point about preprints altogether:

      Removing all peer review, would mean there is even more material to read whose appearance is not slowed up by peer review at all. If peer review is becoming inadequate, evading it entirely is hardly a solution. This does not mean we should not release pre-prints. The author is right to advocate them, but the author is mistaken to think that this will reduce publishing pressures. The clue is in their name ‘pre-print’. Publication is intended.

      I am absolutely not arguing for tossing out peer review. I strongly believe peer review is valuable but currently broken. Moreover, I reject that peer review needs to happen behind the gatekeeping of publishers. I revised to clarify here and added a footnote based on this reviewer's latter observation.

      Peer-review remains a critical check for pollutants in the waters - but the prevailing model needs significant reform. The traditional opaque, uncompensated system has eroded the quality, transparency, timeliness, and appropriateness of peer review due to competing priorities and a lack of appropriately aligned incentive structures. Novel models of peer review are needed, including publish-review-curate, preprint review, and compensated review, ideally all done transparently and with conflicts of interest declared out in the open. At the same time, not all manuscripts need review to have value, and most preprints with value (even those with reviews) should not be published in journals.

      New footnote: The term 'preprint' is evolving - what was once a moniker for a non-peer-reviewed manuscript intended to eventually be reviewed and published (or, more likely, rejected) now also encompasses other forms, including publish-review-curate and manuscripts with preprint reviews. A new labeling and metadata system is desperately needed to highlight the state of review of a particular manuscript in a record of versions. Version control systems and badging are ubiquitous in the open-source software community and could be easily adopted here.

      Volume is Volume is Volume

      Probably the most important critique among the set of reviews points out an apparent recursion in the logic of the thesis that I need to clarify: you can't solve the firehose problem by writing more, as that just adds more volume to the flow. My revision to the conclusion clarifies my intent: what I'm proposing is to stop sending so many papers to journals for publication and to choose preprints more often for reading, reviewing, and writing. At the same time, the system should maintain or increase non-publication scholarly outputs and reward those too.

      "Write-More" here is a placeholder for all the non-publication writing scholars do and should get credit for from their institutions and fields. Again, I happen to focus on writing because that's what I care about in this editorial and it would take volumes to pontificate on all the other services and activities that happen within the academy that are not properly rewarded.

      Summary

      Having my blog post peer-reviewed through MetaROR was a positive experience and I recommend the service. However, my post was still just an editorial – my opinions and thoughts – not research. Had this been a research article, the reviews as presented would have been a very good roadmap to improving the paper. For MetaROR, I have two suggestions: 1) the editorial assessment could be improved by organizing the key points, and 2) there should be a way to download all reviews in plaintext for ease of importing into an editor.

      Acknowledgments

      Special thanks are owed to the reviewers, Catriona MacCallum, Dan Brockington, and Jonny Coates, the MetaROR handling editor Ludo Waltman, and to Upstream Editor and Front Matter founder Martin Fenner for the crazy idea to peer-review a blog post.

      Disclosure

      The opinions expressed here are my own and may not represent those of my employer, my associates, or the reviewers. I have no conflicts of interest to disclose.

      This author response was previously published on Upstream.

    1. Author response:

      Evidence, reproducibility and clarity

      Reviewer 1:

      In this manuscript, the roles of the insulin receptor and the insulin-like growth factor receptor were investigated in podocytes. Mice in which both receptors were deleted developed glomerular dysfunction, with proteinuria and glomerulosclerosis over several months. Because of concerns about incomplete KO, the authors generated podocyte cell lines in which both receptors were deleted. Loss of both receptors was highly deleterious, with greater than 50% cell death. To elucidate the mechanism, the authors performed global proteomics and found that spliceosome proteins are downregulated. They confirm this using long-read sequencing. These results suggest a novel role for these pathways in podocytes.

      Thank you

      This is primarily a descriptive study, and no technical concerns are raised. How insulin and IGF1 signaling are linked to the spliceosome is not addressed.

      We do not think the paper is descriptive, as we used unbiased phospho- and total proteomics in the DKO cells to uncover detrimental alterations in the spliceosome that have not been previously described. However, we are happy to look further into the underlying mechanism.

      We would propose:

      (1) Stimulating/inhibiting insulin/IGF signalling pathways in wild-type and DKO cells and checking the expression levels and/or phosphorylation status of splice factors (including those in Figure 3E) and of those revealed by the phosphoproteomic data; a variety of inhibitors acting along the insulin/IGF1 pathways shown in Fig 2 could also be used.

      (2) Looking at the RNA-seq data bioinformatically in more detail. The introns/exons that move up or down are targets of the splice factors involved, and the binding sequences of most splice factors are known, so it should be possible to ask bioinformatically which splice factor binding sites are seen most frequently in the sequences around the splice sites of the exons and introns that change in the DKO. To uncover splice factors/RNA-binding proteins (RBPs) involved in insulin signaling, we will use MATT, software specifically designed to look for RNA-binding motifs (PMID 30010778). In brief, using the long-read sequencing data, we will use MATT to test the 250-nt sequences flanking the splice sites of all regulated splicing events (intronic and exonic) against all RNA-binding proteins in the CISBP-RNA database (PMID 23846655). This will yield a list of RBPs potentially involved in insulin signaling. We will validate these by activating insulin signaling (similar to Figures 2B, C) and probing whether the RBPs are activated (e.g. phosphorylated or changed in expression), or we will manipulate expression of the candidate RBPs and measure how this affects insulin signaling.
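      The window-extraction and motif-counting idea can be illustrated with a minimal Python sketch. It is not MATT and does not use the CISBP-RNA position weight matrices: as a simplifying assumption, it counts exact, overlapping matches of short motif strings in the 250-nt windows on either side of each splice site. The function names, toy sequence, coordinates and motifs are hypothetical.

```python
from collections import Counter

FLANK = 250  # nt taken on each side of a splice site, as in the proposed analysis


def flanking_windows(seq: str, site: int, flank: int = FLANK) -> tuple[str, str]:
    """Return the upstream and downstream windows around a 0-based splice site.

    Windows are clipped at the sequence ends rather than padded.
    """
    upstream = seq[max(0, site - flank):site]
    downstream = seq[site:site + flank]
    return upstream, downstream


def count_motifs(seq: str, sites: list[int], motifs: dict[str, str]) -> Counter:
    """Count overlapping exact matches of each motif in all flanking windows."""
    counts: Counter = Counter()
    for site in sites:
        for window in flanking_windows(seq, site):
            for name, motif in motifs.items():
                # Slide one nucleotide at a time so overlapping hits are counted.
                counts[name] += sum(
                    1
                    for i in range(len(window) - len(motif) + 1)
                    if window[i:i + len(motif)] == motif
                )
    return counts


if __name__ == "__main__":
    # Hypothetical toy transcript, splice sites and motifs (illustration only).
    toy_seq = "GGUAAGUUUUUUCCAGGAAGAAGUAGGUAAG" * 20
    toy_sites = [100, 300]
    toy_motifs = {"poly-U": "UUUUU", "GAA-repeat": "GAAGAA"}
    print(count_motifs(toy_seq, toy_sites, toy_motifs))
```

      A real analysis would typically score position weight matrices and compare regulated events against a background set of unregulated events; the sketch only shows how the flanking windows are defined and scanned.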

      (3) Examining in more detail the phospho- and total proteomic data for podocytes with knockout of the IGF1R or the insulin receptor alone (which we have already generated), and including these data sets to elucidate the relative importance of each receptor to spliceosome function.

      The phenotype of the mouse is only superficially addressed. The main issues are that the completeness of the mouse KO is never assessed nor is the completeness of the KO in cell lines. The absence of this data is a significant weakness.

      We apologise for not making this clear, but we did assess the level of receptor knockdown in the animal and cell models. The in vivo model showed variable and incomplete podocyte knockdown of the insulin receptor and IGF1 receptor (shown in Supplementary Figure 1B). This is why we made the in vitro floxed podocyte cell lines, in which we could robustly knock down both the insulin receptor and the IGF1 receptor (shown in Figure 2A).

      The mouse experiments would be improved if the serum creatinines were measured to provide some idea how severe the kidney injury is.

      We can address this:

      We have additional urinary albumin:creatinine ratio (uACR) data at 12, 16 and 20 weeks. We also have further blood tests of renal function that can be added. There is variability in creatinine levels, which is not uncommon in transgenic mouse models (probably partly due to variability in receptor knockdown with the cre-lox system). This is part of the rationale for developing the robust double receptor knockout cell models, in which we knocked out both receptors by >80%.

      An attempt to rescue the phenotype by overexpression of SF3B4 would also be useful. If this didn't work, an explanation in the text would suffice.

      We would consider overexpressing SF3B4 in the wild-type and DKO cells and assessing the effects on the spliceosome, if deemed necessary. However, we think this is unlikely to rescue the phenotype, as so many other spliceosome components are downregulated in the DKO cells.

      As insulin and IGF are regulators of metabolism, some assessment of metabolic parameters would be an optional add-on.

      We have some data on this and can add them to the manuscript. However, they are not extensive, as metabolism was not a major driver of this work.

      Lastly, the authors should caveat the cell experiments by discussing the ramifications of studying the 50% of the cells that survive vs the ones that died.

      Thank you, we appreciate this. It was the rationale for studying the cells after 2 days of differentiation, before significant cell loss, so as to avoid studying only the 50% of cells that survive.

      Reviewer 2:

      In this manuscript, submitted to Review Commons (journal agnostic), Coward and colleagues report on the role of the insulin/IGF axis in podocyte gene transcription. They knocked out both the insulin receptor and IGF1R in mouse podocytes. Dual KO mice manifested a severe phenotype, with albuminuria, glomerulosclerosis, renal failure and death at 4-24 weeks.

      Long-read RNA sequencing was used to assess splicing events. Podocyte transcripts manifesting intron retention were identified. Dual knockout podocytes manifested more transcripts with intron retention (18%) than wild-type controls (14%), with an overlap between experiments of ~30%.

      Transcript productivity was also assessed using FLAIR-mark-intron-retention software. Intron retention was seen in 18% of ciDKO podocyte transcripts compared with 14% of wild-type podocyte transcripts (P=0.004), with an overlap between experiments of ~30% (indicating the variability of results with this method). Interestingly, ciDKO podocytes showed downregulation of proteins involved in spliceosome function and RNA processing, as suggested by LC/MS and confirmed by Western blot.

      Pladienolide (a spliceosome inhibitor) was cytotoxic to HeLa cells and to mouse podocytes, but no toxicity was seen in murine glomerular endothelial cells.

      Specific comments.

      The manuscript is generally clear and well written. Mouse work was approved in advance. The six figures are generally well designed, using bar charts with superimposed dot plots.

      Thank you

      Evaluation.

      Methods are generally well described. It would be helpful to say that tissue scoring was performed by an investigator masked to sample identity.

      We did this and will add this information to the methods/figure legend.

      Specific comments.

      (1) Data are presented as mean/SEM. In general, mean/SD or median/IQR are preferred to allow the reader to evaluate the spread of the data. There may be exceptions where only SEM is reasonable.

      Graphs can be changed to SD rather than SEM.
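      The difference between the two summaries can be illustrated with a short Python sketch using made-up values: SD describes the spread of the individual measurements, whereas SEM (SD divided by the square root of the sample size) describes the precision of the mean and shrinks as more samples are added, which is why SD gives the reader a better view of the spread.

```python
import math
import statistics


def sd_and_sem(values: list[float]) -> tuple[float, float]:
    """Return the sample standard deviation and the standard error of the mean.

    SD describes the spread of individual measurements; SEM = SD / sqrt(n)
    describes the precision of the mean, so it shrinks as n grows.
    """
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator); needs n >= 2
    sem = sd / math.sqrt(len(values))
    return sd, sem


if __name__ == "__main__":
    # Hypothetical measurements, not data from the manuscript.
    values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    sd, sem = sd_and_sem(values)
    print(f"mean={statistics.mean(values):.2f} SD={sd:.2f} SEM={sem:.2f}")
```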

      (2) It would be useful for the reader to be told the number of overlapping genes (with similar expression between mouse groups) and the results of a statistical test comparing WT and KO mice. The overlap of intron retention events between experimental repeats was about 30% in both knockout podocytes. This seems low, and I am curious to know whether this is typical for this method; a reference would be helpful.

      This is an excellent question. The overlap was 30% because the parameters used for the analysis were very stringent. We suspect the overlap would exceed 30% with less stringent parameters, and events identified in this way could still be considered similar; we can provide this analysis if requested. Our methods were based on FLAIR analysis (PMID: 32188845).

      (3) Please explain "adjusted p value of 0.01." It is not clear how was it adjusted. The number of differentially-expressed proteins between the two cell types was 4842.

      We used the Benjamini-Hochberg method to adjust the p values for multiple testing. We think the reviewer is referring to the transcriptomic data and not the proteomic data.
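      For readers unfamiliar with the procedure, the Benjamini-Hochberg adjustment controls the false discovery rate across many simultaneous tests. The Python sketch below is a minimal illustration of the standard step-up calculation, not the authors' analysis code, and the p values in it are made up. With an adjusted threshold of 0.01, only features whose adjusted value falls below 0.01 are called significant.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p values in the input order.

    For the i-th smallest of m p values, the adjusted value is
    min over j >= i of (m / j) * p_(j), capped at 1.
    """
    m = len(pvalues)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda k: pvalues[k])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        value = pvalues[idx] * m / rank
        running_min = min(running_min, value)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p values, not data from the manuscript.
    raw = [0.0001, 0.004, 0.012, 0.03, 0.2]
    adjusted = benjamini_hochberg(raw)
    # Flag features whose adjusted p value is below 0.01.
    print([round(q, 4) for q in adjusted], [q < 0.01 for q in adjusted])
```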

      Minor comments

      Page numbers in the text would help the reviewer communicate more effectively with the author.

      We will do this

      Reviewer 3:

      These investigators have previously shown important roles for either insulin receptor (IR) or insulin-like growth factor receptor (IGF1R) in glomerular podocyte function. They now have studied mice with deletion of both receptors and find significant podocyte dysfunction. They then made a podocyte cell line with inducible deletion of both receptors and find abnormalities in transcriptional efficiency with decreased expression of spliceosome proteins and increased transcripts with impaired splicing or premature termination.

      The studies appear to be performed well and the manuscript is clearly written.

      Thank you

      Referees cross-commenting

      I am in agreement with Reviewer 1 that the studies are overly descriptive and do not provide sufficient mechanism, and the lack of further investigation of the in vivo model is a significant weakness.

      Please see our responses to reviewer 1 above.

      Significance

      Reviewer 1:

      With the GLP1 agonists providing renal protection, there is great interest in understanding the role of insulin and other incretins in kidney cell biology. It is already known that insulin and IGFR signaling play important roles in other cells of the kidney, so there is great interest in understanding these pathways in podocytes. The major advance is that these two pathways appear to have a role in RNA metabolism; the major limitation is the lack of information regarding the completeness of the KOs. If, for example, they can determine that the KO in the mice is complete and that the GFR is relatively normal, then the phenotype they describe is relatively mild.

      Thank you. The receptor KO in the mice is unlikely to be complete (please see comments above and Supplementary Figure 1B). Many KO models targeting other tissues show that complete KO of these receptors is difficult to achieve, particularly for the IGF1 receptor. In the brain (which also consists of terminally differentiated cells; PMID: 28595357), barely 50% IGF1R knockdown was achieved in the target cells. In ovarian granulosa cells (PMID: 28407051), several tissue-specific drivers were tried, but none achieved better than 80% knockdown; that paper states that 10% of IGF1R is sufficient for function in these cells, so its authors conclude that their knockdown animals are probably still responding to IGF1. Finally, in our recent IGF1R podocyte knockdown model, we found that Cre levels were important for excision of a single floxed gene (PMID: 38706850), so we were not surprised that excising two floxed genes (insulin receptor and IGF1 receptor) was challenging. This is the rationale for making the double receptor knockout cell lines to understand the process and biology in more detail.

      Reviewer 2:

      The manuscript is generally clear and well-written. Mouse work was approved in advance. The figures are generally well-designed, bars/superimposed dot-plots.

      Evaluation.

      Methods are generally well described. It would be helpful to say that tissue scoring was performed by an investigator masked to sample identity.

      Thank you, we will do this.

      Reviewer 3:

      There are a number of potential issues and questions with these studies.

      (1) For the in vivo studies, the only information given is for mice at 24 weeks of age. There needs to be a full time course of when the albuminuria was first seen and the rate of its development. Also, GFR was not measured. Since the podocin-Cre used was not inducible, there should be a determination of whether there was a developmental defect in glomeruli or podocytes. Were there any differences in either prenatal or postnatal development or in the number of glomeruli?

      Thank you, we will add further phenotyping data. We do not think there was a major developmental phenotype, as albuminuria did not become significantly different until several months of age. We could have used a doxycycline-inducible model, but we know its excision efficiency is much lower than that of the podocin-Cre-driven model (Supplementary Figure 1). This would likely give a very mild phenotype (if any) and would not adequately reveal the biology.

      (2) Although the in vitro studies are of interest, there are no studies to determine if this is the underlying mechanism for the in vivo abnormalities seen in the mice. Cultured podocytes may not necessarily reflect what is occurring in podocytes in vivo.

      Thank you for this; we are happy to use immunohistochemistry (IHC) and immunofluorescence (IF) with spliceosome antibodies on tissue sections from DKO and control mice to examine spliceosome changes. However, as the DKO results in podocyte loss, few DKO podocytes may remain in the tissue sections. This will be taken into consideration.

      (3) Given that both receptors are deleted in the podocyte cell line, it is not clear if the spliceosome defect requires deletion of both receptors or if there is redundancy in the effect. The studies need to be repeated in podocyte cell lines with either IR or IGFR single deletions.

      Thank you. We have full total and phospho-proteomic data sets from single insulin receptor and IGF1 receptor knockout cell lines that we will investigate for this point.

      (4) There are not studies investigating signaling mechanisms mediating the spliceosome abnormalities.

      Thank you. As outlined above in our response to Reviewer 1, point 1, we are very happy to investigate insulin/IGF signalling pathways in more detail.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Manuscript number: RC-2025-02946

      Corresponding author(s): Margaret, Frame

      Roza, Masalmeh


      1. General Statements [optional]


      We thank the reviewers for recognizing the significance of our work and for their constructive feedback and suggestions, most of which we have implemented in our revised manuscript.

      2. Point-by-point description of the revisions


      Reviewer #1

      Evidence, reproducibility and clarity

      Review of Masalmeh et al. Title: "FAK modulates glioblastoma stem cell energetics..."

      Previous studies have implicated FAK and the related tyrosine kinase PYK2 in glioblastoma growth, cell migration, and invasion. Herein, using a murine stem cell model of glioblastoma, the authors used CRISPR to inactivate FAK; FAK-null cells were selected and cloned, and lentiviral re-expression of murine FAK in the FAK-null cells (termed FAK Rx) was accomplished. FAK-/- cells were shown to possess epithelial characteristics, whereas FAK Rx cells expressed mesenchymal markers and showed increased cell migration/invasion in vitro. Comparisons between FAK-/- and FAK Rx cells showed that FAK re-expression increased mitochondrial respiration and amino acid uptake. This was associated with FAK Rx cells exhibiting filamentous mitochondrial morphology (potentially an OXPHOS phenotype) and decreased levels of MTFR1L S235 phosphorylation (implicated in mitochondrial fragmentation). The mitochondrial and epithelial cell morphology of FAK-/- cells was reversed by treatment with Rho-kinase inhibitors, which also increased mitochondrial metabolism and cell viability. Lastly, FAK-dependent glioblastoma tumor growth was shown by comparing FAK-/- and FAK Rx implantation studies.

      The studies by Masalmeh provide interesting findings associating FAK expression with changes in mitochondrial morphology, energy metabolism, and glutamate uptake. According to the authors' model, FAK expression supports a glioblastoma stem cell-like phenotype in vitro and tumor growth in vivo. What remains unclear is the mechanistic connection to the cell changes and whether these depend on intrinsic FAK activity or, as the Frame group has previously published, potentially on FAK nuclear localization. The associations with MTFR1L phosphorylation and the effects of Rho kinase inhibition are likely indirect, and they remind this reviewer of long-ago studies with FAK-null fibroblasts that exhibit epithelial characteristics, still express PYK2, and exhibit elevated RhoA GTPase activity. Some of these phenotypes were linked to changes in RhoGEF and RhoGAP signaling with FAK and/or Pyk2. At a minimum, it would be informative to know whether Pyk2 signaling is relevant for the observed phenotypes and whether the authors can further support their associations with FAK-targeted or FAK/Pyk2-targeted inhibitors or PROTACs.

      Some questions that would enhance potential impact:

      1. Cell generation. Please describe the analysis of FAK-/- clones in more detail. The "low viability" phenotype needs further explanation with regard to clonal expansion and growth characteristics.

      Response:

      • We included a better description and a supplementary figure in our revised manuscript to indicate that we have examined several FAK -/- clones and confirmed that our observations were not due to clonal variation; multiple clones displayed similar morphological changes (Figure S1D). We also show that the elongated mesenchymal-like morphology was observed at 48 h after nucleofecting the cells with the FAK‑expressing vector, before beginning G418 selection to enrich for cells expressing FAK (Figure S1C). We also included experiments to acutely modulate FAK signalling (detaching and seeding cells on fibronectin) (Figure S2D, E, F and Figure S3) to exclude the possibility that the profound effects are due to protocols/selection we used for generating FAK-deleted cells.
      • Regarding the term “low viability”, we have clarified in the text that there is no significant difference in cell number (Figure S1A) or in cell viability assessed by trypan blue exclusion (a non-mitochondria-dependent read-out) (Figure S1B) between FAK-expressing FAK Rx and FAK-/- cells cultured for three days under normal conditions. We agree that the term ‘cell viability’ could be confusing in this context and have replaced “cell viability” with “metabolic activity as measured by Alamar Blue” in Figure 1D and Figure 5B and in the corresponding text of the original manuscript. This wording more accurately reflects the data.

      Figure 1F: need further support of MET change upon FAK KO and EMT reversion.

      Response: We have added a heatmap (Figure S1E) illustrating the changes in protein expression of core-enriched EMT/MET gene products (by proteomics) after FAK gene deletion (EMT genes as defined in Howe et al., 2018); this strengthens the conclusion that the MET reversion morphological phenotype is accompanied by recognised MET protein changes.

      Fig. 2: Need further support for whether FAK effects impact glycolysis or oxidative phosphorylation in particular, as implicated by the stem cell model.

      Response: We show that FAK impacts both glycolysis (Figure 2A, 2E, and 2F) and mitochondrial oxidative phosphorylation on the basis of the oxygen consumption rate (OCR) (Figure 2B, and 2D), showing both are contributing pathways to FAK-dependent energy production. We have clarified this in the text.

      Is there combinatorial potential between FAKi and chemotherapies used for glioblastoma? Need to build upon past studies.

      Response: Yes, previous studies suggest that inhibiting FAK can sensitize GBM cells to chemotherapy (Golubovskaya et al., 2012; Ortiz-Rivera et al., 2023). We have included a paragraph in the discussion section to make sure this is clearer. Although it is not the subject of this study, we appreciate it is useful context.

      The notation of changes in glucose transporter expression should be followed up with regard to the potential that FAK-expressing cells may have different uptake of carbon sources and other amino acids. Altered uptake could be one potential explanation for increased glycolysis and glutamine flux.

      Response: We agree with the reviewer that glucose uptake could be contributing, and we include data showing that two glucose transporters are indeed FAK-regulated, namely glucose transporter 1 (GLUT1, encoded by the Slc2a1 gene) and glucose transporter 3 (GLUT3, encoded by the Slc2a3 gene) (Figure S2B and C).

      It would be helpful to support the confocal microscopy of mitos with EM.

      Response:

      In our experience, electron microscopy (EM) can introduce artefacts during sample preparation, whereas immunofluorescence sample preparation is less susceptible to artefacts. The SORA system we used is not a conventional point-scanning confocal microscope but a super-resolution module based on a spinning-disk confocal platform (CSU-W1; Yokogawa) that uses optical pixel reassignment with confocal detection. This method enhances resolution in all dimensions, with resolution in our samples measured at 120 nm. This has been instructive in defining a new level of changes in mitochondrial morphology upon FAK gene deletion.

      Lack of FAK expression with increased MTFR1L phosphorylation is difficult to interpret.

      Response: We do not directly show that this phosphorylation event is causal in our experiments; however, we think it important to document this change, because phosphorylation of MTFR1L has been causally linked, in other systems, to the mitochondrial morphology we observed (Tilokani et al., 2022).

      Need to have better support between loss of FAK and the increase in Rho signaling. Use of Rho kinase inhibitors is very limited and the context to FAK (and or Pyk2) remains unclear. Past studies have linked integrin adhesion to ECM as a linkage between FAK activation and the transient inhibition of RhoA GTP binding. Is integrin signaling and FAK involved in the cell and metabolism phenotypes in this new model?

      Response: To better support the antagonistic effect of FAK on Rho-kinase (ROCK) signalling, we included a new experiment in which the integrin-FAK signalling pathway was disrupted by treating FAK WT cells with Accutase, an agent that causes detachment from the substratum, and growing the cells in suspension in laminin-free medium. We present ROCK activity data, as judged by phosphorylation of MLC2 at serine 19 (pMLC2 S19), and relate these to FAK phosphorylation at Y397 (a surrogate for FAK activity), which is suppressed after integrin disengagement. These measurements were compared with conditions in which integrin-FAK signalling is activated by growing the cells on laminin-coated surfaces. We observed a time-dependent decrease in pFAK(Y397) levels (normalised to total FAK) in suspended cells compared with those spread on laminin, while pMLC2(S19) levels increased reciprocally over time in detached cells relative to spread cells (Figure S4A and B). There is therefore an inverse relationship between integrin-FAK signalling and ROCK-MLC2 activity, consistent with findings from the FAK gene deletion experiments. In this experiment, we do not rely on gene-deletion cell clones.

      Significance

      The studies by Masalmeh provide interesting findings associating FAK expression with changes in mitochondrial morphology, energy metabolism, and glutamate uptake. According to the authors' model, FAK expression supports a glioblastoma stem cell-like phenotype in vitro and tumor growth in vivo. What remains unclear is the mechanistic connection to the cell changes and whether these depend on intrinsic FAK activity or, as the Frame group has previously published, potentially on FAK nuclear localization. The associations with MTFR1L phosphorylation and the effects of Rho kinase inhibition are likely indirect, and they remind this reviewer of long-ago studies with FAK-null fibroblasts that exhibit epithelial characteristics, still express PYK2, and exhibit elevated RhoA GTPase activity. Some of these phenotypes were linked to changes in RhoGEF and RhoGAP signaling with FAK and/or Pyk2. At a minimum, it would be informative to know whether Pyk2 signaling is relevant for the observed phenotypes and whether the authors can further support their associations with FAK-targeted or FAK/Pyk2-targeted inhibitors or PROTACs.

      Response:

      Deleting the gene encoding FAK in mouse embryonic fibroblasts leads to elevated Pyk2 expression (Sieg, 2000). However, in the GBM stem cell model we used here, Pyk2 was not expressed (as determined by both transcriptomics and proteomics). We have included Figure S1F to show that PYK2 expression was undetectable at the RNA level in FAK -/- and FAK Rx cells. We conclude that there is no compensatory increase in Pyk2 upon FAK loss in these cells. In the transformed neural stem cell model of GBM, we do not consistently or robustly detect nuclear FAK.

      Reviewer #2

      Masalmeh and colleagues employ a neural stem/progenitor cell-based glioma model (NPE cells) to investigate the role of Focal Adhesion Kinase (FAK) in GBM, with a focus on potential links between the regulation of morphological/adhesive and metabolic GBM cell properties. For this, the authors employ wt cells alongside newly generated FAK-KO and FAK-re-expressing cells, as well as pharmacological interventions to probe the relevance of specific signaling pathways. The authors' main claims are that FAK crucially modulates glioma cell morphology, cell-cell and cell-substrate interactions and motility, as well as their metabolism, and that these effects translate to changes in relevant in vivo properties such as invasion and tumor growth.

      My main issues are with the model chosen by the authors.

      As per the methods section, generation of FAK-KO and -"Rx" NPE cells entailed protracted selection/expansion processes, which may have resulted in inadvertent selection for cellular/molecular properties unrelated to the desired one (loss or gain of FAK expression) and which may have had cascading effects on NPE cells. The authors nonetheless repeatedly claim the parameters they quantify, such as mitochondrial or cytoskeletal properties or metabolic features, to have directly resulted from FAK loss or reintroduction. Examples of such causal inferences are to be found in lines 123, 134/135, 165, 181. Such causal claims are, in my view, unsupported.

      Acute perturbation of FAK expression/activity, genetically or pharmacologically, followed by a rapid assessment of the processes under investigation, would be needed to begin to assess causality, even if acute genetic perturbations may be technically challenging as sufficient gene expression reduction or restoration to physiologically relevant levels may be hard to achieve.

      Response:

      We would like first to comment on the model we used here, which we think will clarify the validity of our approach. The model is a transformed stem cell model of GBM that was published by Gangoso et al. (Cell, 2021) and is now used regularly in the GBM field. As mentioned in the response to Reviewer 1, we have added text (pages 4 and 5 of the revised manuscript) and a new supplementary figure (Figure S1D) clarifying that the morphological changes we observed were consistent across multiple FAK -/- clones, showing that they were not due to inter-clonal variability. We also added images showing that the morphological changes were apparent 48 h after nucleofecting FAK -/- cells with the FAK‑expressing vector specifically (not the empty vector), before starting G418 selection to enrich for FAK‑expressing cells (Figure S1C), addressing the concern that clonal variation and selection were the cause of the FAK-dependent phenotypes we observed. We believe that our model provides a well-controlled, clean genetic cancer cell system of a type commonly used in cancer cell biology, allowing us to attribute phenotypes to individual proteins.

      We have also carried out a more acute treatment by using the FAK inhibitor VS4718 to perturb FAK kinase activity and assessed the effects on glycolysis and glutamine oxidation after 48h treatment (Figure S2D, E and F). We found that treating the transformed neural stem cells (parental population) with FAK inhibitor (300nM VS4718) decreases glucose incorporation into glycolysis intermediates and glutamine incorporation into TCA cycle intermediates, consistent with a role for FAK’s kinase activity in maintaining glycolysis and glutamine oxidation.

      The employed pharmacological modulation of ROCK activity is the only approach that, given the presumably acute nature of the treatment, may have allowed the authors to probe the proposed functional links. The methods section of the manuscript does not however comprise details as to the duration of these treatments, which leaves open the possibility of long-term treatment having been carried out (data shown in Figure 5B refers to 72hr treatment).

      Response:

      We have added the duration of the treatment to the Methods section and figure legends to clarify that cells were treated with ROCK inhibitors for 24 h before assessing the effects on mitochondria (Figure 4C, D, S4C and D) and glutamine oxidation (Figure 5A and S5). For the AlamarBlue metabolic activity assay, cells were treated with ROCK inhibitors for 72 h (Figure 5B).

      Even in the case of ROCK inhibitor experiments, it is however unclear if and how the effects on cell morphology and adhesion, mitochondrial organization and metabolic activity may be connected to each other and, if at all, to FAK expression.

      Given the above uncertainties due to the nature of the model and experimental approaches, it is hard to assess the reliability and thus the relevance of the findings.

      Response:

      FAK suppresses ROCK activity (as judged by pMLC2 S19; Figure 4A and B). Treating FAK -/- cells with two different ROCK inhibitors restored mesenchymal-like cell morphology, mitochondrial morphology and glutamine oxidation. As mentioned above, to strengthen our evidence for the antagonistic role of FAK in ROCK-MLC2 signalling, we have now introduced an experiment in which integrin-FAK signalling was disrupted by treatment with a detachment agent (Accutase), followed by maintaining the cells in suspension in laminin-free medium. We assessed pMLC2 S19 levels (a measure of ROCK activity) and related these to FAK phosphorylation, which is suppressed after integrin disengagement. These results were evaluated relative to spread wild-type cells growing on laminin, in which integrin-FAK signalling was active (Figure S4A and B). We observed an inverse relationship between integrin-FAK signalling and ROCK-MLC2 activity, in keeping with our conclusions (Figure 4A and B).

      Experimental support for the ability of cell-substrate interaction modulation to concomitantly impact cellular metabolism and motility/invasion would be significant both in terms of advancing our understanding of glioma cell biology and of its translational potential, but the evidence being provided is at best compatible with the proposed model.

      Response: We carried out a new experiment to support the ability of cell-substrate interaction modulation to impact metabolism; specifically, we inhibited cell-substrate interactions by plating the cells on poly(2-hydroxyethyl methacrylate) (Poly 2-HEMA)-coated dishes. As expected, this suppressed FAK phosphorylation at Y397, with a concomitant reduction in glutamine utilisation in the TCA cycle (Figure S3A, B and C).

      My background/expertise is in developmental and adult neurogenesis, in vivo modelling of gliomagenesis and cell fate control/reprogramming, with a focus on molecular mechanisms of differentiation and quantitative aspects of lineage dynamics; molecular details of the control of cellular metabolism, cell-cell adhesion and cytoskeletal dynamics are not core expertise of mine.

      We appreciate that this reviewer's expertise is not necessarily in the cancer cell biology and genetic intervention aspects of our study. We hope that the explanations we have provided satisfy the reviewer that our conclusions are valid.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      In this study, Ma et al. aimed to determine previously uncharacterized contributions of tissue autofluorescence, detector afterpulse, and background noise to the interpretation of fluorescence lifetime measurements. They introduce a computational framework named "Fluorescence Lifetime Simulation for Biological Applications (FLiSimBA)" to model experimental limitations in Fluorescence Lifetime Imaging Microscopy (FLIM) and to determine parameters for achieving multiplexed imaging of dynamic biosensors using lifetime and intensity. By quantitatively defining how sensor photon counts affect signal-to-noise for both fitting and averaging methods of determining lifetime, the authors contradict claims that fluorescence lifetime is insensitive to sensor expression level, and they highlight how these artifacts occur differently depending on the analysis method. Finally, the authors quantify how statistically meaningful experiments using multiplexed imaging could be achieved. 

      A major strength of the study is the effort to present results in a clear and understandable way, given that most researchers do not think about these factors on a day-to-day basis. The model code is available and written in Matlab, which should make it readily accessible, although a version in other common languages such as Python might help with dissemination in the community. One potential weakness is that the model uses parameters determined in a specific way by the authors, and it is not clear how much other biological tissues and microscope setups may differ from the values used by the authors. 

      Overall, the authors achieved their aims of demonstrating how common factors (autofluorescence, background, and sensor expression) affect lifetime measurements, and they present a clear strategy for understanding how sensor expression may confound results if not properly considered. This work should raise awareness of an issue that new users of lifetime biosensors may overlook and for which experts, while aware of it, have not quantitatively determined the conditions under which it arises. This work will also point to future directions for improving experiments using fluorescence lifetime biosensors and for developing new sensors with more favorable properties. 

      We appreciate the comments and helpful suggestions. We now also include FLiSimBA simulation code in Python in addition to Matlab to make it more accessible to the community.

      One advantage of FLiSimBA is that the simulation package is flexible and adaptable, allowing users to input parameters based on the specific sensors, hardware, and autofluorescence measurements of their biological and optical systems. In this manuscript we used, as examples, parameters based on a FRET-based sensor, autofluorescence measured from mouse tissue, and the measured dark count/afterpulse of our specific GaAsP PMT. In Discussion and Materials and methods, we now emphasize this advantage and further clarify how these parameters can be adapted to diverse tissues, imaging systems, and sensors based on individual experiments. We further explain that these input parameters do not affect the conclusions of our study, although the specific input parameters would alter the quantitative thresholds.
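      To make the idea of adaptable inputs concrete, the sketch below shows one way a FLiSimBA-style simulation could combine sensor photons, a measured autofluorescence decay, and an afterpulse/background term into a 256-channel histogram over a 12.5 ns laser period (the values quoted later in these responses). It is a minimal illustration under stated assumptions, not the FLiSimBA code: the function name and arguments are hypothetical, afterpulse and background are assumed to be flat in time, and no instrument response function is applied.

```python
import numpy as np

LASER_PERIOD_NS = 12.5  # laser repetition period quoted in the responses
N_CHANNELS = 256        # time channels per laser period quoted in the responses


def simulate_histogram(n_sensor, p1, tau1, tau2, autof_hist, n_autof, n_bg, rng):
    """Hypothetical FLiSimBA-style photon histogram (illustrative, not the authors' code).

    n_sensor   -- sensor photons from a two-exponential decay with amplitude fraction p1
    autof_hist -- measured autofluorescence histogram, used only as a sampling distribution
    n_autof    -- number of autofluorescence photons
    n_bg       -- afterpulse/dark-count/background photons, assumed flat in time
    """
    edges = np.linspace(0.0, LASER_PERIOD_NS, N_CHANNELS + 1)

    # Convert the amplitude fraction p1 into the fraction of photons from component 1
    # (each component's photon yield is proportional to amplitude * lifetime).
    frac1 = p1 * tau1 / (p1 * tau1 + (1.0 - p1) * tau2)
    taus = np.where(rng.random(n_sensor) < frac1, tau1, tau2)
    # Draw arrival times; photons arriving after one period wrap into the next cycle.
    t = rng.exponential(taus) % LASER_PERIOD_NS
    sensor, _ = np.histogram(t, bins=edges)

    # Autofluorescence photons follow the shape of the measured histogram.
    probs = np.asarray(autof_hist, dtype=float)
    autof = rng.multinomial(n_autof, probs / probs.sum())

    # Afterpulse and background photons are spread uniformly over all channels.
    bg = rng.multinomial(n_bg, np.full(N_CHANNELS, 1.0 / N_CHANNELS))

    return sensor + autof + bg
```

      Keeping each contribution as a separate input is what allows the autofluorescence histogram, photon numbers, and detector terms to be swapped for values measured on a different tissue or setup.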

      Reviewer #2 (Public review): 

      Summary: 

      By using simulations of common signal artefacts introduced by acquisition hardware and the sample itself, the authors are able to demonstrate methods to estimate their influence on the estimated lifetime, and lifetime proportions, when using signal fitting for fluorescence lifetime imaging. 

      Strengths: 

      They consider a range of effects such as after-pulsing and background signal, and present a range of situations that are relevant to many experimental situations. 

      Weaknesses: 

      A weakness is that they do not present enough detail on the fitting method that they used to estimate lifetimes and proportions. The method used will influence the results significantly. They seem to only use the "empirical lifetime" which is not a state of the art algorithm. The method used to deconvolve two multiplexed exponential signals is not given. 

      We appreciate the comments and constructive feedback. Our revision based on the reviewer’s suggestions has made our manuscript clearer and more user friendly. We originally described the detail of the fitting methods in Materials and methods. Given the importance of these methodological details for evaluating the conclusions of this study, we have moved the description of the fitting method from Materials and methods to Results. In addition, we provide further clarification and more details of the rationale of using these different methods of lifetime estimates in Discussion to aid users in choosing the best metric for evaluating fluorescence lifetime data.

      More specifically, we modified our writing to highlight the following.

      (1) In Results, we describe that lifetime histograms were fitted to Equation 3 with the Gauss-Newton nonlinear least-square fitting algorithm and the fitted P<sub>1</sub> was used as the lifetime estimate.

      (2) In Results, we clarify that our simulation of multiplexed imaging was modeled with two sensors, each displaying a single exponential decay but with different decay constants. We also describe that Equation 3 with the Gauss-Newton nonlinear least-square fitting algorithm was used to deconvolve the two multiplexed exponential signals (Fig. 8).
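      A minimal sketch of Gauss-Newton fitting with fixed τ<sub>1</sub> and τ<sub>2</sub>, estimating only an amplitude A and P<sub>1</sub>, is shown below. It assumes the double exponential form F(t) = A[P<sub>1</sub>e<sup>-t/τ1</sup> + (1 - P<sub>1</sub>)e<sup>-t/τ2</sup>] with P<sub>2</sub> = 1 - P<sub>1</sub>, and it ignores the instrument response function and any background term, so it is not the authors' Equation 3 or code. The same routine illustrates the multiplexing case: if two sensors each decay single-exponentially with τ<sub>1</sub> and τ<sub>2</sub>, the fitted P<sub>1</sub> is the amplitude fraction attributable to the first sensor. The values 2.14 ns and 0.69 ns used in the check are assumed here to be the FLIM-AKAR time constants referred to in Reviewer #1's question.

```python
import numpy as np


def fit_p1_gauss_newton(t, counts, tau1, tau2, p1_init=0.5, n_iter=50, tol=1e-10):
    """Fit counts(t) ~ A * (P1*exp(-t/tau1) + (1 - P1)*exp(-t/tau2)) by Gauss-Newton.

    tau1 and tau2 are held fixed; only the amplitude A and the fraction P1 are estimated.
    Simplifying assumptions: no instrument response function, no background term,
    unweighted least squares.
    """
    t = np.asarray(t, dtype=float)
    counts = np.asarray(counts, dtype=float)
    e1, e2 = np.exp(-t / tau1), np.exp(-t / tau2)
    A, p1 = counts.max(), p1_init  # crude starting values
    for _ in range(n_iter):
        shape = p1 * e1 + (1.0 - p1) * e2
        resid = counts - A * shape
        # Jacobian of the model with respect to (A, P1)
        J = np.column_stack([shape, A * (e1 - e2)])
        # Gauss-Newton step: least-squares solution of J @ step = resid
        step, *_ = np.linalg.lstsq(J, resid, rcond=None)
        A += step[0]
        p1 += step[1]
        if abs(step[1]) < tol and abs(step[0]) < tol * max(1.0, abs(A)):
            break
    return A, p1
```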

      Reviewer #3 (Public review): 

      Summary: 

      This study presents a useful computational tool, termed FLiSimBA. The MATLAB-based FLiSimBA simulations allow users to examine the effects of various noise factors (such as autofluorescence, afterpulse of the photomultiplier tube detector, and other background signals) and varying sensor expression levels. Under the conditions explored, the simulations unveiled how these factors affect the observed lifetime measurements, thereby providing useful guidelines for experimental designs. Further simulations with two distinct fluorophores uncovered conditions in which two different lifetime signals could be distinguished, indicating multiplexed dynamic imaging may be possible. 

      Strengths: 

      The simulations and their analyses were done systematically and rigorously. FliSimba can be useful for guiding and validating fluorescence lifetime imaging studies. The simulations could define useful parameters such as the minimum number of photons required to detect a specific lifetime, how sensor protein expression level may affect the lifetime data, the conditions under which the lifetime would be insensitive to the sensor expression levels, and whether certain multiplexing could be feasible. 

      Weaknesses: 

      The analyses have relied on a key premise that the fluorescence lifetime in the system can be described as a two-component discrete exponential decay. This means that the experimenter should ensure a priori that this is the right model for their fluorophores, and should keep in mind that the fluorescence lifetime of the fluorophores may not be perfectly described by a two-component discrete exponential (for which alternative algorithms have been implemented: e.g., Steinbach, P. J. Anal. Biochem. 427, 102-105, (2012)). In this regard, I also could not find how well each simulation and the experimental data fit the given fitting equation (Equation 2, for example, for the Figure 2C data). 

      We thank the reviewer for the constructive feedback. We agree that FLiSimBA users should ensure that the right decay equations are used to describe the fluorescent sensors. In this study, we used the FRET-based PKA sensor FLIM-AKAR to provide a proof-of-principle demonstration of the capability of FLiSimBA. The donor fluorophore of FLIM-AKAR, truncated monomeric enhanced GFP, displays a single exponential decay. FLIM-AKAR, a FRET-based sensor, displays a double exponential decay. The time constants of the two exponential components were determined and reported previously (Chen, et al, Neuron (2017)). Thus, a double exponential decay equation with known τ<sub>1</sub> and τ<sub>2</sub> was used for both simulation and fitting. The goodness of fit is now provided in Supplementary Fig. 1 for both simulated and experimental data. In addition to referencing our prior study characterizing the double exponential decay model of FLIM-AKAR in Materials and methods, we have emphasized in Discussion the versatility of FLiSimBA in adapting to different sensors, tissues, and analysis methods, and the importance of using the right mathematical models to describe the fluorescence decay of specific sensors. 
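      Goodness of fit for photon-count histograms is often summarized with a reduced chi-square statistic. The responses do not state which metric Supplementary Fig. 1 uses, so the sketch below is only one common choice, assuming Poisson (shot-noise) statistics with the model value as the per-channel variance; `reduced_chi_square` is a hypothetical helper.

```python
import numpy as np


def reduced_chi_square(counts, model, n_params):
    """Pearson-style reduced chi-square for a photon-count histogram.

    Assumes Poisson noise, so the variance in each channel is approximated by the
    model value; channels where the model is not positive are skipped. Values near 1
    indicate residuals consistent with shot noise alone.
    """
    counts = np.asarray(counts, dtype=float)
    model = np.asarray(model, dtype=float)
    keep = model > 0
    chi2 = np.sum((counts[keep] - model[keep]) ** 2 / model[keep])
    dof = int(keep.sum()) - n_params  # degrees of freedom
    return chi2 / dof
```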

      Also, in Figure 2C, the 'sensor only' simulation without accounting for autofluorescence (as seen in Sensor + autoF) or afterpulse and background fluorescence (as seen in Final simulated data) seems to recapitulate the experimental data reasonably well. So, at least in this particular case where experimental data is limited by its broad spread with limited data points, being able to incorporate the additional noise factors into the simulation tool didn't seem to matter too much.  

      In the original Fig 2C, the sensor fluorescence was much higher than the contributions from autofluorescence, afterpulse, and background signals, resulting in minimal effects of these other factors, as the reviewer noted. This original figure was based on photon counts from single neurons expressing FLIM-AKAR. For the rest of the manuscript, photon counts were based on whole fields of view (FOV). Since the FOV includes cells that do not express fluorescent sensors, the influence of autofluorescence, dark currents, and background is much more pronounced, as shown in Fig. 2B. 

      Both approaches – using photon counts from the whole FOV or from individual neurons – have their justifications. Photon counts from the whole FOV simulate data from fluorescence lifetime photometry (FLiP), whereas photon counts from individual neurons simulate data from fluorescence lifetime imaging microscopy (FLIM). However, the choice of approach does not affect the conclusions of the manuscript, as a range of photon count values are simulated. To maintain consistency throughout the manuscript, we have revised the photon counts in this figure (now Supplementary Fig. 1C) to match those from the whole FOV.

      Additionally, we have made some modifications in our analyses of Supplementary Fig. 1C and Fig. 2B, detailed in the “FLIM analysis” section of Materials and methods. For instance, to minimize system artifact interference at the histogram edges, we now use a narrower time range (1.8 to 11.5 ns) for fitting and empirical lifetime calculation.

      Reviewer #1 (Recommendations for the authors): 

      (1) The authors report how autofluorescence was measured from "imaged brain slices from mice at postnatal 15 to 19 days of age without sensor expression." However, it remains unclear how many acute slices and animals were used (for example, were all 15 µm × 15 µm FOVs from a single slice?) and whether mouse age affects autofluorescence quantification. Furthermore, would in vivo measurements have different autofluorescence conditions given that blood flow would be active? It would help if the authors more clearly explained how reliable their autofluorescence measurement is by clarifying how they obtained it, whether it would vary across brain areas, and whether in vitro vs in vivo conditions would affect autofluorescence. 

      We have added description in Materials and methods that for autofluorescence ‘Fluorescence decay histograms from 19 images of two brain slices from a single mouse were averaged.’ We have added in Discussion that users should carefully ‘measure autofluorescence that matches the age, brain region, and data collection conditions (e.g., ex vivo or in vivo) of their tissue…’, and emphasize that FLiSimBA offers customization of inputs, and it is important for users to adapt the inputs such as autofluorescence to their experimental conditions. We also clarify in Discussion that the change of input parameters such as autofluorescence across age and brain region would not affect the general insights from this study, but will affect quantitative values.

      (2) Do sensor expression level issues arise more with in-utero electroporation than with AAV-based delivery of biosensors? A brief comment on this in the discussion may help, as most users in the field today may be using AAV strategies to deliver biosensors.

      In our experience, in-utero electroporation results in higher sensor expression than AAV-based delivery, and so poses less concern for expression-level dependence. However, both delivery methods can result in expression level dependence, especially with a sensor that is not bright. We have added in Discussion ‘For a sensor with medium brightness delivered via in utero electroporation, adeno-associated virus, or as a knock-in gene, the brightness may not always fall within the expression level-independent regime.’

      (3) Figure 1. Should the x-axis on the top figures be "Time (ns)" instead of "Lifetime (ns)"?

      Similarly in Figure 8A&B, wouldn't it make more sense to have the x-axis be Time not Lifetime?

      The x-axis labels in Fig. 1 and Fig. 8A-8B have been changed to ‘Time (ns)’.   

      (4) Figure 2b: why is the empirical lifetime close to 3.5 ns? Shouldn't it be somewhere between 2.14 and 0.69? 

      In our empirical lifetime calculation, we did not set the peak channel to have a time of 0.0488 ns (i.e. the laser cycle of 12.5 ns divided by 256 time channels). Rather, we set the first time channel within a defined calculation range (i.e. 1.8 ns in Supplementary Fig. 1B) to have a time of 0.0488 ns. Thus, the empirical lifetime exceeds 2.14 ns and depends on the time range of the histogram used for calculation. 

      For Fig. 2B and Supplementary Fig. 1C, we have now adjusted the range to 1.8-11.5 ns to eliminate FLIM artifacts at the histogram edges in our experimental data, resulting in an empirical lifetime around 2.255 ns. In contrast, the range for calculating the empirical lifetime of simulated data in the rest of the study (e.g. Fig. 4D) is 0.489-11.5 ns, yielding a larger lifetime of ~3.35 ns. 

      We have clarified these details and our rationale in Materials and methods.
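      The dependence of the empirical lifetime on the calculation window can be reproduced with a short sketch. It computes a photon-weighted mean arrival time in which the first channel of the window is assigned 12.5/256 = 0.0488 ns, as described above; the function name and the convention that channel i starts at i × 0.0488 ns are assumptions.

```python
import numpy as np

LASER_PERIOD_NS = 12.5
N_CHANNELS = 256
DT_NS = LASER_PERIOD_NS / N_CHANNELS  # 0.048828125 ns per channel


def empirical_lifetime(counts, t_start_ns, t_stop_ns):
    """Photon-weighted mean arrival time over a calculation window (hypothetical helper).

    The first channel inside the window is assigned a time of one channel width
    (DT_NS), the next 2 * DT_NS, and so on, so the result depends on where the
    window starts relative to the decay. Channel i is assumed to start at i * DT_NS.
    """
    counts = np.asarray(counts, dtype=float)
    starts = np.arange(N_CHANNELS) * DT_NS
    in_window = (starts >= t_start_ns) & (starts <= t_stop_ns)
    n = counts[in_window]
    t = np.arange(1, n.size + 1) * DT_NS  # re-referenced times: DT, 2*DT, ...
    return float(np.sum(t * n) / np.sum(n))
```

      When no photons fall in the added channels, moving the window start earlier by k channels adds exactly k × 0.0488 ns to the result, which is consistent with the 0.489 to 11.5 ns range yielding a larger value than the 1.8 to 11.5 ns range.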

      (5) Figure 2b: how come the afterpulse+background contributes more to the empirical lifetime than the autofluorescence (shorter lifetime)? It was unclear from the Results text why autofluorescence photons did not alter the empirical lifetime as much as the afterpulse/background did.

      With a histogram range from 1.8 ns to 11.5 ns used in Fig. 2B, the empirical lifetime for FLIM-AKAR sensor fluorescence, autofluorescence, and background/afterpulse are: 2-2.3 ns, around 1.69 ns, and around 4.90 ns. The larger difference of background/afterpulse from FLIM-AKAR sensor fluorescence leads to larger influence of afterpulse+background than autofluorescence. We have added an explanation of this in Results.
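      Because the empirical lifetime is a photon-weighted mean arrival time, the empirical lifetime of a combined histogram (over the same window) is the photon-weighted average of the component values. The arithmetic below uses the component lifetimes quoted above, with an assumed sensor value of 2.2 ns and an illustrative 5% photon contamination; the helper name is hypothetical.

```python
def mixed_empirical_lifetime(components):
    """Empirical lifetime of a histogram built from several photon sources.

    components: iterable of (photon_count, empirical_lifetime_ns) pairs, each
    lifetime computed over the same window and time reference.
    """
    components = list(components)
    total = sum(n for n, _ in components)
    return sum(n * tau for n, tau in components) / total


# Illustrative numbers: a 2.2 ns sensor signal plus 5% extra photons from one source.
with_autof = mixed_empirical_lifetime([(9500, 2.2), (500, 1.69)])  # about 2.1745 ns
with_bg = mixed_empirical_lifetime([(9500, 2.2), (500, 4.90)])     # about 2.335 ns
```

      With equal photon contamination, the shift scales with the distance from the sensor value: 0.05 × (2.2 - 1.69) ≈ 0.026 ns for autofluorescence versus 0.05 × (4.90 - 2.2) = 0.135 ns for background/afterpulse.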

      (6) One overall suggestion for an improvement that could help active users of lifetime biosensors understand the consequences would be to show either a real or simulated example of a "typical experiment" conducted using FLIM-AKAR and how an incorrect interpretation could be drawn as a consequence of these artifacts. For example, do these confounds affect experiments involving comparisons across animals more than within-subject experiments such as washing a drug onto the brain slice, and the baseline period is used to normalize the change in signal? I think this type of direct discussion will help biosensor users more deeply grasp how these factors play out in common experiments being conducted.

      We have added the following in Discussion, ‘…While this issue is less problematic when the same sample is compared over short periods (e.g. minutes), it can lead to misinterpretation when fluorescence lifetime is compared across prolonged (e.g., chronic) time periods or between samples with different sensor expression levels. For example, apparent changes in fluorescence lifetime observed over days, across cell types, or across subcellular compartments may actually reflect variations in sensor expression levels rather than true differences in biological signals (Fig. 6). Therefore, considering biologically realistic factors in FLiSimBA is essential, as it qualitatively impacts the conclusions.’

      Reviewer #2 (Recommendations for the authors): 

      The paper would be improved with more detail on the fitting methods, and with the use of state-of-the-art methods. Consult, for example, the introduction of this paper, where many methods are listed: https://www.mdpi.com/1424-8220/22/19/7293

      We have moved the description of the Gauss-Newton nonlinear least-square fitting algorithm from Materials and methods to Results to enhance clarity. We appreciate the reviewer’s suggestion to combine FLiSimBA with various analysis methods. However, the primary focus of our manuscript is to call attention to how specific contributing factors in biological experiments influence FLIM data, and to provide a tool that rigorously considers these factors to simulate FLIM data, which can then be used for fitting. Therefore, we did not expand the scope of our manuscript. Instead, we have added in the Discussion that ‘FLiSimBA can be used to test multiple fitting methods and lifetime metrics as an exciting future direction for identifying the best analysis method for specific experimental conditions’, citing relevant references.

      I would also improve the content of the GitHub repository, as it is very hard to identify the source code used for simulation and fitting. 

      We have reorganized and relabeled our GitHub repository and now have three folders labeled as ‘Simulation_inMatlab’, ‘DataAnalysis_inMatlab’, and ‘SimulationAnalysis_inPython’. We also updated the clarification of the contents of each folder in the README file.

      Reviewer #3 (Recommendations for the authors): 

      (1) P. 10 "For example, to detect a P1 change of 0.006 or a lifetime change of 5 ps with one sample measurement in each comparison group, approximately 300,000 photons are needed." If I am reading the graphs in Figures 3B and C, this sentence is talking about the red line. However, the intersection of 0.006 in the MDD of P1 in 3B and red is not 3E5 photons. And the intersection of 0.005 ns and red in 3C is not 3E5 photons either. Are you sure you are talking about n=1? Maybe the values are correct for the blue curve with n=5.

      Thank you for catching our error. We have corrected the text to ‘with five sample measurements’.
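      The relationship between the number of measurements per group and the required photon count can be illustrated with a generic power-analysis sketch. The responses do not define the minimum detectable difference (MDD) used in Fig. 3, so the formula below is an assumption: a two-sided, two-sample comparison at α = 0.05 and 80% power, with the per-measurement standard deviation of P<sub>1</sub> assumed to scale as c/√photons. Under these assumptions the required photon count scales as 1/n, so five measurements per group need roughly one fifth of the photons of a single measurement; the shot-noise constant `c` and both function names are hypothetical.

```python
import math
from statistics import NormalDist


def _z_total(alpha, power):
    # Sum of the two-sided critical value and the power quantile of a standard normal
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)


def mdd(sd, n, alpha=0.05, power=0.8):
    """Minimum detectable difference between two groups of n measurements each."""
    return _z_total(alpha, power) * sd * math.sqrt(2.0 / n)


def photons_needed(target_mdd, n, c, alpha=0.05, power=0.8):
    """Photons per measurement so that mdd(c / sqrt(photons), n) equals target_mdd.

    c is a hypothetical shot-noise constant with SD(P1) ~= c / sqrt(photons).
    """
    return (c * _z_total(alpha, power) * math.sqrt(2.0 / n) / target_mdd) ** 2
```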

      (2) Figure 2 (B) legend: It would be helpful to specify what is being compared in the legend. For example, consider revising "* p < 0.05 vs sensor only; n.s. not significant vs sensor + autoF; # p < 0.05 vs sensor + autoF. Two-way ANOVA with Šídák's multiple comparisons test" to "* p < 0.05 for sensor + autoF (cyan) vs sensor only; n.s. not significant for final simulated data (purple) vs sensor + autoF; # p < 0.05 for final simulated data (purple) vs sensor + autoF. Two-way ANOVA with Šídák's multiple comparisons test".

      We have made the change; thank you for the suggestion to make it clearer.

      (3) Figure 2 (c) Can you please show the same Two-way ANOVA test values for Experimental vs. Sensor only and for Experimental vs. Sensor + autoF? Currently, the value (n.s.) is marked only for Experimental vs. Final simulation. Given that the experimental data are sparse (compared to the simulations), it seems likely that there may be no significant difference among the 3 different simulations regarding how well they match the experimental data. Also, can you specify the P1 and P2 of the experimental data  used to generate the simulated data on this panel? Also, what is the reason why P1=0.5 was used for panels A and B, instead of the value matching the experimental value?

      As the reviewer suggested, we have included statistical tests in the figure (now Supplementary Fig. 1C). Please see our response to the Public Review of Reviewer 3’s comments as well as our changes in Materials and Methods on other changes and their rationale for this figure. We have now specified the P<sub>1</sub> value of the experimental data used to generate the simulated data on this panel both in Figure Legends and Materials and Methods. Based on the suggestion, we have now used the same P<sub>1</sub> value in Fig. 2B.

    1. Be clear about the consequences of using AI to generate pornographic images. Tell students that they may see apps to create nude pictures advertised on platforms like TikTok. Though they may be curious or think it's funny (because the pictures aren't "real"), using AI to generate nude pictures of someone is harassment and illegal. It doesn't just harm the victim—law enforcement could get involved. Victims should tell a trusted adult, report to authorities, and can also report the incident to CyberTipline.org.

      This part really stands out as a necessary and urgent conversation. With how normalized AI tools have become on platforms like TikTok, I can see how some students might not fully grasp the seriousness of using them to generate explicit content. It’s not a harmless joke; it’s a form of harassment with real legal and emotional consequences. As someone who spends a lot of time online, I think we all need to take more responsibility in calling out this kind of behavior and making sure people know where to get help.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #3 (Public Review)

      Summary:

      In this paper the authors examined the effects of strip cropping, a relatively new agricultural technique of alternating crops in small strips of several meters wide, on ground beetle diversity. The results show an increase in species diversity (i.e. abundance and species richness) of the ground beetle communities compared to monocultures.

      Strengths:

      The article is well written; it has an easily readable tone of voice without too much jargon or overly complicated sentence structure. Moreover, as far as reviewing the models in depth without raw data and R scripts allows, the statistical work done by the authors looks good. They have thought carefully about how to handle heterogeneous, unbalanced and taxonomically unspecific yet spatially and temporally correlated field data. The models applied and the model checks performed are appropriate for the data at hand. Combining RDA and PCA axes is a nice touch. Moreover, after the first round of reviews, the authors have done a great job of rewriting the paper to make it less overstated, more relevant to the data at hand and more solid in its findings. Many of the weaknesses noted in the first review have been dealt with. The overall structure of the paper is good, with a clear introduction, hypotheses, results section and discussion.

      We are grateful for this positive feedback. We are glad that our extensive revision, following thorough review by three reviewers, has paid off in addressing the earlier weaknesses of our manuscript.

      Weaknesses:

      The weaknesses that remain are mainly due to a difficult dataset and to choices that could have stressed certain aspects more, like the relationship between strip cropping and intercropping. The mechanistic understanding of strip cropping is what is at stake here. Does strip cropping behave similarly to intercropping, a technique that has been proven to benefit biodiversity because of added effects due to increased resource efficiency and greater plant species richness?

      Unfortunately, the authors do not go into this in the introduction or otherwise and simply state that they consider strip cropping a form of intercropping.

      We agree with the reviewer that a mechanistic understanding on how intercropping and strip cropping differ would be very interesting. However, we also feel that this topic is somewhat beyond the scope of the current manuscript. We are already planning work to elucidate mechanisms that may explain the pest and suppressive effects of strip cropping.

      I also do not like the exclusive focus on percentages, as these are dimensionless. I think more could have been done to show underlying structure in the data, even after rarefaction.

      While we generally agree with this point raised by the reviewer, for our heterogeneous dataset it was difficult to come up with meaningful units with dimensions. Therefore, we believe that percentages are the most suitable approach to present readers a fair comparison of the treatments.

      A further weakness is a limited embedding into the larger scientific discourse beyond providing references. But this may be a matter of style and/or taste.

      We believe our manuscript to be well-embedded within the relevant scientific discourse, but as indicated by reviewer 3 this might indeed be a matter of style/taste. Without exact examples it is difficult for us to judge this point.

      Reviewer #3 (Recommendations for the authors): 

      Suggestion for title: "Strip cropping shows promising preliminary increases in ground beetle community diversity compared to monocultures"

      We agree that the title could indeed be nuanced. We incorporated the suggested title, except for the word “preliminary”, as we felt that this is slightly misplaced for a 4-year study conducted at 4 locations.

      line 26: the word previous may be confusing to readers, as it suggests previous research on beetles or insects. I think it would be better to use for instance "related" or "productivity focused research"

      We agree that this wording might be confusing, and changed it to “other studies showed”.

      Line 84-85: This is vague. Can you make explicit what you are trying to answer here?

      We made “biodiversity metric changes” more explicit, and changed the sentence accordingly.

      Line 88-89: I think this would fit better with the first question in line 83-84, so I suggest placing it upwards. Also, I think you mean abundant instead of common. Common suggests commonness in the entire population. Abundant suggests found often in this study. While these definitions may very much overlap, they are distinctly different.

      We have moved this sentence up and changed “common” to “abundant”. To make the result section more in line with this section, we also moved the section on the relationship between crop configuration and abundant genera up.  

      Line 146: defining rareness of species should be in the methods section. Also "following" would be better than "according"

      We now added a sentence on how we examine habitat preferences and rarity in the methods section (line 316-317). We also changed “according to” to “following”.

      Line 291: it is called being "flush" with the soil surface. This expression is not much used by non-native speakers, but is regularly encountered in studies on pitfalls, so the authors could decide to change the sentence using the proper English vernacular.

      Suggestion incorporated.

      Line 322-327, this method could do with a reference

      This is a relatively standard way to calculate relative changes and to center variation around zero. Nevertheless, we added a reference to a paper that used the same method.

      Line 333-335: I would still like to see a reference for this method.

      To the best of our knowledge, this methodology has not been described in the literature. Because we compared two crops within strip cropping with their respective monoculture references, each strip cropping field is compared with two monocultural fields. We took a conservative approach by comparing the strip-cropped field with the monoculture that had the highest richness and activity density, to test whether strip-cropped fields outperformed monocultures with diverse ground beetle communities.
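      The conservative comparison described above can be written as a short helper. The percentage formula, 100 × (strip - reference) / reference, is an assumption, since the response states only that the strip-cropped field was compared with the monoculture having the higher richness or activity density; the function name is hypothetical.

```python
def percent_difference_vs_best_monoculture(strip_value, mono_a, mono_b):
    """Percent difference of a strip-cropped field relative to the better of its
    two monoculture references (the higher value), a conservative baseline.

    Assumed formula: 100 * (strip - reference) / reference.
    """
    reference = max(mono_a, mono_b)
    return 100.0 * (strip_value - reference) / reference


# Hypothetical example: strip richness 18 vs monocultures with 12 and 15 species.
example = percent_difference_vs_best_monoculture(18, 12, 15)  # 20.0
```

      Taking the maximum of the two references means a positive value can only arise if the strip-cropped field exceeds both monocultures.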

      Line 364-366. references?

      We have added references for these R packages.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors claim that they can use a combination of repetitive transcranial magnetic stimulation (intermittent theta burst-iTBS) and transcranial alternating current stimulation (gamma tACS) to cause slight improvements in memory in a face/name/profession task.

      Strengths:

      The idea of stimulating the human brain non-invasively is very attractive because, if it worked, it could lead to a host of interesting applications. The current study aims to evaluate one such exciting application.

      Weaknesses:

      (1) It is highly unclear what, if anything, transpires in the brain with non-invasive stimulation. To cite one example of many, a rigorous study in rats and human cadavers, compellingly showed that traditional parameters of transcranial electrical stimulation lead to no change in brain activity due to the attenuation by the soft tissue and skull (Mihály Vöröslakos et al Nature Communications 2018): https://www.nature.com/articles/s41467-018-02928-3. It would be very useful to demonstrate via invasive neurophysiological recordings that the parameters used in the current study do indeed lead to any kind of change in brain activity. Of course, this particular study uses a different non-invasive stimulation protocol.

      Thank you for raising the important issue regarding the actual neurophysiological effects of non-invasive brain stimulation. Unfortunately, invasive neurophysiological recordings in humans for this type of study are not feasible due to ethical constraints, while studies on cadavers or rodents would not fully resolve our question. Indeed, the authors of the cited study (Mihály Vöröslakos et al., Nature Communications, 2018) highlight the impossibility of drawing definitive conclusions about the exact voltage required in the in-vivo human brain, due to significant differences between rats and humans, and between the in-vivo human brain and cadavers because of alterations in electrical conductivity that occur in postmortem tissue. Huang and colleagues addressed the difficulties in obtaining direct evidence of non-invasive brain stimulation (NIBS) effects in a review published in Clinical Neurophysiology in 2017. They conclude that using EEG to assess the brain response to TMS has great potential for a less indirect demonstration of the plasticity mechanisms induced by NIBS in humans.

      To address this challenge, we conducted Experiments 3 and 4, which respectively examined the neurophysiological and connectivity changes induced by the stimulation in a non-invasive manner using TMS-EEG and fMRI. The observed changes in brain oscillatory activity (increased gamma oscillatory activity), cortical excitability (enhanced posteromedial parietal cortex reactivity), and brain connectivity (strengthened connections between the precuneus and hippocampi) provided evidence of the effects of our non-invasive brain stimulation protocol, further supporting the behavioral data.

      Additionally, we carefully considered the issue of stimulation distribution and, in response, performed a biophysical modeling analysis and E-field calculation using the parameters employed in our study (see Supplementary Materials).

      We acknowledge that further exploration of this aspect would be highly valuable, and we agree that it is worth discussing both as a technical limitation and as a potential direction for future research. We have therefore modified the Discussion accordingly (main text, lines 280-289).

      “Although we studied TMS and tACS propagation through E-field modeling and observed an increase in precuneus gamma oscillatory activity, excitability and connectivity with the hippocampi, we cannot exclude that our results might reflect the stimulation of more superficial parietal regions other than the precuneus, nor can we report direct evidence of microscopic changes in the brain after the stimulation. Invasive neurophysiological recordings in humans for this type of study are not feasible due to ethical constraints. Studies on cadavers or rodents would not fully resolve our question due to significant differences from the living human brain (i.e., rodents lack an anatomical correspondence, while cadavers show alterations in electrical conductivity in postmortem tissue). However, further exploration of this aspect in future studies would help in understanding the effects of γtACS+iTBS.”

      (2) If there is any brain activity triggered by the current stimulation parameters, then it is extremely difficult to understand how this activity can lead to enhancing memory. The brain is complex. There are hundreds of neuronal types. Each neuron receives precise input from about 10,000 other neurons with highly tuned synaptic strengths. Let us assume that the current protocol does lead to enhancing (or inhibiting) simultaneously the activity of millions of neurons. It is unclear whether there is any activity at all in the brain triggered by this protocol, it is also unclear whether such activity would be excitatory, or inhibitory. It is also unclear how many neurons, let alone what types of neurons would change their activity. How is it possible that this can lead to memory enhancement? This seems like using a hammer to knock on my laptop and hope that the laptop will output a new Mozart-like sonata.

      Thank you for your comment. As you correctly point out, we still do not have precise knowledge of which neurons—and to what extent—are activated during non-invasive brain stimulation in humans. However, this challenge is not limited to brain stimulation but applies to many other therapeutic interventions, including psychiatric medications, without limiting their use.

      Nevertheless, a substantial body of research has investigated the mechanisms underlying the efficacy of TMS and tACS in producing behavioral after-effects, primarily through their ability to induce long-term potentiation (Bliss & Collingridge, The Journal of Physiology, 1993a; Ridding & Rothwell, Nature Reviews Neuroscience, 2007; Huang et al., Clinical Neurophysiology, 2017; Koch et al., Neuroimage 2018; Koch et al., Brain 2022; Jannati et al., Neuropsychopharmacology, 2023; Wischnewski et al., Trends in Cognitive Science, 2023; Griffiths et al., Trends in Neuroscience, 2023).

      We acknowledge that we took this important aspect for granted. We consequently expanded the introduction accordingly (main text, lines 48-60).

      “Repetitive transcranial magnetic stimulation (rTMS) and transcranial alternating current stimulation (tACS) are two forms of NIBS widely used to enhance memory performance (Grover et al., 2022; Koch et al., 2018; Wang et al., 2014). rTMS, based on Faraday's law of electromagnetic induction, depolarizes cortical neuronal assemblies and leads to after-effects that have been linked to changes in synaptic plasticity involving mechanisms of long-term potentiation (LTP) (Huang et al., 2017; Jannati et al., 2023). tACS, on the other hand, causes rhythmic fluctuations in neuronal membrane potentials, which can bias spike timing and entrain neural activity (Wischnewski et al., 2023). In particular, the induction of gamma oscillatory activity has been proposed to play an important role in a type of LTP known as spike timing-dependent plasticity, which depends on a precise temporal delay between the firing of a presynaptic and a postsynaptic neuron (Griffiths and Jensen, 2023). Both LTP and gamma oscillations are strongly linked to memory processes such as encoding (Bliss and Collingridge, 1993; Griffiths and Jensen, 2023; Rossi et al., 2001), pointing to rTMS and tACS as good candidates for memory enhancement.”

      (3) Even if there is any kind of brain activation, it is unclear why the authors seem to be so sure that the precuneus is responsible. Are there neurophysiological data demonstrating that the current protocol only activates neurons in the precuneus? Of note, the non-invasive measurements shown in Figure 3 are very weak (Figure 3A top and bottom look very similar, and Figure 3C left and right look almost identical). Even if one were to accept the weak alleged differences in Figure 3, there is no indication in this figure that there is anything specific to the precuneus, rather a whole brain pattern. This would be the kind of minimally rigorous type of evidence required to make such claims. In a less convincing fashion, one could look at different positions of the stimulation apparatus. This would not be particularly compelling in terms of making a statement about the precuneus. But at least it would show that the position does matter, and over what range of distances it matters, if it matters.

      Thank you for your feedback. Our assumption that the precuneus plays a key role in the observed effects is based on several factors:

      (1) The non-invasive stimulation protocol was applied to an individually identified precuneus for each participant. Given existing evidence on TMS propagation, we can reasonably assume that the precuneus was at least a mediator of the observed effects (Ridding & Rothwell, Nature Reviews Neuroscience 2007). For further details about target identification and TMS and tACS propagation, please refer to the MRI data acquisition section in the main text and Biophysical modeling and E-field calculation section in the supplementary materials.

      (2) To investigate the effects of the neuromodulation protocol on cortical responses, we conducted a whole-brain analysis using multiple paired t-tests comparing each data point between different experimental conditions. To minimize the type I error rate, data were permuted with the Monte Carlo approach and significant p-values were corrected with the false discovery rate method (see the Methods section for details). The results identified the posterior-medial parietal areas as the only regions showing significant differences across conditions.

      (3) To control for potential generalized effects, we included a control condition in which TMS-EEG recordings were performed over the left parietal cortex (adjacent to the precuneus). This condition did not yield any significant results, reinforcing the cortical specificity of the observed effects.
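      The whole-brain procedure in point (2) can be illustrated with a minimal sketch. It assumes a sign-flip (Monte Carlo) permutation of within-subject differences, uses the mean paired difference as a simplified test statistic in place of the paired t value, and applies the Benjamini-Hochberg step-up rule as the false discovery rate correction. The function names, the number of permutations and the toy data are hypothetical and do not reproduce the authors' actual pipeline or software.

```python
import random
from statistics import mean


def paired_permutation_p(a, b, n_perm=5000, seed=0):
    """Two-sided Monte Carlo p-value for the mean paired difference a - b.

    Under the null hypothesis the sign of each within-subject difference is
    exchangeable, so signs are flipped at random and we count how often the
    permuted mean is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(mean(diffs))
    hits = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # The +1 terms count the observed labelling and keep p strictly positive.
    return (hits + 1) / (n_perm + 1)


def fdr_bh(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: one reject flag per test."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest rank k whose sorted p-value satisfies p_(k) <= k / m * alpha.
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            max_rank = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= max_rank
    return reject


# Hypothetical toy data: one row per participant, one column per
# electrode/time point, before and after stimulation.
pre = [[1.0, 2.0, 3.0], [1.2, 2.1, 2.9], [0.9, 1.8, 3.2], [1.1, 2.2, 3.1],
       [1.0, 2.0, 2.8], [1.3, 1.9, 3.0], [0.8, 2.1, 3.1], [1.0, 2.0, 3.0]]
post = [[row[0] + 0.5, row[1] + 0.0, row[2] - 0.1] for row in pre]
pvals = [paired_permutation_p([r[c] for r in post], [r[c] for r in pre])
         for c in range(3)]
print(fdr_bh(pvals))
```

      In this toy example only the first and third columns carry a consistent shift, so only those survive the correction; with real data the statistic, the number of permutations and the correction scheme should follow the Methods section.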

      However, as stated in the Discussion, we do not claim that precuneus activity alone accounts for the observed effects. As shown in Experiment 4, stimulation led to connectivity changes between the precuneus and hippocampus, a network widely recognized as a key contributor to long-term memory formation (Bliss & Collingridge, Nature 1993). These connectivity changes suggest that precuneus stimulation triggered a ripple effect extending beyond the stimulation site, engaging the broader precuneus-hippocampus network.

      Regarding Figure 3A, it represents the overall expression of oscillatory activity detected by TMS-EEG. Since each frequency band has a different optimal scaling, the figure reflects a graphical compromise. A more detailed representation of the significant results is provided in Figure 3B. The effect sizes for gamma oscillatory activity in the delta T1 and T2 conditions were 0.52 and 0.50, respectively, which correspond to a medium effect based on Cohen’s d interpretation.
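      For reference, the "medium" label follows Cohen's conventional benchmarks (about 0.2 small, 0.5 medium, 0.8 large). The sketch below assumes the within-subject variant d_z (mean of the paired differences divided by their standard deviation); the response does not state which variant of Cohen's d was used, so both that choice and the helper names are assumptions for illustration.

```python
from statistics import mean, stdev


def cohens_dz(before, after):
    """Cohen's d_z for paired samples: mean difference / SD of the differences."""
    diffs = [y - x for x, y in zip(before, after)]
    return mean(diffs) / stdev(diffs)


def cohen_label(d):
    """Map |d| onto Cohen's conventional benchmarks (0.2, 0.5, 0.8)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Effect sizes reported for gamma oscillatory activity (delta T1 and T2).
for d in (0.52, 0.50):
    print(d, cohen_label(d))
```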

      We have added a paragraph to the Discussion to clarify this important aspect (lines 193-198).

      “Given the existing evidence on TMS propagation and the results of the biophysical model and E-field computation, we can reasonably assume that the individually identified PC was a mediator of the observed effects (Ridding and Rothwell, 2007). Moreover, we observed specific cortical changes in the posteromedial parietal areas, as evidenced by the whole-brain analysis of the TMS-EEG data and by the absence of an effect over the lateral posterior parietal cortex used as a control condition.”

      (4) In the absence of any neurophysiological documentation of a direct impact on the brain, an argument in this type of study is that the behavioral results show that there must be some kind of effect. I agree with this argument. This is also the argument for placebo effects, which can be extremely powerful and useful even if the mechanism is unrelated to what is studied. Then let us dig into the behavioral results.

      We hope we have already addressed your concern regarding the neurophysiological impact of the stimulation on the brain. We would also like to emphasize that the behavioral results were obtained while controlling for placebo effects: participants performed the task under different stimulation conditions, including a sham condition.

      4a. There does not seem to be any effect on the STMB task, therefore we can ignore this.

      4b. The FNAT task is minimally described in the supplementary material. There are no experimental details to understand what was done. What was the size of the images? How long were the images presented for? Were there any repetitions of the images? For how long did the participants study the images? Presumably, all the names and occupations are different? What were the genders of the faces? What is chance level performance? Presumably, the same participant saw different faces across the different stimulation conditions. If not, then there can be memory effects across different conditions that are even more complex to study. If yes, then it would be useful to show that the difficulty is the same across the different stimuli.

      Thank you for pointing out that the description of the FNAT was incomplete. We have added the requested information to the supplementary information (lines 93-101).

      “The face in each picture measured 19 × 15 cm. In the learning phase, faces were shown along with names and occupations for 8 seconds each (approximately 2 minutes in total). During immediate recall, the faces were displayed alone for 8 seconds. In the delayed recall and recognition phase, pictures were presented until the subject answered. We used a different set of stimuli for each stimulation condition, resulting in 3 parallel task forms balanced across conditions and session order. All parallel forms comprised 6 male and 6 female faces; for each sex, there were 2 young adults (around 30 years old), 2 middle-aged adults (around 50 years old), and 2 elderly adults (around 70 years old). Before the experiments, we conducted a pilot study to ensure that there were no differences between the parallel forms of the task.”

      The chance level in immediate and delayed recall cannot be quantified, since participants had to freely recall the name and occupation without multiple-choice options. In the recognition phase, the chance level was about 33%, since there were 3 possible answers.
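      As a worked illustration of the recognition chance level, the sketch below assumes 12 recognition trials (one per face), 3 response options per trial, and independent uniform guessing. Under these assumptions the expected score by guessing is 12 × 1/3 = 4 correct (about 33%), and the binomial tail gives the probability of reaching a given score by chance alone. The number of recognition trials is our assumption, as it is not stated here.

```python
from math import comb


def chance_tail(n_trials, n_options, k):
    """P(at least k correct out of n_trials) when guessing uniformly among n_options."""
    p = 1 / n_options
    return sum(comb(n_trials, j) * p**j * (1 - p) ** (n_trials - j)
               for j in range(k, n_trials + 1))


n_trials, n_options = 12, 3  # assumed recognition layout
print("expected by chance:", n_trials / n_options)     # 4.0
print("chance level (%):", round(100 / n_options, 1))  # 33.3
print("P(>= 8 correct by guessing):", round(chance_tail(n_trials, n_options, 8), 4))
```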

      4c. Although not stated clearly, if I understand FNAT correctly, the task is based on just 12 presentations. Each point in Figure 2A represents a different participant. Unfortunately, there is no way of linking the performance of individual participants across the conditions with the information provided. Lines joining performance for each participant would be useful in this regard. Because there are only 12 faces, the results are quantized in multiples of 100/12 % in Figure 3A. While I do not doubt that the authors did their homework in terms of the statistical analyses, it is difficult to get too excited about these 12 measurements. For example, take Figure 3A immediate condition TOTAL, arguably the largest effect in the whole paper. It seems that on average, the participants may remember one more face/name/occupation.

      Thank you for the suggestion. To improve clarity, we added graphs with lines linking the performance of individual participants across conditions (see revised Fig. 2). We apologize for the lack of clarity in the description of the FNAT. As you correctly pointed out, we computed the percentage based on the single associations between face, name and occupation (12 in total). However, each association consisted of three items, resulting in a total of 36 items to learn and associate; we added a paragraph to make this more explicit in the manuscript (lines 425-430).

      “We considered a correct association when a subject was able to recall all the information for each item (i.e. face, name and occupation), resulting in a total of 36 items to learn and associate. To further investigate the effect on FNAT we also computed a partial recall score accounting for those items where subjects correctly matched only names with faces (FNAT NAME) and only occupations with faces (FNAT OCCUPATION). See supplementary information for score details.”

      In the example you mentioned, participants were, on average, able to correctly recall and associate three more items than in the other conditions. While this difference may not seem striking at first glance, it is important to consider that we assessed memory performance after a single, three-minute stimulation session. Similar effects are typically observed only after multiple stimulation sessions (Koch et al., NeuroImage, 2018; Grover et al., Nature Neuroscience, 2022). Moreover, memory performance changes are often measured with a limited set of stimuli due to methodological constraints related to memory capacity. For example, the Rey Auditory Verbal Learning Test, which requires learning and recalling 15 words, is a typical test used to detect memory changes (Koch et al., NeuroImage, 2018; Benussi et al., Brain Stimulation, 2021; Benussi et al., Annals of Neurology, 2022).

      4d. Block effects. If I understand correctly, the experiments were conducted in blocks. This is always problematic. Here is one example study that articulated the big problems in block designs (Li et al TPAMI 2021): https://ieeexplore.ieee.org/document/9264220

      Thank you for the interesting reference. According to this paper, in a block design, EEG or fMRI recordings are performed in response to different stimuli of a given class presented in succession. If this is the case, it does not correspond to our experimental design where both TMS-EEG and fMRI were conducted in resting state on different days according to the different stimulation conditions.

      4e. Even if we ignore the lack of experimental descriptions, problems with lack of evidence of brain activity, the minimalistic study of 12 faces, problems with the block design, etc. at the end of the day, the results are extremely weak. In FNAT, some results are statistically significant, some are not. The interpretation of all of this is extremely complex. Continuing with Figure 3A, it seems that the author claims that iTBS+gtACS > iTBS+sham-tACS, but iTBS+gtACS ~ sham+sham. I am struggling to interpret such a result. When separating results by name and occupation, the results are even more perplexing. There is only one condition that is statistically significant in Figure 3A NAME and none in the occupation condition.

      Thank you again for your feedback. We hope our previous responses have thoroughly addressed your initial concerns, and we now turn to your observations regarding the behavioral results, assuming you were referring to Figure 2A. The main finding of this study is the improvement in long-term memory performance, specifically the ability to correctly recall the association between face, name, and occupation (total FNAT), which was significantly enhanced in both Experiments 1 and 2. However, we also aimed to explore the individual contributions of name and occupation separately to gain a deeper understanding of the results. Our analysis revealed that the improvement in total FNAT was primarily driven by an increase in name recall rather than occupation recall. We understand that this may have caused some confusion. We therefore modified the manuscript (lines 97-99; 107-111; 425-430) to make this clearer and moved the graphs for FNAT NAME and OCCUPATION from Fig. 2 in the main text to Fig. S4 in the supplementary information.

      “Dual iTBS+γtACS increased performance in recalling the association between face, name and occupation (FNAT accuracy) for both immediate (F<sub>2,38</sub>=7.18; p=0.002; η<sup>2</sup><sub>p</sub>=0.274) and delayed (F<sub>2,38</sub>=5.86; p=0.006; η<sup>2</sup><sub>p</sub>=0.236) recall (Fig. 2, panel A).”

      “The in-depth analysis of FNAT accuracy investigating the specific contributions of face-name and face-occupation recall revealed that dual iTBS+γtACS increased performance in the association between face and name (FNAT NAME) in delayed recall (F<sub>2,38</sub>=3.46; p=0.042; η<sup>2</sup><sub>p</sub>=0.154; iTBS+γtACS vs. sham-iTBS+sham-tACS: 42.9±21.5 % vs. 33.8±19 %; p=0.048 Bonferroni corrected) (Fig. S4, supplementary information).”

      “We considered a correct association when a subject was able to recall all the information for each item (i.e. face, name and occupation), resulting in a total of 36 items to learn and associate. To further investigate the effect on FNAT we also computed a partial recall score accounting for those items where subjects correctly matched only names with faces (FNAT NAME) and only occupations with faces (FNAT OCCUPATION). See supplementary information for score details.”
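      The partial eta-squared values quoted above can be cross-checked from the F statistics alone: since F = (SS_effect/df_effect)/(SS_error/df_error), partial eta squared = (F × df_effect) / (F × df_effect + df_error). A minimal sketch with the reported F values and df = (2, 38); the function name is ours.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


reported = {"FNAT immediate": 7.18, "FNAT delayed": 5.86, "FNAT NAME delayed": 3.46}
for label, f in reported.items():
    print(label, round(partial_eta_squared(f, 2, 38), 3))
# FNAT immediate 0.274
# FNAT delayed 0.236
# FNAT NAME delayed 0.154
```

      The recovered values match the effect sizes reported in the text, which is a quick consistency check on the F, df and effect-size triplets.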

      Regarding the stimulation conditions, your concerns about the performance pattern (iTBS+gtACS > iTBS+sham-tACS, but iTBS+gtACS ~ sham+sham) are understandable. However, this new protocol was developed precisely in response to the variability observed in behavioral outcomes following non-invasive brain stimulation, particularly when used to modulate memory functions (Corp et al., 2020; Pabst et al., 2022). As discussed in the manuscript, it is intended as a boost to conventional non-invasive brain stimulation protocols, leveraging the mechanisms outlined in the Discussion section.

      (5) In sum, it would be amazing to be able to use non-invasive stimulation for any kind of therapeutic purpose as the authors imagine. More work needs to be done to convince ourselves that this kind of approach is viable. The evidence provided in this study is weak.

      We hope our response will be carefully considered, fostering a constructive exchange and leading to a reassessment of your evaluation.

      Reviewer #2 (Public review):

      Summary:

      The manuscript "Dual transcranial electromagnetic stimulation of the precuneus-hippocampus network boosts human long-term memory" by Borghi and colleagues provides evidence that the combination of intermittent theta burst TMS stimulation and gamma transcranial alternating current stimulation (γtACS) targeting the precuneus increases long-term associative memory in healthy subjects compared to iTBS alone and sham conditions. Using a rich dataset of TMS-EEG and resting-state functional connectivity (rs-FC) maps and structural MRI data, the authors also provide evidence that dual stimulation increased gamma oscillations and functional connectivity between the precuneus and hippocampus. Enhanced memory performance was linked to increased gamma oscillatory activity and connectivity through white matter tracts.

      Strengths:

      The combination of personalized repetitive TMS (iTBS) and gamma tACS is a novel approach to targeting the precuneus, and thereby, connected memory-related regions to enhance long-term associative memory. The authors leverage an existing neural mechanism engaged in memory binding, theta-gamma coupling, by applying TMS at theta burst patterns and tACS at gamma frequencies to enhance gamma oscillations. The authors conducted a thorough study that suggests that simultaneous iTBS and gamma tACS could be a powerful approach for enhancing long-term associative memory. The paper was well-written, clear, and concise.

      Weaknesses:

      (1) The study did not include a condition where γtACS was applied alone. This was likely because a previous work indicated that a single 3-minute γtACS did not produce significant effects, but this limits the ability to isolate the specific contribution of γtACS in the context of this target and memory function.

      Thank you for your comments. As you pointed out, we did not include a condition in which γtACS was applied alone. This decision was based on the findings of Guerra et al. (Brain Stimulation 2018), who investigated the same protocol and reported no aftereffects. Given the substantial burden of the experimental design on patients and our primary goal of demonstrating an enhancement of effects compared to the standalone iTBS protocol, we decided to leave out this condition. However, you raise an important point that deserves further discussion, and we have modified the Limitations section accordingly (lines 290-297).

      “We did not assess the effects of γtACS alone. This decision was based on the findings of Guerra et al. (Guerra et al., 2018), who investigated the same protocol and reported no aftereffects. Given the substantial burden of the experimental design on patients and our primary goal of demonstrating an enhancement of effects compared to the standalone iTBS protocol, we decided to leave out this condition. While examining the effects of γtACS alone could help isolate its specific contribution to this target and memory function, extensive research has shown that achieving a cognitive enhancement aftereffect with tACS alone typically requires around 20–25 minutes of stimulation (Grover et al., 2023).”

      (2) The authors applied stimulation for 3 minutes, which seems to be based on prior tACS protocols. It would be helpful to present some rationale for both the duration and timing relative to the learning phase of the memory task. Would you expect additional stimulation prior to recall to benefit long-term associative memory?

      Thank you for your comment and for raising this interesting point. As you correctly noted, the protocol we used lasts three minutes, a choice based on previous studies demonstrating that, from a neurophysiological point of view, the combined protocol is more effective than either stimulation alone. Specifically, these studies showed that the combined stimulation enhanced gamma-band oscillations and increased cortical plasticity (Guerra et al., Brain Stimulation 2018; Maiella et al., Scientific Reports 2022). Given that the precuneus (Brodt et al., Science 2018; Schott et al., Human Brain Mapping 2018), gamma oscillations (Osipova et al., Journal of Neuroscience 2006; Deprés et al., Neurobiology of Aging 2017; Griffiths et al., Trends in Neurosciences 2023), and cortical plasticity (Brodt et al., Science 2018) are all associated with memory formation and encoding processes, we decided to apply the co-stimulation immediately before the learning phase of the memory task to maximize its efficacy on encoding. We added this rationale to the manuscript (lines 48-60).

      “Repetitive transcranial magnetic stimulation (rTMS) and transcranial alternating current stimulation (tACS) are two forms of NIBS widely used to enhance memory performance (Grover et al., 2022; Koch et al., 2018; Wang et al., 2014). rTMS, based on Faraday's law of electromagnetic induction, depolarizes cortical neuronal assemblies and leads to after-effects that have been linked to changes in synaptic plasticity involving mechanisms of long-term potentiation (LTP) (Huang et al., 2017; Jannati et al., 2023). tACS, on the other hand, causes rhythmic fluctuations in neuronal membrane potentials, which can bias spike timing and entrain neural activity (Wischnewski et al., 2023). In particular, the induction of gamma oscillatory activity has been proposed to play an important role in a type of LTP known as spike timing-dependent plasticity, which depends on a precise temporal delay between the firing of a presynaptic and a postsynaptic neuron (Griffiths and Jensen, 2023). Both LTP and gamma oscillations are strongly linked to memory processes such as encoding (Bliss and Collingridge, 1993; Griffiths and Jensen, 2023; Rossi et al., 2001), pointing to rTMS and tACS as good candidates for memory enhancement.”

      Regarding whether stimulation could also benefit recall, we believe this is plausible, and we speculate that repeating the stimulation before recall might provide an additional boost. This is supported by evidence that both the precuneus and gamma oscillations are involved in recall processes (Flanagin et al., Cerebral Cortex 2023; Griffiths et al., Trends in Neurosciences 2023). Furthermore, previous research suggests that reinstating during recall the brain state present during encoding can enhance recall performance (Javadi et al., The Journal of Neuroscience 2017). We added this consideration to the Discussion (lines 305-311).

      “Future studies should further investigate the effects of stimulation on distinct memory processes. In particular, stimulation could be applied before retrieval (Rossi et al., 2001), to better elucidate its specific contribution to the observed enhancements in memory performance. Additionally, it would be worth examining whether repeated stimulation - administered both before encoding and before retrieval - could produce a boosting effect. This is especially relevant in light of findings showing that matching the brain state between retrieval and encoding can significantly enhance memory performance (Javadi et al., 2017).”

      (3) How was the burst frequency of theta iTBS and gamma frequency of tACS chosen? Were these also personalized to subjects' endogenous theta and gamma oscillations? If not, were increases in gamma oscillations specific to patients' endogenous gamma oscillation frequencies or the tACS frequency?

      The stimulation protocol was chosen based on previous studies (Guerra et al., Brain Stimulation 2018; Maiella et al., Scientific Reports 2022). The gamma tACS sinusoidal waveform was set at 70 Hz, while iTBS consisted of 2-s trains of ten bursts, each of three pulses at 50 Hz, with trains repeated every 10 s (an 8-s pause between consecutive trains), for a total of 600 pulses over 190 s (see the iTBS+γtACS neuromodulation protocol section). The theta-burst pattern of iTBS was inspired by protocols used in animal models to elicit LTP in the hippocampus (Huang et al., Neuron 2005). Consequently, neither the theta-burst frequency of iTBS nor the gamma frequency of tACS was personalized. The increase in gamma oscillations was measured relative to each participant's baseline and did not correspond to the administered tACS frequency.
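      The iTBS timing described above can be made concrete with a short sketch that generates the pulse onset times. Two parameters are inferred rather than stated: bursts are repeated at 5 Hz (ten bursts within a 2-s train) and there are 20 trains (600 / (10 × 3)). The function name and its defaults are hypothetical, and the sketch covers only the iTBS timing, not the 70-Hz tACS waveform or how it is aligned with the pulses.

```python
def itbs_pulse_times(n_trains=20, bursts_per_train=10, pulses_per_burst=3,
                     intra_burst_hz=50.0, burst_hz=5.0, train_period_s=10.0):
    """Return pulse onset times (s) for an intermittent theta-burst session."""
    times = []
    for train in range(n_trains):
        train_start = train * train_period_s          # a new train every 10 s
        for burst in range(bursts_per_train):
            burst_start = train_start + burst / burst_hz  # bursts at 5 Hz (theta)
            for pulse in range(pulses_per_burst):
                times.append(burst_start + pulse / intra_burst_hz)  # 50-Hz triplets
    return times


pulses = itbs_pulse_times()
print(len(pulses))           # 600
print(round(pulses[-1], 2))  # 191.84
```

      The last pulse falls about 191.8 s after the first, consistent with the roughly 190-s session duration quoted.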

      (4) The authors do a thorough job of analyzing the increase in gamma oscillations in the precuneus through TMS-EEG; however, the authors may also analyze whether theta oscillations were also enhanced through this protocol due to the iTBS potentially targeting theta oscillations. This may also be more robust than gamma oscillations increases since gamma oscillations detected on the scalp are very low amplitude and susceptible to noise and may reflect activity from multiple overlapping sources, making precise localization difficult without advanced techniques.

      Thank you for the suggestion. We analyzed theta oscillations and found no changes.

      (5) Figure 4: Why are connectivity values pre-stimulation for the iTBS and sham tACS stimulation condition so much higher than the dual stimulation? We would expect baseline values to be more similar.

      We acknowledge that the pre-stimulation connectivity values for the iTBS and sham tACS conditions appear higher than those for the dual stimulation condition. However, as noted in our statistical analyses, there were no significant differences at baseline between conditions (p-FDR = 0.3514), suggesting that any apparent discrepancy is due to natural variability rather than systematic bias. One potential explanation for these differences is individual variability in baseline connectivity measures, which can fluctuate due to factors such as intrinsic neural dynamics, participant state, or measurement noise. Despite these variations, our statistical approach ensures that any observed post-stimulation effects are not confounded by pre-existing differences.

      (6) Figure 2: How are total association scores significantly different between stimulation conditions, but individual name and occupation associations are not? Further clarification of how the total FNAT score is calculated would be helpful.

      We apologize for any lack of clarity. The total FNAT score reflects the ability to correctly recall all the information associated with a person—specifically, the correct pairing of the face, name, and occupation. Participants received one point for each triplet they accurately recalled. The scores were then converted into percentages, as detailed in the Face-Name Associative Task Construction and Scoring section in the supplementary materials.

      Total FNAT was the primary outcome measure. However, we also analyzed name and occupation recall separately to better understand their partial contributions. Our analysis revealed that the improvement in total FNAT was primarily driven by an increase in name recall rather than occupation recall.

      We acknowledge that this distinction may have caused some confusion. To improve clarity, we revised the manuscript accordingly (lines 97-98; 107-111; 425-430).

      “Dual iTBS+γtACS increased performance in recalling the association between face, name and occupation (FNAT accuracy) for both immediate (F<sub>2,38</sub>=7.18; p=0.002; η<sup>2</sup><sub>p</sub>=0.274) and delayed (F<sub>2,38</sub>=5.86; p=0.006; η<sup>2</sup><sub>p</sub>=0.236) recall (Fig. 2, panel A).”

      “The in-depth analysis of FNAT accuracy investigating the specific contributions of face-name and face-occupation recall revealed that dual iTBS+γtACS increased performance in the association between face and name (FNAT NAME) in delayed recall (F<sub>2,38</sub>=3.46; p=0.042; η<sup>2</sup><sub>p</sub>=0.154; iTBS+γtACS vs. sham-iTBS+sham-tACS: 42.9±21.5 % vs. 33.8±19 %; p=0.048 Bonferroni corrected) (Fig. S4, supplementary information).”

      “We considered a correct association when a subject was able to recall all the information for each item (i.e. face, name and occupation), resulting in a total of 36 items to learn and associate. To further investigate the effect on FNAT we also computed a partial recall score accounting for those items where subjects correctly matched only names with faces (FNAT NAME) and only occupations with faces (FNAT OCCUPATION). See supplementary information for score details.”

      We also moved the data on the specific contributions of name and occupation recall to the supplementary information (Fig. S4) and further specified how the score was computed (lines 102-104).

      “The score was computed by deriving an accuracy percentage index dividing by 12 and multiplying by 100 the correct association sum. The partial recall scores were computed in the same way only considering the sum of face-name (NAME) and face-occupation (OCCUPATION) correctly recollected.”
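      The scoring rule above can be sketched in Python as follows. This is a minimal illustration, not the authors' scoring code: it assumes each of the 12 faces is recorded as two booleans (name recalled correctly, occupation recalled correctly), and the `ItemRecall` record and `fnat_scores` function are hypothetical names. Because a triplet counts toward the total score only when both the name and the occupation are correct, the total score can never exceed either partial score.

```python
from dataclasses import dataclass

N_FACES = 12  # each face is paired with one name and one occupation (36 items in total)


@dataclass(frozen=True)
class ItemRecall:
    """Hypothetical per-face record of a participant's recall."""
    name_correct: bool
    occupation_correct: bool


def fnat_scores(items: list[ItemRecall]) -> dict[str, float]:
    """Return total, NAME and OCCUPATION accuracy as percentages of the 12 faces."""
    if len(items) != N_FACES:
        raise ValueError(f"expected {N_FACES} faces, got {len(items)}")
    # A full triplet (face + name + occupation) counts only if both pairings are correct.
    total = sum(item.name_correct and item.occupation_correct for item in items)
    name = sum(item.name_correct for item in items)
    occupation = sum(item.occupation_correct for item in items)
    return {
        "total": total / N_FACES * 100,
        "name": name / N_FACES * 100,
        "occupation": occupation / N_FACES * 100,
    }


# Hypothetical responses: 6 full triplets, 3 name-only, 1 occupation-only, 2 misses.
example = (
    [ItemRecall(True, True)] * 6
    + [ItemRecall(True, False)] * 3
    + [ItemRecall(False, True)]
    + [ItemRecall(False, False)] * 2
)
print(fnat_scores(example))
```

      With these hypothetical responses, the sketch returns a total score of 50%, a NAME score of 75% and an OCCUPATION score of about 58.3%.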

      Reviewer #3 (Public review):

      Summary:

      Borghi and colleagues present results from 4 experiments aimed at investigating the effects of dual γtACS and iTBS stimulation of the precuneus on behavioral and neural markers of memory formation. In their first experiment (n = 20), they found that a 3-minute offline (i.e., prior to task completion) stimulation that combines both techniques leads to superior memory recall performance in an associative memory task immediately after learning associations between pictures of faces, names, and occupation, as well as after a 15-minute delay, compared to iTBS alone (+ tACS sham) or no stimulation (sham for both iTBS and tACS). Performance in a second task probing short-term memory was unaffected by the stimulation condition. In a second experiment (n = 10), they show that these effects persist over 24 hours and up to a full week after initial stimulation. A third (n = 14) and fourth (n = 16) experiment were conducted to investigate the neural effects of the stimulation protocol. The authors report that, once again, only combined iTBS and γtACS increase gamma oscillatory activity and neural excitability (as measured by concurrent TMS-EEG) specific to the stimulated area at the precuneus compared to a control region, as well as precuneus-hippocampus functional connectivity (measured by resting-state MRI), which seemed to be associated with structural white matter integrity of the bilateral middle longitudinal fasciculus (measured by DTI).

      Strengths:

      Combining non-invasive brain stimulation techniques is a novel, potentially very powerful method to maximize the effects of these kinds of interventions that are usually well-tolerated and thus accepted by patients and healthy participants. It is also very impressive that the stimulation-induced improvements in memory performance resulted from a short (3 min) intervention protocol. If the effects reported here turn out to be as clinically meaningful and generalizable across populations as implied, this approach could represent a promising avenue for the treatment of impaired memory functions in many conditions.

      Methodologically, this study is expertly done! I don't see any serious issues with the technical setup in any of the experiments (with the only caveat that I am not an expert in fMRI functional connectivity measures and DTI). It is also very commendable that the authors conceptually replicated the behavioral effects of experiment 1 in experiment 2 and then conducted two additional experiments to probe the neural mechanisms associated with these effects. This certainly increases the value of the study and the confidence in the results considerably.

      The authors used a within-subject approach in their experiments, which increases statistical power and allows for stronger inferences about the tested effects. They also individualized stimulation locations and intensities, which should further optimize the signal-to-noise ratio.

      Weaknesses:

      I want to state clearly that I think the strengths of this study far outweigh the concerns I have. I still list some points that I think should be clarified by the authors or taken into account by readers when interpreting the presented findings.

      I think one of the major weaknesses of this study is the overall low sample size in all of the experiments (between n = 10 and n = 20). This is, as I mentioned when discussing the strengths of the study, partly mitigated by the within-subject design and individualized stimulation parameters. The authors mention that they performed a power analysis, but this analysis seemed to be based on electrophysiological readouts similar to those obtained in experiment 3. It is thus unclear whether the other experiments were sufficiently powered to reliably detect the behavioral effects of interest. That being said, the authors do report significant effects, so they were by definition powered to find those. However, the effect sizes reported for their main findings are all relatively large, and it is known that significant findings from small samples may represent inflated effect sizes, which may hamper the generalizability of the current results. Ideally, the authors would replicate their main findings in a larger sample. Alternatively, I think running a sensitivity analysis to estimate the smallest effect the authors could have detected with a power of 80% could be very informative for readers to contextualize the findings. At the very least, however, I think it would be necessary to address this point as a potential limitation in the discussion of the paper.

      Thank you for the observation. As you mentioned, our power analysis was based on our previous study investigating the same neuromodulation protocol with a corresponding experimental design. The relatively small sample could be considered a limitation of the study, which we will add to the discussion. A fundamental future step will be to replicate these results in a larger population; however, to strengthen our results, we performed the sensitivity analysis you suggested.

      In detail, we performed a sensitivity analysis for repeated-measures ANOVA with α=0.05, power (1-β)=0.80 and no sphericity correction. For experiment 1, a sensitivity analysis with 1 group and 3 measurements showed a minimal detectable effect size of f=0.524 with 20 participants. In our paper, the ANOVA on total FNAT immediate performance revealed an effect size of η<sup>2</sup>=0.274, corresponding to f=0.614; the ANOVA on FNAT delayed performance revealed an effect size of η<sup>2</sup>=0.236, corresponding to f=0.556. For experiment 2, a sensitivity analysis for total FNAT immediate performance (1 group and 3 measurements) showed a minimal detectable effect size of f=0.797 with 10 participants. In our paper, the ANOVA on total FNAT immediate performance revealed an effect size of η<sup>2</sup>=0.448, corresponding to f=0.901. The sensitivity analysis for total FNAT delayed performance (1 group and 6 measurements) showed a minimal detectable effect size of f=0.378 with 10 participants. In our paper, the ANOVA on total FNAT delayed performance revealed an effect size of η<sup>2</sup>=0.484, corresponding to f=0.968. Thus, in every case the observed effect size exceeded the minimal detectable effect size, indicating that both experiments were sufficiently powered to detect effects of the magnitude we observed. We have now added this information to the statistical analysis and results sections (lines 99-100; 127-128; 130-131; 543-545), and we thank the reviewer for this suggestion.

      “The sensitivity analysis showed a minimal detectable effect size of η<sup>2</sup>=0.215 with 20 participants.”

      “The sensitivity analysis showed a minimal detectable effect size of η<sup>2</sup>=0.388 with 10 participants.”

      “The sensitivity analysis showed a minimal detectable effect size of η<sup>2</sup>=0.125 with 10 participants.”

      “Since we do not have an a priori effect size for experiment 1 and 2, we performed a sensitivity power analysis to ensure that these experiments were able to detect the minimum effect size with 80% power and alpha level of 0.05.”
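      The conversions between η<sup>2</sup> and Cohen's f used above follow the standard relations f = sqrt(η<sup>2</sup> / (1 - η<sup>2</sup>)) and η<sup>2</sup> = f<sup>2</sup> / (1 + f<sup>2</sup>). The Python sketch below re-derives the reported values as an arithmetic check only. The function names are illustrative, and the sketch does not reproduce the sensitivity analysis itself, which also depends on the power software's conventions for repeated measures (for example, the assumed correlation among measurements).

```python
import math


def eta_sq_to_f(eta_sq: float) -> float:
    """Cohen's f from (partial) eta squared: f = sqrt(eta^2 / (1 - eta^2))."""
    if not 0.0 <= eta_sq < 1.0:
        raise ValueError("eta squared must lie in [0, 1)")
    return math.sqrt(eta_sq / (1.0 - eta_sq))


def f_to_eta_sq(f: float) -> float:
    """Inverse conversion: eta^2 = f^2 / (1 + f^2)."""
    return f * f / (1.0 + f * f)


# (label, observed eta squared, minimal detectable f), copied from the response text.
checks = [
    ("Exp 1, immediate", 0.274, 0.524),
    ("Exp 1, delayed", 0.236, 0.524),
    ("Exp 2, immediate", 0.448, 0.797),
    ("Exp 2, delayed", 0.484, 0.378),
]

for label, observed_eta_sq, min_f in checks:
    observed_f = eta_sq_to_f(observed_eta_sq)
    print(
        f"{label}: observed f = {observed_f:.3f}, "
        f"minimal detectable f = {min_f:.3f} (eta^2 = {f_to_eta_sq(min_f):.3f}), "
        f"exceeds threshold: {observed_f > min_f}"
    )
```

      Each observed f exceeds the corresponding minimal detectable f, and converting f=0.524, 0.797 and 0.378 back gives η<sup>2</sup> of about 0.215, 0.388 and 0.125, matching the values quoted from the revised manuscript.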

      It seems that the statistical analysis approach differed slightly between studies. In experiment 1, the authors followed up significant effects of their ANOVAs with Bonferroni-adjusted post-hoc tests, whereas it seems that in experiment 2, those post-hoc tests were "exploratory", which may suggest they were uncorrected. In experiment 3, the authors use one-tailed t-tests to follow up their ANOVAs. Given some of the reported p-values, these choices suggest that some of the comparisons might have failed to reach significance if properly corrected. This is not a critical issue per se, as the important test in all these cases is the initial ANOVA, but non-significant (corrected) post-hoc tests might be another indicator of an underpowered experiment. My assumptions here might be wrong, but even then, I would ask the authors to be more transparent about the reasons for their choices or provide additional justification. Finally, the authors sometimes report exact p-values whereas other times they simply say p < .05. I would ask them to be consistent and recommend using exact p-values for every result where p >= .001.

      Thank you again for the suggestions. Your observations are correct: we used slightly different statistical approaches depending on our hypotheses. Here are the details:

      In experiment 1, we used a repeated-measures ANOVA with one factor, “stimulation condition” (iTBS+γtACS; iTBS+sham-tACS; sham-iTBS+sham-tACS). Following the significant effect of this factor, we performed post-hoc analyses with Bonferroni correction.

      In experiment 2, we used a repeated-measures ANOVA with two factors, “stimulation condition” and “time”. As expected, we observed a significant effect of condition, confirming the result of experiment 1, but no significant effect of time. This means that the neuromodulatory effect was present regardless of the time point. However, to explore whether the effect of stimulation condition was present at each time point, we performed exploratory t-tests without correction for multiple comparisons, since this was only an exploratory analysis.

      In experiment 3, we used the same approach as in experiment 1. However, since we had a specific hypothesis on the direction of the effect, already observed in our previous study (i.e., an increase in spectral power; Maiella et al., Scientific Reports 2022), our tests were one-tailed.

      Regarding p-values, we revised the manuscript to report exact values for every result.
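      The two adjustments discussed above can be made concrete with a short Python sketch. The p-values are hypothetical, not results from the study, and the helper names are illustrative. A Bonferroni correction multiplies each p-value by the number of comparisons (three pairwise comparisons for three stimulation conditions), capped at 1, while a one-tailed t-test halves the two-tailed p-value when the effect lies in the predicted direction, which relies on the symmetry of the t distribution.

```python
def bonferroni(p_values: list[float]) -> list[float]:
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def one_tailed_p(p_two_tailed: float, t_stat: float, expect_positive: bool = True) -> float:
    """One-tailed p-value from a two-tailed t-test p-value.

    Relies on the symmetry of the t distribution: if the observed t lies in the
    predicted direction, the one-tailed p is half the two-tailed p; otherwise it
    is one minus that half.
    """
    half = p_two_tailed / 2.0
    in_predicted_direction = (t_stat > 0) == expect_positive
    return half if in_predicted_direction else 1.0 - half


# Hypothetical p-values for the three pairwise comparisons among three conditions.
raw = [0.010, 0.030, 0.400]
print(bonferroni(raw))           # each p multiplied by 3, capped at 1
print(one_tailed_p(0.08, 1.9))   # effect in the predicted (positive) direction
print(one_tailed_p(0.08, -1.9))  # effect in the opposite direction
```

      In this hypothetical example, a comparison with an uncorrected p of 0.030 no longer falls below 0.05 after Bonferroni adjustment, whereas a two-tailed p of 0.08 becomes 0.04 when tested one-tailed in the predicted direction.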

      While the authors went to great lengths trying to probe the neural changes likely associated with the memory improvement after stimulation, it is impossible from their data to causally relate the findings from experiments 3 and 4 to the behavioral effects in experiments 1 and 2. This is acknowledged by the authors, and there are good methodological reasons why TMS-EEG and fMRI had to be collected in separate experiments, but it is still worth pointing out to readers that this limits inferences about how exactly dual iTBS and γtACS of the precuneus modulate learning and memory.

      Thank you for your comment. We fully agree with your observation, which is why this aspect was already considered among the study's limitations. To address your concern, we added the following sentence to the discussion of limitations (lines 299-301).

      “Consequently, these findings do not allow precise inferences regarding the specific mechanisms by which dual iTBS and γtACS of the precuneus modulate learning and memory.”

      There were no stimulation-related performance differences in the short-term memory task used in experiments 1 and 2. The authors argue that this demonstrates that the intervention specifically targeted long-term associative memory formation. While this is certainly possible, the STM task was a spatial memory task, whereas the LTM task relied (primarily) on verbal material. It is thus also possible that the stimulation effects were specific to a stimulus domain instead of memory type. In other words, could it be possible that the stimulation might have affected STM performance if the task taxed verbal STM instead? This is of course impossible to know without an additional experiment, but the authors could mention this possibility when discussing their findings regarding the lack of change in the STM task.

      Thank you for this interesting observation. We argue that the intervention primarily targeted long-term associative memory formation, as our findings demonstrated effects only on the FNAT. However, as you correctly pointed out, we cannot exclude the possibility that the stimulation may also influence short-term verbal associative memory. We added this point to the discussion of the absence of significant findings in the STM task (lines 205-210).

      “Visual short-term associative memory, measured by STBM performance, was not modulated by any experimental condition. Even if we cannot exclude the possibility that the stimulation could have influenced short-term verbal associative memory, we expected this result since short-term associative memory is known to rely on a distinct frontoparietal network while FNAT, used to investigate long-term associative memory, has already been associated with the neural activity of the PC and the hippocampus (Parra et al., 2014; Rentz et al., 2011).”

      While the authors discuss the potential neural mechanisms by which the combined stimulation conditions might have helped memory formation, the psychological processes are somewhat neglected. For example, do the authors think the stimulation primarily improves the encoding of new information or does it also improve consolidation processes? Interestingly, the beneficial effect of dual iTBS and γtACS on recall performance was very stable across all time points tested in experiments 1 and 2, as was the performance in the other conditions. Do the authors have any explanation as to why there seems to be no further forgetting of information over time in either condition when even at immediate recall, accuracy is below 50%? Further, participants started learning the associations of the FNAT immediately after the stimulation protocol was administered. What would happen if learning started with a delay? In other words, do the authors think there is an ideal time window post-stimulation in which memory formation is enhanced? If so, this might limit the usability of this procedure in real-life applications.

      Thank you for your comment and for raising these important points.

      We hypothesized that co-stimulation would enhance encoding processes. Previous studies have shown that co-stimulation can enhance gamma-band oscillations and increase cortical plasticity (Guerra et al., Brain Stimulation 2018; Maiella et al., Scientific Reports 2022). Given that the precuneus (Brodt et al., Science 2018; Schott et al., Human Brain Mapping 2018), gamma oscillations (Osipova et al., Journal of Neuroscience 2006; Deprés et al., Neurobiology of Aging 2017; Griffiths et al., Trends in Neurosciences 2023), and cortical plasticity (Brodt et al., Science 2018) have all been associated with encoding processes, we decided to apply co-stimulation before the encoding phase to boost it. We expanded the introduction to clarify the link between these neural mechanisms and the psychological process of encoding (lines 55-60).

      “In particular, the induction of gamma oscillatory activity has been proposed to play an important role in a type of LTP known as spike timing-dependent plasticity, which depends on a precise temporal delay between the firing of a presynaptic and a postsynaptic neuron (Griffiths and Jensen, 2023). Both LTP and gamma oscillations have a strong link with memory processes such as encoding (Bliss and Collingridge, 1993; Griffiths and Jensen, 2023; Rossi et al., 2001), pointing to rTMS and tACS as good candidates for memory enhancement.”

      We applied the co-stimulation immediately before the learning phase to maximize its potential effects. While we observed a significant increase in gamma oscillatory activity lasting up to 20 minutes, we cannot determine whether the behavioral effects we observed would have been the same with a co-stimulation applied 20 minutes before learning. Based on existing literature, a reduction in the efficacy of co-stimulation over time could be expected (Huang et al., Neuron 2005; Thut et al., Brain Topography 2009). However, we hypothesize that multiple stimulation sessions might provide an additional boost, helping to sustain the effects over time (Thut et al., Brain Topography 2009; Koch et al., Neuroimage 2018; Koch et al., Brain 2022).

      Regarding the absence of further forgetting in any stimulation condition, we think that the clinical and demographic characteristics of the sample (i.e., young and healthy participants) explain the near absence of forgetting after one week.

      Reviewer #1 (Recommendations for the authors):

      To address the concerns, the authors should:

      (1) Include invasive neuronal recordings (e.g., in rats or monkeys if not possible in humans) demonstrating that the current stimulation protocol leads to direct changes in brain activity.

      We understand the first reviewer's interest in the neurophysiological correlates of the stimulation protocol; however, we are skeptical about this request, as we think it goes beyond the aims of the study. As already mentioned in our response to the reviewer, invasive neurophysiological recordings in humans are not feasible for this type of study due to ethical constraints. At the same time, studies on cadavers or rodents would not fully resolve the question. Indeed, the authors of the study cited by the reviewer (Mihály Vöröslakos et al., Nature Communications, 2018) highlight that definitive conclusions about the exact voltage required in the in-vivo human brain cannot be drawn, owing to significant differences between rats and humans, and between the in-vivo human brain and human cadavers, whose electrical conductivity is altered in postmortem tissue. Huang and colleagues addressed the difficulty of obtaining direct evidence of non-invasive brain stimulation (NIBS) effects in a review published in Clinical Neurophysiology in 2017. They concluded that using EEG to assess the brain response to TMS has great potential for a less indirect demonstration of the plasticity mechanisms induced by NIBS in humans.

      We conducted Experiments 3 and 4 precisely to investigate the changes in brain activity after the stimulation protocol. These experiments examined, respectively, the neurophysiological and connectivity changes induced by the stimulation in a non-invasive manner, using TMS-EEG and fMRI. The observed changes in brain oscillatory activity (increased gamma oscillatory activity), cortical excitability (enhanced posteromedial parietal cortex reactivity), and brain connectivity (strengthened connections between the precuneus and hippocampi) provide evidence of the effects of our non-invasive brain stimulation protocol, further supporting the behavioral data.

      Additionally, we carefully considered the issue of stimulation distribution and, in response, performed a biophysical modeling analysis and E-field calculation using the parameters employed in our study (see Supplementary Materials).

      Acknowledging the reviewer's point of view, we modified the manuscript accordingly, discussing this aspect both as a technical limitation and as a potential direction for future research (main text, lines 280-289).

      “Although we studied TMS and tACS propagation through E-field modeling and observed an increase in precuneus gamma oscillatory activity, excitability and connectivity with the hippocampi, we cannot exclude that our results might reflect the consequences of stimulating more superficial parietal regions other than the precuneus, nor can we report direct evidence of microscopic changes in the brain after the stimulation. Invasive neurophysiological recordings in humans for this type of study are not feasible due to ethical constraints. Studies on cadavers or rodents would not fully resolve our question due to significant differences between them (i.e. rodents do not have an anatomical correspondence, while cadavers show alterations in electrical conductivity occurring in postmortem tissue). However, further exploration of this aspect in future studies would help in understanding γtACS+iTBS effects.”

      (2) Address all the technical questions about the experimental design.

      We addressed all the technical questions about the experimental design.

      (3) Repeat the experiments with randomized trial order and without a block design.

      The experiments were conducted with randomized trial order and we did not use a block design.

      (4) Add many more faces to the study. It is extremely difficult to draw any conclusion from merely 12 faces. Ideally, there would be lots of other relevant memory experiments where the authors show compelling positive results.

      We understand your perplexity about drawing conclusions from 12 faces; however, this is not the case. As we explained in our response to the reviewer, the task we implemented did not rely on the recall of merely 12 faces. Instead, participants had to correctly learn, associate and recall 12 faces, 12 names and 12 occupations, for a total of 36 items. To improve the clarity of the manuscript, we added a paragraph making this aspect more explicit (lines 425-430).

      “We considered a correct association when a subject was able to recall all the information for each item (i.e. face, name and occupation), resulting in a total of 36 items to learn and associate. To further investigate the effect on FNAT we also computed a partial recall score accounting for those items where subjects correctly matched only names with faces (FNAT NAME) and only occupations with faces (FNAT OCCUPATION). See supplementary information for score details.”

      The behavioral changes we observed are similar to those typically observed after multiple stimulation sessions (Koch et al., NeuroImage, 2018; Grover et al., Nature Neuroscience, 2022; Benussi et al., Annals of Neurology, 2022). Moreover, memory performance changes are often measured with a limited set of stimuli owing to methodological constraints related to memory capacity. For example, the Rey Auditory Verbal Learning Test, which requires learning and recalling 15 words, is a typical test used to detect memory changes (Koch et al., NeuroImage, 2018; Benussi et al., Brain Stimulation, 2021; Benussi et al., Annals of Neurology, 2022).

      (5) Provide a clear explanation of the apparent randomness of which results are statistically significant or not in Figure 3. But perhaps with many more experiments, a lot more memory evaluations, many more stimuli, and addressing all the other technical concerns, either the results will disappear or there will be a more interpretable pattern of results.

      We provided explanations for all the concerns shown by the reviewer.

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      (1) Figure 4: Why are connectivity values pre-stimulation for the iTBS and sham tACS stimulation condition so much higher than the dual stimulation? We would expect baseline values to be more similar.

      We acknowledge that the pre-stimulation connectivity values for the iTBS and sham tACS conditions appear higher than those for the dual stimulation condition. However, as noted in our statistical analyses, there were no significant differences at baseline between conditions (p-FDR=0.3514), suggesting that any apparent discrepancy is due to natural variability rather than systematic bias. One potential explanation for these differences is individual variability in baseline connectivity measures, which can fluctuate due to factors such as intrinsic neural dynamics, participant state, or measurement noise. Despite these variations, our statistical approach ensures that any observed post-stimulation effects are not confounded by pre-existing differences.

      (2) Figure 2: How are total association scores significantly different between stimulation conditions, but individual name and occupation associations are not? Further clarification of how the total FNAT score is calculated would be helpful.

      We apologize for any lack of clarity. The total FNAT score reflects the ability to correctly recall all the information associated with a person, specifically the correct pairing of face, name, and occupation. Participants received one point for each triplet they recalled accurately. The scores were then converted into percentages, as detailed in the Face-Name Associative Task Construction and Scoring section of the supplementary materials.

      Total FNAT was the primary outcome measure. However, we also analyzed name and occupation recall separately to better understand their partial contributions. Our analysis revealed that the improvement in total FNAT was primarily driven by an increase in name recall rather than occupation recall.

      We acknowledge that this distinction may have caused some confusion. To improve clarity, we revised the manuscript accordingly (lines 97-98; 107-111; 425-430).

      “Dual iTBS+γtACS increased the performances in recalling the association between face, name and occupation (FNAT accuracy) both for the immediate (F<sub>2,38</sub>=7.18; p=0.002; η<sup>2</sup><sub>p</sub>=0.274) and the delayed (F<sub>2,38</sub>=5.86; p=0.006; η<sup>2</sup><sub>p</sub>=0.236) recall performances (Fig. 2, panel A).”

      “The in-depth analysis of the FNAT accuracy investigating the specific contribution of face-name and face-occupation recall revealed that dual iTBS+γtACS increased the performances in the association between face and name (FNAT NAME) delayed recall (F<sub>2,38</sub>=3.46; p=0.042; η<sup>2</sup><sub>p</sub>=0.154; iTBS+γtACS vs. sham-iTBS+sham-tACS: 42.9±21.5 % vs. 33.8±19 %; p=0.048 Bonferroni corrected) (Fig. S4, supplementary information).”

      “We considered a correct association when a subject was able to recall all the information for each item (i.e. face, name and occupation), resulting in a total of 36 items to learn and associate. To further investigate the effect on FNAT we also computed a partial recall score accounting for those items where subjects correctly matched only names with faces (FNAT NAME) and only occupations with faces (FNAT OCCUPATION). See supplementary information for score details.”

      We also moved the data on the specific contributions of name and occupation recall to the supplementary information (Fig. S4) and further specified how the score was computed (lines 102-104).

      “The score was computed by deriving an accuracy percentage index dividing by 12 and multiplying by 100 the correct association sum. The partial recall scores were computed in the same way only considering the sum of face-name (NAME) and face-occupation (OCCUPATION) correctly recollected.”

      Reviewer #3 (Recommendations for the authors):

      A very small detail, in the caption for Figure 2A, OCCUPATION is described as being shown on the 'left' but it should be 'right'.

      We corrected this error.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Phytopathogens, including fungal pathogens such as F. graminearum, remain a major threat to agriculture and food security. Several agriculturally relevant fungicides, including the potent quinofumelin, have been discovered to date, yet their mechanisms of action and specific targets within the cell remain unclear. This paper sets out to help address these outstanding questions.

      We appreciate the reviewer's accurate summary of our manuscript.

      Strengths:

      The paper is generally well-written and provides convincing data to support its claims regarding the impact of quinofumelin on fungal growth, the target of the drug, and its potential mechanism. Critically, the authors identify FgDHODHII, a pyrimidine pathway gene encoding dihydroorotate dehydrogenase (DHODH), as part of the drug's mechanism in the prominent plant pathogen F. graminearum, confirming it as the target of quinofumelin. The evidence is supported by transcriptomic and metabolomic data as well as MST, SPR, and molecular docking/structural biology analyses.

      We appreciate the reviewer's recognition of the strengths of our manuscript.

      Weaknesses:

      Whilst the study adds to our knowledge about this drug, it is worth stating that previous reports (although in different organisms) by Higashimura et al., 2022 https://pmc.ncbi.nlm.nih.gov/articles/PMC9716045/ had already identified DHODH as the target of quinofumelin. This knowledge is therefore not new, and the authors may want to tone down the claim that they discovered this mechanism and give sufficient credit to the previous authors' work at the start of the introduction, rather than in passing as they did with reference 25. Other specific recommendations to improve the text are provided in the recommendations for authors section below.

      We appreciate the reviewer's suggestion. In the revised manuscript, we have incorporated the reference in the introduction section and expanded the discussion of previous work on quinofumelin by Higashimura et al., 2022 in the discussion section to more effectively contextualize their contributions. Moreover, we have made revisions and provided responses in accordance with the recommendations.

      Reviewer #2 (Public review):

      Summary:

      In the current study, the authors aim to identify the mode of action/molecular mechanism of a characterized fungicide, quinofumelin, and its biological impact on the transcriptome and metabolome of Fusarium graminearum and other Fusarium species. Two sets of data were generated comparing quinofumelin-treated and untreated groups, and differentially abundant transcripts and metabolites were identified. The authors further focused on the uridine/uracil biosynthesis pathway, considering the significant up- and down-regulation observed in final metabolites and some of the genes in the pathway. Using a deletion mutant of one of the genes and in vitro biochemical assays, the authors concluded that quinofumelin binds to dihydroorotate dehydrogenase.

      We appreciate the reviewer's accurate summary of our manuscript.

      Strengths:

      Omics datasets were leveraged to understand the physiological impact of quinofumelin, showing the intracellular impact of the fungicide. The characterization of FgDHODHII deletion strains with supplemented metabolites clearly showed the impact of the enzyme on fungal growth.

      We appreciate the reviewer's recognition of the strengths of our manuscript.

      Weaknesses:

      Some interpretation of results is not accurate and some experiments lack controls. The comparison between quinofumelin-treated deletion strains, in the presence of different metabolites didn't suggest the fungicide is FgDHODHII specific. A wild type is required in this experiment.

      Potential Impact: Confirming the target of quinofumelin may help in understanding its resistance mechanism and in further developing other inhibitory molecules against the target.

      The manuscript would benefit more in explaining the study rationale if more background on previous characterization of this fungicide on Fusarium is given.

      We appreciate the reviewer's suggestion. In the absence of quinofumelin, mycelial growth remains normal and does not require restoration. Under quinofumelin treatment, supplementation with downstream metabolites of the de novo pyrimidine biosynthesis pathway can restore the mycelial growth inhibited by quinofumelin. The wild-type control group is illustrated in Figure 4. Figure 5b depicts the phenotypes of the deletion mutants. Regarding the relationship among quinofumelin, FgDHODHII, and other metabolites, quinofumelin specifically targets the key enzyme FgDHODHII in the de novo pyrimidine biosynthesis pathway, disrupting the conversion of dihydroorotate to orotate and consequently inhibiting the synthesis of downstream metabolites, including uracil. In our previous study, quinofumelin not only exhibited excellent antifungal activity against mycelial growth and spore germination of F. graminearum but also inhibited the biosynthesis of deoxynivalenol (DON). We have added this information to the introduction.

      Reviewer #3 (Public review):

      Summary:

      The manuscript shows the mechanism of action of quinofumelin, a novel fungicide, against the fungus Fusarium graminearum. Through omics analysis, phenotypic analysis, and in silico approaches, the role of quinofumelin in targeting DHODH is uncovered.

      We appreciate the reviewer's accurate summary of our manuscript.

      Strengths:

      The phenotypic analysis and mutant generation are nice data and add to the role of metabolites in bypassing pyrimidine biosynthesis.

      We appreciate the reviewer's recognition of the strengths of our manuscript.

      Weaknesses:

      The role of DHODH in this class of fungicides is already known, and these data do not add further significance to the field. The work of Higashimura et al. is not appreciated well enough, as they already showed the role of quinofumelin on DHODH II.

      There is no mention of ipflufenoquin, the other fungicide in this class, although there are ample data on this molecule.

      We sincerely appreciate the reviewer's insightful comment regarding the work of Higashimura et al. We agree that their investigation into the role of quinofumelin in DHODH II inhibition provides critical foundational insights for this field. In the revised manuscript, we have cited their work in the Introduction section and expanded our discussion of it in the Discussion section to better contextualize their contributions. Information on the mechanism of action of ipflufenoquin against filamentous fungi has been added to the Discussion section.

      Reviewer #1 (Recommendations for the authors):

      (1) Given that the DHODH gene had been identified as a target earlier, could the authors perform BLAST searches with this gene instead and report the percentage similarity between the FgDHODHII gene and the Pyricularia oryzae class II DHODH gene described by Higashimura et al., 2022?

      A BLAST search revealed that the percentage similarity between the FgDHODHII gene and the class II DHODH gene of P. oryzae was 55.41%. We have added the following description to the Results section: ‘Additionally, the amino acid sequence of FgDHODHII exhibits 55.41% similarity to that of DHODHII from Pyricularia oryzae, as previously reported (Higashimura et al., 2022)’.

      (2) Abstract:

      The authors started abbreviating new terms (e.g., DEG, DMP) but then suddenly stopped and introduced UMP without spelling out the abbreviation. Please give the full form of every abbreviation in the text, e.g., UMP, STC, RM.

      We have provided the full meaning for all abbreviations as requested.

      (3) Introduction section:

      The introduction says very little about the work of other groups on quinofumelin. Please add this information with references, including the work of Higashimura et al., 2022, who did quite significant work on this topic but are not even mentioned in the background.

      We have added the work of other groups on quinofumelin to the Introduction section.

      (4) General statements:

      Please show a model of the pyrimidine pathway that quinofumelin attacks to make it easier for the reader to understand the context. The authors could simply adapt this from KEGG.

      We have added the model (Fig. 7).

      (5) Line 186:

      The authors did a great job of demonstrating interactions with quinofumelin and went to great lengths to perform MST, SPR, molecular docking, and structural biology analyses, yet in the end they provide no details about the specific amino acid residues involved in the interaction. I would suggest performing site-directed mutagenesis on FgDHODHII to identify the specific amino acid residues that interact with quinofumelin and to show that disrupting them weakens the quinofumelin–FgDHODHII interaction.

      Thank you for this insightful suggestion. We fully agree that elucidating the interaction mechanism is important. We are currently conducting site-directed mutagenesis studies based on the interaction sites predicted by docking and on the FgDHODHII mutation sites found in resistant mutants; however, owing to the limited accuracy of existing predictive models, this work is still ongoing. Additionally, we are performing co-crystallization experiments of FgDHODHII with quinofumelin to reveal their interaction pattern directly and precisely.

      (6) Line 76:

      What is the reference or evidence for the statement 'In addition, quinofumelin exhibits no cross-resistance to currently extensively used fungicides, indicating its unique action target against phytopathogenic fungi'?

      If two fungicides share the same mechanism of action, they will exhibit cross-resistance. Previous studies have demonstrated that quinofumelin retains effective antifungal activity against fungal strains resistant to commercial fungicides, indicating that quinofumelin does not exhibit cross-resistance with other commercially available fungicides and has a novel mechanism of action. We have added this reasoning to the manuscript.

      (7) Line 80-82:

      Again, considering the work of previous authors, this target is not newly discovered. Please consider toning down this statement 'This newly discovered selective target for antimicrobial agents provides a valuable resource for the design and development of targeted pesticides.'

      We have rewritten this sentence.

      (8) Line 138: If the authors identified DHODH in the experimental groups (I assume in F. graminearum), what was its exact locus tag or gene name in F. graminearum? Why not simply continue with the gene you identified? What is the point of running BLAST again to find the DHODH gene if it already came up in your transcriptomic or metabolomic studies? As written, this does not make sense, but it could be explained better.

      The information of FgDHODHII (gene ID: FGSG_09678) has been added. We have revised this part.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 40:

      Please add a reference.

      We have added the reference.

      (2) Line 47:

      Please add a reference.

      We have added the reference.

      (3) Line 50:

      The lack of target diversity in existing fungicides does not necessarily explain why discovering new targets is more challenging than identifying new fungicides within existing categories; please consider adjusting the argument here. Instead, the authors could discuss the reasons for the lack of new targets in the field.

      We have revised the description.

      (4) Line 63:

      Please cite your source with the new technology.

      We have added the reference.

      (5) Line 68:

      What are you referring to for "targeted medicine", do you have a reference?

      We have revised the description and the reference.

      (6) Line 74:

      One of the papers referred to "quinoxyfen", what are the similarities and differences between the two? Please elaborate for the readership.

      Quinoxyfen, like quinofumelin, contains a quinoline ring. It inhibits mycelial growth by disrupting the MAP kinase signaling pathway in fungi (https://www.frac.info). In addition, quinoxyfen still exhibits excellent antifungal activity against quinofumelin-resistant mutants (findings from our group), indicating that the mechanisms of action of quinofumelin and quinoxyfen differ.

      (7) Line 84:

      Please introduce why RNA-Seq was designed in the study first. What were the groups compared? How was the experiment set up? Without this background, it is hard to know why and how you did the experiment.

      Following your suggestion, we have added this description to the Results section. In addition, the experimental procedure is described in the Materials and methods section as follows: A total of 20 mL of YEPD medium containing 1 mL of conidial suspension (1×10⁵ conidia/mL) was incubated with shaking (175 rpm) at 25°C. After 24 h, quinofumelin was added to the medium at a concentration of 1 μg/mL, while an equal volume of dimethyl sulfoxide was added to the control (CK). Incubation continued for another 48 h, after which the hyphae were collected by filtration. Gene expression was then quantified, and differences between groups were analyzed based on the DESeq2 results.

      (8) Figures:

      The figure labels are missing (Figures 1, 2, 3, etc.). Please reorder your figures to match the text.

      The figures have been inserted.

      (9) Line. 97:

      "Volcano plot" is a common plot to visualize DEGs, you can directly refer to the name.

      We have revised the description.

      (10) Figure 1d, 1e:

      Can you separate down- and up-regulated genes here? Does the count refer to gene number?

      The expression information for down- and up-regulated genes is presented in Figures 1a and 1b. The bubble plots, however, do not distinguish down- from up-regulated genes; they only display the significant enrichment of differentially expressed genes in specific metabolic pathways. To represent the data more clearly, we have added the detailed counts of down- and up-regulated genes for each metabolic pathway in Supplementary Tables S1 and S2. Here, "count" refers to the number of differentially expressed genes that fall within a given pathway.

      (11) Line 111:

      Again, no reasoning or description of why and how the experiment was done here.

      Based on the results of the KEGG enrichment analysis, DEMs are associated with pathways such as thiamine metabolism, tryptophan metabolism, nitrogen metabolism, amino sugar and nucleotide sugar metabolism, pantothenate and CoA biosynthesis, and nucleotide sugar production compounds synthesis. To investigate specifically which metabolic pathways are involved in the mechanism of action of quinofumelin, we performed further metabolomic experiments. We have added this description according to the reviewer’s suggestion.

      (12) Figure 2a:

      It seems many more metabolites were reduced than increased. Is this expected? Due to the antifungal activity of this compound, how sick is the fungus upon treatment? A physiological study on F. graminearum (in a dose-dependent manner) should be done prior to the omics study. Why do you think there's a stark difference between positive and negative modes in terms of number of metabolites down- and up-regulated?

      Quinofumelin demonstrates exceptional antifungal activity against Fusarium graminearum. The results indicate that the number of reduced metabolites significantly exceeds the number of increased metabolites upon quinofumelin treatment. Mycelial growth is markedly inhibited under quinofumelin exposure. Prior to conducting omics studies, we performed a series of physiological and biochemical experiments (refer to Qian Xiu's dissertation https://paper.njau.edu.cn/openfile?dbid=72&objid=50_49_57_56_49_49&flag=free). Upon quinofumelin treatment, the number of down-regulated metabolites notably surpasses that of up-regulated metabolites compared to the control group. Based on the findings from the down-regulated metabolites, we conducted experiments by exogenously supplementing these metabolites under quinofumelin treatment to investigate whether mycelial growth could be restored. The results revealed that only the exogenous addition of uracil can restore mycelial growth impaired by quinofumelin.

      Quinofumelin exhibits excellent antifungal activity against F. graminearum. At a concentration of 1 μg/mL, quinofumelin inhibits mycelial growth by up to 90%. This indicates that the physiological activities of F. graminearum are significantly disrupted by quinofumelin. Consequently, the down- and up-regulated metabolites differ markedly between the quinofumelin-treated group and the untreated control group. The detailed results are presented in Figures 1 and 2.

      (13) Figure 2e:

      This is a good analysis. To help represent the data more clearly, the authors can consider representing the expression using fold change with a p-value for each gene.

      To more clearly represent the data, we have incorporated the information on significant differences in metabolites in the de novo pyrimidine biosynthesis pathway, as affected by quinofumelin, in accordance with the reviewer’s suggestions.

      (14) Line 142:

      Please indicate fold change and p-value for statistical significance. Did you validate this by RT-qPCR?

      We validated the expression level of the DHODH gene under quinofumelin treatment using RT-qPCR. The results indicated that, upon treatment with the EC50 and EC90 concentrations of quinofumelin, the expression of the DHODH gene was significantly reduced by 11.91% and 33.77%, respectively (P<0.05). The corresponding results have been shown in Figure S4.

      (15) Line 145:

      It looks like uracil is the only metabolite differentially abundant in the samples - how did you conclude this whole pathway was impacted by the treatment?

      The experiments involving the exogenous supplementation of uracil revealed that the addition of uracil could restore mycelial growth inhibited by quinofumelin. Consequently, we infer that quinofumelin disrupts the de novo pyrimidine biosynthesis pathway. In addition, as uracil is the end product of the de novo pyrimidine biosynthesis pathway, the disruption of this pathway results in a reduction in uracil levels.

      (16) Figure 3:

      What sequence was used as the root of the tree? Why were the species chosen? Since the BLAST query was Homo sapiens sequence, would it be good to use that as the root?

      The FgDHODHII sequence was used as the root of the tree. The selected fungal species represent significant plant-pathogenic fungi in agricultural production. Following your suggestion, we have removed the Homo sapiens BLAST query from Figure 3.

      (17) Figure 4:

      How were the concentrations used to test chosen?

      Prior to this experiment, we carried out concentration-dependent exogenous supplementation experiments. The results indicated that 50 μg/mL of uracil can fully restore mycelial growth inhibited by quinofumelin. Consequently, we chose 50 μg/mL as the testing concentration.

      (18) Line 164:

      Why do you hypothesize supplementing dihydroorotate would restore resistance? The metabolite seemed accumulated in the treatment condition, whereas downstream metabolites were comparable or even depleted. The DHODH gene expression was suppressed. Would accumulation of dihydroorotate be associated with growth inhibition by quinofumelin? Please include the hypothesis and rationale for the experimental setup.

      DHODH catalyzes the conversion of dihydroorotate to orotate in the de novo pyrimidine biosynthesis pathway. Inhibition of DHODH by quinofumelin results in the accumulation of dihydroorotate and the depletion of downstream metabolites, including UMP, uridine, and uracil. Consequently, the downstream metabolites served as positive controls, while the upstream metabolite dihydroorotate served as a negative control. This design further demonstrates that DHODH is the target of quinofumelin in F. graminearum. In addition, the accumulation of dihydroorotate is not associated with growth inhibition by quinofumelin; rather, the depletion of downstream metabolites in the de novo pyrimidine biosynthesis pathway is closely associated with it.

      (19) Line 168:

      I'm not sure if this conclusion is valid from your results in Figure 4 showing which metabolites restore growth.

      To minimize potential strain-specific effects, five strains were tested in the experiments shown in Figure 4. For each strain, the first column of the first row corresponds to the control condition, while the first column of the second row represents treatment with 1 μg/mL of quinofumelin, which completely inhibits mycelial growth. The second column of the second row shows that supplementation with 50 μg/mL of dihydroorotate fails to restore the mycelial growth inhibited by quinofumelin. In contrast, the third, fourth, and fifth columns of the second row demonstrate that supplementation with 50 μg/mL of UMP, uridine, and uracil, respectively, effectively restores the mycelial growth inhibited by quinofumelin.

      (20) Figure 5a:

      The fact you saw growth of the deletion mutant means it's not lethal. However, the growth was severely inhibited.

      Our experimental results indicate that the deletion is lethal. The mycelial growth observed originates from the mycelial plugs, which were not exposed to quinofumelin, rather than from the plates amended with quinofumelin.

      (21) Figure 5b:

      Would you expect different restoration of growth in the presence of quinofumelin vs. no treatment? The wild type control is missing here. Any conclusions about the relationship between quinofumelin, FgDHODHII, and other metabolites in the pathway?

      Without quinofumelin treatment, mycelial growth remains normal and does not require restoration. Under quinofumelin treatment, supplementation with downstream metabolites of the de novo pyrimidine biosynthesis pathway restores the mycelial growth that quinofumelin inhibits. The wild-type control group is shown in Figure 4, and Figure 5b shows the phenotypes of the deletion mutants. Regarding the relationship among quinofumelin, FgDHODHII, and other metabolites, quinofumelin specifically targets FgDHODHII, the key enzyme in the de novo pyrimidine biosynthesis pathway, disrupting the conversion of dihydroorotate to orotate and consequently inhibiting the synthesis of downstream metabolites, including uracil.

      (22) Figure 6b:

      Lacking positive and negative controls (known binder and non-binder). What does the Kd (in comparison to other interactions) indicate in terms of binding strength?

      We tested the antifungal activities of published DHODH inhibitors (such as leflunomide and teriflunomide) against F. graminearum. These inhibitors showed no significant inhibitory effect on strain PH-1, so we lacked an effective compound to use as a positive control in subsequent experiments. Biacore experiments offer detailed insights into the molecular interaction between quinofumelin and DHODHII. In Figure 6b, the left panel shows the time-dependent kinetic curve of quinofumelin binding to DHODHII. During the first 60 s after quinofumelin was injected over the DHODHII surface, it bound to the DHODHII immobilized on the chip, and the response value increased in proportion to the quinofumelin concentration. After the injection stopped at 60 s, quinofumelin spontaneously dissociated from the DHODHII surface, and the response value decreased accordingly. The fitted curve in the right panel indicates that the equilibrium dissociation constant (KD) of quinofumelin for DHODHII is 6.606×10⁻⁶ M, which falls within the typical range of KD values (10⁻³ to 10⁻⁶ M) for protein–small molecule interactions. A lower KD value indicates stronger affinity; thus, quinofumelin exhibits strong binding affinity for DHODHII.

      Reviewer #3 (Recommendations for the authors):

      The authors should add information about the other molecule within this class, ipflufenoquin, and what is known about it. There are already published data on its mode of action on DHODH and the role of pyrimidine biosynthesis.

      We have added the information regarding action mechanism of ipflufenoquin against filamentous fungi in discussion section.

      The work of Higashimura et al. is not appreciated well enough, as they already showed the role of quinofumelin on DHODH II.

      We sincerely appreciate the reviewer's insightful comment regarding the work of Higashimura et al. We agree that their investigation into the role of quinofumelin in DHODH II inhibition provides critical foundational insights for this field. In the revised manuscript, we have incorporated the reference in the introduction section and expanded the discussion of their work in the discussion section to more effectively contextualize their contributions.

      It is unclear how the protein model was established and this should be included. What species is the molecule from and how was it obtained? How are they different from Fusarium?

      The three-dimensional structural model of F. graminearum DHODHII protein, as predicted by AlphaFold, was obtained from the UniProt database. Additionally, a detailed description along with appropriate citations has been incorporated in the ‘Manuscript’ file.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      We thank the reviewer for the positive feedback on the work. The reviewer has raised two weaknesses and in the following we discuss how those can be addressed.  

      Weaknesses:

      The impact of the article is limited by the use of a network with discrete time steps and only a small number of time steps from stimulus to reward. The authors assume that each time step is on the order of hundreds of ms. They justify this by pointing to some slow intrinsic mechanisms, but they do not implement these slow mechanisms in a network with short time steps; instead, they assume without demonstration that these could work as suggested. This is a reasonable first approximation, but its validity should be explicitly tested.

      Our goal here was to give a proof of concept that online random feedback is sufficient to train an RNN to estimate value. We agree that it is important to show that the idea works in a model where the slow mechanisms are explicitly implemented. However, this is a non-trivial task that we leave for future work.

      As the delay between cue and reward increases, performance decreases. This is not surprising given the proposed mechanism, but it is still a limitation, especially since we do not really know what a reasonable value for a single time step is.

      In reply to this comment and the other reviewer's related comment, we conducted two additional sets of simulations: one examining the incorporation of eligibility traces, and the other considering (though not mechanistically implementing) behavioral time-scale synaptic plasticity (BTSP). We have added these results to the revised manuscript as an Appendix. We think these results address this point to some extent, although how longer cue-reward delays can be learnt by extending the model remains an issue for future work.

      Reviewer #2 (Public Review):

      We thank the reviewer for the positive feedback on the work. The reviewer gave comments on our revisions, and here we discuss how those can be addressed.

      Comments on revisions: I would still want to see how well the network learns tasks with longer time delays (on the order of 100 or even 1000 timesteps). Previous work has shown that random feedback struggles to encode longer timescales (see Murray 2019, Figure 2), so I would be interested to see how that translates to the RL context in your model.

      We would like to note that in Murray (2019), the random feedback per se did not appear to be primarily responsible for the difficulty in encoding longer timescales. In Figure 2d of Murray (2019), the author compared RFLO (random feedback local online) and BPTT with two intermediate algorithms, each incorporating only one of the two approximations made in RFLO: i) random feedback instead of symmetric feedback, and ii) omission of non-local effects (i.e., the dependence of the derivative of the loss with respect to a given weight on the other weights). The performance difference between RFLO and BPTT was mostly explained by ii), as the author stated: "The results show that the local approximation is essentially fully responsible for the performance difference between RFLO and BPTT, while there is no significant loss in performance due to the random feedback alone." (lines 6-8, page 7 of Murray, 2019, eLife).

      Meanwhile, regarding the performance of the model with random feedback versus the model with symmetric feedback in our settings, a difference already appeared in the case with 6 time steps or fewer (the biologically constrained model with random feedback performed worse: Fig. 6J, left).

      In practice, our model, with either random or symmetric feedback, would not be able to learn cases with very long delays. This is indeed a limitation of our model. However, our model differs critically from that of Murray (2019) in that we use RL rather than supervised learning, and we use a scalar bootstrapped (TD) reward-prediction error rather than the true output error. We think these differences may be major reasons for the limited learning ability of our model.

      Regarding the feasibility of the model for tasks involving longer time delays: this is indeed a problem, and the other reviewers have raised the same point. Our model can be extended by incorporating either a form of eligibility trace (similar to those used in RFLO and e-prop) or behavioral time-scale synaptic plasticity (BTSP), and we have added the results of simulations incorporating each to the revised manuscript as an Appendix. How longer cue-reward delays can be learnt by extending the model, however, remains an issue for future work.

      Reviewer #3 (Public Review):

      Comments on revisions: Thank you for addressing all my comments in your reply.

      We are happy to learn that all concerns raised by the reviewer in the previous round were addressed adequately. We agree with the reviewer that there are several ways the work can be improved.

      The various points raised by the reviewers under weaknesses will need to be addressed in future work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the authors):

      Suggestions:

      Although this study has an impressive dataset, I felt that some parts of the discussion would benefit from further explanation, specifically when discussing the differences in female aggression direction between groups with different sex compositions. The discussion suggests that males buffer female-on-female aggression and that they 'support' lower-ranking females (see line 212); however, the study only tested the sex composition of the group and does not provide any evidence of this buffering. Thus, I would suggest adding more information on how this buffering or protection from males might manifest (for example, listing male behaviours that might showcase this protection) or referencing other studies that support this claim. Another example of this can be found in lines 223-224, which suggest that females choose lower-ranking individuals when they are presented with a larger pool of competitors; however, lines 227-228 state that this result contradicts previous work in baboons, which makes the previous claim seem unjustified. I recommend adding other examples from studies that support the results of this paper and adding a line that addresses why these differences between gorillas and baboons might arise (for example, different social dynamics or ecological constraints). In addition, I suggest including physiological data such as direct measures of energy expenditure, caloric intake, or hormone levels, as these would strengthen the claims made in the second paragraph of the discussion. However, I understand this might not be possible due to data or time constraints, so I suggest adding a more robust justification of why lactation and pregnancy were used as a proxy for energetic need. In the methods (lines 127-128), it is unclear which phase of pregnancy or lactation is more energetically demanding. I would also suggest adding a comment on the limitations of using reproductive state to infer energetic need.
Lastly, if the data is available, I believe it would be interesting to add body size and age of the females or the size difference between aggressor and target as explanatory variables in the models to test if physiological characteristics influence female-on-female aggression.

      Male support:

      We have now added more references (Watts 1994, 1997) and strengthened our arguments regarding male presence buffering aggression. Previous research suggests that male gorillas may support lower-ranking females and may intervene in female-female conflicts (Sicotte 2002). Unfortunately, our dataset did not allow us to test for male protection. We conduct proximity scans every 10 minutes, and these scans are not linked to individual interactions, so we cannot reliably test whether proximity to a male influences the likelihood of receiving aggression.

      Number of competitors and choice of weaker competitors:

      We added a very relevant reference in humans, showing that people choose weaker competitors when they can choose. We removed the baboon example because it used sex ratio and its relevance to our study was not straightforward.

      Reproductive state as a proxy for energetic needs:

      We now mention clearly that reproductive state is an indirect measure of energetic needs.

      We rephrased our methods to: “Lactation is often considered more energetically demanding than pregnancy as a whole but the latest stages of pregnancy are highly energetically demanding, potentially even more than lactation”

      Unfortunately, we do not have access to physiological or body size data. Regarding female age, for many females the ages are estimates with errors of up to a decade, so we chose not to use age as a predictor. Having accurate values for all these variables would indeed be very valuable and would improve the predictive power of our study.

      Recommendations for writing and presentation:

      Overall, the manuscript is well-organised and well-written, but certain areas could be clearer. In the introduction, I believe that the term 'aggression heuristic' should be introduced earlier and properly defined to accommodate a broader audience. The main question and aims of the study are not stated clearly in the last paragraph of the introduction. In the methods, I think it would improve clarity to add a table classifying each type of agonistic interaction instead of naming them in the text. For example, a table could show the three intensity categories (severe, mild and moderate) and then list each behaviour (e.g. hit, bite, attack, etc.) with a short description. I think this would be helpful since some of the behaviours mentioned can be confusing (what's the difference between attack, hit and fight?). In addition, line 104 states that all interactions were assigned equal intensity, which needs to be explained.

      We now define aggression heuristics in both the abstract and the first paragraph of the introduction. We have also explained the aggressive interactions whose nature is not obvious from their names. We hope these explanations make the differences among the recorded behaviours clear.

      We have now specified that the “equal intensity” refers to avoidances and displacements used to infer power relationships: “We assigned to all avoidance/displacement interactions equal intensity, that is, equal influence to the power relationship of the interacting individuals”

      Minor corrections:

      (1) In line 41, there is a 1 after 'similar'. I am unsure if it's a mistake or a reference.

      We corrected the typo.

      (2) In lines 68-69, there is mention of other studies, but no references are provided.

      We added citations as suggested.

      (3) Remove the reference to Figure 1 (line 82) from the introduction; the figure should be referenced in the text just before the image, but your figure is in a different section.

      We removed the reference as suggested.

      (4) Line 98 and 136, it's written 'ad libtum' but the correct spelling is 'ad libitum'.

      We corrected the typo.

      (5) Figure 3, remove the underscores between the words in the axis titles.

      We removed the underscores.

      Reviewer #2 (Recommendations for the authors):

      Here, I have outlined some specific suggestions that require attention. Addressing these comments will improve the readability and quality of the manuscript.

      (1) L69. Add citation here, indicating the studies focusing on aggression rates.

      We added citations as suggested.

      (2) L88. The study periods used in this study and in the authors' previous study (Reference 11) are different. So please add a table as Table 1 showing detailed information on the sampling effort and the data included in the analysis of this study: for example, the study period, the numbers of females and males, sampling hours, the number of avoidance/displacement behaviors used to calculate individual Elo-ratings, and the number of mild/moderate/severe aggressive interactions.

      We have now added another table, as suggested (new Table 1) and we have also made clear that we used the hierarchies presented in detail in (Smit & Robbins 2025).

      (3) L103. Unless readers consult Reference 25, they will not know what the authors want to convey and why they mention the optimized Elo-rating method. Clarify this statement and add more content explaining the differences between the two methods, or just remove it.

      We rephrased the text and, in response to the previous comment, we now clearly state that more details about our approach are given in Smit & Robbins 2025. At the end of the relevant sentence, we added the following parenthesis: “(see “traditional Elo rating method”; we do not use the “optimized Elo-rating method” as it yields similar results and it is not widely used)”, and we removed the sentence referring to the optimized Elo-rating method.
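      For readers unfamiliar with the traditional Elo-rating method, a minimal sketch of its update rule is given below. It assumes the standard logistic expectation on a 400-point scale, a starting rating of 1000 and a hypothetical constant k = 100; these values are illustrative and are not necessarily those used in Smit & Robbins 2025. Using one fixed k for every event mirrors the "equal intensity" assumption, under which every avoidance/displacement shifts the ratings by the same rule.

```python
def expected_win(r_winner: float, r_loser: float) -> float:
    """Standard Elo expectation that the eventual winner would win (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))


def elo_update(ratings: dict, winner: str, loser: str, k: float = 100.0) -> dict:
    """Update ratings in place after one avoidance/displacement interaction.

    k = 100 is a hypothetical choice for illustration. The same k is used for
    every interaction, i.e. all interactions are given equal intensity.
    """
    p = expected_win(ratings[winner], ratings[loser])
    delta = k * (1.0 - p)  # an unexpected win moves the ratings more
    ratings[winner] += delta
    ratings[loser] -= delta
    return ratings


# Hypothetical females, all starting at 1000.
ratings = {"A": 1000.0, "B": 1000.0, "C": 1000.0}
# Each tuple is (winner, loser): the winner displaced, or was avoided by, the loser.
for w, l in [("A", "B"), ("A", "C"), ("B", "C")]:
    elo_update(ratings, w, l)
print(ratings)
```

      Because each update adds to one rating exactly what it subtracts from the other, the sum of ratings in a group stays constant; only the relative order and spacing change.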

      (4) L110. Here, the authors stated that the individual with the standardized Elo-score 1 was the highest-ranking. L117, the "aggression direction" score of each aggressive interaction was the standardized Elo-score of the aggressor, subtracting that of the recipient. So, when the "aggression direction" score was 1, it should mean that the aggressor was the highest-ranking and the recipient was the lowest-ranking female. This is not as the authors stated in L117-120 (where the description was incorrectly reversed). Please clarify.

      The highest-ranking individual indeed has an Elo_score equal to 1, and we calculated the interaction score (or "aggression direction score") of each aggressive interaction by subtracting the standardized Elo-score of the aggressor from that of the recipient (Elo_recipient – Elo_aggressor). So, when the aggressor is the lowest-ranking female (Elo_score=0) and the recipient is the highest-ranking female (Elo_score=1), the "aggression direction score" is 1 - 0 = 1.
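      The sign convention of the direction score can be checked with a short sketch. It assumes that standardization is a min-max rescaling of each group's Elo-ratings onto [0, 1]; this is an assumption, since the response states only that the top-ranked female scores 1 and the bottom-ranked female 0. The female IDs and ratings are hypothetical.

```python
def standardize(raw: dict) -> dict:
    """Min-max rescale raw Elo-ratings so the top female is 1 and the bottom is 0 (assumed method)."""
    lo, hi = min(raw.values()), max(raw.values())
    return {ind: (r - lo) / (hi - lo) for ind, r in raw.items()}


def direction_score(std: dict, aggressor: str, recipient: str) -> float:
    """Aggression direction score: Elo_recipient - Elo_aggressor (standardized scores)."""
    return std[recipient] - std[aggressor]


raw = {"F1": 1120.0, "F2": 1000.0, "F3": 880.0}  # hypothetical raw Elo-ratings
std = standardize(raw)                            # F1 -> 1.0, F2 -> 0.5, F3 -> 0.0
print(direction_score(std, "F3", "F1"))  # lowest-ranking attacks highest-ranking
print(direction_score(std, "F1", "F3"))  # highest-ranking attacks lowest-ranking
```

      Under this convention the score ranges from -1 to 1: positive values mean aggression directed up the hierarchy (toward a higher-ranking recipient), and negative values mean aggression directed down it.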

      (5) Regarding point 3 of the Public Review, please also revise/expand the paragraph L193-208 in the Discussion section accordingly.

      Please see our response to the public review. We have enriched the results section, added pairwise comparisons in a new table (Table 2) and modified the discussion accordingly.

      (6) Table 1. It's not clear why the authors added the column 'Aggression Rate' without providing any explanation in the Methods/Results section. How did they calculate the correlation between each tested variable and the "overall adult female aggression rates"? By correlating the number of females in the first trimester of pregnancy with the female aggression rates in each study group? What do the correlation coefficients mean? L202-204 may provide some hints as to why the authors introduced the Aggression Rate, but this should be made clear earlier in the text.

      We have now added more details to the table legend to make our point clear: “To highlight that aggression rates can increase due to an increase in interactions of different scores, we also include the effect of some of the tested variables on overall adult female aggression rates, based on results of linear mixed effects models from (Smit & Robbins 2024).” We did not include detailed methods for calculating those results because they are described in (Smit & Robbins 2024). We find it valuable to show the results for both aggression rates and aggression directionality with the same predictor variables, to clarify that aggression rates and aggression directionality are not always coordinated (they do not always change consistently relative to one another).

      (7) L166. This is not rigorous. Please rephrase. Only one western gorilla group, containing only one resident male, is included in the analysis.

      We have toned down our text: “Our results did not show any significant difference between female-female aggression patterns within the one western and four mountain gorilla groups.”

      (8) L167. I don't think the interaction scores in the third trimester of female pregnancy were significantly higher than those in the first trimester. The same concern applies in L194-195.

      We have now added a new table with post hoc pairwise comparisons among the different reproductive states that clarifies that.

      (9) L202. There is no column 'Aggression rates' in Table 1 of Reference 11.

      We have rephrased to make clear that we refer to Table 1 of the present study.

      (10) L204-205. Reference 49 may not be a proper citation here. This claim requires stronger evidence or further justification. Additionally, please rephrase and clarify the arguments in L204-208 for better readability and precision.

      We have added three more references and rephrased to clarify our argument.

      Reviewer #3 (Recommendations for the authors):

      (1) Line 41: The word "similar" is misspelled.

      We corrected the typo.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Chao et al. produced an updated version of the SpliceAI package using modern deep learning frameworks. This includes data preprocessing, model training, direct prediction, and variant effect prediction scripts. They also added functionality for model fine-tuning and model calibration. They convincingly evaluate their newly trained models against those from the original SpliceAI package and investigate how to extend SpliceAI to make predictions in new species. While their comparisons to the original SpliceAI models are convincing on the grounds of model performance, their evaluation of how well the new models match the original's understanding of non-local mutation effects is incomplete. Further, their evaluation of the new calibration functionality would benefit from a more nuanced discussion of what set of splice sites their calibration is expected to hold for, and tests in a context for which calibration is needed.

      Strengths:

      (1) They provide convincing evidence that their new implementation of SpliceAI matches the performance of the original model on a similar dataset while benefiting from improved computational efficiencies. This will enable faster prediction and retraining of splicing models for new species as well as easier integration with other modern deep learning tools.

      (2) They produce models with strong performance on non-human model species and a simple, well-documented pipeline for producing models tuned for any species of interest. This will be a boon for researchers working on splicing in these species and make it easy for researchers working on new species to generate their own models.

      (3) Their documentation is clear and abundant. This will greatly aid the ability of others to work with their code base.

      We thank the reviewer for these positive comments.  

      Weaknesses:

      (1) The authors' assessment of how much their model retains SpliceAI's understanding of "nonlocal effects of genomic mutations on splice site location and strength" (Figure 6) is not sufficiently supported. Demonstrating this would require showing that for a large number of (non-local) mutations, their model shows the same change in predictions as SpliceAI or that attribution maps for their model and SpliceAI are concordant even at distances from the splice site. Figure 6A comes close to demonstrating this, but only provides anecdotal evidence as it is limited to 2 loci. This could be overcome by summarizing the concordance between ISM maps for the two models and then comparing across many loci. Figure 6B also comes close, but falls short because instead of comparing splicing prediction differences between the models as a function of variants, it compares the average prediction difference as a function of the distance from the splice site. This limits it to only detecting differences in the model's understanding of the local splice site motif sequences. This could be overcome by looking at comparisons between differences in predictions with mutants directly and considering non-local mutants that cause differences in splicing predictions.

      We agree that two loci are insufficient to demonstrate preservation of non-local effects. To address this, we have extended our analysis to a larger set of sites: we randomly sampled 100 donor and 100 acceptor sites, applied our ISM procedure over a 5,001 nt window centered at each site for both models, and computed the ISM map as before. We then calculated the Pearson correlation between the collection of OSAI<sub>MANE</sub> and SpliceAI ISM importance scores. We also created 10 additional ISM maps similar to those in Figure 6A, which are now provided in Figure S23.

      The revised paragraph in the manuscript’s Results section follows:

      First, we recreated the experiment from Jaganathan et al. in which they mutated every base in a window around exon 9 of the U2SURP gene and calculated its impact on the predicted probability of the acceptor site. We repeated this experiment on exon 2 of the DST gene, again using both SpliceAI and OSAI<sub>MANE</sub>. In both cases, we found strong similarity between the patterns produced by SpliceAI and OSAI<sub>MANE</sub>, as shown in Figure 6A. To evaluate concordance more broadly, we randomly selected 100 donor and 100 acceptor sites and performed the same ISM experiment on each site. The per-site Pearson correlation between the SpliceAI and OSAI<sub>MANE</sub> ISM importance scores had an overall median of 0.857 (see Methods; additional DNA logos in Figure S23).

      To characterize the local sequence features that both models focus on, we computed the average decrease in predicted splice-site probability resulting from each of the three possible single-nucleotide substitutions at every position within 80 bp for 100 donor and 100 acceptor sites randomly sampled from the test set (Chromosomes 1, 3, 5, 7, and 9). Figure 6B shows the average decrease in splice site strength for each mutation in the format of a DNA logo, for both tools.

      We added the following text to the Methods section:

      Concordance evaluation of ISM importance scores between OSAI<sub>MANE</sub> and SpliceAI

      To assess agreement between OSAI<sub>MANE</sub> and SpliceAI across a broad set of splice sites, we applied our ISM procedure to 100 randomly chosen donor sites and 100 randomly chosen acceptor sites. For each site, we extracted a 5,001 nt window centered on the annotated splice junction and, at every coordinate within that window, substituted the reference base with each of the three alternative nucleotides. We recorded the change in predicted splice-site probability for each mutation and then averaged these Δ-scores at each position to produce a 5,001-score ISM importance profile per site.

      Next, for each splice site we computed the Pearson correlation coefficient between the paired importance profiles from the ensembled OSAI<sub>MANE</sub> and ensembled SpliceAI models. The median correlation across all splice sites was 0.857. Ten additional representative zoomed-in DNA logo comparisons of individual splice sites are provided in Supplementary Figure S23.

      (2) The utility of the calibration method described is unclear. When thinking about a calibrated model for splicing, the expectation would be that the models' predicted splicing probabilities would match the true probabilities that positions with that level of prediction confidence are splice sites. However, the actual calibration that they perform only considers positions as splice sites if they are splice sites in the longest isoform of the gene included in the MANE annotation. In other words, they calibrate the model such that the model's predicted splicing probabilities match the probability that a position with that level of confidence is a splice site in one particular isoform for each gene, not the probability that it is a splice site more broadly. Their level of calibration on this set of splice sites may very well not hold to broader sets of splice sites, such as sites from all annotated isoforms, sites that are commonly used in cryptic splicing, or poised sites that can be activated by a variant. This is a particularly important point as much of the utility of SpliceAI comes from its ability to issue variant effect predictions, and they have not demonstrated that this calibration holds in the context of variants. This section could be improved by expanding and clarifying the discussion of what set of splice sites they have demonstrated calibration on, what it means to calibrate against this set of splice sites, and how this calibration is expected to hold or not for other interesting sets of splice sites. Alternatively, or in addition, they could demonstrate how well their calibration holds on different sets of splice sites or show the effect of calibrating their models against different potentially interesting sets of splice sites and discuss how the results do or do not differ.

      We thank the reviewer for highlighting the need to clarify our calibration procedure. Both SpliceAI and OpenSpliceAI are trained on a single “canonical” transcript per gene: SpliceAI on the hg19 Ensembl/Gencode canonical set and OpenSpliceAI on the MANE transcript set. To calibrate each model, we applied post-hoc temperature scaling, i.e., a single learnable parameter that rescales the logits before the softmax. This adjustment does not alter the model’s ranking or discrimination (AUC/precision–recall) but simply aligns the predicted probabilities for donor, acceptor, and non-splice classes with their observed frequencies. As shown in our reliability diagrams (Fig. S16-S22), temperature scaling yields negligible changes in performance, confirming that both SpliceAI and OpenSpliceAI were already well-calibrated. However, we acknowledge that we did not measure how calibration might affect predictions on non-canonical splice sites or on cryptic splicing. It is possible that calibration might have a detrimental effect on those, but because this is not a key claim of our paper, we decided not to do further experiments. We have updated the manuscript to acknowledge this potential shortcoming; please see the revised paragraph in our next response.
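      Temperature scaling as described above can be sketched in a few lines. This is a minimal illustration, not the OpenSpliceAI implementation: the temperature is chosen here by grid search on the mean negative log-likelihood (NLL), whereas a PyTorch implementation would more likely optimize it by gradient descent, and the three-class logits (non-splice, acceptor, donor) are toy values that are deliberately overconfident.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature (shifted by the max for numerical stability)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def predicted_class(logits, temperature=1.0):
    """Index of the most probable class."""
    probs = softmax(logits, temperature)
    return max(range(len(probs)), key=probs.__getitem__)


def mean_nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    return -sum(math.log(softmax(z, temperature)[y])
                for z, y in zip(logit_rows, labels)) / len(labels)


def fit_temperature(logit_rows, labels):
    """Pick the temperature in [0.5, 3.0] (step 0.01) that minimizes mean NLL."""
    grid = [0.5 + 0.01 * i for i in range(251)]
    return min(grid, key=lambda t: mean_nll(logit_rows, labels, t))


# Toy logits for (non-splice, acceptor, donor): every row strongly favours
# class 0, but only 8 of the 10 true labels are class 0, so the model is overconfident.
logit_rows = [[6.0, 0.0, 0.0]] * 10
labels = [0] * 8 + [1] * 2

t_star = fit_temperature(logit_rows, labels)
print(round(t_star, 2))
```

      For these toy logits the optimum can be checked by hand: the NLL is minimized when the class-0 probability equals the observed 80% frequency, which gives T = 6 / ln 8 ≈ 2.89. Dividing every logit by the same positive T never changes which class has the largest logit, which is why temperature scaling leaves ranking and discrimination metrics untouched.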

      (3) It is difficult to assess how well their calibration method works in general because their original models are already well calibrated, so their calibration method finds temperatures very close to 1 and only produces very small and hard to assess changes in calibration metrics. This makes it very hard to distinguish if the calibration method works, as it doesn't really produce any changes. It would be helpful to demonstrate the calibration method on a model that requires calibration or on a dataset for which the current model is not well calibrated, so that the impact of the calibration method could be observed.

      It’s true that the models we calibrated didn’t need many changes. It is possible that the calibration methods we used (which were not ours, but which were described in earlier publications) can’t improve the models much. We toned down our comments about this procedure, as follows.

      Original:

      “Collectively, these results demonstrate that OSAIs were already well-calibrated, and this consistency across species underscores the robustness of OpenSpliceAI’s training approach in diverse genomic contexts.”

      Revised:

      “We observed very small changes after calibration across phylogenetically diverse species, suggesting that OpenSpliceAI’s training regimen yielded well‐calibrated models, although it is possible that a different calibration algorithm might produce further improvements in performance.”
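      One way to quantify the "very small changes" visible in reliability diagrams is the expected calibration error (ECE), which bins predictions by confidence and averages the gap between accuracy and mean confidence across bins. The response does not name a specific summary metric, so using ECE here is an assumption, and the confidences below are toy values.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: sum over confidence bins of (bin size / N) * |accuracy - mean confidence|.

    Bins are (b/n_bins, (b+1)/n_bins]; a confidence of exactly 0 goes into the first bin.
    """
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(acc - conf)
    return ece


correct = [1] * 8 + [0] * 2                                     # 8 of 10 predictions are right
calibrated = expected_calibration_error([0.8] * 10, correct)      # confidence matches accuracy
overconfident = expected_calibration_error([0.99] * 10, correct)  # gap of 0.99 - 0.8
print(calibrated, overconfident)
```

      A model that is already well calibrated has an ECE close to zero before scaling, leaving little room for any post-hoc method to show an improvement; this is the situation the reviewer describes.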

      Reviewer #2 (Public review):

      Summary:

      The paper by Chao et al offers a reimplementation of the SpliceAI algorithm in PyTorch so that the model can more easily/efficiently be retrained. They apply their new implementation of the SpliceAI algorithm, which they call OpenSpliceAI, to several species and compare it against the original model, showing that the results are very similar and that in some small species, pretraining on other species helps improve performance.

      Strengths:

      On the upside, the code runs fine, and it is well documented.

      Weaknesses:

      The paper itself does not offer much beyond reimplementing SpliceAI. There is no new algorithm, new analysis, new data, or new insights into RNA splicing. There is no comparison to many of the alternative methods that have since been published to surpass SpliceAI. Given that some of the authors are well-known with a long history of important contributions, our expectations were admittedly different. Still, we hope some readers will find the new implementation useful.

      We thank the reviewer for the feedback. To position our contribution more clearly, we have clarified that OpenSpliceAI is an open-source PyTorch reimplementation optimized for efficient retraining and transfer learning, that we use it to analyze cross-species performance gains, and that it is supported by a thorough benchmark and the release of several pretrained models.

      Reviewer #3 (Public review):

      Summary:

      The authors present OpenSpliceAI, a PyTorch-based reimplementation of the well-known SpliceAI deep learning model for splicing prediction. The core architecture remains unchanged, but the reimplementation demonstrates convincing improvements in usability, runtime performance, and potential for cross-species application.

      Strengths:

      The improvements are well-supported by comparative benchmarks, and the work is valuable given its strong potential to broaden the adoption of splicing prediction tools across computational and experimental biology communities.

      Major comments:

      Can fine-tuning also be used to improve prediction for human splicing? Specifically, are models trained on other species and then fine-tuned with human data able to perform better on human splicing prediction? This would enhance the model's utility for more users, and ideally, such fine-tuned models should be made available.

      We evaluated transfer learning by fine-tuning models pretrained on mouse (OSAI<sub>Mouse</sub>), honeybee (OSAI<sub>Honeybee</sub>), Arabidopsis (OSAI<sub>Arabidopsis</sub>), and zebrafish (OSAI<sub>Zebrafish</sub>) on human data. While transfer learning accelerated convergence compared to training from scratch, the final human splicing prediction accuracy was comparable between fine-tuned and scratch-trained models, suggesting that performance on our current human dataset is nearing saturation under this architecture.

      We added the following paragraph to the Discussion section:

      We also evaluated pretraining on mouse (OSAI<sub>Mouse</sub>), honeybee (OSAI<sub>Honeybee</sub>), zebrafish (OSAI<sub>Zebrafish</sub>), and Arabidopsis (OSAI<sub>Arabidopsis</sub>) followed by fine-tuning on the human MANE dataset. While cross-species pretraining substantially accelerated convergence during fine-tuning, the final human splicing-prediction accuracy was comparable to that of a model trained from scratch on human data. This result indicates that our architecture seems to capture all relevant splicing features from human training data alone, and thus gains little or no benefit from cross-species transfer learning in this context (see Figure S24).

      Reviewer #1 (Recommendations for the authors):

      We thank the editor for summarizing the points raised by each reviewer. Below is our point-by-point response to each comment:

      (1) In Figure 3 (and generally in the other figures) OpenSpliceAI should be replaced with OSAI_{Training dataset} because otherwise it is hard to tell which precise model is being compared. And in Figure 3 it is especially important to emphasize that you are comparing a SpliceAI model trained on Human data to an OSAI model trained and evaluated on a different species.

      We have updated the labels in Figure 3, replacing “OpenSpliceAI” with “OSAI_{training dataset}” to specify more clearly which model is being compared.

      (2) Are genes paralogous to training set genes removed from the validation set as well as the test set? If you are worried about data leakage in the test set, it makes sense to also consider validation set leakage.

      Thank you for this helpful suggestion. We fully agree, and to avoid any data leakage we implemented the identical filtering pipeline for both validation and test sets: we excluded all sequences paralogous or homologous to sequences in the training set, and further removed any sequence sharing > 80 % length overlap and > 80 % sequence identity with training sequences. The effect of this filtering on the validation set is summarized in Supplementary Figure S7C.

      Figure S7. (C) Scatter plots of DNA sequence alignments between validation and training sets for Human-MANE, mouse, honeybee, zebrafish, and Arabidopsis. Each dot represents an alignment, with the x-axis showing alignment identity and the y-axis showing alignment coverage. Alignments exceeding 80% for both identity and coverage are highlighted in the red-shaded region and were excluded from the validation sets.
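      The leakage filter described in this response amounts to a set difference over held-out sequence IDs. The sketch below is hypothetical: the `Alignment` record layout and the paralog ID set are assumed inputs (from an upstream alignment tool and a paralog/homolog annotation), not OpenSpliceAI data structures. The 80% identity and 80% coverage cutoffs are those stated above, applied as strict inequalities.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    query: str       # validation or test sequence ID
    target: str      # training sequence ID
    identity: float  # fraction of aligned positions that match (0-1)
    coverage: float  # fraction of the query length covered by the alignment (0-1)


def leaky_ids(alignments, identity_cutoff=0.8, coverage_cutoff=0.8):
    """IDs of held-out sequences that align to any training sequence above both cutoffs."""
    return {a.query for a in alignments
            if a.identity > identity_cutoff and a.coverage > coverage_cutoff}


def filter_split(seq_ids, alignments, paralog_ids=frozenset()):
    """Apply the same filter to a validation or test split: drop paralogs/homologs
    of training genes and any sequence in the red-shaded (>80%/>80%) region."""
    drop = leaky_ids(alignments) | set(paralog_ids)
    return [s for s in seq_ids if s not in drop]


alignments = [
    Alignment("v1", "t9", identity=0.95, coverage=0.90),  # both above cutoff: dropped
    Alignment("v2", "t3", identity=0.95, coverage=0.40),  # low coverage: kept
    Alignment("v3", "t1", identity=0.70, coverage=0.99),  # low identity: kept
]
print(filter_split(["v1", "v2", "v3", "v4"], alignments, paralog_ids={"v4"}))
```

      Requiring both cutoffs to be exceeded means a short, highly identical match (such as a shared repeat) does not by itself remove a sequence; paralogs are removed separately regardless of alignment statistics.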

      Reviewer #3 (Recommendations for the authors):

      (1) The legend in Figure 3 is somewhat confusing. The labels like "SpliceAI-Keras (species name)" may imply that the model was retrained using data from that species, but that's not the case, correct?

      Yes, “SpliceAI-Keras (species name)” was not retrained; it refers to the released SpliceAI model evaluated on the specified species dataset. We have revised the Figure 3 legends, changing “SpliceAI-Keras (species name)” to “SpliceAI-Keras” to clarify this.

      (2) Please address the minor issues with the code, including ensuring the conda install works across various systems.

      We have addressed the issues you mentioned. OpenSpliceAI is now available on Conda and can be installed with:  conda install openspliceai. 

      The conda package homepage is https://anaconda.org/khchao/openspliceai. We have also corrected all broken links in the documentation.

      (3) Utility:

      I followed all the steps in the Quick Start Guide, and aside from the issues mentioned below, everything worked as expected.

      I attempted installation using conda as described in the instructions, but it was unsuccessful. I assume this method is not yet supported.

      In Quick Start Guide: predict, the link labeled "GitHub (models/spliceai-mane/10000nt/)" appears to be incorrect. The correct path is likely "GitHub (models/openspliceai-mane/10000nt/)".

      In Quick Start Guide: variant (https://ccb.jhu.edu/openspliceai/content/quick_start_guide/quickstart_variant.html#quick-startvariant), some of the download links for input files were broken. While I was able to find some files in the GitHub repository, I think the -A option should point to data/grch37.txt, not examples/data/input.vcf, and the -I option should be examples/data/input.vcf, not data/vcf/input.vcf.

      Thank you for catching these issues. We’ve now addressed all issues concerning Conda installation and file links. We thank the editor for thoroughly testing our code and reviewing the documentation.

    1. Author Response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      This work uses a novel, ethologically relevant behavioral task to explore decision-making paradigms in C. elegans foraging behavior. By rigorously quantifying multiple features of animal behavior as they navigate a patchy food environment, the authors provide strong evidence that worms exhibit one of three qualitatively distinct behavioral responses upon encountering a patch: (1) "search", in which the encountered patch is below the detection threshold; (2) "sample", in which animals detect a patch encounter and reduce their motor speed, but do not stay to exploit the resource and are therefore considered to have "rejected" it; and (3) "exploit", in which animals "accept" the patch and exploit the resource for tens of minutes. Interestingly, the probability of these outcomes varies with the density of the patch as well as the prior experience of the animal. Together, these experiments provide an interesting new framework for understanding the ability of the C. elegans nervous system to use sensory information and internal state to implement behavioral state decisions.

      Strengths:

      The work uses a novel, neuroethologically-inspired approach to studying foraging behavior

      The studies are carried out with an exceptional level of quantitative rigor and attention to detail

      Powerful quantitative modeling approaches including GLMs are used to study the behavioral states that worms enter upon encountering food, and the parameters that govern the decision about which state to enter

      The work provides strong evidence that C. elegans can make 'accept-reject' decisions upon encountering a food resource

      Accept-reject decisions depend on the quality of the food resource encountered as well as on internally represented features that provide measurements of multiple dimensions of internal state, including feeding status and time

      Reviewer #2 (Public review):

      This study provides an experimental and computational framework to examine and understand how C. elegans make decisions while foraging in environments with patches of food. The authors show that C. elegans reject or accept food patches depending on a number of internal and external factors.

      The key novelty of this paper is its explicit use of behavior analysis and quantitative modeling to elucidate decision-making processes. In particular, it describes exploring vs. exploiting phases and sensing vs. non-sensing categories of foraging behavior based on the clustering of behavioral states in a multi-dimensional behavior-metrics space, and it implements a generalized linear model (GLM) whose parameters can be given quantitative biological interpretations.

      The work builds on the literature of C. elegans foraging by adding the reject/accept framework.

      Reviewer #3 (Public review):

      Summary:

      In this study by Haley et al, the authors investigated explore-exploit foraging using C. elegans as a model system. Through an elegant set of patchy environment assays, the authors built a GLM based on past experience that predicts whether an animal will decide to stay on a patch to feed and exploit that resource, instead of choosing to leave and explore other patches.

      Strengths:

      I really enjoyed reading this paper. The experiments are simple and elegant, and address fundamental questions of foraging theory in a well-defined system. The experimental design is thoroughly vetted, and the authors provide a considerable volume of data to prove their points. My only criticisms have to do with the data interpretation, which I think are easily addressable.

      Weaknesses:

      History-dependence of the GLM

      The logistic GLM seems like a logical way to model a binary choice, and I think the parameters you chose are certainly important. However, the framing of them seems odd to me. I do not doubt the animals are assessing the current state of the patch with an assessment of past experience; that makes perfect logical sense. However, it seems odd to reduce past experience to the categories of recently exploited patch, recently encountered patch, and time since last exploitation. This implies the animals have some way of discriminating these past patch experiences and committing them to memory. Also, it seems logical that the time on these patches, not just their density, should also matter, just as the time without food matters. Time is inherent to memory. This model also imposes a prior categorization in trying to distinguish between sensed vs. not-sensed patches, which I criticized earlier. Only "sensed" patches are used in the model, but it is questionable whether worms genuinely do not "sense" these patches.

      It seems more likely that the worm simply has some memory of chemosensation and relative satiety, both of which increase on patches and decrease while off of patches. The magnitudes are likely a function of patch density. That being said, I leave it up to the reader to decide how best to interpret the data.

      Model design: We agree with the reviewer that past experience is not likely to be discretized into the exact parameters of our model. We have added to our manuscript to further clarify this point (lines 645-647). Investigating the mechanisms behind this behavior is beyond the scope of this project but is certainly an exciting trajectory for future C. elegans research.
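      To make the discussion of the logistic GLM concrete, the sketch below shows how such a model combines predictors into an acceptance probability for a sensed encounter. The predictor names follow the categories listed by the reviewer, but the coefficient values and their signs are hypothetical and are not the fitted values from the study.

```python
import math

# Hypothetical coefficients (log-odds units), for illustration only; they are
# NOT the fitted values from the study, and their signs are assumptions.
COEFS = {
    "intercept": -1.0,
    "patch_density": 1.5,                # density of the patch being encountered
    "recent_exploited_density": -0.8,    # density of the most recently exploited patch
    "recent_encountered_density": -0.4,  # density of the most recently encountered patch
    "minutes_since_exploitation": 0.05,  # time since the animal last exploited a patch
}


def p_exploit(features, coefs=COEFS):
    """Logistic GLM: P(exploit | sensed encounter) = 1 / (1 + exp(-eta)),
    where eta is the intercept plus a weighted sum of the predictors."""
    eta = coefs["intercept"] + sum(coefs[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-eta))


def odds_ratio(name, coefs=COEFS):
    """Multiplicative change in the odds of exploiting per unit increase in one predictor."""
    return math.exp(coefs[name])


encounter = {
    "patch_density": 1.0,
    "recent_exploited_density": 0.5,
    "recent_encountered_density": 0.2,
    "minutes_since_exploitation": 10.0,
}
print(round(p_exploit(encounter), 3))
```

      Each coefficient acts additively on the log-odds, so exp(coefficient) is the factor by which the odds of exploiting change per unit of that predictor with the others held fixed. The reviewer's suggestion of time-on-patch terms would enter the same way, as additional covariates.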

      osm-6

      The argument is that osm-6 animals can't sense food very well, so when they sense it, they enter the exploitation state by default. That is what they appear to do, but why? Clearly they are sensing the food in some other way, correct? Are ciliated neurons the only way worms can sense food? Don't they also actively pump on food, and can therefore sense the food entering their pharynx? I think you could provide further insight by commenting on this. Perhaps your decision model is dependent on comparing environmental sensing with pharyngeal sensing? Food intake certainly influences their decision, no? Perhaps food intake triggers exploitation behavior, which can be over-run by chemo/mechanosensory information?

      osm-6 behavior: We thank the reviewer for pointing out the need to further elaborate on a mechanistic hypothesis to explain the behavior of osm-6 sensory mutants. We agree with the reviewer’s speculation that post-ingestive and other non-ciliary sensory cues likely drive detection of food. We have added additional commentary to our manuscript to state this (lines 529-538).

      Impact

      I think this work will have a solid impact on the field, as it provides tangible variables to test how animals assess their environment and decide to exploit resources. I think this research could be strengthened by a reassessment of the model that would both simplify it and provide testable timescales of satiety/starvation memory.

      Reviewer #2 (Recommendations for the authors):

      The authors have addressed most of my concerns.

      Reviewer #3 (Recommendations for the authors):

      The authors provide a considerable amount of processed data (great, thank you!), but it would be even better if they provided the raw data of the worm coordinates, and when and where these coordinates overlapped with patches. This is the raw data that was ultimately used for all the quantifications in the paper, and would be incredibly useful to readers who are interested in modeling the data themselves.

      This should not be prohibitive.

      Data Availability: We thank the reviewer for pointing out this need. We are uploading all processed data (e.g. worm coordinates relative to the arena and patches) to a curated data storage server. We have updated our data availability statement to state this (lines 684-688).

      Search vs. sample & sensing vs. non-sensing.

      The different definitions of behaviors in Figures 2H-K are a bit confusing. I think the confusion stems in part from the changing terms and color associations in Figures 2 H-K. Essentially the explore density in Figure 2 H is split into two densities based on the two densities (sensing vs. non-responding) observed in Figure 2I. In turn, the sensing density in Figure 2I is split into two densities (explore vs exploit) based on the two densities observed in Figure 2 H. But the way the figures are colored, yellow means search (Figure 2H) and non-responding (Figure 2I), green means exploit (Figure 2H) which includes sensing and non-responding, but also exclusively sensing (Figure 2I), and blue consistently means exploit in both figures. It might help to use two different color codes for Figures 2H and 2I, and then in 2J you define search as explore AND non-responding, sample as explore AND sensing, and exploit as exploit.

      Color schema: While we understand the confusion, we believe that introducing additional colors may also present some misunderstandings. We have decided to leave the figure as it is.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Two important factors in visual performance are the resolving power of the lens and the signal-to-noise ratio of the photoreceptors. These both compete for space: a larger lens has improved resolving power over a smaller one, and longer photoreceptors capture more photons and hence generate responses with lower noise. The current paper explores the tradeoff of these two factors, asking how space should be allocated to maximize eye performance (measured as encoded information).

      Your summary is clear, concise and elegant. The competition is not just for space; it is for space, materials and energy. We now emphasise that we are considering these three costs in our rewrites of the Abstract and the first paragraph of the Discussion.

      Strengths:

      The topic of the paper is interesting and not well studied. The approach is clearly described and seems appropriate (with a few exceptions - see weaknesses below). In most cases, the parameter space of the models are well explored and tradeoffs are clear.  

      Weaknesses:

      Light level

      The calculations in the paper assume high light levels (which reduces the number of parameters that need to be considered). The impact of this assumption is not clear. A concern is that the optimization may be quite different at lower light levels. Such a dependence on light level could explain why the model predictions and experiment are not in particularly good agreement. The paper would benefit from exploring this issue.

      Thank you for raising this point. We briefly explained in our original Discussion, under Understanding the adaptive radiation of eyes (Version 1, lines 756–762), how our method can be modified to investigate eyes adapted for lower light levels. We have some thoughts on how eyes might be adapted. In general, transduction rates are increased by increasing D, reducing f, increasing d<sub>rh</sub> and increasing L. In addition, d<sub>rh</sub> is increased to allow for a larger D within the constraint of eye radius/corneal surface area, and to avoid wasteful oversampling (the changes in D, f and d<sub>rh</sub> increase the acceptance angle ∆ρ). We suspect that in eyes optimised for the efficient use of space, materials and energy the increases in L will be relatively small, first because increasing D, reducing f and increasing d<sub>rh</sub> are much more effective at increasing transduction rate than increasing L. Second, increasing sensitivity by reducing f decreases the cost V<sub>o</sub>, whereas increasing sensitivity by increasing L increases the cost V<sub>ph</sub>. This disadvantage, together with exponential absorption, might explain why L is only 10–20% longer in the apposition eyes of nocturnal bees (Somanathan et al., J. Comp. Physiol. A 195, 571–583, 2009). Because this line of argument is speculative and enters new territory, we have not included it in our revised version. We already present a lot of new material for readers to digest, and we agree with referee 2 that “It is possible to extend the theory to other types of eyes, although it would likely require more variables and assumptions/constraints to the theory. It is thus good to introduce the conceptual ideas without overdoing the applications of the theory”. Nonetheless, we take your point that some of the eyes in our data set might be adapted for lower light levels, and we have rewritten the Discussion section How efficiently do insects allocate resources within their apposition eyes? accordingly.
On lines 827–843 we address the assumption that eyes are adapted for full daylight, and also take the opportunity to mention two more reasons for increasing the eye parameter p: namely, increasing image velocity (Snyder, 1979), and constructing bright zones that increase the detectability of small targets (van Hateren et al., 1989; Straw et al., 2006).

      Discontinuities

      The discontinuities and non-monotonicity of the optimal parameters plotted in Figure 4 are concerning. Are these a numerical artifact? Some discussion of their origin would be quite helpful.

      Good points; we now address the discontinuities in the Results, where they are first observed (lines 311–319).

      Discrepancies between predictions and experiment

      As the authors clearly describe, experimental measurements of eye parameters differ systematically from those predicted. This makes it difficult to know what to take away from the paper. The qualitative arguments about how resources should be allocated are pretty general, and the full model seems a complex way to arrive at those arguments. Could this reflect a failure of one of the assumptions that the model rests on - e.g. high light levels, or that the cost of space for photoreceptors and optics is similar? Given these discrepancies between model and experiment, it is also hard to evaluate conclusions about the competition between optics and photoreceptors (e.g. at the end of the abstract) and about the importance for evolution (end of introduction).

      Your misgivings boil down to two issues: what use is a model that fails to fit the data, and do we need a complicated model to show something that seems to be intuitively obvious? Our study is useful because it introduces new approaches, methods, factors and explanations which advance our analysis and understanding of eye design and evolution. Your comments make it clear that we failed to get this message across, and we have revised the manuscript accordingly. We have rewritten the Abstract and the first paragraph of the Discussion to emphasise the value of our new measure of cost, specific volume, by including more of its practical advantages. In particular, our use of specific volume 1) opens the door to the morphospace of all eyes of a given type and cost; 2) allows one to construct performance surfaces across morphospace that not only identify optima but, by evaluating sub-optimal designs, cast light on efficiency and adaptability; 3) shows that photoreceptor energy costs have a major impact on design and efficiency; and 4) allows us to calculate and compare the capacities and efficiencies of compound eyes and simple eyes using a superior measure of cost. It is also possible that your dissatisfaction was deepened by disappointment. The first sentence of our original Abstract said that the goal of design is to maximize performance, so you might have expected to see that eyes are optimised. Given that optimization provides cast-iron proof that a system is designed to be efficient, and previous studies of coding by fly LMCs (Laughlin, 1981; Srinivasan et al., 1982; van Hateren, 1992) validated Barlow’s Efficient Coding Hypothesis by showing that coding is optimised, your expectation is reasonable. However, our investigation of how the allocation of resources to optics and photoreceptors affects an eye’s performance, efficiency and design does not depend a priori on finding optima; therefore we have removed “maximized”.
Our revised Abstract now says, “to improve performance”.  

      In short, our study illustrates an old adage in statistics, “All models fail to fit, but some are useful”. As is often the case, the way in which our model fails is useful. In the original version of the Results and Discussion, we argued that the allocation of resources is efficient, and identified factors that can, in principle, explain the scattering of data points. Indeed, our modelling identifies two of these deficiencies: a lack of data on species-specific energy usage, and the need for models that quantify the relationship between the quality of the captured image and the behavioural tasks for which an eye might be specialised. Thus, by examining the model’s failings we identify critical factors and pose new questions for future research. We have rewritten the Discussion section How efficiently do insects allocate resources… to make these points. We hope that these revisions will convince you that we have established a starting point for definitive studies, invented a vehicle that has travelled far enough to discover new territory, and shown that it can be modified to cope with difficult terrain.

      Turning to the need for a complicated model, because the costs and benefits depend on elementary optics and geometry, we too thought that there ought to be a simple model. However, when we tried to formulate a simple set of equations that approximate the definitive findings of our more complicated model, we discovered that this is not as straightforward as we thought. Many of the parameters in our model interact to determine costs and benefits, and many of these interactions are non-linear (e.g. the volumes of shells in spheres involve quadratic and cubic terms, and information depends on the log of a square root). So, rather than hold back publication of our complicated model, we decided to explain how it works as clearly as we can and demonstrate its value.

      In response to your final comment, “it is hard to evaluate conclusions about the competition between optics and photoreceptors (e.g. at the end of the abstract) and about the importance for evolution (end of introduction)”, we stand by our original argument. There must be competition in an eye of fixed cost, and because competition favours a heavy investment in photoreceptors, both in theory and in practice, it is a significant factor in eye design. A match between investments in optics and photoreceptors is predicted by theory and observed in fly NS eyes; therefore this is a design principle. As for evolution, no one would deny that it is important to view the adaptive radiation of eyes through a cost-benefit lens. Our lens is the first to view the whole eye, optics and photoreceptor array, and the first to treat the costs of space, materials and energy. Although the view through our lens is a bit fuzzy, it reveals that costs, benefits and trade-offs are important. Thus we have established a promising starting point for a new and more comprehensive cost-benefit approach to understanding eye design and evolution. As for the involvement of genes, when there are heritable changes in phenotype, genes must be involved, and if, as we suggest, efficient resource allocation is beneficial, the developmental mechanisms responsible for allocating resources to optics and photoreceptor array will be playing a formative role in eye evolution.

      Reviewer #2 (Public Review):

      Summary:

      In short, the paper presents a theoretical framework that predicts how resources should be optimally distributed between receptors and optics in eyes.

      Strengths:

      The authors build on the principle of resource allocation within an organism and develop a formal theory for optimal distribution of resources within an eye between the receptor array and the optics. Because the two parts of eyes, receptor arrays and optics, share the same role of providing visual information to the animal it is possible to isolate these from resource allocation in the rest of the animal. This allows for a novel and powerful way of exploring the principles that govern eye design. By clever and thoughtful assumptions/constraints, the authors have built a formal theory of resource allocation between the receptor array and the optics for two major types of compound eye as well as for camera-type eyes. The theory is formalized with variables that are well characterized in a number of different animal eyes, resulting in testable predictions.

      The authors use the theory to explain a number of design features that depend on different optimal distribution of resources between the receptor array and the optics in different types of eyes. As an example, they successfully explain why eye regions with different spatial resolution should be built in different ways. They also explain differences between different types of eyes, such as long photoreceptors in apposition compound eyes and much shorter receptors in camera type eyes. The predictive power in the theory is impressive.

      To keep the number of parameters at a minimum, the theory was developed for two types of compound eye (neural superposition, and apposition) and for camera-type eyes. It is possible to extend the theory to other types of eyes, although it would likely require more variables and assumptions/constraints to the theory. It is thus good to introduce the conceptual ideas without overdoing the applications of the theory.

      The paper extends a previous theory, developed by the senior author, that develops performance surfaces for optimal cost/benefit design of eyes. By combining this with resource allocation between receptors and optics, the theoretical understanding of eye design takes a major leap and provides entirely new sets of predictions and explanations for why eyes are built the way they are.

      The paper is well written, and even though the theory development in the Results may be difficult to take in for many biologists, the Discussion very nicely lists all the major predictions under separate headings, and here the text is more tuned for readers who are not entirely comfortable with the formalism of the Results section. I must point out, though, that the Results section is kept exemplarily concise. The figures are excellent and help explain concepts that otherwise may go over the heads of many biologists.

      We are heartened by your appreciation of our manuscript; it persuaded us not to undertake extensive revisions. Thank you.

      Reviewer #3 (Public Review):

      Summary:

      This is a proposal for a new theory for the geometry of insect eyes. The novel cost-benefit function combines the cost of the optical portion with the photoreceptor portion of the eye. These quantities are put on the same footing using a specific (normalized) volume measure, plus an energy factor for the photoreceptor compartment. An optimal information transmission rate then specifies each parameter and resource allocation ratio for a variable total cost. The elegant treatment allows for comparison across a wide range of species and eye types. Simple eyes are found to be several times more efficient across a range of eye parameters than neural superposition eyes. Some trends in eye parameters can be explained by optimal allocation of resources between the optics and photoreceptor compartments of the eye.

      Strengths:

      Data from a variety of species roughly align with rough trends in the cost analysis, e.g. as a function of expanding the length of the photoreceptor compartment.

      New data could be added to the framework once collected, and many species can be compared.

      Eyes of different shapes are compared.

      Weaknesses:

      Detailed quantitative conclusions are not possible given the approximations and simplifying assumptions in the models and poor accounting for trends in the data across eye types.

      Reviewer #1 (Recommendations For The Authors):

      Figure 1: Panel E defines the parameters described in panel d. Consider swapping the order of those panels (or defining D and Delta Phi in the figure legend for d).

      Order follows narrative, eye types then match.

      We think that you are referring to Figure 1. We modified the legend.

      Lines 143-145: How does a different relative cost impact your results?

      Thank you for raising this question. Because our assumption that relative costs are the same is our starting point, and for optics it is not an obvious mistake, we do not raise your question here. We address your question where you next raise it because, for photoreceptors, the assumption is obviously wrong. We now emphasise that our method for accounting for photoreceptor energy costs can be applied to other costs.

      Lines 187-190: Same as above - how do your results change if this assumption is not accurate?

      We have revised our manuscript to emphasise that we are dealing with the situation in which our initial assumption (costs per unit volume are equal) breaks down. On lines 203–208 we write: “However, this assumption breaks down when we consider specific metabolic rates. To enable and power phototransduction, photoreceptors have an exceptionally high specific metabolic rate (energy consumed per gram, and hence unit volume, per second) (Laughlin et al., 1998; Niven et al., 2007; Pangršič et al., 2005). We account for this extra cost by applying an energy surcharge, S<sub>E</sub>. To equate…”

      We also revised part of the Discussion section Specific volume is a useful measure of cost to make it clear that we are able to take account of situations in which the costs per unit volume are not equal, and we give our treatment of photoreceptor energy costs as an example of how this is done. On lines 626–640 we say:

      “Cost estimates can be adjusted for situations in which costs per unit volume are not equal, as illustrated by our treatment of photoreceptor energy consumption. To support transduction the photoreceptor array has an exceptionally high metabolic rate (Laughlin et al., 1998; Niven et al., 2007; Pangršič et al., 2005). We account for this higher energy cost by using the animal’s specific metabolic rate (power per unit mass and hence power per unit volume) to convert an array’s power consumption into an equivalent volume (Methods). Photoreceptor ion pumps are the major consumers of energy, and the smaller contribution of pigmented glia (Coles, 1989) is included in our calculation of the energy tariff K<sub>E</sub> (Methods). The higher costs of materials and their turnover in the photoreceptor array can be added to the energy tariff K<sub>E</sub>, but given the magnitude of the light-gated current (Laughlin et al., 1998) the relative increase will be very small. Thus, for our purposes, the effects of these additional costs are covered by our models. For want of sufficient data…”

      Reviewer #2 (Recommendations For The Authors):

      A few comments for consideration by the authors:

      (1) In the abstract, maybe give another example explaining why other eyes should be different to those of fast diurnal insects.

      This worthwhile extrapolation is best kept to the Discussion.

      (2) Would it be worthwhile mentioning that the photopigment density is low in rhabdoms compared to vertebrate outer segments? This will have major effects on the relative size of retina and optics.

      Thank you, we now make this good point in the Discussion (lines 698-702).

      (3) It took me a while to understand what you mean by an energy tariff. For the less initiated reader many other variables may be difficult to comprehend. A possible remedy would be to make a table with all variables explained first very briefly in a formal way and then explained again with a few more words for readers less fluent in the formalism.

      A very useful suggestion. We have taken your advice (p.4).

      (4) The "easy explanation" on lines 356-357 needs a few more words to be understandable.

      We have expanded this argument and corrected a mistake: the width of the head, front to back, is not 250 μm but 600 μm (lines 402–407).

      (5) Maybe devote a short paragraph in the Discussion to other types of eye, such as optical superposition eyes and pinhole eyes. This could be done very shortly and without formalism. I'm sure the authors already have a good idea of the optimal ratio of receptor arrays and optics in these eye types.

      We do not discuss this because we have not found a full account of the trade-offs and their effects on costs and benefits. We hope that our analysis of apposition and simple eyes will encourage people to analyse the relationships between costs and benefits in other eye types. To this end we pointed out in the Discussion that recent advances in imaging and modelling could be helpful.

      (6) Could the sentence on lines 668-671 be made a little clearer?

      “Efficiency is also depressed by increasing the photoreceptor energy tariff K<sub>E</sub>, and in line with the greater impact of photoreceptor energy costs in simple eyes, the reduction in efficiency is much greater in simple eyes (Figure 8b).”

      We replaced this sentence with “In both simple and apposition eyes efficiency is reduced by increasing the photoreceptor energy tariff K<sub>E</sub>. This effect is much greater in simple eyes; thus, as found for reductions in photoreceptor length (Figure 7b), K<sub>E</sub> has more impact on the design of simple eyes” (lines 749–752).

      (7) I have some reservations about the text on lines 789-796. The problem is that optics can do very little to improve the performance of a directional photoreceptor, where ∆ρ should optimally be very wide. Here, membrane folding is the only efficient way to improve performance (SNR). The option to reduce ∆ρ for better performance comes later, when simultaneous spatial resolution (multiple pixels) is introduced.

      Yes, we have been careless. We have rewritten this paragraph to say (lines 920–931):

      “Two key steps in the evolution of eyes were the stacking of photoreceptive membranes to absorb more photons, and the formation of optics to intercept more photons and concentrate them according to angle of incidence to form an image (Nilsson, 2013, 2021). Our modelling of well-developed image-forming eyes shows that, to improve performance, stacked membranes (rhabdomeres) compete with optics for the resources invested in an eye, and this competition profoundly influences both form and function. It is likely that competition between optics and photoreceptors was shaping eyes as lenses evolved to support low-resolution spatial vision. Thus the developmental mechanisms that allocate resources within modern high-resolution eyes (Casares & MacGregor, 2021), by controlling cell size and shape and, as our study emphasises, gradients in size and shape across an eye, will have analogues or homologues in more ancient eyes. Their discovery….”

      Reviewer #3 (Recommendations For The Authors):

      Suggestions for major revisions:

      While the approach is novel and elegant, the results from the analysis of insect morphology do not broadly support the optimization argument and hardly constrain parameters, like the energy tariff value, at all. The most striking results of the paper are the flat plateau in information across a broad range of shape parameters and the length and resolution trends in Figure 5.

      At no point in the Results and Discussion do we argue that resource allocation is optimized. Indeed, we frequently observe that it is not. Our mistake was to start the Abstract by observing that animals evolve to minimise costs. We have rewritten the Abstract accordingly.

      The information peaks are quite shallow. This might actually be a very important and interesting result in the paper - the fact that the information plateaus could give the insect eye quite a wide range of parameters to slide between while achieving relatively efficient sensing of the environment. Instead of attempting to use a rather ad hoc and poorly supported measure of energetics in PR cost, perhaps the pitch could focus on this flexibility. K<sub>E</sub> does not seem to constrain eye parameters and does not add much to the paper.

      We agree, being able to construct performance surfaces across morphospace is an important advance in the field of eye design and evolution, and the performance surface’s flat top has interesting implications for the evolution of adaptations. Encouraged by your remarks, we have rewritten the Abstract and the introductory paragraph of the Discussion to draw attention to these points. 

      We are disappointed that we failed to convince you that our energy tariff, K<sub>E</sub>, is more than a poorly supported ad hoc parameter that adds little to the paper. In our opinion a resource allocation model that ignores photoreceptor energy consumption is obviously inadequate, because the high energy cost of phototransduction is both well-known and considered to be a formative factor in eye evolution (Niven and Laughlin, 2008). One of the advantages of modelling is that one can assess the impact of factors that are known to be present, are thought to be important, but have not been quantified. We followed standard modelling practice by introducing a cost that has the same units as the other costs and, for good physiological reasons, increases linearly with the number of microvilli, according to K<sub>E</sub>. We then vary this unknown cost parameter to discover when and why it is significant. We were pleased to discover that we could combine data on photoreceptor energy demands and whole-animal metabolic rates to establish the likely range of K<sub>E</sub>. This procedure enabled us to unify the cost-benefit analyses of optics and photoreceptors, and to discover that realistic values of K<sub>E</sub> have a profound impact on the structure and performance of an efficient eye. We hope that this advance will encourage people to collect the data needed to evaluate K<sub>E</sub>. To emphasise the importance of K<sub>E</sub> and dispel doubts associated with the failure of the model to fit the data, we have revised two sections: Flies invest efficiently in costly photoreceptor arrays in the Results, and How efficiently do insects allocate resources within their apposition eyes? in the Discussion. These rewrites also explain why it is impossible for us to infer K<sub>E</sub> by adjusting its value so that the model’s predictions fit the data.

      The graphics after Figure 3 are quite dense and hard to follow. None of the plateau extent shown in Fig 3 is carried through to the subsequent plots, which makes the conclusions drawn from these figures very hard to parse. If the peak information occurs on a flat plateau, it would be more helpful to see those ranges of parameters displayed in the figures.

      Ideally one should do as you suggest and plot the extent of the plateau, but in our situation this is not very helpful. In the best data set, flies, optimised models predict D well, get close to ∆φ in larger eyes, and demonstrate that these optimum values are not very sensitive to K<sub>E</sub>. L is a different matter: it is very sensitive to K<sub>E</sub>, which, as we show (and frequently remind), is poorly constrained by experimental data. The best we can do is estimate the envelope of L vs C<sub>tot</sub> curves, as defined by a plausible range of K<sub>E</sub>. Because most of the plateau boundaries you ask for will fall within this envelope, plotting them does little to clear the fog of uncertainty. We note that all three referees agree that our model can account for two robust trends: i) in apposition eyes L increases with optical resolving power and acuity, both within individual eyes and among eyes of different sizes, and ii) L is much longer in apposition eyes than in simple eyes. Nonetheless, the scatter of data points and their failure to fit creates a bad impression. We gave a number of reasons why the model does not fit the data points, but these were scattered throughout the Results and Discussion and, as referees 1 and 3 point out, this makes it difficult to draw convincing conclusions. To rectify this failing, we have rewritten two sections, Flies invest efficiently in costly photoreceptor arrays in the Results and How efficiently do insects allocate resources within their apposition eyes? in the Discussion, to discuss these reasons en bloc, draw conclusions and suggest how better data and refinements to modelling could resolve these issues.

      Throughout the figures, the discontinuities in the optimal cuts through parameter space are not sufficiently explained.

      We added a couple of sentences that address the “jumps” (lines 313–318).

      None of the data seems to hug any of the optimal lines and only weakly follow the trends shown in the plots. This makes interpretation difficult for the reader and should be better explained. The text can be a little telegraphic in the Results after roughly page 10, and requires several readings to glean insight into the manuscript's conclusions.

      We revised the Results section in which we compare the best data set, flies’ NS eyes, with theoretical predictions, Flies invest efficiently in costly photoreceptor arrays, to expand our interpretation of the data and clarify our arguments. The remaining sections have not been expanded. In the next section, which is on fused rhabdom apposition eyes, our interpretation of the scattering of data points follows the same line of argument. The remaining Results sections are entirely theoretical.

      Overall, the rough conclusions outlined in the Results seem moderately supported by the matches of the data to the optimal information transmission cuts through parameter space, but only weakly.

      We agree, more data is required to test and refine our theoretical predictions.

      The Discussion is long and well-argued, and contains the most cogent writing in the manuscript.

      Thank you: this is most pleasing. We submitted our study to eLife because it allows longer Discussions, but we worried that ours was too long. However, we felt that our extensive Discussion was necessary for two reasons. First, we are introducing a new approach to understanding eye design and evolution. Second, because the data on eye morphology and costs are limited, we had to make a number of assumptions, and by discussing these, warts and all, we hoped to encourage experimentalists to gather more data and focus their efforts on the most revealing material.

      Minor comments:

      We have acted upon most of your minor comments, and we confine our remarks to our disagreements. We are grateful for your attention to details that we should have picked up on.

      It's a more standard convention to write "cost-benefit" rather than using a colon.

      "equation" should be abbreviated "eq" or "eqn", never with a "t"

      when referring to the work of van Hateren, cite the paper and the database using "van Hateren", not just "Hateren"

      small latex note: use "\textit{SNR}" to get the proper formatting for those letters when in the math environment
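      To illustrate the reviewer's LaTeX note, here is a minimal fragment, assuming a standard LaTeX2e document. The `\mathit` variant is offered only as a common alternative and was not suggested by the reviewer.

```latex
% Plain math mode sets S, N and R as three separate italic symbols,
% spaced like a product of variables:
$SNR$
% \textit sets the letters as one italic word, as the reviewer suggests:
$\textit{SNR}$
% \mathit is a math-mode alternative that also sets the letters as one word:
$\mathit{SNR}$
```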

      Lines 100-110: "f" is introduced, but only f' is referenced in the figure. These should be explained in order. d_rh is not included in the figure. Also in this section, d_rh/f is referenced before \Delta \rho_rh, which is the same quantity, without explanation.

      Figure 1 shows eye structure and geometry. f’ is a linear dimension of the eye but f is not, so f is not shown in Fig 1e. We eliminated the confusion surrounding ∆ρ_rh by deleting “and changing the acceptance angle of the photoreceptive waveguide ∆ρ_rh (Snyder, 1979)”.

      Fig 1 caption: this says "From dorsal to ventral," then describes trends that run ventral to dorsal, which is a confusing typo.

      Fig 3 - adding some data points to these plots might help the reader understand how (or if) K_E is constrained by the data.

      It is not possible to add data points because the total cost, C_tot, is unknown.

      Fig 4c (and in other subplots): the jumps in L with C_tot could be explained better in the text - it wasn't clear to this reviewer why there are these discontinuities.

      Dealt with in the revised text (lines 310-318).

      Fig 4d: The caption for this subplot could be more clearly written.

      We have rewritten the caption for subplot 4d.

      Fig 5 and other plots with data: please indicate which symbols are samples from the same species. This info is hard to reconstruct from the tables.

      We have revised Figure 5 accordingly. Species were already indicated in Figure 6.

      Line 328: missing equation number

    1. Reviewer #3 (Public review):

      Summary:

      The manuscript by Qiu and co-workers describes the single-particle cryo-electron microscopy structures of various oligomeric states of the orphan GPCR GPR3. It describes the monomeric and dimeric structures of a mutant of GPR3 with a modified G-protein complex (miniGs) and then builds on this work to attempt an inactive 'apo' dimer structure and an allosteric modulator (AF)-bound dimer structure, using an ICL3 insertion and stabilizing Fab fragments.

      In general, I'm supportive of the work done in this study, and it does indeed provide valuable insight into GPR3 function. It may be that dimerization of certain class A GPCRs may be a means of signalling regulation or perhaps even amplification. However, some of the interpretation of the single particle data needs some extra attention to strengthen the hypothesis presented in the manuscript.

      Firstly, I want to thank the authors for providing the unfiltered half-maps and PDB models for careful assessment. During this review, I did my own post-processing of the half-maps and used the resultant maps for careful analysis of models.

      So to begin, I understand that the authors didn't model any lipid in the orthosteric binding site in any of the maps, but it may be worthwhile to model something there, as many readers download only the coordinates and not the maps.

      A more general point about all the maps: in no case were any focussed refinements carried out. As the point of this paper is the finer details distinguishing active and intermediate states and the effect of an allosteric modulator, masking out hypervariable portions of the structure and doing local Euler searches would most certainly provide richer insights into the details of GPR3 (especially as the BRIL:Fab structures are not of interest). More generally, no 3D-variability studies were performed to see whether minor differences in, say, TM4/5/6 positions were due to large variation among the single particles or reflected a stable consensus position.

      As for the PFK dimeric structure, it appears to have been refined with C2 point-group symmetry (which is not mentioned anywhere except in a tiny bit of text in a supplemental figure). Was it also refined in C1 to assess whether there is any difference between the two GPR3 protomers? Also, how certain are the authors of the cholesterol positions at the bottom of TM4/5? At lower map thresholds in the PFK dimer structure, one of them appears to be continuous with the orthosteric lipid. It also appears that there are many unmodelled lipids in this structure, and only two were assigned as cholesterol. Many of the unmodelled lipids appear to form bridging connections between the GPR3 protomers. It may also be worthwhile to provide a table of the key interactions between the protomers (although I note that there was a figure highlighting them).

      With the PFK monomer structure, there was weak density for the same cholesterol, which was not modelled in this one; some commentary on the authors' approach to deciding how to assign density would be helpful. It also appears that the refinement mask was probably a bit tight in this one (something that cryoSPARC is notorious for), and re-refining with a much looser mask around the TM domain may help resolve the inner lipid leaflet positions.

      The apo structure is the one I have the most issues with. Firstly, it is not 'apo': there is definitely unaccounted-for density in the orthosteric site. The structure also needs more attention. Masking out the BRIL and Fabs would be a good start in helping to better resolve the TMD regions, followed by focussing on a single protomer to increase map interpretability. My major problem here is that, if this is being called 'apo' and inactive, the map doesn't reflect this; TM5/6 also does not look to be in a fully inactive position. The map density in this region (at least around one of the protomers) looks poorly resolved, most likely owing to averaging over internal motion. I think some 3DVA is certainly warranted here to strengthen the hypothesis that the authors have solved an 'apo' inactive structure.

      The AF (allosteric modulator)-bound structure is of significantly better quality. But again, only AF is modelled, and no lipids are. How are the authors sure? Some focussed refinements would help (changing the Euler origin to centre it on the AF molecule could be a good start). To this reviewer, at least in one of the protomers, there is a density adjacent to the AF position that looks very much like the allosteric modulator, so it could even be forming a bridging dimer. Potential assignments of the lipids may also shed light on the structure-activity relationship of this modulator, as it seems to make as many contacts with surrounding lipids as it does with TM4/5. It may also be worthwhile to explore the 3DVA of these data carefully. In our studies (Russel et al.), we noted that the orthosteric lipid appears to ratchet back and forth in concert with TM4/5 twisting. Because AF binds at the 'exit' site of the lipid, it may be locking in a specific conformation.

    1. Reviewer #3 (Public review):

      Summary:

      In their study McDermott et al. investigate the neurocomputational mechanism underlying sensory prediction errors. They contrast two accounts: representational sharpening and dampening. Representational sharpening suggests that predictions increase the fidelity of the neural representations of expected inputs, while representational dampening suggests the opposite (decreased fidelity for expected stimuli). The authors performed decoding analyses on EEG data, showing that first expected stimuli could be better decoded (sharpening), followed by a reversal during later response windows where unexpected inputs could be better decoded (dampening). These results are interpreted in the context of opposing process theory (OPT), which suggests that such a reversal would support perception to be both veridical (i.e., initial sharpening to increase the accuracy of perception) and informative (i.e., later dampening to highlight surprising, but informative inputs).

      Strengths:

      The topic of the present study is of significant relevance for the field of predictive processing. The experimental paradigm used by McDermott et al. is well designed, allowing the authors to avoid several common confounds in investigating predictions, such as stimulus familiarity and adaptation. The introduction of the manuscript provides a well-written summary of the main arguments for the two accounts of interest (sharpening and dampening), as well as OPT. Overall, the manuscript serves as a good overview of the current state of the field.

      Weaknesses:

      In my opinion some details of the methods, results and manuscript raise some doubts about the reliability of the reported findings. Key concerns are:

      (1) In the previous round of comments, I noted that: "I am not fully convinced that Figures 3A/B and the associated results support the idea that early learning stages result in dampening and later stages in sharpening. The inference made requires, in my opinion, not only a significant effect in one time bin and the absence of an effect in other bins. Instead, to reliably make this inference one would need a contrast showing a difference in decoding accuracy between bins, or ideally an analysis not contingent on seemingly arbitrary binning of data, but a decrease (or increase) in the slope of the decoding accuracy across trials. Moreover, the decoding analyses seem to be at the edge of SNR, hence making any interpretation that depends on the absence of an effect in some bins yet more problematic and implausible". The authors responded: "we fitted a logarithmic model to quantify the change of the decoding benefit over trials, then found the trial index for which the change of the logarithmic fit was < 0.1%. Given the results of this analysis and to ensure a sufficient number of trials, we focused our further analyses on bins 1-2". However, I do not see how this new analysis addresses the concern that the conclusion highlights differences in decoding performance between bins 1 and 2, yet no contrast between these bins is performed. While I appreciate the addition of the new model, in my current understanding it does not solve the problem I raised. I still believe that if the authors wish to conclude that an effect differs between two bins, they must contrast these directly and/or use a different, appropriate analysis approach.

      Relatedly, the logarithmic model fitting and how it justifies the focus on analysis bin 1-2 needs to be explained better, especially the rationale of the analysis, the choice of parameters (e.g., why logarithmic, why change of logarithmic fit < 0.1% as criterion, etc), and why certain inferences follow from this analysis. Also, the reporting of the associated results seems rather sparse in the current iteration of the manuscript.

      (2) A critical point the authors raise is that they investigate the buildup of expectations during training. They go on to show that the dampening effect disappears quickly, concluding: "the decoding benefit of invalid predictions [...] disappeared after approximately 15 minutes (or 50 trials per condition)". Maybe the authors can correct me, but my best understanding is as follows: Each bin has 50 trials per condition. The 2:1 condition has 4 leading images; this would mean ~12 trials per leading stimulus, 25% of which are unexpected, so ~9 expected trials per pair. Bin 1 represents the first time the participants see the associations. Therefore, the conclusion is that participants learn the associations so rapidly that ~9 expected trials per pair suffice not only to learn the expectations (in a probabilistic context) but to learn them well enough that they produce a significant decoding difference in that same bin. If so, this would seem surprisingly fast, given that participants learn by means of incidental statistical learning (i.e. they were not informed about the statistical regularities). I acknowledge that we do not know how quickly the dampening/sharpening effects develop; however, surprising results should be accompanied by a critical evaluation and exceptionally strong evidence (see point 1). Consider, for example, the following alternative account of these results. Category pairs were fixed across and within participants, i.e. the same leading image categories always predicted the same trailing image categories for all participants. Some category pairings will necessarily result in a larger representational overlap (i.e., visual similarity, etc.) and hence differences in decoding accuracy due to adaptation and related effects. For example, house → barn will result in a different decoding performance compared to coffee cup → barn, simply due to the larger visual and semantic similarity between house and barn compared to coffee cup and barn. These effects should occur upon first stimulus presentation, independent of statistical learning, and may attenuate over time, e.g., due to increasing familiarity with the categories (i.e., an overall attenuation leading to smaller between-condition differences) or pairs.
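      The reviewer's estimate of expected trials per pair in bin 1 can be reproduced with a few lines of arithmetic. This sketch uses only the figures stated in the comment (50 trials per condition per bin, 4 leading images in the 2:1 condition, 25% unexpected trials) and assumes trials are split evenly across leading images, which the reviewer's "~12" implies; the variable names are illustrative.

```python
# Back-of-the-envelope estimate of expected trials per leading-trailing pair
# in the first bin. Numbers come from the reviewer's description of the design;
# an even split of trials across leading images is assumed.
TRIALS_PER_CONDITION_PER_BIN = 50
LEADING_IMAGES = 4
P_UNEXPECTED = 0.25

trials_per_leading = TRIALS_PER_CONDITION_PER_BIN / LEADING_IMAGES
expected_per_pair = trials_per_leading * (1 - P_UNEXPECTED)

print(f"{trials_per_leading:.1f} trials per leading image")  # 12.5
print(f"{expected_per_pair:.1f} expected trials per pair")    # 9.4
```

      The unrounded figure is 12.5 × 0.75 = 9.375, which the reviewer rounds to ~9.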

      (3) In response to my previous comment asking why the authors think their study may have found different results from multiple previous studies (e.g. Han et al., 2019; Kumar et al., 2017; Meyer and Olson, 2011), particularly the switch from sharpening to dampening, the authors emphasize the use of non-repeated stimuli (no repetition suppression and no familiarity confound) in their design. However, I fail to see how familiarity or RS could account for the absence of a sharpening/dampening inversion in previous studies.

      First, if the authors' argument is about stimulus novelty and familiarity as described by Feuerriegel et al., 2021, I believe this point does not apply to the cited studies. Feuerriegel et al., 2021 note: "Relative stimulus novelty can be an important confound in situations where expected stimulus identities are presented often within an experiment, but neutral or surprising stimuli are presented only rarely", which indeed is a critical confound. However, none of the studies (Han et al., 2019; Richter et al., 2018; Kumar et al., 2017; Meyer and Olson, 2011) contained this confound, because all stimuli served as expected and unexpected stimuli, with the expectation status solely determined by the preceding cue. Thus, participants were equally familiar with the images across expectation conditions.

      Second, for a similar reason, the authors' argument that RS accounts for the different results does not hold either, in my opinion. Again, as Feuerriegel et al. 2021 correctly point out: "Adaptation-related effects can mimic ES when the expected stimuli are a repetition of the last-seen stimulus or have been encountered more recently than stimuli in neutral expectation conditions." However, it is critical to consider the precise design of previous studies. Taking again the example of Han et al., 2019; Kumar et al., 2017; Meyer and Olson, 2011: to my knowledge none of these studies contained manipulations that would result in a more frequent or recent repetition of any specific stimulus in the expected compared to the unexpected condition. The crucial manipulation in all these previous studies is not that a single stimulus or stimulus feature (which could be subject to familiarity or RS) determines the expectation status, but rather the transitional probability (i.e. cue-stimulus pairing) of a particular stimulus given the cue. Therefore, unless I am missing something critical, simple RS seems unlikely to differ between expectation conditions in the previous studies and hence seems implausible as an account of the differences in results compared to the current study.

      Moreover, studies cited by the authors (e.g. Todorovic & de Lange, 2012) showed that RS and ES are separable in time, again making me wonder how avoiding stimulus repetition should account for the difference in the present study compared to previous ones. I am happy to be corrected in my understanding, but with the currently provided arguments by the authors I do not see how RS and familiarity can account for the discrepancy in results.

      I agree with the authors that stimulus familiarity is a clear difference compared to previous designs, but without a valid explanation why this should affect results I find this account rather unsatisfying. I see the key difference in that the authors manipulated category predictability, instead of exemplar prediction - i.e. searching for a car instead of your car. However, if results in support of OPT would indeed depend on using novel images (i.e. without stimulus repetition), would this not severely limit the scope of the account and hence also its relevance? Certainly, the account provided by the authors casts the net wider and tries to explain visual prediction. Relatedly, if OPT only applies during training, as the authors seem to argue, would this again not significantly narrow the scope of the theory? Combined these two caveats would seem to demote the account from a general account of prediction and perception to one about perception during very specific circumstances. In my understanding the appeal of OPT is that it accounts for multiple challenges faced by the perceptual system, elegantly integrating them into a cohesive framework. Most of this would be lost by claiming that OPT's primary prediction would only apply to specific circumstances - novel stimuli during learning of predictions. Moreover, in the original formulation of the account, as outlined by Press et al., I do not see any particular reason why it should be limited to these specific circumstances. This does of course not mean that the present results are incorrect, however it does require an adequate discussion and acknowledgement in the manuscript.

      Impact:

      McDermott et al. present an interesting study with potentially impactful results. However, given my concerns raised in this and the previous round of comments, I am not entirely convinced of the reliability of the results. Moreover, the difficulty of reconciling some of the present results with previous studies highlights the need for more convincing explanations of these discrepancies and a stronger discussion of the present results in the context of the literature.

    2. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer 1 (Public Review):

      Many thanks for the positive and constructive feedback on the manuscript.

      This study reveals a great deal about how certain neural representations are altered by expectation and learning on shorter and longer timescales, so I am loath to describe certain limitations as 'weaknesses'. But one limitation inherent in this experimental design is that, by focusing on implicit, task-irrelevant predictions, there is not much opportunity to connect the predictive influences seen at the neural level to the perceptual performance itself (e.g., how participants make perceptual decisions about expected or unexpected events, or how these events are detected or appear).

      Thank you for the interesting comment. We now discuss the limitation of task-irrelevant prediction. In brief, some studies which showed sharpening found that task demands were relevant, while some studies which showed dampening were based on task-irrelevant predictions, but it is unlikely that task relevance - which was not manipulated in the current study - would explain the switch between sharpening and dampening that we observe within and across trials.

      The behavioural data that is displayed (from a post-recording behavioural session) shows that these predictions do influence perceptual choice - leading to faster reaction times when expectations are valid. In broad strokes, we may think that such a result is broadly consistent with a 'sharpening' view of perceptual prediction, and the fact that sharpening effects are found in the study to be larger at the end of the task than at the beginning. But it strikes me that the strongest test of the relevance of these (very interesting) EEG findings would be some evidence that the neural effects relate to behavioural influences (e.g., are participants actually more behaviourally sensitive to invalid signals in earlier phases of the experiment, given that this is where the neural effects show the most 'dampening' a.k.a., prediction error advantage?)

      Thank you for the suggestion. We calculated Pearson’s correlation coefficients for behavioural responses (difference in mean reaction times), neural responses during the sharpening effect (difference in decoding accuracy), and neural responses during the dampening effect for each participant, which resulted in null findings.

      Reviewer 2 (Public Review):

      Thank you for your helpful and constructive comments on the manuscript.

      The strength in controlling for repetition effects by introducing a neutral (50% expectation) condition also adds a weakness to the current version of the manuscript, as this neutral condition is not integrated into the behavioral (reaction times) and EEG (ERP and decoding) analyses. This procedure remained unclear to me. The reported results would be strengthened by showing differences between the neutral and expected (valid) conditions on the behavioral and neural levels. This would also provide a more rigorous check that participants had implicitly learned the associations between the picture category pairings.

      Following the reviewer's suggestion, we have included the neutral condition in the behavioural analysis and performed a repeated measures ANOVA on all three conditions.
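      For readers who want to see what a one-way repeated-measures ANOVA over the three conditions (valid, invalid, neutral) computes, here is a minimal sketch. It assumes one mean reaction time per participant per condition, partitions the variance into condition, subject and error terms, and applies no sphericity correction (e.g. Greenhouse-Geisser), which the response does not mention either way. The function name is hypothetical.

```python
def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA (hypothetical helper).

    data[s][c] is the mean RT of subject s in condition c.
    Returns (F, df_conditions, df_error); no sphericity correction is applied.
    """
    n = len(data)     # number of subjects
    k = len(data[0])  # number of conditions
    grand = sum(sum(row) for row in data) / (n * k)

    cond_means = [sum(row[c] for row in data) / n for c in range(k)]
    subj_means = [sum(row) / k for row in data]

    # Partition total variance; subject variance is removed from the error term
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (n - 1) * (k - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    return f_stat, df_cond, df_error
```

      The p-value follows from the F distribution with (k - 1, (n - 1)(k - 1)) degrees of freedom, e.g. `scipy.stats.f.sf(F, df_cond, df_error)`.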

      It is not entirely clear to me what is actually decoded in the prediction condition and why the authors did not perform decoding over trial bins in prediction decoding as potential differences across time could be hidden by averaging the data. The manuscript would generally benefit from a more detailed description of the analysis rationale and methods.

      In the original version of the manuscript, prediction decoding aimed at testing if the upcoming stimulus category can be decoded from the response to the preceding (leading) stimulus. However, in response to the other Reviewers’ comments we have decided to remove the prediction decoding analysis from the revised manuscript as it is now apparent that prediction decoding cannot be separated from category decoding based on pixel information.

      Finally, the scope of this study should be limited to expectation suppression in visual perception, as the generalization of these results to other sensory modalities or to the action domain remains open for future research.

      We have clarified the scope of the study in the revised manuscript.

      Reviewer 3 (Public Review):

      Thank you for the thought-provoking and interesting comments and suggestions.

      (1) The results in Figure 2C seem to show that the leading image itself can only be decoded with ~33% accuracy (25% chance; i.e. ~8% above chance decoding). In contrast, Figure 2E suggests the prediction (surprisingly, valid or invalid) during the leading image presentation can be decoded with ~62% accuracy (50% chance; i.e. ~12% above chance decoding). Unless I am misinterpreting the analyses, it seems implausible to me that a prediction, but not actually shown image, can be better decoded using EEG than an image that is presented on-screen.

      Following this and the remaining comments by the Reviewer (see below), we have decided to remove the prediction analysis from the manuscript. Specifically, we have focused on the Reviewer’s concern that it is implausible that image prediction would be better decoded than an image that is presented on-screen. This led us to perform a control analysis, in which we tried to decode the leading image category based on pixel values alone (rather than on EEG responses). Since this decoding was above chance, we could not rule out the possibility that EEG responses to leading images reflect physical differences between image categories. This issue does not extend to trailing images, as the results of the decoding analysis based on trailing images are based on accuracy comparisons between valid and invalid trials, and thus image features are counterbalanced. We would like to thank the Reviewer for raising this issue.

      (2) The "prediction decoding" analysis is described by the authors as "decoding the predictable trailing images based on the leading images". How this was done is however unclear to me. For each leading image decoding the predictable trailing images should be equivalent to decoding validity (as there were only 2 possible trailing image categories: 1 valid, 1 invalid). How is it then possible that the analysis is performed separately for valid and invalid trials? If the authors simply decode which leading image category was shown, but combine L1+L2 and L4+L5 into one class respectively, the resulting decoder would in my opinion not decode prediction, but instead dissociate the representation of L1+L2 from L4+L5, which may also explain why the time-course of the prediction peaks during the leading image stimulus-response, which is rather different compared to previous studies decoding predictions (e.g. Kok et al. 2017). Instead for the prediction analysis to be informative about the prediction, the decoder ought to decode the representation of the trailing image during the leading image and inter-stimulus interval. Therefore I am at present not convinced that the utilized analysis approach is informative about predictions.

      In this analysis, we attempted to decode (from the response to leading images) which trailing categories ought to be presented. The analysis was split between trials where the expected category was indeed presented (valid) vs. those in which it was not (invalid). The separation of valid vs invalid trials in the prediction decoding analysis served as a sanity check as no information about trial validity was yet available to participants. However, as mentioned above, we have decided to remove the “prediction decoding” analysis based on leading images as we cannot disentangle prediction decoding from category decoding.

      (3) I may be misunderstanding the reported statistics or analyses, but it seems unlikely that >10 of the reported contrasts have the exact same statistic of Tmax = 2.76. Similarly, it seems implausible, based on visual inspection of Figure 2, that the Tmax for the invalid condition decoding (reported as Tmax = 14.903) is substantially larger than for the valid condition decoding (reported as Tmax = 2.76), even though the valid condition appears to have superior peak decoding performance. Combined, these details may raise concerns about the reliability of the reported statistics.

      Thank you for bringing this to our attention. This copy error has now been rectified.

      (4) The reported analyses and results do not seem to support the conclusion of early learning resulting in dampening and later stages in sharpening. Specifically, the authors appear to base this conclusion on the absence of a decoding effect in some time-bins, while in my opinion a contrast between time-bins, showing a difference in decoding accuracy, is required. Or better yet, a non-zero slope of decoding accuracy over time should be shown (not contingent on post-hoc and seemingly arbitrary binning).

      Thank you for the helpful suggestion. To address this issue, we performed an additional analysis: we calculated the trial-by-trial time-series of the decoding accuracy benefit for valid vs. invalid trials for each participant and averaged this benefit across time points for each of the two significant time windows. Based on this, we fitted a logarithmic model to quantify the change of this benefit over trials, then found the trial index for which the change of the logarithmic fit was < 0.1% (i.e., accuracy had stabilized). Given the results of this analysis and to ensure a sufficient number of trials, we focussed our further analyses on bins 1-2 to directly assess the effects of learning. This is explained in more detail in the revised manuscript.
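      The stabilisation criterion described above can be sketched as follows. The response does not give the exact formula, so this sketch assumes (i) the benefit is modelled as a + b·ln(t) over trial index t, fitted by ordinary least squares, and (ii) "change of the logarithmic fit < 0.1%" means the relative change in the fitted value from trial t to trial t + 1. Both functions are hypothetical.

```python
import math


def fit_log(y):
    """Ordinary least-squares fit of y[t-1] = a + b*ln(t) for trials t = 1..len(y)."""
    x = [math.log(t) for t in range(1, len(y) + 1)]
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b


def stabilisation_trial(a, b, n_trials, tol=0.001):
    """First trial t whose fitted value changes by < tol (relative) at t + 1.

    Returns None if no trial within n_trials meets the criterion.
    """
    prev = a + b * math.log(1)
    for t in range(1, n_trials):
        nxt = a + b * math.log(t + 1)
        if prev != 0 and abs(nxt - prev) / abs(prev) < tol:
            return t
        prev = nxt
    return None
```

      Because the criterion is relative to the fitted value, it becomes ill-defined as the fitted benefit approaches zero; the sketch skips trials where the fitted value is exactly zero.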

      (5) The present results both within and across trials are difficult to reconcile with previous studies using MEG (Kok et al., 2017; Han et al., 2019), single-unit and multi-unit recordings (Kumar et al., 2017; Meyer & Olson 2011), as well as fMRI (Richter et al., 2018), which investigated similar questions but yielded different results; i.e., no reversal within or across trials, as well as dampening effects after more training. The authors do not provide a convincing explanation as to why their results should differ from previous studies, arguably further compounding doubts about the present results raised by the methods and results concerns noted above.

      The discussion of these findings has been expanded in the revised manuscript. In short, the experimental design of the above studies did not allow for an assessment of these effects prior to learning. Several of them also used repeated stimuli (albeit some studies changed the pairings of stimuli between trials), potentially allowing for RS to confound their results.

      Recommendations for the Authors:

      Reviewer 1 (Recommendations for the authors):

      (1) On a first read, I was initially very confused by the statement on p.7 that each stimulus was only presented once - as I couldn't then work out how expectations were supposed to be learned! It became clear after reading the Methods that expectations are formed at the level of stimulus category (so categories are repeated multiple times even if exemplars are not). I suspect other readers could have a similar confusion, so it would be helpful if the description of the task in the 'Results' section (e.g., around p.7) was more explicit about the way that expectations were generated, and the (very large) stimulus set that examples are being drawn from.

      Following your suggestion, we have clarified the paradigm by adding details about the categories and the manner in which expectations are formed.

      (2) p.23: the authors write that their 1D decoding images were "subjected to statistical inference amounting to a paired t-test between valid and invalid categories". What is meant by 'amounting to' here? Was it a paired t-test or something statistically equivalent? If so, I would just say 'subjected to a paired t-test' to avoid any confusion, or explaining explicitly which statistic inference was done over.

      We have rephrased this as “subjected to (1) a one-sample t-test against chance-level, equivalent to a fixed-effects analysis, and (2) a paired t-test”.

      Relatedly, this description of an analysis amounting to a 'paired t-test' only seems relevant for the sensory decoding and memory decoding analyses (where there are validity effects) rather than the prediction decoding analysis. As far as I can tell the important thing is that the expected image category can be decoded, not that it can be decoded better or worse on valid or invalid trials.

      In the previous version of the manuscript, the comparison of prediction decoding between valid and invalid trials was meant as a sanity check. However, in response to the other Reviewers’ comments we have decided to remove the prediction decoding analysis from the revised manuscript due to confounds.

      It would be helpful if the authors could say a bit more about how the statistical inferences were done for the prediction decoding analyses and the 'condition against baseline' contrasts (e.g., when it is stated that decoding accuracy in valid trials, *in general*, is above 0 at some cluster-wise corrected value). My guess is that this amounts to something like a one-sample t-test, but it may be worth noting that one-sample t-tests on information measures like decoding accuracy cannot support population-level inference, because these measures cannot meaningfully be below 0 (see Allefeld et al, 2016).

      When testing for decoding accuracy against baseline, we used one-sample t-tests against chance level (rather than against 0) throughout the manuscript. We now clarify in the manuscript that this corresponds to a fixed-effects analysis (Allefeld et al., 2016). In contrast, when testing for differences in decoding accuracy between valid and invalid conditions, we used paired-sample t-tests. As mentioned above, the prediction decoding analysis has been removed from the manuscript.
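      To make the distinction between the two tests concrete, here is a minimal sketch, assuming per-participant mean decoding accuracies and a four-class chance level of 0.25 (as mentioned for leading-image decoding). The accuracy values and helper names are hypothetical and are not the authors' data or analysis code. The one-sample test compares accuracies against chance; the paired test is equivalent to a one-sample test of the valid-minus-invalid differences against 0. Only t statistics are computed; p-values and the SPM cluster-level FWE correction used in the manuscript are not reproduced.

```python
import math
from statistics import mean, stdev


def one_sample_t(values, mu):
    """t statistic for H0: mean(values) == mu (e.g. mu = chance-level accuracy)."""
    n = len(values)
    return (mean(values) - mu) / (stdev(values) / math.sqrt(n))


def paired_t(a, b):
    """Paired t statistic: a one-sample t-test of the differences a - b against 0."""
    diffs = [x - y for x, y in zip(a, b)]
    return one_sample_t(diffs, 0.0)


# Hypothetical per-participant accuracies (illustrative only, NOT data from the study).
valid = [0.31, 0.35, 0.29, 0.33, 0.36]
invalid = [0.30, 0.32, 0.28, 0.31, 0.33]

CHANCE = 0.25  # assumed four-way classification

print("valid vs chance:", round(one_sample_t(valid, CHANCE), 3))
print("valid vs invalid:", round(paired_t(valid, invalid), 3))
```

      Because the paired test reduces to a one-sample test on the differences, both contrasts can share one helper. The reviewer's point about Allefeld et al. (2016) still applies to the test against chance: it licenses fixed-effects rather than population-level inference.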

      (3) By design, the researchers focus on implicit predictive learning, which means the expectations being formed are (by definition) task-irrelevant. I thought it could be interesting if the authors might speculate in the discussion on how they think their results may or may not differ when predictions are deployed in task-relevant scenarios, particularly given that some studies have found sharpening effects do not seem to depend on task demands (e.g., Kok et al., 2012; Yon et al., 2018), while other studies have found that some dampening effects do seem to depend on what the observer is attending to (e.g., Richter et al., 2018). Do these results hint at a possible explanation for why this might be? Even if the authors think they don't, it might be helpful to say so!

      Thank you for the interesting comment. We have expanded on this in the revised manuscript.

      Reviewer 2 (Recommendations for the authors):

      Methods/results

      (1) The goal of this study is the assessment of expectation effects during statistical learning while controlling for repetition effects, one of the common confounds in prediction suppression studies (see, Feuerriegel et al., 2021). I agree that this is an important aspect and I assume that this was the reason why the authors introduced the P=0.5 neutral condition (Figure 1B, L3). However, I completely missed the analyses of this condition in the manuscript. In the figure caption of Figure 1C, it is stated that the reaction times of the valid, invalid, and neutral conditions are shown, but only data from the valid and invalid conditions are depicted. To ensure that participants had built up expectations and had learned the pairing, one would not only expect a difference between the valid and invalid conditions but also between the valid and neutral conditions. Moreover, it would also be important to integrate the neutral condition in the multivariate EEG analysis to actually control for repetition effects. Instead, the authors constructed another control condition based on the arbitrary pairings. But why was the neutral condition not compared to the valid and invalid prediction decoding results? Besides this, I also suggest calculating the ERP for the neutral condition and adding it to Figure 2A to provide a more complete picture.

      As mentioned above, we have included the neutral condition in the behavioural analysis, as outlined in the revised manuscript. We have also included a repeated-measures ANOVA on all three conditions. The purpose of the neutral condition was not to avoid RS, but rather to provide a control condition. We avoided repetition by using individual, categorised stimuli. Figure 1C has been amended to include the neutral condition. In response to the remaining comments, we have decided to remove the prediction decoding analysis from the manuscript.

      (2) One of the main results that is taken as evidence for the OPT is that there is higher decoding accuracy for valid trials (indicating sharpening) early in the trial and higher decoding accuracy for invalid trials (indicating dampening) later in the trial. I would have expected this result for prediction decoding, which surprisingly showed neither of the two effects. Instead, the result pattern occurred in sensory decoding only, and partly (early sharpening) in memory decoding. How do the authors explain these results? Additionally, I would have expected similar results in the ERP; however, only the early effect was observed. I missed a more thorough discussion of this rather complex result pattern. The lack of the opposing effect in prediction decoding limits the overall conclusion, which needs to be revised accordingly.

      Since sharpening vs. dampening rests on the comparison between valid and invalid trials, evidence for sharpening vs. dampening could only be obtained from decoding based on responses to trailing images. In prediction decoding (removed from the current version), information about the validity of the trial is not yet available. Thus, our original plan was to compare this analysis with the effects of validity on the decoding of trailing images (i.e. we expected valid trials to be decoded more accurately after the trailing image than before). The results of the memory decoding did mirror the sensory decoding of the trailing image in that we found significantly higher decoding accuracy of the valid trials from 123-180 ms. As with the sensory decoding, there was a tendency towards a later flip (280-296 ms) where decoding accuracy of invalid trials became nominally higher, but this effect did not reach statistical significance in the memory decoding.

      (3) To increase the comprehensibility of the result pattern, it would be helpful for the reader to clearly state the hypotheses for the ERP and multivariate EEG analyses. What did you expect for the separate decoding analyses? How should the results of different decoding analyses differ and why? Which result pattern would (partly, or not) support the OPT?

      Our hypotheses are now stated in the revised manuscript.

      (4) I was wondering why the authors did not test for changes during learning for prediction decoding. Despite the fact that there were no significant differences between valid and invalid conditions within-trial, differences could still emerge when the data set is separated into bins. Please test and report the results.

      As mentioned above, we have decided to remove the prediction decoding analysis from the current version of the manuscript.

      (5) To assess the effect of learning the authors write: 'Given the apparent consistency of bins 2-4, we focused our analyses on bins 1-2.' Please explain what you mean by 'apparent consistency'. Did you test for consistency or is it based on descriptive results? Why do the authors not provide the complete picture and perform the analyses for all bins? This would allow for a better assessment of changes over time between valid and invalid conditions. In Figure 3, were valid and invalid trials different in any of the QT3 or QT4 bins in sensory or memory encoding?

      We have performed an additional analysis to address this issue. The reasoning behind the decision to focus on bins 1-2 is now explained in the revised manuscript. In short, fitting a learning curve to trial-by-trial decoding estimates indicates that decoding stabilizes within <50% of the trials. To quantify changes in decoding occurring within these <50% of the trials while ensuring a sufficient number of trials for statistical comparisons, we decided to focus on bins 1-2 only.

      (6) Please provide the effect size for all statistical tests.

      Effect sizes have now been provided.

      (7) Please provide exact p-values for non-significant results and significant results larger than 0.001.

      Exact p-values have now been provided.

      (8) Decoding analyses: I suppose there is a copy/paste error in the T-values as nearly all T-values on pages 11 and 12 are identical (2.76) leading to highly significant p-values (0.001) as well as non-significant effects (>0.05). Please check.

      Thank you for bringing this to our attention. This error has now been corrected.

      (9) Page 12: There were some misleading phrases in the results section. To give one example: 'control analyses was slightly above change' - this sounds like a close to non-significant effect, but it was indeed a highly significant effect of p<0.001. Please revise.

      This phrase was part of the prediction decoding analysis and has therefore been removed.

      (10) Sample size: How was the sample size of the study (N=31) determined? Why did only a subgroup of participants perform the behavioral categorization task after the EEG recording? With a larger sample, it would have been interesting to test if participants who showed better learning (larger difference in reaction times between valid and invalid conditions) also showed higher decoding accuracies.

      This has been clarified in the revised manuscript. In short, the larger sample size of N=31 was based on previous research; ten participants were initially tested as part of a pilot which was then expanded to include the categorisation task.

      (11) I assume catch trials were removed before data analyses?

      We have clarified that catch trials were indeed removed prior to analyses.

      (12) Page 23, 1st line: 'In each, the decoder...' Something is missing here.

      Thank you for bringing this to our attention; this sentence has now been rephrased as “In both valid and invalid analyses” in the revised manuscript.

      Discussion

      (1) The analysis over multiple trials showed dampening within the first 15 min followed by sharpening. I found the discussion of this finding very lengthy and speculative (page 17). I recommend shortening this part and providing only the main arguments that could stimulate future research.

      Thank you for the suggestion. Since Reviewer 3 has requested additional details in this part of the discussion, we have opted to keep this paragraph in the manuscript. However, we have also made it clearer that this section is relatively speculative and that the arguments provided for the across-trial dynamics are meant to stimulate further research.

      (2) As this task is purely perceptual, the results support the OPT for the area of visual perception. For action, different results have been reported. Suppression within-trial has been shown to be larger for expected than unexpected features of action targets, and suppression even starts before the start of the movement without showing any evidence for sharpening (e.g., Fuehrer et al., 2022, PNAS). For suppression across trials, it has been found that suppression decreases over the course of learning to associate a sensory consequence with a specific action (e.g., Kilteni et al., 2019, eLife). Therefore, expectation suppression might function differently in perception and action (an area that still requires further research). Please clarify the scope of your study and results on perceptual expectations in the introduction, discussion, and abstract.

      We have clarified the scope of the study in the revised manuscript.

      Figures

      (1) Figure 1A: Add 't' to the arrow to indicate time.

      This has been rectified.

      (2) Figure 3: In the figure caption, sensory and memory decoding seem to be mixed up. Please correct. Please add what the dashed horizontal line indicates.

      Thank you for bringing this to our attention; this has been rectified.

      Reviewer 3 (Recommendations for the authors):

      I applaud the authors for a well-written introduction and an excellent summary of a complicated topic, giving fair treatment to the different accounts proposed in the literature. However, I believe a few additional studies should be cited in the Introduction, particularly time-resolved studies such as Han et al., 2019; Kumar et al., 2017; Meyer and Olson, 2011. This would provide the reader with a broader picture of the current state of the literature, as well as point the reader to critical time-resolved studies that did not find evidence in support of OPT, which are important to consider in the interpretation of the present results.

      The introduction has been expanded to include the aforementioned studies in the revised manuscript.

      Given previous neuroimaging studies investigating the present phenomenon, including with time-resolved measures (e.g. Kok et al., 2017; Han et al., 2019; Kumar et al., 2017; Meyer & Olson 2011), why do the authors think that their data, design, or analysis allowed them to find support for OPT but not previous studies? I do not see obvious modifications to the paradigm, data quantity or quality, or the analyses that would suggest a superior ability to test OPT predictions compared to previous studies. Given concerns regarding the data analyses (see points below), I think it is essential to convincingly answer this question to convince the reader to trust the present results.

      The most obvious alteration to the paradigm is the use of non-repeated stimuli. Each of the above time-resolved studies utilised repeated stimuli (either repeated, identical stimuli, or paired stimuli where pairings are changed but the pool of stimuli remains the same), allowing for RS to act as a confound as exemplars are still presented multiple times. By removing this confound, it is entirely plausible that we may find different time-resolved results given that it has been shown that RS and ES are separable in time (Todorovic & de Lange, 2012). We also test during learning rather than training participants on the task beforehand. By foregoing a training session, we are better equipped to assess OPT predictions as they emerge. In our across-trial results, learning appears to take place after approximately 15 minutes or 432 trials, at which point dampening reverses to sharpening. Had we trained the participants prior to testing, this effect would have been lost.

      What is actually decoded in the "prediction decoding" analysis? The authors state that it is "decoding the predictable trailing images based on the leading images" (p.11). The associated chance level (Figure 2E) is indicated as 50%. This suggests that the classes separated by the SVM are T6 vs T7. How this was done is however unclear. For each leading image decoding the predictable trailing images should be equivalent to decoding validity (as there are only 2 possible trailing images, where one is the valid and the other the invalid image). How is it then possible that the analysis is performed separately for valid and invalid trials? Are the authors simply decoding which leading image was shown, but combine L1+L2 and L4+L5 into one class respectively? If so, this needs to be better explained in the manuscript. Moreover, the resulting decoder would in my opinion not decode the predicted image, but instead learn to dissociate the representation of L1+L2 from L4+L5, which may also explain why the time course of the prediction peaks during the leading image stimulus-response, which is rather different compared to previous studies decoding (prestimulus) predictions (e.g. Kok et al. 2017). If this is indeed the case, I find it doubtful that this analysis relates to prediction. Instead for the prediction analysis to be informative about the predicted image the authors should, in my opinion, train the decoder on the representation of trailing images and test it during the prestimulus interval.

      As mentioned above, the prediction decoding analysis has been removed from the manuscript. The prediction decoding analysis was intended as a sanity check, as validity information was not yet available to participants.

      Related to the point above, were the leading/trailing image categories and their mapping to L1, L2, etc. in Figure 1B fixed across subjects? I.e. "'beach' and 'barn' as 'Leading' categories would result in 'church' as a 'Trailing' category with 75% validity" (p.20) for all participants? If so, this poses additional problems for the interpretation of the analysis discussed in the point above, as it may invalidate the control analyses depicted in Figure 2E, as systematic differences and similarities in the leading image categories could account for the observed results.

      Image categories and their mapping were indeed fixed across participants. While this may result in physical differences and similarities between images influencing results, counterbalancing categories across participants would not have addressed this issue. For example, had we swapped “beach” with “barn” in another participant, physical differences between images may still be reflected in the prediction decoding. On the other hand, counterbalancing categories across trials was not possible given our aim of examining the initial stages of learning over trials. Had we changed the mappings of categories throughout the experiment for each participant, we would have introduced reversal learning and nullified our ability to examine the initial stages of learning under flat priors. In any case, the prediction decoding analysis has been removed from the manuscript, as outlined above.

      Why was the neutral condition L3 not used for prediction decoding? After all, if during prediction decoding both the valid and invalid image can be decoded, as suggested by the authors, we would also expect significant decoding of T8/T9 during the L3 presentation.

      In the neutral condition, L3 was followed by T8 vs. T9 with 50% probability, precluding prediction decoding. While this could have served as an additional control analysis for EEG-based decoding, we have opted for removing prediction decoding from the analysis. However, in response to the other Reviewers’ comments, the neutral condition has now been included in the behavioral analysis.

      The following concern may arise due to a misunderstanding of the analyses, but I found the results in Figures 2C and 2E concerning. If my interpretation is correct, then these results suggest that the leading image itself can only be decoded with ~33% accuracy (25% chance; i.e. ~8% above chance decoding). In contrast, the predicted (valid or invalid) image during the leading image presentation can be decoded with ~62% accuracy (50% chance; i.e. ~12% above chance decoding). Does this seem reasonable? Unless I am misinterpreting the analyses, it seems implausible to me that a prediction but not actually shown image can be better decoded than an on-screen image. Moreover, to my knowledge studies reporting decoding of predictions can (1) decode expectations just above chance level (e.g. Kok et al., 2017; which is expected given the nature of what is decoded) and (2) report these prestimulus effects shortly before the anticipated stimulus onset, and not coinciding with the leading image onset ~800ms before the predicted stimulus onset. For the above reasons, the key results reported in the present manuscript seem implausible to me and may suggest the possibility of problems in the training or interpretation of the decoding analysis. If I misunderstood the analyses, the analysis text needs to be refined. If I understood the analyses correctly, at the very least the authors would need to provide strong support and arguments to convince the reader that the effects are reliable (ruling out bias and explaining why predictions can be decoded better than on-screen stimuli) and sensible (in the context of previous studies showing different time-courses and results).

      As explained above, we have addressed this concern by performing an additional analysis that implements decoding based on image pixel values. Indeed, we could not rule out the possibility that “prediction” decoding reflected stimulus differences between leading images.

      Relatedly, the authors use the prestimulus interval (-200 ms to 0 ms before predicted stimulus onset) as the baseline period. Given that this period coincides with prestimulus expectation effects (Kok et al., 2017), would this not result in a bias during trailing image decoding? In other words, the baseline period would contain an anticipatory representation of the expected stimulus (Kok et al., 2017), which is then subtracted from the subsequent EEG signal, thereby allowing the decoder to pick up on this "negative representation" of the expected image. It seems to me that a cleaner contrast would be to use the 200 ms before leading image onset as the baseline.

      The analysis of trailing images aimed at testing specific hypotheses related to differences between decoding accuracy in valid vs. invalid trials. Since the baseline was by definition the same for both kinds of trials (since information about validity only appears at the onset of the trailing image), changing the baseline would not affect the results of the analysis. Valid and invalid trials would have the same prestimulus effect induced by the leading image.

      Again, maybe I misunderstood the analyses, but what exactly are the statistics reported on p. 11 onward? Why is the reported Tmax identical for multiple conditions, including the difference between conditions? Without further information this seems highly unlikely, further casting doubts on the rigor of the applied methods/analyses. For example: "In the sensory decoding analysis based on leading images, decoding accuracy was above chance for both valid (Tmax= 2.76, pFWE < 0.001) and invalid trials (Tmax= 2.76, pFWE < 0.001) from 100 ms, with no significant difference between them (Tmax= 2.76, pFWE > 0.05) (Fig. 2C)" (p.11).

      Thank you for bringing this to our attention. As previously mentioned, this copy error has been rectified in the revised manuscript.

      Relatedly, the statistics reported below in the same paragraph also seem unusual. Specifically, the Tmax difference between valid and invalid conditions seems unexpectedly large given visual inspection of the associated figure: "The decoding accuracy of both valid (Tmax = 2.76, pFWE < 0.001) and invalid trials (Tmax = 14.903, pFWE < 0.001)" (p.12). In fact, visual inspection suggests that the largest difference should probably be observed for the valid not invalid trials (i.e. larger Tmax).

      This copy error has also been rectified in the revised manuscript.

      Moreover, multiple subsequent sections of the Results continue to report the exact same Tmax value. I will not list all appearances of "Tmax = 2.76" here but would recommend the authors carefully check the reported statistics and analysis code, as it seems highly unlikely that >10 contrasts have exactly the same Tmax. Alternatively, if I misunderstand the applied methods, it would be essential to better explain the utilized method to avoid similar confusion in prospective readers.

      This error has also now been rectified. As mentioned above the prediction decoding analysis has been removed.

      I am not fully convinced that Figures 3A/B and the associated results support the idea that early learning stages result in dampening and later stages in sharpening. In my opinion, this inference requires more than a significant effect in one time bin and the absence of an effect in other bins. Instead, to reliably make this inference one would need a contrast showing a difference in decoding accuracy between bins, or ideally an analysis not contingent on seemingly arbitrary binning of data, such as a decrease (or increase) in the slope of the decoding accuracy across trials. Moreover, the decoding analyses seem to be at the edge of SNR, making any interpretation that depends on the absence of an effect in some bins even more problematic and implausible.

      Thank you for the helpful suggestion. As previously mentioned, we fitted a logarithmic model to quantify the change of the decoding benefit over trials, then found the trial index for which the change of the logarithmic fit was < 0.1%. Given the results of this analysis and to ensure a sufficient number of trials, we focussed our further analyses on bins 1-2. This is explained in more detail in the revised manuscript.
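      A minimal sketch of a stabilization criterion of this kind follows, under stated assumptions: the logarithmic model is taken to be y(t) ≈ a + b·ln(t) for the decoding benefit y at trial t, fitted by ordinary least squares, and "change of the fit < 0.1%" is interpreted (an assumption, not confirmed by the text) as the relative change of the fitted curve between consecutive trials. The function names and the synthetic curve are hypothetical and are not taken from the authors' analysis code.

```python
import math


def fit_log_curve(trials, values):
    """Least-squares fit of values ≈ a + b * ln(trial); trials are 1-based indices."""
    xs = [math.log(t) for t in trials]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(values) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, values))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def stabilization_trial(a, b, n_trials, tol=0.001):
    """First trial at which the fitted curve changes by < tol (relative) from the previous trial.

    Returns None if the criterion is never met within n_trials.
    """
    prev = a  # fitted value at trial 1, since ln(1) = 0
    for t in range(2, n_trials + 1):
        cur = a + b * math.log(t)
        if prev != 0 and abs(cur - prev) / abs(prev) < tol:
            return t
        prev = cur
    return None


# Synthetic, exactly logarithmic curve (illustrative values only, not study data).
trials = list(range(1, 101))
values = [0.5 + 0.02 * math.log(t) for t in trials]
a, b = fit_log_curve(trials, values)
print(round(a, 4), round(b, 4))  # recovers the generating parameters
print(stabilization_trial(a, b, 1000))
```

      Fitting in ln(t) keeps the model linear in its parameters, so no iterative optimizer is needed; a relative threshold makes the criterion independent of the overall scale of the decoding benefit, though it is undefined where the fitted value is 0.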

      Relatedly, based on the literature there is no reason to assume that the dampening effect disappears with more training, thereby placing more burden of proof on the present results. Indeed, key studies supporting the dampening account (including human fMRI and MEG studies, as well as electrophysiology in non-human primates) usually seem to entail more learning than has occurred in bin 2 of the present study. How do the authors reconcile the observation that more training in previous studies results in significant dampening, while here the dampening effect is claimed to disappear with less training?

      The discussion of these findings has been expanded on in the revised manuscript. As previously outlined, many of the studies supporting dampening did not explicitly test the effect of learning as they emerge, nor did they control for RS to the same extent.

      The Methods section is quite bare bones. This makes an exact replication difficult or even impossible. For example, the sections elaborating on the GLM and cluster-based FWE correction do not specify enough detail to replicate the procedure. Similarly, how exactly the time points for significant decoding effects were determined is unclear (e.g., p. 11). Relatedly, the explanation of the decoding analysis, e.g. the choice to perform PCA before decoding, is not well explained in the present iteration of the manuscript. Additionally, it is not mentioned how many PCs the applied threshold on average resulted in.

      Thank you for this suggestion; we have described our methods in more detail.

      To me, it is unclear whether the PCA step, which to my knowledge is not the default procedure for most decoding analyses using EEG, is essential to obtain the present results. While PCA is certainly not unusual, to my knowledge decoding of EEG data is frequently performed at the sensor level, as SVMs are usually capable of dealing with the (relatively low) dimensionality of EEG data. In isolation this decision may not be too concerning; however, in combination with other doubts concerning the methods and results, I would suggest the authors replicate their analyses using a conventional decoding approach at the sensor level as well.

      Thank you for this suggestion; we have explained our decision to use PCA in the revised manuscript.

      Several choices, like the binning and the focus on bins 1-2 seem rather post-hoc. Consequently, frequentist statistics may strictly speaking not be appropriate. This further compounds above mentioned concerns regarding the reliability of the results.

      The reasoning behind our decision to focus on bins 1-2 is now explained in more detail in the revised manuscript.

      A notable difference in the present study, compared to most studies cited in the introduction motivating the present experiment, is that categories instead of exemplars were predicted.

      This seems like an important distinction to me, which surprisingly goes unaddressed in the Discussion section. This difference might be important, given that exemplar expectations allow for predictions across various feature levels (i.e., even at the pixel level), while category predictions only allow for rough (categorical) predictions.

      The decision to use categorical predictions over exemplars lies in the issue of RS, as it is impossible to control for RS while repeating stimuli over many trials. This has been discussed in more detail in the revised manuscript.

      While individually minor problems, I noticed multiple issues across several figures or associated figure texts. For example: Figure 1C only shows valid and invalid trials, but the figure text mentions the neutral condition. Why is the neutral condition not depicted but mentioned here? Additionally, the figure text lacks critical information, e.g. what the asterisk represents. The error shading in Figure 2 would benefit from transparency settings to not completely obscure the other time-courses. Increasing the figure content and font size within the figure (e.g. axis labels) would also help with legibility (e.g. consider compressing the time-course but therefore increasing the overall size of the figure). I would also recommend using more common methods to indicate statistical significance, such as a bar at the bottom of the time-course figure typically used for cluster permutation results instead of a box. Why is there no error shading in Figure 2A but all other panels? Fig 2C-F has the y-axis label "Decoding accuracy (%)" but certainly the y-axis, ranging roughly from 0.2 to 0.7, is not in %. The Figure 3 figure text gives no indication of what the error bars represent, making it impossible to interpret the depicted data. In general, I would recommend that the authors carefully revisit the figures and figure text to improve the quality and complete the information.

      Thank you for the suggestions. Figure 1C now includes the neutral condition. Asterisks denote significant results. The font size in Figure 2C-E has been increased. The y-axis on Figure 2C-E has been amended to accurately reflect decoding accuracy in percentage. Figure 2A has error shading, however, the error is sufficiently small that the error shading is difficult to see. The error bars in Figure 3 have been clarified.

      Given the choice of journal (eLife), which aims to support open science, I was surprised to find no indication of (planned) data or code sharing in the manuscript.

      Plans for sharing code/data are now outlined in the revised manuscript.

      While it is explained in sufficient detail later in the Methods section, it was not entirely clear to me, based on the method summary at the beginning of the Results section, whether categories or individual exemplars were predicted. The manuscript may benefit from clarifying this at the start of the Results section.

      Thank you for this suggestion. Following this and suggestions from other reviewers, the experimental paradigm and the mappings between categories have been further explained in the revised manuscript, to make it clearer that predictions are made at the categorical level.

      "Unexpected trials resulted in a significantly increased neural response 150 ms after image onset" (p.9). I assume the authors mean the more pronounced negative deflection here. Interpreting this, especially within the Results section as "increased neural response" without additional justification may stretch the inferences we can make from ERP data; i.e. to my knowledge more pronounced ERPs could also reflect increased synchrony. That said, I do agree with the authors that it is likely to reflect increased sensory responses, it would just be useful to be more cautious in the inference.

      Thank you for the interesting comment; this has been rephrased as a “more pronounced negative deflection” in the revised manuscript.

      Why was the ERP analysis focused exclusively on Oz? Why not a cluster around Oz? For object images, we may expect a rather wide dipole.

      Feuerriegel et al. (2021) have outlined issues questioning the robustness of univariate analyses for ES; as such, we opted for a targeted ROI approach on the channel showing peak amplitude of the visually evoked response (Fig. 2B). More details on this are provided in the revised manuscript.

      How exactly did the authors perform FWE? The description in the Method section does not appear to provide sufficient detail to replicate the procedure.

      FWE as implemented in SPM is a cluster-based method of correcting for multiple comparisons using random field theory. We have explained our thresholding methods in more detail in the revised manuscript.

      If I misunderstand the authors and they did indeed perform standard cluster permutation analyses, then I believe the timing of significant clusters cannot be interpreted as readily as done here (e.g. p.11-12); see: Maris & Oostenveld 2007; Sassenhagen & Draschkow 2019.

      All statistics were based on FWE correction under random field theory assumptions (as implemented in SPM) rather than on cluster permutation tests (as implemented in, e.g., FieldTrip).

      Why did the authors choose not to perform spatiotemporal cluster permutation for the ERP results?

      As mentioned above, we opted to target our ERP analyses on Oz due to controversies in the literature regarding univariate effects of ES (Feuerriegel et al., 2021).

      Some results, e.g. on p.12 are reported as T29 instead of Tmax. Why?

      As mentioned above, prediction decoding analyses have been removed from the manuscript.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The paper by Lee and Ouellette explores the role of cyclic di-AMP (c-di-AMP) in chlamydial developmental progression. The manuscript uses a collection of different recombinant plasmids to up- and down-regulate c-di-AMP production, and then uses classical molecular and microbiological approaches to examine the effects of expression induction in each of the transformed strains.

      Strengths: 

      This laboratory is a leader in the use of molecular genetic manipulation in Chlamydia trachomatis, and their efforts to make such approaches mainstream are commendable. Overall, the model described and defended by these investigators is thorough and significant.

      Thank you for these comments.

      Weaknesses: 

      The biggest weakness in the document is the authors' reliance, when interpreting results, on quantitative data that are not statistically significant. These challenges can be addressed in a revision by the authors.

      Thank you for these comments. We point out that, while certain RT-qPCR data may not be statistically significant, our RNAseq data indicate late genes are, as a group, statistically significantly increased when increasing c-di-AMP levels and decreased when decreasing c-di-AMP levels. We do not believe running additional experiments to “achieve” statistical significance in the RT-qPCR data is worthwhile. We hope the reviewer agrees with this assessment.

      We have also included new data in this revised manuscript, which we believe further strengthens aspects of the conclusions linked to individual expression of full-length DacA isoforms. We have also quantified inclusion areas and bacterial sizes for critical strains.

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript describes the role of the production of c-di-AMP on the chlamydial developmental cycle. Chlamydia are obligate intracellular bacterial pathogens that rely on eukaryotic host cells for growth. The chlamydial life cycle depends on a cell form developmental cycle that produces phenotypically distinct cell forms with specific roles during the infectious cycle. The RB cell form replicates, amplifying chlamydia numbers, while the EB cell form mediates entry into new host cells, disseminating the infection to new hosts. Regulation of cell form development is a critical question in chlamydia biology and pathogenesis. Chlamydia must balance amplification (RB numbers) and dissemination (EB numbers) to maximize survival in its infection niche. The main findings in this manuscript show that overexpression of the dacA-ybbR operon results in increased production of c-di-AMP and early expression of the transitionary gene hctA and late gene omcB. The authors also knocked down the expression of the dacA-ybbR operon and reported a reduction in the expression of both hctA and omcB. The authors conclude with a model suggesting the amount of c-di-AMP determines the fate of the RB: continued replication or EB conversion. Overall, this is a very intriguing study with important implications; however, the data are very preliminary and the model is very rudimentary and is not well supported by the data.

      Thank you for your comments. Chlamydia is not an easy experimental system, but we have done our best to address the reviewer’s concerns in this revised submission.

      Describing the significance of the findings: 

      The findings are important and point to very exciting new avenues to explore the important questions in chlamydial cell form development. The authors present a model that is not quantified and does not match the data well. 

      Describing the strength of evidence: 

      The evidence presented is incomplete. The authors do a nice job of showing that overexpression of the dacA-ybbR operon increases c-di-AMP and that knockdown or overexpression of the catalytically dead DacA protein decreases the c-di-AMP levels. However, the effects on the developmental cycle and how they fit the proposed model are less well supported. 

      dacA-ybbR ectopic expression: 

      For the dacA-ybbR ectopic expression experiments, they show that hctA is induced early but there is no significant change in omcB gene expression. This is problematic because when RBs are treated with penicillin (this paper; DOI 10.1128/MSYSTEMS.00689-20), hctA is expressed in the aberrant cell forms, but these forms do not go on to express the late genes, suggesting that stress events can change the developmental expression kinetic profile. The RNA-seq data are somewhat reassuring, as many of the EB/late genes were shown to be upregulated by dacA-ybbR ectopic expression in this assay.

      As the reviewer notes, we also generated RNAseq data, which validate that late gene transcripts (including sigma28- and sigma54-regulated genes) are statistically significantly increased earlier in the developmental cycle in parallel to increased c-di-AMP levels. The lack of statistical significance in the RT-qPCR data for omcB, which shows a trend of higher transcripts, is less concerning given the statistically significant RNAseq dataset. We have reported the data from three replicates for the RT-qPCR and do not think it would be worthwhile to attempt more replicates in an attempt to “achieve” statistical significance.

      We recognize that hctA may also increase during stress as noted by the Grieshaber Lab. In re-evaluating these data, we decided to remove the Penicillin-linked studies from the manuscript since they detract from the focus of the story we are trying to tell given the potential caveat the reviewer mentions.

      The authors also demonstrate that this ectopic expression reduces the overall growth rate and produces EBs earlier in the cycle, but fewer EBs overall late in the cycle. This observation matches their model well, as when RBs convert early there is less amplification of cell numbers.

      dacA knockdown and dacA(mut) 

      The authors showed that dacA knockdown and ectopic expression of the dacA mutant both reduced the amount of c-di-AMP. The authors show that for both of these conditions, hctA and omcB expression is reduced at 24 hpi. This was also partially supported by the RNA-seq data for the dacA knockdown as many of the late genes were downregulated. However, a shift to an increase in RB-only genes was not readily evident. This is maybe not surprising as the chlamydial inclusion would just have an increase in RB forms and changes in cell form ratios would need more time points.

      Thank you for this comment. We agree that it is not surprising given the shift in cell forms. The reduction in hctA transcripts argues against a stress state as noted above by the reviewer, and the RNAseq data from dacA-KD conditions indicates at least that secondary differentiation has been delayed. We agree that more time points would help address the reviewer’s point, but the time and cost to perform such studies is prohibitive with an obligate intracellular bacterium.

      Interestingly, the overall growth rate appears to differ in these two conditions: growth is unaffected by dacA knockdown but is significantly affected by the expression of the mutant. In both cases, EB production is repressed. The overall model they present does not fit these data well: if RBs were blocked from converting into EBs, then the growth rate should increase, as the RB cell form replicates while the EB cell form does not. This should shift the population to replicating cells.

      We agree that it seems that perturbing c-di-AMP production by knockdown or overexpressing the mutant DacA(D164N) has different impacts on chlamydial growth. We have generated new data, which we believe addresses this. Overexpressing membrane-localized DacA isoforms is clearly detrimental to chlamydiae as noted in the manuscript. However, when we removed the transmembrane domain and expressed N-terminal truncations of these isoforms, we observed no effects of overexpression on chlamydial morphology or growth. Importantly, for the wild-type full-length or truncated isoforms, overexpressing each resulted in the same level of c-di-AMP production, further supporting that the negative effect of overexpressing the wild-type full-length is linked to its membrane localization and not c-di-AMP levels. These data have been included as new Figure 3. These data indicate that too much DacA in the membrane is disruptive and suggest that the balance of DacA to YbbR is important since overexpression of both did not result in the same phenotype. This is further described in the Discussion.

      As it relates to knockdown of dacA-ybbR, we have essentially removed/reduced the amount of these proteins from the membrane and have blocked the production of c-di-AMP. This is fundamentally different from overexpression.

      Overall this is a very intriguing finding that will require more gene expression data, phenotypic characterization of cell forms, and better quantitative models to fully interpret these findings. 

      Reviewer #1 (Recommendations for the authors): 

      There is a generally consistent set of experiments conducted with each of the mutant strains, allowing a straightforward examination of the effects of each transformant. There are a few general and specific things that need to be addressed for both the benefit of the reader and the accuracy of interpretation. The following is a list of items that need to be addressed in the document, with an overall goal of making it more readable and making the interpretations more quantitatively defended. 

      Specific comments: 

      (1) The manuscript overall is wordy, and there are quite a few examples of text in the Results that should be in the Discussion (examples include lines 224-225, 248-262, 282-288, 304-308). The manuscript overall could use careful editing for verbosity.

      Thank you for this comment. We have removed some of the indicated sentences. However, to maintain the flow and logic of the manuscript, some statements may have been preserved to help transition between sections. As far as verbosity, we have tried to be as clear as possible in our descriptions of the results to minimize ambiguity. Others who read our manuscript appreciated the thoroughness of our descriptions.

      (2) There is also a trend in the document to base statements of fact on qualitative and quantitative differences that do not approach statistical significance. Examples of this include lines 156-158, 190-192, 198-199, 230-232, 239-242, and 292-293. This is something the authors need to be careful about, as these statistically insignificant differences may multiply a degree of uncertainty across the entire manuscript.

      We have quantified inclusion areas and tried to remove instances of qualitative assessments as noted by the reviewer. With regard to some of the transcripts, we can only report the data as they are. In some cases, there are trends that are not statistically significant, but it would seem to be inaccurate to state that they were unchanged. In other cases, a two-fold or smaller difference in transcript levels may be statistically significant but biologically insignificant. A reader can and should make their own conclusions.

      (3) Any description of inclusion or RB size being modestly different needs to be defended with microscopic quantification. 

      We have quantified inclusion areas and RB sizes and tried to remove instances of qualitative assessments as noted by the reviewer.

      (4) It would be very helpful to reviewers if there was a figure number added to each figure in the reviewer-delivered text. 

      Added.

      (5) Figure 1A: This should indicate that the genes indicated beneath each developmental form are on high (I think that is what that means). 

      We have reorganized Figure 1 to better improve the flow.

      (6) Figure 1B is exactly the same as the three images in Figure 8B. I would delete this in Figure 1. This relates to comment 9. 

      We presented this intentionally to clearly illustrate to the reader, who may not be knowledgeable in this area, what we propose is happening in the various strains. As such, we respectfully disagree and have left this aspect of the figure unchanged.

      (7) Figure 1D: It is not clear if the period in E.V has any meaning. I think this is just a typo. Also, the color coding needs to be indicated here. What do the gray bars represent? The labeling for the gene schematic for dacA-KDcom should not be directly below the first graph in D. This makes the reader think this is a label for the graph. This can be accomplished if the image in panel B is removed and the first graph in panel D is moved into B. This will make a better figure. 

      We have reorganized Figure 1 to better improve the flow.

      (8) Figure 2 C, G: The utility of these panels is not clear. For them to have any value, they need to be expressed in genome copies. If they are truly just a measure of chlamydia genomic DNA, they have minimal utility to the reader. There are similar panels in several other figures. 

      We have reported genome copies as suggested in lieu of ng gDNA for these measurements. Importantly, it does not alter any interpretations.

      (9) I am not sure about the overall utility of Figure 8. Granted, a summary of their model is useful, but the cartoons in the figure are identical or very nearly identical to model figures shown in two other publications from the same group (PMID: 39576108, 39464112) These are referenced at least tangentially in the current manuscript (Jensen paper- now published- and ref 53). Because the model has been published before, if they are to be included, there needs to be a direct comparison of the results in each of these three papers, as they basically describe the same developmental process. The model images should also be referenced directly to the first of the other papers.

      This was intentional so that readers familiar with our work will see the similarities between these systems. We have added additional comments in the Discussion related to our newly published work. As an aside, Dr. Lee generated the first version of the figure that was adapted by others in the lab. It is perhaps unlucky that those other studies have been published before his work.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Given the importance that these coupling mechanisms have been given in theory, this is a timely and important contribution to the literature in terms of determining whether these theoretical assumptions hold true in human data.

      Thank you!

      I did not follow the logic behind including spindle amplitude in the meta-analysis. This is not a measure of SO-spindle coupling (which is the focus of the review), unless the authors were restricting their analysis to the amplitude of coupled spindles only. It doesn't sound like this is the case, though. The effect of spindle amplitude on memory consolidation has been reviewed in another recent meta-analysis (Kumral et al., 2023, Neuropsychologia). As this isn't a measure of coupling, it wasn't clear why this measure was included in the present meta-analysis. You could easily make the argument that other spindle measures (e.g., density, oscillatory frequency) could also have been included, but that seems to take away from the overall goal of the paper, which was to assess coupling.

      Indeed, spindle amplitude refers to all spindle events rather than only coupled spindles. This choice was made because we recognized the challenge of obtaining relevant data from each study—only 4 out of the 23 included studies performed their analyses after separating coupled and uncoupled spindles. This inconsistency strengthens the urgency and importance of this meta-analysis to standardize the methods and measures used for future analysis on SO-SP coupling and beyond. We agree that focusing on the amplitude of coupled spindles would better reveal their relations with coupling, and we have discussed this limitation in the manuscript.

      Nevertheless, we believe including spindle amplitude in our study remains valuable, as it served several purposes. First, SO-SP coupling involves the modulation between spindle amplitude and slow oscillation phase. Different studies have reported conflicting conclusions regarding how overall spindle amplitude, as an indicator of oscillation strength overnight, was related to coupling: some found significant correlations (e.g., Baena et al., 2023), while others did not (e.g., Roebber et al., 2022). This discrepancy highlights an indirect but potentially crucial insight into the role of spindle amplitude in coupling dynamics. Second, in studies related to SO-SP coupling, spindle amplitude is one of the most frequently reported measures, along with other coupling measures that significantly correlated with overnight memory improvements (e.g., Kurz et al., 2023; Ladenbauer et al., 2021; Niknazar et al., 2015), so we believe that including this measure provides a more comprehensive review of the existing literature on SO-SP coupling. Third, incorporating spindle amplitude allows for a direct comparison between coupling measures and individual events alone in their contribution to memory consolidation, a question that has been extensively explored in recent research (e.g., Hahn et al., 2020; Helfrich et al., 2019; Niethard et al., 2018; Weiner et al., 2023). Finally, spindle amplitude was identified as the most important moderator for memory consolidation in Kumral et al.'s (2023) meta-analysis. By including it in our analysis, we sought to replicate their findings within a broader framework and introduce conceptual overlaps with existing reviews. Therefore, although we were not able to selectively include coupled spindles, there is still a unique relation between spindle amplitude and SO-SP coupling that other spindle measures do not have.

      Originally, we also intended to include coupling density or counts in the analysis, which seems more relevant to the coupling metrics. However, the lack of uniformity in methods used to measure coupling density posed a significant limitation. We hope that our study will encourage consistent reporting of all relevant parameters in future research, allowing future meta-analyses to incorporate these measures comprehensively. We have added this discussion to the revised version of the manuscript (p. 3) to further clarify these points.

      All other citations were referenced in the manuscript.

      At the end of the first paragraph of section 3.1 (page 13), the authors suggest their results "... further emphasise the role of coupling compared to isolated oscillation events in memory consolidation". This had me wondering how many studies actually test this. For example, in a hierarchical regression model, would coupled spindles explain significantly more variance than uncoupled spindles? We already know that spindle activity, independent of whether they are coupled or not, predicts memory consolidation (e.g., Kumral meta-analysis). Is the variance in overnight memory consolidation fully explained by just the coupled events? If both overall spindle density and coupling measures show an equal association with consolidation, then we couldn't conclude that coupling compared to isolated events is more important.

      While primary coupling measurements, including coupling phase and strength, showed strong evidence for their associations with memory consolidation, spindle measures, including spindle amplitude, exhibited only limited evidence (or a “non-significant” effect) for their association with consolidation. These results are consistent with multiple empirical studies using different techniques (e.g., Hahn et al., 2020; Helfrich et al., 2019; Niethard et al., 2018; Weiner et al., 2023), which reported that coupling metrics are more robust predictors of consolidation and synaptic plasticity than spindle or slow oscillation metrics alone. However, we agree with the reviewer that we did not directly separate the effects of coupled and uncoupled spindles, and a more precise comparison would involve contrasting the “coupling of oscillation events” with “individual oscillation events” rather than coupling versus isolated events.

      We recognized that Kumral and colleagues’ meta-analysis reported a moderate association between spindle measures and memory consolidation (e.g., for spindle amplitude-memory association they reported an effect size of approximately r = 0.30). However, one of the advantages of our study is that we actively cooperated with the authors to obtain a large number of unreported and insignificant data relevant to our analysis, as well as separated data that were originally reported under mixed conditions. This approach decreases the risk of false positives and selective reporting of results, making the effect size more likely to approach the true value. In contrast, we found only a weak effect size of r = 0.07 with minimal evidence for spindle amplitude-memory relation. However, we agree with the reviewer that using a more conservative term in this context would be a better choice since we did not measure all relevant spindle metrics including the density.

      To improve clarity in our manuscript, we have revised the statement to: “Together with other studies included in the review, our results suggest a crucial role of coupling but did not support the role of spindle events alone in memory consolidation,” and provide relevant references (p. 13). We believe this can more accurately reflect our findings and the existing literature to address the reviewer’s concern.

      It was very interesting to see that the relationship between the fast spindle coupling phase and overnight consolidation was strongest in the frontal electrodes. Given this, I wonder why memory promoting fast spindles shows a centro-parietal topography? Surely it would be more adaptive for fast spindles to be maximally expressed in frontal sites. Would a participant who shows a more frontal topography of fast spindles have better overnight consolidation than someone with a more canonical centro-parietal topography? Similarly, slow spindles would then be perfectly suited for memory consolidation given their frontal distribution, yet they seem less important for memory.

      Regarding the topography of fast spindles and their relationship to memory consolidation, we agree this is an intriguing issue. We have already made significant progress on this topic in our ongoing work and have found evidence that participants with a more frontal topography of fast spindles show better overnight consolidation. These findings will be presented in our future publications. We share a few relevant observations: First, there are significant discrepancies in the definition of “slow spindle” in the field. Some studies defined slow spindles as 9-12 Hz events (e.g. Mölle et al., 2011; Kurz et al., 2021), while others performed the event detection within a range of 11-13/14 Hz and found a frontal-dominated topography (e.g. Barakat et al., 2011; D'Atri et al., 2018). Compounding this issue, individual and age differences in spindle frequency are often overlooked, leading to challenges in reliably distinguishing between slow and fast spindles. Some studies have reported difficulty in clearly separating the two types of spindles altogether (e.g., Hahn et al., 2020). Moreover, a critical factor often ignored in past research is the propagating nature of both slow oscillations and spindles across the cortex, where spindles are coupled with significantly different phases of slow oscillations (see Figure 5). In addition, the frontal region has the strongest and most active SOs as their site of origin, which may contribute to the role of frontal coupling. In contrast, not all SOs propagate from PFC to centro-parietal sites. The reviewer also raised an interesting idea that slow spindles would be perfectly suited for memory consolidation given their frontal distribution.
We propose that one possible explanation is that if SOs couple exclusively with slow SPs, they may lose their ability to coordinate inter-area activity between centro-parietal and frontal regions, which could play a critical role in long-range memory transmission across hippocampus, thalamus, and prefrontal cortex. This hypothesis requires investigation in future studies. We believe a better understanding of coupling in the context of the propagation of these waves will help us better understand the observed frontal relationship with consolidation. Therefore, we believe this result supports our conclusion that coupling precision is more important than intensity, and we have addressed this in revised manuscript (pp. 15-16).

      The authors rightly note the issues with multiple comparisons in sleep physiology and memory studies. Multiple comparison issues arise in two ways in this literature. First are comparisons across multiple electrodes (many studies now use high-density systems with 64+ channels). Second are multiple comparisons across different outcome variables (at least 3 ways to quantify coupling (phase, consistency, occurrence) x 2 spindle types (fast, slow). Can the authors make some recommendations here in terms of how to move the field forward, as this issue has been raised numerous times before (e.g., Mantua 2018, Sleep; Cox & Fell 2020, Sleep Medicine Reviews for just a couple of examples). Should researchers just be focusing on the coupling phase? Or should researchers always report all three metrics of coupling, and correct for multiple comparisons? I think the use of pre-registration would be beneficial here, and perhaps could be noted by the authors in the final paragraph of section 3.5, where they discuss open research practices.

      There are indeed multiple methods that we can discuss, including cluster-based and non-parametric methods, to correct for multiple comparisons in EEG data with spatiotemporal structure. In addition, encouraging the reporting of all tested but insignificant results, at least in supplementary materials, is an important practice that helps readers understand the findings with reduced bias. We agree with the reviewer’s suggestions and have added more information in sections 3.4-3.5 (p. 17) to advocate for a standardized “template” for reporting effect sizes and correcting for multiple comparisons in future research.

      We advocate for standardized reporting of all three coupling metrics: phase, strength, and prevalence (density, count, and/or percentage coupled). Each coupling metric captures a distinct property of the coupling process, and the metrics may interact with one another (Weiner et al., 2023). Therefore, we believe it is essential to report all three metrics to comprehensively explore their different roles in the “how, what, and where” of long-distance communication and consolidation of memory. As we advance toward a deeper understanding of the relationship between memory and sleep, we hope this work helps establish a standard for transparency and replication in relevant studies.

      Reviewer #2 (Public review):

      Regarding the Moderator of Age: Although the authors discuss the limited studies on the analysis of children and elders regarding age as a moderator, the figure shows a significant gap between the ages of 40 and 60. Furthermore, there are only a few studies involving participants over the age of 60. Given the wide distribution of effect sizes from studies with participants younger than 40, did the authors test whether removing studies involving participants over 60 would still reveal a moderator effect?

      We agree that there is an age gap between younger and older adults, as current studies often focus on contrasting newly matured and fully aged populations to amplify the effect, while neglecting the gradual changes in memory consolidation mechanisms across the aging spectrum. We suggest that a non-linear analysis of age effects would be highly valuable, particularly when additional child and older adult data become available.

      In response to the reviewer’s suggestion, we re-tested the moderation effect of age after excluding effect sizes from older adults. The results revealed a decrease in the strength of evidence for the phase-memory association due to increased variability, but were consistent for all other coupling parameters. The mean estimates also remained consistent (coupling phase-memory relation: -0.005 [-0.013, 0.004], BF10 = 5.51, the strength of evidence reduced from strong to moderate; coupling strength-memory relation: -0.005 [-0.015, 0.008], BF10 = 4.05, the strength of evidence remained moderate). These findings align with prior research, which typically observed a weak coupling-memory relationship in older adults during aging (Ladenbauer et al., 2021; Weiner et al., 2023) but not during development (Hahn et al., 2020; Kurz et al., 2021; Kurz et al., 2023). Therefore, this result is not surprising to us, and there are still observable moderate patterns in the data. We have reported these additional results in the revised manuscript (pp. 6, 11), and interpret that “the moderator effect of age in the phase-memory association becomes less pronounced during development after excluding the older adult data”. We believe the original findings including the older adult group remain meaningful after cautious interpretation, given that the older adult data were derived from multiple studies and different groups, and they represent the aging effects.

      Reviewer #3 (Public review):

      First, the authors conclude that "SO-SP coupling should be considered as a general physiological mechanism for memory consolidation". However, the reported effect sizes are smaller than what is typically considered a "small effect”.

      While we acknowledge the concern about the small effect sizes reported in our study, it is important to contextualize these findings within the field of neuroscience, particularly memory research. Even in individual studies, small effect sizes are not uncommon due to the inherent complexity of the mechanisms involved and the multitude of confounding variables. This is an important factor to be considered in meta-analyses where we synthesize data from diverse populations and experimental conditions. For example, the relationship between SO-slow SP coupling and memory consolidation in older adults is expected to be insignificant.

      As Funder and Ozer (2019) concluded in their highly cited paper, an effect size of r = 0.3 in psychological and related fields should be considered large, with r = 0.4 or greater likely representing an overestimation and rarely found in a large sample or a replication. Therefore, we believe r = 0.1 should not be considered the lower bound of a small effect. Bakker et al. (2019) also advocate for a contextual interpretation of the effect size. This is particularly important in meta-analyses, where the results are less prone to overestimation compared to individual studies, and we cooperated with all authors to include a large number of unreported and insignificant results. In this context, small correlations may contain substantial meaningful information to interpret. Although we agree that effect sizes reported in our study are indeed small at the overall level, they reflect a rigorous analysis that incorporates robust evidence across different levels of moderators. Our moderator analyses underscore the dynamic nature of coupling-memory relationships, with stronger associations observed in moderator subgroups that have historically exhibited better memory performance, particularly after excluding slow spindles and older adults. For example, both the coupling phase and strength of frontal fast spindles with slow oscillations exhibited "moderate-to-large" correlations with the consolidation of different types of memory, especially in young adults, with r values ranging from 0.18 to 0.32 (see Tables S9.1-9.4). We have included discussion about the influence of moderators and hierarchical structures on the dynamics of coupling-memory associations (pp. 17, 20). In addition, we have updated the conclusion to “SO-fast SP coupling should be considered as a general physiological mechanism for memory consolidation” (p. 1).

      Second, the study implements state-of-the-art Bayesian statistics. While some might see this as a strength, I would argue that it is the greatest weakness of the manuscript. A classical meta-analysis is relatively easy to understand, even for readers with only a limited background in statistics. A Bayesian analysis, on the other hand, introduces a number of subjective choices that render it much less transparent.

      This kind of analysis does not seem designed to be intelligible to the average reader. It follows a recent trend toward increasingly opaque methods. Where a decade ago we had to trust published results because the data were not openly available, today we must trust the results because the methods can no longer be understood with reasonable effort.

      This becomes obvious in the forest plots. It is not immediately apparent to the reader how the distributions for each study represent the reported effect sizes (gray dots). Presumably, they depend on the Bayesian priors used for the analysis. The use of these priors makes the analyses unnecessarily opaque, eventually leading the reader to question how much of the findings depend on subjective analysis choices (which might be answered by an additional analysis in the supplementary information).

      We appreciate the reviewer for sharing this viewpoint and we value the opportunity to clarify some key points. To address the concern about clarity, we have included more details in the methods section explaining how to interpret Bayesian statistics including priors, posteriors, and Bayes factors, making our results more accessible to those less familiar with this approach.

      On the use of Bayesian models, we believe there may have been a misunderstanding. Far from being "opaque" or overly complex, Bayesian methods are increasingly valued for their ability to provide nuanced, accurate, and transparent inferences (Sutton & Abrams, 2001; Hackenberger, 2020; van de Schoot et al., 2021; Smith et al., 1995; Kruschke & Liddell, 2018); as of 2020, they had been applied in more than 1,200 meta-analyses (Hackenberger, 2020). In our study, we used priors that assume no effect (mean set to 0, in line with the null) while allowing a wide range of variation to account for large uncertainty. This approach reduces the risk of overestimation and false positives and performs considerably better than traditional methods in handling variability (Williams et al., 2018; Kruschke & Liddell, 2018). Priors can also increase transparency, since all assumptions are formally encoded and open to critique or sensitivity analysis. In contrast, frequentist methods often rely on hidden or implicit assumptions, such as homogeneity of variance, fixed-effects models, and independence of observations, that are not directly testable. Sensitivity analyses reported in the supplemental material (Tables S9.1-S9.4) confirmed the robustness of our choice of priors: our results did not vary when different priors were used.
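      The logic of a prior sensitivity check can be illustrated in the simplest conjugate setting. The sketch below is a toy under stated assumptions, not the authors' Stan model: it treats a single pooled Fisher-z effect as normally distributed with a known standard error, places a zero-centred normal prior on it, and refits with increasingly wide priors. The values of `z_hat` and `se` are hypothetical.

```python
import math


def normal_posterior(estimate, se, prior_mean=0.0, prior_sd=1.0):
    """Posterior mean and SD of an effect under a normal prior and normal likelihood.

    Conjugate normal-normal update: precisions (1 / variance) add, and the
    posterior mean is the precision-weighted average of prior mean and data.
    """
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = 1.0 / se ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * estimate)
    return post_mean, math.sqrt(post_var)


# Hypothetical pooled Fisher-z estimate and its standard error
z_hat, se = 0.15, 0.03

# Sensitivity analysis: refit with zero-centred priors of increasing width
for prior_sd in (0.5, 1.0, 2.0, 10.0):
    mean, sd = normal_posterior(z_hat, se, prior_sd=prior_sd)
    print(f"prior SD {prior_sd:>5}: posterior mean {mean:.4f}, SD {sd:.4f}")
```

      When the standard error is much smaller than the prior SD, the data dominate and the zero-centred prior pulls the estimate only slightly toward the null. A full random-effects model adds a between-study variance term with its own prior, but the check follows the same idea.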

      As Kruschke and Liddell (2018) described, “shrinkage (pulling extreme estimates closer to group averages) helps prevent false alarms caused by random conspiracies of rogue outlying data,” a well-known advantage of Bayesian over traditional approaches. Shrinkage explains the observed differences between the posterior distributions and the grey dots (reported effect sizes) in the forest plots, and it is one way in which Bayesian models handle heterogeneity. Unlike p-values, which depend heavily on sample size (trivial effects become significant in large samples, while real effects can be missed in small ones), Bayesian methods make their assumptions explicit, enabling others to challenge or refine them, an approach aligned with open science principles (van de Schoot et al., 2021). For example, a 95% credible interval in a Bayesian model can be interpreted as “there is a 95% probability that the parameter lies within the interval,” whereas a 95% confidence interval in a frequentist model means “in repeated experiments, 95% of the confidence intervals will contain the true value.” We believe the former is much more straightforward and convincing for readers to interpret. We will ensure that our justification for using Bayesian models is presented more clearly in the manuscript (pp. 21-23).
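      Why a study's posterior distribution in a forest plot need not sit on its reported effect size can be shown with the standard normal random-effects model. In this minimal sketch, the pooled mean `mu` and between-study SD `tau` are held fixed for illustration; in the full Bayesian model they are estimated jointly, and all numbers are hypothetical.

```python
import numpy as np


def shrink_study_estimates(estimates, ses, mu, tau):
    """Conditional posterior means of study-level effects in a normal random-effects model.

    Model: y_i ~ N(theta_i, se_i^2) and theta_i ~ N(mu, tau^2).
    Given mu and tau, the posterior mean of theta_i is a precision-weighted
    average of the study's own estimate y_i and the pooled mean mu, so
    imprecise studies are pulled more strongly toward mu.
    """
    estimates = np.asarray(estimates, dtype=float)
    ses = np.asarray(ses, dtype=float)
    weight = tau ** 2 / (tau ** 2 + ses ** 2)  # weight on the study's own estimate
    return weight * estimates + (1.0 - weight) * mu


# Two hypothetical studies with the same observed effect but different precision
print(shrink_study_estimates([0.6, 0.6], ses=[0.05, 0.30], mu=0.1, tau=0.1))
```

      Both studies report 0.6, but the precise one is pulled only to 0.5 while the noisy one is pulled to 0.15, which is the pattern a reader would see as a gap between a grey dot and its posterior distribution.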

      We acknowledge that, even with these justifications, researchers may still differ in their preference for Bayesian or frequentist models. To further improve transparency, we have also reported the results of a traditional frequentist meta-analysis in Supplemental Material 10 to demonstrate the robustness of our analysis; these showed no significant differences between the Bayesian and frequentist models. We have added clearer references in the revised manuscript directing readers to the figures that report the statistics from the traditional models.
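      For readers who prefer the classical route, a frequentist random-effects meta-analysis fits in a few lines. The sketch below uses the DerSimonian-Laird estimator of between-study variance as a familiar stand-in; whether Supplemental Material 10 used this estimator or another (for example, REML) is an assumption not confirmed here.

```python
import numpy as np


def dersimonian_laird(z, v):
    """Classical random-effects meta-analysis with the DerSimonian-Laird tau^2 estimator.

    z: study effect sizes (e.g. Fisher-z transformed correlations)
    v: their sampling variances
    Returns (pooled estimate, its standard error, tau^2).
    """
    z = np.asarray(z, dtype=float)
    v = np.asarray(v, dtype=float)
    w = 1.0 / v                                  # fixed-effect (inverse-variance) weights
    z_fixed = np.sum(w * z) / np.sum(w)
    q = np.sum(w * (z - z_fixed) ** 2)           # Cochran's Q heterogeneity statistic
    df = len(z) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                # between-study variance, truncated at 0
    w_star = 1.0 / (v + tau2)                    # random-effects weights
    pooled = np.sum(w_star * z) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return pooled, se, tau2
```

      A 95% confidence interval is then `pooled ± 1.96 * se` on the z scale, which can be back-transformed with `np.tanh` for reporting on the r scale.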

      However, most of the methods are not described in sufficient detail for the reader to understand the proceedings. It might be evident for an expert in Bayesian statistics what a "prior sensitivity test" and a "posterior predictive check" are, but I suppose most readers would wish for a more detailed description. However, using a "Markov chain Monte Carlo (MCMC) method with the no-U-turn Hamiltonian Monte Carlo (HMC) sampler" and checking its convergence "through graphical posterior predictive checks, trace plots, and the Gelman and Rubin Diagnostic", which should then result in something resembling "a uniformly undulating wave with high overlap between chains" is surely something only rocket scientists understand. Whether this was done correctly in the present study cannot be ascertained because it is only mentioned in the methods and no corresponding results are provided. 

      We appreciate the reviewer’s concerns about accessibility and the potential complexity of our descriptions of Bayesian methods. We provided a detailed account to enhance transparency and to guide readers interested in replicating our study, although we acknowledge that some terms may initially seem overwhelming. These steps, such as checking MCMC chain convergence and running robustness checks, are standard practice in Bayesian research and are analogous to “linearity”, “normality”, and “equal variance” checks in frequentist analysis. In addition, Hamiltonian Monte Carlo (HMC), in its adaptive no-U-turn (NUTS) form, is the default algorithm that Stan (the software we used to fit the Bayesian models) uses to sample from the posterior distribution. It is an MCMC method designed to be faster and more efficient than traditional sampling algorithms, especially for complex or high-dimensional models. We have added example plots in Supplemental Material S4.1-S4.3 and the Methods section (pp. 21-22) to explain the results and interpretation of these convergence checks. We hope this addresses any concerns about methodological rigor.
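      The Gelman and Rubin diagnostic mentioned above compares variance between chains with variance within chains; values close to 1 indicate that the chains have mixed. The sketch below implements the basic (non-split) form for illustration only. Stan itself reports a rank-normalised, split-chain variant, and the simulated chains are hypothetical.

```python
import numpy as np


def gelman_rubin(chains):
    """Basic potential scale reduction factor (R-hat) for an (m chains x n draws) array."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    b = n * chain_means.var(ddof=1)          # between-chain variance
    w = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
    var_hat = (n - 1) / n * w + b / n        # pooled estimate of posterior variance
    return np.sqrt(var_hat / w)


rng = np.random.default_rng(0)
mixed = rng.normal(0.0, 1.0, size=(4, 1000))            # 4 chains sampling the same target
stuck = mixed + np.array([[0.0], [0.0], [0.0], [3.0]])  # one chain stuck in another region
print(round(gelman_rubin(mixed), 3), round(gelman_rubin(stuck), 3))
```

      The well-mixed chains give a value essentially equal to 1, whereas a single stuck chain inflates R-hat well above common cut-offs (current Stan guidance suggests 1.01; older literature used 1.1). In a trace plot, the same difference appears as overlapping "waves" versus one chain drifting apart.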

      In one point the method might not be sufficiently justified. The method used to transform circular-linear r (actually, all references cited by the authors for circular statistics use r² because there can be no negative values) into "Z_r", seems partially plausible and might be correct under the H0. However, Figure 12.3 seems to show that under the alternative Hypothesis H1, the assumptions are not accurate (peak Z_r=~0.70 for r=0.65). I am therefore, based on the presented evidence, unsure whether this transformation is valid. Also, saying that Z_r=-1 represents the null hypothesis and Z_r=1 the alternative hypothesis can be misinterpreted, since Z_r=0 also represents the null hypothesis and is not half way between H0 and H1.

      First, we realized that in the titles of Figures 12.2 and 12.3, “true r = 0.35” and “true r = 0.65” should read “true r_z” (note that, following your suggestion, we use r_z instead of Z_r in the revised manuscript). Our approach was to first generate an underlying population with a null (0), moderate (0.35), or large (0.65) r_z correlation, and then test whether the sampling distributions drawn from these populations were normal across varying sample sizes. Nevertheless, the reviewer correctly noticed discrepancies between the reported true r_z and the peak of its sampling distribution. This discrepancy arises because, when generating large population data, obtaining a value very close to a strong correlation such as r_z = 0.65 is unlikely. We therefore loop through simulations, regenerating population data until its r_z falls within a threshold. For moderate effect sizes (e.g., r_z = 0.35), this is straightforward with a narrow range (0.34 < r_z < 0.35). For larger effect sizes such as r_z = 0.65, however, a wider range (0.6 < r_z < 0.7) is required, so the population from which samples were drawn sometimes had an r_z slightly different from 0.65. This remains reasonable, since the main purpose of this analysis is to confirm that a large r_z still has a normal sampling distribution, rather than to achieve exactly r_z = 0.65.

      We acknowledge that this variability in the acceptance range was not clearly explained in Supplemental Material 12 and that it is not accurate to report “true r_z = 0.65”. In the revised version, we have added vertical lines to each subplot to indicate the r_z of the population from which samples were drawn, making it easier to check whether it aligns with the peak of the sampling distribution. In addition, we have revised the title to “Sampling distributions of r_z drawn from strong correlations (r_z = 0.6-0.7)”. We confirmed that the population r_z and the peak of its sampling distribution remain consistent under both H0 and H1 for all sample sizes with n > 25, and we hope this explanation fully resolves your concern.
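      The population-generation loop described above can be sketched as rejection sampling: draw a population, compute its correlation, and redraw until the value lands in the target range. This is a hypothetical illustration, not the authors' simulation code. Because the r_z transformation is not specified here, the sketch checks the raw circular-linear correlation (Mardia's formula) instead, and the generating model (a linear variable depending on the cosine of a uniformly distributed phase, plus Gaussian noise) and the `slope` value are assumptions.

```python
import numpy as np


def circlin_r(theta, x):
    """Circular-linear correlation (Mardia's formula) between angles theta and linear x."""
    rxc = np.corrcoef(x, np.cos(theta))[0, 1]
    rxs = np.corrcoef(x, np.sin(theta))[0, 1]
    rcs = np.corrcoef(np.cos(theta), np.sin(theta))[0, 1]
    r2 = (rxc ** 2 + rxs ** 2 - 2 * rxc * rxs * rcs) / (1 - rcs ** 2)
    return np.sqrt(max(r2, 0.0))  # guard against tiny negative values from rounding


def draw_population(low, high, slope, n=5000, max_tries=1000, seed=0):
    """Redraw a simulated population until its correlation falls within [low, high]."""
    rng = np.random.default_rng(seed)
    for _ in range(max_tries):
        theta = rng.uniform(0, 2 * np.pi, n)             # hypothetical coupling phases
        x = slope * np.cos(theta) + rng.normal(0, 1, n)  # hypothetical memory scores
        r = circlin_r(theta, x)
        if low <= r <= high:
            return theta, x, r
    raise RuntimeError("no population within the target range; adjust slope")


theta, x, r_pop = draw_population(0.6, 0.7, slope=1.21)
print(f"accepted population correlation: {r_pop:.3f}")
```

      Returning the accepted value (`r_pop`) is what allows each subplot to mark the population correlation actually used, rather than the nominal target.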

      We agree with the reviewer that claiming r_z = -1 represents the null hypothesis is not accurate. A circular-linear (circlin) r_z of 0 is the better analogue of Pearson’s r = 0, since both represent the expected value of samples drawn from a population under the null hypothesis. In contrast, the raw circlin r has a positive mean under the null, which is one of the main reasons for the transformation. To provide a more accurate interpretation, we updated Table 6 to describe the following levels of evidence strength: no effect (r < 0), null (r = 0), small (r = 0.1), moderate (r = 0.3), and large (r = 0.5). We thank the reviewer again for their valuable feedback.
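      Why the raw circlin r is biased upward under the null can be checked by simulation: it is the square root of a squared multiple correlation, so it can never be negative, and noise alone pushes its mean above zero. The sketch below uses Mardia's formula with independent uniform phases and normal scores; the sample size and number of simulations are arbitrary illustrative choices.

```python
import numpy as np


def circlin_r(theta, x):
    """Circular-linear correlation (Mardia's formula) between angles theta and linear x."""
    rxc = np.corrcoef(x, np.cos(theta))[0, 1]
    rxs = np.corrcoef(x, np.sin(theta))[0, 1]
    rcs = np.corrcoef(np.cos(theta), np.sin(theta))[0, 1]
    r2 = (rxc ** 2 + rxs ** 2 - 2 * rxc * rxs * rcs) / (1 - rcs ** 2)
    return np.sqrt(max(r2, 0.0))  # guard against tiny negative values from rounding


rng = np.random.default_rng(1)
n, n_sims = 30, 2000
# Under H0, phases and scores are independent
null_r = np.array([
    circlin_r(rng.uniform(0, 2 * np.pi, n), rng.normal(0, 1, n))
    for _ in range(n_sims)
])
print(f"mean raw circlin r under H0 with n = {n}: {null_r.mean():.3f}")
```

      The simulated null mean is clearly positive (for large n it scales roughly as 1/sqrt(n)), which is why a raw value of 0 cannot serve as the null reference in the way Pearson's r = 0 does.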

      Reviewer #2 (Recommendations for the authors):

      (1) There is an extra space in the Notes of Figure 1. "SW R sharp-wave ripple.".

      We thank the reviewer for pointing this out. We have confirmed that the "extra space" is not an actual error but a result of how italicized Times New Roman font is rendered in the LaTeX format. We believe that the journal’s formatting process will resolve this issue.

      (2) In the introduction, slow oscillations (SO) are defined with a frequency of 0.16-4 Hz, sleep spindles (SP) at 8-16 Hz, and sharp-wave ripples (SWR) at 80-300 Hz. The term "fast oscillation" (FO) is first introduced with the clarification "SPs in our case." However, on page 2, the authors state, "SO-FO coupling involving SWRs, SPs, and SOs..." There seems to be a discrepancy in the definition of FO; does it consistently refer to SPs and SWRs throughout the article?

      We appreciate the reviewer’s observation regarding the potential ambiguity of the term "FO." In our manuscript, "FO" is a general term for the "relatively faster oscillation" that interacts with a "relatively slower oscillation" in the phase-amplitude coupling (PAC) mechanism; it is not intended to refer exclusively to SPs or SWRs. For example, the term is typically used to describe SO–SP–SWR coupling in sleep memory studies, but theta–alpha–gamma coupling in studies of memory during wakefulness. To address this confusion, we removed the phrase "SPs in our case" and now explicitly use "SPs" when referring to spindles. In addition, we have replaced "fast oscillation" with "faster oscillation" to emphasize that the term is used in a relative sense (p. 1) rather than to refer to a specific oscillation. We also retained the term “FO” only when introducing the PAC mechanism.

      (3) On page 2, the first paragraph contains the phrase: "...which occur in the precise hierarchical temporal structure of SO-FO coupling involving SWRs, SPs, and SOs ..." Since "SO-FO" refers to slow and fast oscillations, it is better to maintain the order of frequencies, suggesting it as: SOs, SPs, and SWRs.

      We sincerely thank the reviewer for their valuable suggestion. We have updated the sentence to maintain the correct order from the lowest to the highest frequencies in the revised version (p. 2).

      (4) References should be provided:

      a. “Studies using calcium imaging after SP stimulation explained the significance of the precise coupling phase for synaptic plasticity.”

      b. "Electrophysiology evidence indicates that the association between memory consolidation and SO-SP coupling is influenced by a variety of behavioral and physiological factors under different conditions."

      c. "Since some studies found that fast SPs predominate in the centroparietal region, while slow SPs are more common in the frontal region, a significant amount of studies only extracted specific types of SPs from limited electrodes. Some studies even averaged all electrodes to estimate coupling..."

      This is a great point. We have added references for these statements as follows:

      a. Rephrased: “Studies using calcium imaging and SP stimulation explained the significance of the precise coupling phase for synaptic plasticity.” We changed “after” to “and” to reflect that these were conducted as two separate experiments. This is a summary statement, with relevant citations provided in the following two sentences of the paragraph, including Niethard et al., 2018, and Rosanova et al., 2005. (p. 2)

      b. Included diverse sources of evidence: “Electrophysiology evidence from studies included in our meta-analysis (e.g. Denis et al., 2021; Hahn et al., 2020; Mylonas et al., 2020) and others (e.g. Bartsch et al., 2019; Muehlroth et al., 2019; Rodheim et al., 2023) reported that the association between memory consolidation and SO-SP coupling is influenced by a variety of behavioral and physiological factors under different conditions.” (p. 3)

      c. Added references and more details: “Since some studies found that fast SPs predominate in the centroparietal region, while slow SPs are more common in the frontal region, a significant amount of studies selectively extracted specific types of SPs from limited electrodes (e.g. Dehnavi et al., 2021; Perrault et al., 2019; Schreiner et al., 2021). Some studies even averaged all electrodes in their spectral and/or time-series analysis to estimate metrics of oscillations and their couplings (e.g. Denis et al., 2022; Mölle et al., 2011; Nicolas et al., 2022).” (p. 4)

      Reviewer #3 (Recommendations for the authors):

      There are a number of terms that are not clearly defined or used:

      (1) SP amplitude. Does this mean only the amplitude of coupled spindles or of spindles in general?

      This refers to the amplitude of spindles in general. We clarified this in the revised text (and see response to reviewer #1, point #1).

      (2) The definition of a small effect

      We thank the reviewer again for raising this important question. As we explained in our response to the public review, small effect sizes are common in neuroscience and meta-analyses because of the complexity of the underlying mechanisms and the presence of numerous confounding variables and hierarchical levels. To help readers interpret effect sizes, we replaced rigid ranges with widely accepted benchmarks for effect size levels in neuroscience research: small (r = 0.1), moderate (r = 0.3), and large (r = 0.5; Cohen, 1988). We also note that an evidence- and context-based framework provides a more practical way to interpret the observed effect sizes than rigid categorizations.

      (3) Can a BF10 based on experimental evidence actually be "infinite" and a probability actually be 1.00?

      We appreciate the reviewer for highlighting this potential confusion. The formula used to calculate BF10 is P(data | H1) / P(data | H0). In the experimental setting with an informative prior, an ‘infinite’ BF10 value indicates that all posterior samples are overwhelmingly compatible with H1 given the data and assumptions (Cox et al., 2023; Heck et al., 2023; Ly et al., 2016). In such cases, the denominator P(data | H0) becomes vanishingly small, causing BF10 to diverge to infinity. This occurs when the posterior probability of H1 converges to 1 (e.g., 0.9999999999…).

      It is a well-established convention in Bayesian statistics to report the Bayes factor as "infinity" when the evidence is overwhelmingly strong and BF10 exceeds the numerical limits of the computational tools, becoming effectively infinite. To address this ambiguity, we added a footnote in the revised manuscript clarifying the interpretation of an 'infinite' BF10 (p. 8).
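      How an "infinite" value can arise from a finite set of posterior draws is easiest to see for a one-sided hypothesis. This sketch assumes that the Bayes factor is estimated as an evidence ratio from MCMC draws, for H1: effect > 0 versus H0: effect ≤ 0, under a prior symmetric around zero (as with the zero-mean priors described earlier), so that prior odds equal 1 and posterior odds equal BF10. Whether the manuscript's values were computed exactly this way is an assumption.

```python
import numpy as np


def one_sided_bf10(posterior_draws):
    """Evidence ratio for H1 (effect > 0) versus H0 (effect <= 0) from posterior draws.

    Assumes a prior symmetric around zero, so the prior odds are 1 and the
    posterior odds estimated from the draws equal the Bayes factor.
    """
    draws = np.asarray(posterior_draws, dtype=float)
    n_h1 = np.count_nonzero(draws > 0)
    n_h0 = draws.size - n_h1
    if n_h0 == 0:
        return float("inf")  # no draw supports H0, so the estimated ratio is unbounded
    return n_h1 / n_h0
```

      With N draws, the largest finite value this estimator can return is N - 1 (one draw on the H0 side), so an "infinite" result is best read as "larger than the sampler can resolve", that is, greater than N - 1, rather than as literal certainty.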

      (4) Z_r should be renamed to r_z or similar. These are not Z values (-inf..+inf), but r values (-1..1).

      We thank the reviewer for this suggestion. We agree that r_z provides a clearer and more accurate notation, while z is more appropriate for Fisher's z-transformed r (see point (5)). We have updated the notation accordingly.

      (5) Also, it remains quite unclear at which points in the analyses, "r" values or "Fisher's z transformed r" values are used. Assumptions of normality should only apply to the transformed values. However, the formulas for the random effects model seem to assume normality for r values.

      The correlation values were z-transformed during preprocessing, before running the models, to ensure normality and correct estimation of the sampling variances. The outputs were back-transformed to raw r values only when reporting the results, to help readers interpret the effect sizes. As described in Section 5.5.1, the normality assumption therefore applies to the transformed values rather than to the raw r values. We have changed the notation from r to z (-inf..+inf) in the formulas of the random- and mixed-effects models in the revised manuscript (p. 22).
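      The preprocessing and back-transformation steps can be sketched as follows. The sampling variance 1/(n - 3) is the standard large-sample result for Fisher's z of a Pearson correlation; whether the same variance formula was used for the circular-linear r_z values is an assumption here, and the example correlation and sample size are hypothetical.

```python
import numpy as np


def to_fisher_z(r, n):
    """Fisher z-transform a correlation; return (z, sampling variance of z)."""
    z = np.arctanh(r)       # 0.5 * ln((1 + r) / (1 - r)), unbounded and approximately normal
    var_z = 1.0 / (n - 3)   # large-sample sampling variance of z
    return z, var_z


def from_fisher_z(z):
    """Back-transform z values (an estimate or interval limits) to the r scale."""
    return np.tanh(z)


# Modelling happens on the z scale; only the reported numbers go back to r
z, var_z = to_fisher_z(0.3, n=50)
half_width = 1.96 * np.sqrt(var_z)
ci_r = from_fisher_z(np.array([z - half_width, z + half_width]))
print(ci_r)
```

      Because the interval is built on the z scale and then back-transformed, it is asymmetric around r, which is the expected behaviour for bounded correlations.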

      Language

      (1) Frequency. In the introduction, the authors use "frequency" when they mean something like the incidence of spindles.

      We agree that the term "frequency" has been used inconsistently to describe both the incidence of events and the frequency bands of oscillations. We have replaced "frequency" with "prevalence" to refer to the incidence of coupling events where applicable (p. 3).

      (2) Moderate and mediate. These two terms are usually meant to indicate two different types of causal influences.

      We thank the reviewer for this suggestion. We agree that "moderate" is more appropriate for describing the moderators in this study, since it does not directly imply causality. We have replaced "mediate" with "moderate" in the relevant contexts.

      (3) "the moderate effect of memory task is relatively weak": "moderator effect" or "moderate effect"?

      We appreciate the reviewer for pointing out this mistake. We have updated the term to "moderator effect" in Section 2.2.2 (p. 6).

      (4) "in frontal regions we found a latest coupled but most precise and strong SO-fast SP coupling" Meaning?

      We thank the reviewer for bringing this lack of clarity to our attention. By 'latest,' we refer to the delayed phase of SO-fast SP coupling observed in the frontal regions compared with the central and parietal regions (see Figure 5). "Precise and strong" describes the high precision and strength of phase-locking between the SO up-state and the fast SP peak in these regions. To improve clarity, we have rephrased this sentence as: “We found that SO-fast SP coupling in the frontal region occurred at the latest phase observed across all regions, characterized by the highest precision and strength of phase-locking.” (p. 9).
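      In circular statistics, "latest", "precise", and "strong" map onto the mean direction and the mean resultant length of the SO phases at which spindles peak. The sketch below computes both in radians; it illustrates the general idea only and is not necessarily the exact coupling metric used in the meta-analysed studies. The simulated von Mises phases are hypothetical.

```python
import numpy as np


def coupling_phase_strength(phases):
    """Preferred coupling phase and phase-locking strength.

    phases: SO phase (radians) at each spindle peak.
    Returns (mean phase in radians, mean resultant length in [0, 1]).
    A later preferred phase is a larger mean angle; tighter phase-locking
    is a resultant length closer to 1.
    """
    vectors = np.exp(1j * np.asarray(phases, dtype=float))  # one unit vector per event
    mean_vec = vectors.mean()
    return np.angle(mean_vec), np.abs(mean_vec)


rng = np.random.default_rng(2)
locked = rng.vonmises(mu=0.5, kappa=4.0, size=500)   # events clustered near 0.5 rad
spread = np.linspace(0, 2 * np.pi, 8, endpoint=False)  # evenly spread phases: no locking
print(coupling_phase_strength(locked), coupling_phase_strength(spread))
```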

      (5) Figure 5 and others contain angles in degrees and radians.

      We appreciate the reviewer pointing out this inconsistency. We have updated the manuscript and supplementary material to consistently use radians throughout.

    1. Reviewer #2 (Public review):

      The revised manuscript by Altan et al. includes some real improvements to the visualizations and explanations of the authors' thesis statement with respect to fMRI measurements of pRF sizes. In particular, the deposition of the paper's data has allowed me to probe and refine several of my previous concerns. While I still have major concerns about how the data are presented in the current draft of the manuscript, my skepticism about data quality overall has been much alleviated. Note that this review focuses almost exclusively on the fMRI data as I was satisfied with the quality of the psychophysical data and analyses in my previous review.

      Major Concerns

      (I) Statistical Analysis

      In my previous review, I raised the concern that the small sample size combined with the noisiness of the fMRI data, a lack of clarity about some of the statistics, and a lack of code/data likely combine to make this paper difficult or impossible to reproduce as it stands. The authors have since addressed several aspects of this concern, most importantly by depositing their data. However their response leaves some major questions, which I detail below.

      First of all, the authors claim in their response to the previous review that the small sample size is not an issue because large samples are not necessary to obtain "conclusive" results. They are, of course, technically correct that a small sample size can yield significant results, but the response misses the point entirely. In fact, small samples are more likely than large samples to erroneously yield a significant result (Button et al., 2013, DOI:10.1038/nrn3475), especially when noise is high. The response by the authors cites Schwarzkopf & Huang (2024) to support their methods on this front. After reading the paper, I fail to see how it is at all relevant to the manuscript at hand or the criticism raised in the previous review. Schwarzkopf & Huang propose a statistical framework that is narrowly tailored to situations where one is already certain that some phenomenon (like the adaptation of pRF size to spatial frequency) either always occurs or never occurs. Such a framework is invalid if one cannot be certain that, for example, pRF size adapts in 98% of people but not the remaining 2%. Even if the paper were relevant to the current study, the authors don't cite this paper, use its framework, or admit the assumptions it requires in the current manuscript. The observation that a small dataset can theoretically lead to significance under a set of assumptions not appropriate for the current manuscript is not a serious response to the concern that this manuscript may not be reproducible.

      To overcome this concern, the authors should provide clear descriptions of their statistical analyses and explanations of why these analyses are appropriate for the data. Ideally, source code should be published that demonstrates how the statistical tests were run on the published data. (I was unable to find any such source code in the OSF repository.) If the effects in the paper were much stronger, this level of rigor might not be strictly necessary, but the data currently give the impression of being right near the boundary of significance, and the manuscript's analyses need to reflect that. The descriptions in the text were helpful, but I was only able to approximately reproduce the authors' analyses based on these descriptions alone. Specifically, I attempted to reproduce the Mood's median tests described in the second paragraph of section 3.2 after filtering the data based on the criteria described in the final paragraph of section 3.1. I found that 7/8 (V1), 7/8 (V2), 5/8 (V3), 5/8 (V4), and 4/8 (V3A) subjects passed the median test when accounting for the 40 multiple comparisons. These results are reasonably close to those reported in the manuscript and might just differ based on the multiple comparisons strategy used (which I did not find documented in the manuscript). However, Mood's median test does not test the direction of the difference, only whether the medians differ, so I additionally required that the median sigma of the high-adapted pRFs be greater than that of the low-adapted pRFs. Surprisingly, in V1 and V3, one subject each (not the same subject) failed this part of the test, meaning that they had significant differences between conditions but in the wrong direction. This leaves 6/8 (V1), 7/8 (V2), 4/8 (V3), 5/8 (V4), and 4/8 (V3A) subjects that appear to support the authors' conclusions.

      As the authors mention, however, this set of analyses runs the risk of comparing different parts of cortex, so I also performed Wilcoxon signed-rank tests on the (paired) vertex data for which both the high-adapted and low-adapted conditions passed all the authors' stated thresholds. These results largely agreed with the median test (only 5/8 subjects significant in V1 but 6/8 in V3A, other areas the same, though the two tests did not always agree on which subjects had significant differences). These analyses were of course performed by a reviewer with a reviewer's time commitment to the project and shouldn't be considered a replacement for the authors' expertise with their own data. If the authors think that I have made a mistake in these calculations, then the best way to refute them would be to publish the source code they used to threshold the data and to perform the same tests.
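      The two per-subject checks described above (Mood's median test with an added direction requirement, and a one-sided paired signed-rank test) could be scripted roughly as below. This is a hypothetical SciPy sketch, not the reviewer's or the authors' code; the Bonferroni correction over 40 comparisons is an assumption, since the manuscript's correction strategy is undocumented, and the synthetic lognormal pRF sizes are illustrative only.

```python
import numpy as np
from scipy import stats

N_COMPARISONS = 40              # 8 subjects x 5 visual areas, as counted in the review
ALPHA = 0.05 / N_COMPARISONS    # Bonferroni correction (assumed)


def median_test_directional(sigma_high, sigma_low, alpha=ALPHA):
    """Mood's median test, counted as support only if the high-adapted median is larger."""
    p = stats.median_test(sigma_high, sigma_low)[1]   # second element is the p-value
    larger = np.median(sigma_high) > np.median(sigma_low)
    return p < alpha and larger


def paired_wilcoxon(sigma_high, sigma_low, alpha=ALPHA):
    """One-sided signed-rank test on vertices that passed thresholds in both conditions."""
    p = stats.wilcoxon(sigma_high, sigma_low, alternative="greater").pvalue
    return p < alpha


rng = np.random.default_rng(3)
# Unpaired comparison: vertex sets may differ between conditions
low = rng.lognormal(0.0, 0.5, 500)
high = rng.lognormal(np.log(1.3), 0.5, 500)
# Paired comparison: the same vertices measured in both conditions
paired_low = rng.lognormal(0.0, 0.5, 500)
paired_high = paired_low * rng.lognormal(np.log(1.3), 0.2, 500)
print(median_test_directional(high, low), paired_wilcoxon(paired_high, paired_low))
```

      Swapping the arguments shows why the direction check matters: the median test still rejects, but the subject no longer counts as supporting a larger high-adapted pRF size.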

      Setting aside the precise values of the relevant tests, we should also consider whether 5 of 8 subjects showing a significant effect (as they report for V3, for example) should count as significant evidence of the effect. If one assumes, as a null hypothesis, that there is no difference between the two conditions in V3 and that all differences are purely noise, then a binomial test across subjects would be appropriate. Even if 6 of 8 subjects show the effect, however (and ignoring multiple comparisons), the p-value of a one-sided binomial test is not significant at the 0.05 level (7 of 8 subjects is barely significant). Of course, a more rigorous way to approach this question could be something like an ANOVA, and the authors use an ANOVA analysis of the medians in the paragraph following their use of Mood's median test. However, ANOVA assumes normality, and the authors state in the previous paragraph that they employed Mood's median test because "the distribution of the pRF sizes is zero-bounded and highly skewed", so this choice does not make sense. The Central Limit Theorem might be applied to the medians in theory, but with only 8 subjects and with an underlying distribution of pRF sizes that is non-negative, the relevant data will almost certainly not be normally distributed. These tests should probably be something like a Kruskal-Wallis ANOVA on ranks.
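      The binomial argument can be checked with exact arithmetic. Following the reviewer's calculation, this sketch assumes that under the null each subject independently shows the effect with probability 0.5 (a sign-test style null); under that assumption, 6 of 8 subjects gives p = 37/256 ≈ 0.14 and 7 of 8 gives p = 9/256 ≈ 0.035.

```python
from math import comb


def binomial_p_one_sided(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): chance that k or more of n subjects show the effect."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


for k in (5, 6, 7):
    print(k, "of 8 subjects:", round(binomial_p_one_sided(k, 8), 4))
```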

      All of the above said, my intuition about the data is currently that there are significant changes to the adapted pRF size in V2. I am not currently convinced that the effects in other visual areas are significant, and I suspect that the paper would be improved if the authors abandoned their claims that areas other than V2 show a substantial effect. Importantly, I don't think this causes the paper to lose any impact; in fact, if the authors agree with my assessments, then the paper might be improved by focusing on V2. Specifically, the authors already discuss psychophysical work related to the perception of texture on pages 18 and 19 and link it to their results. V2 is also implicated in the perception of texture (see, for example, Freeman et al., 2013; DOI:10.1038/nn.3402; Ziemba et al., 2016, DOI:10.1073/pnas.1510847113; Ziemba et al., 2019; DOI:10.1523/JNEUROSCI.1743-19.2019) and so would naturally be the part of the visual cortex where one might predict that spatial frequency adaptation would have a strong effect on pRF size. This neatly connects the psychophysical and imaging sides of this project and could make a very nice story out of the present work.

      (II) Visualizations

      The manuscript's visual evidence regarding the pRF data also remains fairly weak (but I found the pRF size comparisons in the OSF repository and Figure S1 to be better evidence; more on this in the next paragraph). The first line of the Results section still states, "A visual inspection on the pRF size maps in Figure 4c clearly shows a difference between the two conditions, which is evident in all regions." As I mentioned in my previous review, I don't agree with this claim (specifically, that it is clear). My impression when I look at these plots is of similarity between the maps, and, where there is dissimilarity, of likely artifacts. For example, the splotch of cortex near the upper vertical meridian (ventral boundary) of V1 that shows up in yellow in the upper plot but not the lower plot also has a weirdly high eccentricity and a polar angle near the opposite vertical meridian: almost certainly not the actual tuning of that patch of cortex. If this is the clearest example subject in the dataset, then the effect looks to me to be very small and inconsistently distributed across the visual areas. That said, I'm not convinced that the problem here is the data; rather, I think it's just very hard to communicate a small difference in parameter tuning across a visual area using this kind of side-by-side figure. I think that Figure S2, though noisy (as pRF maps typically are), is more convincing than Figure 4c, personally. For what it's worth, when looking at the data myself, I found that plotting log(𝜎(H) / 𝜎(L)), which will be unstable when noise causes 𝜎(H) or 𝜎(L) to approach zero, was less useful than plotting (𝜎(H) - 𝜎(L)) / (𝜎(H) + 𝜎(L)). This latter quantity will be constrained between -1 and 1 and shows something like a proportional change in the pRF size (and thus should be more comparable across eccentricity).
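      The two summary quantities mentioned above can be compared directly. In this sketch, the arrays of pRF sizes are hypothetical and positive, as required after removing sigma ≤ 0 vertices.

```python
import numpy as np


def log_ratio(sigma_high, sigma_low):
    """log(sigma_H / sigma_L): unbounded, and unstable when either size approaches zero."""
    return np.log(np.asarray(sigma_high, dtype=float) / np.asarray(sigma_low, dtype=float))


def normalized_difference(sigma_high, sigma_low):
    """(sigma_H - sigma_L) / (sigma_H + sigma_L): bounded in [-1, 1] for positive sizes."""
    sigma_high = np.asarray(sigma_high, dtype=float)
    sigma_low = np.asarray(sigma_low, dtype=float)
    return (sigma_high - sigma_low) / (sigma_high + sigma_low)


# Hypothetical vertices: a modest change, and two near-zero noise cases
sh = np.array([1.2, 0.01, 2.0])
sl = np.array([1.0, 0.5, 1e-4])
print(log_ratio(sh, sl))
print(normalized_difference(sh, sl))
```

      The normalized difference is exactly tanh(log ratio / 2), so it preserves the ordering of vertices while compressing the near-zero outliers that dominate a colour map of the log ratio.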

      In my opinion, the inclusion of the pRF size comparison plots in the OSF repository and Figure S1 made a stronger case than any of the plots of the cortical surface. I would suggest putting these on log-log plots since the distribution of pRF size (like eccentricity) is approximately exponential on the cortical surface. As-is, it's clear in many plots that there is a big splotch of data in the compressed lower left corner, but it's hard to get a sense for how these should be compared to the upper right expanse of the plots. It is frequently hard to tell whether there is a greater concentration of points above or below the line of equality in the lower left corner as well, and this is fairly central to the paper's claims. My intuition is that the upper right is showing relatively little data (maybe 10%?), but these data are heavily emphasized by the current plots.

      The authors might even want to consider putting a collection of these scatter plots (or maybe just subject 007's, or possibly all subjects' pRFs on a single scatter plot) in the main paper and using these visualizations to provide intuitive support for the main conclusions about the fMRI data (where the manuscript currently uses Figure 4c for visual intuition).
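      Whether points concentrate above or below the line of equality in the crowded lower-left corner can also be quantified rather than judged by eye. This is a hypothetical sketch: it bins vertices along the diagonal of a log-log plot (using the geometric mean of the two sizes as the position) and reports, per bin, how many vertices fall there and what fraction has a larger high-adapted size. The bin count and synthetic data are illustrative choices.

```python
import numpy as np


def fraction_above_identity(sigma_low, sigma_high, n_bins=5):
    """Per log-spaced size bin: (lower edge, upper edge, vertex count, fraction with sigma_H > sigma_L)."""
    sigma_low = np.asarray(sigma_low, dtype=float)
    sigma_high = np.asarray(sigma_high, dtype=float)
    size = np.sqrt(sigma_low * sigma_high)  # position along the diagonal of a log-log plot
    edges = np.logspace(np.log10(size.min()), np.log10(size.max()), n_bins + 1)
    idx = np.clip(np.digitize(size, edges) - 1, 0, n_bins - 1)  # keep extremes in end bins
    result = []
    for b in range(n_bins):
        in_bin = idx == b
        frac = np.mean(sigma_high[in_bin] > sigma_low[in_bin]) if in_bin.any() else np.nan
        result.append((edges[b], edges[b + 1], int(in_bin.sum()), frac))
    return result


rng = np.random.default_rng(4)
low = rng.lognormal(0.0, 0.8, 1000)      # hypothetical low-adapted pRF sizes
high = low * 1.2                         # hypothetical uniform 20% enlargement
for lo_edge, hi_edge, count, frac in fraction_above_identity(low, high):
    print(f"{lo_edge:6.2f}-{hi_edge:6.2f}: n = {count:4d}, above identity = {frac:.2f}")
```

      For the scatter plots themselves, matplotlib's `set_xscale("log")` and `set_yscale("log")` give the log-log axes; the per-bin counts also show how much of the data the upper-right region actually contains.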

      Minor Comments

      (1) Although eLife does not strictly require it, I would like to see more of the authors' code deposited along with the data (especially the code for calculating the statistics that were mentioned above). I do appreciate the simulation code that the authors added in the latest submission (largely added in response to my criticism in the previous reviews), and I'll admit that it helped me understand where the authors were coming from, but it also contains a bug and thus makes a good example of why I'd like to see more of the authors' code. If we set aside the scientific question of whether the simulation is representative of an fMRI voxel (more in Minor Comment 5, below), Figure 1A and the "AdaptaionEffectSimulated.png" file from the repository (https://osf.io/d5agf) imply that only small RFs were excluded in the high-adapted condition and only large RFs were excluded in the low-adapted condition. However, the script provided (SimlatePrfAdaptation.m: https://osf.io/u4d2h) does not do this. Lines 7 and 8 of the script set the small and large cutoffs at the 30th and 70th percentiles, respectively, then exclude everything greater than the 30th percentile in the "Large RFs adapted out" condition (lines 19-21) and exclude anything less than the 70th percentile in the "Small RFs adapted out" condition (lines 27-29). So the figures imply that they represent 70% of the data, but they in fact represent only the most extreme 30% of the data. (Moreover, I was unable to run the script because it contains hard-coded paths to code in someone's home directory.) Just to be clear, these kinds of bugs are quite common in scientific code, and this bug was almost certainly an honest mistake.
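      The selection that the figures imply can be written so that each condition keeps about 70% of the simulated RFs. This is a Python re-sketch of the logic described above, not a patch to the authors' MATLAB script; the function name, variable names, and lognormal RF sizes are hypothetical.

```python
import numpy as np


def adapt_out(rf_sizes, low_pct=30, high_pct=70):
    """RF sizes remaining after adapting out the largest or the smallest 30%.

    The described bug instead kept rf_sizes <= small_cut in the first condition
    and rf_sizes >= large_cut in the second, i.e. only the most extreme 30%.
    """
    rf_sizes = np.asarray(rf_sizes, dtype=float)
    small_cut, large_cut = np.percentile(rf_sizes, [low_pct, high_pct])
    large_adapted_out = rf_sizes[rf_sizes <= large_cut]  # drop RFs above the 70th percentile
    small_adapted_out = rf_sizes[rf_sizes >= small_cut]  # drop RFs below the 30th percentile
    return large_adapted_out, small_adapted_out


rng = np.random.default_rng(5)
sizes = rng.lognormal(0.0, 0.5, 10000)  # hypothetical RF sizes within one voxel
large_out, small_out = adapt_out(sizes)
print(len(large_out) / sizes.size, len(small_out) / sizes.size)
print(large_out.mean(), small_out.mean())
```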

      (2) I also noticed that the individual subject scatter-plots of high versus low adapted pRF sizes on the OSF seem to occasionally have a large concentration of values on the x=0 and y=0 axes. This isn't really a big deal in the plots, but the manuscript states that "we denoised the pRF data to remove artifactual vertices where at least one of the following criteria was met: (1) sigma values were equal to or less than zero ..." so I would encourage the authors to double-check that the rest of their analysis code was run with the stated filtering.
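      One way to act on this suggestion is to apply the stated sigma criterion to both conditions before plotting and confirm that no points remain on either axis. The sketch below is hypothetical (array names and values are invented for illustration) and covers only criterion (1) quoted above.

```python
import numpy as np


def apply_sigma_criterion(sigma_high, sigma_low):
    """Drop vertices whose pRF size (sigma) is <= 0 in either condition.

    Implements only criterion (1) quoted in the review; the other
    denoising criteria are not modelled here.
    """
    keep = (sigma_high > 0) & (sigma_low > 0)
    return sigma_high[keep], sigma_low[keep]


# Hypothetical paired pRF sizes (high- vs low-adapted) for five vertices.
sigma_high = np.array([1.2, 0.0, 2.5, 0.8, 3.1])
sigma_low = np.array([1.0, 1.4, 0.0, -0.2, 2.9])

on_axes_before = np.sum((sigma_high == 0) | (sigma_low == 0))
high_f, low_f = apply_sigma_criterion(sigma_high, sigma_low)
on_axes_after = np.sum((high_f == 0) | (low_f == 0))
print(on_axes_before, on_axes_after)  # 2 0
```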

      (3) The manuscript also says that the median test was performed "on the raw pRF size values". I'm not really sure what the "raw" means here. Does this refer to pRF sizes without thresholding applied?

      (4) The eccentricity data are much clearer now with the additional comments from the authors and the full set of maps; my concerns about this point have been met.

      (5) Regarding the simulation of RFs in a voxel (setting aside the bug), I will admit both to hoping for a more biologically grounded simulation and to nonetheless understanding where the authors are coming from based on the provided example. What I mean by biologically grounded: something like, assume a 2.5-mm isotropic voxel aligned to the surface of V1 at 4° of eccentricity; the voxel would span X to Y degrees of eccentricity, and we predict Z neurons with RFs in this voxel with a distribution of RF sizes at that eccentricity from [reference], etc., eventually demonstrating a plausible pRF size change commensurate with the paper's measurements. I do think that a simulation like this would make the paper more compelling, but I'll acknowledge that it probably isn't necessary and might be beyond the scope here.

    1. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) It remains unclear how this stimulation protocol is proposed to enhance memory. Memories are believed to be stored by precise inputs to specific neurons and highly tuned changes in synaptic strengths. It remains unclear whether the neural activity proposed to be generated by the stimulation reflects the activation of specific memories or a general increase in activity across all classes of neurons.

      Thank you for raising the important issue of the actual neurophysiological effects of non-invasive brain stimulation. Unfortunately, invasive neurophysiological recordings in humans are not feasible for this type of study because of ethical constraints, while studies on cadavers or rodents would not fully resolve our question. Indeed, the authors of the cited study (Mihály Vöröslakos et al., Nature Communications, 2018) highlight that definitive conclusions about the exact voltage required in the in-vivo human brain cannot be drawn, owing to significant differences between rats and humans, and between the in-vivo human brain and cadavers, whose postmortem tissue shows altered electrical conductivity.

      We acknowledge that further exploration of this aspect would be highly valuable, and we agree that it is worth discussing both as a technical limitation and as a potential direction for future research; we are therefore modifying the manuscript accordingly. However, to address the challenge of in vivo recordings, we conducted Experiments 3 and 4, which non-invasively examined the neurophysiological and the connectivity changes induced by the stimulation, respectively. The observed changes in brain oscillatory activity (increased gamma oscillatory activity), cortical excitability (enhanced posteromedial parietal cortex reactivity), and brain connectivity (strengthened connections between the precuneus and hippocampi) provided evidence of the effects of our non-invasive brain stimulation protocol, further supporting the behavioral data.

      Additionally, we carefully considered the issue of stimulation distribution and, in response, performed a biophysical modeling analysis and E-field calculation using the parameters employed in our study (see Supplementary Materials).

      (2) The claim that effects directly involve the precuneus lacks strong support. The measurements shown in Figure 3 appear to be weak (i.e., Figure 3A top and bottom look similar, and Figure 3C left and right look similar). The figure appears to show a more global brain pattern rather than effects that are limited to the precuneus. Related to this, it would perhaps be useful to show the different positions of the stimulation apparatus. This could perhaps show that the position of the stimulation matters and could perhaps illustrate a range of distances over which position of the stimulation matters.

      Thank you for your feedback. We will improve the clarity of the manuscript to better address this important aspect. Our assumption that the precuneus plays a key role in the observed effects is based on several factors:

      (1) The non-invasive stimulation protocol was applied to the precuneus, identified individually for each participant. Given existing evidence on TMS propagation, we can reasonably assume that the precuneus was at least a mediator of the observed effects (Ridding & Rothwell, Nature Reviews Neuroscience 2007). For further details about target identification and TMS and tACS propagation, please refer to the MRI data acquisition section in the main text and the Biophysical modeling and E-field calculation section in the supplementary materials.

      (2) To investigate the effects of the neuromodulation protocol on cortical responses, we conducted a whole-brain analysis using multiple paired t-tests comparing each data point between different experimental conditions. To control the type I error rate, significance was assessed with a Monte Carlo permutation approach and p-values were corrected with the false discovery rate (FDR) method (see the Methods section for details). The results identified the posterior-medial parietal areas as the only regions showing significant differences across conditions.

      (3) To control for potential generalized effects, we included a control condition in which TMS-EEG recordings were performed over the left parietal cortex (adjacent to the precuneus). This condition did not yield any significant results, reinforcing the cortical specificity of the observed effects.
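      A common way to implement a per-data-point paired comparison with Monte Carlo permutation followed by FDR correction, as described in point (2), is a sign-flip permutation test combined with the Benjamini-Hochberg procedure. The sketch below assumes that design; the function names, the number of permutations, and the use of sign-flipping are illustrative choices rather than the authors' documented pipeline, and the example data are simulated.

```python
import numpy as np


def paired_signflip_pvalues(a, b, n_perm=5000, seed=0):
    """Two-sided Monte Carlo sign-flip test on paired differences.

    a, b: arrays of shape (n_subjects, n_points), one column per data point
    (e.g. channel x time sample). Returns one p-value per column.
    """
    rng = np.random.default_rng(seed)
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n = d.shape[0]

    def t_stat(x):
        # Paired t statistic for every column at once.
        return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(n))

    t_obs = np.abs(t_stat(d))
    exceed = np.zeros(d.shape[1])
    for _ in range(n_perm):
        # Under H0 each subject's difference is equally likely to flip sign.
        signs = rng.choice([-1.0, 1.0], size=(n, 1))
        exceed += np.abs(t_stat(d * signs)) >= t_obs
    # +1 in numerator and denominator keeps p strictly above zero.
    return (exceed + 1) / (n_perm + 1)


def fdr_bh(p, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; returns a boolean reject mask."""
    p = np.asarray(p)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank meeting its bound
        reject[order[: k + 1]] = True
    return reject


# Simulated example: 12 subjects, 2 data points; only the first has an effect.
rng = np.random.default_rng(1)
baseline = rng.normal(size=(12, 2))
post = baseline + np.array([3.0, 0.0]) + rng.normal(scale=0.5, size=(12, 2))
p_values = paired_signflip_pvalues(post, baseline)
significant = fdr_bh(p_values)
print(p_values, significant)
```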

      However, as stated in the Discussion, we do not claim that precuneus activity alone accounts for the observed effects. As shown in Experiment 4, stimulation led to connectivity changes between the precuneus and hippocampus, a network widely recognized as a key contributor to long-term memory formation (Bliss & Collingridge, Nature 1993). These connectivity changes suggest that precuneus stimulation triggered a ripple effect extending beyond the stimulation site, engaging the broader precuneus-hippocampus network.

      Regarding Figure 3A, it represents the overall expression of oscillatory activity detected by TMS-EEG. Since each frequency band has a different optimal scaling, the figure reflects a graphical compromise. A more detailed representation of the significant results is provided in Figure 3B. The effect sizes for gamma oscillatory activity in the delta T1 and T2 conditions were 0.52 and 0.50, respectively, which correspond to a medium effect based on Cohen’s d interpretation.

      (3) Behavioral results showing an effect on memory would substantiate claims that the stimulation approach produces significant changes in brain activity. However, placebo effects can be extremely powerful and useful, and this should probably be mentioned. Also, in the behavioral results that are currently presented, there are several concerns:

      a) There does not appear to be a significant effect on the STMB task.

      b) The FNAT task is minimally described in the supplementary material. Experimental details that would help the reader understand what was done are not described. Experimental details are missing for: the size of the images, the duration of the image presentation, the degree of image repetition, how long the participants studied the images, whether the names and occupations were different, genders of the faces, and whether the same participant saw different faces across the different stimulation conditions. Regarding the latter point, if the same participant saw the same faces across the different stimulation conditions, then there could be memory effects across different conditions that would need to be included in the statistical analyses. If participants saw different faces across the different stimulus conditions, then it would be useful to show that the difficulty was the same across the different stimuli.

      We thank you for pointing out the incomplete description of the FNAT task. We will add all the required information to the manuscript.

      In the meantime, here are the answers to your questions. The images measured 19 × 15 cm. During the learning phase and the immediate recall, each image was presented for 8 seconds, while in the delayed recall (after the face recognition phase) each image remained on screen until the participant answered. The learning phase, in which the name and occupation were shown together with each face, lasted around 2 minutes including the instructions. We used a different set of stimuli for each stimulation condition, for a total of 3 parallel task forms balanced across conditions and session order. Each parallel form comprised 6 male and 6 female faces; for each sex there were 2 young adults (aged around 30 years), 2 middle-aged adults (aged around 50 years), and 2 older adults (aged around 70 years). Before the experiments, we ran a pilot study to ensure there were no differences between the parallel forms of the task. We can provide the task and its parallel forms upon request. The chance level in the immediate and delayed recall cannot be quantified, since participants had to freely recall the name and occupation without multiple-choice options. In the recognition phase, the chance level was around 33% (three possible answers).

      c) Also, if I understand FNAT correctly, the task is based on just 12 presentations, and each point in Figure 2A represents a different participant. How the performance of individual participants changed across the conditions is unclear with the information provided. Lines joining performance measurements across conditions for each participant would be useful in this regard. Because there are only 12 faces, the results are quantized in multiples of 100/12 % in Figure 3A. While I do not doubt that the authors did their homework in terms of the statistical analyses, it seems as though these 12 measurements do not correspond to a large effect size. For example, in Figure 3A for the immediate condition (total), it seems that, on average, the participants may remember one more face/name/occupation.

      We will add another graph to the manuscript with lines connecting each participant's performance. Unfortunately, we were not able to incorporate it in the box-and-whisker plot.

      We apologize for the lack of clarity in the description of the FNAT. As you correctly pointed out, we used the percentage based on the single association between face, name and occupation (12 in total). However, each association consisted of three items, resulting in a total of 36 items to learn and associate – we will make it more explicit in the manuscript.

      In the example you mentioned, participants were, on average, able to recall three more items compared to the other conditions. While this difference may not seem striking at first glance, it is important to consider that we assessed memory performance after a single, three-minute stimulation session. Similar effects are typically observed only after multiple stimulation sessions (Koch et al., NeuroImage, 2018; Grover et al., Nature Neuroscience, 2022).

      d) Block effects. If I understand correctly, the experiments were conducted in blocks. This is potentially problematic. An example study that articulates potential problems associated with block designs is described in Li et al (TPAMI 2021, https://ieeexplore.ieee.org/document/9264220). It is unclear if potential problems associated with block designs were taken into consideration.

      Thank you for the interesting reference. According to this paper, in a block design, EEG or fMRI recordings are performed in response to different stimuli of a given class presented in succession. This does not correspond to our experimental design, in which both TMS-EEG and fMRI were recorded in a resting state on different days for the different stimulation conditions.

      e) In the FNAT portion of the paper, some results are statistically significant, while others are not. The interpretation of this is unclear. In Figure 3A, it seems as though the authors claim that iTBS+gtACS > iTBS+sham-tACS, but iTBS+gtACS ~ sham+sham. The interpretation of such a result is unclear. Results are also unclear when separated by name and occupation. There is only one condition that is statistically significant in Figure 3A in the name condition, and no significant results in the occupation condition. In short, the statistical analyses, and accompanying results that support the authors’ claims, should be explained more clearly.

      Thank you again for your feedback. We will work on making the large amount of data we reported easier to interpret.

      We hope that our previous responses have addressed your initial concerns, and we now turn to your observations regarding the behavioral results, assuming you were referring to Figure 2A. The main finding of this study is the improvement in long-term memory performance, specifically the ability to correctly recall the association between face, name, and occupation (total FNAT), which was significantly enhanced in both Experiments 1 and 2. However, we also aimed to explore the individual contributions of name and occupation separately to gain a deeper understanding of the results. Our analysis revealed that the improvement in total FNAT was primarily driven by an increase in name recall rather than occupation recall. We understand that this may have caused some confusion. Therefore, we will clarify this in the manuscript and consider presenting name and occupation in separate plots.

      Regarding the stimulation conditions, your concerns about the performance pattern (iTBS+gtACS > iTBS+sham-tACS, but iTBS+gtACS ~ sham+sham) are understandable. However, this new protocol was developed precisely in response to the variability observed in behavioral outcomes following non-invasive brain stimulation, particularly when used to modulate memory functions (Corp et al., 2020; Pabst et al., 2022). As discussed in the manuscript, it is intended as a boost to conventional non-invasive brain stimulation protocols, leveraging the mechanisms outlined in the Discussion section.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The study did not include a condition where γtACS was applied alone. This was likely because previous work indicated that a single 3-minute γtACS session did not produce significant effects, but this limits the ability to isolate the specific contribution of γtACS in the context of this target and memory function.

      Thank you for your comments. As you pointed out, we did not include a condition where γtACS was applied alone. This decision was based on the findings of Guerra et al. (Brain Stimulation 2018), who investigated the same protocol and reported no aftereffects. Given the substantial burden of the experimental design on patients and our primary goal of demonstrating an enhancement of effects compared to the standalone iTBS protocol, we decided to leave out this condition. However, we agree that investigating the effects of γtACS alone is an interesting and relevant aspect worthy of further exploration. In line with these observations, we will expand the discussion on this point in the study’s limitations section.

      (2) The authors applied stimulation for 3 minutes, which seems to be based on prior tACS protocols. It would be helpful to present some rationale for both the duration and timing relative to the learning phase of the memory task. Would you expect additional stimulation prior to recall to benefit long-term associative memory?

      Thank you for your comment and for raising this interesting point. As you correctly noted, the protocol we used has a duration of three minutes, a choice based on previous studies demonstrating, at the neurophysiological level, its greater efficacy compared with a single stimulation technique. Specifically, these studies have shown that the combined stimulation enhanced gamma-band oscillations and increased cortical plasticity (Guerra et al., Brain Stimulation 2018; Maiella et al., Scientific Reports 2022). Given that the precuneus (Brodt et al., Science 2018; Schott et al., Human Brain Mapping 2018), gamma oscillations (Osipova et al., Journal of Neuroscience 2006; Deprés et al., Neurobiology of Aging 2017; Griffiths et al., Trends in Neurosciences 2023), and cortical plasticity (Brodt et al., Science 2018) are all associated with encoding processes, we decided to apply the co-stimulation immediately before the encoding phase to enhance its efficacy.

      Regarding the question of whether stimulation could also benefit recall, we believe it could. We speculate that repeating the stimulation before recall might provide an additional boost. This is supported by evidence showing that both the precuneus and gamma oscillations are involved in recall processes (Flanagin et al., Cerebral Cortex 2023; Griffiths et al., Trends in Neurosciences 2023). Furthermore, previous research suggests that reinstating the same brain state as during encoding can enhance recall performance (Javadi et al., The Journal of Neuroscience 2017).

      We will expand the study rationale and include these considerations in the future directions section.

      (3) How was the burst frequency of theta iTBS and gamma frequency of tACS chosen? Were these also personalized to subjects' endogenous theta and gamma oscillations? If not, were increases in gamma oscillations specific to patients' endogenous gamma oscillation frequencies or the tACS frequency?

      The stimulation protocol was chosen based on previous studies (Guerra et al., Brain Stimulation 2018; Maiella et al., Scientific Reports 2022). The gamma tACS sinusoidal waveform was set at 70 Hz, while iTBS consisted of 2-s trains of ten bursts, each of three pulses at 50 Hz, repeated every 10 s with an 8-s pause between consecutive trains, for a total of 600 pulses over 190 s (see the iTBS+γtACS neuromodulation protocol section). In particular, theta iTBS was inspired by protocols used in animal models to elicit LTP in the hippocampus (Huang et al., Neuron 2005). Consequently, neither the theta iTBS nor the gamma tACS frequency was personalized. The increase in gamma oscillations was measured relative to each participant's baseline and did not correspond to the administered tACS frequency.
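      The iTBS timing can be checked by generating the pulse schedule explicitly. This sketch assumes the standard theta-burst layout implied by the text (bursts every 200 ms, i.e. at 5 Hz, so that ten bursts fill a 2-s train) and infers 20 trains from 600 pulses ÷ 30 pulses per train; the function name and its parameters are hypothetical. Under these assumptions the last train starts 190 s after the first and the final pulse falls at about 191.84 s, which is one way to read the reported 190-s duration.

```python
import numpy as np


def itbs_pulse_times(n_trains=20, bursts_per_train=10, pulses_per_burst=3,
                     intra_burst_hz=50.0, burst_rate_hz=5.0, train_period_s=10.0):
    """Onset times (s) of every iTBS pulse, assuming a regular theta-burst layout."""
    pulse_offsets = np.arange(pulses_per_burst) / intra_burst_hz  # 0, 0.02, 0.04 s
    burst_offsets = np.arange(bursts_per_train) / burst_rate_hz   # 0, 0.2, ..., 1.8 s
    train_onsets = np.arange(n_trains) * train_period_s           # 0, 10, ..., 190 s
    # Broadcast train x burst x pulse offsets; C-order ravel keeps time order,
    # and the 2-s train plus 8-s pause gives the 10-s train period.
    times = (train_onsets[:, None, None]
             + burst_offsets[None, :, None]
             + pulse_offsets[None, None, :])
    return times.ravel()


times = itbs_pulse_times()
pulses_per_train = 10 * 3
print(len(times))                  # 600
print(times[-pulses_per_train])    # 190.0 (onset of the last train)
print(round(times[-1], 2))         # 191.84 (last pulse)
```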

      (4) The authors do a thorough job of analyzing the increase in gamma oscillations in the precuneus through TMS-EEG; however, the authors may also analyze whether theta oscillations were also enhanced through this protocol due to the iTBS potentially targeting theta oscillations. This may also be more robust than gamma oscillations increases since gamma oscillations detected on the scalp are very low amplitude and susceptible to noise and may reflect activity from multiple overlapping sources, making precise localization difficult without advanced techniques.

      Thank you for the suggestion. We analyzed theta oscillations and found no changes.

      (5) Figure 4: Why are connectivity values pre-stimulation for the iTBS and sham tACS stimulation condition so much higher than the dual stimulation? We would expect baseline values to be more similar.

      We acknowledge that the pre-stimulation connectivity values for the iTBS and sham tACS conditions appear higher than those for the dual stimulation condition. However, as noted in our statistical analyses, there were no significant differences at baseline between conditions (p-FDR= 0.3514), suggesting that any apparent discrepancy is due to natural variability rather than systematic bias. One potential explanation for these differences is individual variability in baseline connectivity measures, which can fluctuate due to factors such as intrinsic neural dynamics, participant state, or measurement noise. Despite these variations, our statistical approach ensures that any observed post-stimulation effects are not confounded by pre-existing differences.

      (6) Figure 2: How are total association scores significantly different between stimulation conditions, but individual name and occupation associations are not? Further clarification of how the total FNAT score is calculated would be helpful.

      We apologize for any lack of clarity. The total FNAT score reflects the ability to correctly recall all the information associated with a person—specifically, the correct pairing of the face, name, and occupation. Participants received one point for each triplet they accurately recalled. The scores were then converted into percentages, as detailed in the Face-Name Associative Task Construction and Scoring section in the supplementary materials.
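      A minimal sketch of the scoring described above follows, under our reading that a triplet counts as correct only when both the name and the occupation are recalled for a face; the function and variable names are hypothetical. It also makes explicit the quantization noted by Reviewer #1: with 12 triplets, each point is worth 100/12 ≈ 8.33%.

```python
N_TRIPLETS = 12  # faces per parallel form (6 male, 6 female)
STEP = 100.0 / N_TRIPLETS  # smallest possible change in any score (~8.33%)


def fnat_scores(name_correct, occupation_correct):
    """Return total, name and occupation scores as percentages of 12.

    name_correct / occupation_correct: one boolean per face, same order.
    Assumption: a triplet is scored correct only if both the name and the
    occupation were recalled for that face.
    """
    if len(name_correct) != N_TRIPLETS or len(occupation_correct) != N_TRIPLETS:
        raise ValueError("expected one entry per face (12)")

    def to_pct(k):
        return 100.0 * k / N_TRIPLETS

    triplets = sum(n and o for n, o in zip(name_correct, occupation_correct))
    return {
        "total": to_pct(triplets),
        "name": to_pct(sum(name_correct)),
        "occupation": to_pct(sum(occupation_correct)),
    }


# Hypothetical participant: 6 names and 9 occupations recalled, 3 full triplets.
names = [True] * 6 + [False] * 6
occupations = [True] * 3 + [False] * 3 + [True] * 6
print(fnat_scores(names, occupations))
```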

      Total FNAT was the primary outcome measure. However, we also analyzed name and occupation recall separately to better understand their individual contributions. Our analysis revealed that the improvement in total FNAT was primarily driven by an increase in name recall rather than occupation recall.

      We acknowledge that this distinction may have caused some confusion. To improve clarity, we will revise the manuscript accordingly and consider presenting name and occupation recall in separate plots.

      Reviewer #3 (Public review):

      Weaknesses:

      I want to state clearly that I think the strengths of this study far outweigh the concerns I have. I still list some points that I think should be clarified by the authors or taken into account by readers when interpreting the presented findings.

      I think one of the major weaknesses of this study is the overall low sample size in all of the experiments (between n = 10 and n = 20). This is, as I mentioned when discussing the strengths of the study, partly mitigated by the within-subject design and individualized stimulation parameters. The authors mention that they performed a power analysis, but this analysis seemed to be based on electrophysiological readouts similar to those obtained in experiment 3. It is thus unclear whether the other experiments were sufficiently powered to reliably detect the behavioral effects of interest. That being said, the authors do report significant effects, so they were by definition powered to find those. However, the effect sizes reported for their main findings are all relatively large and it is known that significant findings from small samples may represent inflated effect sizes, which may hamper the generalizability of the current results. Ideally, the authors would replicate their main findings in a larger sample. Alternatively, I think running a sensitivity analysis to estimate the smallest effect the authors could have detected with a power of 80% could be very informative for readers to contextualize the findings. At the very least, however, I think it would be necessary to address this point as a potential limitation in the discussion of the paper.

      Thank you for the observation. As you mentioned, our power analysis was based on our previous study investigating the same neuromodulation protocol with a corresponding experimental design. The relatively small sample could be considered a limitation of the study, which we will add to the discussion. A fundamental future step will be to replicate these results in a larger population; in the meantime, to strengthen our results, we performed the sensitivity analysis you suggested.

      In detail, we performed a sensitivity analysis for repeated-measures ANOVA with α = 0.05 and power (1 − β) = 0.80, with no sphericity correction. For experiment 1, a sensitivity analysis with 1 group and 3 measurements showed a minimal detectable effect size of f = 0.524 with 20 participants. In our paper, the ANOVA on total FNAT immediate performance revealed an effect size of η² = 0.274, corresponding to f = 0.614; the ANOVA on FNAT delayed performance revealed an effect size of η² = 0.236, corresponding to f = 0.556. For experiment 2, a sensitivity analysis for total FNAT immediate performance (1 group and 3 measurements) showed a minimal detectable effect size of f = 0.797 with 10 participants. In our paper, the ANOVA on total FNAT immediate performance revealed an effect size of η² = 0.448, corresponding to f = 0.901. The sensitivity analysis for total FNAT delayed performance (1 group and 6 measurements) showed a minimal detectable effect size of f = 0.378 with 10 participants. In our paper, the ANOVA on total FNAT delayed performance revealed an effect size of η² = 0.484, corresponding to f = 0.968. Thus, in both experiments the observed effect sizes exceeded the minimal effect sizes detectable with 80% power. We have now added this information to the manuscript, and we thank the reviewer for the suggestion.
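      The η² values in this paragraph convert to Cohen's f through the standard relation f = √(η² / (1 − η²)). The sketch below reproduces the four conversions from the reported η² values and compares each with the corresponding minimal detectable f; it assumes the reported η² is the (partial) eta squared that this conversion expects, and the table and function names are illustrative.

```python
import math


def eta_sq_to_f(eta_sq):
    """Cohen's f from eta squared: f = sqrt(eta^2 / (1 - eta^2))."""
    if not 0.0 <= eta_sq < 1.0:
        raise ValueError("eta squared must lie in [0, 1)")
    return math.sqrt(eta_sq / (1.0 - eta_sq))


# (label, reported eta squared, minimal detectable f from the sensitivity analysis)
REPORTED = [
    ("exp 1, immediate", 0.274, 0.524),
    ("exp 1, delayed", 0.236, 0.524),
    ("exp 2, immediate", 0.448, 0.797),
    ("exp 2, delayed", 0.484, 0.378),
]

for label, eta_sq, f_min in REPORTED:
    f = eta_sq_to_f(eta_sq)
    print(f"{label}: f = {f:.3f}, minimal detectable f = {f_min}, exceeds: {f > f_min}")
```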

      It seems that the statistical analysis approach differed slightly between studies. In experiment 1, the authors followed up significant effects of their ANOVAs with Bonferroni-adjusted post-hoc tests, whereas it seems that in experiment 2, those post-hoc tests were "exploratory", which may suggest those were uncorrected. In experiment 3, the authors use one-tailed t-tests to follow up their ANOVAs. Given some of the reported p-values, these choices suggest that some of the comparisons might have failed to reach significance if properly corrected. This is not a critical issue per se, as the important test in all these cases is the initial ANOVA, but non-significant (corrected) post-hoc tests might be another indicator of an underpowered experiment. My assumptions here might be wrong, but even then, I would ask the authors to be more transparent about the reasons for their choices or provide additional justification. Finally, the authors sometimes report exact p-values whereas other times they simply say p < .05. I would ask them to be consistent and recommend using exact p-values for every result where p >= .001.

      Thank you again for the suggestions. Your observations are correct: we used slightly different statistical approaches depending on our hypotheses. Here are the details:

      In experiment 1, we used a repeated-measures ANOVA with one factor, “stimulation condition” (iTBS+γtACS; iTBS+sham-tACS; sham-iTBS+sham-tACS). Following the significant effect of this factor, we performed post-hoc analyses with Bonferroni correction.

      In experiment 2, we used a repeated-measures ANOVA with two factors, “stimulation condition” and “time”. As expected, we observed a significant effect of condition, confirming the result of experiment 1, but not of time. This means that the neuromodulatory effect was present regardless of the time point. However, to explore whether the effects of stimulation condition were present at each time point, we performed exploratory t-tests without correction for multiple comparisons, since this was only an exploratory analysis.

      In experiment 3, we used the same approach as in experiment 1. However, since we had a specific hypothesis on the direction of the effect, based on our previous study (i.e., an increase in spectral power; Maiella et al., Scientific Reports 2022), our tests were one-tailed.
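      The practical difference between these choices can be illustrated with two small helpers: Bonferroni adjustment of the three pairwise post-hoc comparisons among the stimulation conditions, and conversion of a two-tailed p-value to a one-tailed one for a directional hypothesis. The p-values below are hypothetical and serve only to show how an uncorrected p = 0.030 becomes 0.09 after Bonferroni correction, which is the situation the reviewer raises; the helper names are illustrative.

```python
from itertools import combinations

CONDITIONS = ["iTBS+γtACS", "iTBS+sham-tACS", "sham-iTBS+sham-tACS"]


def bonferroni(pvals):
    """Bonferroni-adjust a dict of p-values (multiply by m, cap at 1)."""
    m = len(pvals)
    return {k: min(1.0, p * m) for k, p in pvals.items()}


def one_tailed_p(p_two_tailed, t, expected_sign=1):
    """One-tailed p-value for a directional hypothesis from a two-tailed one."""
    if t * expected_sign > 0:
        return p_two_tailed / 2
    return 1 - p_two_tailed / 2


pairs = list(combinations(CONDITIONS, 2))  # 3 pairwise post-hoc comparisons
# Hypothetical uncorrected p-values, for illustration only.
raw = dict(zip(pairs, [0.012, 0.030, 0.40]))
adjusted = bonferroni(raw)
for pair in pairs:
    print(" vs ".join(pair), raw[pair], round(adjusted[pair], 3))
```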

      For the p-values, we will correct the manuscript reporting the exact values for every result.

      While the authors went to great lengths trying to probe the neural changes likely associated with the memory improvement after stimulation, it is impossible from their data to causally relate the findings from experiments 3 and 4 to the behavioral effects in experiments 1 and 2. This is acknowledged by the authors and there are good methodological reasons for why TMS-EEG and fMRI had to be collected in separate experiments, but it is still worth pointing out to readers that this limits inferences about how exactly dual iTBS and γtACS of the precuneus modulate learning and memory.

      Thank you for your comment. We fully agree with your observation, which is why this aspect has been considered in the study's limitations. To address your concern, we will further emphasize the fact that our findings do not allow precise inferences regarding the specific mechanisms by which dual iTBS and γtACS of the precuneus modulate learning and memory.

      There were no stimulation-related performance differences in the short-term memory task used in experiments 1 and 2. The authors argue that this demonstrates that the intervention specifically targeted long-term associative memory formation. While this is certainly possible, the STM task was a spatial memory task, whereas the LTM task relied (primarily) on verbal material. It is thus also possible that the stimulation effects were specific to a stimulus domain instead of memory type. In other words, could it be possible that the stimulation might have affected STM performance if the task taxed verbal STM instead? This is of course impossible to know without an additional experiment, but the authors could mention this possibility when discussing their findings regarding the lack of change in the STM task.

      Thank you for your insightful observation. We argue that the intervention primarily targeted long-term associative memory formation, as our findings demonstrated effects only on FNAT. However, as you correctly pointed out, we cannot exclude the possibility that the stimulation may also influence short-term verbal associative memory. We will acknowledge this potential effect when discussing the absence of significant findings in the STM task.

      While the authors discuss the potential neural mechanisms by which the combined stimulation conditions might have helped memory formation, the psychological processes are somewhat neglected. For example, do the authors think the stimulation primarily improves the encoding of new information or does it also improve consolidation processes? Interestingly, the beneficial effect of dual iTBS and γtACS on recall performance was very stable across all time points tested in experiments 1 and 2, as was the performance in the other conditions. Do the authors have any explanation as to why there seems to be no further forgetting of information over time in either condition when even at immediate recall, accuracy is below 50%? Further, participants started learning the associations of the FNAT immediately after the stimulation protocol was administered. What would happen if learning started with a delay? In other words, do the authors think there is an ideal time window post-stimulation in which memory formation is enhanced? If so, this might limit the usability of this procedure in real-life applications.

      Thank you for your comment and for raising these important points.

      We hypothesized that co-stimulation would enhance encoding processes. Previous studies have shown that co-stimulation can enhance gamma-band oscillations and increase cortical plasticity (Guerra et al., Brain Stimulation 2018; Maiella et al., Scientific Reports 2022). Given that the precuneus (Brodt et al., Science 2018; Schott et al., Human Brain Mapping 2018), gamma oscillations (Osipova et al., Journal of Neuroscience 2006; Deprés et al., Neurobiology of Aging 2017; Griffiths et al., Trends in Neurosciences 2023), and cortical plasticity (Brodt et al., Science 2018) have all been associated with encoding processes, we decided to apply co-stimulation before the encoding phase, to boost it.

      We applied the co-stimulation immediately before the learning phase to maximize its potential effects. While we observed a significant increase in gamma oscillatory activity lasting up to 20 minutes, we cannot determine whether the behavioral effects we observed would have been the same with a co-stimulation applied 20 minutes before learning. Based on existing literature, a reduction in the efficacy of co-stimulation over time could be expected (Huang et al., Neuron 2005; Thut et al., Brain Topography 2009). However, we hypothesize that multiple stimulation sessions might provide an additional boost, helping to sustain the effects over time (Thut et al., Brain Topography 2009; Koch et al., Neuroimage 2018; Koch et al., Brain 2022).

      Regarding the absence of further forgetting in both stimulation conditions, we think that the clinical and demographic characteristics of the sample (i.e., young, healthy participants) explain the near absence of forgetting after one week.

    1. Author response:

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      The study is well-executed and provides many interesting leads for further experimental studies, which makes it very important. One of the significant hypotheses in this context is metazoan Wnt Lipocone domain interactions with lipids, which remain to be explored.

      The manuscript is generally easy to navigate and interesting to read despite being content-rich. Overall, the figures are easy to follow.

      We thank the reviewer for the thoughtful and favorable assessment.

      Major comments:

      I urge the authors to consider creating a first figure summarizing the broad approach and process involved in discovering the lipocone superfamily. This would help the average reader easily follow the manuscript.

      It will be helpful to have the final model/synthesis figure, which provides a take-home message that combines the main deductions from Fig 1c, Fig 4, Fig 5, and Fig 6 to provide an eagle's eye view (also translating the arguments on Page 38 last para into this potential figure).

      We have generated a two-part figure that synthesizes these two requests, also in line with the recommendations made by Reviewer 3. Depending on the accepting Review Commons journal, we plan to either submit this as a graphical abstract/TOC figure (as suggested by Reviewer 3) or as a single figure. We prefer starting with the first approach as it will keep our figure count the same.

      Minor comments:

      Fig 1C: The authors should provide a statistical estimate of the difference in transmembrane tendency scores between the "membrane" and "globular" versions of the Lipocone domains.

      To address this, we calculated group-wise differences using the Kruskal-Wallis nonparametric test, followed by Dunn’s test with Bonferroni correction for a more stringent evaluation. The results are presented as a critical difference diagram in the new Supplementary Figure S3. The analysis is explained in the Methods section of the revised manuscript, and the statistically significant difference is mentioned in the text. This analysis identifies three groups of Lipocone families with significantly different transmembrane tendencies: those predicted (or known) to associate with prokaryotic membranes, those predicted to be diffusible, and a small number of families residing in eukaryotic ER membranes or bacterial outer membranes.
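      For readers unfamiliar with this procedure, the following is a minimal standard-library Python sketch of a Kruskal-Wallis H statistic followed by Dunn's pairwise test with Bonferroni correction. It is illustrative only and is not the authors' analysis code. It assumes each family is summarized as a list of per-protein transmembrane tendency scores. The function names (average_ranks, kruskal_dunn) and the toy group labels are hypothetical, and the chi-square p-value for H is omitted so that the example needs no external dependencies.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_dunn(groups):
    """Kruskal-Wallis H plus Bonferroni-adjusted two-sided Dunn p-values.

    groups: dict mapping a (hypothetical) family name to a list of scores.
    Returns (H, {(name_a, name_b): adjusted_p}).
    """
    names = list(groups)
    pooled = [x for name in names for x in groups[name]]
    ranks = average_ranks(pooled)
    n_total = len(pooled)

    # Split the pooled ranks back into their groups.
    rank_sums, sizes, start = {}, {}, 0
    for name in names:
        n = len(groups[name])
        rank_sums[name] = sum(ranks[start:start + n])
        sizes[name] = n
        start += n

    # Tie term: sum of (t^3 - t) over each block of t tied values.
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    tie_sum = sum(t ** 3 - t for t in counts.values())

    # Kruskal-Wallis H with the standard tie correction.
    h = 12 / (n_total * (n_total + 1)) * sum(
        rank_sums[g] ** 2 / sizes[g] for g in names) - 3 * (n_total + 1)
    h /= 1 - tie_sum / (n_total ** 3 - n_total)

    # Dunn's pairwise z-tests on mean ranks, then Bonferroni correction.
    base_var = n_total * (n_total + 1) / 12 - tie_sum / (12 * (n_total - 1))
    pairs = list(combinations(names, 2))
    adjusted = {}
    for a, b in pairs:
        diff = rank_sums[a] / sizes[a] - rank_sums[b] / sizes[b]
        se = math.sqrt(base_var * (1 / sizes[a] + 1 / sizes[b]))
        p = math.erfc(abs(diff / se) / math.sqrt(2))  # two-sided normal p-value
        adjusted[(a, b)] = min(1.0, p * len(pairs))
    return h, adjusted


if __name__ == "__main__":
    # Toy scores for three hypothetical families, not data from the study.
    toy = {"family_a": [1, 2, 3], "family_b": [4, 5, 6], "family_c": [7, 8, 9]}
    h, adjusted = kruskal_dunn(toy)
    print(f"H = {h:.3f}")
    for pair, p in adjusted.items():
        print(pair, f"Bonferroni-adjusted p = {p:.4f}")
```

      Bonferroni correction simply multiplies each pairwise p-value by the number of comparisons (capped at 1), which is the more conservative choice the response refers to. Dedicated packages provide tested implementations with exact p-values for H.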

      Reviewer #2 (Evidence, reproducibility and clarity):

      This is a remarkable study, one of a kind. The authors trace the entire huge superfamily containing Wnt proteins, whose origins remained obscure before this work. Even more amazingly, they show that Wnts originated from transmembrane enzymes. The work is masterfully executed and presented. The conclusions are strongly supported by multiple lines of evidence. Illustrations are beautifully crafted. This is an exemplary work of how modern sequence and structure analysis methods should be used to gain unprecedented insights into protein evolution and origins.

      We thank the reviewer for the positive evaluation of our work.

      Minor comments.

      (1) In fig 1, VanZ structure looks rather different from the rest and is a more tightly packed helical bundle. It might be useful for the readers to learn more about the arguments why authors consider this family to be homologous with the rest, and what caused these structural changes in packing of the helices.

      First, the geometry of an α-helix can be approximated as a cylinder, so the contact points between helices are relatively small. Fewer contact constraints allow structural variation in the angular orientations between the helices of an all-α-helical domain, resulting in some dispersion of the helical axes in space. As a result, some views can be confounding when presented as static 2D images. Second, of the two VanZ clades, the characteristic structure shared with the other superfamily members is more easily seen in VanZ-2 (as illustrated in Supplementary Figure S2).

      Importantly, the membership of the VanZ domains was recovered via significant hits in our sequence analysis of the superfamily. Moreover, when the sequence alignments of the active site are compared (Figure 2), VanZ retains the conserved active-site residue positions, which are predicted to reside spatially in the same location and project into an equivalent active-site pocket as in the other families of the superfamily. Further, this sequence relationship is captured by the edges in the network in Figure 1B: multiple members of the superfamily show edges indicating significant relationships with the two VanZ families (e.g., HHSearch hits with probability greater than 90% and p<0.0001 are observed between VanZ-1 and Skillet-DUF2809, Skillet-1, Skillet-4, YfiM-1, YfiM-DUF2279, Wok, pPTDSS, and cpCone-1). Thus, they occupy relatively central locations in the sequence similarity network, indicating a consistent sequence similarity connection to multiple other families.

      (2) Fig. 4 color bars before names show a functional role. How does the blue bar "described for the first time" fit into this logic? Maybe some other way to mark this (an asterisk?) could be better to resolve this semantic inconsistency.

      We have replaced the blue bars with asterisks following the family names; this is now explained in the updated legend.

      Reviewer #3 (Evidence, reproducibility and clarity):

      The manuscript by Burroughs et al. uses informatic sequence analysis and structural modeling to define a very large, new superfamily which they dub the Lipocone superfamily, based on its activity on lipid components and cone-shaped structure. The family includes known enzymatic domains as well as previously uncharacterized proteins (30 families in total). Support for the superfamily designation includes conserved residues located on the homologous helical structures within the fold. The findings include analyses that shed light on important evolutionary relationships including a model in which the superfamily originated as membrane proteins where one branch evolved into a soluble version. Their mechanistic proposals suggest possible functions for enzymes currently unassigned. There is also support for the evolutionary connection of this family with the human immune system. The work will be of interest to those in the broad areas of bioinformatics, enzyme mechanisms, and evolution. The work is technically well performed and presented.

      We appreciate the positive evaluation of our work by the reviewer.

      Referees cross-commenting

      All the comments seem useful to me. I like Reviewer 1's suggestion for a flowchart showing the methodology. I think the suggested summarizing figure could be a TOC abstract, which many journals request.

      To accommodate this comment (along with Reviewer 1’s comments), we have generated a two-part figure containing the methodology flowchart and the summary of findings. Combining the two provides some before-and-after symmetry to a TOC figure, while also avoiding further inflation of the figure count, which would likely be an issue at one or more of the Review Commons journals.

      The authors may wish to consider the following points (page numbers from PDF for review):

      (1) It would be useful in Fig 1A, either in the main text or the supporting information, to also have an accompanying topology diagram. I like the coloring of the helices to show the homology, but the connections between them are hard to follow.

      We share the reviewer’s concern. We have placed such a topology diagram in Figure 1A, and now refer to it at multiple points in the manuscript text.

      (2) Page 6: In the paragraph marked as an example, please call out Fig 1A when the family mentioned is described (I believe SAA is described as one example).

      We have added these pointers in the text, where appropriate.

      (3) Page: 7- The authors state "these 'hydrophobic families' often evince a deeper phyletic distribution pattern than the less-hydrophobic families (Figure S1), implying that the ancestral version of the superfamily was likely a TM domain" there should be more explanation or information here - I am not certain from looking at FigS1 what a deeper phyletic distribution pattern means. Perhaps explaining for a single example? I also see that this important point is discussed in the conclusions- it is useful to point to the conclusion here.

      Our use of ‘deeper’ in this context is meant to convey that more widely conserved families/clades (both across and within lineages) suggest an earlier emergence. In the Lipocone superfamily, this phylogenetic reasoning supports an evolutionary scenario where the membrane-inserted versions generally emerged early, while the solubilized versions, which are found in relatively fewer lineages, emerged later.

      To address this objectively, we have calculated a simple phyletic distribution metric that combines the phyletic spread of a Lipocone clade with its depth within individual lineages, which is then plotted as a bar graph (Supplemental Figure S1). Briefly, the width of each bar is the phyletic spread across the number of distinct taxonomic lineages, and its height is a weighted mean of occurrence within each lineage (depth). The latter helps dampen the effects of sampling bias. In the resulting graph, clades with a lower height and width are likely to have been derived later than those with a greater height and width. A detailed description clarifying this has been added to the Methods section of the revised manuscript. The results support two statements made in the text: 1) that the Wok and VanZ clades are the most widely and deeply represented clades in the superfamily, and 2) that the predicted transmembrane versions tend to be more widely and deeply distributed. We have also added a statement in the Results with a pointer to Figure S1 to clarify the point raised by the referee.

      (4) For figure 3 I would suggest instead of coloring by atom type- to color the leaving group red and the group being added blue so the reader can see where the moieties start and end in substrates and products

      We have retained the atom type coloring in the figure for ease of visualizing the atom types. However, to address the reviewer’s concern, we have added dashed colored circles to highlight attacking and leaving groups in the reactions. The legend has been updated accordingly.

      (5) Page: 13- The authors state "While the second copy in these versions is catalytically inactive, the H1' from the second duplicate displaces the H1 from the first copy," So this results in a "sort of domain swap" correct? It may be more clear to label both copies in Figure 3 upper right so it is easier for the reader to follow.

      We have added these labels to the updated Figure S4 (formerly S3).

      (6) The authors state "In addition to the fusion to the OMP β-barrel, the YfiM-DUF2279 family (Figure 5H) shows operonic associations with a secreted MltG-like peptidoglycan lytic transglycosylase (127,128), a lipid anchored cytochrome c heme-binding domain (129), a phosphoglucomutase/phosphomannomutase enzyme (130), a GNAT acyltransferase (131), a diaminopimelate (DAP) epimerase (132), and a lysozyme like enzyme (133). In a distinct operon, YfiM-DUF2279 is combined with a GT-A glycosyltransferase domain (79), a further OMP β-barrel, and a secreted PDZ-like domain fused to a ClpP-like serine protease (134,135) (Figure 5H)." this combination of enzymes sounds like those in the pathways for oligosaccharide synthesis which is cytoplasmic but the flippase acts to bring the product to the periplasm. Please make sure it is clear that these enzymes may act at different faces of the membrane.

      We have made that point explicit in the revised manuscript in the paragraph following the above-quoted statement.

      (7) Page: 21- the authors should remove the unpublished observations on other RDD domain or explain or cite them

      The analysis of the RDD domain is part of a distinct study whose manuscript we are currently preparing, and explaining its many ramifications would be outside the scope of this manuscript. Moreover, placing even an account of it in this manuscript would break its flow and take the focus away from the Lipocone superfamily. Further, including the RDD story would substantially increase the size of the manuscript. However, RDD is commonly fused to the Lipocone domain; hence, it would be remiss of us to remove all reference to it. Accordingly, we retain a brief account of the RDD-fused Lipocone domains in the revised manuscript that is just sufficient to make the relevant functional case.

      (8) Page 34: The authors state "For instance, the emergence of the outer membrane in certain bacteria was potentially coupled with the origin of the YfiM and Griddle clades (Figure 4)." I don't see the origin point (emergence of the outer membrane) indicated in Figure 4; it may be helpful to indicate this in some way. Also, I am not certain what the dashed circles in Fig 4 indicate; they are not explained in the legend.

      This annotation has been added to the revised Figure 4, and the point of recruitment is indicated with an “X” sign, along with a clarification in the legend regarding the dashed circles.

      (9) In terms of the hydrophobicity analysis, it would be good to mark on the plot (Fig 1C) one or two examples of lipocone members with known structure that are transmembrane proteins as a positive control

      We have added these markers (colored triangles and squares for these families) to the plot.

      Grammar, typos

      Page 3: in the abstract, "severance" is an odd word to use for hydrolysis or cleavage.

      We have changed to “cleavage”.

      Page: 5- "While the structure of Wnt was described over a decade prior" should read "Although the structure of ..."

      Page 7 - "One family did not yield a consistent prediction for orientation"- please state which family

      Page: 8 "While the ancestral pattern is noticeably degraded in the metazoan Wnt (Met-Wnt) family, it is strongly preserved in the prokaryotic Min-Wnt family." Should read "Although the ancestral..."

      throughout- please replace solved with experimentally determined to be clear and avoid jargon

      Please replace "TelC severs the link" with "TelC cleaves the bond "

      We have made the above changes.

      Page: 19- the authors state "a lipobox-containing synaptojanin superfamily phosphoesterase (125) and a secreted R-P phosphatase (126) (see Figure 6, Supplementary Data)" I was uncertain if the authors meant Fig S6 or they meant see Fig 6 and something else in supplementary data. Please fix.

      In this pointer, we intended to flag the relevant gene neighborhoods in both Figures 5H and 6, as well as highlight the additional examples contained in the Supplementary Data. We have updated the pointer accordingly.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      As stated by the authors in the introduction, the RNA-binding protein Sxl is foundational to understanding sex determination in Drosophila. Sxl has been extensively studied as the master regulator of female sex determination in the soma, where it is known to initiate an alternative splicing cascade leading to the expression of DsxF. Additionally, Sxl has been shown to be responsible for keeping X chromosome dosage compensation off in females, while males hyperactivate their X chromosome. While these roles have been well defined, the authors explore an aspect of Sxl that is quite separate from its role as master regulator of female fate. They describe Sxl-RAC, a Sxl isoform that is expressed in the male and female nervous system. Using several genomic techniques, the authors conclude that the Sxl-RAC isoform associates with chromatin in a similar pattern to the RNA polymerase II/III subunit, Polr3E, and Sxl depends on Polr3E for chromatin-association. Further, neuronal loss of Sxl causes changes in lifetime and geotaxis in a similar manner as loss of Polr3E. The work is thorough and significant and should be appropriate for publication if a few issues can be addressed.

      Major Concerns:

      1) How physiological is the Sxl chromatin-association assay? As binding interactions are concentration-dependent, how similar is Sxl-DAM expression to wt Sxl expression in neurons? In addition, does the Sxl-DAM protein function as a wt Sxl protein? Does UAS-Sxl-DAM rescue any Sxl loss phenotypes?

      Author response:

      As Reviewer 3 correctly notes, Targeted DamID relies on ribosomal re-initiation (codon slippage) to produce only trace amounts of the Dam-fusion protein. By design, this results in expression levels that are significantly lower than those of the endogenous protein. As such, the experiment can be interpreted within a near–wild-type context, rather than as an overexpression model. The primary aim of this experiment was to determine whether Sxl associates with chromatin, and our dataset provides clear evidence supporting such binding.

      2) Is Polr3E chromatin-association also dependent on Sxl? They should do the reciprocal experiment to their examination of Sxl chromatin-association in Polr3E knockdown. This might also help address point 1: if wt Sxl is normally required for aspects of Polr3E chromatin binding, then concerns about whether the Sxl-DAM chromatin-association is real or artifactual would be assuaged.

      Author response:

      This is an interesting thought. However, if Sxl were required for Polr3E recruitment to RNA Pol III, then in most male Drosophila melanogaster cells Polr3E would not be incorporated, and males would not be viable (as Polr3E is essential for Pol III activity). While there could be a subtle effect on Polr3E recruitment, such an experiment would not alter the central conclusion of our study: that Sxl is recruited to chromatin (accessory to the Pol III complex) via Polr3E.

      Minor concerns:

      The observed Sxl loss-of-function phenotypes are somewhat subtle (although perhaps any behavioural phenotype at all is a plus). Did they try any other behaviour assays (courtship, learning/memory, or anything else) to test nervous system function?


      Author response:

      Given the exploratory nature of this study, we focused on broader behavioural and transcriptional assays.

      While well written, it is sometimes difficult to understand how the experiment was performed or what genotypes were used without looking into the methods sections. One example is they should describe the nature of the Sxl-DAM fusion protein clearly in the results.

      Author response:

      We will revise these sections to improve clarity and ensure there is no confusion.

      Reviewer #1 (Significance (Required)):

      This manuscript represents a dramatic change in our thinking about the action of the Sex-lethal protein. Previously, Sxl was known as the master regulator of both sex determination and dosage compensation, and performed these roles as an RNA-binding protein affecting RNA splicing and translational regulation. Here, the authors describe a sex-non-specific role of Sxl in the male and female nervous system. Further, this activity appears independent of Sxl's RNA binding activity and instead Sxl functions as a chromatin-associating protein working with the RNA pol2/3 factor Polr3E to regulate gene expression. Thus, this represents a highly significant finding.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: In this paper, the authors report on an unexpected activity for Sex lethal (Sxl) (a known splicing regulator that functions in sex determination and dosage compensation) in binding to chromatin. They show, using DamID, that Sxl binds to approximately the same chromatin regions as Polr3E (a subunit of RNA Pol III). They show that this binding to chromatin is unaffected by mutations in the RNA binding domains or by deletions of either N or C terminal regions of the Sxl protein. This leads the authors to conclude that Sxl must bind to chromatin through some interacting protein working through the central region of the Sxl protein. They show that Sxl binding is dependent on Polr3E function. They show that male-specific neuronal knockdown of Sxl gives similar phenotypes to knockdown of Polr3E in terms of lethality and improved negative geotaxis. They show gene expression changes with knockdown of Sxl in male adult neurons - mainly that metabolic and pigmentation genes go down in expression. They also show that expression of a previously discovered male adult specific form of Sxl (that does not have splicing activity) in the same neurons also leads to changes in gene expression, including more upregulated than downregulated tRNAs. But they don't see (or don't show) that the same tRNA genes are down with knockdown of Sxl. Nonetheless, based on these findings, they suggest that Sxl plays an important role in regulating Pol III activity through the Polr3E subunit.

      Major comments:


      To be honest, I'm not convinced that the conclusions drawn from this study are correct. The fact that every mutant form of Sxl shows the same result from the DamID labelling is a little concerning. I would like to see independent evidence of the SxlRac protein binding chromatin.

      Do antibodies against this form (or any form) of Sxl bind chromatin in salivary gland polytene chromosomes, for example? Does Sxl from other insects where Sxl has no role in sex determination bind chromatin?


      Author Response:

      Regarding the reviewer’s overall concerns about the legitimacy of the Sxl binding data:

      i) The fold differences between the Dam-Sxl mutants and the Dam-only control are very robust (up to a log2 fold change of 9, i.e. roughly 500-fold), which is higher than what we observe with most transcription factors using Targeted DamID.

      ii) We observed that Sxl binding was significantly reduced upon knockdown of Polr3E, confirming that the signal we observe is biologically specific and not due to technical noise or background.

      iii) If the concern relates to potential Sxl binding in non-neuronal tissues such as salivary glands, we would like to clarify that all DamID constructs were expressed under elav-GAL4, a pan-neuronal driver. Furthermore, dissections were performed to isolate larval brains, with salivary glands carefully removed. This ensures that chromatin profiles were derived exclusively from neuronal tissue.

      iv) Salivary gland polytene chromosome staining with a Sxl antibody in a closely related species (Drosophila virilis) shows binding of Sxl to chromatin in both sexes (Bopp et al., 1996). We will include more text in the revised manuscript to emphasise these points.

      Do antibodies against this form (or any form) of Sxl bind chromatin in salivary gland polytene chromosomes, for example? Does Sxl from other insects where Sxl has no role in sex determination bind chromatin?

      Author Response:

      Prior work in Drosophila virilis (where Sxl is also required for sex determination and Sxl-RAC is conserved) has already demonstrated Sxl-chromatin association (using a full-length Sxl antibody) in salivary glands using polytene chromosome spreads (Bopp et al., 1996). Binding is observed in both sexes and across the genome, reflecting our observations. We will incorporate this into the revised discussion to support the chromatin-binding role of Sxl across species.

      There is a clear and long-overlooked precedent for Sxl's alternative, sex-independent roles, findings that have been largely overshadowed by the gene’s canonical function. Our study not only validates and extends these observations but also brings much-needed attention to this understudied aspect of fundamental biology.

      Bopp D, Calhoun G, Horabin JI, Samuels M, Schedl P. Sex-specific control of Sex-lethal is a conserved mechanism for sex determination in the genus Drosophila. Development. 1996 Mar;122(3):971-82. doi: 10.1242/dev.122.3.971. PMID: 8631274.

      I would like to see independent evidence of the SxlRac protein binding chromatin.

      Author Response:

      We do not believe this is necessary:

      i) Our data demonstrated that a large N-terminal truncation of Sxl (removing far more of the N-terminal region than is absent in Sxl-RAC) does not impair chromatin binding.

      ii) Our deletion experiments show that it is the central domain of Sxl that is required for chromatin association (as removal of the N- or C-terminal domain has no effect). This central domain is unaffected in Sxl-RAC.

      iii) Independent Y2H experiments have shown that it is exclusively RBD-1 (RNA binding domain 1) of the central domain of Sxl that interacts with Polr3E (Dong et al., 1999). Sxl-RAC contains this region and will therefore be recruited by Polr3E.

      iv) Reviewer 3 also believes that this is not necessary (see cross-review below) and highlights the robustness of the Y2H experiments performed by Dong et al., 1999.


      Also, given that their DamID experiments reveal that Sxl binds half of the genes encoded in the Drosophila genome, finding that it binds around half of the tRNA genes is perhaps not surprising.


      Author Response:

      Our data show that Sxl binds to a range of Pol III-transcribed loci, and this binding pattern supports the proposed model that Sxl plays a broader regulatory role in Pol III activity. Within these Pol III targets, tRNA genes represent a specific and biologically relevant subset. The emphasis on tRNAs is not to suggest they are the exclusive or primary targets of Sxl, but rather to highlight a functionally important class of Pol III-transcribed elements that align with the model we are proposing. We will revise the text to better reflect this framing and avoid any confusion regarding the scope of Sxl’s binding profile.

      I would like to see evidence beyond citing a 1999 yeast two-hybrid study that Sxl and Polr3E directly interact with one another.


      Author response:

      We do not believe this is necessary (these points were also mentioned above):

      i) The Dong et al., 1999 study was highly comprehensive in its characterisation of Sxl binding to Polr3E.

      ii) Our DamID data provide strong complementary evidence for this interaction: knockdown of Polr3E robustly reduces Sxl’s recruitment to chromatin, strongly supporting the relevance of the interaction in vivo.

      iii) Reviewer 3 highlights the robustness of the Y2H experiments performed by Dong et al., 1999.

      In my opinion, the differences in lethality observed with loss of Sxl versus control are unlikely to be meaningful given the different genetic backgrounds. The similar defects in negative geotaxis could be meaningful, but I'm unsure how often this phenotype is observed. What other class of genes affect negative geotaxis? It's a little unclear why having reduced expression of metabolic and pigment genes or of tRNAs would improve neuronal function.


      Author response:

      While the differences in survival were indeed subtle, they were statistically significant and thus warranted inclusion. Our primary aim in this section was to demonstrate that knockdown of Sxl or Polr3E results in comparable behavioural and transcriptional phenotypes, suggesting overlapping functional roles. In this context, we believe the data were presented transparently and effectively support our interpretation.

      Regarding the negative geotaxis phenotype, we appreciate the reviewer’s interest and agree that it is both intriguing and atypical. For this reason, we performed the assay multiple times, particularly in Polr3e knockdowns, to confirm the robustness of the result. To address potential confounding variables, we carefully selected control lines that account for genetic background and transgene insertion site, including KK controls and attP40-matched lines. We also employed multiple independent RNAi lines targeting Sxl to validate the phenotype across different genetic backgrounds.

      Although the observed improvement in climbing is unexpected, it is not without precedent in the RNA polymerase III field. Notably, Malik et al. (2024) demonstrated that heterozygous Polr3DEY/+ mutants exhibit a significantly delayed decline in climbing ability with age. We allude to this in the discussion and will revise the text to emphasise this connection more explicitly.

      Finally, while we recognise that negative geotaxis is a relatively broad assay and thus does not pinpoint the precise cellular mechanisms involved, we interpret the phenotype as suggesting a neural basis and a functional role for Sxl in the nervous system.

      One would expect that not just the same classes of genes would be affected by loss and overexpression of Sxl, but the same genes would be affected. Are the same genes changing in opposite directions in the two experiments, or just the same classes of genes? Likewise, are the same genes changing expression in the same direction with both Sxl and Polr3E loss? Also, why are tRNA genes not also affected with Sxl loss? Finally, they describe the changes in gene expression as being in male adult neurons, but the sequencing was done on entire heads, so there is no way of knowing which cell type is showing differential gene expression.

      Author response:

      While we do examine gene classes, our approach also includes pairwise correlation analyses of gene expression changes between specific genotypes. Notably, we observed a significant positive correlation between Polr3e knockdowns and Sxl knockdowns, and a significant negative correlation between Sxl-RAC–expressing flies and Sxl knockdowns. Furthermore, we examined Sxl-DamID target genes within our RNA-seq datasets and found a consistent relationship between Sxl targets and genes differentially expressed in Polr3e knockdowns.

      Regarding the Pol III qPCR results, we note that tRNA expression changes may require a longer duration of RNAi induction (e.g., beyond 4 days) to become apparent, especially given that phenotypic effects such as changes in lifespan and negative geotaxis only emerge after 20 days or more. It is also plausible that Sxl knockdown leads to a partial reduction in Pol III efficiency, which may not be readily detectable through bulk Pol III qPCRs. We are willing to repeat Pol III qPCRs at later timepoints to further investigate this trend.

      Finally, we infer that gene expression changes observed in our RNA-seq data are of neuronal origin, as all knockdown and overexpression constructs used in this study were driven pan-neuronally using elav-/nSyb-GAL4. While we acknowledge that bulk RNA-seq does not provide cell-type resolution, attributing expression changes to the targeted tissue is a widely used assumption in the field when the construct is driven by a tissue-specific promoter.

      I'm also not sure what I'm supposed to be seeing in panel 5F (or in the related supplemental figure), or whether it has any meaning. If they are using Sxl-T2A-Gal4 to drive mCherry, I think one would expect to see expression, since Sxl transcripts are made in both males and females. Also, one would expect to see active protein synthesis (OPP staining) in most cells of the adult male brain, and I think that is what is observed, but again, I'm not sure what I'm supposed to be looking at given the absence of any arrows or brackets in the figures.

      Author Response:

      Due to the presence of the T2A tag and the premature stop codon in exon 3 of early male Sxl transcripts, GAL4 expression is not expected in males unless the head-specific SxlRAC isoform is produced. The aim of panel 5F is to demonstrate the spatial overlap between SxlRAC expression (as we are examining male brains) and regions of elevated protein synthesis, as detected by OPP staining.

      To quantitatively assess this relationship, we performed colocalisation analysis in ImageJ, which showed a positive correlation between Sxl and OPP signal intensities, supporting this interpretation. Our images also show that regions with lower levels of protein synthesis (such as the neuropil, as independently reported by Villalobos-Cantor et al., 2023) likewise lack Sxl-related signal. We have highlighted regions in Fig. 5 with higher or lower Sxl/OPP signal to better illustrate this relationship. We can also test the effects of knockdown/overexpression on general protein synthesis if required.

      Villalobos-Cantor S, Barrett RM, Condon AF, Arreola-Bustos A, Rodriguez KM, Cohen MS, Martin I. Rapid cell type-specific nascent proteome labeling in Drosophila. Elife. 2023 Apr 24;12:e83545. doi: 10.7554/eLife.83545. PMID: 37092974; PMCID: PMC10125018.

      Minor comments:

      Lines 223-225: I believe it is expected that Sxl transcripts would be broadly expressed in the adult male and female, given that only the spliced form of the transcript is female-specific in expression.

      As explained above, the only isoform that will be ‘trapped’ by the T2A-GAL4 in males is Sxl-RAC (the other isoforms contain premature stop codons). Our immunohistochemistry data indicate that Sxl-RAC is expressed in the male brain, specifically in neurons. Therefore, knockdown experiments in males will reduce all mRNA isoforms, of which Sxl-RAC is the only one that produces protein.

      Lines 236-238: Sentence doesn't make sense.

      We have addressed and clarified this.

      Reviewer #2 (Significance (Required)):

      It would be significant to discover that a gene previously thought to function in only sex determination and dosage compensation also moonlights as a regulator of RNA polymerase III activity. Unfortunately, I am not convinced by the work presented in this study that this is the case.

      My expertise is in Drosophila biology, including development, transcription, sex determination, morphogenesis, genomics, transcriptomics and DNA binding.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Storer, McClure and colleagues use genome-wide DNA-protein binding assays, transcriptomics, and genetics to work out that Drosophila Sxl, widely known as an RNA-binding protein which functions as a splicing factor to determine sex identity in Drosophila and related species, is also a chromatin factor that can stimulate transcription by Pol III and Pol II of genes involved with metabolism and protein homeostasis, specifically some encoding tRNAs.

      The evidence for the tenet of the paper -- that Sxl acts as a chromatin regulator with Polr3E, activating at least some of its targets with either Pol III or Pol II -- is logical and compelling, the paper is well written and the figures well presented. Of course, more experiments could always be wished for and proposed, but I think this manuscript could be published in many journals with just a minor revision not involving additional experiments. I have a few specific comments below, all minor.

      Scientific points: - The approach taken for the evaluation of Sxl DNA-binding activity in Fig. 2 is not entirely clear. I assume these are crosses of elav-Gal4 x different UAS- lines, then using males or females for UAS-Sxl-Full-Length. But what about the others? Were the experiments done in males only? This is hinted at in the main text but not explicitly indicated in the figure or the methods (at least, that I could easily find). And is this approach extended to all other experiments? Longevity? Climbing assays? Considering the role of Sxl, it may be helpful to be fastidiously systematic with this.


      Author Response:

      We have revised the wording to ensure greater clarity. Males were used for all survival and behavioural experiments (as only males can be leveraged for knocking down Sxl-RAC without affecting the canonical Sxl-F isoform).

      - In the discussion, lines 360-61, the authors say: Indeed, knockdown of Polr3E leads to a loss of Sxl binding to chromatin, suggesting a cooperative mechanism. Maybe I am misunderstanding the authors, but when I read "cooperation" in this context I think of biochemical cooperative binding. This is possible, but I do not think a simple 'requirement' test can suggest specifically that this mechanistic feature of biochemical binding is at play. I would expect, for starters, a reciprocal requirement for binding (which is not tested), and some quantitative features that would be difficult to evaluate in vivo. I do not think cooperative binding needs to be invoked anyway, as the authors do not make any specific point or prediction about it. But if they do think this is going on, I think it would need to be referred to as a speculation.


      Author Response:

      We appreciate that the original wording may have been unclear and will revise the text to more accurately reflect a functional relationship, rather than implying direct cooperation.

      - In lines 428-432, the authors discuss the ancestral role of Sxl and make a comparison with ELAV, in the context of an RNA-binding protein that has molecular functions beyond those of a splicing factor, considering the functions of ELAV in RNA stability and translation, and finishing with "suggesting that similar regulatory mechanisms may be at play". I do not understand this latter sentence. Which mechanisms are these? Are the authors referring to the molecular activities of ELAV and SXL? But what would be the similarity? SXL seems to have a dual capacity to bind RNA and protein interactors, which allows it to work both in chromatin-level regulation and post-transcriptionally in splicing; ELAV, by contrast, seems to take advantage of its RNA binding function to work in multiple RNA-related contexts, all post-transcriptional. I do not see an obvious parallel beyond the fact that RNA-binding proteins can function at different levels of gene expression regulation, and I would not call this parallel "similar regulatory mechanisms", so I find the whole comparison a bit confusing.


      Author Response:

      We have reduced this section, as it is largely speculative and intended to highlight potential, though indirect, links in higher organisms. Our goal was primarily to illustrate the possibility that Sxl may have an ancestral role distinct from its well-characterised function, and to suggest a potential avenue for future research into ELAV2’s involvement in chromatin or Pol III regulation.

      - One aspect of the work that I find missing from the discussion is how Sxl's simultaneous capacities for RNA binding and Polr3E binding relate to each other: are these mutually exclusive? If so, are they competitive or hierarchical? How would they be coordinated?


      Author Response:

      This is an interesting point, and we have expanded on it further in the Discussion section.

      - The only aspect of the paper where I found that one could make an experimental improvement is the claim that Sxl induces the expression of genes that have the overall effect of stimulating protein synthesis. The OPP experiment shows a correlation between the expression of Sxl and the rate of protein synthesis initiation. However, a more powerful experiment would be, rather obviously, to introduce Sxl knock-down in the same experiment, and observe whether in Sxl-expressing neurons the incorporation of OPP is reduced. I put this forth as a minor point because the tenet of the paper would not be affected by the results (though the perception of importance of the newly described function could be reinforced).


      Author Response:

      This could be a valid experiment and we are prepared to perform it if required.

      - In a similar way, it would be interesting to know whether the recruitment of Polr3E and Sxl to chromatin is co-dependent or Sxl follows Polr3E. This is also a minor point because this would possibly refine the mechanism of recruitment but does not alter the main discovery.

      Author Response:

      We have addressed a similar point for Reviewer 2 (see below) and will include a Discussion point for this:

      If Sxl were required for Polr3E recruitment to RNA Pol III, then in most male Drosophila melanogaster cells Polr3E would not be incorporated, and males would not be viable (as it is essential for Pol III activity). While there could be a subtle effect on Polr3E recruitment, such an experiment would not alter the central conclusion of our study: that Sxl is recruited to chromatin (accessory to the Pol III complex) via Polr3E.

      Figures and reporting:

      • In Figure 2, it would be helpful to see the truncation coordinate for the N and C truncations.

      • In Figure 3D, genomic coordinates are missing.

      • In Figure 3E, the magnitude in the Y axis is not entirely clear (at least not to me). How is the amount of binding across the genome quantified? Is this the average amplitude of normalised TaDa signal across the genome? Or only within binding intervals?

      • Figure S3E-F: it would be interesting to show the degree of overlap between the downregulated genes that are also binding targets (regardless of the outcome).

      • Figure 5C-E: similarly to Figure S3, it would be interesting to know how the transcriptional effects compare with the binding targets.

      • Authors use Gehan-Breslow-Wilcoxon to test survival, which is a bit unusual, as it gives more weight to the early deaths (which are rare in most Drosophila longevity experiments). Is there any rationale behind this? It may even favour their null hypothesis.


      Author response:

      Thank you for the detailed feedback on our figures. We have incorporated the suggested changes.

      We agree that examining the overlap between Sxl binding sites and transcriptional changes is valuable, and we aimed to highlight this in the pie charts shown in Figures S3 and S5. If the reviewer is suggesting a more explicit quantification of the proportion of Sxl-Dam targets with significant transcriptomic changes, we are happy to include this analysis in the final version of the manuscript.

      As noted in the Methods, both Gehan–Breslow–Wilcoxon (GBW) and Kaplan–Meier tests were used. The significance in Figure 4a is specific to the GBW test, which we indicated by describing the effect as mild. Our focus here is not on the magnitude of survival differences, but on the consistent trends observed in both Polr3e and Sxl knockdowns.
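      The reviewer's point about weighting can be made concrete with a minimal, standard-library Python sketch of a two-sample weighted log-rank test. It assumes the textbook weighted log-rank formulation: a weight of 1 at every death time gives the ordinary log-rank (Mantel-Cox) test, while a weight equal to the number of flies still at risk gives the Gehan-Breslow-Wilcoxon variant, which is why early deaths, occurring when most flies are alive, dominate the latter. The function name `weighted_logrank` and its arguments are illustrative only and are not the authors' analysis code.

```python
import math

def weighted_logrank(times1, events1, times2, events2, weight="logrank"):
    """Two-sample weighted log-rank test (illustrative sketch).

    times*: age at death or censoring; events*: 1 = death observed, 0 = censored.
    weight="logrank": w_j = 1 at every death time (Mantel-Cox).
    weight="gehan":   w_j = n_j, the number at risk (Gehan-Breslow-Wilcoxon),
                      so death times with many flies still alive count more.
    Returns (z, two_sided_p) using the normal approximation.
    """
    if weight not in ("logrank", "gehan"):
        raise ValueError("weight must be 'logrank' or 'gehan'")
    # Pool both groups, tagging each individual with its group (0 or 1).
    data = [(t, e, 0) for t, e in zip(times1, events1)] + \
           [(t, e, 1) for t, e in zip(times2, events2)]
    death_times = sorted({t for t, e, _ in data if e})
    num = 0.0  # weighted sum of observed - expected deaths in group 0
    var = 0.0  # weighted sum of hypergeometric variances
    for t in death_times:
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n1 = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d1 = sum(1 for tt, e, g in at_risk if tt == t and e and g == 0)
        w = 1.0 if weight == "logrank" else float(n)
        expected = d * n1 / n
        v = d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1) if n > 1 else 0.0
        num += w * (d1 - expected)
        var += w * w * v
    if var == 0:
        raise ValueError("variance is zero; the test is undefined for these data")
    z = num / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p

# Hypothetical toy lifespans (days), not data from the manuscript.
z_lr, p_lr = weighted_logrank([1, 2], [1, 1], [3, 4], [1, 1], weight="logrank")
z_gb, p_gb = weighted_logrank([1, 2], [1, 1], [3, 4], [1, 1], weight="gehan")
```

      Running both weightings on the same data, as the authors report doing, indicates whether a difference is carried mainly by early or by late deaths; for the actual analysis an established survival-analysis package would normally be used rather than a hand-rolled function.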

      Writing and language:

      • Introduction finishes without providing an outline of the findings (which is fine by me if that is what the authors wanted).

      • In lines 361-5, the authors say "We speculate that this interaction not only facilitates Pol III transcription but may also influence chromatin architecture and RNA Pol II-driven transcription as observed with Pol III regulation in other organisms". "This interaction" refers to Polr3E-Sxl-DNA interaction and with "Pol III transcription" I presume the authors refer to transcription executed by Pol III. I am not clear about the meaning of the end of the sentence "as observed with Pol III regulation in other organisms". What is the observation, exactly? That Pol III modifies chromatin in Pol II regulated loci, or that Pol III interactors change chromatin architecture?

      • DPE abbreviation is not introduced (and only used once).

      • A few typos:
        - Line 41: ...splicing of the Sxl[late] transcripts, which is [ARE?] constitutively transcribed (Keyes et al.,...
        - Line 76: ...sexes but appears restricted to the nervous system [OF] male pupae and adults (Cline et ...
        - Line 289: ...and S41). To assess any effect [ON] translational output, O-propargyl-puromycin (OPP)...
        - Line 323: ...illustrating that the majority (72%) changes in tRNA levels [ARE] due to upregulation...
        - Line 402: ...it was discovered [WE DISCOVERED]
        - Line 792: ...Sxl across chromosomes X, 2 L/R, 3 L/R and 4. The y-axis represents the log[SYMBOL] ratio... This happens in other figure legends as well.


      Author response:

      Thank you for the detailed feedback, we have clarified and incorporated the suggested changes.

      Referee Cross-commenting

      Reviewer 1 asks how physiological the Sxl chromatin-association assay is. I think the loss of association in Polr3E knock-down and the lack of association of other splicing factors goes a long way towards answering this question. It is true that having positive binding data specifically for Sxl-RAC and negative binding data for a deletion mutant of the RRM domain would provide more robust conclusions (see below), but I am not sure it is completely necessary -- though this will depend on which journal the authors want to send the paper to.

      I think that the comment of reviewer 1 about the levels of expression of Sxl-DAM does not apply here because of the way TaDa works - it relies on codon slippage to produce minimal amounts of the DAM fusion protein, so by construction it will be expressed at much lower levels than the endogenous protein.

      Reviewer 1 also asks whether Polr3E chromatin-association is also dependent on Sxl, to round out the model and also as a way to address whether Sxl association to chromatin is real. While I agree with this on the former aim (this would be a nice-to-have), I disagree on the latter; there is no need for Polr3E recruitment to depend on Sxl for Sxl association to chromatin to be physiologically relevant. Polr3E is a peripheral component of Pol III and unlikely to depend on a factor of restricted expression like Sxl to interact with chromatin. The recruitment of Sxl could well be entirely 'hierarchical' and subject to Polr3E.

      Reviewer 2 is concerned with the fact that every mutant form of Sxl shows the same result from the DamID labelling. I have to agree with this to a point. A deletion mutant of the RRM domains would address this. Microscopy evidence in salivary glands would be nice, certainly, but the system may not lend itself to this particular interaction, which might be short-lived and/or weak. I do not immediately see the relevance of the chromatin binding capacity of non-Drosophilidae Sxl -- though it might indicate that the impact of the discovery is less likely to go beyond this group.

      Reviewer 2 does not find it surprising that some tRNA genes (fewer than half) are regulated by Sxl. I think the value of that observation is only qualitative, as tRNAs are Pol III-produced transcripts, but their point is correct. A hypergeometric test could settle this.
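      A hypergeometric enrichment test of the kind suggested here can be sketched in a few lines of standard-library Python. The sketch assumes the question is whether tRNA genes are over-represented among Sxl-regulated genes, given N genes tested in total, K of them tRNA genes, n genes called as Sxl-regulated and k tRNA genes among those. The function name `hypergeom_enrichment_p` is hypothetical, and the counts in the usage line are placeholders, not values from the manuscript.

```python
from math import comb

def hypergeom_enrichment_p(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k) (illustrative sketch).

    N: total genes tested (the background set)
    K: tRNA genes in the background
    n: genes called Sxl-regulated
    k: tRNA genes among the Sxl-regulated genes
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(K, n)):
        raise ValueError("inconsistent counts")
    # Sum the probabilities of drawing k, k+1, ... tRNA genes in n draws
    # without replacement; comb() returns 0 for impossible combinations.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)

# Placeholder counts for illustration only.
p_example = hypergeom_enrichment_p(N=10, K=5, n=5, k=4)
```

      With real counts, the value can be cross-checked against `scipy.stats.hypergeom.sf(k - 1, N, K, n)`. The choice of background set N (for example, all genes detected in the RNA-seq, or only Pol III-transcribed genes) strongly affects the result and would need to be stated.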

      Reviewer 2 is concerned that the evidence of direct interaction between Sxl and Polr3E is a single 1999 two-hybrid study. But that paper contains also GST pull-downs that narrow down the specific domains that mediate binding, and perform the binding in competitive salt conditions. I think it is enough. The author team, I think, are not biochemists, so finding the right collaborators and performing these experiments would take time that I am not sure is warranted.

      Reviewer 2 is also concerned that the longevity assays may not be meaningful due to the difference in genetic backgrounds. This is a very reasonable concern (which I would extend to the climbing assays - any quantitative phenotype is sensitive to genetic background). However, I think the authors here may have already designed the experiment with this in mind - the controls express untargeted RNAi constructs, but I lose track of which one is control of which. This should be clarified in Methods.

      Other comments are in line, I think, with what I have pointed out and I generally agree with everything else that has been said.

      Reviewer #3 (Significance (Required)):

      Drosophila Sxl is widely known as an RNA-binding protein which functions as a splicing factor to determine sex identity in Drosophila and related species. It is a favourite example of how splicing factors and alternative splicing can have a profound influence in biology and be used cleverly in the molecular circuitry of the cell to enact elegant regulatory decisions.

      In this work, Storer, McClure and colleagues use genome-wide DNA-protein binding assays, transcriptomics, and genetics to work out that Sxl is also a chromatin factor with a sex-independent, neuron-specific role in stimulating transcription by Pol III and Pol II of genes involved with metabolism and protein homeostasis, including some encoding tRNAs.

      This opens a large number of interesting biological questions that range from biochemistry, gene regulation or neurobiology to evolution. How is the simultaneous capacity of binding RNA and chromatin (with the same protein domain, RRM) regulated/coordinated? How did this dual activity evolve and which one is the ancestral one? How many other RRM-containing RNA-binding proteins can also bind chromatin? How is Sxl recruited to chromatin at both Pol II and Pol III targets, and are they functionally related? If so, how is the coordination of cellular functions activated through different RNA polymerases taking place and what is the role of Sxl in this? What are the functional consequences for neuronal biology? Does this affect all Sxl-expressing neurons similarly?

      The evidence for the central tenet of the paper -- that Sxl acts as a chromatin regulator with Polr3E, activating at least some of its targets with either Pol III or Pol II -- is logical and compelling, the paper is well written and the figures well presented. Of course, more experiments could always be wished for and proposed, but I think this manuscript could be published in many journals with just a minor revision not involving additional experiments.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      The convincing analysis demonstrates a role for the Drosophila sex-determining gene sex lethal in controlling aspects of transcription in the nervous system independent of its role in splicing. Interaction with an RNA Pol III subunit mediating Sxl association with chromatin, together with similar knockdown phenotypes, strongly supports the role of Sxl in the regulation of neuronal metabolism. Given that Sxl is an evolutionarily recent acquisition for sex determination, the study may reveal an ancestral role for Sxl.

      The conclusions are well justified by the datasets presented and I have no issues with the study or the interpretation. Throughout, the work is well referenced, though perhaps the authors might take a look at Zhang et al (2014) (PMID: 24271947) for an interesting evolutionary perspective for the discussion.

      Author Response:

      Thank you for the thoughtful suggestion. We will be sure to incorporate the findings from Zhang et al. regarding the evolution of the sex determination pathway.

      I have some minor comments for clarification:

      There is no Figure 2b; either label it Figure 2 or label the TaDa plots as 2b.

      Clarify if Fig 2 data are larval or adult.

      Larval

      Fig 3d - are these replicates or female and male?

      Please elaborate on tub-GAL80[ts] developmental defects

      Fig 4e, are transcriptomics done with the VDRC RNAi line? The VDRC and BDSC RNAi lines exhibit different behaviours: the former has "better" survival and better negative geotaxis, while the latter seems to have poorer survival but little geotaxis effect?

      Fig S3 - volcano plot for Polr3E?

      Fig S4a - legend says downregulated genes?

      The discussion should at least touch on the fact that Sxl amorphs (i.e. Sxl[fP7B0]) are male viable and fertile, emphasising that the newly uncovered role is not essential.

      Author Response:

      We agree with the suggestions outlined in the comments and have made the appropriate revisions.

      Reviewer #4 (Significance (Required)):

      A nonessential role for Sxl in the nervous system independent of sex determination contributes to a better understanding of a) the evolution of sex-determining mechanisms, b) the role of RNA Pol III in neuronal homeostasis and c) more widely, the neuronal aging field. I think this well-focused study reveals a hitherto unsuspected role for Sxl.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      Storer, McClure and colleagues use genome-wide DNA-protein binding assays, transcriptomics, and genetics to work out that Drosophila Sxl, widely known as an RNA-binding protein which functions as a splicing factor to determine sex identity in Drosophila and related species, is also a chromatin factor that can stimulate transcription by Pol III and Pol II of genes involved with metabolism and protein homeostasis, specifically some encoding tRNAs.

      The evidence for the tenet of the paper -- that Sxl acts as a chromatin regulator with Polr3E, activating at least some of its targets with either Pol III or Pol II -- is logical and compelling, the paper is well written and the figures well presented. Of course, more experiments could always be wished for and proposed, but I think this manuscript could be published in many journals with just a minor revision not involving additional experiments. I have a few specific comments below, all minor.

      Scientific points:

      • The approach taken for the evaluation of Sxl DNA-binding activity in Fig. 2 is not entirely clear. I assume these are crosses of elav-Gal4 x different UAS- lines, then using males or females for UAS-Sxl-Full-Length. But what about the others? Were the experiments done in males only? This is hinted at in the main text but not explicitly indicated in the figure or the methods (at least, that I could easily find). And is this approach extended to all other experiments? Longevity? Climbing assays? Considering the role of Sxl, it may be helpful to be fastidiously systematic with this.
      • In the discussion, lines 360-61, the authors say: Indeed, knockdown of Polr3E leads to a loss of Sxl binding to chromatin, suggesting a cooperative mechanism. Maybe I am misunderstanding the authors, but when I read "cooperation" in this context I think of biochemical cooperative binding. This is possible, but I do not think a simple 'requirement' test can suggest specifically that this mechanistic feature of biochemical binding is at play. I would expect, for starters, a reciprocal requirement for binding (which is not tested), and some quantitative features that would be difficult to evaluate in vivo. I do not think cooperative binding needs to be invoked anyway, as the authors do not make any specific point or prediction about it. But if they do think this is going on, I think it would need to be referred to as a speculation.
      • In lines 428-432, the authors discuss the ancestral role of Sxl and make a comparison with ELAV, in the context of an RNA-binding protein that has molecular functions beyond those of a splicing factor, considering the functions of ELAV in RNA stability and translation, and finishing with "suggesting that similar regulatory mechanisms may be at play". I do not understand this latter sentence. Which mechanisms are these? Are the authors referring to the molecular activities of ELAV and SXL? But what would be the similarity? SXL seems to have a dual capacity to bind RNA and protein interactors, which allows it to work both in chromatin-level regulation and post-transcriptionally in splicing; ELAV, by contrast, seems to take advantage of its RNA binding function to work in multiple RNA-related contexts, all post-transcriptional. I do not see an obvious parallel beyond the fact that RNA-binding proteins can function at different levels of gene expression regulation, and I would not call this parallel "similar regulatory mechanisms", so I find the whole comparison a bit confusing.
      • One aspect of the work that I find missing from the discussion is how Sxl's simultaneous capacities for RNA binding and Polr3E binding relate to each other: are these mutually exclusive? If so, are they competitive or hierarchical? How would they be coordinated?
      • The only aspect of the paper where I found that one could make an experimental improvement is the claim that Sxl induces the expression of genes that have the overall effect of stimulating protein synthesis. The OPP experiment shows a correlation between the expression of Sxl and the rate of protein synthesis initiation. However, a more powerful experiment would be, rather obviously, to introduce Sxl knock-down in the same experiment, and observe whether in Sxl-expressing neurons the incorporation of OPP is reduced. I put this forth as a minor point because the tenet of the paper would not be affected by the results (though the perception of importance of the newly described function could be reinforced).
      • In a similar way, it would be interesting to know whether the recruitment of Polr3E and Sxl to chromatin is co-dependent or Sxl follows Polr3E. This is also a minor point because this would possibly refine the mechanism of recruitment but does not alter the main discovery.

      Figures and reporting:

      • In Figure 2, it would be helpful to see the truncation coordinate for the N and C truncations.
      • In Figure 3D, genomic coordinates are missing.
      • In Figure 3E, the magnitude in the Y axis is not entirely clear (at least not to me). How is the amount of binding across the genome quantified? Is this the average amplitude of normalised TaDa signal across the genome? Or only within binding intervals?
      • Figure S3E-F: it would be interesting to show the degree of overlap between the downregulated genes that are also binding targets (regardless of the outcome).
      • Figure 5C-E: similarly to Figure S3, it would be interesting to know how the transcriptional effects compare with the binding targets.
      • Authors use Gehan-Breslow-Wilcoxon to test survival, which is a bit unusual, as it gives more weight to the early deaths (which are rare in most Drosophila longevity experiments). Is there any rationale behind this? It may even favour their null hypothesis.

      Writing and language:

      • Introduction finishes without providing an outline of the findings (which is fine by me if that is what the authors wanted).
      • In lines 361-5, the authors say "We speculate that this interaction not only facilitates Pol III transcription but may also influence chromatin architecture and RNA Pol II-driven transcription as observed with Pol III regulation in other organisms". "This interaction" refers to Polr3E-Sxl-DNA interaction and with "Pol III transcription" I presume the authors refer to transcription executed by Pol III. I am not clear about the meaning of the end of the sentence "as observed with Pol III regulation in other organisms". What is the observation, exactly? That Pol III modifies chromatin in Pol II regulated loci, or that Pol III interactors change chromatin architecture?
      • DPE abbreviation is not introduced (and only used once).
      • A few typos:
        - Line 41: ...splicing of the Sxl[late] transcripts, which is [ARE?] constitutively transcribed (Keyes et al.,...
        - Line 76: ...sexes but appears restricted to the nervous system [OF] male pupae and adults (Cline et ...
        - Line 289: ...and S41). To assess any effect [ON] translational output, O-propargyl-puromycin (OPP)...
        - Line 323: ...illustrating that the majority (72%) changes in tRNA levels [ARE] due to upregulation...
        - Line 402: ...it was discovered [WE DISCOVERED]
        - Line 792: ...Sxl across chromosomes X, 2 L/R, 3 L/R and 4. The y-axis represents the log[SYMBOL] ratio... This happens in other figure legends as well.

      Referee Cross-commenting

      Reviewer 1 asks how physiological the Sxl chromatin-association assay is. I think the loss of association in Polr3E knock-down and the lack of association of other splicing factors goes a long way towards answering this question. It is true that having positive binding data specifically for Sxl-RAC and negative binding data for a deletion mutant of the RRM domain would provide more robust conclusions (see below), but I am not sure it is completely necessary -- though this will depend on which journal the authors want to send the paper to.

      I think that the comment of reviewer 1 about the levels of expression of Sxl-DAM does not apply here because of the way TaDa works - it relies on codon slippage to produce minimal amounts of the DAM fusion protein, so by construction it will be expressed at much lower levels than the endogenous protein.

      Reviewer 1 also asks whether Polr3E chromatin-association is also dependent on Sxl, to round up the model and also as a way to address whether Sxl association to chromatin is real. While I agree with this on the former aim (this would be a nice-to-have), I think I disagree on the latter; there is no need for Polr3E recruitment to depend on Sxl for Sxl association to chromatin to be physiologically relevant. Polr3E is a peripheral component of Pol III and unlikely to depend on a factor of restricted expression like Sxl to interact with chromatin. The recruitment of Sxl could well be entirely 'hierarchical' and subject to Polr3E.

      Reviewer 2 is concerned with the fact that every mutant form of Sxl shows the same result from the DamID labelling. I have to agree with this to a point. A deletion mutant of the RRM domains would address this. Microscopy evidence in salivary glands would be nice, certainly, but the system may not lend itself to this particular interaction, which might be short-lived and/or weak. I do not immediately see the relevance of the chromatin binding capacity of non-Drosophilidae Sxl -- though it might indicate that the impact of the discovery is less likely to go beyond this group.

      Reviewer 2 does not find it surprising that some tRNA genes (fewer than half) are regulated by Sxl. I think the value of that observation is only qualitative, as tRNAs are Pol III-produced transcripts, but their point is correct. A hypergeometric test could settle this.

      Reviewer 2 is concerned that the evidence of direct interaction between Sxl and Polr3E is a single 1999 two-hybrid study. But that paper contains also GST pull-downs that narrow down the specific domains that mediate binding, and perform the binding in competitive salt conditions. I think it is enough. The author team, I think, are not biochemists, so finding the right collaborators and performing these experiments would take time that I am not sure is warranted.

      Reviewer 2 is also concerned that the longevity assays may not be meaningful due to differences in genetic background. This is a very reasonable concern (which I would extend to the climbing assays, since any quantitative phenotype is sensitive to genetic background). However, I think the authors may have already designed the experiment with this in mind: the controls express untargeted RNAi constructs, but I lost track of which control corresponds to which experiment. This should be clarified in the Methods.

      Other comments are in line, I think, with what I have pointed out and I generally agree with everything else that has been said.

      Significance

      Drosophila Sxl is widely known as an RNA-binding protein that functions as a splicing factor to determine sex identity in Drosophila and related species. It is a favourite example of how splicing factors and alternative splicing can have a profound influence on biology and can be used cleverly in the molecular circuitry of the cell to enact elegant regulatory decisions.

      In this work, Storer, McClure and colleagues use genome-wide DNA-protein binding assays, transcriptomics, and genetics to show that Sxl is also a chromatin factor with a sex-independent, neuron-specific role in stimulating transcription, by Pol III and Pol II, of genes involved in metabolism and protein homeostasis, including some encoding tRNAs.

      This opens a large number of interesting biological questions spanning biochemistry, gene regulation, neurobiology and evolution. How is the simultaneous capacity to bind RNA and chromatin (with the same protein domain, the RRM) regulated and coordinated? How did this dual activity evolve, and which one is ancestral? How many other RRM-containing RNA-binding proteins can also bind chromatin? How is Sxl recruited to both Pol II and Pol III targets on chromatin, and are these targets functionally related? If so, how are cellular functions driven by different RNA polymerases coordinated, and what is the role of Sxl in this? What are the functional consequences for neuronal biology? Does this affect all Sxl-expressing neurons similarly?

      The evidence for the central tenet of the paper, that Sxl acts as a chromatin regulator with Polr3E, activating at least some of its targets with either Pol III or Pol II, is logical and compelling; the paper is well written and the figures are well presented. Of course, more experiments could always be wished for and proposed, but I think this manuscript could be published in many journals after a minor revision not involving additional experiments.

    1. I am sincerely grateful to the editors and peer reviewers at MetaROR for their detailed feedback and valuable comments and suggestions. I have addressed each point below.

      Handling Editor

      1. However, the article’s progression and arguments, along with what it seeks to contribute to the literature, need refinement and clarification. The argument for PRC is under-developed due to a lack of clarity about what the article means by scientific communication. Clarity here might make the endorsement of PRC seem like less of a foregone conclusion.

      The structure of the paper (and discussion) has changed significantly to address the feedback.

      2. I strongly endorse the main theme of most of the reviews, which is that the progression and underlying justifications for this article’s arguments need a great deal of work. In my view, this article’s main contribution seems to be the evaluation of the three peer review models against the functions of scientific communication. I say ‘seems to be’ because the article is not very clear on that, and I hope you will consider clarifying what your manuscript seeks to add to the existing work in this field. In any case, if that assessment of the three models is your main contribution, that part is somewhat underdeveloped. Moreover, I never got the sense that there is clear agreement in the literature about what the tenets of scientific communication are. Note that scientific communication is a field in its own right.

      I have implemented a more rigorous approach to argumentation in response. “Scientific communication” was replaced by “scholarly communication.”

      3. I also agree that the paper is too strongly worded at times, with limitations and assumptions in the analysis minimised or not stated. For example, all of the typologies and categories drawn could easily be reorganised, and there is a high degree of subjectivity in this entire exercise. Subjective choices should be highlighted and made salient for the reader. Note that greater clarity, rigour, and humility may also help with any alleged or actual bias.

      I have incorporated the conceptual framework and a description of the research methodology. However, the Discussion section reflects my personal perspective on some points, which I have explicitly highlighted to ensure clarity.

      4. I agree with Reviewer 3 that the ‘we’ perspective is distracting.

      This has been fixed.

      5. The paragraph starting with ‘Nevertheless’ on page 2 is very long.

      The text was restructured.

      6. There are many points where language could be shortened for readability, for example:

      Page 3: ‘decision on publication’ could be ‘publication decision’.

      Page 5: ‘efficiency of its utilization’ could be ‘its efficiency’.

      Page 7: ‘It should be noted…’ could be ‘Note that…’.

      I have proofread the text.

      7. Page 7: ‘It should be noted that..’ – this needs a reference.

      This statement has been moved to the Discussion section, paraphrased, and reference added.

      “It should also be noted that peer review innovations pull in opposing directions, with some aiming to increase efficiency and reduce costs, while others aim to promote rigor and increase costs (Kaltenbrunner et al., 2022).”

      8. I’m not sure that registered reports reflect a hypothetico-deductive approach (page 6). For instance, systematic reviews (even non-quantitative ones) are often published as registered reports and Cochrane has required this even before the move towards registered reports in quantitative psychology.

      I have added this clarification.

      9. I agree that modular publishing sits uneasily as its own chapter.

      Modular publishing has been combined with registered reports into the deconstructed publication group of models, now Section 5.1.

      10. Page 14: ‘The "Publish-Review-Curate" model is universal that we expect to be the future of scientific publishing. The transition will not happen today or tomorrow, but in the next 5-10 years, the number of projects such as eLife, F1000Research, Peer Community in, or MetaROR will rapidly increase’. This seems overly strong (an example of my larger critique and that of the reviewers).

      This part of the text has been rewritten.

      Reviewer 1

      11. For example, although Model 3 has less chance of introducing bias for readers, it also weakens the filtering function of the review system. Just think about the dangers of machine-generated articles, paper mills, p-hacked research reports and so on. Although the editors do some pre-screening of submissions, in a world with only Model 3 peer review the literature could easily become loaded with even more ‘garbage’ than in a model where additional peers help with the screening.

      I think that generated text is better detected by software tools. At the same time, I have tried to describe the pros and cons of the different models in a more balanced way in the concluding section.

      12. Compared to registered reports, other aspects come into focus that Model 3 cannot cover, namely the efficiency of researchers’ work. In the case of registered reports, Stage 1 review can still help researchers modify or improve their research design or data collection method. Empirical work can be costly and time-consuming, and post-publication review can only say that “you should have done it differently, then it would make sense”.

      Thank you very much for this valuable contribution; I have added this statement on p. 11.

      13. Finally, the author puts openness as a strength of Model 3. In my eyes, openness is a separate question. All models can work very openly and transparently in the right circumstances. This dimension is not an inherent part of the models.

      I think that a model providing peer reviews for all submissions ensures maximum transparency. However, I have made an effort to make the wording more balanced and to distinguish my personal perspective from the literature.

      14. In conclusion, I would not pass a verdict on the models, but would instead emphasize the different functions they can play in scientific communication.

      This idea has been reflected now in the concluding section.

      15. A minor comment: I found that a number of statements lack references in the Introduction. I would have found them useful for statements such as “There is a point of view that peer review is included in the implicit contract of the researcher.”

      Thank you for your feedback. I have implemented a more rigorous approach to argumentation in response.

      Reviewer 2

      16. The primary weakness of this article is that it presents itself as an 'analysis' from which they 'conclude' certain results such as their typology, when this appears clearly to be an opinion piece. In my view, this results in a false claim of objectivity which detracts from what would otherwise be an interesting and informative, albeit subjective, discussion, and thus fails to discuss the limitations of this approach.

      I have incorporated the conceptual framework and a description of the research methodology. However, the Discussion section reflects my personal perspective on some points, which I have explicitly highlighted to ensure clarity.

      17. A secondary weakness is that the discussion is not well structured and there are some imprecisions of expression that have the potential to confuse, at least at first.

      The structure of the paper (and discussion) has changed significantly.

      18. The evidence and reasoning for claims made is patchy or absent. One instance of the former is the discussion of bias in peer review. There are a multitude of studies of such bias and indeed quite a few meta-analyses of these studies. A systematic search could have been done here but there is no attempt to discuss the totality of this literature. Instead, only a few specific studies are cited. Why are these ones chosen? We have no idea. To this extent I am not convinced that the references used here are the most appropriate.

      I have reviewed the existing references and incorporated additional sources. However, the study does not claim to conduct a systematic literature review; rather, it adopts an interpretative approach to literature analysis.

      19. Instances of the latter are the claim that "The most well-known initiatives at the moment are ResearchEquals and Octopus" for which no evidence is provided, the claim that "we believe that journal-independent peer review is a special case of Model 3" for which no further argument is provided, and the claim that "the function of being the "supreme judge" in deciding what is "good" and "bad" science is taken on by peer review" for which neither is provided.

      Thank you for your feedback. I have implemented a more rigorous approach to argumentation in response.

      20. A particular example of this weakness, which is perhaps of marginal importance to the overall paper but of strong interest to this reviewer, is the rather odd engagement with history within the paper. It is titled "Evolution of Peer Review" but is really focussed on the contemporary state of play. Section 2 starts with a short history of peer review in scientific publishing, but that seems intended only to establish what is described as the 'traditional' model of peer review. Given that that short history had just shown how peer review had been continually changing in character over centuries (and indeed Kochetkov goes on to describe further changes), it is a little difficult to work out what 'traditional' might mean here; what was 'traditional' in 2010 was not the same as what was 'traditional' in 1970. It is not clear how seriously this history is being taken. Kochetkov has earlier written that "as early as the beginning of the 21st century, it was argued that the system of peer review is 'broken'", but of course criticisms of peer review, including fundamental criticisms, are much older than this. Overall, this use of history seems designed to privilege the experience of a particular moment in time that coincides with the start of the metascience reform movement.

      While the paper addresses some aspects of peer review history, it does not provide a comprehensive examination of this topic. A clarifying statement to this effect has been included in the methodology section.

      “… this section incorporates elements of historical analysis, it does not fully qualify as such because primary sources were not directly utilized. Instead, it functions as an interpretative literature review, and one that is intentionally concise, as a comprehensive history of peer review falls outside the scope of this research”.

      21. Section 2 also demonstrates some of the second weakness described, a rather loose structure. Having moved from a discussion of the history of peer review to detail the first model, 'traditional' peer review, it then goes on to describe the problems of this model. This part of the paper is one of the best, and best evidenced. Given its importance to the main thrust of the discussion, it should probably have been given more space as a section of its own.

      This section (now Section 4) has been extended, see also previous comment.

      22. Another example is Section 4 on Modular Publishing, in which Kochetkov notes "Strictly speaking, modular publishing is primarily an innovative approach for the publishing workflow in general rather than specifically for peer review."

      Kochetkov says "This is why we have placed this innovation in a separate category" but if it is not an innovation in peer review, the bigger question is 'Why was it included in this article at all?'.

      Modular publishing has been combined with registered reports into the deconstructed publication group of models, now Section 5.1.

      23. One example of the imprecisions of language is as follows. The author also shifts between the terms 'scientific communication' and 'science communication' but, at least in many contexts familiar to this reviewer, these are not the same things, the former denoting science-internal dissemination of results through publication (which the author considers), conferences and the like (which the author specifically excludes) while the latter denotes the science-external public dissemination of scientific findings to non-technical audiences, which is entirely out of scope for this article.

      Thank you for your remark. As a non-native speaker, I initially did not grasp the distinction between the terms. However, I believe the phrase ‘scholarly communication’ is the most universally applicable term. This adjustment has now been incorporated into the text.

      24. A final note is that Section 3, while an interesting discussion, seems largely derivative from a typology of Waltman, with the addition of a consideration of whether a reform is 'radical' or 'incremental', based on how 'disruptive' the reform is. Given that this is inherently a subjective decision, I wonder if it might not have been more informative to consider 'disruptiveness' on a scale and plot it accordingly. This would allow for some range to be imagined for each reform as well; surely reforms might be more or less disruptive depending on how they are implemented. Given that each reform is considered against each model, it is somewhat surprising that this is not presented in a tabular or graphical form.

      Ultimately, I excluded this metric due to its current reliance on purely subjective judgment. Measuring 'disruptiveness', e.g., through surveys or interviews remains a task for future research. 

      25. Reconceptualize this as an opinion piece. Where systematic evidence can be drawn upon to make points, use that, but don't be afraid to just present a discussion from what is clearly a well-informed author.

      I cannot definitively classify this work as an opinion piece. In fact, this manuscript synthesizes elements of a literature review, research article, and opinion essay. My idea was to integrate the strengths of all three genres.

      26. Reconsider the focus on history and 'evolution' if the point is about the current state of play and evaluation of reforms (much as I would always want to see more studies on the history and evolution of peer review).

      I have revised the title to better reflect the study’s scope and explicitly emphasize its focus on contemporary developments in the field.

      “Peer Review at the Crossroads”

      27. Consider ways in which the typology might be expanded, even if at subordinate level.

      I have updated the typology and introduced the third tier, where it is applicable (see Fig.2).

      Reviewer 3

      28. In my view, the biggest issue with the current peer review system is the low quality of reviews, but the manuscript only mentions this fleetingly. The current system facilitates publication bias, confirmation bias, and is generally very inconsistent. I think this is partly due to reviewers’ lack of accountability in such a closed peer review system, but I would be curious to hear the author’s ideas about this, more elaborately than they provide them as part of issue 2.

      I have elaborated on this issue in the footnote.

      29. I’m missing a section in the introduction on what the goals of peer review are or should be. You mention issues with peer review, and these are mostly fair, but their importance is only made salient if you link them to the goals of peer review. The author does mention some functions of peer review later in the paper, but I think it would be good to expand that discussion and move it to a place earlier in the manuscript.

      The functions of peer review are summarized in the first paragraph of Introduction.

      30. Table 1 is intuitive, but some background on how the author arrived at these categorizations would be welcome. When is something incremental and when is something radical? Why are some innovations included but not others (e.g., collaborative peer review, see https://content.prereview.org/how-collaborative-peer-review-can-transform-scientific-research/)?

      Collaborative peer review, namely PREreview, was mentioned in the context of Model 3 (Publish-Review-Curate). However, I have extended this part of the paper.

      31. “Training of reviewers through seminars and online courses is part of the strategies of many publishers. At the same time, we have not been able to find statistical data or research to assess the effectiveness of such training.” (p. 5) There is some literature on this, although not recent. See, for example, work by Sara Schroter (Schroter et al., 2004; Schroter et al., 2008).

      Thank you very much, I have added these studies and a few more recent ones.

      32. “It should be noted that most initiatives aimed at improving the quality of peer review simultaneously increase the costs.” (p. 7) This claim needs some support. Please explicate why this typically is the case and how it should impact our evaluations of these initiatives.

      I have moved this part to the Discussion section.

      33. I would rephrase “Idea of the study” in Figure 2, since the other models start with a tangible output (the manuscript). The same holds for registered reports, where authors submit a tangible report including hypotheses, study design, and analysis plan. In the same vein, I think “study design” in the rest of the figure might also not be the best phrasing. Maybe the author could use the terminology used by COS (Stage 1 manuscript and Stage 2 manuscript; see the Details & Workflow tab of https://www.cos.io/initiatives/registered-reports). Relatedly, “Author submits the first version of the manuscript” in the first box after the ‘Manuscript (report)’ node may be a confusing phrase, because I think many researchers see the first version of the manuscript as the Stage 1 report sent out for Stage 1 review.

      Thank you very much. Stage 1 and Stage 2 manuscripts look like a suitable labelling solution.

      34. One pathway that is not included in Figure 2 is that authors can decide not to conduct the study when improvements are required. Relatedly, in the publish-review-curate model, is revising the manuscript based on the reviews not optional as well? Especially in the case of 3a, authors can hardly be forced to make changes even though the reviews are posted on the platform.

      All four models imply a certain level of generalization; thus, I tried to avoid redundant details. However, I have added this choice to the PRC model (now Model 4).

      35. I think the author should discuss the importance of ‘open identities’ more. This factor is now not explicitly included in any of the models, while it has been found to be one of the main characteristics of peer review systems (Ross-Hellauer, 2017).

      This part has been extended.

      36. More generally, I was wondering why the author chose these three models and not others. What were the inclusion criteria for inclusion in the manuscript? Some information on the underlying process would be welcome, especially when claims like “However, we believe that journal-independent peer review is a special case of Model 3 (“Publish-Review-Curate”).” are made without substantiation.

      The study included four generalized models of peer review that involved some level of abstraction.

      37. Maybe it helps to outline the goals of the paper a bit more clearly in the introduction. This helps the reader to know what to expect.

      The Introduction has been revised including the goal and objectives.

      38. The Modular Publishing section is not inherently related to peer review models, as you mention in the first sentence of that paragraph. As such, I think it would be best to omit this section entirely to maintain the flow of the paper. Alternatively, you could shortly discuss it in the discussion section but a separate paragraph seems too much from my point of view.

      Modular publishing has been combined with registered reports into the fragmented publishing group of models, now in Section 5.

      39. Labeling model 3 as post-publication review might be confusing to some readers. I believe many researchers see post-publication review as researchers making comments on preprints, or submitting commentaries to journals. Those activities are substantially different from the publish-review-curate model so I think it is important to distinguish between these types.

      The label was changed into Publish-Review-Curate model.

      40. I do not think the conclusions drawn below Table 3 logically follow from the earlier text. For example, why are “all functions of scientific communication implemented most quickly and transparently in Model 3”? It could be that the entire process takes longer in Model 3 (e.g. because reviewers need more time), so that Model 1 and Model 2 lead to outputs more quickly. The same holds for the following claim: “The additional costs arising from the independent assessment of information based on open reviews are more than compensated by the emerging opportunities for scientific pluralism.” What is the empirical evidence for this? While I personally do think that Model 3 improves on Model 1, emphatic statements like this require empirical evidence. Maybe the author could provide some suggestions on how we can attain this evidence. Model 2 does have some empirical evidence underpinning its validity (see Scheel, Schijen, Lakens, 2021; Soderberg et al., 2021; Sarafoglou et al., 2022), but more meta-research inquiries into the effectiveness and cost-benefit ratio of registered reports would still be welcome in general.

      The Discussion section has been substantially revised to address this point. While I acknowledge the current scarcity of empirical studies on innovative peer review models, I have incorporated a critical discussion of this methodological gap. I am grateful for the suggested literature on RRs, which I have now integrated into the relevant subsection.

      41. What is the underlying source for the claim that openness requires three conditions?

      I have made an effort to clarify within the text that this reflects my personal stance.

      42. “If we do not change our approach, science will either stagnate or transition into other forms of communication.” (p. 2) I don’t think this claim is supported sufficiently strongly. While I agree there are important problems in peer review, I think there would need to be a more in-depth and evidence-based analysis before claims like this can be made.

      The sentence has been rephrased.

      43. On some occasions, the author uses “we” while the study is single authored.

      This has been fixed.

      44. Figure 1: The top-left arrow from revision to (re-)submission is hidden

      I have updated Figure 1.

      45. “The low level of peer review also contributes to the crisis of reproducibility in scientific research (Stoddart, 2016).” (p. 4) I assume the author means the low quality of peer review.

      This has been fixed.

      46. “Although this crisis is due to a multitude of factors, the peer review system bears a significant responsibility for it.” (p. 4) This is also a big claim that is not substantiated.

      I have paraphrased this sentence as “While multiple factors drive this crisis, deficiencies in the peer review process remain a significant contributor.” and added a footnote.

      47. “Software for automatic evaluation of scientific papers based on artificial intelligence (AI) has emerged relatively recently” (p. 5) The author could add RegCheck (https://regcheck.app/) here, even though it is still in development. This tool is especially salient in light of the finding that preregistration-paper checks are rarely done as part of reviews (see Syed, 2023)

      Thank you very much, I have added this information.

      48. There is a typo in last box of Figure 1 (“decicion” instead of “decision”). I also found typos in the second box of Figure 2, where “screns” should be “screens”, and the author decision box where “desicion” should be “decision”

      This has been fixed.

      49. Maybe it would be good to mention results blinded review in the first paragraph of 3.2. This is a form of peer review where the study is already carried out but reviewers are blinded to the results. See work by Locascio (2017), Grand et al. (2018), and Woznyj et al. (2018).

      Thanks, I have added this (now Section 5.2).

      50. Is “Not considered for peer review” in Figure 3b not the same as rejected? I feel that it is rejected in the sense that neither the manuscript nor the reviews will be posted on the platform.

      Changed into “Rejected”

      51. “In addition to the projects mentioned, there are other platforms, for example, PREreview12, which departs even more radically from the traditional review format due to the decentralized structure of work.” (p. 11) For completeness, I think it would be helpful to add some more information here, for example why exactly decentralization is a radical departure from the traditional model.

      I have extended this passage.

      52. “However, anonymity is very conditional - there are still many “keys” left in the manuscript, by which one can determine, if not the identity of the author, then his country, research group, or affiliated organization.” (p.11) I would opt for the neutral “their” here instead of “his”, especially given that this is a paragraph about equity and inclusion.

      This has been fixed.

      53. “Thus, “closeness” is not a good way to address biases.” (p. 11) This might be a straw man argument because I don’t believe researchers have argued that it is a good method to combat biases. If they did, it would be good to cite them here. Alternatively, the sentence could be omitted entirely.

      I have omitted the sentence.

      54. I would start the Modular Publishing section with the definition as that allows readers to interpret the other statements better.

      Modular publishing has been combined with registered reports into the deconstructed publication group of models, now in Section 5; a general definition has been added.

      55. It would be helpful if the Models were labeled (instead of using Model 1, Model 2, and Model 3) so that readers don’t have to think back what each model involved.

      All the models represent a kind of generalization, which is why non-detailed labels are used. The text labels may vary depending on the context.

      56. Table 2: “Decision making” for the editor’s role is quite broad; I recommend specifying what kinds of decisions need to be made.

      Changed into “Making accept/reject decisions”

      57. Table 2: “Aim of review” – I believe the aim of peer review differs also within these models (see the “schools of thought” the author mentions earlier), so maybe a statement on what the review entails would be a better way to phrase this.

      Changed into “What does peer review entail?”

      58. Table 2: One could argue that the ‘object of the review’ in Registered Reports is also the manuscript as a whole, just at different stages. As such, I would phrase this differently.

      The current wording fits your remark: “Manuscript in terms of study design and execution”.

      Reviewer 4

      59. Page 3: It’s hard to get a feel for the timeline given the dates that are described. We have peer review becoming standard after WWII (after 1945), definitively established by the second half of the century, an example of obligatory peer review starting in 1976, and in crisis by the end of the 20th century. I would consider adding examples that better support this timeline: did it become more common in specific journals before 1976? Was the crisis by the end of the 20th century something that happened over time or something that was already intrinsic to the institution? It doesn’t seem like enough time to get established and then enter crisis, but more details/examples could help make the timeline clear. Consider discussing the benefits of the traditional model of peer review.

      This section has been extended.

      60. Table 1 – Most of these are self-explanatory to me as a reader, but not all. I don’t know what a registered report refers to, and it stands to reason that not all of these innovations are familiar to all readers. You do go through each of these sections, but that’s not clear when I initially look at the table. Consider having a more informative caption. Additionally, the left column is “Course of changes” here but “Directions” in the text. I’d pick one and go with it for consistency.

      Table 1 has been replaced by Figure 2. I have also extended text descriptions, added definitions.

      61. With some of these methods, there’s the ability to also submit to a regular journal. Going to a regular journal presumably would instigate a whole new round of review, which may or may not contradict the previous round of post-publication review and would increase the length of time to publication by going through both types. If someone has a goal to publish in a journal, what benefit would they get by going through the post-publication review first, given this extra time?

      Some of these platforms, e.g., F1000, Lifecycle Journal, replace conventional journal publishing. Modular publishing allows for step-by-step feedback from peers.

      An important advantage of RRs over other peer review models lies in their capacity to enhance research efficiency. By conducting peer review at Stage 1, researchers gain the opportunity to refine their study design or data collection protocols before empirical work begins.

      Other models of review can offer critiques such as "the study should have been conducted differently" without an actionable opportunity for improvement. The key motivation for having my paper reviewed in MetaROR is the quality of peer review – I have never received so many comments, frankly! Moreover, platforms such as MetaROR usually have partnering journals.

      62. There’s a section talking about institutional change (page 14). It mentions that openness requires three conditions – people taking responsibility for scientific communication, authors and reviewers, and infrastructure. I would consider adding some discussion of readers and evaluators. Readers have to be willing to accept these papers as reliable, trustworthy, and respectable to read and use the information in them.

      Evaluators such as tenure committees and potential employers would need to consider papers submitted through these approaches as evidence of scientific scholarship for the effort to be worthwhile for scientists.

      I have omitted these conditions and employed Moore’s Technology Adoption Life Cycle instead. Thank you very much for your comment!

      63. Based on this overview, which seems somewhat skewed towards the merits of these methods (conflict of interest, limited perspective on downsides to new methods/upsides to old methods), I am not quite ready to accept this effort as equivalent to a regular journal and pre-publication peer review process. I look forward to learning more about the approach and seeing this review method in action and as it develops.

      The Discussion section has been substantially revised to address this point. While I acknowledge the current scarcity of empirical studies on innovative peer review models, I have incorporated a critical discussion of this methodological gap.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper concerns mechanisms of foraging behavior in C. elegans. Upon removal from food, C. elegans first executes a stereotypical local search behavior in which it explores a small area by executing many random, undirected reversals and turns called "reorientations." If the worm fails to find food, it transitions to a global search in which it explores larger areas by suppressing reorientations and executing long forward runs (Hills et al., 2004). At the population level, the reorientation rate declines gradually. Nevertheless, about 50% of individual worms appear to exhibit an abrupt transition between local and global search, which is evident as a discrete transition from high to low reorientation rate (Lopez-Cruz et al., 2019). This observation has given rise to the hypothesis that local and global search correspond to separate internal states with the possibility of sudden transitions between them (Calhoun et al., 2014). The main conclusion of the paper is that it is not necessary to posit distinct internal states to account for discrete transitions from high to low reorientation rates. On the contrary, discrete transitions can occur simply because of the stochastic nature of the reorientation behavior itself.

      Strengths:

      The strength of the paper is the demonstration that a more parsimonious model explains abrupt transitions in the reorientation rate.

      Weaknesses:

      (1) Use of the Gillespie algorithm is not well justified. A conventional model with a fixed dt and an exponentially decaying reorientation rate would be adequate and far easier to explain. It would also be sufficiently accurate - given the appropriate choice of dt - to support the main claims of the paper, which are merely qualitative. In some respects, the whole point of the paper - that discrete transitions are an epiphenomenon of stochastic behavior - can be made with the authors' version of the model having a constant reorientation rate (Figure 2f).

      We apologize, but we are not sure what the reviewer means by “fixed dt”. If the reviewer means taking discrete steps in time (dt), and modeling whether a reorientation occurs, we would argue that the Gillespie algorithm is a better way to do this because it provides floating-point precision, rather than a time resolution limited by dt, which we hopefully explain in the updated text (Lines 107-192).
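      The response does not spell out the update rules, so the following is only a minimal sketch under assumed rules, not the authors' implementation. It assumes that M consists of M₀ units that each decay at rate γ, and that reorientations fire at rate β + (α − β)·M/M₀. Under this assumption the rate is α when M = M₀ and β when M = 0, which matches the two-state limit described for M₀ = 1. Because every propensity is constant between events, the Gillespie waiting-time draw is exact, and reorientation times come out as continuous floats rather than multiples of a fixed dt. The function name `gillespie_search` and the parameter values are hypothetical.

```python
import random

def gillespie_search(alpha, beta, gamma, m0, t_end, rng):
    """One in-silico worm under the assumed rules; returns (turn_times, decay_times)."""
    t, m = 0.0, m0
    turn_times, decay_times = [], []
    while True:
        a_turn = beta + (alpha - beta) * m / m0   # assumed reorientation propensity (1/min)
        a_decay = gamma * m                       # assumed decay propensity of M (1/min)
        a_total = a_turn + a_decay
        if a_total <= 0:
            break                                 # no further events are possible
        t += rng.expovariate(a_total)             # exact, continuous waiting time
        if t >= t_end:
            break
        if rng.random() * a_total < a_turn:
            turn_times.append(t)                  # Omega increases by one
        else:
            m -= 1                                # M decreases by one
            decay_times.append(t)
    return turn_times, decay_times

rng = random.Random(0)
turns, decays = gillespie_search(alpha=3.0, beta=0.5, gamma=0.1, m0=1000, t_end=40.0, rng=rng)
```

      A fixed-dt scheme would instead test for a reorientation in each interval of length dt. That quantizes event times and can miss coincident events unless dt is small. The event-driven loop avoids both problems at no extra cost.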

      The reviewer is correct that discrete transitions are an epiphenomenon of stochastic behavior, as we show in Figure 2f. However, abrupt stochastic jumps that occur at a constant rate do not produce persistent changes in the observed rate, because that rate is, by definition, constant. The theory that there are local and global searches is based on the observation that individual worms often abruptly change their reorientation rates. But this observation holds only for a fraction of worms. We argue that it is not observed for all, or even most, worms because these apparent switches are the result of stochastic sampling, not a sudden change in search strategy.

      (2) In the manuscript, the Gillespie algorithm is very poorly explained, even for readers who already understand the algorithm; for those who do not it will be essentially impossible to comprehend. To take just a few examples: in Equation (1), omega is defined as reorientations instead of cumulative reorientations; it is unclear how (4) follows from (2) and (3); notation in (5), line 133, and (7) is idiosyncratic. Figure 1a does not help, partly because the notation is unexplained. For example, what do the arrows mean, what does "*" mean?

      We apologize for this; you are correct, Ω is cumulative reorientations, and we have edited the text for clarity (Lines 107-192).

      We apologize for the arrow notation confusion. Arrow notation is commonly used in pseudocode to indicate variable assignment, and so we used it to indicate variable assignment updates in the algorithm.

      We added Figure 2a to help explain the Gillespie algorithm for readers who are unfamiliar with it, but you are correct that some notation, such as the probabilities, was left unexplained. We have added more text to the figure legend. Hopefully this additional text, along with lines 105-190, provides better clarification.

      (3) In the model, the reorientation rate dΩ⁄dt declines to zero but the empirical rate clearly does not. This is a major flaw. It would have been easy to fix by adding a constant to the exponentially declining rate in (1). Perhaps fixing this obvious problem would mitigate the discrepancies between the data and the model in Figure 2d.

      You are correct that the model deviates slightly at longer times, but this result is consistent with Klein et al. that show a continuous decline of reorientations. However, we have added a constant to the model (b, Equation 2), since an infinite run length is likely not physiological.
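      To illustrate what the added constant does, assume (as an illustration only, not the manuscript's Equation 2) the following rules. M is a linear death process with per-unit decay rate γ, so its mean decays exponentially. The reorientation rate is interpolated linearly between α at M = M₀ and the baseline constant, written β here, at M = 0. Under these assumptions the expected rate is:

```latex
r(M) = \beta + (\alpha - \beta)\,\frac{M}{M_0},
\qquad
\mathbb{E}[M(t)] = M_0\, e^{-\gamma t}
\quad\Longrightarrow\quad
\mathbb{E}\!\left[\frac{d\Omega}{dt}\right]
  = \beta + (\alpha - \beta)\, e^{-\gamma t}
  \;\xrightarrow{\;t \to \infty\;}\; \beta .
```

      With β = 0 the expected rate decays to zero, which is the behavior Reviewer #1 objected to. With β > 0 it levels off at β.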

      (4) Evidence that the model fits the data (Figure 2d) is unconvincing. I would like to have seen the proportion of runs in which the model generated one as opposed to multiple or no transitions in reorientation rate; in the real data, the proportion is 50% (Lopez). It is claimed that the "model demonstrated a continuum of switching to non-switching behavior" as seen in the experimental data but no evidence is provided.

      We should clarify that the 50% proportion cited by López-Cruz was based on an arbitrary difference in slopes and on a visual assessment of the data (López-Cruz, Figure S2). We added a comment in the text to clarify this (Lines 76 – 78). We sought to avoid this subjective assessment by plotting the distribution of slopes and transition times produced by the method used in López-Cruz. We should also clarify what we meant by “a continuum of switching and non-switching” behavior. Neither the transition-time distribution nor the slope-difference distribution appears to be the result of two distributions (the distributions in Figure 1 are not bimodal). This is unlike roaming and dwelling on food, where two distinct distributions of behavioral metrics can be identified based on speed and angular speed (Flavell et al, 2009, Fig S2a).
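      The response refers to the dual-regression method of López-Cruz et al. without giving its details. The following is therefore a generic stand-in, not necessarily identical to their procedure. It fits two straight lines to a cumulative reorientation curve, trying every breakpoint. It then reports the slope difference s₁ − s₂ and the transition time (taken here as the first time point of the second segment) for the split with the lowest total squared error. All function names are hypothetical.

```python
def fit_line(ts, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    sxx = sum((t - mt) ** 2 for t in ts)
    sxy = sum((t - mt) * (y - my) for t, y in zip(ts, ys))
    slope = sxy / sxx
    return slope, my - slope * mt

def sse(ts, ys, slope, icpt):
    """Sum of squared residuals of a line."""
    return sum((y - (slope * t + icpt)) ** 2 for t, y in zip(ts, ys))

def two_segment_fit(ts, ys, min_points=3):
    """Brute-force breakpoint search; returns (s1 - s2, transition_time)."""
    best = None
    for k in range(min_points, len(ts) - min_points + 1):
        s1, c1 = fit_line(ts[:k], ys[:k])
        s2, c2 = fit_line(ts[k:], ys[k:])
        err = sse(ts[:k], ys[:k], s1, c1) + sse(ts[k:], ys[k:], s2, c2)
        if best is None or err < best[0]:
            best = (err, s1 - s2, ts[k])
    return best[1], best[2]

# Synthetic cumulative curve: 2 turns/min, then 0.5 turns/min from t = 10 min.
ts = [float(t) for t in range(20)]
ys = [2.0 * t if t <= 9 else 19.0 + 0.5 * (t - 9.5) for t in ts]
diff, t_trans = two_segment_fit(ts, ys)
```

      Applying such a fit to every trace and histogramming `diff` and `t_trans` gives the kind of distributions discussed above. Whether they are bimodal can then be judged directly rather than by eye.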

      Based on the advice of Reviewer #3, we have also modeled the data using different starting amounts of M (M₀). By definition, an initial value of M₀ = 1 is a two-state switching strategy; the worm uses a reorientation rate of either a (when M = 1) or b (when M = 0). As expected, this does produce a bimodal distribution of slope differences (Figure 3b), which is significantly different from the experimental distribution (Figure 3c). We have added a new section to explain this in more detail (Lines 253 – 297).

      (5) The explanation for the poor fit between the model and data (lines 166-174) is unclear. Why would externally triggered collisions cause a shift in the transition distribution?

      Thank you, we rewrote the text to clarify this (Lines 227-233). There were no externally triggered collisions; 10 animals were used per experiment. They occasionally collided during the experiment, but these collisions were excluded from the data that were provided. However, worms are also known to increase reorientations when they encounter a pheromone trail, and it is unknown (from this dataset) which reorientations may have been a result of this phenomenon.

      (6) The discussion of Levy walks and the accompanying figure are off-topic and should be deleted.

      Thank you, we agree that this topic is tangential, and we removed it.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors build a statistical model that stochastically samples from a time-interval distribution of reorientation rates. The form of the distribution is extracted from a large array of behavioral data, and is then used to describe not only the dynamics of individual worms (including the inter-individual variability in behavior), but also the aggregate population behavior. The authors note that the model does not require assumptions about behavioral state transitions, or evidence accumulation, as has been done previously, but rather that the stochastic nature of behavior is "simply the product of stochastic sampling from an exponential function".

      Strengths:

      This model provides a strong juxtaposition to other foraging models in the worm. Rather than invoking a behavioral transition function (that might arise from a change in internal state or the activity of a cell type in the network) or evidence accumulation (which again maps onto a cell type, or the activity of a network), this model explains behavior via the stochastic sampling of an exponentially decaying function. The underlying model and the dynamics being simulated, as well as the process of stochastic sampling, are well described, and the model fits the exponential function (Equation 1) to data on a large array of worms exhibiting diverse behaviors (1600+ worms from Lopez-Cruz et al). The work of this study is able to explain or describe the inter-individual diversity of worm behavior across a large population. The model is also able to capture two aspects of the reorientations: the dynamics (to switch or not to switch) and the kinetics (slow vs fast reorientations). The authors also compare their model to a few others, ranging from the Levy walk (whose construction arises from a Markov process) to a simple exponential distribution, all of which have been used to study foraging and search behaviors.

      Weaknesses:

      This manuscript has two weaknesses that dampen the enthusiasm for the results. First, in all of the examples the authors cite where a Gillespie algorithm is used to sample from a distribution, be it the kinetics associated with chemical dynamics, or a Lotka-Volterra Competition Model, there are underlying processes that govern the evolution of the dynamics, and thus the sampling from distributions. In one of their references, for instance, the stochasticity arises from the birth and death rates, thereby influencing the genetic drift in the model. In these examples, the process governing the dynamics (and thus generating the distributions from which one samples) is distinct from the behavior being studied. In this manuscript, the distribution being sampled is the exponential decay function of the reorientation rate (lines 100-102). This appears to be tautological - a decay function fitted to the reorientation data is then sampled to generate the distributions of the reorientation data. That the model performs well and matches the data is commendable, but it is unclear how that could not be the case if the underlying function generating the distribution was fit to the data.

      Thank you, we apologize that this was not clearer. In the Lotka-Volterra model, the densities of predators and prey are modeled, with the underlying assumption that rates of birth and death are inherently stochastic. In our model, the number of reorientations is modeled, with the assumption (based on the experiments) that the occurrence of reorientations is stochastic, just as the occurrence (birth) of a prey animal is stochastic. However, the decay in M is phenomenological, and we speculate about the nature of M later in the manuscript.

      You are absolutely right that the decay function for M was fit to the population average of reorientations and then sampled to generate the distributions of the reorientation data. This was intentional, to show that the parameters chosen to match the population average would produce individual trajectories with stochastic “switching” comparable to that in the experimental data. All we are trying to show is that sudden changes in reorientation rate that appear persistent can be produced by a stochastic process without resorting to binary state assignments. Calhoun et al. (2014) report that all animals produced switch-like behavior, whereas Klein et al. (2017) report that no animals showed abrupt transitions. López-Cruz et al. seem to show a mix of these results, which can easily be explained by an underlying stochastic process.

      The second weakness is somewhat related to the first, in that absent an underlying mechanism or framework, one is left wondering what insight the model provides.

      Stochastically sampling a function generated by fitting the data, in order to produce stochastic behavior, is where one ends up in this framework, and the authors indeed point this out: "simple stochastic models should be sufficient to explain observably stochastic behaviors." (Line 233-234). But if that is the case, what do we learn about how the foraging is happening? The authors suggest that the decay parameter M can be considered a memory timescale, which offers some suggestion, but then go on to say that the "physical basis of M can come from multiple sources". Here is where one is left wanting: the mechanisms suggested, including loss of sensory stimuli, alterations in motor integration, ionotropic glutamate signaling, dopamine, and neuropeptides, are basically all of the possible biological sources that can govern behavior, and one is left not knowing what insight the model provides. The array of biological processes listed is so variable in dynamics and meaning that their explanation of what governs M is at best unsatisfying. Molecular dynamics models that generate distributions can point to certain properties of the model, such as the binding kinetics (on and off rates, etc.), as explanations for the mechanisms generating the distributions, and therefore point to how a change in the biology affects the stochasticity of the process. It is unclear how this model provides such a connection, especially taken in aggregate with the previous weakness.

      Providing a roadmap of how to think about the processes generating M, the meaning of those processes in search, and potential frameworks that are more constrained and with more precise biological underpinning (beyond the array of possibilities described) would go a long way to assuaging the weaknesses.

      Thank you, these are all excellent points. We should clarify that López-Cruz et al. claim that only 50% of the animals fit a local/global search paradigm. We are simply proposing that there is no need to designate local and global searches if the data don’t really support it. The underlying behavior is stochastic, so the sudden switches sometimes observed can be explained by a stochastic process in which the underlying rate is slowing down, thus producing the persistently slow reorientation rate when an apparent “switch” occurs. What we hope to convey is that foraging doesn’t appear to follow a decision paradigm, but instead a gradual change in reorientations which, for individual worms, can occasionally produce reorientation trajectories that appear switch-like.

      As for M, you are correct, we should be more explicit, and we have added text (Lines 319-359) to expand upon its possible biological origin.

      Reviewer #3 (Public review):

      Summary:

      This intriguing paper addresses a special case of a fundamental statistical question: how to distinguish between stochastic point processes that derive from a single "state" (or single process) and more than one state/process. In the language of the paper, a "state" (perhaps more intuitively called a strategy/process) refers to a set of rules that determine the temporal statistics of the system. The rules give rise to probability distributions (here, the probability for turning events). The difficulty arises when the sampling time is finite, and hence, the empirical data is finite, and affected by the sampling of the underlying distribution(s). The specific problem being tackled is the foraging behavior of C. elegans nematodes, removed from food. Such foraging has been studied for decades, and described by a transition over time from 'local'/'area-restricted search' (roughly in the initial 10-30 minutes of the experiments, in which animals execute frequent turns) to 'dispersion', or 'global search' (characterized by a low frequency of turns). The authors propose an alternative to this two-state description - a potentially more parsimonious single 'state' with time-changing parameters, which they claim can account for the full-time course of these observations.

      Figure 1a shows the mean rate of turning events as a function of time (averaged across the population). Here, we see a rapid transient, followed by a gradual 4-5-fold decay in the rate, which then levels off. This picture seems consistent with the two-state description. However, the authors demonstrate that individual animals exhibit different "transition" statistics (Figure 1e) and wish to explain this. They do so by fitting this mean with a single function (Equations 1-3).

      Strengths:

      As a qualitative exercise, the paper might have some merit. It demonstrates that apparently discrete states can sometimes be artifacts of sampling from smoothly time-changing dynamics. However, as a generic point, this is not novel, and so without the grounding in C. elegans data, is less interesting.

      Weaknesses:

      (1) The authors claim that only about half the animals tested exhibit discontinuity in turning rates. Can they automatically separate the empirical and model population into these two subpopulations (with the same method), and compare the results?

      Thank you, we should clarify that the observation that about half the animals exhibit discontinuity was not made by us, but by López-Cruz et al. The observed fraction of 50% was based on a visual assessment of the dual regression method we described. We added text (Lines 76-79) to clarify this. To make the process more objective, we decided to simply plot the distributions of the metrics they used for this assessment to see if two distinct populations could be observed. However, the distributions of slope differences and transition times do not produce two distinct populations. Our stochastic approach, which does not assume abrupt state-transitions, also produces comparable distributions. To quantify this, we have added a section varying M₀, including setting M₀ to 1, so that the model by definition is a switch model. This model performs the worst (Lines 253-296, Figure 3).

      (2) The equations consider an exponentially decaying rate of turning events. If so, Figure 2b should be shown on a semi-logarithmic scale.

      We chose not to do this because this average is based on the number of discrete reorientation events observed within a 2-minute window. The number of events ranges from 0 to 6 (hence a rate of 0.5-3 min⁻¹), which does not span even one order of magnitude. Instead, we included a heat map (Figure 1a, Figure 2b bottom panel) showing the density on which the average is based. We hope this provides some clarity to the reader.
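      To make the conversion from counts to rates concrete, here is a minimal sketch. It assumes event times in minutes and consecutive, non-overlapping 2-minute windows (the response does not state whether the windows overlap). Each event in a window contributes 0.5 min⁻¹, so 6 events correspond to 3 min⁻¹. The function name is hypothetical.

```python
def binned_rates(event_times, t_end, window=2.0):
    """Reorientation rate (events per minute) in consecutive windows of `window` minutes."""
    n_bins = int(t_end // window)
    counts = [0] * n_bins
    for t in event_times:
        i = int(t // window)
        if 0 <= i < n_bins:
            counts[i] += 1
    return [c / window for c in counts]

rates = binned_rates([0.5, 1.0, 3.0], t_end=4.0)
```

      Because counts per window are small integers, the per-window rate can only take multiples of 0.5 min⁻¹. This coarseness is one reason a heat map of the underlying densities is informative alongside the mean.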

      (3) The variables in Equations 1-3 and the methods for simulating them are not well defined, making the method difficult to follow. Assuming my reading is correct, Omega should be defined as the cumulative number of turning events over time (Omega(t)), not as a "turn" or "reorientation", which has no derivative. The relevant entity in Figure 1a is apparently <Omega (t)>, i.e. the mean number of events across a population which can be modelled by an expectation value. The time derivative would then give the expected rate of turning events as a function of time.

      Thank you, you are correct. Please see response to Reviewer #1.

      (4) Equations 1-3 are cryptic. The authors need to spell out up front that they are using a pair of coupled stochastic processes, sampling a hidden state M (to model the dynamic turning rate) and the actual turn events, Omega(t), separately, as described in Figure 2a. In this case, the model no longer appears more parsimonious than the original 2-state model. What then is its benefit or explanatory power (especially since the process involving M is not observable experimentally)?

      Thank you; we see how, as written, this was confusing. In our response to Reviewer #1, and in the text, we added an important detail:

      While reorientations are modeled as discrete events, which is observationally true, the amount of M at time t=0 is chosen to be large (M₀ = 1000), so that over the timescale of 40 minutes, the decay in M is practically continuous. This ensures that sudden changes in reorientations are not due to sudden changes in M, but due to the inherent stochasticity of reorientations.

      However, you are correct that if M were restricted to a binary value of 0 or 1, this would indeed be the two-state model. We added a new section to address this (Lines 253-287, Figure 3). Unlike the experiments, the two-state model produces bimodal distributions in slope and transition times, and these distributions are significantly different from the experimental data (Figure 3).
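      The role of M₀ can be illustrated with the hidden process alone. This is a sketch that assumes M decays one unit at a time, with each unit decaying at rate γ (a pure-death process). The assumption is ours, and the function names and parameter values are hypothetical. With M₀ = 1000 each decay step changes M/M₀ by only 0.001, so M/M₀ tracks e^(−γt) closely. With M₀ = 1 there is a single step from 1 to 0 at an exponentially distributed time, which is exactly a two-state switch.

```python
import math
import random

def simulate_m(m0, gamma, t_end, rng):
    """Pure-death process: each of the m remaining units of M decays at rate gamma."""
    t, m = 0.0, m0
    path = [(0.0, m0)]                 # (time, M) after each jump
    while m > 0:
        t += rng.expovariate(gamma * m)
        if t >= t_end:
            break
        m -= 1
        path.append((t, m))
    return path

def fraction_at(path, m0, t):
    """M(t) / m0, read off the piecewise-constant path."""
    m = m0
    for time, value in path:
        if time > t:
            break
        m = value
    return m / m0

rng = random.Random(2)
smooth = simulate_m(1000, 0.1, 40.0, rng)   # near-continuous decay
switch = simulate_m(1, 0.1, 40.0, rng)      # a single on/off switch
```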

      (5) Further, as currently stated in the paper, Equations 1-3 are only for the mean rate of events. However, the expectation value is not a complete description of a stochastic system. Instead, the authors need to formulate the equations for the probability of events, from which they can extract any moment (they write something in Figure 2a, but the notation there is unclear, and this needs to be incorporated here).

      Thank you, yes please see our response to Reviewer #1. We have clarified the text in Lines 105-190.

      (6) Equations 1-3 have three constants (alpha and gamma which were fit to the data, and M0 which was presumably set to 1000). How does the choice of M0 affect the results?

      Thank you, this is a good question. We address this in lines 253-296. Briefly, the choice of M₀ does not have a strong effect on the results, unless we set it to M₀ = 1, which by definition creates a two-state model. This model was significantly different from the experimental data, relative to the other models (Figure 3c).

      (7) M decays to near 0 over 40 minutes, abolishing omega turns by the end of the simulations. Are omega turns entirely abolished in worms after 30-40 minutes off food? How do the authors reconcile this decay with the leveling of the turning rate in Figure 1a?

      Yes, Reviewer #1 recommended adding a baseline reorientation rate, which we did for all models (Equation 2). However, we should also note that Klein et al. observed a continuous decay over 50 minutes. Realistically, though, it is unlikely that worms produce infinitely long runs at long time points.

      (8) The fit given in Figure 2b does not look convincing. No statistical test was used to compare the two functions (empirical and fit). No error bars were given (to either). These should be added. In the discussion, the authors explain the discrepancy away as experimental limitations. This is not unreasonable, but on the flip side, makes the argument inconclusive. If the authors could model and simulate these limitations, and show that they account for the discrepancies with the data, the model would be much more compelling.

      To do this, I would imagine that the authors would need to take the output of their model (lists of turning times) and convert them into simulated trajectories over time. These trajectories could be used to detect boundary events (for a given size of arena), collisions between individuals, etc. in their simulations and to see their effects on the turn statistics.

      Thank you, we have added dashed lines indicating the standard deviation to Figures 2b and 3a. After running the models several times, we found that some of the small discrepancies noted (like s₁-s₂ < 0 for experiments but not the model) were spurious, because these data points made up <1% of the data, so we cut this from the text. To compare how similar the continuous (M₀ > 1) and discrete (M₀ = 1) models were to the experimental data, we calculated a Jensen-Shannon distance for the models, and found that the discrete model was significantly more dissimilar to the experimental data than the continuous models (Lines 289-296, Figure 3c).
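      For readers unfamiliar with the metric, here is a minimal sketch of a Jensen-Shannon distance between two sample distributions, such as model and experimental slope differences. It bins both sample sets on shared bin edges, which are a choice the response does not specify. It then takes the square root of the Jensen-Shannon divergence with base-2 logarithms, so the result lies between 0 (identical histograms) and 1 (no overlap). Function names are hypothetical.

```python
import math

def histogram(samples, edges):
    """Normalized counts of samples in [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for x in samples:
        for i in range(len(edges) - 1):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)                # assumes at least one sample falls inside the edges
    return [c / total for c in counts]

def js_distance(p, q):
    """Jensen-Shannon distance with base-2 logs, so the value lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        # Terms with x_i = 0 contribute nothing; y_i = m_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)
    return math.sqrt(max(0.5 * kl(p, m) + 0.5 * kl(q, m), 0.0))

edges = [0.0, 1.0, 2.0, 3.0]
p = histogram([0.2, 0.4, 1.1, 2.5], edges)
q = histogram([0.3, 1.2, 1.7, 2.9], edges)
d = js_distance(p, q)
```

      The distance is symmetric and needs no fitted parameters. The bin edges do influence the value, however, so the same edges should be used for every model being compared.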

      (9) The other figures similarly lack any statistical tests and by eye, they do not look convincing. The exception is the 6 anecdotal examples in Figure 2e. Those anecdotal examples match remarkably closely, almost suspiciously so. I'm not sure I understood this though - the caption refers to "different" models of M decay (and at least one of the 6 examples clearly shows a much shallower exponential). If different M models are allowed for each animal, this is no longer parsimonious. Are the results in Figure 2d for a single M model? Can Figure 2e explain the data with a single (stochastic) M model?

      We certainly don’t want the panels in Figure 2e to be suspicious! These comparisons were drawn by calculating the correlations between all model traces and all experimental traces and then choosing the top hits. Every time we run the simulation, we arrive at a different set of examples. Since it was recommended that we add a baseline rate, these examples will be a completely different set when we run the simulation again.

      We apologize for the confusion regarding M. Since the worms do not all start out with identical reorientation rates, we drew the initial M value from a distribution centered on M₀ to match the initial distribution of observed experimental rates (Lines 206-214). However, the decay in M (γ), as well as α and β, are the same for all in silico animals.
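      The response says only that initial M is drawn from "a distribution centered on M₀", so the normal distribution and its relative spread below are hypothetical choices. The sketch shows the structure of the idea: animals differ only in their starting value of M, while γ, α, and β are shared. The rate rule r(M) = β + (α − β)·M/M₀ is also an assumption, not the manuscript's equation.

```python
import math
import random

def draw_initial_m(n_worms, m0, rel_sd, rng):
    """Hypothetical choice: initial M ~ Normal(m0, rel_sd * m0), rounded and floored at 0."""
    return [max(0, round(rng.gauss(m0, rel_sd * m0))) for _ in range(n_worms)]

def initial_rate(m_init, m0, alpha, beta):
    """Rate at t = 0 under the assumed r(M) = beta + (alpha - beta) * M / m0."""
    return beta + (alpha - beta) * m_init / m0

def expected_m(m_init, gamma, t):
    """All worms share gamma, so only the starting point of the decay differs."""
    return m_init * math.exp(-gamma * t)

rng = random.Random(3)
starts = draw_initial_m(100, 1000, 0.3, rng)
rates0 = [initial_rate(m, 1000, 3.0, 0.5) for m in starts]
```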

      (10) The left axes of Figure 2e should be reverted to cumulative counts (without the normalization).

      Thank you, we made this change.

      (11) The authors give an alternative model of a Levy flight, but do not give the obvious alternative models:

      a) the 1-state model in which P(t) = alpha exp (-gamma t) dt (i.e. a single stochastic process, without a hidden M, collapsing equations 1-3 into a single equation).

      b) the originally proposed 2-state model (with 3 parameters, a high turn rate, a low turn rate, and the local-to-global search transition time, which can be taken from the data, or sampled from the empirical probability distributions). Why not? The former seems necessary to justify the more complicated 2-process model, and the latter seems necessary since it's the model they are trying to replace. Including these two controls would allow them to compare the number of free parameters as well as the model results. I am also surprised by the Levy model since Levy is a family of models. How were the parameters of the Levy walk chosen?
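      For comparison, the reviewer's one-state alternative (a) is an inhomogeneous Poisson process with rate α·e^(−γt) and no hidden M. A minimal sketch simulates it by Lewis-Shedler thinning: candidate events are drawn at the maximum rate α and kept with probability e^(−γt). The function name and parameter values are hypothetical.

```python
import math
import random

def one_state_turns(alpha, gamma, t_end, rng):
    """Model (a): inhomogeneous Poisson process with rate alpha * exp(-gamma * t),
    simulated by Lewis-Shedler thinning (no hidden M)."""
    lam_max = alpha                               # the rate is largest at t = 0
    t, times = 0.0, []
    while True:
        t += rng.expovariate(lam_max)             # candidate from a rate-lam_max process
        if t >= t_end:
            return times
        if rng.random() < math.exp(-gamma * t):   # keep with probability rate(t) / lam_max
            times.append(t)

rng = random.Random(4)
times = one_state_turns(alpha=3.0, gamma=0.1, t_end=40.0, rng=rng)
```

      The expected number of events by time T is (α/γ)(1 − e^(−γT)). Comparing slope-difference and transition-time distributions from this model with those of the two-process model would show whether the hidden M adds anything beyond a deterministic decay.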

      Thank you, we removed this section completely, as it is tangential to the main point of the paper.

      (12) One point that is entirely missing in the discussion is the individuality of worms. It is by now well known that individual animals have individual behaviors. Some are slow/fast, and similarly, their turn rates vary. This makes this problem even harder. Combined with the tiny number of events concerned (typically 20-40 per experiment), it seems daunting to determine the underlying model from behavioral statistics alone.

      Thank you, yes we should have been more explicit in the reasoning behind drawing the initial M from a distribution (response to comment #9). We assume that not every worm starts out with the same reorientation rate, but that some start out fast (high M) and some start out slow (low M). However, we do assume M decays with the same kinetics, which seems sufficient to produce the observed phenomena. Multiple decay rates are not needed to replicate the experimental data.

      (13) That said, it's well-known which neurons underpin the suppression of turning events (starting already with Gray et al 2005, which, strangely, was not cited here). Some discussion of the neuronal predictions for each of the two (or more) models would be appropriate.

      Thank you; we will cite Gray et al. (2005) and also refer to the more detailed response to Reviewer #2 (Lines 319-359 of the manuscript).

      (14) An additional point is the reliance entirely on simulations. A rigorous formulation (of the probability distribution rather than just the mean) should be analytically tractable (at least for the first moment, and possibly higher moments). If higher moments are not obtainable analytically, then the equations should be numerically integrable. It seems strange not to do this.

      Thank you for suggesting this. For the Levy section (which we cut) this would have been an improvement. However, since the distributions of slope differences and transition times are based on a recursive algorithm, rather than an analytical formulation, we decided to use the Jensen-Shannon divergence to compare distributions (Lines 272-296, Figure 3c) since this is a parameter-free approach.
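      The first two moments of the turn count can be written out under assumed rules (ours, not the manuscript's). These rules are a reorientation rate r(M) = β + (α − β)·M/M₀ and a hidden M that decays as a linear death process, so that E[M(t)] = M₀e^(−γt). Under them, Ω is a Poisson process conditional on the path of M (a Cox process). The mean then follows by integrating the expected rate, and the law of total variance gives the second moment:

```latex
\mathbb{E}[\Omega(t)]
  = \int_0^t \mathbb{E}\big[r(M(s))\big]\,ds
  = \beta\, t + \frac{\alpha - \beta}{\gamma}\left(1 - e^{-\gamma t}\right),
\qquad
\operatorname{Var}[\Omega(t)]
  = \mathbb{E}[\Omega(t)]
  + \operatorname{Var}\!\left[\int_0^t r(M(s))\,ds\right].
```

      The variance is therefore never below the mean. For large M₀ the second term is small, and the count statistics approach those of a Poisson process with a smoothly decaying rate.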

      In summary, while sample simulations do nicely match the examples in the data (of discontinuous vs continuous turning rates), this is not sufficient to demonstrate that the transition from ARS to dispersion in C. elegans is, in fact, likely to be a single 'state', or this (eq 1-3) single state. Of course, the model can be made more complicated to better match the data, but the approach of the authors, seeking an elegant and parsimonious model, is in principle valid, i.e. avoiding a many-parameter model-fitting exercise.

      As a qualitative exercise, the paper might have some merit. It demonstrates that apparently discrete states can sometimes be artifacts of sampling from smoothly time-changing dynamics. However, as a generic point, this is not novel, and so without the grounding in C. elegans data, is less interesting.

      Thank you; we agree that this is a generic phenomenon, which is partly why we studied it. The data from López-Cruz et al. appear to agree in part both with Calhoun et al., who claim that abrupt transitions occur, and with Klein et al., who claim that they do not. Since the underlying phenomenon is stochastic, we propose that the mixed observations of sudden and gradual changes in search strategy are simply the result of a stochastic process, which can produce both patterns in individual observations. We hope this work can help clarify why sudden changes in search strategy are not consistently observed. Our hypothesis is simple: there is no change in search strategy. The reorientation rate decays over time, and because this behavior is stochastic, what appears as a sudden change in an individual observation reflects not an underlying decision but the output of a stochastic process.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      1. General Statements

      • This manuscript represents a full revision incorporating all reviewer recommendations; the additional follow-up experiments and expanded analyses will be presented in dedicated subsequent manuscripts.
      • Congenital dyserythropoietic anemia type I (CDA-I) is a rare hereditary disease characterized by ineffective erythropoiesis and caused by mutations in CDAN1 (encoding Codanin1) and CDIN1.
      • Our study reveals the structural and functional dynamics of the CDIN1-Codanin1 complex, shedding light on the molecular mechanisms of protein-protein interactions implicated in CDA-I pathology.
      • The main goal of our study was to examine the interaction between CDIN1 and the C‑terminal binding domain of Codanin1 using complementary biophysical approaches.
      • We quantified binding and identified interacting regions of Codanin1 and CDIN1.
      • We found that CDA-I-associated mutations in the interacting regions disrupt the CDIN1‑Codanin1 complex.
      • We propose a hypothetical molecular model of the role of the CDIN1-Codanin1 complex in the development of CDA-I hallmarks.
      • Our initial study posted on bioRxiv (2023) has been cited by leading publications in the field (Jeong, Frater et al. 2025, Sedor and Shao 2025, Nature Communications) and has prompted further research on this topic.

      2. Point-by-point description of the revisions

      *Here we provide a point-by-point reply describing the revisions already carried out and included in the transferred manuscript.*


      Reviewer #1 – Evidence, reproducibility and clarity

      This is a rigorous biophysical characterization of a protein-protein interaction relevant to CDA-I disease. The two proteins were purified from an E. coli host, and CD and DLS were performed to ensure that the purified proteins are well folded. An impressive native protein EMSA was used to show a 1:1 complex. While common for protein-nucleic acid complexes, EMSAs are much more challenging with protein complexes. A higher-running complex, likely a heterotetramer, was implied at higher protein concentrations. These results were supported by SEC-MALS and analytical ultracentrifugation analyses. Thermophoresis and ITC were used to report a nanomolar affinity of the proteins for each other. SEC-SAXS supported the conclusions about stoichiometry and composition inferred from the earlier methods and suggested that the dimerization interface comes from CDIN1. Next, HDX-MS was used to identify putative interface residues, which were then mutated in each of the proteins and assessed for binding using coimmunoprecipitation. This study uses at least 10 orthogonal biophysical and/or biochemical methodologies to characterize an important protein-protein interaction, and both the analysis and the writing are clear. I couldn't (reading it once) find any grammatical or other errors in the text or figures. This manuscript is top-quality and suitable for publication.

      __Reviewer #1 – Significance__

      Such detailed structural and mechanistic studies are greatly lacking in many clinical conditions for which mutations are known (unless they cause cancer, neurodegenerative disease, and so on). We need more such studies on disease topics! This study will be of interest to the hematologic diseases community.

      1. Response – Significance

      We thank Reviewer #1 for the thoughtful and encouraging evaluation of our work. We are particularly grateful for the recognition of the significance of studying protein-protein interactions in the context of CDA-I disease, as well as the rigor and clarity of our biophysical and biochemical characterization.

      We appreciate the reviewer's acknowledgment of the challenges associated with native protein EMSAs. We are pleased that our use of multiple orthogonal techniques was recognized as a strength of the study. We are gratified that the comprehensiveness and coherence of our data and the manuscript's clarity were well received.

      We thank the reviewer for noting the broader impact of our findings on the hematologic disease community. As highlighted, there is a pressing need for a mechanistic understanding of non-oncologic, non-neurodegenerative diseases, and our study helps to address this gap.

      We are honored by the reviewer's endorsement of our manuscript as "top-quality and suitable for publication". We value the reviewer's highly supportive and motivating feedback.

      __Reviewer #2 – 1. Evidence, reproducibility and clarity__

      This manuscript presents structural and biochemical characterization of the interaction between CDIN1 and the C-terminal domain of Codanin1, shedding light on a complex implicated in Congenital Dyserythropoietic Anemia Type I (CDA-I). While the authors provide valuable structural insights and identify disease-associated mutations that impair CDIN1-Codanin1 binding, I think several important concerns should be addressed to strengthen both the mechanistic claims and their functional relevance.

      Contradiction Between Stoichiometry Models:

      The authors propose that CDIN1 and Codanin1Cterm primarily form a heterodimer in vitro. However, this appears to contradict previous reports indicating a tetra-heteromeric arrangement. Additionally, the proposed CDIN1 homodimerization is confusing to me: do the authors suggest that CDIN1 is stable without Codanin1? This seems contrary to findings that CDIN1 is unstable in the absence of Codanin1 (Sedor, S.F., Shao, S. Nature Comm. 2025; Swickley, G., Bloch, Y., Malka, L. et al. 2020 BMC Mol. Cell Biol.). These inconsistencies raise concerns about whether the observed stoichiometries are physiologically relevant or artifacts of in vitro reconstitution, especially since full-length Codanin1 was not studied.

      2.1 Response – Consistent stoichiometry of Codanin1Cterm

      We thank Reviewer #2 for raising critical points regarding the stoichiometry and physiological relevance of the CDIN1-Codanin1 interaction. The following response clarifies the rationale and interpretation in relation to previous findings.

      Stoichiometry of CDIN1-Codanin1Cterm complex:

      Recent cryo-EM studies of full-length Codanin1 (Jeong, Frater et al. 2025, Sedor and Shao 2025) suggest that independent internal dimerization domains (residues 452-798 and 841-1000) drive homodimer formation, with each Codanin1 monomer binding one CDIN1 via its C-terminal region (residues 1005-1227), resulting in a tetra-heteromeric complex. The complete assembly therefore appears as a dimer of heterodimers in the full-length context.

      In our study, Codanin1 was truncated to retain only the CDIN1-binding C-terminus (residues 1005-1227), which eliminates the homodimerization ability of Codanin1. Hence, for the truncated Codanin1Cterm, the minimal complex we observe is a 1:1 CDIN1-Codanin1Cterm heterodimer, which is fully consistent with the equimolar stoichiometry of the CDIN1-Codanin1 complex seen in the full-length structure.

      Stability and oligomeric state of CDIN1 in the absence of Codanin1:

      We concur with the reviewer that Sedor et al. (2025) and Swickley et al. (2020) reported decreased CDIN1 levels in cells lacking Codanin1, implying that CDIN1 depends on its Codanin1 partner for stability in vivo (Swickley, Bloch et al. 2020, Sedor and Shao 2025). The purified CDIN1 is monodisperse (Supplementary Figure 2D), is thermally stable with a melting temperature of 48 °C (Supplementary Figure 2E), and is properly folded as indicated by CD measurements (Supplementary Figure 2B). Additionally, the SAXS profiles of CDIN1 correspond to AlphaFold predictions (Fig. 2B). Together, our findings indicate that recombinant CDIN1 adopts a stable conformation in vitro without Codanin1. To the best of our knowledge, no previous study has directly determined the endogenous oligomeric state of CDIN1 in a cellular context.

      We fully acknowledge that future analysis of the full-length Codanin1-CDIN1 assembly in a cellular context will be necessary to understand physiological stoichiometries. As outlined in the General Statements, our study focuses on the C-terminus of Codanin1 to describe the binding interface and the biophysical properties of the CDIN1-Codanin1Cterm complex.

      __Reviewer #2 – 2. Unvalidated Functional Claims:__

      The manuscript identifies several CDA-I-associated mutations that disrupt CDIN1-Codanin1 interaction. However, the authors do not test how these mutations affect the biological function of the complex, particularly its role in ASF1 sequestration or histone trafficking. Given the central importance of this axis in their disease model, functional validation (e.g., ASF1 localization, histone deposition assays) is necessary to support these mechanistic conclusions.

      2.2 Response – Hypothetical model as a subject for discussion

      We thank the reviewer for the comment regarding the functional implications of CDA-I-associated mutations and their potential impact on ASF1 sequestration and histone trafficking hypothesized within the Discussion. We fully agree that understanding the downstream biological consequences of disrupted CDIN1-Codanin1 interaction is critical for elucidating the full molecular basis of CDA-I pathogenesis.

      In the Future research directions section of the Discussion, we acknowledge and emphasize the need for follow-up studies using erythroblast cell lines to determine whether specific disease-associated mutations that disrupt CDIN1-Codanin1 binding lead to functional defects relevant to erythropoiesis and to the nuclear architecture typical of CDA-I.

      However, as we respectfully note in the General Statements, the main aim of the present study was to provide a rigorous biophysical characterization of the CDIN1-Codanin1Cterm interaction. The proposed cellular experiments, though relevant, are beyond the conceptual scope of the present study.

      Reviewer #2 – 3. Speculative and Potentially Contradictory Model:

      The proposed model suggests that CDIN1 competes with ASF1 for Codanin1 binding, thereby indirectly promoting histone delivery to the nucleus. However, emerging data indicate that Codanin1, CDIN1, and ASF1 can form a stable ternary complex, calling this competitive binding hypothesis into question (Sedor, S.F., Shao, S. Nature Comm. 2025). The authors do not acknowledge or discuss these findings, and the model in its current form may therefore be oversimplified or inaccurate.

      2.3 Response – Hypothetical model fully aligned with current knowledge

      The current manuscript acknowledges and discusses the recent findings demonstrating that Codanin1, CDIN1, and ASF1 can form a ternary complex (Sedor, S.F., Shao, S. Nature Comm. 2025; Jeong, T. K. et al. Nature Comm. 2025). Our model has been updated accordingly to reflect the joint binding of Codanin1, CDIN1, and ASF1 in a ternary complex, and is presented in alignment with published data.

      While earlier versions of our work posted on the bioRxiv server (May 26, 2023) proposed a competitive-binding hypothesis, the current manuscript incorporates recent literature and prior reviewer feedback to offer a refined model. We believe that the updated hypothesis suggests a plausible mechanism for how CDIN1 modulates Codanin1 function, which will be further tested in future cellular studies.

      Reviewer #2 – 4. Significance:

      Overall, the study adds to our structural understanding of CDIN1 and Codanin1 interactions, but the functional interpretations are currently speculative, and in some cases in conflict with existing literature. The manuscript would benefit significantly from addressing these discrepancies, incorporating relevant data on ASF1, and clarifying whether the observed assemblies reflect physiological complexes.

      __2.4 Response – Significance__

      We thank Reviewer #2 for the constructive feedback. As noted in General Statements, our current manuscript is primarily dedicated to defining the molecular architecture and interactions of the CDIN1–Codanin1Cterm core interface. We agree that follow-up ASF1‑dependent functional assays will be critical to fully validate observed assemblies, but these experiments lie outside the scope of the present study and are ongoing in our laboratory.

      To address the reviewer's concern about possible speculative interpretation, we have:

      • Used cautious language in the Results and Discussion to prevent overstatement (e.g., page 31, line 754, where “leads” was replaced with “may contribute” in the legend of Fig. 4).
      • Described in the Discussion how our results extend the body of published structural data on CDIN1–Codanin1Cterm.
      • Updated our hypothetical model in Fig. 4 to be fully in line with published data.
      • Clearly stated that the working hypothesis is connected with a subset of CDA-I mutations (p. 31, l. 758-759, “The proposed model represents a working hypothesis relating to a subset of CDA-I mutations and is not currently substantiated by experimental evidence at the cellular level.”)
      • Stated in the Future research directions section of the Discussion that functional validation, including of ASF1, is a critical goal for future studies (p. 32, l. 771-773): “The ability of Codanin1 to interact with both CDIN1 and ASF1 motivates further investigation of how CDIN1 and ASF1 affect the function of full-length Codanin1, which even recent cryo-EM data has not addressed yet.”
      • Highlighted the necessity of complementary in vivo studies in erythroblast cell lines to determine whether CDA-I-related mutations in the CDIN1-Codanin1 interaction region cause typical CDA-I phenotypes, with the aim of clarifying the molecular mechanisms of inherited CDA-I anemia. We state in the Future research directions section of the Discussion (p. 32, l. 774-780): “…follow-up research utilizing erythroblast model cell lines must be conducted to determine if specific mutations that disrupt CDIN1-Codanin1 binding also affect ASF1 localization and cause a phenotype typical of CDA-I. In future work, additional Codanin1 mutations, including those outside the C-terminal region, should be evaluated to determine how the mutations affect ASF1’s nuclear concentration and subcellular localization. The proposed research directions will provide additional deeper insights into the underlying mechanisms of the molecular origin of inherited anemia CDA-I.”

      We believe that the revisions objectively clarify the significance and the limits of the current work and set the stage for the detailed functional studies to follow.

      __Reviewer #3 – Evidence, reproducibility and clarity:__

      Congenital Dyserythropoietic Anemia Type I (CDA I) is an autosomal recessive disorder characterized by ineffective erythropoiesis and distinctive nuclear morphology ("Swiss cheese" heterochromatin) in erythroblasts. CDA I is caused by mutations in CDAN1 and CDIN1. Codanin1, encoded by CDAN1, is part of the cytosolic ASF1-H3.1-H4-Importin-4 complex, which regulates histone trafficking to the nucleus. CDIN1 has been shown to bind the C-terminal domain of Codanin-1, but until now, pathogenic mutations had not been directly linked to the disruption of this interaction.

      In this study, the authors used biophysical techniques to characterize the interaction between Codanin-1's C-terminal region (residues 1005-1227) and CDIN1, demonstrating high-affinity, equimolar binding. HDX-MS identified interaction hotspots, and disease-associated mutations in these regions disrupted complex formation. The authors propose that such disruption prevents ASF1 sequestration in the cytoplasm, thereby reducing nuclear histone levels and contributing to the chromatin abnormalities seen in CDA I.

      Major Comments:

      1. Use of Codanin-1 Fragment:

      Most experiments were conducted using only the C-terminal 223 amino acids of Codanin-1. While this region is known to bind CDIN1, it is unclear whether its conformation is maintained in the context of the full-length protein. This could affect binding properties and structural interpretations. The authors should discuss how structural differences between the isolated C-terminus and the full-length Codanin-1 may influence the conclusions.

      Response of authors #3

      3.1 Response – Use of the Codanin-1 fragment as the CDIN1-binding part

      We thank the reviewer for the important observation regarding the use of the C-terminal fragment of Codanin1. As noted in the manuscript (e.g., p. 30, line 721 and p. 32, line 761), we fully acknowledge that the truncation of Codanin1 may influence its conformational dynamics or contextual folding relative to the full-length protein.

      However, several lines of evidence suggest that the C-terminal 223 amino acid residues, which are responsible for CDIN1 binding, are structurally autonomous and have minimal intramolecular contacts with upstream regions. Published cryo-EM and biochemical data (Jeong, Frater et al. 2025, Sedor and Shao 2025), in conjunction with AlphaFold structural predictions (Fig. 2D), consistently support a model in which the CDIN1-binding region is flexible and spatially isolated from the core structural domains of Codanin1. Additionally, our co-immunoprecipitation assay (Fig. 3F) indicates that full-length Codanin1 and truncated Codanin1Cterm interact with CDIN1 similarly, further supporting the structural autonomy of the C-terminal fragment. Together, the available data imply that the C-terminal fragment used in our study retains its native conformation and binding properties when expressed independently.

      While our findings are confined to the interaction domain and do not reflect full-length Codanin1’s architecture, we believe the use of the C-terminal minimal fragment of Codanin1 enables precise dissection of the CDIN1-binding interface and yields mechanistic insights without introducing significant structural artifacts.

      We agree with the reviewer that future work incorporating full-length Codanin1, especially in a cellular context, will be instrumental to fully characterize higher-order assembly and regulatory functions.

      __Reviewer #3 – 2. Graphical Abstract and Domain Independence:__

      The graphical abstract presents the Codanin-1 C-terminus as an independent domain, but no direct evidence is provided to support its structural autonomy in vivo.

      The authors should clarify whether the C-terminal region functions as a distinct domain in the context of the full-length protein.

      __3.2 Response – Independent C-terminal domain__

      We thank the reviewer for bringing up the question of the independence of the C-terminal domain. Although direct in vivo proof of C-terminal autonomy is not yet available, published cryo-EM structures of full-length Codanin1, our biophysical characterization, and AlphaFold models all consistently indicate that the C-terminal 223 amino acid residues of Codanin1 form a structurally independent binding module. In the graphical abstract, we illustrated the C‑terminal domain as a loosely connected part of Codanin1 to highlight its independence and to emphasize the specific focus of our studies.

      To articulate the limitations of our study, which focuses on the C-terminal part of Codanin1, we state in the section Functional implications of CDA-I-related mutations of the Discussion (p. 30, l. 721-724): “However, our measurements do not exclude the possible role of the disordered regions in full-length Codanin1. For example, CDIN1 could potentially stabilize full-length Codanin1 by rearranging the disordered regions into a more condensed structure, thereby augmenting the structural stability of Codanin1.”

      Reviewer #3 – 3. Pathogenic Mutations Beyond the Binding Site:

      The study highlights a triplet mutation that impairs CDIN1 binding. However, most CDA I‑associated mutations in CDAN1 are dispersed across the entire protein and may not affect CDIN1 interaction directly.

      The authors should discuss alternative mechanisms by which mutations in other regions of Codanin-1 might cause disease.

      3.3 Response – Pathogenic mutations outside the binding site – alternative mechanisms

      We appreciate the reviewer noting that most CDA-I-associated CDAN1 mutations are outside the CDIN1-Codanin1 binding site and suggesting alternative mechanisms. In the revised Discussion, we added a paragraph on alternative pathogenic models, p. 29, l. 702-713:

      "Our study centers on the CDIN1-binding C-terminus; however, most CDA-I-associated CDAN1 mutations lie elsewhere and probably act through alternative mechanisms. Mutations such as P672L in the LOBE2 domain (residues 452-798) and F868I in the coiled-coil domain (residues 841-1000) may disturb Codanin1 homodimerization and higher-order complex assembly, directly affecting ASF1 sequestration (Jeong, T. K. et al. Nature Comm. 2025). Other mutant variants may also interfere with ASF1 sequestration, nuclear targeting, or chromatin-remodeling functions, while destabilizing mutations may induce misfolding and proteasomal degradation. Moreover, CDA-I-associated mutations, such as R714W and R1042W, might compromise the interaction between Codanin1 and ASF1 (Ask, Jasencakova et al. 2012). Collectively, the complementary alternative pathogenic mechanisms associated with Codanin1 mutations in distal regions and mutations in the CDIN1‑binding C-terminus of Codanin1 may contribute to erythroid dysfunction in CDA-I."

      Reviewer #3 – 4. Contradictory Functional Models:

      Ask et al. (EMBO J, 2012) reported that Codanin-1 depletion increases nuclear ASF1 and accelerates DNA replication. This contrasts with the current hypothesis that disruption of the Codanin-1/CDIN1 complex reduces nuclear ASF1.

      The authors should attempt to reconcile this apparent contradiction, possibly by proposing a context-specific or dual-function model for Codanin-1 in histone trafficking.

      3.4 Response – Clarified explanation of hypothetical functional model

      We thank the reviewer for raising this point, which improved the clarity of our work. There is no real discrepancy between Ask et al. and our findings; both agree that Codanin1 restrains ASF1 in the cytoplasm. Ask et al. examined the complete loss of Codanin1, which abolishes cytoplasmic ASF1 sequestration and thus leads to maximal nuclear accumulation. We suggest the CDA-I-associated mutations selectively disrupt the CDIN1-Codanin1 interface, releasing ASF1 from the cytoplasm into the nucleus.

      To enhance clarity, we now state in the legend of Figure 4 describing the hypothesis (p. 31, l. 752-753): "…CDA-I-associated mutations prevent CDIN1-Codanin1 complex formation and thus prevent ASF1 sequestration in the cytoplasm; ASF1 remains accumulated in the nucleus."

      Reviewer #3 – 5. Conclusions and Claims:

      The proposed model of CDA I pathogenesis (Fig. 4) is plausible but not yet fully supported by the available data. The authors suggest that disruption of the Codanin-1/CDIN1 interaction leads to nuclear histone depletion, but this has not been experimentally confirmed.

      Claims about the general pathogenesis of CDA I should be clearly qualified as hypothetical and applicable to a subset of mutations. The presence and localization of ASF1 in the nucleus following disruption of the Codanin-1/CDIN1 complex should be tested experimentally.

      3.5 Response – __Tempered conclusions and claims:__

      We thank the reviewer for underscoring the need to temper our conclusions and to distinguish hypotheses from available results. We fully agree that our Fig. 4 model—linking disruption of the Codanin1-CDIN1 interface to nuclear histone imbalance—remains a working hypothesis, currently supported by indirect biochemical and structural data.

      Accordingly, we have:

      • Revised the text to explicitly state that this model is hypothetical and pertains to a subset of CDA-I-associated CDAN1 mutations. Specifically, we:

      • Added to the last paragraph of the section Functional implications of CDA-I-related mutations in Discussion (p. 31, l. 744-749): “In considering functional implications of our findings within available data, it is essential to qualify that mechanistic claims regarding the general pathogenesis of CDA-I remain hypothetical and are restricted to a specific subset of mutations. Furthermore, direct experimental validation, such as immunolocalization or live-cell imaging, to assess ASF1’s nuclear presence and distribution following disruption of the CDIN1-Codanin1 complex is required to substantiate the proposed model.”

      • Included in the legend of Fig. 4: “The proposed model represents a working hypothesis relating to a subset of CDA-I mutations and is not currently substantiated by experimental evidence at the cellular level.”
      • Replaced any associated definitive language (e.g., “leads to”) with qualified phrasing (e.g., “may contribute to”) in the legend of Fig. 4.
      • Clarified in the Discussion that direct measurement of nuclear ASF1 redistribution and histone levels following interface disruption has not yet been performed. Specifically, we added to the section Functional implications of CDA-I-related mutations in the Discussion (p. 30, l. 734-735): “It should be noted, however, that direct quantification of nuclear ASF1 redistribution and histone levels after CDIN1-Codanin1 disruption has not yet been conducted.” Although experimental verification of nuclear ASF1 localization upon CDIN1-Codanin1 complex disruption falls beyond the current manuscript’s scope, we acknowledge its importance and have emphasized the need for such studies in future work within the Future research directions section of the Discussion. Specifically, we concluded by stating (p. 32, l. 774-776): “Finally, follow‑up research utilizing erythroblast model cell lines must be conducted to determine if specific mutations that disrupt CDIN1-Codanin1 binding also affect ASF1 localization and cause a phenotype typical of CDA-I.”

      __Reviewer #3 – 6. Broader Mutation Analysis and ASF1 Localization:__

      To strengthen the link between Codanin-1/CDIN1 disruption and disease pathogenesis, it would be important to test the effects of additional CDAN1 mutations, including those outside the C-terminal region. Similarly, the impact on ASF1 nuclear concentration and localization should be directly assessed. These experiments would significantly bolster the central hypothesis. If feasible, they should be pursued or at least acknowledged as important future directions.

      3.6 Response – Broader mutation analysis and ASF1 localization in future directions

      We thank Reviewer #3 for emphasizing the value of a broader mutation survey and direct ASF1 localization studies. As noted above, our current manuscript is centered on delineating the molecular architecture of the CDIN1-Codanin1Cterm core interface; comprehensive mutational analyses outside the C-terminal binding region and ASF1-dependent functional assays will be critical to extend these findings but fall beyond the scope of the present work and will be the objective of our following studies. To address the reviewer’s concern, we have:

      • Expanded the Future Directions section to specify that additional CDA-I-linked CDAN1 variants, including non-C-terminal mutations, and quantitative assessments of ASF1 nuclear localization will be the subject of ongoing and planned investigations. Specifically, we added (p. 32, l. 776-778): “In future work, additional Codanin1 mutations, including those outside the C-terminal region, should be evaluated to determine how the mutations affect ASF1’s nuclear concentration and subcellular localization.”

      • Emphasized the need for complementary in vivo validation in erythroblast models to confirm whether disturbing CDIN1-Codanin1 binding recapitulates CDA-I phenotypes. We acknowledged the need for cell-line studies in future work within the Future research directions section of the Discussion (p. 32, l. 774-776): “Finally, follow-up research utilizing erythroblast model cell lines must be conducted to determine if specific mutations that disrupt CDIN1-Codanin1 binding also affect ASF1 localization and cause a phenotype typical of CDA-I.”

      We believe these changes more precisely delimit the scope and significance of the current study while laying out a clear roadmap for the essential follow-up experiments.

      Reviewer #3 – 7. Rigor and Presentation and Cross-commenting

      __Minor Comments:__

      • Methods and Reproducibility:

      The experimental methods are well described, and the results appear reproducible.

      • Presentation:

      The text and figures are clear and well organized.

      Referee Cross-commenting

      I agree with Reviewer 1 that the paper presents a detailed structural study of the Codanin-1 and CDIN1 proteins. However, as Reviewer 2 points out, functional studies are missing, and the hypothesis regarding the pathogenesis of CDA I is therefore speculative, especially since there are no studies regarding ASF1.

      3.7 Response – Rigor and Presentation and Cross-commenting:

      We thank the reviewers for their positive appraisal of our results' reproducibility, presentation, and method descriptions. We also appreciate the cross-comment that, while our structural analysis of the CDIN1-Codanin1 complex is thorough, functional validation, particularly regarding ASF1, remains to be addressed.

      As outlined above, we have revised the manuscript to:

      • Emphasize that pathogenic hypotheses drawn from structural data are provisional (refer to Responses 2.2, 2.3, and 3.5).
      • Include follow-up studies for ASF1 localization assays and broader mutation profiling in our Future Directions (refer to Responses 2.4, 3.5, 3.6).
      • Integrate cautious language throughout to clearly delineate verified findings from model-based speculation (refer to Responses 2.4, 3.5, 3.6).

      The implemented adjustments ensure that the current work is positioned as a detailed structural and interaction foundation, upon which the essential functional studies will build. We believe that all extensions and clarifications fully satisfy the reviewers’ collective recommendations.

      __Reviewer #3 – Significance:__

      Nature and Significance of the Advance:

      This study extends prior work (e.g., Swickley et al., BMC Mol Cell Biol 2020; Shroff et al., Biochem J 2020) on Codanin-1/CDIN1 interaction by applying high-resolution biophysical techniques to identify mutations that disrupt this complex. It provides a plausible cellular mechanism by which specific mutations may lead to CDA I through impaired histone trafficking.

      Nevertheless, a key question remains: How do mutations outside the Codanin-1 C-terminus contribute to the pathology?

      3.8 Response – Significance:

      • We thank Reviewer #3 for this important point. Although our work specifically dissects the C-terminal CDIN1-binding domain of Codanin1, we fully acknowledge that CDA-I-associated mutations throughout Codanin1 may operate via additional mechanisms. To address the additional mechanisms, we have added a new paragraph describing other possible pathogenic models to the Discussion (please refer to Response 3.3).
      • We also fully acknowledge the need for systematic functional assays of non-C-terminal mutations and their impact on ASF1 localization (please refer to Response 3.6).
      • We revised the text to clarify how mutations beyond the C-terminus may contribute to CDA-I pathogenesis and to present the significance of our current structural analyses, biophysical characterizations, and molecular insights as a foundation for future research (please refer to Response 3.6).

      Audience:

      • Molecular and cellular biologists investigating nuclear-cytoplasmic trafficking mechanisms
      • Hematologists and geneticists studying rare red cell disorders
      • Clinicians managing CDA I patients and researchers exploring targeted therapies

      Reviewer Expertise:

      Pediatric hematologist with over 20 years of research experience in CDA I, including the initial identification of CDAN1 and the elucidation of Codanin-1's role in embryonic erythropoiesis. Not a specialist in the biophysical techniques used in this study.

      References

      Ask, K., Z. Jasencakova, P. Menard, Y. Feng, G. Almouzni and A. Groth (2012). "Codanin-1, mutated in the anaemic disease CDAI, regulates Asf1 function in S-phase histone supply." The EMBO Journal 31(8): 2013–2023.

      Jeong, T.-K., R. C. M. Frater, J. Yoon, A. Groth and J.-J. Song (2025). "CODANIN-1 sequesters ASF1 by using a histone H3 mimic helix to regulate the histone supply." Nature Communications 16(1): 2181.

      Sedor, S. F. and S. Shao (2025). "Mechanism of ASF1 engagement by CDAN1." Nature Communications 16(1): 2599.

      Swickley, G., Y. Bloch, L. Malka, A. Meiri, S. Noy-Lotan, A. Yanai, H. Tamary and B. Motro (2020). "Characterization of the interactions between Codanin-1 and C15Orf41, two proteins implicated in congenital dyserythropoietic anemia type I disease." BMC Molecular and Cell Biology 21(1).

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers


      I would like to thank the reviewers for their comments and interest in the manuscript and the study.

      Reviewer #1

      1. I would assume that there are RNA-seq and/or ChIP-seq data out there produced after knockdown of one or more of these DBPs that show directional positioning.

      The directional positioning of CTCF-binding sites at chromatin interaction sites was analyzed by CRISPR experiments (Guo Y et al. Cell 2015). We found that our machine learning and statistical analyses showed the same directional bias of the CTCF-binding and RAD21-binding motif sequences at chromatin interaction sites as the experimental analysis of Guo Y et al. (lines 229-253, Figure 3b, c, d and Table 1). Because CTCF is involved in diverse biological functions (Braccioli L et al. Essays Biochem. 2019), the directional bias may be weaker when all binding sites are considered rather than only those at chromatin interaction sites (lines 68-73). In our study, we investigated the DNA-binding sites of proteins using ChIP-seq data of DNA-binding proteins and DNase-seq data. Our computational analysis also confirmed that the DNA-binding sites of SMC3 and RAD21, which tend to be found in chromatin loops with CTCF, show the same directional bias as CTCF.

      2. Figure 6 should be expanded to incorporate analysis of DBPs not overlapping CTCF/cohesin in chromatin interaction data, which is important and potentially more interesting than the simple DBP enrichment reported in the present form of the figure.

      Following the reviewer's advice, I performed the same analysis with the DNA-binding sites that do not overlap with the DNA-binding sites of CTCF and cohesin (RAD21 and SMC3) (Fig. 6 and Supplementary Fig. 4). The distribution of DNA-binding sites showed the same tendency, although for some DNA-binding proteins the peak height decreased after removing the sites that overlapped with those of CTCF and cohesin. I have added the following sentence on lines 435 and 829: For the insulator-associated DBPs other than CTCF, RAD21, and SMC3, the DNA-binding sites that do not overlap with those of CTCF, RAD21, and SMC3 were used to examine their distribution around interaction sites.
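      A minimal sketch of the filtering step described above, assuming that binding sites and the pooled CTCF/RAD21/SMC3 peaks are given as half-open (chromosome, start, end) intervals. The function names are hypothetical and this is not the manuscript's implementation.

```python
from typing import List, Tuple

# (chromosome, start, end), half-open: [start, end)
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def exclude_overlapping(sites: List[Interval], reference: List[Interval]) -> List[Interval]:
    """Keep only the sites that overlap none of the reference intervals
    (here, hypothetically, the pooled CTCF, RAD21 and SMC3 peaks)."""
    return [s for s in sites if not any(overlaps(s, r) for r in reference)]


ctcf_cohesin = [("chr1", 150, 250), ("chr2", 300, 400)]
dbp_sites = [("chr1", 100, 200), ("chr1", 250, 300), ("chr2", 100, 200)]
print(exclude_overlapping(dbp_sites, ctcf_cohesin))
# [('chr1', 250, 300), ('chr2', 100, 200)]
```

      The pairwise scan is quadratic and only shows the logic; at genome scale a sorted sweep, or a tool such as `bedtools intersect -v`, performs the same exclusion.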

      3. Critically, I would like to see use of Micro-C/Hi-C data and ChIP-seq from these factors, where insulation scores around their directionally-bound sites show some sort of an effect like that presumed by the authors - and many such datasets are publicly-available and can be put to good use here.

      As suggested by the reviewer, I have added the insulation scores and boundary sites from the 4D Nucleome data portal as tracks in the UCSC Genome Browser. The insulation scores appear to correspond to some extent to the H3K27me3 histone marks from ChIP-seq (Fig. 4a and Supplementary Fig. 3). We found that the DNA-binding sites of the insulator-associated DBPs were statistically overrepresented in the 5-kb boundary sites compared with those of other DBPs (Fig. 4d). The orientation of DNA-binding sites on the genome can be shown in different colors (e.g. red and green), but the directionality of insulator-associated DNA-binding sites is an overall tendency, and it may be difficult to see at individual binding sites because it may be weaker than that of CTCF, RAD21, and SMC3, as shown in Table 1 and Supplementary Table 2. We also observed the expected directional biases of CTCF, RAD21, and SMC3 using Micro-C chromatin interaction data, but the differences among the four orientations (FR, RF, FF, and RR) were more apparent with CTCF-mediated ChIA-PET chromatin interaction data (lines 287 and 288).
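      To make the four orientation classes concrete, the sketch below labels the motif strands at the two anchors of an interaction and tallies the classes. It assumes that '+' means forward (F) and '-' reverse (R), with the left anchor upstream of the right one, so that FR is the convergent arrangement typical of CTCF loops. The function names are illustrative and not taken from the manuscript.

```python
from typing import Dict, Iterable, Tuple

LABELS = ("FR", "RF", "FF", "RR")
STRAND_TO_LETTER = {"+": "F", "-": "R"}


def orientation(left_strand: str, right_strand: str) -> str:
    """Label the motif strands at the left (upstream) and right (downstream) anchors.
    'FR' = convergent, 'RF' = divergent, 'FF'/'RR' = tandem."""
    return STRAND_TO_LETTER[left_strand] + STRAND_TO_LETTER[right_strand]


def orientation_counts(pairs: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Tally the four orientation classes over (left_strand, right_strand) pairs."""
    counts = {label: 0 for label in LABELS}
    for left, right in pairs:
        counts[orientation(left, right)] += 1
    return counts


pairs = [("+", "-"), ("+", "-"), ("-", "+"), ("+", "+")]
print(orientation_counts(pairs))  # {'FR': 2, 'RF': 1, 'FF': 1, 'RR': 0}
```

      A directional bias then appears as one class (FR, for CTCF) being over-represented relative to the others; a real analysis would compare these counts against a background expectation, which this sketch omits.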

       I found that the CTCF-binding sites examined experimentally in a previous study (Guo Y et al. *Cell* 2015) may not always overlap with the boundary sites of chromatin interactions from the Micro-C assay. The chromatin interaction data do not include all interactions because of the high sequencing cost of the assay, and they include fewer long-range interactions because of distance bias. The number of boundary sites may be smaller than the number of CTCF-binding sites acting as insulators, and/or some CTCF-binding sites may not be located in boundary sites. It may also be difficult for the boundary-calling algorithm to identify short boundaries. Because of these limitations of chromatin interaction data, I planned to search for insulator-associated DNA-binding proteins without using chromatin interaction data in this study.
      
       I discussed other causes in lines 614-622: Another reason for the difference may be that boundary sites are more closely associated with topologically associated domains (TADs) of chromosomes than insulator sites are. Boundary sites are regions identified based on the separation of numerous chromatin interactions. On the other hand, we found that the multiple DNA-binding sites of insulator-associated DNA-binding proteins were located close to each other at insulator sites and were associated with distinct nested and focal chromatin interactions, as reported by the Micro-C assay. These interactions may be transient and relatively weak, such as tissue- or cell-type-specific, conditional, or lineage-specific interactions.
      
       Furthermore, I have added the statistical summary of the analysis in lines 372-395 as follows: Overall, among 20,837 DNA-binding sites of the 97 insulator-associated proteins found at insulator sites identified by H3K27me3 histone modification marks (type 1 insulator sites), 1,315 (6%) overlapped with 264 of 17,126 5-kb boundary sites, and 6,137 (29%) overlapped with 784 of 17,126 25-kb boundary sites in HFF cells. Among 5,205 DNA-binding sites of the 97 insulator-associated DNA-binding proteins found at insulator sites identified by H3K27me3 histone modification marks and transcribed regions (type 2 insulator sites), 383 (7%) overlapped with 74 of 17,126 5-kb boundary sites, and 1,901 (37%) overlapped with 306 of 17,126 25-kb boundary sites. Although CTCF-binding sites separate active and repressive domains, only a limited number of the DNA-binding sites of insulator-associated proteins found at type 1 and 2 insulator sites overlapped boundary sites identified by chromatin interaction data. Furthermore, by analyzing the regulatory regions of genes, the DNA-binding sites of the 97 insulator-associated DNA-binding proteins were found (1) at type 1 insulator sites (based on H3K27me3 marks) in the regulatory regions of 3,170 genes, (2) at type 2 insulator sites (based on H3K27me3 marks and gene expression levels) in the regulatory regions of 1,044 genes, and (3) at insulator sites corresponding to boundary sites identified by chromatin interaction data in the regulatory regions of 6,275 genes. The boundary sites showed the highest number of overlaps with the DNA-binding sites. Comparing the insulator sites identified by (1) and (3), 1,212 (38%) genes have both types of insulator sites. Comparing the insulator sites identified by (2) and (3), 389 (37%) genes have both types of insulator sites. From this comparison of insulator and boundary sites, we found that type (1) or (2) insulator sites overlapped with, or were close to, boundary sites identified by chromatin interaction data.
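      The percentages in this summary follow directly from the reported counts. A quick check, using only the numbers quoted above (the labels are descriptive shorthand, not terms from the manuscript):

```python
# (overlapping, total) counts quoted in the summary above
reported = {
    "type 1 sites in 5-kb boundaries": (1315, 20837),       # reported 6%
    "type 1 sites in 25-kb boundaries": (6137, 20837),      # reported 29%
    "type 2 sites in 5-kb boundaries": (383, 5205),         # reported 7%
    "type 2 sites in 25-kb boundaries": (1901, 5205),       # reported 37%
    "type 1 genes also with boundary sites": (1212, 3170),  # reported 38%
    "type 2 genes also with boundary sites": (389, 1044),   # reported 37%
}


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


for label, (part, whole) in reported.items():
    print(f"{label}: {percent(part, whole)}%")
```

      Note that the gene-level percentages use the number of genes with type 1 (3,170) or type 2 (1,044) insulator sites as the denominator, not the 6,275 genes with boundary sites.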
      

      4. The suggested function of alternative transcripts, also highlighted in the manuscript's abstract, is supported only by visual inspection of a few cases for several putative DBPs. I believe this is insufficient to support what looks like one of the major claims of the paper when reading the abstract, and a more quantitative, genome-wide analysis must be adopted, although the authors describe it as just an 'observation'.

      Following the reviewer's comment, I performed a genome-wide analysis of alternative transcripts in which the DNA-binding sites of insulator-associated proteins are located near splice sites. The DNA-binding sites of insulator-associated DNA-binding proteins were found within 200 bp centered on splice sites significantly more often than those of the other DNA-binding proteins (Fig. 4e and Table 2). I have added the following sentences on lines 405-412: We performed a statistical test to estimate the enrichment of insulator-associated DNA-binding sites compared with the other DNA-binding proteins, and found that the insulator-associated DNA-binding sites were significantly more abundant at splice sites than the DNA-binding sites of the other proteins (Fig 4e and Table 2; Mann‒Whitney U test, p value …).

      5. Figure 1 serves no purpose in my opinion and can be removed, while figures can generally be improved (e.g., the browser screenshots in Figs 4 and 5) for interpretability by readers outside the immediate research field.

      I believe that Figure 1 helps researchers in other fields who are not familiar with the underlying biological phenomena and functions to understand the study. More explanation has been included in the figures and legends of Figs. 4 and 5 to help readers outside the immediate research field understand them.

      6. Similarly, the text is rather convoluted at places and should be re-approached with more clarity for less specialized readers in mind.

      Reviewer #2's comments are related to this comment. I have introduced a more detailed explanation of the method in the Results section, as shown in the responses to Reviewer #2's comments.

      Reviewer #2

      1. Introduction, line 95: CTCF appears two times, it seems redundant.

      On lines 91-93, I deleted the latter CTCF from the sentence "We examine the directional bias of DNA-binding sites of CTCF and insulator-associated DBPs, including those of known DBPs such as RAD21 and SMC3".

      2. Introduction, lines 99-103: Please better emphasize the novelty of the work. What is the main focus: the newly identified DBPs or their binding sites? What are the "novel structural and functional roles of DBPs" mentioned?

      Although CTCF is known to be the main insulator protein in vertebrates, we found that 97 DNA-binding proteins including CTCF and cohesin are associated with insulator sites by modifying and developing a machine learning method to search for insulator-associated DNA-binding proteins. Most of the insulator-associated DNA-binding proteins showed the directional bias of DNA-binding motifs, suggesting that the directional bias is associated with the insulator.

       I have added the sentence in lines 96-99 as follows: Furthermore, statistical testing of the contribution scores between the directional and non-directional DNA-binding sites of insulator-associated DBPs revealed that the directional sites contributed more significantly to the prediction of gene expression levels than the non-directional sites. I have revised the statement in lines 101-110 as follows: To validate these findings, we demonstrate that the DNA-binding sites of the identified insulator-associated DBPs are located within potential insulator sites, and some of the DNA-binding sites in the insulator site are found without the nearby DNA-binding sites of CTCF and cohesin. Homologous and heterologous insulator-insulator pairing interactions are orientation-dependent, as suggested by the insulator-pairing model based on experimental analysis in flies. Our method and analyses contribute to the identification of insulator- and chromatin-associated DNA-binding sites that influence EPIs and reveal novel functional roles and molecular mechanisms of DBPs associated with transcriptional condensation, phase separation and transcriptional regulation.
      

      3. Results, line 111: How do the SNPs come into the procedure? From the figures it seems the input is ChIP-seq peaks of DBPs around the TSS.

      On lines 121-124, to explain the procedure for the SNP of an eQTL, I have added the sentence in the Methods: "If a DNA-binding site was located within a 100-bp region around a single-nucleotide polymorphism (SNP) of an eQTL, we assumed that the DNA-binding proteins regulated the expression of the transcript corresponding to the eQTL".
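      A minimal sketch of this assignment rule, assuming that "within a 100-bp region around a SNP" means the binding site overlaps a 100-bp window centred on the SNP (SNP position ± 50 bp). The data layout and function names are hypothetical and only illustrate the rule, not the manuscript's code.

```python
from typing import Dict, List, Set, Tuple

Site = Tuple[str, str, int, int]  # (DBP name, chromosome, start, end), half-open
EQTL = Tuple[str, int, str]       # (chromosome, SNP position, transcript ID)


def near_snp(start: int, end: int, snp_pos: int, window: int = 100) -> bool:
    """True if the site [start, end) overlaps the window of `window` bp centred on the SNP.
    Assumption: 'within a 100-bp region around a SNP' is read as snp_pos +/- window // 2."""
    half = window // 2
    return start < snp_pos + half and snp_pos - half < end


def assign_targets(sites: List[Site], eqtls: List[EQTL], window: int = 100) -> Dict[str, Set[str]]:
    """Map each DBP to the transcripts whose eQTL SNP lies near one of its binding sites."""
    targets: Dict[str, Set[str]] = {}
    for dbp, chrom, start, end in sites:
        for snp_chrom, snp_pos, transcript in eqtls:
            if chrom == snp_chrom and near_snp(start, end, snp_pos, window):
                targets.setdefault(dbp, set()).add(transcript)
    return targets


sites = [("CTCF", "chr1", 1000, 1020), ("MAZ", "chr1", 5000, 5020)]
eqtls = [("chr1", 1060, "TX1"), ("chr1", 9000, "TX2")]
print(assign_targets(sites, eqtls))  # {'CTCF': {'TX1'}}
```

      Here the CTCF site is linked to TX1 because its SNP window (1010-1110) overlaps the site, while neither SNP falls near the MAZ site, so MAZ receives no target.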

      4. Again, are those SNPs coming from the different cell lines? Or are they from individuals w.r.t some reference genome? I suggest a general restructuring of this part to let the reader understand more easily. One option could be simplifying the details here or alternatively including all the necessary details.

      On line 119, I have included the explanation of the eQTL dataset of GTEx v8 as follows: " The eQTL data were derived from the GTEx v8 dataset, after quality control, consisting of 838 donors and 17,382 samples from 52 tissues and two cell lines". On lines 681 and 865, I have added the filename of the eQTL data "(GTEx_Analysis_v8_eQTL.tar)".

      5. Figure 1: panel a and b are misleading. Is the matrix in panel a equivalent to the matrix in panel b? If not please clarify why. Maybe in b it is included the info about the SNPs? And if yes, again, what is then difference with a.

      The reviewer may be referring to Figure 2 rather than Figure 1. If so, the matrices in panels a and b of Figure 2 are equivalent, and I have indicated this in the figure: panel b shows the matrix of panel a rotated 90 degrees to the right. The green boxes in the matrix show the regions where the ChIP-seq peak of a DNA-binding protein overlaps with a SNP of an eQTL. I used eQTL data to associate a gene with a ChIP-seq peak located more than 2 kb upstream or 1 kb downstream of the transcriptional start site of the gene. For each gene, the matrix was produced, and the gene expression levels in cells were learned and predicted using the deep learning method. I have added the following sentences to explain the method in lines 133-139: Through the training, the tool learned to select the binding sites of DNA-binding proteins from ChIP-seq assays that were suitable for predicting gene expression levels in the cell types. The binding sites of a DNA-binding protein tend to be observed in common across multiple cell and tissue types. Therefore, ChIP-seq data and eQTL data in different cell and tissue types were used as input data for learning, and then the tool selected the data suitable for predicting gene expression levels in the cell types, even if the data were not obtained from the same cell types.

      6. Lines 386-388: could the author investigate this observation in more detail? Does it mean that loops are driven by other DBPs independently of the known CTCF/cohesin? Could the author provide examples from chromatin structural data, e.g. Micro-C?

      As suggested by the reviewer, to help readers understand the observation, I have added Supplementary Fig. S4c to show the distribution of DNA-binding sites of "CTCF, RAD21, and SMC3" and "BACH2, FOS, ATF3, NFE2, and MAFK" around chromatin interaction sites. I have modified the following sentence to indicate the figure on line 501: Although a DNA-binding-site distribution pattern around chromatin interaction sites similar to those of CTCF, RAD21, and SMC3 was observed for DBPs such as BACH2, FOS, ATF3, NFE2, and MAFK, less than 1% of the DNA-binding sites of the latter set of DBPs colocalized with CTCF, RAD21, or SMC3 in a single bin (Fig. S4c).

       Aljahani A et al. (*Nature Communications* 2022) found that depletion of cohesin causes a subtle reduction in longer-range enhancer-promoter interactions and that CTCF depletion can cause rewiring of regulatory contacts; they concluded that loop extrusion is not essential for enhancer-promoter interactions but contributes to their robustness and specificity and to precise regulation of gene expression. Goel VY et al. (*Nature Genetics* 2023) reported that microcompartments frequently connect enhancers and promoters and that, although loss of loop extrusion and inhibition of transcription disrupt some microcompartments, most are largely unaffected. These results suggest that chromatin loops can be driven by other DBPs independently of the known CTCF/cohesin.
      
      I added the following sentences on lines 569-577: Depletion of cohesin causes a subtle reduction in longer-range enhancer-promoter interactions, and CTCF depletion can cause rewiring of regulatory contacts. Another group reported that enhancer-promoter interactions and transcription are largely maintained upon depletion of CTCF, cohesin, WAPL or YY1. Instead, cohesin depletion decreased transcription factor binding to chromatin. Thus, cohesin may allow transcription factors to find and bind their targets more efficiently. Furthermore, loop extrusion is not essential for enhancer-promoter interactions, but contributes to their robustness and specificity and to precise regulation of gene expression.
      
       The pioneer factor FOXA1 functions as an initial chromatin-binding and chromatin-remodeling factor and has been reported to form biomolecular condensates (Ji D et al. *Molecular Cell* 2024). CTCF has also been found to form transcriptional condensates and undergo phase separation (Lee R et al. *Nucleic Acids Research* 2022). FOS was identified as an insulator-associated DNA-binding protein in this study and is potentially involved in chromatin remodeling, transcriptional condensation, and phase separation together with other factors such as BACH2, ATF3, NFE2 and MAFK. I have added the following sentence on line 556: The pioneer factor FOXA1 functions as an initial chromatin-binding and chromatin-remodeling factor and has been reported to form biomolecular condensates.
      

      7. In general, how are the presented results related to models of chromatin architecture, e.g. loop extrusion, which incorporates convergent CTCF-binding sites?

      Goel VY et al. (Nature Genetics 2023) identified highly nested and focal interactions through Region Capture Micro-C, which resemble fine-scale compartmental interactions and are termed microcompartments. In the section titled "Most microcompartments are robust to loss of loop extrusion," the authors noted that a small proportion of interactions between CTCF- and cohesin-bound sites exhibited significant reductions in strength when cohesin was depleted, whereas the majority of microcompartmental interactions remained largely unchanged. Their findings indicate that most P-P and E-P interactions, aside from a few CTCF- and cohesin-bound enhancers and promoters, are likely facilitated by a compartmentalization mechanism that differs from loop extrusion. They suggest that nested, multiway, and focal microcompartments correspond to small, discrete A-compartments that arise through a compartmentalization process, potentially influenced by factors upstream of RNA Pol II initiation, such as transcription factors, co-factors, or active chromatin states. If active chromatin regions at microcompartment anchors exhibit selective "stickiness" with one another, they will tend to co-segregate, leading to the development of nested, focal interactions. This microphase separation, driven by preferential interactions among active loci within a block copolymer, may account for the striking interaction patterns they observed.

       The authors of that paper proposed several mechanisms potentially involved in microcompartments, and these mechanisms may also be involved in looping with insulator function. Another group reported that enhancer-promoter interactions and transcription are largely maintained upon depletion of CTCF, cohesin, WAPL or YY1; instead, cohesin depletion decreased transcription factor binding to chromatin, suggesting that cohesin may allow transcription factors to find and bind their targets more efficiently (Hsieh TS et al. *Nature Genetics* 2022). Among the identified insulator-associated DNA-binding proteins, Maz and MyoD1 form loops without CTCF (Xiao T et al. *Proc Natl Acad Sci USA* 2021; Ortabozkoyun H et al. *Nature Genetics* 2022; Wang R et al. *Nature Communications* 2022). I have added the following sentences on lines 571-575: Another group reported that enhancer-promoter interactions and transcription are largely maintained upon depletion of CTCF, cohesin, WAPL or YY1. Instead, cohesin depletion decreased transcription factor binding to chromatin. Thus, cohesin may allow transcription factors to find and bind their targets more efficiently. I have included the following explanation on lines 582-584: Among the identified insulator-associated DNA-binding proteins, Maz and MyoD1 form loops without CTCF.
      
       As for the directionality of CTCF, if chromatin loop anchors have a defined structural conformation, as shown in the paper entitled "The structural basis for cohesin-CTCF-anchored loops" (Li Y et al. *Nature* 2020), directional DNA binding would occur similarly to CTCF-binding sites. Moreover, cohesin complexes that interact with convergent CTCF sites, that is, with the N-terminus of CTCF, might be protected from WAPL, whereas those that interact with divergent CTCF sites, that is, with the C-terminus of CTCF, might not be protected from WAPL, which could release cohesin from chromatin and thus disrupt cohesin-mediated chromatin loops (Davidson IF et al. *Nature Reviews Molecular Cell Biology* 2021). The 'loop extrusion' hypothesis itself is motivated by in vitro observations. An experiment in yeast using cohesin variants that are unable to extrude DNA loops but retain the ability to topologically entrap DNA suggested that in vivo chromatin loops are formed independently of loop extrusion. Instead, transcription promotes loop formation and acts as an extrinsic motor that extends these loops and defines their final positions (Guerin TM et al. *EMBO Journal* 2024). I have added the following sentences on lines 543-547: Cohesin complexes that interact with convergent CTCF sites, that is, with the N-terminus of CTCF, might be protected from WAPL, whereas those that interact with divergent CTCF sites, that is, with the C-terminus of CTCF, might not be protected from WAPL, which could release cohesin from chromatin and thus disrupt cohesin-mediated chromatin loops. I have included the following sentences on lines 577-582: The 'loop extrusion' hypothesis is motivated by in vitro observations. An experiment in yeast using cohesin variants that are unable to extrude DNA loops but retain the ability to topologically entrap DNA suggested that in vivo chromatin loops are formed independently of loop extrusion. Instead, transcription promotes loop formation and acts as an extrinsic motor that extends these loops and defines their final positions.
      
       Another model for the regulation of gene expression by insulators is the boundary-pairing (insulator-pairing) model (Bing X et al. *Elife* 2024; Ke W et al. *Elife* 2024; Fujioka M et al. *PLoS Genetics* 2016). In flies, molecules bound to insulators physically pair with their partners, either head-to-head or head-to-tail, with different degrees of specificity at the termini of TADs. Although the experiments do not reveal how partners find each other, the mechanism is unlikely to require loop extrusion. Homologous and heterologous insulator-insulator pairing interactions are central to the architectural functions of insulators, and these interactions are orientation-dependent. I have summarized the model on lines 559-567: Other types of chromatin regulation are also expected to be related to the structural interactions of molecules. In the boundary-pairing (insulator-pairing) model, molecules bound to insulators physically pair with their partners, either head-to-head or head-to-tail, with different degrees of specificity at the termini of TADs in flies (Fig. 7). Although the experiments do not reveal how partners find each other, the mechanism is unlikely to require loop extrusion. Homologous and heterologous insulator-insulator pairing interactions are central to the architectural functions of insulators, and these interactions are orientation-dependent.
      

      8. Do the authors think that the identified DBPs could work in that way as well?

      The boundary-pairing (insulator-pairing) model could apply to the insulator-associated DNA-binding proteins other than CTCF and cohesin, which are involved in the loop extrusion mechanism (Bing X et al. Elife 2024; Ke W et al. Elife 2024; Fujioka M et al. PLoS Genetics 2016).

       Liquid-liquid phase separation was shown to occur through CTCF-mediated chromatin loops and to act as an insulator (Lee R et al. *Nucleic Acids Research* 2022). Among the identified insulator-associated DNA-binding proteins, CEBPA has been found to form hubs that colocalize with transcriptional co-activators in a native cell context, which is associated with transcriptional condensates and phase separation (Christou-Kent M et al. *Cell Reports* 2023). The proposed microcompartment mechanisms are also associated with phase separation. Thus, the same or similar mechanisms are potentially associated with the insulator function of the identified DNA-binding proteins. I have included the following information on line 554: CEBPA, one of the identified insulator-associated DNA-binding proteins, was also reported to be involved in transcriptional condensates and phase separation.
      

      9. Also, can the authors comment on whether these newly identified DBPs mediate contacts by active processes or by equilibrium processes?

      Snead WT et al. (Molecular Cell 2019) noted that protein post-translational modifications (PTMs) facilitate control over the molecular valency and strength of protein-protein interactions, and O-GlcNAcylation, a PTM, inhibits CTCF binding to chromatin (Tang X et al. Nature Communications 2024). I found that the identified insulator-associated DNA-binding proteins tend to form clusters at potential insulator sites (Supplementary Fig. 2d). These proteins may interact and actively regulate chromatin interactions, transcriptional condensation, and phase separation through PTMs. I have added the following explanation on lines 584-590: Furthermore, protein post-translational modifications (PTMs) facilitate control over the molecular valency and strength of protein-protein interactions. O-GlcNAcylation, a PTM, inhibits CTCF binding to chromatin. We found that the identified insulator-associated DNA-binding proteins tend to form clusters at potential insulator sites (Fig. 4f and Supplementary Fig. 3c). These proteins may interact and actively regulate chromatin interactions, transcriptional condensation, and phase separation through PTMs.

      10. Can the author provide some real examples along with published structural data (e.g. the mentioned micro-C data) to show the link between protein co-presence, directional bias and contact formation?

      A structural model of cohesin-CTCF-anchored loops has been published by Li Y et al. (Nature 2020). The structural conformation of CTCF and cohesin in the loops may underlie the directional bias of CTCF-binding sites, which I mentioned in lines 539-543 as follows: These results suggest that the directional bias of DNA-binding sites of insulator-associated DBPs may be involved in insulator function and chromatin regulation through structural interactions among DBPs, other proteins, DNAs, and RNAs. For example, the N-terminal amino acids of CTCF have been shown to interact with RAD21 in chromatin loops.

       To investigate the principles underlying the architectural functions of insulator-insulator pairing interactions, two insulators, Homie and Nhomie, flanking the *Drosophila even skipped* locus were analyzed. Pairing interactions between the transgene Homie and the eve locus are directional. The head-to-head pairing between the transgene and endogenous Homie matches the pattern of activation (Fujioka M et al. *PLoS Genetics* 2016).
      

      Reviewer #3

      Major Comments:

      1. Some of these TFs do not have specific direct binding to DNA (P300, Cohesin). Since the authors are using binding motifs in their analysis workflow, I would remove those from the analysis.

      When a protein complex binds to DNA, one protein of the complex may bind to the DNA directly while the other proteins do not. However, the DNA motif sequence bound by that protein may be registered as the DNA-binding motif of all the proteins in the complex. The molecular structure of the CTCF-cohesin complex showed that both CTCF and cohesin bind to DNA (Li Y et al. Nature 2020). Once the molecular structure of a protein complex becomes available, the previous understanding of a protein's DNA-binding ability may change. Therefore, I searched the Pfam database for the 99 insulator-associated DNA-binding proteins identified in this study. I found that 97 are registered as DNA-binding proteins and/or have a known DNA-binding domain, whereas EP300 and SIN3A do not bind directly to DNA, which was also checked by a Google search. I have added the following explanation in line 257 to distinguish direct and indirect DNA-binding proteins: Among 99 insulator-associated DBPs, EP300 and SIN3A do not directly interact with DNA, and thus 97 insulator-associated DBPs directly bind to DNA. I have updated the sentence in line 20 of the Abstract as follows: We discovered 97 directional and minor nondirectional motifs in human fibroblast cells that corresponded to 23 DBPs related to insulator function, CTCF, and/or other types of chromosomal transcriptional regulation reported in previous studies.

      2. I am not sure if I understood correctly, but why do the authors consider enhancers spanning 2 Mb (200 bins of 10 kb around eSNPs)? This seems wrong. Enhancers are relatively small regions (100 bp to 1 kb), and only a very small subset form super-enhancers.

      As the reviewer notes, enhancers are relatively small regions. In the paper, I intended to search the regions further upstream and downstream of promoters where enhancers are found, rather than to define enhancers as 2 Mbp regions. Therefore, I have modified the sentence in lines 929-931 of the Fig. 2 legend as follows: Enhancer-gene regulatory interaction regions consist of 200 bins of 10 kbp in the region between -1 Mbp and +1 Mbp from the TSS, not including the promoter.
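      To make the revised legend concrete, the sketch below tiles the -1 Mbp to +1 Mbp window around a TSS into 10 kbp bins, which gives 2,000,000 / 10,000 = 200 bins. It is a minimal illustration under stated assumptions, not the pipeline used in the study: the helper name `enhancer_bins` is hypothetical, coordinates are assumed to be 0-based and half-open, and the legend's exclusion of the promoter is not modeled because its exact boundaries are not given here.

```python
def enhancer_bins(tss, flank=1_000_000, bin_size=10_000):
    """Return (start, end) windows tiling [tss - flank, tss + flank).

    Hypothetical helper for illustration only; 0-based, half-open coordinates.
    Promoter exclusion is NOT modeled here.
    """
    if (2 * flank) % bin_size != 0:
        raise ValueError("the window must tile evenly into bins")
    n_bins = 2 * flank // bin_size  # 2,000,000 / 10,000 = 200 bins
    start = tss - flank
    return [(start + i * bin_size, start + (i + 1) * bin_size) for i in range(n_bins)]


# Toy TSS at 5 Mbp: the bins run from 4,000,000 to 6,000,000.
bins = enhancer_bins(5_000_000)
print(len(bins), bins[0], bins[-1])
```

      Each bin can then hold one feature (for example, the binding sites falling inside it), so the input layout stays fixed regardless of where individual enhancers lie.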

      3. I think the H3K27me3 analysis was very good, but I would also have liked to see constitutive heterochromatin, so maybe repeat the analysis for H3K9me3.

      Following the reviewer's advice, I have added the ChIP-seq data of H3K9me3 as a track in the UCSC Genome Browser. The distribution of the H3K9me3 signal differed from that of H3K27me3 in some regions. I also found insulator-associated DNA-binding sites close to the edges of H3K9me3 regions and have included UCSC Genome Browser screenshots of the regions around these sites in Supplementary Fig. 3b. I have modified the following sentence on lines 974-976 in the legend of Fig. 4: a Distribution of histone modification marks H3K27me3 (green color) and H3K9me3 (turquoise color) and transcript levels (pink color) in upstream and downstream regions of a potential insulator site (light orange color). I have also added the following result on lines 356-360: The same analysis was performed using H3K9me3 marks, instead of H3K27me3 (Fig. S3b). We found that the distribution of H3K9me3 signal was different from that of H3K27me3 in some regions, and discovered the insulator-associated DNA-binding sites close to the edges of H3K9me3 regions (Fig. S3b).

      4. I was not sure I understood the analysis in Figure 6. The binding site is within 500 bp of the interaction site, but Micro-C interactions are at best at 1 kb resolution. They say they chose the centre of the interaction site, but we don't know exactly where the actual interaction is. Also, it is not clear what they measure. Is it the number of binding sites of a specific DBP, or of multiple insulator DBPs, at a specific distance from this midpoint, recovered across all chromatin loops? Maybe I am missing something. This analysis was not very clear.

      The resolution of the Micro-C assay is considered to be on the order of 100 bp, as the human nucleosome core particle contains 145 bp (and 193 bp with linker) of DNA. However, internucleosomal DNA is cleaved by endonuclease into fragments of multiples of 10 nucleotides (Pospelov VA et al. Nucleic Acids Research 1979). Highly nested focal interactions have been observed (Goel VY et al. Nature Genetics 2023). Base-pair resolution has been reported using Micro Capture-C (Hua P et al. Nature 2021), and sub-kilobase (20 bp resolution) chromatin topology has been reported using an MNase-based chromosome conformation capture (3C) approach (Aljahani A et al. Nature Communications 2022). By contrast, Hi-C data have been analyzed at 1 kb resolution (Gu H et al. bioRxiv 2021). If the resolution of Micro-C interactions were at best 1 kb, the binding sites of a DNA-binding protein would not show a peak around the center of the genomic locations of interaction edges. Each panel shows the number of binding sites of a specific DNA-binding protein at a specific distance from the midpoint of all chromatin interaction edges. I have modified and added the following sentences in lines 593-597: High-resolution chromatin interaction data from a Micro-C assay indicated that most of the predicted insulator-associated DBPs showed DNA-binding-site distribution peaks around chromatin interaction sites, suggesting that these DBPs are involved in chromatin interactions and that the chromatin interaction data has a high degree of resolution. Base pair resolution was reported using Micro Capture-C.
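      The quantity described for each panel (the number of binding sites of one DNA-binding protein at each distance from the midpoints of chromatin interaction edges) can be illustrated with a short aggregation. This is a sketch under assumptions, not the code used for Figure 6: the function name `site_offset_histogram`, the ±1 kbp window, the 100 bp bin size, and the use of single-base site positions on one chromosome are all hypothetical choices.

```python
import numpy as np


def site_offset_histogram(site_positions, edge_midpoints, window=1_000, bin_size=100):
    """Count binding sites by signed offset (site - midpoint) within +/- window bp.

    Every (midpoint, site) pair inside the window adds one count, so a site
    close to two midpoints is counted twice. Hypothetical helper.
    """
    sites = np.sort(np.asarray(site_positions))
    offsets = []
    for mid in edge_midpoints:
        # Binary search for the sites lying in [mid - window, mid + window].
        lo = np.searchsorted(sites, mid - window, side="left")
        hi = np.searchsorted(sites, mid + window, side="right")
        offsets.extend(sites[lo:hi] - mid)
    edges = np.arange(-window, window + bin_size, bin_size)  # 21 edges -> 20 bins
    counts, _ = np.histogram(offsets, bins=edges)
    return counts, edges


# Toy data: three sites sit 20-50 bp from three midpoints; the fourth is far away.
counts, edges = site_offset_histogram([10_050, 20_020, 30_040, 45_000],
                                      [10_000, 20_000, 30_000])
print(counts.argmax(), edges[counts.argmax()], counts.max())  # peak bin starts at offset 0
```

      A sharp peak in the central bins, as in this toy case, is what would be expected only if the interaction edges are localized more finely than the bin width.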

      Minor Comments:

      1. PIQ does not consider TF concentration. Other methods do that and show that TF concentration improves predictions (e.g., https://www.biorxiv.org/content/10.1101/2023.07.15.549134v2 or https://pubmed.ncbi.nlm.nih.gov/37486787/). The authors should discuss how that would impact their results.

      The directional bias of CTCF binding sites was identified from ChIA-PET interactions of CTCF binding sites. Our analysis of the contribution scores of DNA-binding sites, treating CTCF binding sites as insulators, showed the same directional bias of CTCF binding sites. In this analysis, to remove false-positive predictions of DNA-binding sites, I used only the binding sites that overlapped a ChIP-seq peak of the corresponding DNA-binding protein. This result suggests that the CTCF binding sites obtained in the current analysis are of sufficient quality. Therefore, if the accuracy of DNA-binding site prediction were improved, the number of DNA-binding sites might change, but the overall tendency of their directionality, and thus the results of this study, would not be expected to change significantly.

      As for the first reference in the reviewer's comment, chromatin interaction data from a Micro-C assay do not include all chromatin interactions in a cell or tissue, because covering all interactions is expensive. Therefore, it would be difficult to predict all chromatin interactions by machine learning. As for the second reference, pioneer factors such as FOXA are known to bind closed chromatin regions, but transcription factors and DNA-binding proteins involved in chromatin interactions and insulator function generally bind open chromatin regions. Searching for DNA-binding motifs in closed chromatin regions is therefore not required for this analysis.
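      The filtering step mentioned above, keeping only predicted binding sites that overlap a ChIP-seq peak of the same protein, reduces to an interval-overlap test. The sketch below assumes half-open intervals on a single chromosome and peaks that are sorted and non-overlapping (for example, after merging); `filter_sites_by_peaks` is a hypothetical name, and in practice a tool such as `bedtools intersect -u` would typically do this job.

```python
from bisect import bisect_right


def filter_sites_by_peaks(sites, peaks):
    """Keep sites (start, end) that overlap at least one peak (start, end).

    Assumes half-open intervals and that `peaks` are sorted and disjoint,
    so peak ends are sorted too. Hypothetical helper for illustration.
    """
    peak_ends = [end for _, end in peaks]
    kept = []
    for s_start, s_end in sites:
        # First peak ending after the site starts; earlier peaks cannot overlap.
        i = bisect_right(peak_ends, s_start)
        # Overlap of [a, b) and [c, d) requires c < b (and a < d, ensured above).
        if i < len(peaks) and peaks[i][0] < s_end:
            kept.append((s_start, s_end))
    return kept


kept = filter_sites_by_peaks([(90, 110), (200, 210), (590, 620), (300, 400)],
                             [(100, 200), (500, 600)])
print(kept)
```

      Note that the site (200, 210) is dropped because, with half-open coordinates, it only touches the end of the peak (100, 200).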
      

      2. DeepLIFT is a good approach to interpret complex structures of CNN, but is not truly explainable AI. I think the authors should acknowledge this.

      In the DeepLIFT paper, the authors explain that DeepLIFT is a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input (Shrikumar A et al. ICML 2017). DeepLIFT compares the activation of each neuron to its 'reference activation' and assigns contribution scores according to the difference. DeepLIFT calculates a metric to measure the difference between an input and the reference of the input.

       Truly explainable AI would be able to find cause and reason, and to make choices and decisions as humans do. DeepLIFT does not perform causal inference. I had not used the term "Explainable AI" in our manuscript, but I have now briefly explained it in the Discussion by adding the following in lines 623-628: AI (Artificial Intelligence) is considered a black box, since the reasons and causes of its predictions are difficult to know. To solve this issue, tools and methods have been developed to identify these reasons and causes. These technologies are called Explainable AI. DeepLIFT is considered to be a tool for Explainable AI. However, DeepLIFT does not answer the reason and cause for a prediction. It calculates scores representing the contribution of the input data to the prediction.
      
       Furthermore, to improve the readability of the manuscript, I have included the following explanation in lines 159-165: we computed DeepLIFT scores of the input data (i.e., each binding site of the ChIP-seq data of DNA-binding proteins) in the deep learning analysis on gene expression levels. DeepLIFT compares the importance of each input for predicting gene expression levels to its 'reference or background level' and assigns contribution scores according to the difference. DeepLIFT calculates a metric to measure the difference between an input and the reference of the input.
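      The reference-based scoring described above can be illustrated on the simplest possible network. For a single ReLU unit y = relu(w·x + b), DeepLIFT's Rescale rule (Shrikumar A et al. ICML 2017) gives input i the contribution w_i·Δx_i·(Δy/Δz), where Δ is the difference from the reference input, so the contributions sum exactly to the change in output ("summation-to-delta"). The sketch below is a toy NumPy illustration under that assumption; it is not the model used in the manuscript or the DeepLIFT library, and the weights, inputs, and all-zero reference are hypothetical.

```python
import numpy as np


def deeplift_rescale_single_relu(w, b, x, x_ref):
    """DeepLIFT Rescale-rule contributions for y = relu(w . x + b).

    Returns (contributions, delta_y); the contributions sum to delta_y.
    Toy single-unit illustration, not a general implementation.
    """
    w, x, x_ref = (np.asarray(a, dtype=float) for a in (w, x, x_ref))
    z, z_ref = w @ x + b, w @ x_ref + b
    y, y_ref = max(z, 0.0), max(z_ref, 0.0)
    dz = z - z_ref
    if abs(dz) < 1e-12:
        # Degenerate case: fall back to the ReLU gradient at z.
        multiplier = 1.0 if z > 0 else 0.0
    else:
        multiplier = (y - y_ref) / dz  # Rescale rule: delta_y / delta_z
    return w * (x - x_ref) * multiplier, y - y_ref


contrib, delta = deeplift_rescale_single_relu(
    w=[2.0, -1.0, 0.5], b=-1.0, x=[1.0, 1.0, 2.0], x_ref=[0.0, 0.0, 0.0])
print(contrib, contrib.sum(), delta)
```

      Here the contributions are 1.0, -0.5 and 0.5, summing to the true output change of 1.0. Gradient × input would instead give 2.0, -1.0 and 1.0 (sum 2.0), because it ignores that the unit was inactive at the reference; this difference-from-reference accounting is what DeepLIFT adds, and it still says nothing about causation.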
      
    1. Author Response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review):

      (1) I think the article is a little too immature in its current form. I'd recommend that the authors work on their writing. For example, the objectives of the article are not completely clear to me after reading the manuscript, composed of parts where the authors seem to focus on SGCs, and others where they study "engram" neurons without differentiating the neuronal type (Figure 5). The next version of the manuscript should clearly establish the objectives and sub-aims.

      We now clarify why Figure 5 is organized by labeling status rather than by cell type. Because Figure 5 compares inputs to labeled pairs versus labeled-unlabeled pairs, the pairs include mixed groups of GCs and SGCs. Since the question pertains to inputs rather than cell types, we did not specifically distinguish the cell types. This is now explained in the text on page 15: “Note that since the intent was to determine the input correlation depending on labeling status of the cell pairs rather than based on cell type, we do not explicitly consider whether analyzed cell pairs included GCs or SGCs.”

      (2) In addition, some results are not entirely novel (e.g., the disproportionate recruitment as well as the distinctive physiological properties of SGCs), and/or based on correlations that do not fully support the conclusions of the article. In addition to re-writing, I believe that the article would benefit from being enriched with further analyses or even additional experiments before being resubmitted in a more definitive form.

      We now indicate that the data comparing labeled versus unlabeled SGCs are novel. Moreover, we also highlight (1) that recruitment of SGCs has not previously been examined in the Barnes Maze or an enriched environment, (2) that our unbiased morphological analysis of SGC recruitment is more robust than the subsampling of recorded neurons in prior studies, and (3) that our data show that prior studies may have overestimated SGC recruitment to engrams. Thus, the data characterized as “not novel” are essential for appropriate analysis of behaviorally tagged neurons, which is the thrust of our study.

      Reviewer #2 (Public Review):

      (1) The authors conclude that SGCs are disproportionately recruited into cfos assemblies during the enriched environment and Barnes maze task given that their classifier identifies about 30% of labelled cells as SGCs in both cases and that another study using a different method (Save et al., 2019) identified less than 5% of an unbiased sample of granule cells as SGCs. To make matters worse, the classifier deployed here was itself established on a biased sample of GCs patched in the molecular layer and granule cell layer, respectively, at even numbers (Gupta et al., 2020). The first thing the authors would need to show to make the claim that SGCs are disproportionately recruited into memory ensembles is that the fraction of GCs identified as SGCs with their own classifier is significantly lower than 30% using their own method on a random sample of GCs (e.g. through sparse viral labelling). As the authors correctly state in their discussion, morphological samples from patch-clamp studies are problematic for this purpose because of inherent technical issues (i.e. easier access to scattered GCs in the molecular layer).

      We now clarify, on page 9, that a trained investigator classified cell types based on predefined morphological criteria.  No automated classifiers were used to assign cell types in the current study.

      (2) The authors claim that recurrent excitation from SGCs onto GCs or other SGCs is irrelevant because they did not find any connections in 32 simultaneous recordings (plus 63 in the next experiment). Without a demonstration that other connections from SGCs (e.g. onto mossy cells or interneurons) are preserved in their preparation and if so at what rates, it is unclear whether this experiment is indicative of the underlying biology or the quality of the preparation. The argument that spontaneous EPSCs are observed is not very convincing as these could equally well arise from severed axons (in fact we would expect that the vast majority of inputs are not from local excitatory cells). The argument on line 418 that SGCs have compact axons isn't particularly convincing either given that the morphologies from which they were derived were also obtained in slice preparations and would be subject to the same likelihood of severing the axon. Finally, even in paired slice recordings from CA3 pyramidal cells the experimentally detected connectivity rates are only around 1% (Guzman et al., 2016). The authors would need to record from a lot more than 32 pairs (and show convincing positive controls regarding other connections) to make the claim that connectivity is too low to be relevant.

      We have conducted additional control experiments (detailed in response to Editorial comment #3), in which we replicated the results of Stefanelli et al (2016) identifying that optogenetic activation of a focal cohort of ChR2 expressing granule cells leads to robust feedback inhibition of adjacent granule cells. These control experiments demonstrate that the slice system supports the feedback inhibitory circuit which requires GC/SGC to hilar neuron synapses.

      (3) Another troubling sign is the fact that optogenetic GC stimulation rarely ever evokes feedback inhibition onto other cells, which contrasts with both other in vitro (e.g. Braganza et al., 2020) and in vivo (Stefanelli et al., 2016) studies. Without a convincing demonstration that monosynaptic connections between SGCs/GCs and interneurons in both directions are preserved at least at the rates previously described in other slice studies (e.g. Geiger et al., 1997, Neuron, Hainmueller et al., 2014, PNAS, Savanthrapadian et al., 2014, J. Neurosci), the notion that this setting could be closer to naturalistic memory processing than the in vivo experiments in Stefanelli et al. (e.g. lines 443-444) strikes me as odd. In any case, the discussion should clearly state that compromised connectivity in the slice preparation is likely a significant confound when comparing these results.

      We have conducted additional control experiments (detailed in response to Editorial comment #3), in which we replicated the results of Stefanelli et al., showing that optogenetic activation of a focal cohort of ChR2-expressing granule cells leads to robust feedback inhibition of adjacent granule cells. These control experiments demonstrate that the slice system in our studies supports the feedback inhibitory circuit detailed in prior studies. We also clarify that the Stefanelli study labeled random neurons and did not examine natural behavioral engrams, and we discuss (on page 20) the correspondence and consistency of our results with those of Braganza et al., 2020.

      (4) Probably the most convincing finding in this study is the higher zero-time lag correlation of spontaneous EPSCs in labelled vs. unlabeled pairs. Unfortunately, the fact that the authors use spontaneous EPSCs to begin with, which likely represent a mixture of spontaneous release from severed axons, minis, and coordinated discharge from intact axon segments or entire neurons, makes it very hard to determine the meaning and relevance of this finding. At the bare minimum, the authors need to show if and how strongly differences in baseline spontaneous EPSC rates between different cells and slices are contributing to this phenomenon. I would encourage the authors to use low-intensity extracellular stimulation at multiple foci to determine whether labelled pairs really share higher numbers of input from common presynaptic axons or cells compared to unlabeled pairs as they claim. I would also suggest the authors use conventional cross-correlograms (CCG; see e.g. English et al., 2017, Neuron; Senzai and Buzsaki, 2017, Neuron) instead of their somewhat convoluted interval-selective correlation analysis to illustrate codependencies between the event time series. The references above also illustrate a more robust approach to determining whether peaks in the CCGs exceed chance levels.

      We have included data on sEPSC frequency in the recorded cell pairs (Supplemental Fig 4) and have also conducted additional experiments demonstrating that labeled cells show higher sEPSC frequency and amplitude than corresponding unlabeled cells in both cell types (new Fig 5). We also include data from new experiments showing that over 50% of the sEPSCs represent action-potential-driven events (Supplemental Fig 3).

      We thank the reviewer for the suggestion to explore alternative methods of analysis, including CCGs, to further strengthen our findings. We have now computed CCGs on the same data set and report that “The dynamics of the cross-correlograms generated from our data sets using previously established methods to evaluate monosynaptic connectivity (Bartho et al., 2004; Senzai and Buzsaki, 2017) paralleled that of the CCP plots (Supplemental Fig. 6), illustrating that the methods similarly capture co-dependencies between event time series.” We note here that while the CCG and CCP are qualitatively similar, the magnitudes of the peaks differed, due to the sparseness of synaptic events.
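      For readers unfamiliar with the conventional cross-correlogram discussed above, the sketch below counts, for every event in one cell, the events in the other cell at each lag within a window. It is an illustrative assumption, not the analysis code behind Supplemental Fig. 6: the function name, the ±50 ms window, and the 2 ms bin width are hypothetical, and no chance-level estimate (such as a jitter- or shuffle-based null distribution) is included.

```python
import numpy as np


def cross_correlogram(ref_times, target_times, window=0.05, bin_width=0.002):
    """Histogram of lags (target - reference) within +/- window seconds.

    Hypothetical helper: every reference event contributes the lags of all
    target events that fall inside its window.
    """
    ref = np.asarray(ref_times, dtype=float)
    tgt = np.sort(np.asarray(target_times, dtype=float))
    lags = []
    for t in ref:
        lo = np.searchsorted(tgt, t - window, side="left")
        hi = np.searchsorted(tgt, t + window, side="right")
        lags.extend(tgt[lo:hi] - t)
    n_bins = int(round(2 * window / bin_width))
    edges = np.linspace(-window, window, n_bins + 1)
    counts, _ = np.histogram(lags, bins=edges)
    return counts, edges


# Toy event trains: every target event follows a reference event by 3 ms.
counts, edges = cross_correlogram([0.100, 0.300, 0.500], [0.103, 0.303, 0.503])
peak = counts.argmax()
print(peak, edges[peak], edges[peak + 1], counts[peak])  # peak bin spans 2-4 ms
```

      A peak offset from zero, as here, points to a consistent lead-lag relationship, while a peak at zero lag is the signature of shared input; whether either exceeds chance still requires a null model.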

      (5) Finally, one of the biggest caveats of the study is that the ensemble is labelled a full week before the slice experiment and thereby represents a latent state of a memory rather than encoding, consolidation, or recall processes. The authors acknowledge that in the discussion, but they should also be mindful of this when discussing other (especially in vivo) studies and comparing their results to these. For instance, Pignatelli et al 2018 show drastic changes in GC engram activity and features driven by behavioral memory recall, so the results of the current study may be very different if slices were cut immediately after memory acquisition (if that was possible with a different labelling strategy), or if animals were re-exposed to the enriched environment right before sacrificing the animal.

      As noted by the reviewer, we fully acknowledge and are cognizant of the concern that slices prepared a week after labeling may not reflect ongoing encoding. Although our data show that labeled cells are reactivated in higher proportion during recall, we have discussed this caveat and will include alternative experimental strategies in the discussion.

      Reviewer #3 (Public Review):

      (1) Engram cells are (i) activated by a learning experience, (ii) physically or chemically modified by the learning experience, and (iii) reactivated by subsequent presentation of the stimuli present at the learning experience (or some portion thereof), resulting in memory retrieval. The authors show that exposure to the Barnes Maze and the enriched environment activated semilunar granule cells and granule cells, preferentially in the superior blade of the dentate gyrus, and that a significant fraction were reactivated on re-exposure. However, physical or chemical modification by experience was not tested. Experience modifies engram cells, and a common modification is Hebbian, i.e., potentiation of excitatory synapses. The authors recorded EPSCs from labeled and unlabeled GCs and SGCs. Was there a difference in the amplitude or frequency of EPSCs recorded from labeled and unlabeled cells?

      We have included data on sEPSC frequency in the recorded cell pairs (Supplemental Fig 4) and have also conducted additional experiments demonstrating that labeled cells show higher sEPSC frequency and amplitude than corresponding unlabeled cells in both cell types (new Fig 5). We also include data from new experiments showing that over 50% of the sEPSCs represent action-potential-driven events (Supplemental Fig 3).

      (2) The authors studied five sequential sections, each 250 μm apart across the septotemporal axis, which were immunostained for c-Fos and analyzed for quantification. Is this an adequate sample? Also, it would help to report the dorso-ventral gradient since more engram cells are in the dorsal hippocampus. Slices shown in the figures appear to be from the dorsal hippocampus. 

      We thank the reviewer for the comment. We analyzed sections along the dorsoventral gradient. As explained in the Methods, there is considerable animal-to-animal variability in the number of labeled cells, which is why we used matched littermate pairs in our experiments. This variability could make it difficult to tease apart dorsoventral differences.

      (3) The authors investigated the role of surround inhibition in establishing memory engram SGCs and GCs. Surprisingly, they found no evidence of lateral inhibition in the slice preparation. Interneurons, e.g., PV interneurons, have large axonal arbors that may be cut during slicing.

      Similarly, the authors point out that some excitatory connections may be lost in slices. This is a limitation of slice electrophysiology.

      We have conducted additional control experiments (detailed in response to Editorial comment #3), in which we replicated the results of Stefanelli et al identifying that optogenetic activation of a focal cohort of ChR2 expressing granule cells leads to robust feedback inhibition of adjacent granule cells. These control experiments demonstrate that the slice system supports the feedback inhibitory circuit detailed in prior studies. 

      We now discuss (page 21) that “the possibility that slice recordings lead to underestimation of feedback dendritic inhibition cannot be ruled out.”

      Reviewer #1 (Recommendations for the authors):

      (1) I struggle to understand the added value of the Barnes Maze data (Figures 1 and S1), since the authors then focus on the EE for practical reasons. In particular, the analysis of mouse performance (presented in supplemental Figure 1) does not seem traditional to me. For example, instead of the 3 classical exploration strategies (i.e., random, serial, direct), the authors describe 6, and assign each of these strategies a score based on vague criteria (why are "long corrected" and "focused research" both assigned a score of 0.5?). Unless I'm mistaken, no other classic parameters are described (e.g., success rate, latency, number of errors). If the authors decide to keep the BM results, I recommend better justifying its existence and adding more details, including in the method section. Otherwise, perhaps they should consider withdrawing it. Even if we had to use two different behavioral contexts, wouldn't it have made sense to use, in addition to the EE, the fear conditioning test, which is widely used in the study of engrams? Under these conditions (Stefanelli et al., 2016), the number of cells recruited after fear conditioning seems sufficient to reproduce the analyses presented in Figures 2-5 and determine whether or not lateral inhibition is dependent on the type of context (Stefanelli and colleagues suggest significant strong lateral inhibition during fear conditioning, whereas the data from Dovek and colleagues suggest quite the opposite after exposure to EE).

      The Barnes Maze data were included to evaluate DG ensemble activation during a dentate-dependent, non-fear-based behavioral task. This is now introduced and explained in the Results. We have now included plots of the primary latency and number of errors in finding the escape hole to confirm improvement over time (Supplemental Fig. 1). We specifically used the BUNS analysis to evaluate the use of spatial strategy and show that by day 6, the day of tamoxifen induction, the mice are using a spatial strategy for navigation. Our approach to evaluating exploration strategy is based on criteria published in Illouz et al 2016. This is now detailed in the Methods on page 25. We hope that the inclusion of the supplemental data and the revisions to the Methods and Results address the concerns regarding the Barnes Maze experiments.

      Regarding Stefanelli et al., 2016, please note that the study adopted random labeling of neurons using CaMKII promoter-driven reporter expression, which they activated during spatial exploration of fear conditioning behaviors. As such, labeled neurons in the Stefanelli study were NOT behaviorally driven; rather, they were optically activated. This is now clarified in the text. The main drive for our study was to evaluate behaviorally tagged neurons, which is novel, distinct from the Stefanelli study, and, we would argue, more behaviorally realistic and relevant.

      Additionally, the lateral inhibition observed in Stefanelli et al. was in response to activation of GCs labeled by virally mediated, CaMKII-driven ChR2 expression. Using a similar labeling approach, new control data presented in Supplemental Fig. 3 show that we are fully able to replicate the lateral inhibition observed by Stefanelli et al. These control experiments further suggest that the sparse and distributed GC/SGC ensembles activated during non-aversive behavioral tasks may not be sufficient to elicit robust lateral inhibition, as has been observed when a random population of adjacent neurons is activated. Our findings are also consistent with observations by Braganza et al., 2020. This is now discussed on page 21.

      (2) The authors recorded sEPSCs received by recruited and non-recruited GCs and SGCs after EE exposure. However, it appears that they studied them very little, apart from a temporal correlation analysis (Figure 5). Yet it would be interesting to determine whether or not the four neuronal populations possess different synaptic properties.

      What is the frequency and amplitude of sEPSCs in GCs and SGCs recruited or not after EE exposure? Similarly, can the author record the sIPSCs received by dentate gyrus engram and non-engram GCs and SGCs? If so, what is their frequency and amplitude?

      As suggested by editorial comment #2, we now include data on the frequency and amplitude of the sEPSCs in the GCs and SGCs used in our analysis for Figure 5. Given the low numbers of unlabeled SGCs and labeled GCs in our paired recordings (Supplemental Fig. 5), we chose not to use this data set for analysis of cell-type- and labeling-based differences in EPSC parameters. However, we have previously reported that sIPSC frequency is higher in SGCs than in GCs. Additionally, we have identified that sEPSC frequency is higher in SGCs than in GCs (Dovek et al, in preprint, DOI: 10.1101/2025.03.14.643192).

      To specifically address the reviewer's concerns, we have recorded sEPSCs in a new cohort of labeled and unlabeled GCs and SGCs and present data demonstrating that labeled cells show higher sEPSC frequency and amplitude than corresponding unlabeled cells in both cell types (new Fig 5). These experiments were conducted in TRAP2-tdT labeled cells, which were not stable in cesium-based recordings. We therefore deferred the IPSC analysis and restricted analysis to sEPSCs for this study.

      (3) Previous data showed that dentate gyrus neurons that are recruited or not in a given context could exhibit distinct morphological characteristics (Pléau et al. 2021) and biochemical content (Penk expression, Erwin et al., 2020). In order to enrich the electrophysiological data presented in Figure 2, could the authors take advantage of the biocytin filling to perform a morphological and biochemical comparison of the different neuronal types (i.e., GCs and SGCs recruited or not after EE)?

      Thank you for this suggestion. Unfortunately, detailed morphometry and biochemical analysis on labeled and unlabeled neurons was not conducted as part of this study as our focus was on circuit differences. In our experience, unless the sections are imaged soon after staining, the sections are suboptimal for detailed morphological reconstruction and analysis. Our ongoing studies suggest that PENK is an activity marker and not a selective marker for SGCs and we are undertaking transcriptomic analysis to identify molecular differences between GCs and SGCs. We respectfully submit that these experiments are outside the scope of this study.

      (4) Figures 3 and 4 show only schematic diagrams and representative data. No quantification is shown. Instead of pie charts showing the identity of each pair (which I find unnecessary), I would use pie charts representing the % of each pair in which an excitatory or inhibitory drive was recorded (with the corresponding n).

      Please note that we did not observe evoked synaptic potentials in any pair except one, precluding quantification. However, we submit that it is important for readers to have information on the number of pairs and the types of pre- and postsynaptic pairs in which connections were tested.

      (5) Figure 3: Given that GCs form very few recurrences in non-pathological conditions, it hardly surprises me that they form few or no local glutamatergic connections. In contrast, this result surprises me more for SGCs, whose axons form collaterals in the dentate gyrus granular and molecular layers (Williams et al., 2007; Save et al., 2019). To control the reliability of their conditions, could the authors check whether SGCs do indeed form connections with hilar mossy cells, as has been reported in the past? To test whether this lack of interconnectivity is specific to neurons belonging to the same engram (or not), could the authors test whether or not the stimulation of labeled GCs/SGCs (via membrane depolarization or even optogenetics) generates EPSCs in unlabeled GCs?

      As suggested by the reviewer, we have examined whether widefield optical activation of all labeled neurons, including GCs and SGCs, leads to EPSCs in unlabeled GCs (63 cells tested). However, we did not observe eEPSCs. These data are presented on page 13 (Fig 4F) in the Results and discussed on page 20. Since widefield stimulation should activate terminals and lead to release even if the axon is severed, our data suggest that the glutamatergic drive from SGCs to GCs may be limited.

      As noted above, we have demonstrated the presence of lateral inhibition consistent with data in Stefanelli et al. in our new Supplementary Figure 3. We have also shown that sustained SGC firing upon perforant path stimulation is associated with sustained firing in hilar interneurons (Afrasiabi et al., 2022), indicating the presence of SGC-to-hilar connectivity in our slice preparation. Therefore, we chose not to undertake challenging 2P-guided paired recordings of SGCs and mossy cells adjacent to SGC axon terminals, as reported in Williams et al. 2007, to replicate the 9% rate of SGC-to-MC synaptic connections. These 2P-guided slice physiology studies are outside the technical scope of our study.

      (6) Figure 4: The results are somewhat at odds with the strong lateral inhibition reported in the past (Stefanelli et al., 2016), but the experimental conditions are different in the two studies. Stimulation of a single labeled GC or SGC may not be sufficient to activate an inhibitory neuron, and for the latter to inhibit an unlabeled GC or SGC. Is it possible to measure the sIPSCs received by unlabelled neurons during optogenetic stimulation of all labelled neurons? Could the authors verify whether under their experimental conditions GCs and SGCs do indeed form connections with interneurons, as reported before? Finally, Stefanelli and colleagues (2016) suggest that lateral inhibition is provided by dendrite-targeting somatostatin interneurons. If the authors are recording in the soma, could they underestimate more distal inhibitory inputs? If so, could they record from the dendrites of unlabeled neurons?

      Our new control data (Supplementary Fig. 3), using AAV-mediated, CaMKII promoter-driven random expression of ChR2 in GCs similar to Stefanelli et al. (2016), demonstrate our ability to replicate the lateral inhibition observed by Stefanelli et al. (2016). Thus, our findings more accurately represent lateral inhibition supported by a sparse, behaviorally labeled cohort than the findings of Stefanelli et al., which were based on randomly labeled neurons. This is now discussed on pages 22-23. We respectfully submit that dendritic recordings are outside the scope of the current study.

      We also discuss the possibility that somatic recordings may undersample dendritic inhibitory inputs on page 23: “the possibility that slice recordings lead to underestimation of feedback dendritic inhibition cannot be ruled out.”

      (7) Figure 5: For ease of reading, I would substantially simplify the Results section related to Figure 5, keeping only the main general points of the analysis and the results themselves. The details of the analysis strategy, and the justification for the choices made, are better placed in the Methods section (I advise against "data not shown").

      We thank the reviewer for the suggestion to improve accessibility of the results and have moved text related to justification of strategy and controls to the methods. We have also removed references to data not shown.

      (8) Figure 5: Why do the authors no longer discriminate between GCs and SGCs?

      Because Figure 5 focuses on inputs to labeled pairs versus labeled-unlabeled pairs, the pairs include mixed groups of GCs and SGCs. Since the question pertains to inputs rather than cell types, we did not distinguish between the cell types. This is now explained in the text on page 15.

      (9) Figure 5: I would like to know more about the temporally connected inputs and their implication in context-dependent recruitment of dentate gyrus neurons. What could be the origin of the shared input received by the neurons recruited after EE exposure? For example, do labeled neurons receive more (temporally correlated or not) inputs from the entorhinal cortex (or any other upstream brain region) than unlabeled neurons? Is there any way (e.g., PP stimulation or any kind of manipulation) to test the causal relationship between temporally correlated input and the context-dependent recruitment of a given neuron?

      We appreciate the reviewer’s comments on the need to examine the source and nature of the correlated inputs to behaviorally labeled neurons. However, the suggested experiments are nontrivial as artificial stimulation of afferent fibers is unlikely to be selective for labeled and unlabeled cells. Given the complexities in design, implementation and interpretation of these experiments we respectfully submit that these are outside the scope of the current study.

      Reviewer #2 (Recommendations for the authors):

      There are a few minor issues limiting the extent of interpretations of the data:

      (1) Only about 7% of the 'engram' cells are reactivated one week after exposure (line 147); it is unclear how meaningful this assembly is, given the high number of cells that may either be labelled for reasons unrelated to the EE or no longer be part of the memory-related ensemble.

      We now discuss (pages 22-23) that the percentage of labeling is consistent with what has been observed in the DG one week after fear conditioning (DeNardo et al., 2019) and discuss the caveat that not all labeled cells may represent an engram.

      (2) Line 215: The wording '32 pairwise connections examined' suggests that there actually were synaptic connections, would recommend altering the wording to 'simultaneously recorded cells examined' to avoid confusion.

      Revised as suggested.

    1. Reviewer #2 (Public review):

      In this valuable manuscript, Lin et al. attempt to examine the role of long non-coding RNAs (lncRNAs) in human evolution through a set of population genetics and functional genomics analyses that leverage existing datasets and tools. Although the methods are incomplete and at times inadequate, the results nonetheless point towards a possible contribution of long non-coding RNAs to shaping humans, and suggest clear directions for future, more rigorous study.

      Comments on revisions:

      I thank the authors for their revision and changes in response to previous rounds of comments. As it had been nearly two years since I last saw the manuscript, I reread the full text to familiarise myself again with the findings presented. While I appreciate the changes made and think they have strengthened the manuscript, I still find parts of it a bit too speculative or hyperbolic. In particular, I think claims of evolutionary acceleration and adaptation require more careful integration with existing human/chimpanzee genetics and functional genomics literature. For example:

      Line 155: "About 5% of genes have significant sequence differences in humans and chimpanzees," This statement needs a citation, and a definition of what is meant by 'significant', especially as multiple lines below instead mention how it's not clear how many differences matter, or which of them, etc.

      line 187: "Notably, 97.81% of the 105141 strong DBSs have counterparts in chimpanzees, suggesting that these DBSs are similar to HARs in evolution and have undergone human-specific evolution." I do not see any support for the inference here. Identifying HARs and acceleration relies on a far more thorough methodology than what is being presented here. Even read generously, a pairwise comparison between only two taxa cannot polarise the direction of differences; inferring human-specific change requires outgroups beyond chimpanzee.

      line 210: "Based on a recent study that identified 5,984 genes differentially expressed between human-only and chimpanzee-only iPSC lines (Song et al., 2021), we estimated that the top 20% (4248) genes in chimpanzees may well characterize the human-chimpanzee differences" I do not agree with the rationale for this claim, and do not agree that it supports the cutoff of 0.034 used below. I also find that my previous concerns with the very disparate numbers of results across the three archaics have not been suitably addressed.
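      The polarisation point in the comment on line 187 above can be illustrated with a minimal parsimony sketch. It assumes a single aligned site with one allele each for human, chimpanzee and an optional outgroup; the function name `polarise_site` is hypothetical, and this is not the manuscript's method. With two taxa the change cannot be placed on either branch; one outgroup is the minimum needed to assign it.

```python
from typing import Optional


def polarise_site(human: str, chimp: str, outgroup: Optional[str] = None) -> str:
    """Assign a single-site human/chimp difference to a lineage by parsimony.

    Hypothetical helper for illustration only.
    """
    if human == chimp:
        return "no difference"
    if outgroup is None:
        # Two taxa only: the change could lie on either branch.
        return "unresolved"
    if chimp == outgroup:
        # Chimp retains the outgroup (ancestral) state, so humans changed.
        return "human-derived"
    if human == outgroup:
        return "chimp-derived"
    # All three alleles differ: parsimony cannot pick a single lineage.
    return "unresolved"


print(polarise_site("A", "G"))       # unresolved
print(polarise_site("A", "G", "G"))  # human-derived
```

      Real analyses use several outgroups and explicit substitution models; the sketch only shows why a chimpanzee-only comparison cannot establish human-specific change.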

      I also think that there is still too much of a tendency to assume that adaptive evolutionary change is the only driving force behind the observed results. As I've stated before, I do not doubt that lncRNAs contribute in some way to evolutionary divergence between these species, as do other gene regulatory mechanisms; the manuscript, however, leans heavily on this being the sole, or primary, force, and that requires much stronger supporting evidence. Examples include, but are not limited to:

      line 230: "These results reveal when and how HS lncRNA-mediated epigenetic regulation influences human evolution." This statement is too speculative.

      Line 268: "yet the overall results agree well with features of human evolution." What does this mean? This section is too short and unclear.

      Line 325: "and form 198876 HS lncRNA-DBS pairs with target transcripts in all tissues." This has not been shown in this paper - sequence based analyses simply identify the *potential* to form pairs.

      Line 423: "Our analyses of these lncRNAs, DBSs, and target genes, including their evolution and interaction, indicate that HS lncRNAs have greatly promoted human evolution by distinctly rewiring gene expression." I do not agree that this conclusion is supported by the findings presented - this would require significant additional evidence in the form of orthogonal datasets.

      I also return briefly to some of my earlier comments, in particular on the confounding effects of gene length and transcript/isoform number. In their rebuttal the authors argued that there was no need to control for this, but it does in fact matter. A gene with 10 transcripts that differ at the 5' end has 10 times as many chances of having a DBS as a gene with only 1 transcript, or a gene with 10 transcripts but a single annotated TSS. When the analyses are then performed at the gene level without taking the number of transcripts into account, this could introduce a bias towards genes with more annotated isoforms. Similarly, line 246 focuses on genes whose "SNP numbers in CEU, CHB, YRI are 5 times larger than the average." Is this controlled for the length of the DBS? All else being equal, a longer DBS will have more SNPs than a shorter one. It is therefore not surprising that the same genes highlighted above as having 'strong' DBSs, where strength is affected by length, show up here too.
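      Both biases raised in this comment can be made concrete with a small sketch. It assumes hypothetical per-transcript records (gene, transcript, TSS position, DBS length, SNP count); the record layout and function names are illustrative and not taken from the manuscript. Counting per transcript inflates isoform-rich genes, counting per distinct TSS does not, and dividing SNP counts by DBS length removes the length dependence.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DBSHit:
    """One predicted DBS in the promoter of one transcript (hypothetical record)."""
    gene: str
    transcript: str
    tss: int         # TSS coordinate of the transcript
    dbs_length: int  # DBS length in bp
    snp_count: int   # SNPs falling inside the DBS


def naive_hits_per_gene(hits):
    """One hit per transcript, so genes with many isoforms are inflated."""
    counts = defaultdict(int)
    for h in hits:
        counts[h.gene] += 1
    return dict(counts)


def distinct_promoter_hits(hits):
    """One hit per distinct TSS, collapsing transcripts that share a promoter."""
    tss_by_gene = defaultdict(set)
    for h in hits:
        tss_by_gene[h.gene].add(h.tss)
    return {gene: len(tss) for gene, tss in tss_by_gene.items()}


def snp_density_per_kb(hit):
    """SNPs per kb of DBS, removing the dependence of raw counts on DBS length."""
    return 1000.0 * hit.snp_count / hit.dbs_length


# Gene G1: ten isoforms sharing one TSS and one long DBS; gene G2: one short DBS.
hits = [DBSHit("G1", f"T{i}", 1000, 200, 10) for i in range(10)]
hits.append(DBSHit("G2", "T0", 5000, 50, 5))
print(naive_hits_per_gene(hits))     # {'G1': 10, 'G2': 1}
print(distinct_promoter_hits(hits))  # {'G1': 1, 'G2': 1}
print(snp_density_per_kb(hits[0]), snp_density_per_kb(hits[-1]))  # 50.0 100.0
```

      In this toy example G1 has twice G2's raw SNP count but half its SNP density, which is the kind of reversal the comment warns about.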

    2. Author Response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review):

      Summary

      While DNA sequence divergence, differential expression, and differential methylation analyses have been conducted between humans and the great apes to study changes that "make us human", the role of lncRNAs and their impact on the human genome and biology have not been fully explored. In this study, the authors computationally predict HS lncRNAs as well as their DNA binding sites using a method they developed previously, and then examine these predicted regions with different types of enrichment analyses. Broadly, the analysis is straightforward, and after identifying these regions/HS lncRNAs the authors examined their effects using different external datasets.

      I no longer have any concerns about the manuscript as the authors have addressed my comments in the first round of review.

      We thank the reviewer for the valuable comments, which have helped us improve the manuscript.

      Reviewer #2 (Public Review):

      Lin et al. attempt to examine the role of lncRNAs in human evolution in this manuscript. They apply a suite of population genetics and functional genomics analyses that leverage existing data sets and public tools, some of which were previously built by the authors, who clearly have experience with lncRNA binding prediction. However, I worry that there is a lack of suitable methods and/or relevant controls at many points and that the interpretation is too quick to infer selection. While I don't doubt that lncRNAs contribute to the evolution of modern humans, and certainly agree that this is a question worth asking, I think this paper would benefit from a more rigorous approach to tackling it.

      I thank the authors for their revisions to the manuscript; however, I find that the bulk of my comments have not been addressed to my satisfaction. As such, I am afraid I cannot say much more than what I said last time, emphasising some of my concerns with regards to the robustness of some of the analyses presented. I appreciate the new data generated to address some questions, but think it could be better incorporated into the text - not in the discussion, but in the results.

      We thank the reviewer for the careful reading and valuable comments. In this round of revision, we address the two main concerns: (1) there is a lack of suitable methods and/or relevant controls at many points, and (2) the interpretation is too quick to infer selection. Based on these comments, we have carefully revised all sections of the manuscript, including the Introduction, Results, Discussion, and Materials and Methods.

      In addition, we have performed two new analyses. Based on these analyses, we have added one figure and two sections to the Results, two sections to Materials and Methods, one figure to the Supplementary Notes, and two tables to the Supplementary Tables. These results were obtained using new methods and provide further support for the main conclusion.

      To be thorough, we have revisited the comments made in the first round and respond to them further. The following are point-by-point responses to the comments.

      Since many of the details in the Responses-To-Comments are available in published papers, and eLife publishes Responses-To-Comments, we have not substantially revised the Supplementary Notes, to avoid repeating published material.

      “lack of suitable methods and/or relevant controls”.

      We carefully chose the methods, thresholds, and controls in the study; now, we provide clearer descriptions and explanations.

      (1) We have expanded the last paragraph of the Introduction to briefly introduce the methods, thresholds, and controls.

      (2) In many places in Results and Materials and Methods, revisions are made to describe and justify methods, thresholds, and controls.

      (3) Some methods, thresholds, and controls have broad consensus, such as FDR and genome-wide background, but others may not, such as the number of genes that differ greatly between humans and chimpanzees. We now describe our reasoning for the latter cases. For example, we explain that “About 5% of genes have significant sequence differences in humans and chimpanzees, but more show expression differences due to regulatory sequences. We sorted target genes by their DBS affinity and, to be prudential, chose the top 2000 genes (DBS length>252 bp and binding affinity>151) and bottom 2000 genes (DBS length<60 bp but binding affinity>36) to conduct over-representation analysis”.

      (4) We have also chosen our wording more carefully to make the descriptions more accurate.

      Responses to the suggestion “new data generated could be better incorporated into the text”.

      (1) We think that this sentence, “The occurrence of HS lncRNAs and their DBSs may have three situations – (a) HS lncRNAs preceded their DBSs, (b) HS lncRNAs and their DBSs co-occurred, (c) HS lncRNAs succeeded their DBSs. Our results support the third situation and the rewiring hypothesis”, previously in the Discussion, is better placed in section 2.3. We have revised it and moved it into the second paragraph of section 2.3.

      (2) Our two new analyses generated new data, which we describe in the Results.

      (3) It is possible to move more materials from Supplementary Notes to the main text, but it is probably unnecessary because the main text currently has eight sub-sections, two tables, and four figures.

      Responses to the comment “the interpretation is too quick to infer selection”.

      (1) When using XP-CLR, iSAFE, Tajima's D, Fay-Wu's H, the fixation index (Fst), and linkage disequilibrium (LD) to detect selection signals, we used widely adopted parameters and thresholds but did not state this clearly in the original manuscript. In the first sentence of the second paragraph of section 2.4, we have now added the phrase “with widely-used parameters and thresholds” (more details are available in section 4.7 and the Supplementary Notes).

      (2) This is not the first time we have used these tests; we used them in two previous studies (Tang et al. Uncovering the extensive trade-off between adaptive evolution and disease susceptibility. Cell Rep. 2022; Tang et al. PopTradeOff: A database for exploring population-specificity of adaptive evolution, disease susceptibility, and drug responsiveness. Comput Struct Biotechnol J. 2023). In this manuscript, sections 2.5 and 4.12 describe how we use these tests to detect signals and infer selection. We also cite the above two published papers, from which the reader can obtain more details.

      (3) Also, in section 2.4, we stress that “Signals in considerable DBSs were detected by multiple tests, indicating the reliability of the analysis”.

      To further respond to the comments of “lack of suitable methods” and “this paper would benefit from a more rigorous approach to tackling it”, we have performed two new analyses. The results of the new analyses agree well with previous results and provide new support for the main conclusion. The result of section 2.5 is novel and interesting.

      We write in the Discussion: “Two questions are how mouse-specific lncRNAs specifically rewire gene expression in mice and how human- and mouse-specific rewiring influences the cross-species transcriptional differences”. To investigate whether the rewiring of gene expression by HS lncRNAs in humans is accidental in evolution, we have performed further genomic and transcriptomic analyses (Lin et al. Intrinsically linked lineage-specificity of transposable elements and lncRNAs reshapes transcriptional regulation species- and tissue-specifically. doi: https://doi.org/10.1101/2024.03.04.583292). To verify the conclusions obtained, we analyzed spermatogenesis data from multiple species and obtained supporting evidence (unpublished).

      I note some specific points that I think would benefit from more rigorous approaches, and suggest possible ways forward for these.

      Much of this work is focused on comparing DNA binding domains in human-unique long non-coding RNAs and DNA binding sites across the promoters of genes in the human genome, and I think the authors can afford to be a bit more methodical/selective in their processing and filtering steps here. The article begins by searching for orthologues of human lncRNAs to arrive at a set of 66 human-specific lncRNAs, which are then characterised further through the rest of the manuscript. Line 99 describes a binding affinity metric used to separate strong DBSs from weak DBSs; the methods (line 432) describe this as the product of the DBS or lncRNA length and the average identity of the underlying TTSs. This multiplication, in fact, undoes the standardising value of averaging and introduces a clear relationship between the length of a region being tested and its overall score, which in turn is likely to bias all downstream inference, since a long lncRNA with poor average affinity can end up with a higher score than a short one with higher average affinity, and it's not quite clear to me what the biological interpretation of that should be. Why was this metric defined in this way?

      (1) Using RNA:DNA base-pairing rules, other DBS prediction programs return only DBSs and their lengths. Using RNA:DNA base-pairing rules and a variant of Smith-Waterman local alignment, LongTarget returns DBSs with lengths and identity values together with DBDs (local alignment means that DBDs and DBSs are predicted simultaneously). Thus, instead of measuring lncRNA/DNA binding by DBS length alone, we measure it by both DBS length and DBD/DBS identity (simply called identity, which is the percentage of paired nucleotides in the RNA and DNA sequences). This allows us to define “binding affinity”. One may think that binding affinity is a more complex function of length and identity. But, according to in vitro studies (see the review Abu Almakarem et al. 2012 and citations therein, and He et al. 2015 and citations therein), the strength of a triplex is determined by all paired nucleotides (i.e., triplets). Thus, binding affinity = length × identity is biologically reasonable.

      (2) Further, unlike methods that predict DBSs from individual base-pairing rules such as AT-G and CG-C, LongTarget integrates base-pairing rules into rulesets, each covering A, T, C, and G (see the two figures below, which are from He et al. 2015). This makes every nucleotide in the RNA and DNA sequences comparable and allows identity to be computed.

      (3) On whether LongTarget may predict unreasonably long DBSs: three technical features of LongTarget make this highly unlikely (and less likely than with other programs). These are (a) local alignment, (b) a gap penalty, and (c) a TT penalty (He et al. 2015).

      (4) Some researchers may think that a higher identity threshold (e.g., 0.8 or even higher) makes the predicted DBSs more reliable. This is not true. To explore plausible identity values, we analyzed the distribution of Kcnq1ot1’s DBSs in the large Kcnq1 imprinting region (which contains many known imprinted genes). We found that a high identity threshold (e.g., 0.8) causes DBSs in many known imprinted genes to go undetected. Based on our analysis of many lncRNAs and on early in vitro experiments, plausible identity values range from 0.4 to 0.8.

      (5) Is it necessary or advisable to define an identity threshold? Since identity values from 0.4 to 0.8 are plausible, and identity is a property of a DBS that does not reflect the strength of the whole triplex, it is more reasonable to define a threshold for binding affinity to control predicted DBSs. As explained above, binding affinity = length × identity is a reasonable measure of the strength of a triplex. The default threshold is 60; since many triplexes have an identity of about 0.6, a DBS with affinity = 60 is about 100 bp long. Compared with TF binding sites (TFBSs), 100 bp is quite long. As we explain in the main text, “taking a DBS of 147 bp as an example, it is extremely unlikely to be generated by chance (p < 8.2e-19 to 1.5e-48)”.

      (6) How can predicted DBSs be validated? Validation faces several issues. (a) DBSs are predicted at the genome level, but target transcripts are expressed in different tissues and cells, so no single transcriptomic dataset can validate all predicted DBSs of a lncRNA. Regardless of the techniques and cells used, only a small portion of predicted DBSs can be experimentally captured (validated). (b) The resolution of current experimental techniques is limited; thus, experimentally identified DBSs (i.e., “peaks”) are much longer than computationally predicted DBSs. (c) Experimental results contain false positives and false negatives, so validation (or performance evaluation) should also consider ROC curves (Wen et al. 2022).

      (7) As explained above, a long DBS may have a lower binding affinity than a short DBS. A biological interpretation is that the long DBS may accumulate mutations that decrease its binding ability gradually.
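      The arithmetic behind both the reviewer's concern and the authors' default threshold can be checked directly. A minimal sketch, assuming binding affinity is exactly DBS length × identity as stated in this response; the function names are illustrative. It shows that length can dominate the score, and how long a DBS must be to reach the default threshold of 60 across the plausible identity range of 0.4 to 0.8.

```python
def binding_affinity(length_bp: float, identity: float) -> float:
    """Affinity as defined in the response: DBS length times identity."""
    return length_bp * identity


def min_length_for_threshold(threshold: float, identity: float) -> float:
    """Shortest DBS (bp) whose affinity reaches the threshold at a given identity."""
    return threshold / identity


# Reviewer's point: a long, low-identity DBS can outscore a short, high-identity one.
long_weak = binding_affinity(300, 0.4)     # about 120
short_strong = binding_affinity(120, 0.8)  # about 96
print(long_weak > short_strong)            # True

# Authors' point: with the default threshold of 60, the minimum DBS length is
# roughly 150, 100 and 75 bp at identities of 0.4, 0.6 and 0.8.
for ident in (0.4, 0.6, 0.8):
    print(ident, round(min_length_for_threshold(60, ident)))
```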

      There is also a strong assumption that identified sites will always be bound (line 100), which I do not think is well supported by the additional evidence (lines 109-125). The authors show that predicted NEAT1 and MALAT1 DBSs overlap experimentally validated sites for NEAT1, MALAT1, and MEG3, but this is not done systematically, or genome-wide, so it's hard to know whether the examples shown are representative or a best-case scenario.

      (1) We did not make this assumption. Clearly, binding depends on multiple factors, including co-expression of genes and the specific cellular context.

      (2) On the second issue, “this is not done systematically, or genome-wide”: we did perform the analysis genome-wide but did not show all results (Supplementary Fig. 2 shows three genomic regions, which are impressively good). In Wen et al. 2022, we describe the overall results.

      It's also not quite clear how overlapping promoters or TSS are treated - are these collapsed into a single instance when calculating genome-wide significance? If, eg, a gene has five isoforms, and these differ in the 3' UTR but their promoter region contains a DBS, is this counted five times, or one? Since the interaction between the lncRNA and the DBS happens at the DNA level, it seems like not correcting for this uneven distribution of transcripts is likely to skew results, especially when testing against genome-wide distributions, eg in the results presented in sections 5 and 6. I do not think that comparing genes and transcripts putatively bound by the 40 HS lncRNAs to a random draw of 10,000 lncRNA/gene pairs drawn from the remaining ~13500 lncRNAs that are not HS is a fair comparison. Rather, it would be better to do many draws of 40 non-HS lncRNAs and determine an empirical null distribution that way, if possible actively controlling for the overall number of transcripts (also see the following point).
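      The empirical-null approach suggested in this comment can be sketched as follows. It assumes a hypothetical per-lncRNA statistic (for example, the number of target transcripts with a significant expression correlation) is already available for every non-HS lncRNA; the function name and defaults are illustrative. It draws many random sets of 40 lncRNAs without replacement, records each set's mean, and returns a one-sided empirical p-value with the usual +1 correction. Matching on transcript number, as the reviewer suggests, would be added by sampling within strata; it is omitted here for brevity.

```python
import random


def empirical_p_value(observed_mean, background, set_size=40, n_draws=10_000, seed=0):
    """One-sided empirical p-value: how often does a random set of `set_size`
    background lncRNAs have a mean statistic at least as large as observed?"""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(background, set_size)  # without replacement
        if sum(draw) / set_size >= observed_mean:
            hits += 1
    # Adding 1 to numerator and denominator keeps p strictly above zero.
    return (hits + 1) / (n_draws + 1)
```

      Unlike a single pooled draw of lncRNA/gene pairs, this keeps the unit of sampling (sets of 40 lncRNAs) the same as in the observed data.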

      (1) We predicted DBSs in the promoter region of 179128 Ensembl-annotated transcripts and did not merge DBSs (there is no need to merge them). If multiple transcripts share the same TSS, they may share the same DBS, which is natural.

      (2) If the DBSs of multiple transcripts of a gene overlap, the overlap does not raise a problem for lncRNA/DNA binding analysis in specific tissues because usually only one transcript is expressed in a tissue. Therefore, there is no such situation “If, e.g., a gene has five isoforms, and these differ in the 3' UTR but their promoter region contains a DBS, is this counted five times, or one?”

      (3) It is unclear to us what “it seems like not correcting for this uneven distribution of transcripts is likely to skew results” means. Regarding testing against genome-wide distributions, it is statistically beneficial to make many rounds of random genome-wide draws, but this is very time-consuming. Since more variables demand more rounds of drawing, to our knowledge this is not widely practiced in large-scale transcriptomic data analyses.

      (4) If the difference is small and thus calls for rigorous statistical testing, many rounds of random genome-wide draws are necessary. In our results, however, “45% of these pairs show a significant expression correlation in specific tissues (Spearman's |rho| >0.3 and FDR <0.05). In contrast, when randomly sampling 10000 pairs of lncRNAs and protein-coding transcripts genome-wide, the percent of pairs showing this level of expression correlation (Spearman's |rho| >0.3 and FDR <0.05) is only 2.3%”.

      Thresholds for statistical testing are not consistent, or always well justified. For instance, in line 142 GO testing is performed on the top 2000 genes (according to different rankings), but there is no description anywhere of the background regions used as controls, or of why 2000 genes were chosen as a good number to test. Why not 1000, or 500? Are the results overall robust to these (and other) thresholds? Then, in line 190, the threshold for downstream testing is the top 20% of genes, etc. I am not opposed to different thresholds in principle, but they should be justified.

      (1) We used the g:Profiler program to perform over-representation analysis to identify enriched GO terms. This analysis is used to determine what pre-defined gene sets (GO terms) are more present (over-represented) in a list of “interesting” genes than what would be expected by chance. Specifically, this analysis is often used to examine whether the majority of genes in a pre-defined gene set fall in the extremes of a list: the top and bottom of the list, for example, may correspond to the largest differences in expression between the two cell types. g:Profiler always takes the whole genome as the reference; that is why we did not mention the whole genome reference. We now add in section 2.2 “(with the whole genome as the reference)”.

      (2) The choice of 2000 rather than 2500 genes is somewhat subjective. We now explain that “About 5% of genes have significant sequence differences in humans and chimpanzees, but more show expression differences due to regulatory sequences. We sorted target genes by their DBS affinity and, to be prudential, chose the top 2000 genes (DBS length>252 bp and binding affinity>151) and bottom 2000 genes (DBS length<60 bp but binding affinity>36) to conduct over-representation analysis”.
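      Over-representation analysis of the kind described here is commonly based on a one-sided (cumulative) hypergeometric test, which is the test g:Profiler's g:GOSt describes. A minimal sketch, without multiple-testing correction and with a hypothetical function name, of the probability of seeing at least k term-annotated genes in a list of n genes drawn from a background of N genes, K of which carry the term. Rerunning it with n set to 500, 1000 and 2000 is one simple way to examine the robustness the reviewer asks about.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when asked for more items than exist, so impossible
    # splits contribute nothing to the sum.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Toy check: all 5 drawn genes annotated when 5 of 10 carry the term.
print(hypergeom_sf(5, 10, 5, 5) == 1 / 252)  # True
```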

      Likewise, comparing Tajima's D values near promoters to genome-wide values is unfair, because promoters are known to be under strong evolutionary constraints relative to background regions; as such it is not surprising that the results of this comparison are significant. A fairer comparison would attempt to better match controls (eg to promoters without HS lncRNA DBS, which I realise may be nearly impossible), or generate empirical p-values via permutation or simulation.

      We used these tests to detect selection signals in DBSs but not in the whole promoter regions. Using promoters without HS lncRNA DBS as the control also has risks because promoter regions contain other kinds of regulatory sequences.
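      For reference, Tajima's D contrasts two diversity estimates for a region: the mean number of pairwise differences (π) and Watterson's estimate based on the number of segregating sites (S). A minimal sketch for a small set of aligned haplotypes, using Tajima's standard normalising constants; the input format (equal-length strings, no missing data) and the function name are assumptions. The sketch computes D for one region only; the reviewer's point concerns what that value is then compared against (matched controls, permutations or simulations).

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for aligned, equal-length haplotype strings (no missing data)."""
    n = len(haplotypes)
    if n < 4:
        # With n = 2 or 3 the normalising variance terms are exactly zero.
        raise ValueError("need at least 4 sequences")
    length = len(haplotypes[0])
    # Segregating sites: alignment columns carrying more than one allele.
    s = sum(1 for j in range(length) if len({h[j] for h in haplotypes}) > 1)
    if s == 0:
        return 0.0
    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

      An excess of intermediate-frequency variants gives a positive D and an excess of rare variants a negative one; neither sign by itself identifies selection, which is why the choice of comparison set matters.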

      There are huge differences in the comparisons between the Vindija and Altai Neanderthal genomes that, to me, suggest some sort of technical bias is at play here. For example, line 190 reports 1256 genes with a high distance between the Altai Neanderthal and modern humans, but only 134 Vindija genes reach the same threshold of 0.034. The temporal separation between the two specimens does not seem sufficient to explain this difference, nor the difference between the Altai Denisovan and Neanderthal results (2514 genes for the Denisovan), which makes me wonder whether it is a technical artefact relating to the quality of the genome builds. It would be worth checking.

      We feel it is hard to know whether the temporal separation between these specimens is sufficient to explain the differences, because many details of archaic humans and their genomes remain unknown and the mechanisms determining genotype-phenotype relationships remain poorly understood. Once the threshold of 0.034 was set, these gene counts followed from it. We chose parameters and thresholds that best suit the most important requirements, but they may not best suit other requirements; this is a problem for all large-scale studies.

      Inferring evolution: At some points in the manuscript, the authors are quick to infer positive selection. I would caution that GTEx contains many different brain tissues, so finding a brain eQTL is much easier than finding a liver eQTL simply because there are more opportunities for it. Likewise, claims in the text and in Tables 1 and 2 about the evolutionary pressures acting on specific genes should be stated more carefully. The same is true of the high Fst observed between groups (line 515): selection is only one possible cause of high Fst, and population differentiation and drift are just as capable of giving rise to it, especially at small sample sizes.

      (1) We add in Discussion that “Finally, not all detected signals reliably indicate positive selection”.

      (2) Our results show that more signals are detected in CEU and CHB than in YRI; this agrees with all population genetics studies and implies that our results are not wrongly biased by the more numerous and larger samples obtained from CEU and CHB.
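      The reviewer's point about GTEx tissue composition is a multiple-opportunity effect. A minimal sketch, assuming (hypothetically) independent tissues that share the same per-tissue probability p of detecting an eQTL: the chance of at least one detection across k tissues is 1 - (1 - p)^k, so a tissue class sampled many times is favoured even with no biological difference. Real tissues are correlated, so this overstates the effect; p and the tissue counts below are illustrative, not GTEx values.

```python
def p_at_least_one(p_per_tissue: float, n_tissues: int) -> float:
    """Probability of >= 1 detection across independent tissues (illustrative)."""
    return 1 - (1 - p_per_tissue) ** n_tissues


# Hypothetical: the same per-tissue probability for one liver sample
# versus ten brain regions.
p = 0.05
print(p_at_least_one(p, 1), p_at_least_one(p, 10))
```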

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The manuscript of Odermatt et al. investigates the volatiles released by two species of Desmodium plants and the response of herbivores to maize plants alone or in combination with these species. The results show that Desmodium releases volatiles in both the laboratory and the field. Maize grown in the laboratory also released volatiles, in a similar range. While female moths preferred to oviposit on maize, the authors found no evidence that Desmodium volatiles played a role in lowering attraction to or oviposition on maize.

      Strengths:

      The manuscript is a response to recently published papers that presented conflicting results with respect to whether Desmodium releases volatiles constitutively or in response to biotic stress, the level at which such volatiles are released, and the behavioral effect they have on the fall armyworm. These questions are relevant as Desmodium is used in a textbook example of pest-suppressive sustainable intercropping technology called push-pull, which has supported tens of thousands of smallholder farmers in suppressing moth pests in maize. A large number of research papers over more than two decades have implied that Desmodium suppresses herbivores in push-pull intercropping through the release of large amounts of volatiles that repel herbivores. This premise has been questioned in recent papers. Odermatt et al. thus contribute to this discussion by testing the role of odors in oviposition choice. The paper confirms that ovipositing FAW preferred maize, and also confirms that odors released from Desmodium did not appear to be important in their bioassays.

      The paper is a welcome addition to the literature and adds quality headspace analyses of Desmodium from the laboratory and the field. Furthermore, the authors, some of whom have long contributed to developing push-pull, also find that Desmodium odors are not significant in the moths' choice between maize plants. This advances our knowledge of the mechanisms through which push-pull suppresses herbivores, which is critically important for evolving the technique to fit different farming systems and for translating this mechanism to other crops and other geographical areas.

      Thank you for your careful assessment of our manuscript.

      Weaknesses:

      Below I outline the major concerns:

      (1) Clear induction of the experimental plants, and lack of reflective discussion around this: from literature data and previous studies of maize and Desmodium, it is clear that the plants used in this study, particularly the Desmodium, were induced. Maize appeared to be primarily manually damaged, possibly due to sampling (release of GLVs, but little to no terpenoids, which is indicative of mostly physical stress and damage; see, for example, one of the coauthors' own papers, Tamiru et al. 2011), whereas Desmodium releases a blend of many compounds (many terpenoids indicative of herbivore induction). Erdei et al. also clearly show that under controlled conditions maize, silver leaf and green leaf Desmodium release volatiles in very low amounts. While the condition of the plants in Odermatt et al. may be reflective of situations in push-pull fields, the authors should elaborate on the above in the discussion (see comments) so that readers understand the plants' condition during the experiments. This is particularly important because it has been assumed that Desmodium releases typical herbivore-induced volatiles constitutively, which is not the case (see Erdei et al. 2024). This reflection is currently lacking in the manuscript.

      We acknowledge the need for a more reflective discussion on the possible causes of volatile emission due to physical damage. Although the field plants were carefully handled, it is possible that some physical stress may have contributed to the release of volatiles, such as green leaf volatiles (GLVs). We ensured the revised manuscript reflects this nuanced interpretation (lines 282 – 286). However, we also explained more clearly that our aim was to capture the volatile emission of plants used by farmers under realistic conditions and moth responses to these plants, not to be able to attribute the volatile emission to a specific cause (lines 115 – 117). We revised relevant passages throughout the results and discussion to ensure that we do not make any claims about the reason for volatile emissions, and that our claims regarding these plants and their headspace being representative of the system as practiced by farmers are supported. In the revised manuscript we provide a new supplementary table S2 that additionally shows the classification of the identified substances, which also shows that the majority of the substances that were found in the headspace of the sampled plants of Desmodium intortum or Desmodium incanum are monoterpenes, sesquiterpenes, or aromatic compounds, and not GLVs (that are typically emitted following damage).

      (2) Lack of controls that would have provided context to the data: The experiments lack important controls that would have helped in the interpretation:

      2a The authors did not control the conditions of the plants. To understand the release of volatiles and their importance in the field, the authors should have included controlled herbivory in both maize and Desmodium. This would have placed the current volatile profiles in a herbivory context. Now the volatile measurements hang in midair, leading to discussions that are not well anchored (and should be rephrased thoroughly, see e.g. lines 183-188). It is well known that maize releases only very low levels of volatiles without abiotic and biotic stressors. However, this changes upon stress (GLVs upon direct, physical damage and e.g. terpenoids upon herbivory, see above). Erdei et al. confirm this pattern in Desmodium. Not having these controls means that the authors need to put the data in the context of what has been published (see above).

      We appreciate this concern. Our study aimed to capture the real-world conditions of push-pull fields, where Desmodium and maize grow in natural environments without the direct induction of herbivory for experimental purposes (lines 115 – 117). We agree that in further studies it would be important to carry out experiments under different environmental conditions, including herbivore damage. However, this was not within the scope of the present study.

      2b It would also have been better if the authors had sampled maize from the field while sampling Desmodium. Together with the above point (inclusion of herbivore-induced maize and Desmodium), the levels of volatile release by Desmodium would have been placed into context.

      We acknowledge that sampling maize and other intercrop plants, such as edible legumes, alongside Desmodium in the push-pull field would have allowed us to make direct comparisons of the volatile profiles of different plants in the push-pull system under shared field conditions. Again, this should be done in future experiments but was beyond the scope of the present study. Due to the amount of samples we could handle given cost and workload, we chose to focus on Desmodium because there is much less literature on the volatile profiles of field-grown Desmodium than maize plants in the field: we are aware of one study attempting to measure field volatile profiles from Desmodium intortum (Erdei et al. 2024) and no study attempting this for Desmodium incanum. We pointed out this justification for our focus on Desmodium in the manuscript (lines 435 - 439). Additionally, we suggested in the discussion that future studies should measure volatile profiles from all plants commonly used in push-pull systems alongside Desmodium (lines 267 – 269).

      2c To put the volatiles release in the context of push-pull, it would have been important to sample other plants which are frequently used as intercrop by smallholder farmers, but which are not considered effective as push crops, particularly edible legumes. Sampling the headspace of these plants, both 'clean' and herbivore-induced, would have provided a context to the volatiles that Desmodium (induced) releases in the field - one would expect unsuccessful push crops to not release any of these 'bioactive' volatiles (although 'bioactive' should be avoided) if these odors are responsible for the pest suppressive effect of Desmodium. Many edible intercrops have been tested to increase the adoption of push-pull technology but with little success.

      We very much agree that such measurements are important for the longer-term research program in this field. But again, for the current study this would have exploded the size of the required experiment. Regarding bioactivity, we have been careful to use the phrase "potentially bioactive" solely when referring to findings from the literature (lines 99–103), in order to avoid making any definitive claims about our own results.

      Because of the lack of the above, the conclusions the authors can draw from their data are weakened. The data are still valuable in the current discussion around push-pull, provided that a proper context is given in the discussion along the points above.

      We think our revisions made the specific aims of this study more explicit and help to avoid misleading claims.

      (3) 'Tendency' of the authors to accept the odor hypothesis (i.e. that Desmodium odors are responsible for repelling FAW and thereby reduce infestation in maize under push-pull management) in spite of their own data: The authors tested the effects of odor in oviposition choice, both in a cage assay and in a 'wind tunnel'. From the cage experiments, it is clear that FAW preferred maize over Desmodium, confirming other reports (including Erdei et al. 2024). However, when choosing between two maize plants, one of which was placed next to Desmodium to which FAW had no tactile access (taste, structure, etc.), FAW chose equally. Similarly, in their wind tunnel setup (this term should not be used to describe the assay, see below), no preference was found between maize odor in the presence or absence of Desmodium. This too confirms results obtained by Erdei et al. (but adds an important element by using Desmodium plants that had been induced and released volatiles, contrary to Erdei et al. 2024). Even though no support was found for repellency by Desmodium odors, the authors in many instances in the manuscript (lines 30-33, 164-169, 202, 279, 284, 304-307, 311-312, 320) appear to elevate non-significant tendencies as being important. This misleads readers into thinking that these interactions were significant, and the discussion in fact presents them as confirmed. The authors should stay true to their own data obtained when testing the hypothesis of whether odors play a role in the pest-suppressive effect of push-pull.

      We appreciate this feedback and agree that we may have overstated claims that could not be supported by strict significance tests. However, we believe that non-significant tendencies can still provide valuable insights. In the revised version of the manuscript, we ensured a clear distinction between statistically significant findings and non-significant trends and removed any language that may imply stronger support for the odor hypothesis than the data show, in all the lines that were mentioned.

      (4) Oviposition bioassay: with so many assays in close proximity, it is hard to certify that the experiments are independent. Please discuss this in the appropriate place in the discussion.

      We have pointed this out in the submitted manuscript in lines 275 – 279. Furthermore, we included detailed captions to figure 4 - supporting figure 3 & figure 4 - supporting figure 4. We are aware that in all such experiments there is a danger of between-treatment interference, which we pointed out for our specific case. We stated that with our experimental setup we tried to minimize interference between treatments by spacing and temporal staggering. We would like to point out that this common caveat does not invalidate experimental designs when practicing replication and randomization. We assume that insects are able to select suitable oviposition sites in the background of such confounding factors under realistic conditions.

      (5) The wind tunnel has a number of issues (besides being poorly detailed):

      5a. The setup which the authors refer to as a 'wind tunnel' does not qualify as a wind tunnel. First, there is no directional flow: there are two flows entering the setup at opposite sides. Second, the flow is far too low for moths to orient in (in a wind tunnel, wind should be presented as a directional cue). Only around 1.5 l/min enters a setup of approximately 90 l, which does not create any directional flow. Solution: change 'wind tunnel' throughout the text to 'dual choice setup/assay'.

      We agree with these criticisms and changed the terminology accordingly from ‘wind tunnel’ to ‘dual choice assay’. We have now conducted an additional experiment which we called ‘no-choice assay’ that provides conditions closer to a true wind tunnel. The setup of the added experiment features an odor entry point at only one side of the chamber to create a more directional airflow. Each treatment (maize alone, maize + D. intortum, maize + D. incanum, and a control with no plants) was tested separately, with only one treatment conducted per evening to avoid cross-contamination, as described in the methods section of the no-choice assay.

      5b. There is no control over the flows in the flight section of the setup. It is very well possible that moths at the release point may only sense one of the 'options'. Please discuss this.

      We added this to the discussion (lines 369 – 374). The new no-choice assays also address this concern by using a setup with laminar flow.

      5c. Too low a flow (1.5 l per minute) implies largely stagnant air, which means cross-contamination between experiments. An experiment takes 5 minutes, but it takes minimally 1.5 hours at these flows to replace the flight chamber air (and in reality much longer, as the fresh air does not replace the old air but mixes with it). The setup does not seem to be equipped with, e.g., fans to quickly vent the air out of the setup. See comments in the text. Please discuss the limitations of the experimental setup at the appropriate place in the discussion.

      We added these limitations to the discussion and addressed these concerns with new experiments (see answer 5a).

      5d. The stimulus air enters through a tube (what type of tube, diameter, length, etc.?) carrying pressurized air (how was the air brought into the bags, what type of bag, and how was it sealed?), and effluxes directly into the flight chamber (how, through a nozzle?). However, it seems that there is no control of the efflux. How was leakage prevented, and in particular, how were the bags sealed airtight around the plants?

      We added the missing information to the methods and provided details about types of bags, manufacturers, and pre-treatments in the method section. In short, PTFE tubes connected bagged plants to the bioassay setup and air was pumped in at an overpressure, so leakage was not eliminated but contamination from ambient air was avoided.

      5e. The plants were bagged in very narrowly fitting bags. The maize plants look bent and damaged, which probably explains the GLVs found in the samples. The Desmodium in the picture (Figure 5 supplement, which we should assume is at least a representative picture) appears to be rather crammed into the bag with maize and looks in rather poor condition to start with (perhaps also indicating why it releases these volatiles). It would be good to describe the sampling of the plants in detail and explain that the way they were handled may have caused the release of GLVs.

      We included a more detailed description of the plant handling and bagging processes to the methods to clarify how the plants were treated during the dual-choice and the no-choice assays reported in the revised manuscript. We politely disagree that the maize plants were damaged and the Desmodium plants not representative of those encountered in the field. The plants were grown in insect-proof screen houses to prevent damage by insects and carefully curved without damaging them to fit into the bag. The Desmodium plant pictured was D. incanum, which has sparser foliage and smaller leaves than D. intortum.

      (6) Figure 1 seems redundant as a main figure in the text. Much of the information is not pertinent to the paper. It can be used in a review on the topic. Or perhaps if the authors strongly wish to keep it, it could be placed in the supplemental material.

      We think that Figure 1 provides essential information about the push-pull system and the FAW. To our knowledge, this partly contradictory evidence so far has not been synthesized in the literature. We realize that such a figure would more commonly be provided in a review article, but we do not think that the small number of studies on this topic so far justify a stand-alone review. Instead, the introduction to our manuscript includes a brief review of these few studies, complemented by the visual summary provided in Figure 1 and a detailed supplementary table.

      Reviewer #2 (Public review):

      Based on the controversy over whether the Desmodium intercrop emits bioactive volatiles that repel the fall armyworm, the authors conducted this study to assess the effects of volatiles from Desmodium plants in the push-pull system on the oviposition behavior of FAW. This topic is interesting and the results are valuable for understanding the push-pull system for the management of FAW, a serious pest. The methodology used in this study is valid, leading to reliable results and conclusions. I just have a few concerns and suggestions for improvement of this paper:

      (1) The volatiles emitted from D. incanum were analyzed and their effects on the oviposition behavior of FAW moth were confirmed. However, it would be better and useful to identify the specific compounds that are crucial for the success of the push-pull system.

      We fully agree that identifying specific volatile compounds responsible for the push-pull effect would provide valuable insights into the underlying mechanisms of the system. However, the primary focus of this study was to address the still unresolved question of whether Desmodium emits detectable or "significant" amounts of volatiles at all under field conditions, and the secondary aim was to test whether we could demonstrate a behavioral effect of Desmodium headspace on FAW moths. Before conducting our experiments, we carefully considered the option of using single volatile compounds and synthetic blends in bioassays. We decided against this because we judged that the contradictory evidence in the literature was not a sufficient basis for composing representative blends. Furthermore, we think it is an important first step to test for behavioral responses to the headspaces of real plants. We consider bioassays with pure compounds to be important for confirmation and more detailed investigation in future studies. There was also contradictory evidence in the literature regarding moth responses to plants. We thus opted to focus on experiments with whole plants to maintain ecological relevance.

      (2) It would be good to add significance symbols to Figure 4 (D).

      We report the statistical significance of the parameters in Figure 4 (D) in Table 3, which shows the mixed model applied for oviposition bioassays. While testing significance between groups is a standard approach, we used a more robust model-based analysis to assess the effects of multiple factors simultaneously. We provided a cross-reference to Table 3 from the figure description of Figure 4 (D) for readers to easily find the statistical details.

      (3) Figure A is difficult for readers to understand.

      Unfortunately, it is not entirely clear which specific figure is being referred to as "Figure A" in this comment. We tried to keep our figures as clear as possible.

      (4) It would be good to discuss in more depth, in the discussion, the functions of the important volatile compounds identified here, in comparison with the results of previous studies.

      Our study does not provide strong evidence that specific volatiles from Desmodium plants are important determinants of FAW oviposition or choice in the push-pull system. Therefore, we prefer to refrain from detailed discussions of the potential importance of individual compounds. However, in the revised version, we provide an additional table S2 which identifies the overlap with volatiles previously reported from Desmodium, as only the total numbers are summarized in the discussion of the submitted paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The points raised are largely self-explanatory as to what needs to be done to fully resolve them. At a minimum the text needs to be seriously revised to:

      (1) reflect the data obtained.

      (2) reflect on the limitations of their experimental setup and data obtained.

      (3) put the data obtained and their limitations in context: what these tell us and, particularly, what they do not. Ideally, additional headspace measurements are taken, including from herbivore-damaged and 'clean' maize and Desmodium (with better control of biotic and abiotic stress), as well as from other crops commonly planted as companion crops with maize (none of which reduce pest pressure).

      Thank you for this summary. Please see our detailed responses above.

      In addition to the main points of critique provided above, I have provided additional comments in the text (https://elife-rp.msubmit.net/elife-rp_files/2024/07/18/00134767/00/134767_0_attach_28_25795_convrt.pdf). These elaborate on the above points and include some new ones too. These are the major points of critique, which I hope the authors can address.

      Thank you very much for these detailed comments.

      Reviewer #2 (Recommendations for the authors):

      It is important to note that the original push-pull system was developed against stemborers and involved Napier grass (still used) around the field, which attracts stemborer moths, and Molasses grass as the intercrop that repels the moths and attracts parasitoids. Later, Molasses grass was replaced by desmodiums because it is a legume that fixes nitrogen and therefore can increase nitrate levels in the soil, but most importantly because it prevents germination of the parasitic Striga weed. The possible repellent effect of desmodium on pests and attraction of natural enemies was never properly tested but assumed, probably to still be able to use the push-pull terminology. This "mistake" should be recognized here and in future publications. It is a real pity that the controversy over the repellent effect of desmodium distracts from the amazing success of the push-pull system, also against the fall armyworm.

      We thank the reviewer for pointing out these issues, which are part of the reason for our Figure 1 and why we would like to keep it. We have described this development of the system in the introduction to better present the push-pull system. Our aim in Figure 1 and Table S1 is to highlight both the evidence of the system's success, and the gaps in our understanding, regarding specifically control of damage from the FAW.

    1. Reviewer #1 (Public review):

      Summary:

      This paper addresses an important and topical issue: how temporal context, at various time scales, affects various psychophysical measures, including reaction times, accuracy, and localization. It offers interesting insights, with separate mechanisms for different phenomena, which are well discussed.

      Strengths:

      The paradigm used is original and effective. The analyses are rigorous.

      Weaknesses:

      Here I make some suggestions for the authors to consider. Most are stylistic, but the issue of precision may be important.

      (1) The manuscript is quite dense, with some concepts that may prove difficult for the non-specialist. I recommend spending a few more words (and maybe some pictures) describing the difference between task-relevant and task-irrelevant planes. Nice technique, but not instantly obvious. Then we are hit with "stimulus-related", which definitely needs some words (also because it is orthogonal to neither of the above).

      (2) While I understand that the authors want the three classical separations, I actually found it misleading. Firstly, for a perceptual scientist to call intervals in the order of seconds (rather than milliseconds), "micro" is technically coming from the raw prawn. Secondly, the divisions are not actually time, but events: micro means one-back paradigm, one event previously, rather than defined by duration. Thirdly, meso isn't really a category, just a few micros stacked up (and there's not much data on this). And macro is basically patterns, or statistical regularities, rather than being a fixed time. I think it would be better either to talk about short-term and long-term, which do not have the connotations I mentioned. Or simply talk about "serial dependence" and "statistical regularities". Or both.

      (3) More serious is the issue of precision. Again, this is partially a language problem. When people use the engineering terms "precision" and "accuracy" together, they usually use the same units, such as degrees. Accuracy refers to the distance from the real position (so average accuracy gives bias), and precision is the clustering around the average bias, usually measured as standard deviation. Yet here accuracy is percent correct: also a convention in psychology, but not when contrasting accuracy with precision, in the engineering sense. I suggest you change "accuracy" to "percent correct". On the other hand, I have no idea how precision was defined. All I could find was: "mixture modelling was used to estimate the precision and guess rate of reproduction responses, based on the concentration (k) and height of von Mises and uniform distributions, respectively". I do not know what that means.

      (4) Previous studies show serial dependence can increase bias but decrease scatter (inverse precision) around the biased estimate. The current study claims to be at odds with that. But are the two measures of precision relatable? Was the real (random) position of the target subtracted from each response, leaving residuals from which the inverse precision was calculated? (If so, the authors should say so.) But if serial dependence biases responses in essentially random directions (depending on the previous position), it will increase the average scatter, decreasing the apparent precision.

      (5) I suspect they are not actually measuring precision, but location accuracy. So the authors could use "percent correct" and "localization accuracy". Or be very clear what they are actually doing.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1:

      (1) Developmental time series:

      It was not entirely clear how this experiment relates to the rest of the manuscript, as it does not compare any effects of transport within or across species.

      Implemented Changes:  

      The importance of species arrival timing for community assembly is addressed in both the introduction and discussion. To accommodate the reviewer’s concerns and further emphasize this point, we have added a clarifying sentence to the results section and included an illustrative example with supporting literature in the discussion.

      Results: Clarifying the timing of initial microbial colonization is essential for determining whether and how priority effects mediate community assembly of vertically transmitted microbes in early life, or whether these microbes arrive into an already established microbial landscape. We used non-sterile frogs of our captive laboratory colony (…)

      Discussion: For example, early microbial inoculation has been shown to increase the relative abundance of beneficial taxa such as Janthinobacterium lividum (Jones et al., 2024), whereas efforts to introduce the same probiotic into established adult communities have not led to long-term persistence (Bletz, 2013; Woodhams et al., 2016).  

      (2) Cross-foster experiment:

      The "heterospecific transport" tadpoles were manually brushed onto the back of the surrogate frog, while the "biological transport" tadpoles were picked up naturally by the parent. It is a little challenging to interpret the effect of caregiver species, since it is conflated with the method of attachment to the parent. I noticed that the uptake of Os-associated microbes by Os-transported tadpoles seemed to be higher than the uptake of Rv-associated microbes by Rv-transported tadpoles (comparing the second box from the left to the rightmost boxplot in panel S2C). Perhaps this could be a technical artifact if manual attachment to Os frogs was more efficient than natural attachment to Rv frogs.

      I was also surprised to see so much of the tadpole microbiome attributed to Os in tadpoles that were not transported by Os frogs (25-50% in many cases). It suggests that SourceTracker may not be effectively classifying the taxa.

      Implemented Changes:  

      Methods (Study species, reproductive strategies and life history): Oophaga sylvatica (Os) (Funkhouser, 1956; CITES Appendix II, IUCN Conservation status: Near Threatened) is a large, diurnal poison frog (family Dendrobatidae) inhabiting lowland and submontane rainforests in Colombia and Ecuador. While male Os care for the clutch of up to seven eggs, females transport 1-2 tadpoles at a time to water-filled leaf axils where tadpoles complete their development (Pašukonis et al., 2022; Silverstone, 1973; Summers, 1992). Notably, females return regularly to these deposition sites to provision their offspring with unfertilized eggs.

      Discussion: Most poison frogs transport tadpoles on their backs, but the mechanism of adherence remains unclear. As under natural conditions, tadpoles that are experimentally placed onto a caregiver's back also gradually adhere to the dorsal skin, where they remain firmly attached for several hours as the adult navigates dense terrain. Although transport durations were standardized, species-specific factors, such as microbial density at the contact site, microbial taxon identity, and skin physiology (e.g., moisture), could influence microbial transmission between the transporting frog and the tadpole. While these differences may have contributed to the varying transmission efficacies observed between the two frog species in our experiment, none of these factors should compromise correct microbial source assignment. We thus conclude that transporting frogs serve as a source of microbiota for transported tadpoles. However, further studies on species-specific physiological traits and adherence mechanisms are needed to clarify what modulates the efficacy of microbial transmission during transport, both under experimental and natural conditions.

      Methods (Vertical transmission): Cross-fostering tadpoles onto non-parental frogs has been used previously to study navigation in poison frogs (Pašukonis et al., 2017). In our experience, successful adherence to both parental and heterospecific frogs depends on the developmental readiness of tadpoles, which must have retracted their gills and be capable of hatching from the vitelline envelope through vigorous movement. Another factor influencing cross-fostering success is the docility of the frog during initial attachment, as erratic movements easily dislodge tadpoles before adherence is established. Rv are small, jumpy frogs that are easily stressed by handling, making experimental fostering of tadpoles, even their own, impractical. Therefore, we favored an experimental design in which tadpoles initiate natural transport and parental frogs pick them up with a 100% success rate. We chose the poison frog Os as foster frogs because adults are docile, parental care in this species involves transporting tadpoles, and their skin microbial communities differ from those of Rv, a critical prerequisite for our SourceTracker analysis. The use of the docile Os as the foster species enabled a 100% cross-fostering success rate, with no notable differences in adherence strength after six hours.

      Methods (SourceTracker Analysis): To assess training quality, we evaluated model self-assignment using source samples. We selected the model trained on a dataset rarefied to the read depth of the adult frog sample with the lowest read count (48162 reads), as it showed the best overall self-assignment performance; models trained on datasets rarefied to the lowest overall read depth performed worse. Unlike studies using technical replicates, our source samples represent distinct biological individuals and sampling timepoints, so natural microbiome variability is expected within each source category. Consequently, we considered self-assignment rates above 70% acceptable. All source samples were correctly assigned to their respective categories (Rv, Os, or control), but with varying proportions of reads assigned as 'Unknown'. Adult frog sources were reliably self-identified with high confidence (Os: 97.2% median, IQR = 1.4; Rv: 76.3% median, IQR = 38.1). Adult R. variabilis frogs displayed a higher proportion of 'Unknown' assignments than O. sylvatica, likely reflecting greater biological variability among individuals and/or a higher proportion of rare taxa not well captured in the training set. The control tadpole source showed lower self-assignment accuracy (median = 30.5%, IQR = 17.1), as expected given the low microbial biomass of these samples, which resulted in low read depth. Low read depth limits the information available to inform the iterative updating steps in Gibbs sampling and reduces confidence in source assignments. We therefore verified the robustness of our results by performing the second SourceTracker analysis described above, training the model only on adult sources and assigning all tadpoles, including low-biomass controls, as sinks. Self-assignment rates for the second training set varied (O. sylvatica: 79.2% median, IQR = 29; R. variabilis: 96.6% median, IQR = 3.7), while results remained consistent across analyses, supporting the reliability of our findings.
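
      As an illustration of the self-assignment criterion described above, the following minimal Python sketch summarizes per-sample self-assignment proportions for one source category as a median and interquartile range (IQR) and compares the median with the 70% acceptance threshold. It is not part of SourceTracker: the function name and the toy proportions are hypothetical, applying the threshold to the median is an assumption, and quartiles are computed with `statistics.quantiles(..., method="inclusive")`, which matches R's default quantile definition.

```python
import statistics


def self_assignment_summary(proportions, threshold=0.70):
    """Summarize per-sample self-assignment proportions (0-1) for one source category.

    Returns (median, iqr, passes). Applying the acceptance threshold to the
    median is an assumption made for this sketch.
    """
    if len(proportions) < 2:
        raise ValueError("need at least two source samples to compute an IQR")
    median = statistics.median(proportions)
    # method="inclusive" reproduces R's default (type 7) quantile definition.
    q1, _, q3 = statistics.quantiles(proportions, n=4, method="inclusive")
    return median, q3 - q1, median > threshold


# Hypothetical self-assignment proportions for five source samples of one category.
median, iqr, passes = self_assignment_summary([0.5, 0.6, 0.7, 0.8, 0.9])
print(f"median={median:.3f}, IQR={iqr:.3f}, acceptable={passes}")
```

      With a median of exactly 70%, this toy category would fail a strict "above 70%" rule; whether the boundary is inclusive is another assumption to check against the original analysis.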

      (3) Cross-species analysis:

      Like the developmental time series, this analysis doesn't really address the central question of the manuscript. I don't think it is fair for the authors to attribute the difference in diversity to parental care behavior, since the comparison only includes n=2 transporting species and n=1 non-transporting species that differ in many other ways. I would also add that increased diversity is not necessarily an expectation of vertical transmission. The similarity between adults and tadpoles is likely a more relevant outcome for vertical transmission, but the authors did not find any evidence that tadpole-adult similarity was any higher in species with tadpole transport. In fact, tadpoles and adults were more similar in the non-transporting species than in one of the transporting species (lines 296-298), which seems to directly contradict the authors' hypothesis. I don't see this result explained or addressed in the Discussion.

      To address the reviewer’s concerns, we implemented the following changes:  

      Results:

      We rephrased the following sentence from the results part:  

      “These variations may therefore be linked to differing reproductive traits: Af and Rv lay terrestrial egg clutches and transport hatchlings to water, whereas Ll, a non-transporting species, lays eggs directly in water.”

      To read

      “These variations may therefore reflect differences in life history traits among the three species.”

      We moved the information on differing reproductive strategies into the Discussion, where it contributes to a broader context alongside other life history traits that may influence community diversity.

      Discussion (1): We added to our discussion that increased microbial diversity was not an expected outcome of vertical transmission.

      “However, increased microbial diversity is not a known outcome of vertical transmission, and further studies across a broader range of transporting and non-transporting species are needed to assess the role of transport in shaping diversity of tadpole-associated microbial communities.”

      Discussion (2): Likewise, communities associated with adults and tadpoles of transporting species were no more similar than those of non-transporting species. While poison frog tadpoles do acquire caregiver-specific microbes during transport, most of these microbes do not persist on the tadpoles' skin long-term. This pattern can likely be attributed to the capacity of tadpole skin- and gut microbiota to flexibly adapt to environmental changes (Emerson & Woodley, 2024; Santos et al., 2023; Scarberry et al., 2024). It may also reflect the limited compatibility of skin microbiota from terrestrial adults with aquatic habitats or tadpole skin, which differs structurally from that of adults (Faszewski et al., 2008). As a result, many transmitted microbes are probably outcompeted by microbial taxa continuously supplied by the aquatic environment. Interestingly, microbial communities of the non-transporting Ll were more similar to their adult counterparts than those of poison frogs. This pattern might reflect differences in life history among the species. While adult Ll commonly inhabit the rock pools where their tadpoles develop, adults of the two poison frog species visit tadpole nurseries only sporadically for deposition. These differences in habitat use may result in adult Ll hosting skin microbiota that are better adapted to aquatic environments as compared to Rv and Af. Additionally, their presence in the tadpoles’ habitat could make Ll a more consistent source of microbiota for developing tadpoles.

      (4) Field experiment: The rationale and interpretation of the genus-level network are not clear, and the figure is not legible. What does it mean to "visualize the microbial interconnectedness" or to be a "central part of the community"? The previous sentences in this paragraph (lines 337-343) seem to imply that transfer is parent-specific, but the genus-level network is based on the current adult frogs, not the previous generation of parents that transported them. So it is not clear that the distribution or co-distribution of these taxa provides any insight into vertical transmission dynamics.

      Implemented Changes:  

      We appreciate the reviewer’s close reading and understand how the inclusion of the network visualization without further clarification may have led to confusion. To clarify, the network was constructed from all adult frogs in the population, including—but not limited to—the parental frogs examined in the field experiment. We do not make any claims about the origin of the microbial taxa found on parental frogs. Rather, our aim was to illustrate how genera retained on tadpoles (following potential vertical transmission) contribute to the skin microbial communities of adult frogs of this population beyond just the parental individuals. This finding supports the observation that these retained taxa are generally among the most abundant in adult frogs. However, since this information is already presented in Table S8 and the figure is not essential to the main conclusions, we have removed Supplementary Figure S5 and the accompanying sentence: “A genus-level network constructed from 44 adult frogs shows that the retained genera make up a central part of the community of adult Rv in wild populations (Fig. S5).” We have adjusted the Methods section accordingly.

      Reviewer #2:

      I did not find any major weaknesses in my review of this paper. The work here could potentially benefit from absolute abundance levels for shared ASVs between adults and tadpoles to more thoroughly understand the influences of vertical transmission that might be masked by relative abundance counts. This would only be a minor improvement as I think the conclusions from this work would likely remain the same, however.

      In response to the reviewer’s suggestion, we estimated the absolute abundance of specific ASVs for all tadpole samples in which SourceTracker identified shared ASVs between adults and tadpoles. The resulting scaled absolute abundance values (in copies/μL and copies per tadpole) are provided in Table S10, and a description of the method has been incorporated into the revised Methods section of the manuscript. To support the robustness of this approach in our dataset, we additionally designed an ASV-specific assay for ASV24902 (Methylocella). Candidate primers were assessed for specificity by performing local BLASTn alignments against the full set of ASV sequences identified in the respective microbial communities of tadpoles. We optimized the annealing temperature via gradient PCR and confirmed primer specificity through Sanger sequencing of the PCR product (Forward: 5′–GAGCACGTAGGCGGATCT–3′; Reverse: 5′–GGACTACNVGGGTWTCTAAT–3′). Using this approach, we confirmed that the relative abundance of ASV24902 in the amplicon sequencing data (18.05%) closely matched its share of the absolute 16S rRNA copy number in transported tadpole 6 (18.01%). While we intended to quantify all shared ASVs, we were limited to this single target because there was insufficient material to optimize the assays. As this particular ASV was also detected in the water associated with the same tadpole, we chose not to include this confirmation in the manuscript. Nevertheless, the close match supports the reliability of our approach for scaling absolute abundances in this dataset.
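
      The specificity check above relied on local BLASTn alignments. As a simplified, alignment-free stand-in (not the procedure used in the study), the sketch below scans each ASV sequence on both strands for near-exact matches to a primer, treating IUPAC degenerate bases such as N, V, and W as sets of allowed nucleotides. The function names, the two-mismatch cutoff, and the toy ASV sequences are hypothetical. Unlike BLASTn, this scan ignores gaps, and it can only assess primers whose binding sites fall within the sequenced region of the ASVs.

```python
# IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a plain A/C/G/T sequence (other characters kept as-is)."""
    return seq.translate(COMPLEMENT)[::-1]


def min_mismatches(primer, seq):
    """Fewest primer mismatches over all windows of one strand; None if seq is too short."""
    k = len(primer)
    best = None
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        mm = sum(base not in IUPAC[code] for code, base in zip(primer, window))
        if best is None or mm < best:
            best = mm
    return best


def screen_primer(primer, asvs, max_mismatches=2):
    """IDs of ASVs matched on either strand with at most `max_mismatches` mismatches."""
    primer = primer.upper()
    hits = []
    for asv_id, seq in asvs.items():
        seq = seq.upper()
        scores = [s for s in (min_mismatches(primer, seq),
                              min_mismatches(primer, reverse_complement(seq)))
                  if s is not None]
        if scores and min(scores) <= max_mismatches:
            hits.append(asv_id)
    return hits


# Toy example: the ASV-specific forward primer from the text against made-up ASVs.
fwd = "GAGCACGTAGGCGGATCT"
toy_asvs = {
    "target": "TTTT" + fwd + "AAAA",
    "target_rc": reverse_complement("TTTT" + fwd + "AAAA"),
    "off_target": "ACGT" * 10,
}
print(screen_primer(fwd, toy_asvs))
```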

      Results: Absolute abundances of shared ASVs likely originating from the parental source pool (as identified by SourceTracker) after one month of growth ranged from 7804 to 172326 copies per tadpole (Table S10).

      Methods: Quantitative analysis of 16S rRNA copy numbers with digital PCR (dPCR)

      Absolute abundances were estimated for ASVs that were shared between tadpoles after a one-month growth period and their respective caregivers, and for which SourceTracker analysis identified the caregiver as a likely source of microbiota. We followed the quantitative sequencing framework described by Barlow et al. (2020), measuring total microbial load via digital PCR (dPCR) with the same universal 16S rRNA primers used to amplify the V4 region in our sequencing dataset. Absolute 16S rRNA copy numbers obtained from dPCR were then multiplied by the relative abundances from our amplicon sequencing dataset to calculate ASV-specific scaled absolute abundances. All dPCR reactions were carried out on a QIAcuity Digital PCR System (Qiagen) using Nanoplates with an 8.5K partition configuration and the following cycling program: 95°C for 2 min; 40 cycles of 95°C for 30 s, 52°C for 30 s, and 72°C for 1 min; followed by 1 cycle of 40°C for 5 min. Reactions were prepared using the QIAcuity EvaGreen PCR Kit (Qiagen, Cat. No. 250111) with 2 µL of DNA template per reaction, following the manufacturer's protocol, and included a no-template negative control and a cleaned and sequenced PCR product as a positive control. Samples were measured in triplicate, and serial dilutions were performed to ensure accurate quantification. Data were processed with the QIAcuity Software Suite (v3.1.0.0). The threshold was set based on the negative and positive controls in 1D scatterplots. We report mean copy numbers per microliter with standard deviations, corrected for template input, dPCR reaction volume, and dilution factor. Mean copy numbers per tadpole were additionally calculated by accounting for the DNA extraction (elution) volume.
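
      The unit conversions described in this paragraph can be written out explicitly. The sketch below converts a dPCR concentration measured in the reaction into copies per µL of DNA extract (correcting for reaction volume, template input, and dilution), then into copies per tadpole (multiplying by the elution volume), and finally multiplies by an ASV's relative abundance to obtain its scaled absolute abundance. The function names and most numeric inputs (12 µL reaction, 1:10 dilution, 50 µL elution, measured concentration) are hypothetical placeholders rather than values from the study; only the 2 µL template volume and the 18.05% relative abundance of ASV24902 are taken from the text.

```python
def copies_per_ul_extract(conc_in_reaction, reaction_volume_ul,
                          template_volume_ul, dilution_factor=1.0):
    """Copies per uL of undiluted DNA extract from a dPCR concentration (copies/uL reaction)."""
    copies_in_reaction = conc_in_reaction * reaction_volume_ul
    # All copies in the reaction came from the template volume; undo any dilution.
    return copies_in_reaction / template_volume_ul * dilution_factor


def copies_per_tadpole(conc_extract, elution_volume_ul):
    """Total 16S rRNA copies in the whole extract, i.e. per tadpole."""
    return conc_extract * elution_volume_ul


def scaled_asv_abundance(total_copies, relative_abundance):
    """ASV-specific scaled absolute abundance = total copies x relative abundance."""
    if not 0.0 <= relative_abundance <= 1.0:
        raise ValueError("relative abundance must be a fraction between 0 and 1")
    return total_copies * relative_abundance


# Hypothetical inputs: 100 copies/uL in a 12 uL reaction with 2 uL template,
# template diluted 1:10, extract eluted in 50 uL, ASV at 18.05% relative abundance.
conc = copies_per_ul_extract(100.0, 12.0, 2.0, dilution_factor=10.0)
total = copies_per_tadpole(conc, 50.0)
asv = scaled_asv_abundance(total, 0.1805)
print(f"{conc:.1f} copies/uL extract, {total:.0f} copies per tadpole, {asv:.0f} ASV copies")
```

      Keeping each correction in its own function makes it easy to check which factor (reaction volume, dilution, or elution volume) drives a given copy-number estimate.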

      Recommendations for the authors:

      Reviewer #1:

      (1) Figure 1b summarizes the ddPCR data as a binary (detected/not detected), but this contradicts the main text associated with this figure, which describes bacteria as present, albeit in low abundances, in unhatched embryos (lines 145-147). Could the authors keep the diagram of tadpole development, which I find very useful, but add the ddPCR data from Figure S1c instead of simply binarizing it as present/absent?

      We appreciate the reviewer’s positive feedback on the clarity of the figure. We agree that presenting the ddPCR data in a more quantitative manner provides a more accurate representation of bacterial abundance across developmental stages. In response, we have retained the developmental diagram, as suggested, and replaced the binary (detected/not detected) information in Figure 1B with rounded mean values for each stage. To complement this, we have included mean values and standard deviations in Table S1. The corresponding text in the main manuscript and legends has been revised accordingly to reflect these changes.  

      (2) More information about the foster species, Oophaga sylvatica, would be helpful. Are they sympatric with Rv? Is their transporting behavior similar to that of Rv?

      We thank the reviewer for this helpful comment. In response, we have added further details on the biology and parental care behavior of Oophaga sylvatica, including information on its distribution range. The species does not overlap with Ranitomeya variabilis at the specific study site where the field work was conducted, although the species are sympatric in other countries. These additions have been incorporated into the Methods section under "Study species, reproductive strategies, and life history."  

      (3) Plotting the proportion of each tadpole microbiome attributed to R. variabilis and the proportion attributed to O. sylvatica on the same plot is confusing, as these points are non-independent and there is no way for the reader to figure out which points originated from the same tadpole. I would suggest replacing Figure 1D with Figure S2C, which (if I understand correctly) displays the same data, but is separated according to source.

      We agree with the reviewer that Figure S2C allows for clearer interpretation of our results. In response, we implemented the suggested change and replaced Figure 1D with the alternative visualization previously shown in Figure S2C, which displays the same data separated by source. To provide readers with a complementary overview of the full dataset, we have retained the original combined plot in the supplementary material as Figure S2D.

      (4) On the first read, I found the use of "transport" in the cross-fostering experiment confusing until I understood that they weren't being transported "to" anywhere in particular, just carried for 6 hours. A change of phrasing might help readers here.

      We acknowledge the reviewer’s concern and have replaced “transported” with “carried” to avoid confusion for readers who may be unfamiliar with the behavioral terminology. However, because “transport” is the term widely used by specialists to describe this behavior, we now introduce it in the context of the experimental design with the following phrasing:

      “For this design, sequence-based surveys of amplified 16S rRNA genes were used to assess the composition of skin-associated microbial communities on tadpoles and their adult caregivers (i.e., the frogs carrying the tadpoles, typically referred to as ‘transporting’ frogs).”

      (5) "Horizontal transfer" typically refers to bacteria acquired from other hosts, not environmental source pools (line 394).

      We addressed this concern by rephrasing the sentence in the Discussion to avoid potential confusion. The revised text now reads:

      “Across species, newborns might acquire bacteria not only through transfer from environmental source pools and other hosts (…)”  

      (6) The authors suggest that tadpole transport may have evolved in Rv and Af to promote microbial diversity because "increased microbial diversity is linked to better health outcomes" (lines 477-479). It is often tempting to assume that more diversity is always better/more adaptive, but this is not universally true. The fact that the Ll frogs seem to be doing fine in the same environment despite their lower microbiome diversity suggests that this interpretation might be too far of a reach based on the data here.

      We appreciate the reviewer’s concern, agree that increased microbial diversity is not inherently advantageous and have revised the paragraph to make this clearer.  

      “While increased microbial diversity is not inherently advantageous, it has been associated with beneficial outcomes such as improved immune function, lower disease risk, and enhanced fitness in multiple other vertebrate systems.”

      However, rather than claiming that greater diversity is always advantageous, we suggest that this possibility should not be excluded and consider it a relevant aspect of a comprehensive discussion. We also note that whether poison frog tadpoles perform equally well with lower microbial diversity remains an open question. Drawing such conclusions would require experimental validation and cannot be inferred from comparisons with an evolutionarily distant species that differs in life history.

      Reviewer #2:

      (1) Figure 2: Are the data points in C a subset (just the tadpoles for each species) of B? The numbers look a little different between them. The number of observed ASVs in panel B for Rv look a bit higher than the observed ASVs in panel C.

      The data shown in panel C are indeed a subset of the samples presented in panel B, focusing specifically on tadpoles of each species. The slight differences in the number of observed ASVs between panels result from differences in rarefaction depth between comparisons: due to variation in sequencing depth across species and life stages, we performed rarefaction separately for each comparison in order to retain the highest number of taxa while ensuring comparability within each group. We acknowledge that this is not a standard approach; however, results were consistent when rarefying across the full dataset, and we chose the presented approach to better accommodate variation in our sample structure. This methodological detail is described in the Methods section:

      “All alpha diversity analyses were conducted with datasets rarefied to 90% of the read number of the sample with the fewest reads in each comparison and visualized with boxplots.”

      It is also noted in the figure legend: “The dataset was separately rarefied to the lowest read depth of each comparison.” We hope this clarification adequately addresses the reviewer’s concern and therefore have not made additional changes.
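
      The per-comparison rarefaction rule quoted above (subsampling every sample to 90% of the smallest library in a given comparison) can be sketched as follows. This illustrative Python version draws reads at random without replacement; the function names, the fixed seed, and the toy ASV counts are assumptions, and the text does not specify which software was used for rarefaction.

```python
import random
from collections import Counter


def rarefaction_depth(library_sizes, fraction=0.9):
    """Depth = fraction x the smallest library in the comparison, rounded down."""
    return int(fraction * min(library_sizes))


def rarefy(asv_counts, depth, seed=None):
    """Subsample reads without replacement to `depth`; returns a Counter of ASV counts."""
    pool = [asv for asv, n in asv_counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in this sample")
    rng = random.Random(seed)
    return Counter(rng.sample(pool, depth))


def observed_asvs(rarefied):
    """Number of ASVs with at least one read after rarefaction."""
    return sum(1 for n in rarefied.values() if n > 0)


# Toy comparison with two samples (hypothetical ASV counts).
samples = {
    "tadpole_A": {"ASV1": 50, "ASV2": 30, "ASV3": 20},
    "tadpole_B": {"ASV1": 400, "ASV2": 100, "ASV4": 500},
}
depth = rarefaction_depth([sum(c.values()) for c in samples.values()])
for name, counts in samples.items():
    print(name, depth, observed_asvs(rarefy(counts, depth, seed=1)))
```

      Because each comparison sets its own depth, observed ASV counts for the same tadpole can differ between comparisons, which is the effect described for panels B and C.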

      (2) Lines 304-305: in the Figure 4B plot, there appear to be 12 transported tadpoles and 8 non-transported tadpoles.

      Thank you for catching this. We have corrected the plot and the associated statistics (alpha and beta diversity) in the results section as well as in the figure. Importantly, the correction did not affect any other results, and the overall findings and interpretations remain unchanged.  

      (3) Line 311: I think this should be Figure 4B.

      (4) Line 430: tadpole transport.

      (5) Line 431: I believe commas need to surround this phrase "which range from a few hours to several days depending on the species (Lötters et al., 2007; McDiarmid & Altig, 1999; Pašukonis et al., 2019)".

      We thank the reviewer for the thorough review and have corrected all typographical and formatting errors noted in comments (3) – (5).

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the authors): 

      One minor question would be whether the authors could expand more on the application of END-Seq to examine the processive steps of the ALT mechanism? Can they speculate if the ssDNA detected in ALT cells might be an intermediate generated during BIR (i.e., is the ssDNA displaced strand during BIR) or a lesion? Furthermore, have the authors assessed whether ssDNA lesions are due to the loss of ATRX or DAXX, either of which can be mutated in the ALT setting?

      We appreciate the reviewer’s insightful questions regarding the application of our assays to investigate the nature of the ssDNA detected in ALT telomeres. Our primary aim in this study was to establish the utility of END-seq and S1-END-seq in telomere biology and to demonstrate their applicability across both ALT-positive and -negative contexts. We agree that exploring the mechanistic origins of ssDNA would be highly informative, and we anticipate that END-seq–based approaches will be well suited for such future studies. However, it remains unclear whether the resolution of S1-END-seq is sufficient to capture transient intermediates such as those generated during BIR. We have now included a brief speculative statement in the revised discussion addressing the potential nature of ssDNA at telomeres in ALT cells.

      Reviewer #2 (Recommendations for the authors):

      How can we be sure that all telomeres are equally represented? The authors seem to assume that END-seq captures all chromosome ends equally, but can we be certain of this? While I do not see an obvious way to resolve this experimentally, I recommend discussing this potential bias more extensively in the manuscript.

      We thank the reviewer for raising this important point. END-seq and S1-END-seq are unbiased methods designed to capture either double-stranded or single-stranded DNA that can be converted into blunt-ended double-stranded DNA and ligated to a capture oligo. As such, if a subset of telomeres cannot be processed using this approach, it is possible that these telomeres may be underrepresented or lost. However, to our knowledge, there are no proposed telomeric structures that would prevent capture using this method. For example, even if a subset of telomeres possesses a 5′ overhang, it would still be captured by END-seq. Indeed, we observed the consistent presence of the 5′-ATC motif across multiple cell lines and species (human, mouse, and dog). More importantly, we detected predictable and significant changes in sequence composition when telomere ends were experimentally altered, either in vivo (via POT1 depletion) or in vitro (via T7 exonuclease treatment). Together, these findings support the robustness of the method in capturing a representative and dynamic view of telomeres across different systems.

      That said, we have now included a brief statement in the revised discussion acknowledging that we cannot fully exclude the possibility that a subset of telomeres may be missed due to unusual or uncharacterized structures.

      I believe Figures 1 and 2 should be merged.

      We appreciate the reviewer’s suggestion to merge Figures 1 and 2. However, we feel that keeping them as separate figures better preserves the logical flow of the manuscript and allows the validation of END-seq and its application to be presented with appropriate clarity and focus. We hope the reviewer agrees that this layout enhances the clarity and interpretability of the data.

      Scale bars should be added to all microscopy figures.

      We thank the reviewer for pointing this out. We have now added scale bars to all the microscopy panels in the figures and included the scale details in the figure legends.

      Reviewer #3 (Recommendations for the authors):

      Overall, the discussion section is lacking depth and should be expanded and a few additional experiments should be performed to clarify the results.

      We thank the reviewer for the suggestions. Based on this reviewer’s comments and those of the other reviewers, we incorporated several points into the discussion, which we hope add depth to our conclusions.

      (1) The finding that the abundance of variant telomeric repeats (VTRs) within the final 30 nucleotides of the telomeric 5' ends is similar in both telomerase-expressing and ALT cells is intriguing, but the authors do not address this result. Could the authors provide more insight into this observation and suggest potential explanations? As the frequency of VTRs does not seem to be upregulated in POT1-depleted cells, what then drives the appearance of VTRs on the C-strand at the very end of telomeres? Is CST-Pola complex responsible?

      The reviewer raises a very interesting and relevant point. We are hesitant at this point to speculate on why we do not see a difference in variant repeats in ALT versus non-ALT cells, since additional data would be needed. One possibility is that variant repeats in ALT cells accumulate stochastically within telomeres but are selected against when they are present at the terminal portion of chromosome ends. However, to prove this hypothesis, we would need error-free long-read technology combined with END-seq. We feel that developing this approach would be beyond the scope of this manuscript.

      (2) The authors also note that, in ALT cells, the frequency of VTRs in the first 30 nucleotides of the S1-END-SEQ reads is higher compared to END-SEQ, but this finding is not discussed either. Do the authors think that the presence of ssDNA regions is associated with the VTRs? Along this line, what is the frequency of VTRs in the END-SEQ analysis of TRF1-FokI-expressing ALT cells? Is it also increased? Has TRF1-FokI been applied to telomerase-expressing cells to compare VTR frequencies at internal sites between ALT and telomerase-expressing cells?

      Similarly to what is discussed above, short reads have the advantage of being very accurate but do not provide sufficient length to establish the relative frequency of VTRs across the whole telomere sequence. The TRF1-FokI experiment is a good suggestion, but it would still be biased toward non-variant repeats due to the TRF1-binding properties. We plan to address these questions in a future study involving long-read sequencing and END-seq capture of telomeres.

      Finally, in these experiments (S1-END-SEQ or END-SEQ in TRF1-Fok1), is the frequency of VTRs the same on both the C- and the G-rich strands? It is possible that the sequences are not fully complementary in regions where G4 structures form.

      We thank the reviewer for this observation. While we do observe a higher frequency of variant telomeric repeats (VTRs) in the first 30 nucleotides of S1-END-seq reads compared to END-seq in ALT cells, we are currently unable to determine whether this difference is significant, as an appropriate control or matched normalization strategy for this comparison is lacking. Therefore, we refrain from overinterpreting the biological relevance of this observation.
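
      For readers unfamiliar with this type of measurement, the sketch below shows one simple way to score variant repeats in the first 30 nucleotides of a read. It assumes reads are oriented 5′→3′ from the end of the telomeric C-strand (canonical repeat CCCTAA, the complement of TTAGGG), tries all six repeat phases, keeps the phase with the most canonical hexamers, and counts every other full hexamer in that phase as a variant. The phasing rule, the read orientation, and the function names are assumptions for illustration only; the VTR-calling procedure used in the study is not described here, and a realistic analysis would also need to handle indels and sequencing errors.

```python
CANONICAL_C = "CCCTAA"  # telomeric C-strand repeat (5'->3'), complement of TTAGGG


def terminal_vtr_counts(read, window=30, repeat=CANONICAL_C):
    """Count variant hexamers in the first `window` nt of a C-strand read.

    Tries all six phases and keeps the one with the most canonical repeats.
    Returns (variant_hexamers, total_hexamers) for that phase, or (0, 0) if
    no canonical repeat is found in any phase.
    """
    region = read[:window].upper()
    k = len(repeat)
    best = (0, 0)  # (canonical, total) for the best phase so far
    for offset in range(k):
        hexamers = [region[i:i + k] for i in range(offset, len(region) - k + 1, k)]
        canonical = sum(h == repeat for h in hexamers)
        if canonical > best[0]:
            best = (canonical, len(hexamers))
    canonical, total = best
    return total - canonical, total


def variant_fraction(reads, window=30):
    """Pooled fraction of variant hexamers across reads (NaN if nothing was scored)."""
    variants = total = 0
    for read in reads:
        v, t = terminal_vtr_counts(read, window)
        variants += v
        total += t
    return variants / total if total else float("nan")


# Hypothetical reads: one with a CCCTGA variant at the very end, one fully canonical.
reads = ["CCCTGA" + "CCCTAA" * 4, "CTAA" + "CCCTAA" * 5]
print([terminal_vtr_counts(r) for r in reads], variant_fraction(reads))
```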

      (3) Based on the ratio of C-rich to G-rich reads in the S1-END-SEQ experiment, the authors estimate that ALT cells contain at least 3-5 ssDNA regions per chromosome end. While the calculation is understandable, this number could be discussed further to consider the possibility that the observed ratios (of roughly 0.5) might result from the presence of extrachromosomal DNA species, such as C-circles. The observed increase in the ratio of C-rich to G-rich reads in BLM-depleted cells supports this hypothesis, as BLM depletion suppresses C-circle formation in U2OS cells. To test this, the authors should examine the impact of POLD3 depletion on the C-rich/G-rich read ratio. Alternatively, they could separate high-molecular-weight (HMW) DNA from low-molecular-weight DNA in ALT cells and repeat the S1-END-SEQ in the HMW fraction.

      The reviewer is absolutely correct. Our calculation did not exclude the possibility of extrachromosomal DNA as a source of telomeric ssDNA. We have now addressed this point in our discussion.

      (4) What is the authors' perspective on the presence of ssDNA at ALT telomeres? Do they attribute this to replication stress? It would be helpful for the authors to repeat the S1-END-SEQ in telomerase-expressing cells with very long telomeres, such as HeLa1.3 cells, to determine if ssDNA is a specific feature of ALT cells or a result of replication stress. The increased abundance of G4 structures at telomeres in HeLa1.3 cells (as shown in J. Wong's lab) may indicate that replication stress is a factor. Similar to Wong's work, it would be valuable to compare the C-rich/G-rich read ratios in HeLa1.3 cells to those in ALT cells with similar telomeric DNA content.

      The reviewer is correct in pointing out that we still do not know what causes ssDNA at telomeres in ALT cells. Replication stress seems the most logical explanation based on the work of many labs in the field. However, our data did not reveal any significant difference in the levels of ssDNA at telomeres in non-ALT cells based on telomere length. We used the HeLa1.2.11 cell line (now clarified in the Materials section), which is the parental line of HeLa1.3 and has similarly long telomeres (~20 kb vs. ~23 kb). Despite their long telomeres and potential for replication-associated challenges such as G-quadruplex formation, HeLa1.2.11 cells did not exhibit the elevated levels of telomeric ssDNA that we observed in ALT cells (Figure 4B). Additional experiments are needed to map the occurrence of ssDNA at telomeres in relation to progression toward ALT.

      Finally, Reviewer #3 raises a list of minor points:

      (1) The Y-axes of Figure 4 have been relabeled to account for the G-strand reads.

      (2) Statistical analyses have been added to the figures where applicable.

      (3) The manuscript has been carefully proofread to improve clarity and consistency throughout the text and figure legends.

      (4) We have revised the text to address issues related to the lack of cross-referencing between the supplementary figures and their corresponding legends.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      Genome-wide association studies have been an important approach to identifying the genetic basis of human traits and diseases. Despite their successes, for many traits, a substantial amount of variation cannot be explained by genetic factors, indicating that environmental variation and individual 'noise' (stochastic differences as well as unaccounted for environmental variation) also play important roles. The authors' goal was to address whether gene expression variation in genetically identical individuals, driven by historical environmental differences and 'noise', could be used to predict reproductive trait differences. 

      Strengths: 

      To address this question, the authors took advantage of genetically identical C. elegans individuals to transcriptionally profile 180 adult hermaphrodites that were also measured for two reproductive traits. A major strength of the paper is its experimental design. While experimenters aim to control the environment of each worm, individual worms are known to experience small environmental differences even when grown together on the same agar plate, e.g., the age of their mother, their temperature, the amount of food they eat, and the oxygen and carbon dioxide levels depending on where they roam on the plate. Instead of neglecting this unknown variation, the authors designed the experiment up front to create two differences in the historical environment experienced by each worm: 1) the age of its mother and 2) an 8-hour temperature difference, at either 20 or 25 °C. This helped the authors interpret the gene expression and trait differences that they observed.

      Using two statistical models, the authors measured the association of gene expression for 8824 genes with the two reproductive traits, considering both the level of expression and the historical environment experienced by each worm. Their data support several conclusions. They convincingly show that gene expression differences are useful for predicting reproductive trait differences, accounting for ~25-50% of the trait differences depending on the trait. Using RNAi, they also show that the genes they identify play a causal role in trait differences. Finally, they demonstrate an association between trait variation and the H3K27 trimethylation mark, suggesting that chromatin structure can be an important causal determinant of gene expression and trait variation.

      Overall, this work supports the use of gene expression data as an important intermediate for understanding complex traits. This approach is also useful as a starting point for other labs in studying their trait of interest. 

      We thank the reviewer for their thorough articulation of the strengths of our study.  

      Weaknesses: 

      There are no major weaknesses that I have noted. Some important limitations of the work (that I believe the authors would agree with) are worth highlighting, however: 

      (1) A major remaining question in the field of complex traits is how to split the role of non-genetic factors between environmental variation and stochastic noise. It is still an open question what role each of these factors plays in controlling the gene expression differences they measured between the individual worms. 

      Yes, we agree that this is a major question in the field. In our study, we separate differences driven by known historical environmental factors from those driven by unknown factors, but the ‘unknown factors’ could encompass both unknown environmental factors and stochastic noise.

      (2) The ability of the authors to use gene expression to predict trait variation was strikingly different between the two traits they measured. For the early brood trait, 448 genes were statistically linked to the trait difference, while for egg-laying onset, only 11 genes were found. Similarly, the total R2 in the test set was ~50% vs. 25%. It is unclear why the differences occur, but this somewhat limits the generalizability of this approach to other traits. 

      We agree that the difference in predictability between the two traits is interesting. A previous study from the Phillips lab measured developmental rate and fertility across Caenorhabditis species and parsed sources of variation (1). Results indicated that 83.3% of variation in developmental rate was explained by genetic variation, while only 4.8% was explained by individual variation. In contrast, for fertility, 63.3% of variation was driven by genetic variation and 23.3% was explained by individual variation. Our results, of course, focus only on predicting the individual differences, but not genetic differences, for these two traits using gene expression data. Considering both sets of results, one hypothesis is that we have more power to explain nongenetic phenotypic differences with molecular data if the trait is less heritable, which is something that could be formally interrogated with more traits across more strains.

      (3) For technical reasons, this approach was limited to whole-worm transcription. The role of tissue and cell-type expression differences is important to the field, so this limitation is significant. 

      We agree with this assessment, and it is something we hope to address with future work.

      Reviewer #2 (Public review): 

      Summary: 

      This paper measures associations between RNA transcript levels and important reproductive traits in the model organism C. elegans. The authors not only determine which gene expression differences underlie reproductive traits, but also (1) build a model that predicts these traits based on gene expression and (2) perform experiments to confirm that some transcript levels indeed affect reproductive traits. The clever study design allows the authors to determine which transcript levels impact reproductive traits, and also which transcriptional differences are driven by stochastic vs environmental differences. In sum, this is a rather comprehensive study that highlights the power of gene expression as a driver of phenotype, and also teases apart the various factors that affect the expression levels of important genes. 

      Strengths: 

      Overall, this study has many strengths, is very clearly communicated, and has no substantial weaknesses that I can point to. One question that emerges for me is about the extent to which these findings apply broadly. In other words, I wonder whether gene expression levels are predictive of other phenotypes in other organisms. I think this question has largely been explored in microbes, where some studies (PMID: 17959824) but not others (PMID: 38895328) find that differences in gene expression are predictive of phenotypes like growth rate. Microbes are not the primary focus here, and instead, the discussion is mainly focused on using gene expression to predict health and disease phenotypes in humans. This feels a little complicated since humans have so many different tissues. Perhaps an area where this approach might be useful is in examining infectious single-cell populations (bacteria, tumors, fungi). But I suppose this idea might still work in humans, assuming the authors are thinking about targeting specific tissues for RNAseq. 

      In sum, this is a great paper that really got me thinking about the predictive power of gene expression and where/when it could inform about (health-related) phenotypes. 

      We thank the reviewer for recognizing the strengths of our study. We are also interested in determining the extent to which predictive gene expression differences operate in specific tissues.

      Reviewer #3 (Public review): 

      Summary: 

      Webster et al. sought to understand if phenotypic variation in the absence of genetic variation can be predicted by variation in gene expression. To this end they quantified two reproductive traits, the onset of egg laying and early brood size, in cohorts of genetically identical nematodes exposed to alternative ancestral (two maternal ages) and same-generation life histories (either a constant 20°C or an 8-hour shift to 25°C upon hatching) in a two-factor design; then they profiled genome-wide gene expression in each individual. 

      Using multiple statistical and machine learning approaches, they showed that, at least for early brood size, phenotypic variation can be quite well predicted by molecular variation, beyond what can be predicted by life history alone. 

      Moreover, they provide some evidence that expression variation in some genes might be causally linked to phenotypic variation. 

      Strengths: 

      (1) Cleverly designed and carefully performed experiments that provide high-quality datasets useful for the community. 

      (2) Good evidence that phenotypic variation can be predicted by molecular variation. 

      We thank the reviewer for recognizing the strengths of our study.

      Weaknesses:  

      What drives the molecular variation that impacts phenotypic variation remains unknown. While the authors show that variation in expression of some genes might indeed be causal, it is still not clear how much of the molecular variation is a cause rather than a consequence of phenotypic variation. 

      We agree that the drivers of molecular variation remain unknown. While we addressed one potential candidate (histone modifications), there is much to be done in this area of research. We agree that, while some gene expression differences cause phenotypic changes, other gene expression differences could in principle be downstream of phenotypic differences.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      I have a number of suggestions that I believe will improve the Methods section. 

      (1) Strain N2-PD1073 will probably be confusing to some readers. I recommend spelling out that this is the Phillips lab version of N2.

      Thank you for this suggestion; we have added additional explanation of this strain in the Methods.

      (2) I found the details of the experimental design confusing, and I believe a supplemental figure will help. I have listed the following points that could be clarified: 

      a. What were the biological replicates? How many worms per replicate?

      Biological replicates were defined as experiments set up on different days (in this case, all biological replicates were at least a week apart), and the biological replicate of each worm can be found in Supplementary File 1 on the Phenotypic Data tab.

      b. I believe that embryos and L4s were picked to create different aged P0s, and eggs and L4s were picked to separate plates? Is this correct?

      Yes, this is correct.

      c. What was the spread in the embryo age?

      We assume this is asking about the age of the F1 embryos, and these were laid over the course of a 2-hour window.  

      d. While the age of the parents is different, there are also features about their growth plates that will be impacted by the experimental design. For example, their pheromone exposure is different due to the role that age plays in the combination of ascarosides that are released. It is worth noting as my reading of the paper makes it seem that parental age is the only thing that matters.

      The parents (P0) of different ages likely have differential ascaroside exposure because they are in the vicinity of other similarly aged worms, but the F1 progeny were exposed to their parents for only the 2-hour egg-laying window, in an attempt to minimize this type of effect as much as possible.  

      e. Were incubators used for each temperature?

      Yes.

      f. In line 443, why approximately for the 18 hours? How much spread?

      The approximation was based on the time interval between the 2-hour egg-laying window on Day 4 and the temperature shift on Day 5 the following morning. The timing was within 30 minutes of 18 hours in either direction.

      g.  In line 444, "continually left" is confusing. Does this mean left in the original incubator?

      Yes, this means left in the incubator while the worms shifted to 25°C were moved. To avoid confusion, we re-worded this to state they “remained at 20°C while the other half were shifted to 25°C”.

      h. In line 445, "all worms remained at 20°C" was confusing to me as to what it indicated. I assume, unless otherwise noted, the animals would not be moved to a new temperature.

      This was an attempt to avoid confusion and emphasize that all worms were experiencing the same conditions for this part of the experiment.  

      i. What size plates were the worms singled onto?

      They were singled onto 6-cm plates.

      j. If a figure were to be made, having two timelines (with respect to the P0 and F1) might be useful.

      We believe the methods should be sufficient for someone who hopes to repeat the experiment, and we believe the schematic in Figure 1A labeling P0 and F1 generations is sufficient to illustrate the key features of the experimental design.

      k. Not all eggs that are laid end up hatching. Are these censored from the number of progeny calculations?

      Yes, only progeny that hatched and developed were counted for early brood.

      (3) For the lysis, was the second transfer to dH2O also a wash step?

      Yes.

      (4) What was used for the Elution buffer?

      We used elution buffer consisting of 10 mM Tris, 0.1 mM EDTA. We have added this to the “Cell lysate generation” section of the methods.

      (5) The company that produced the KAPA mRNA-seq prep kit should be listed.

      We added that the kit was from Roche Sequencing Solutions.

      (6) For the GO analysis - one potential issue is that the set of 8824 genes might also be restricted to specific GO categories. Was this controlled for?

      We originally did not explicitly control for this and used the default enrichGO settings with OrgDB = org.Ce.eg.db as the background set for C. elegans. We have now repeated the analysis with the “universe” set to the 8824-gene background set. This did not qualitatively change the significant GO terms, though some have slightly higher or lower p-values. For comparison purposes, we have added the background-corrected sets to the GO_Terms tab of Supplementary File 1 with each of the three main gene groups appended with “BackgroundOf8824”.
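      For readers less familiar with how the background set enters a GO test, the sketch below shows a one-sided hypergeometric over-representation test, which, as we understand it, is the kind of test that tools such as enrichGO perform. It is a minimal standard-library Python illustration, not the clusterProfiler implementation. The function name and every count in the example are hypothetical. Restricting the universe to the 8824 analyzed genes raises the expected number of annotated genes in the query list, so the same observed overlap becomes less extreme.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided over-representation p-value, P(X >= k).

    X ~ Hypergeometric(N, K, n), where
      N: genes in the background universe
      K: universe genes annotated to the GO term
      n: query genes (assumed to be a subset of the universe)
      k: query genes annotated to the GO term
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)  # exact integer sums, one final division


# Hypothetical counts: 20 of 448 query genes carry some GO term.
p_genome = hypergeom_enrichment_p(20, 448, K=200, N=20000)    # genome-wide background
p_expressed = hypergeom_enrichment_p(20, 448, K=150, N=8824)  # 8824-gene background
print(f"{p_genome:.2e} vs {p_expressed:.2e}")
```

      With these made-up numbers the expected overlap is about 4.5 genes against the genome-wide background but about 7.6 against the 8824-gene background. The second p-value is therefore larger, even though the observed overlap is the same.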

      Reviewer #2 (Recommendations for the authors): 

      (1) The abstract, introduction, and experimental design are well thought through and very clear.

      Thank you.

      (2) Figure 1B could use a clearer or more intuitive label on the horizontal axis. The two examples help. Maybe the genes (points) on the left side should be blue to match Figure 1C, where the genes with a negative correlation are in the blue cluster.

      Thank you for these suggestions. We re-labeled the x-axis as “Slope of early brood vs. gene expression (normalized by CPM)”, which we hope gives readers a better intuition of what the coefficient from the model is measuring. We also re-colored the points previously colored red in Figure 1B to be color-coded depending on the direction of association to match Figure 1C, so these points are now color-coded as pink and purple.  

      (3) If red/blue are pos/neg correlated genes in 1C, perhaps different colors should be used to label ELO and brood in Figures 2 and 3. Green/purple?

      We appreciate this point, but since we ended up using the cluster colors of pink and purple in Figure 1, we opted to leave Figures 2 and 3 alone with the early brood and ELO color-coding of red and blue.

      (4) I am unfamiliar with this type of beta values, but I thought the explanation and figure were very clear. It could be helpful to bold beta1 and beta2 in the top panels of Figure 2, so readers are not searching for them among all the other betas. It could also be helpful to add an English phrase to the vertical axes in Figures 2C and 2D, in addition to beta1 and beta2. Something like "overall effect (beta1)" and "environment-controlled effect (beta2)". Or maybe "effect of environment + stochastic expression differences (beta1)" and "effect of stochastic expression differences alone (beta2)". I guess those are probably too big to fit on the figure, but it might be nice to have a label somewhere on this figure connecting them to the key thing you are trying to measure: the effect of gene expression and environment.

      Thank you for these suggestions. We increased the font sizes and bolded β1 and β2 in Figure 2A-B. In Figure 2C-D, we added a parenthetical under β1 to say “(env + noise)” and β2 to say “(noise)”. We agree that this should give the reader more intuition about what the β values are measuring.  
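      To make the distinction between β1 (env + noise) and β2 (noise) concrete, the sketch below fits one gene's expression against a trait twice: once without covariates (β1) and once after adjusting for historical environment (β2). It is a simplification built on stated assumptions. It uses ordinary least squares rather than the linear mixed models in the paper, treats each historical-environment combination as a fixed group, and ignores biological-replicate effects. Adjusting for group indicators is done by demeaning within groups. By the Frisch-Waugh-Lovell theorem this gives the same slope as including the indicators as covariates. The toy data are invented.

```python
from statistics import fmean


def slope(x, y):
    """Ordinary least-squares slope of y on x (model includes an intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def demean_within_groups(values, groups):
    """Subtract each group's mean, i.e. residualize on group indicator variables."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def beta1_beta2(expr, trait, env):
    """beta1: unadjusted slope; beta2: slope after adjusting for environment group."""
    b1 = slope(expr, trait)
    b2 = slope(demean_within_groups(expr, env), demean_within_groups(trait, env))
    return b1, b2


# Invented data: expression is higher in environment "B", which also raises the
# trait by 3 units; within each environment the trait rises 0.5 per unit expression.
expr = [1, 2, 3, 4, 5, 6]
env = ["A", "A", "A", "B", "B", "B"]
trait = [0.5 * x + (3 if e == "B" else 0) for x, e in zip(expr, env)]
b1, b2 = beta1_beta2(expr, trait, env)
print(b1, b2)
```

      In this toy case, β1 (about 1.27) absorbs the environment effect, while β2 recovers the within-environment slope of 0.5. This mirrors the pattern of β2 < β1 described for significant genes.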

      Reviewer #3 (Recommendations for the authors): 

      The authors collected individuals 24 hours after the onset of egg laying for transcriptomic profiling. This is a well-designed experiment to control for the physiological age of the germline. However, this does not properly control for somatic physiological age. Somatic age can be partially uncoupled from germline age across individuals, and indeed, this can be due to differences in maternal age (Perez et al, 2017). This is because maternal age is associated with increased pheromone exposure (unless you properly controlled for it by moving worms to fresh plates), which causes a germline-specific developmental delay in the progeny, resulting in a delayed onset of egg production compared to somatic development (Perez et al. 2021). You control for germline age, therefore, it is likely that the progeny of day 1 mothers are actually somatically older than the progeny of day 3 mothers. This would predict that many genes identified in these analyses might just be somatic genes that increase or decrease their expression during the young adult stage. 

      For example, the abundance of collagen genes among the genes negatively associated (including col-20, which is the gene most significantly associated with early brood) is a big red flag, as collagen genes are known to be changing dynamically with age. If variation in somatic vs germline age is indeed what is driving the expression variation of these genes, then the expectation is that their expression should decrease with age. Vice versa, genes positively associated with early brood that are simply explained by age should be increasing.  So I would suggest that the authors first check this using time series transcriptomic data covering the young adult stage they profiled. If this is indeed the case, I would then suggest using RAPToR ( https://github.com/LBMC/RAPToR ), a method that, using reference time series data, can estimate physiological age (including tissue-specific one) from gene expression. Using this method they can estimate the somatic physiological age of their samples, quantify the extent of variation in somatic age across individuals, quantify how much of the observed differences in expressions are explained just by differences in somatic age and correct for them during their transcriptomic analysis using the estimated soma age as a covariate (https://github.com/LBMC/RAPToR/blob/master/vignettes/RAPToR-DEcorrection-pdf.pdf). 

      This should help enrich for molecular variation that is not simply driven by hidden differences between somatic and germline age. 

      To first address some of the experimental details mentioned for our paper, parents were indeed moved to fresh plates where they were allowed to lay embryos for two hours and then removed. Thus, we believe this minimizes the effects of ascarosides as much as possible within our design. As shown in the paper, we also identified genes that were not driven by parental age, and for all genes we quantified to what extent each gene’s association was driven by parental age. Thus, it is unlikely that differences between somatic and germline age are the sole explanatory factor, even if they play some role. We also note that we accounted for egg-laying onset timing in our experimental design. Early brood was calculated as the number of progeny laid in the first 24 hours of egg-laying, where egg-laying onset was scored for each individual worm to the hour. The plot of each worm’s ELO and early brood traits is in Figure S1. Nonetheless, we read the RAPToR paper with interest, as we highlighted in the paper that germline genes tend to be positively associated with early brood while somatic genes tend to be negatively associated. The RAPToR paper discusses using tissue-specific gene sets to stage genetically diverse C. elegans RILs. However, the RAPToR reference itself was not built using gene expression data acquired from different C. elegans tissues; it is based on whole worms, typically collected in bulk. That is, when the aging clock is based on N2 samples, age estimates in RILs differ depending on whether germline or somatic gene sets are used to estimate age. Thus, it is unclear whether such an approach would work similarly to estimate age in single-worm N2 samples. In addition, from what we can tell, the RAPToR R package appears to implement the overall age estimate, rather than the tissue-specific gene sets used for RILs in the paper. 

      Because RAPToR would be estimating the overall age of our samples using a reference that is based on fewer samples than we collected here, and because we already know the overall age of our samples measured using standard approaches, we believe that estimating the age with the package would not give very much additional insight.  

      Bonferroni correction: 

      First, I think there is some confusion in how the authors report their p-values: I don't think the authors are using a cut-off of Bonferroni-corrected p-value of 5.7 × 10⁻⁶ (it wouldn't make sense). It's more likely that they are using a Bonferroni-corrected p of 0.05 or 0.1, which corresponds to a nominal p-value of 5.7 × 10⁻⁶, am I right?

      Yes, we used a nominal p-value of 5.7 × 10⁻⁶ to correspond to a Bonferroni-corrected p-value of 0.05, calculated as 0.05/8824. We have re-worded this wherever Bonferroni correction was mentioned.

      Second, Bonferroni is an overly stringent correction method that has now largely been replaced by the more powerful Benjamini-Hochberg method to control the false discovery rate. Using this might help find more genes and better characterize the molecular variation, especially the one associated with ELO?

      We agree that Bonferroni is quite stringent and because we were focused on identifying true positives, we may have some false negatives. Because all nominal p-values are included in the supplement, it is straightforward for an interested reader to search the data to determine if a gene is significant at any other threshold.   
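      For readers who want to re-threshold the nominal p-values in the supplement, the sketch below computes the Bonferroni cutoff used in the paper (0.05/8824). It also implements the Benjamini-Hochberg step-up procedure the reviewer mentions. It is a minimal standard-library Python illustration, and the ten toy p-values are invented. For real analyses an established implementation such as R's `p.adjust(p, method = "BH")` is preferable.

```python
def bonferroni_threshold(alpha: float, m: int) -> float:
    """Nominal p-value cutoff that controls the family-wise error rate at alpha."""
    return alpha / m


def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            k_max = rank  # largest rank whose p-value meets its BH bound
    return sorted(order[:k_max])


print(bonferroni_threshold(0.05, 8824))  # about 5.67e-06, the paper's cutoff

# Invented p-values for ten hypothetical genes.
ps = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bonf = [i for i, p in enumerate(ps) if p <= bonferroni_threshold(0.05, len(ps))]
print(bonf, benjamini_hochberg(ps))
```

      On the toy list, Bonferroni keeps only the first gene while Benjamini-Hochberg keeps the first two. This illustrates the trade-off the authors describe between fewer false positives and more false negatives.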

      Minor comments: 

      (1) "In our experiment, isogenic adult worms in a common environment (with distinct historical environments) exhibited a range of both ELO and early brood trait values (Fig S1A)" I think this and the figure is not really needed, Figure S1B is already enough to show the range of the phenotypes and how much variation is driven by the life history traits.

      We agree that the information in S1A is also included in S1B, but we think it is a little more straightforward if one is primarily interested in viewing the distribution for a single trait.

      (2) Line 105 It should be Figure S2, not S3.

      Thank you for catching this mistake.

      (3) Gene Ontology on positive and negatively associated genes together: what about splitting the positive and negative?

      We have added a split of positive and negative GO terms to the GO_Terms tab of Supplementary File 1. Broadly speaking, the most enriched positively associated genes have many of the same GO terms found on the combined list that are germline related (e.g., involved in oogenesis and gamete generation), whereas the most enriched negatively associated genes have GO terms found on the combined list that are related to somatic tissues (e.g., actin cytoskeleton organization, muscle cell development). This is consistent with the pattern we see for somatic and germline genes shown in Figure 4.

      (4) A lot of muscle-related GOs, can you elaborate on that?

      Yes, there are several muscle-related GOs in addition to germline and epidermis. While we do not know exactly why from a mechanistic perspective these muscle-related terms are enriched, it may be important to note that many of these terms have highly overlapping sets of genes which are listed in Supplementary File 1. For example, “muscle system process” and “muscle contraction” have the exact same set of 15 genes causing the term to be significantly enriched. Thus, we tend to not interpret having many GO terms on a given tissue as indicating that the tissue is more important than others for a given biological process. While it is clear there are genes related to muscle that are associated with early brood, it is not yet clear that the tissue is more important than others.  

      (5) "consistent with maternal age affecting mitochondrial gene expression in progeny " - has this been previously reported?

      We do not believe this particular observation has been reported. It is important to note that these genes are involved in mitochondrial processes, but are expressed from the nuclear rather than mitochondrial genome. We re-worded the quoted portion of the sentence to say “consistent with parental age affecting mitochondria-related gene expression in progeny”.

      (6) PCA: "Therefore, the optimal number of PCs occurs at the inflection points of the graph, which is after only 7 PCs for early brood (R2 of 0.55) but 28 PCs for ELO (R2 of 0.56)." 

      Not clear how this is determined: just graphically? If yes, there are several inflection points in the plot. How did you choose which one to consider? Also, a lower-ranked component is not necessarily less predictive of phenotypic variation (as you can see from the graph). So instead of adding components in order of the variance they explain in the transcriptomic data, you might add them in order of the variance they explain in the phenotypic data. To this end, have you tried partial least squares regression instead of PCA? This should give gene expression components that are ranked based on how much phenotypic variance they explain.  

      Thank you for this thoughtful comment. We agree that, unlike for Figure 3B, there is some interpretation involved on how many PCs is optimal because additional variance explained with each PC is not strictly decreasing beyond a certain number of PCs. Our assessment was therefore made both graphically and by looking at the additional variance explained with each additional PC. For example, for early brood, there was no PC after PC7 that added more than 0.04 to the R2. We could also have plotted early brood and ELO separately and had a different ordering of PCs on the x-axis. By plotting the data this way, we emphasized that the factors that explain the most variation in the gene expression data typically explain most variation in the phenotypic data.  
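      The numerical part of the criterion described above can be written as a short function. Given cumulative test-set R2 values for the first 1, 2, ... PCs, it returns the smallest number of PCs after which no additional PC adds more than 0.04 to R2. This is our hedged reading of the stated rule, not the authors' code. The function name is hypothetical, and the R2 curve in the example is invented. The curve includes a dip because test-set R2 need not increase monotonically.

```python
def optimal_pc_count(cumulative_r2, min_gain=0.04):
    """Smallest k such that no PC beyond the k-th adds more than min_gain to R^2.

    cumulative_r2[j] is the test-set R^2 of a model using the first j + 1 PCs.
    Returns 0 if no PC (including the first) adds more than min_gain.
    """
    gains = [cumulative_r2[0]]
    gains += [later - earlier for earlier, later in zip(cumulative_r2, cumulative_r2[1:])]
    k = 0
    for n_pcs, gain in enumerate(gains, start=1):
        if gain > min_gain:
            k = n_pcs  # the last PC that still adds a meaningful amount
    return k


# Invented cumulative R^2 curve (note the dip at PC10).
r2_curve = [0.20, 0.30, 0.33, 0.41, 0.44, 0.47, 0.53, 0.54, 0.55, 0.53, 0.56]
print(optimal_pc_count(r2_curve))
```

      A rule like this makes the choice reproducible. The 0.04 gain threshold is still a judgment call, as is the ordering of PCs by transcriptomic variance that the reviewer questions.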

      (7) The fact that there are 7 PC of molecular variation that explain early brood is interesting. I think the authors can analyze this further. For example, could you perform separate GO enrichment for each component that explains a sizable amount of phenotypic variance? Same for the ELO.  

      Because each gene has a loading for each PC, and each PC alone lacks the explanatory power of the combined PCs, we believe performing GO term enrichment on the list of genes that contribute most to each PC is of minimal utility. The power of the PCA prediction approach is that it uses the entire transcriptome, but the other side of the coin is that it is perhaps less suited to a gene-by-gene analysis. This is why we separately performed individual gene associations and 10-gene predictive analyses. However, we have added the PC loadings for all genes and all PCs to Supplementary File 1.

      (8) Avoid acronyms when possible (i.e. ELO in figures and figure legends could be spelled out to improve readability).

      We appreciate this point, but because we introduced the acronym both in Figure 1 and the text and use it frequently, we believe the reader will understand this acronym. Because it is sometimes needed (especially in dense figures), we think it is best to use it consistently throughout the paper.

      (9) Multiple regression: I see the most selected gene is col-20, which is also the most significantly differentially expressed from the linear mixed model (LMM). But what is the overlap between the top 300 genes in Figure 3F and the 448 identified by the LMM? And how much is the overlap in GO enrichment?

      Genes that showed up in at least 4 out of 500 iterations were selected more often than expected by chance, which includes 246 genes (as indicated by the red line in Figure 3F). Of these genes, 66 genes (27%) are found in the set of 448 early brood genes. The proportion of overlap increases as the number of iterations required to consider a gene predictive increases, e.g., 34% of genes found in 5 of 500 iterations and 59% of genes found in 10 of 500 iterations overlap with the 448 early brood genes. However, likely because of the approach to identify groups of 10 genes that are predictive, we do not find significant GO terms among the 246 genes identified with this approach after multiple test correction. We think this makes sense because the LMM identifies genes that are individually associated with early brood, whereas each subsequent gene included in multiple regression affects early brood after controlling for all previous genes. These additional genes added to the multiple regression are unlikely to have similar patterns as genes that are individually correlated with early brood.  
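      The contrast drawn above can be illustrated with greedy forward selection. On one side are genes that are individually associated with early brood. On the other are genes added to a regression after controlling for previously chosen genes. The sketch below is an assumption about how such a fixed-size gene model might be built, not the authors' code. At each step it adds the gene whose component orthogonal to the already chosen genes most reduces the residual sum of squares. It omits the 500 iterations and any train/test split. The function name and toy data are hypothetical.

```python
def forward_select(genes, trait, n_select):
    """Greedy forward selection of gene indices for an OLS model with intercept.

    genes: list of expression vectors (one per gene, same length as trait).
    At each step, pick the gene that most reduces the residual sum of squares
    given the genes already chosen.
    """
    n = len(trait)

    def center(v):
        m = sum(v) / n
        return [a - m for a in v]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    residual = center(trait)
    basis = []   # orthonormal vectors spanning the centered chosen genes
    chosen = []
    for _ in range(n_select):
        best, best_gain, best_q = None, 0.0, None
        for g, x in enumerate(genes):
            if g in chosen:
                continue
            q = center(x)
            for b in basis:  # remove the part already explained by chosen genes
                c = dot(q, b)
                q = [qi - c * bi for qi, bi in zip(q, b)]
            norm = dot(q, q) ** 0.5
            if norm < 1e-10:  # collinear with chosen genes: nothing new to add
                continue
            q = [qi / norm for qi in q]
            gain = dot(q, residual) ** 2  # drop in residual sum of squares
            if gain > best_gain:
                best, best_gain, best_q = g, gain, q
        if best is None:
            break
        chosen.append(best)
        basis.append(best_q)
        c = dot(best_q, residual)
        residual = [r - c * qi for r, qi in zip(residual, best_q)]
    return chosen


# Hypothetical data: the trait equals gene0 + 3 * gene1 exactly.
genes = [[1, 2, 3, 4, 5, 6], [1, 0, 1, 0, 1, 0], [2, 1, 0, 3, 1, 2]]
trait = [g0 + 3 * g1 for g0, g1 in zip(genes[0], genes[1])]
print(forward_select(genes, trait, 2))
```

      Gene 0 is chosen first because it alone explains the most trait variance. Gene 1 is then chosen because it explains what gene 0 leaves over. The criterion at each step is conditional on earlier choices, so later picks need not be the genes with the strongest individual association.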

      (10) Elastic nets: prediction power is similar or better than multiple regression, but what is the overlap between genes selected by the elastic net (not presented if I am not mistaken) and multiple regression and the linear mixed model?

      For the elastic net models, we used a leave-one-out cross validation approach, meaning there were separate models fit by leaving out the trait data for each worm, training a model using the trait data and transcriptomic data for the other worms, and using the transcriptomic data of the remaining worm to predict the trait data. By repeating this for each worm, the regressions shown in the paper were obtained. Each of these models therefore has its own set of genes. Of the 180 models for early brood, the median model selects 83 genes (range from 72 to 114 genes). Across all models, 217 genes were selected at least once. Interestingly, there was a clear bimodal distribution in terms of how many models a given gene was selected for: 68 genes were selected in over 160 out of 180 models, while 114 genes were selected in fewer than 20 models (and 45 genes were selected only once). Therefore, we consider the set of 68 genes as highly robustly selected, since they were selected in the vast majority of models. This set of 68 exhibits substantial overlap with both the set of 448 early brood-associated genes (43 genes or 63% overlap) and the multiple regression set of 246 genes (54 genes or 79% overlap). For ELO, the median model selected 136 genes (range of 96 to 249 genes) and a total of 514 genes were selected at least once. The distribution for ELO was also bimodal with 78 genes selected over 160 times and 255 genes selected fewer than 20 times. This set of 78 included 6 of the 11 significant ELO genes identified in the LMM.  We have added tabs to Supplementary File 1 that include the list of genes selected for the elastic net models as well as a count of how many times they were selected out of 180 models.
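      The selection-frequency summaries above are simple tallies over the leave-one-out models. The sketch below covers only that tallying. It assumes each fitted elastic net has already been reduced to the set of genes with nonzero coefficients; the fitting itself is not shown. It counts how often each gene was selected, keeps genes selected in more than a chosen number of models, and computes overlap with another gene set. The function names, gene names, and model outputs in the example are hypothetical. The final line only re-checks the percentages quoted in the text.

```python
from collections import Counter


def selection_counts(selected_per_model):
    """Number of fitted models in which each gene was selected."""
    return Counter(g for genes in selected_per_model for g in set(genes))


def robust_genes(selected_per_model, min_models):
    """Genes selected in more than min_models models."""
    counts = selection_counts(selected_per_model)
    return {g for g, c in counts.items() if c > min_models}


def overlap_fraction(subset, reference):
    """Fraction of `subset` that is also found in `reference`."""
    return len(set(subset) & set(reference)) / len(subset)


# Hypothetical output of five leave-one-out models (gene sets with nonzero coefficients).
models = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"a", "b", "e"}, {"a", "c"}]
robust = robust_genes(models, min_models=3)   # analogous to "over 160 of 180 models"
print(sorted(robust), overlap_fraction(robust, {"a", "x"}))

# Percentages quoted in the text: 43 and 54 of the 68 robust early brood genes.
print(round(100 * 43 / 68), round(100 * 54 / 68))
```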

      (11) In other words, do these different approaches yield similar sets of genes, or are there some differences? In the end, which approach is actually giving the best predictive power?

      From the perspective of R2, both the multiple regression and elastic net models are similarly predictive for early brood, but elastic net is more predictive for ELO. However, in presenting multiple approaches, part of our goal was identifying predictive genes that could be considered the ‘best’ in different contexts. The multiple regression was set to identify exactly 10 genes, whereas the elastic net model determined the optimal number of genes to include, which was always over 70 genes. Thus, the elastic net model is likely better if one has gene expression data for the entire transcriptome, whereas the multiple regression genes are likely more useful if one were to use reporters or qRT-PCR to measure a more limited number of genes.  

      (12) Line 252: "Within this curated set, genes causally affected early brood in 5 of 7 cases compared to empty vector (Figure 4A)." It seems to me 4 out of 7 from Figure 4A.

      In Figure 4A the five genes are (1) cin-4, (2) puf-5; puf-7, (3) eef-1A.2, (4) C34C12.8, and (5) tir-1. We did not count nex-2 (p = 0.10) or gly-13 (p = 0.07), and empty vector is the control.

      (13) Do puf-5 and -7 affect total brood size or only early brood size? Not clear. What's the effect of single puf-5 and puf-7 RNAi on brood?

      We only measured early brood in this paper, but a previous report found that puf-5 and puf-7 act redundantly to affect oogenesis, and RNAi is only effective if both are knocked down together (2). We performed pilot experiments to confirm that this was the case in our hands as well.  

      (14) To truly understand if the noise in expression of puf-5 and/or puf-7 really causes some of the observed difference in early brood, could the authors use a reporter and dose-response RNAi to reduce the level of puf-5/7 to match the lower physiological noise range? They could then observe whether the reduction in early brood produced by the right amount of RNAi matches the observed physiological "noise" effect of puf-5/7 on early brood.

      We agree that it would be interesting to do the dose response of RNAi, measure early brood, and get a readout of mRNA levels to determine the true extent of gene knockdown in each worm (since RNAi can be noisy) and whether this corresponds to early brood when the knockdown is at physiological levels. While we believe we have shown that a dose response of gene knockdown results in a dose response of early brood, this additional analysis would be of interest for future experiments.

      (15) Regulated soma genes (enriched in H3K27me3) are negatively correlated with early brood. What would be the mechanism there? As mentioned before, it is more likely that these genes are just indicative of variation in somatic vs germline age (maybe due to latent differences in parental perception of pheromone).

      We can think of a few potential mechanisms/explanations, but at this point we do not have a decisive answer. Regulated somatic genes marked with H3K27me3 (facultative heterochromatin) are expressed in particular tissues and/or at particular times in development. In this study and others, genes marked with H3K27me3 exhibit more gene expression noise than genes with other marks. This could suggest that there are negative consequences for the animal if genes are expressed at higher levels at the wrong time or place, and one interpretation of the negative association is that higher expression of somatic genes results in lower fitness (where early brood is a proxy for fitness). Another related interpretation is that there are tradeoffs between somatic and germline development, and each individual animal lands somewhere on a continuum between prioritizing germline or somatic development, where prioritizing somatic integrity (e.g., higher expression of somatic genes) comes at a cost to the germline, resulting in fewer progeny. Additional experiments, including measurements of histone marks in worms measured for the early brood trait, would likely be required to answer this question more decisively.

      (16) Line 151: "Among significant genes for both traits, β2 values were consistently lower than β1 (Figures 2CD), suggesting some of the total effect size was driven by environmental history rather than pure noise".

      We address this quote together with point 17 below.

      (17) It looks like most of the genes associated with phenotypes from the univariate model have a decreased effect once you account for life history, but have you checked for cases where the life history actually masks the effect of a gene? In other words, do you have cases where the effect of gene expression on a phenotype is only (or more) significant after you account for the effect of life history (β2 values higher than β1)?

      This is a good question, and one that we did not explicitly address in the paper because we focused on β values for genes that were significant in the univariate analysis. For the sets of 448 early brood genes and 11 ELO genes, there are no genes for which β2 is larger than β1. In the larger dataset of 8824 genes, using a Bonferroni-corrected p-value threshold of 0.05, there are 306 genes with a significant β2 for early brood. The majority (157 genes) overlap with the 448 genes significant in the univariate analysis, and none of these has a higher β2 than β1. Of the remaining 149 genes, 72 have a larger β2 than β1. However, in most cases this difference is small (median difference of 0.025) and likely not significant. There are only three genes for which β1 is not nominally significant, and these are the three genes with the largest differences between β1 and β2 in favor of β2 (differences of 0.166, 0.155, and 0.12). In contrast, the median difference between β1 and β2 for the 448 genes (for which β1 is larger) is 0.17, indicating that even the most extreme examples of β2 > β1 are smaller in magnitude than the typical case of β1 > β2. For ELO, there are no notable cases where β2 > β1: eight genes have a significant β2 value, and all of these have a nominally significant β1 value. Therefore, while this phenomenon does occur, we find it to be relatively rare overall. For completeness, we have added the β1 and β2 values for all 8824 genes as a tab in Supplementary File 1.
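      The β1/β2 comparison discussed in points 16 and 17 can be illustrated with a minimal Python sketch on simulated data (not the study's data). It assumes β1 is the expression coefficient from a model with expression alone and β2 is the expression coefficient after adding a life-history covariate; the binary "history" variable, the effect sizes, and the helper names are hypothetical. When history influences both expression and the phenotype, the univariate model attributes part of the history effect to expression, so β2 falls below β1.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def beta_univariate(x, y):
    """OLS slope of y on x alone (analogous to beta1)."""
    return cov(x, y) / cov(x, x)


def beta_adjusted(x, h, y):
    """OLS coefficient on x when y is regressed on both x and h (analogous to beta2).

    Solves the 2x2 normal equations of the centered two-predictor model.
    """
    vx, vh, cxh = cov(x, x), cov(h, h), cov(x, h)
    cxy, chy = cov(x, y), cov(h, y)
    return (vh * cxy - cxh * chy) / (vx * vh - cxh ** 2)


rng = random.Random(0)
n = 2000
# Hypothetical binary "environmental history" covariate.
h = [i % 2 for i in range(n)]
# Expression = history contribution + intrinsic noise.
x = [hi + rng.gauss(0, 1) for hi in h]
# Phenotype depends on expression (true effect 0.5) and directly on history (1.0).
y = [0.5 * xi + 1.0 * hi + rng.gauss(0, 1) for xi, hi in zip(x, h)]

beta1 = beta_univariate(x, y)
beta2 = beta_adjusted(x, h, y)
print(f"beta1 (expression only): {beta1:.2f}")
print(f"beta2 (adjusted for history): {beta2:.2f}")
```

      In this simulation β1 lands near 0.7 and β2 near the true expression effect of 0.5. Flipping the sign of the direct history effect (for example, -1.0 instead of 1.0) produces the masking case raised in point 17, where β2 exceeds β1.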

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: Zhu et al. investigate the cellular defects in glia resulting from loss of DEGS1/ifc, which encodes the dihydroceramide desaturase. Using the strength of Drosophila and its vast genetic toolkit, they find that DEGS1/ifc is mainly expressed in glia and that its loss leads to profound neurodegeneration. This supports a role for DEGS1 in safeguarding proper CNS development in the developing larval brain. Loss of DEGS1/ifc leads to dihydroceramide accumulation in the CNS, alters the morphology of glial subtypes, and reduces glial number. Cortex and ensheathing glia appeared swollen and accumulated internal membranes. Astrocyte-like glia, on the other hand, displayed small cell bodies, reduced membrane extension, and disrupted organization in the dorsal ventral nerve cord. They also found that DEGS1/ifc localizes primarily to the ER. Interestingly, the authors observed that loss of DEGS1/ifc drives ER expansion and reduces TG levels and lipid droplet numbers, with no effect on PC and PE and a slight increase in PS.

      The conclusions of this paper are well supported by the data. The study could be further strengthened by a few additional controls and/or analyses.

      Strengths:

      This is an interesting study that provides new insight into the role of ceramide metabolism in neurodegeneration.

      The strength of the paper is the generation of LOF lines, the insertion of transgenes and the use of the UAS-GAL4/GAL80 system to assess the cell-autonomous effect of DEGS1/ifc loss in neurons and different glial subtypes during CNS development.

      The imaging, immunofluorescence staining, and EM of the larval brain, and the use of the optic lobe and the nerve cord as a readout, are very robust and nicely done.

      Drosophila is a difficult model in which to perform core biochemistry and lipidomics, but the authors used whole larvae and the CNS to uncover global changes in mRNA levels related to lipogenesis and the unfolded protein response, as well as specific lipid alterations upon DEGS1/ifc loss.

      Weaknesses:

      (1) The authors performed lipidomics and RTqPCR on whole larvae and larval CNS from which it is impossible to define the cell type-specific effects. Ideally, this could be further supported by performing single cell RNAseq on larval brains to tease apart the cell-type specific effect of DEGS1/ifc loss.

      We agree that using scRNAseq, or pairing FACS sorting of individual glial subtypes with bulk RNAseq, would help tease apart the cell-type-specific effects of DEGS1/ifc loss on glial cells. At this time, however, this approach extends beyond the scope of the current paper and the means of our lab.

      (2) It's clear from the data that the accumulation of dihydroceramide in the ER triggers ER expansion, but it remains unclear how or why this happens. Additionally, the authors assume, because of the reduction in LD numbers, that the fatty acids come from LDs. But there are no data testing this directly.

      As CERT, the protein that transports ceramide from the ER to the Golgi, is far more efficient at transporting ceramide than dihydroceramide, we speculate that dihydroceramide accumulates in the ER due to inefficient transport from the ER to the Golgi by CERT. We state this model more explicitly in the results under the subheading “Reduction of dihydroceramide synthesis suppresses the ifc CNS phenotype”.

      We agree with the point on lipid droplets. We observe a correlation, not causation, between the reduction of lipid droplets and a large expansion of ER membrane. We have tried to clarify the text in the last paragraph of the discussion to make this point more clearly. See also our response to Reviewer 2, point 3.

      (3) The authors performed a beautiful EMS screen identifying several LOF alleles in ifc. However, the authors decided to only use KO/ifc[js3]. The paper could be strengthened if the authors could replicate some of the key findings in additional fly lines.

      We agree. We replicated the observed cortex glia swelling, ER expansion in cortex glia, and increase in neuronal cell death markers in late-third instar larvae mutant for either the ifc[js1] or ifc[js2] allele. These data are now provided as Supplementary Figure 7.

      (4) The authors use the M{3xP3-RFP.attP}ZH-51D transgene as a general glial marker. However, it would be advisable to show the % overlap between the glial marker and the RFP, since many cells are GFP-positive but not RFP-positive and vice versa.

      We visually reexamined the expression of the 3xP3-RFP transgene relative to FABP labeling for cortex glia, Ebony for astrocyte-like glia, and the Myr-GFP transgene driven by glial-subtype-specific GAL4 driver lines for perineurial, subperineurial, and ensheathing glia. We note that RFP localizes to the nucleus, while FABP and Ebony localize to the cytoplasm and Myr-GFP to the cell membrane. Thus, an apparent lack of overlap between RFP and the other markers can arise from differential localization of the two markers in the same cells (see, for example, Fig. S2D, where Myr-GFP expression in the nuclear envelope encircles that of RFP in the nucleus). Through visual inspection of five larval brain complexes for each glial subtype marker, we found that essentially all cortex, SPG, and ensheathing glia expressed RFP. Similarly, nearly all astrocyte-like glia also expressed RFP, but at significantly lower levels than those observed for cortex, SPG, or ensheathing glia. This analysis also confirmed that most perineurial glia do not express RFP. The M{3xP3-RFP.attP}ZH-51D transgene thus labels most glia in the Drosophila CNS. We have added text to Supplementary Figure 2 noting which glial cells express RFP.

      (5) The authors indicate that other 3xP3 RFP and GFP transgenes at other genomic locations also label most glia in the CNS. Do they have a preferential overlap with the different glial subtypes?

      We assessed three different types of 3xP3 RFP and GFP transgenes: M{3xP3-RFP.attP} transgenes (n=4), Mi{GFP[E.3xP3]=ET1} transgenes (n=3), and Tl{GFP[3xP3.cLa]=CRIMIC.TG4} transgenes (n>6). All labeled cortex glia, but different lines exhibited differential labeling of astrocyte-like and ensheathing glia. These data are now included as Supplementary Figure 3.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Zhu et al. describes phenotypes associated with the loss of the gene ifc using a Drosophila model. The authors suggest their findings are relevant to understanding the molecular underpinnings of a neurodegenerative disorder, HLD-18, which is caused by mutations in the human ortholog of ifc, DEGS1.

      The work begins with the authors describing the role of ifc during fly larval brain development, demonstrating its function in regulating developmental timing, brain size, and ventral nerve cord elongation. Further mechanistic examination revealed that loss of ifc leads to depleted cellular ceramide levels as well as dihydroceramide accumulation, eventually causing defects in ER morphology and function. Importantly, the authors showed that ifc is predominantly expressed in glia and is critical for maintaining appropriate glial cell numbers and morphology. Many of the key phenotypes caused by the loss of fly ifc can be rescued by overexpression of human DEGS1 in glia, demonstrating the conserved nature of these proteins as well as the pathways they regulate. Interestingly, the authors discovered a loss of lipid droplets within the cortex glia of ifc mutant larvae, which presumably drives the deficits in glial wrapping around axons and subsequent neurodegeneration, potentially shedding light on mechanisms of HLD-18 and related disorders.

      Strengths:

      Overall, the manuscript is thorough in its analysis of ifc function and mechanism. The data images are high quality, the experiments are well controlled, and the writing is clear.

      Weaknesses:

      (1) The authors clearly demonstrated a reduction in the number of glia in the larval brains of ifc mutant flies. What remains unclear is whether ifc loss leads to glial apoptosis or to a failure of glia to proliferate during development. The authors should distinguish between these two hypotheses using apoptotic markers and cell proliferation markers in glia.

      To address this point, we used phospho-histone H3 to assess the mitotic index in the thoracic CNS of wild-type versus ifc mutant late third instar larvae and found a mild but significant reduction in mitotic index in ifc mutant relative to wild-type nerve cords. We also assessed the ability of glial-specific expression of the potent anti-apoptotic gene p35 to rescue the observed loss of cortex glia in the thoracic region of the CNS of otherwise ifc mutant larvae, and observed a clear increase in cortex glia in the presence versus the absence of glial-specific p35 expression (p < 3 × 10⁻⁴). These data are now provided as Supplementary Figure S8 and referred to on page 8.

      (2) It is surprising that human DEGS1 expression in glia rescues the noted phenotypes despite the different preference for sphingoid backbone between flies and mammals. Though human DEGS1 rescued the glial phenotypes described, can animal lethality be rescued by glial expression of human DEGS1? Are there longer-term effects of loss of ifc that cannot be compensated by the overexpression of human DEGS1 in glia (age-dependent neurodegeneration, etc.)?

      We note explicitly that while glial expression of human DEGS1 does provide rescuing activity, it only partially rescues the ifc mutant CNS phenotype in contrast to glial expression of Drosophila ifc, which fully rescues this phenotype. Thus, the relative activity of human DEGS1 is far below that of Drosophila ifc when assayed in flies. To quantify the functional difference between the two transgenes, we assessed the ability of glial expression of fly ifc or of human DEGS1 to rescue the lethality of otherwise ifc mutant larvae: Glial expression of ifc was sufficient to rescue the adult viability of 57.9% of ifc mutant flies based on expected Mendelian ratios (n=2452), whereas glial expression of DEGS1 was sufficient to rescue just 3.9% of ifc mutant flies (n=1303), uncovering a ~15-fold difference in the ability of the two transgenes to rescue the lethality of otherwise ifc mutant flies. In the absence of either transgene, no ifc mutant larvae reached adulthood (n=1030). These data are now provided in the text on page 9 of the revised manuscript. 

      (3) The mechanistic link between the loss of ifc and lipid droplet defects is missing. How do defects in ceramide metabolism alter triglyceride utilization and storage? While the authors argue that the loss of lipid droplets in larval glia will lead to defects in neuronal ensheathment, a discussion of how this is linked to ceramides needs to be added.

      We have revised the text to address this point. We speculate that the apparent increased demand for membrane phospholipid synthesis may drive the depletion of lipid droplets, providing a link to ifc function and ceramides. Below we provide the rewritten last paragraph; the underlined section is the new text.  

      “The expansion of ER membranes coupled with loss of lipid droplets in ifc mutant larvae suggests that the apparent demand for increased membrane phospholipid synthesis may drive lipid droplet depletion, as lipid droplet catabolism can release free fatty acids to serve as substrates for lipid synthesis. At some point, the depletion of lipid droplets, and perhaps free fatty acids as well, would be expected to exhaust the ability of cortex glia to produce additional membrane phospholipids required for fully enwrapping neuronal cell bodies. Under wild-type conditions, many lipid droplets are present in cortex glia during the rapid phase of neurogenesis that occurs in larvae. During this phase, lipid droplets likely support the ability of cortex glia to generate large quantities of membrane lipids to drive membrane growth needed to ensheathe newly born neurons. Supporting this idea, lipid droplets disappear in the adult Drosophila CNS when neurogenesis is complete and cortex glia remodeling stops. We speculate that lipid droplet loss in ifc mutant larvae contributes to the inability of cortex glia to enwrap neuronal cell bodies. Prior work on lipid droplets in flies has focused on stress-induced lipid droplets generated in glia and their protective or deleterious roles in the nervous system. Work in mice and humans has found that more lipid droplets are often associated with the pathogenesis of neurodegenerative diseases, but our work correlates lipid droplet loss with CNS defects. In the future, it will be important to determine how lipid droplets impact nervous system development and disease.”

      (4) On page 10, the authors use the words "strong" and "weak" to describe where ifc is expressed. Since the use of T2A-GAL4 alleles in examining gene expression is unable to delineate the amount of gene expression from a locus, the terms "broad" and "sparse" labeling (or similar terms) should be used instead.

      The ifc T2A-GAL4 insertion in the ifc locus reports on transcription of the gene. We agree that the GAL4 system will not reflect differences in gene expression when expression levels are not dramatically different. However, when expression levels differ dramatically, as in our case, the GAL4 system can reflect this difference through the expression of a reporter gene. We reworded this section to indicate that ifc is transcribed at higher levels in glia than in neurons. We cannot use "sparse" or "broad", as ifc is expressed in all, or at least most, glia and neurons. The new text is as follows: “Using this approach, we observed strong nRFP expression in all glial cells (Figures 4D and S10A) and modest nRFP expression in all neurons (Figures 4E and S10B), suggesting ifc is transcribed at higher levels in glial cells than neurons in the larval CNS.”

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors report three novel ifc alleles: ifc[js1], ifc[js2], and ifc[js3]. ifc[js1] and ifc[js2] encode missense mutations, V276D and G257S, respectively. ifc[js3] encodes a nonsense mutation, W162*. These alleles exhibit multiple phenotypes, including delayed progression to the late-third larval instar stage, reduced brain size, elongation of the ventral nerve cord, axonal swelling, and lethality during late larval or early pupal stages.

      Further characterization of these alleles by the authors reveals that ifc is predominantly expressed in glia and localizes to the endoplasmic reticulum (ER). Expression of the ifc gene governs glial morphology and survival. Expression of fly ifc cDNA or human DEGS1 cDNA specifically in glia, but not neurons, rescues the CNS phenotypes of ifc mutants, indicating a crucial role for ifc in glial cells and its evolutionary conservation. Loss of ifc results in ER expansion and loss of lipid droplets in cortex glia. Additionally, loss of ifc leads to ceramide depletion and accumulation of dihydroceramide. Moreover, it increases the saturation levels of triacylglycerols and membrane phospholipids. Finally, the reduction of dihydroceramide synthesis suppresses the CNS phenotypes associated with ifc mutations, indicating the key role of dihydroceramide in causing ifc LOF defects.

      Strengths:

      This manuscript unveils several intriguing and novel phenotypes of ifc loss-of-function in glia. The experiments are meticulously planned and executed, with the data strongly supporting their conclusions.

      Weaknesses:

      I didn't find any obvious weakness.

      Reviewer #1 (Recommendations For The Authors):

      Additional minor comments below:

      (1) The authors state that TGs are the building blocks of membrane phospholipids. This is not exactly true. The breakdown of TGs can result in free FAs which can be used for membrane phospholipid synthesis. Also, membrane phospholipids can also be generated from free FAs that were never in TGs.

      To address this point, we have reworked a number of sentences in the text. On page 12 we reworded two small sections to the following: 

      “In the CNS, lipid droplets form primarily in cortex glia[29] and are thought to contribute to membrane lipid synthesis through their catabolism into free fatty acids versus acting as an energy source in the brain.[41] Consistent with the possibility that increased membrane lipid synthesis drives lipid droplet reduction, RNA-seq assays of dissected nerve cords revealed that loss of ifc drove transcriptional upregulation of genes that promote membrane lipid biogenesis”

      As TG breakdown results in free fatty acids that can be used for membrane phospholipid synthesis, we asked if changes in TG levels and saturation were reflected in the levels or saturation of the membrane phospholipids phosphatidylcholine (PC), phosphatidylethanolamine (PE), and phosphatidylserine (PS).

      (2) Figure 5J what does the dotted line indicate? Please specify in the figure legend or remove it.

      We have added the following text in the figure legend: Dotted line indicates a log2 fold change of 0.5 in the treatment group compared to the control group.

      (3) The text for your graphs is hard to read. Please make the font larger.

      We have increased font size to enhance the readability of the figures.

      (4) The authors mentioned that driving ifc expression in neurons rescues the phenotypes (ref 17). While the glial-specific role presented in this study is robust, I think some readers would appreciate some discussion of that study in light of the data presented here.

      We have added the below text on page 10 to address this point.

      “Results of our gene rescue experiments conflict with a prior study on ifc in which expression of ifc in neurons was found to rescue the ifc phenotype. In this context, we note that elav-GAL4 drives UAS-linked transgene expression not just in neurons but also in glia at appreciable levels, and thus needs to be paired with repo-GAL80 to restrict GAL4-mediated gene expression to neurons. Thus, “off-target” expression in glial cells may account for the discrepant results. It is, however, more difficult to reconcile how neuronal or glial expression of ifc would rescue the observed lethality of the ifc-KO chromosome given the presence of additional lethal mutations in the 21E2 region of the second chromosome.”

      (5) While the analysis of fatty acid saturation is experimentally well done, I'm not really sure what the significance of these data is.

      We included this information as a reference for future analysis of additional genes in the ceramide biogenesis pathway, as we expect that alterations in the levels and saturation of PE, PC, and PS in cell membranes may underlie key changes in the biophysical properties of glial cell membranes and their ability to enwrap or infiltrate their targets. Thus, we expect the significance of these data to grow as more work is done on additional members of the ceramide pathway in the nervous system in flies and other systems.

      Reviewer #2 (Recommendations For The Authors):

      (1) There is a typo at the top of page 11: "internal membranes and fail enwrap neurons" is missing the word "to" before "enwrap"

      The typo was fixed.

      (2) PMID: 36718090 should be included in the discussion of SPT and the ORMDL complex in human disease.

      The reference was added.

      Reviewer #3 (Recommendations For The Authors):

      In this manuscript, the authors report three novel ifc alleles: ifc[js1], ifc[js2], and ifc[js3]. ifc[js1] and ifc[js2] encode missense mutations, V276D and G257S, respectively. ifc[js3] encodes a nonsense mutation, W162*. These alleles exhibit multiple phenotypes, including delayed progression to the late-third larval instar stage, reduced brain size, elongation of the ventral nerve cord, axonal swelling, and lethality during late larval or early pupal stages.

      Further characterization of these alleles by the authors reveals that ifc is predominantly expressed in glia and localizes to the endoplasmic reticulum (ER). Expression of the ifc gene governs glial morphology and survival. Expression of fly ifc cDNA or human DEGS1 cDNA specifically in glia, but not neurons, rescues the CNS phenotypes of ifc mutants, indicating a crucial role for ifc in glial cells and its evolutionary conservation. Loss of ifc results in ER expansion and loss of lipid droplets in cortex glia. Additionally, loss of ifc leads to ceramide depletion and accumulation of dihydroceramide. Moreover, it increases the saturation levels of triacylglycerols and membrane phospholipids. Finally, the reduction of dihydroceramide synthesis suppresses the CNS phenotypes associated with ifc mutations, indicating the key role of dihydroceramide in causing ifc LOF defects.

      In summary, this manuscript unveils several intriguing and novel phenotypes of ifc loss-of-function in glia. The experiments are meticulously planned and executed, with the data strongly supporting their conclusions. I have no additional comments and fully support the publication of this manuscript in eLife.

      The authors also note that they added one paragraph to the discussion that addresses the possibility that the increased detection of cell death markers could arise due to the inability of glial cells to remove cellular debris. The text of this paragraph is provided below:

      We note that cortex glia are the major phagocytic cells of the CNS and phagocytose neurons targeted for apoptosis as part of normal development.[23-26] Thus, while we favor the model that loss of ifc triggers neuronal cell death through glial dysfunction, it is also possible that the increased detection of dying neurons arises, at least in part, from a decreased ability of cortex glia to clear dying neurons from the CNS. At present, the large number of neurons that undergo developmentally programmed cell death, combined with the significant disruption to brain and ventral nerve cord morphology caused by loss of ifc function, renders this question difficult to address. Additional evidence does, however, support the idea that loss of ifc function drives excess neuronal cell death: clonal analysis in the fly eye reveals that loss of ifc drives photoreceptor neuron degeneration,[17] indicating that loss of ifc function drives neuronal cell death; cortex-glia-specific depletion of CPES, which acts downstream of ifc, disrupts neuronal function and induces photosensitive epilepsy in flies,[59] indicating that genes in the ceramide pathway can act nonautonomously in glia to regulate neuronal function; recent genetic studies reveal that other glial cells can compensate for impaired cortex glial cell function by phagocytosing dying neurons,[62] and we observe that the cell membranes of subperineurial glia enwrap dying neurons in ifc mutant larvae (Fig. S14), consistent with similar compensation occurring in this background; and in humans, loss-of-function mutations in DEGS1 cause neurodegeneration.[7-9] Clearly, future work is required to address this question for ifc/DEGS1 and perhaps other members of the ceramide biogenesis pathway.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors revisit the specific domains/signals required for the redirection of an inner nuclear membrane protein, emerin, to the secretory pathway. They find that epitope tagging influences protein fate, serving as a cautionary tale for how different visualisation methods are used. Multiple tags and lines of evidence are used, providing solid evidence for the altered fate of different constructs.

      Strengths:

      This is a thorough dissection of domains and properties that confer INM retention vs secretion to the PM/lysosome, and will serve the community well as a caution regarding the placement of tags and how this influences protein fate.

      Weaknesses:

      Biogenesis pathways are not explored experimentally: it would be interesting to know if the lysosomal pool arrives there via the secretory pathway (eg by engineering a glycosylation site into the lumenal domain) or by autophagy, where failed insertion products may accumulate in the cytoplasm and be degraded directly from cytoplasmic inclusions.

      This manuscript is a Research Advance that follows previous work that we published in eLife on this topic (Buchwalter et al., eLife 2019; PMID 31599721). In that prior publication, we showed that emerin-GFP arrives at the lysosome by secretion and exposure at the PM, followed by internalization. While we state these previous findings in this manuscript, we did not explicitly restate here how we came to that conclusion. In the 2019 study, we (i) engineered in a glycosylation site, which demonstrated that emerin-GFP receives complex, Endo H-resistant N-glycans, indicating passage through the Golgi; (ii) performed cell surface labeling, which confirmed that emerin accesses the PM; and interfered with (iii) the early secretory pathway using brefeldin A and with (iv) lysosomal function using bafilomycin A1. Further, we ruled out autophagy as a major contributor to emerin trafficking by treating cells with the PI3K inhibitor KU55933, which had no effect on emerin’s lysosomal delivery.

      It would be helpful if the topology of constructs could be directly demonstrated by pulse-labelling and protease protection. It's possible that there are mixed pools of both topologies that might complicate interpretation.

      We demonstrate that emerin’s TMD inserts in a tail-anchored orientation (C terminus in ER lumen) by appending a GFP tag to either the N or C terminus, followed by anti-GFP antibody labeling of unpermeabilized cells (Fig. 1G). This shows the preferred topology of emerin’s wild type TMD.

      As the reviewer points out, it is possible that our manipulations of the TMD sequence (Fig. 2D-E) alter its preferred topology of membrane insertion. We addressed this question by performing anti-GFP and anti-emerin antibody labeling of the less hydrophobic TMD mutant (EMD-TMDm-GFP) after selective permeabilization of the plasma membrane (Figure 2 supplement, panel F). If emerin biogenesis is normal, the GFP tag should face the ER lumen while the emerin antibody epitope should be cytosolic. If the fidelity of emerin’s membrane insertion is impaired, the GFP tag could be exposed to the cytosol (flipped orientation), which would be detected by anti-GFP labeling upon plasma membrane permeabilization. We find that the C-terminal GFP tag is completely inaccessible to antibody when the PM is selectively permeabilized with digitonin, but is readily detected when all intracellular membranes are permeabilized with Triton-X-100. These data confirm that mutating emerin’s TMD does not disrupt the protein’s membrane topology.

      Reviewer #2 (Public review):

      In this manuscript, Mella et al. investigate the effect of GFP tagging on the localization and stability of the nuclear-localized tail-anchored (TA) protein Emerin. A previous study from this group showed that C-terminally GFP-tagged Emerin protein traffics to the plasma membrane and reaches lysosomes for degradation. It is suggested that the C-terminal tagging of tail-anchored proteins shifts their insertion from the post-translational TRC/GET pathway to the co-translational SRP-mediated pathway. The authors of this paper found that C-terminal GFP tagging causes Emerin to localize to the plasma membrane and eventually reach lysosomes. They investigated the mechanism by which Emerin-GFP moves to the secretory pathway. By manipulating the cytosolic domain and the hydrophobicity of the transmembrane domain (TMD), the authors identify that an ER retention sequence and strong TMD hydrophobicity contribute to Emerin trafficking to the secretory pathway. Overall, the data are solid, and the knowledge will be useful to the field. However, the authors do not fully answer the question of why C-terminally GFP-tagged Emerin moves to the secretory pathway. Importantly, the authors did not consider the possible roles of GFP in the ER lumen influencing Emerin trafficking to the secretory pathway.

      Reviewer #2 (Recommendations for the authors):

      Major concerns:

      (1) The authors suggest that an ER retention sequence and high hydrophobicity of Emerin TMD contribute to its trafficking to the secretory pathway. However, these two features are also present in WT Emerin, which correctly localizes to the inner nuclear membrane. Additionally, the authors show that the ER retention sequence is normally obscured by the LEM domain. The key difference between WT Emerin and Emerin-GFP is the presence of GFP in the ER lumen. The authors missed investigating the role of GFP in the ER lumen in influencing Emerin trafficking to the secretory pathway. It is likely that COPII carrier vesicles capture GFP protein in the lumen as part of the bulk flow mechanism for transport to the Golgi compartment. The authors could easily test this by appending a KDEL sequence to the C-terminus of GFP; this should now redirect the protein to the nucleus.

      We agree with the reviewer’s point that the presence of lumenal GFP somehow promotes secretion of emerin from the ER, likely at the stage of enhancing its packaging into COPII vesicles. We struggle to think about how to interpret the KDEL tagging experiment that the reviewer proposes, as the KDEL receptor predominantly recycles soluble proteins from the Golgi to the ER, while emerin is a membrane protein; and we have shown that emerin already contains a putative COPI-interacting RRR recycling motif in its cytosolic domain.

      Nevertheless, we agree with the reviewer that it is worthwhile to test the possibility that addition of GFP to emerin’s C-terminus promotes capture by COPII vesicles. We have evaluated this question by performing temperature block experiments to cause cargo accumulation within stalled COPII-coated ER exit sites, then comparing the propensity of various untagged and tagged emerin variants to enrich in ER exit sites as judged by colocalization with the COPII subunit Sec31a. These data now appear in Figure 4 supplement 1. These experiments indicate that emerin-GFP samples ER exit sites significantly more than does untagged emerin. Further, the ER exit site enrichment of emerin-GFP is dampened by shortening emerin’s TMD. We do not see further enrichment of any emerin variant in ER exit sites when COPII vesicle budding is stalled by low temperature incubation, implying that emerin lacks any positive sorting signals that direct its selective enrichment in COPII vesicles. Altogether, these data indicate that both emerin’s long and hydrophobic TMD and the addition of a lumenal GFP tag increase emerin’s propensity to sample ER exit sites and undergo non-selective, “bulk flow” ER export.

      (2) The authors nicely demonstrate that the hydrophobicity of Emerin TMD plays a role in its secretory trafficking. I wonder if this feature may be beneficial for cells to degrade newly synthesized Emerin via the lysosomal pathway during mitosis, as the nuclear envelope breakdown may prevent the correct localization of newly synthesized Emerin. The authors could test Emerin localization during mitosis. Such findings could add to the physiological significance of their findings. At the minimum, they should discuss this possibility.

      We thank the reviewer for this insightful suggestion. It is attractive to speculate that secretory trafficking might enable lysosomal degradation of emerin during mitosis, when its lamin anchor has been depolymerized. However, we think it is unlikely that mitotic trafficking contributes significantly to the turnover flux of untagged emerin; if it did, we would expect to see higher steady state levels and/or slowed turnover of emerin mutants that cannot traffic to the lysosome. We did not observe this outcome. Instead, mutations that enhance (RA) or impair (TMDm) emerin trafficking had no effect on the untagged protein’s steady-state levels (Fig. 4G).

      Minor concerns:

      (1) On page 7, the authors note that "FLAG-RA construct was not poorly expressed relative to WR, in contrast with RA-GFP (Figures S3C, 2I)." The expression levels of these proteins cannot be compared across two different blots.

      We apologize for this confusion; we were implying two distinct comparisons to internal controls present on each blot. We have adjusted the text to read “FLAG-RA construct was not poorly expressed relative to FLAG-WT (Fig. S3C) in contrast to RA-GFP compared to WT-GFP (Fig. 2I).”

      (2) In the first paragraph of the discussion, the authors suggest that aromatic amino acids facilitate trafficking to lysosomes. However, they only replaced aromatic amino acids with alanine residues. If they want to make this claim, they should test other amino acids, particularly hydrophobic amino acids such as leucine.

      The reviewer may be inferring more import from our statement than we intended. We focused on these aromatic residues within the TMD because they contribute strongly to its overall hydrophobicity. Experimentally, we determined that nonconservative alanine substitutions of these aromatic residues inhibited trafficking. We do not state and do not intend to imply that the aromatic character of these residues specifically influences trafficking propensity, and we agree with the reviewer that to test such a question would require additional substitutions with non-aromatic hydrophobic amino acids.

      We realize that our phrasing may have been misleading by opening with discussion of the aromatic amino acids; in the revised discussion paragraph, we instead lead with discussion of TMD hydrophobicity, and then state how the specific substitutions we made affect trafficking.

      Reviewing Editor comments:

      While reviewer 1 did not provide any recommendations to the authors, I agree with this reviewer that the authors should validate the topology of their tagged proteins (at least for the one used to draw key conclusions). Given that Emerin is a tail-anchored protein, having a big GFP tag at the C-terminus could disrupt ER insertion, causing the protein to adopt the wrong topology or even be mislocalized to the cytosol, particularly under overexpression conditions. In either case, it can be subject to quality control-dependent clearance via autophagy, ER-phagy, or ER-to-lysosome trafficking. I think that the authors should try a few straightforward experiments, such as brefeldin A treatment or dominant negative Sar1 expression, to test whether blocking conventional ER-to-Golgi trafficking affects lysosomal delivery of Emerin. I also think that the authors should discuss their findings in the context of the RESET pathway reported previously (PMID: 25083867). The ER stress-dependent trafficking of tagged Emerin to the PM and lysosomes appears to follow a similar pattern to RESET, although the authors did not demonstrate that Emerin traffics to lysosomes via the PM. In this regard, they should tone down their conclusion and discuss their findings in the context of the RESET pathway, which could serve as a model for their substrate.

      We agree that validating the topology of TMD mutants is important, and now include these experiments in the revised manuscript (please see our response to Reviewer 1 above).

      Please see our response to Reviewer 1’s public review; we previously determined that emerin-GFP undergoes ER-to-Golgi trafficking (see our 2019 study).

      We recognize the major parallels between our findings and the RESET pathway. In our 2019 study, we found that, similarly to other RESET cargoes, emerin-GFP travels through the secretory pathway, is exposed at the PM, and is then internalized and delivered to lysosomes. We discussed these strong parallels to RESET in our 2019 study. In this revised manuscript, we now also point out the parallels between emerin trafficking and RESET and cite the 2014 study by Satpute-Krishnan and colleagues (PMID 25083867).

    1. Reviewer #2 (Public review):

      In 'Developmental constraints mediate the summer solstice reversal of climate effects on European beech bud set', Rebindaine and co-authors report on two experiments on Fagus sylvatica where they manipulated temperatures of saplings between day and night and at different times of year. I enjoyed reading this paper and found it well written. I think the experiments are interesting, but I found the exact methods somewhat extreme compared to how the authors present them. Further, given that much of the experiment happened outside, I am not sure how much we can generalize from one year for each experiment, especially when conducted on one population of one species. I next expand briefly on these concerns and a few others.

      Concerns:

      (1) As I read the Results, I was surprised the authors did not give more information on the methods here. For example, they refer to the 'effect of July cooling' but never say what the cooling was. Once I read the methods, I feared they were burying this, as the methods feel quite extreme given the framing of the paper. The paper is framed as explaining observational results of natural systems, but the treatments are not natural for any system in Europe that I have worked in. For example, a low of 2 °C at night and 7 °C during the day through the end of May and then 7/13 °C in July is extreme. I think these methods need to be clearly laid out for the reader so they can judge what to make of the experiment before they see the results.

      (2) I also think the control is confounded with the growth chamber experience in Experiment 1. That is, the control plants never experience any time in a chamber, but all the treatments include significant time in a chamber. The authors mention how detrimental chamber time can be to saplings (indeed, they mention an aphid problem in experiment 2), so I think they need to be more upfront about this. The study is still very valuable, but again, we may need to be more cautious in how much we infer from the results.

      (3) I suggest the authors add a figure to explain their experiments, as they are very hard to follow. Perhaps this could be added to Figure 1?

      (4) Given how much the authors extrapolate to carbon and forests, I would have liked to see some metrics related to carbon assimilation, versus just information on timing.

      (5) Fagus sylvatica is an extremely important tree to European forests, but it also has outlier responses to photoperiod and other cues (and leafs out very late), so using just this species to then state 'our results likely are generalisable across temperate tree species' seems questionable at best.

      (6) Another concern relates to measuring the end of season (EOS). It is well known that different parts of plants shut down at different times, and each metric of end of season (budset, end of radial expansion, leaf coloring, etc.) relates to different things. Thus, I was surprised that the authors ignore all this complexity and seem to equate leaf coloring with budset (which often happens MONTHS before leaf coloring) and with other metrics. The paper needs a much better connection to the physiology of end of season and a better explanation for the focus on budset. Relatedly, I was surprised that the authors cite almost none of the literature on budset, which generally suggests it is heavily controlled by photoperiod and population-level differences in photoperiod cues, meaning results may be different with a different population of plants.

      (7) I didn't fully see how the authors' results support the Solstice as Switch hypothesis, since what timing mattered seemed to depend on the timing of treatment and was not clearly related to the solstice. Could it be that these results suggest the Solstice as Switch hypothesis is actually not well supported (e.g., line 135) and instead suggest that the pattern of climate in the summer months affects end-of-season timing?

    1. Reviewer #3 (Public review):

      Summary:

      In this study, Hall and colleagues investigate how the coupling of activity from ACC to CA1 is altered by fear learning, showing that during sleep immediately before learning, there is evidence for increased coupling of ACC activity with neurons that will subsequently be inhibited during the learning process. They go on to show that this effect seems to be mediated mostly by a subpopulation of neurons in the superficial layer of CA1. This fits with previous reports suggesting that these superficial neurons are key for the flexible updating of memory. The authors then go on to show that artificial activation of ACC using optogenetics results in varied effects in CA1, including a subtle decrease in activity of superficial neurons that lasts longer than the stimulus itself. Finally, the authors present some preliminary data suggesting that different interneurons may be recruited by this optogenetic stimulation in different ways and at different times.

      Overall, this is an interesting paper, but much of the analysis is very preliminary, and much of the crucial data about the learning effects and alterations to cell firing are not presented clearly and fully. This is further confounded by a rather opaque description of the results and analysis in the text. Overall, there is something very interesting here, but there needs to be a substantial series of extra analyses to clearly say what this is. In many cases, more robust analysis may render the results underpowered, which could dramatically change the conclusions of the paper.

      Strengths:

      The authors performed difficult, dual-location recordings across a multi-day learning paradigm, which seems like it could be a really nice dataset. They delve into the circuit basis of an interesting finding regarding ACC to CA1 connectivity and how this changes before and after fear conditioning. They provide data to suggest this connectivity may be through specific and distinct subcircuits in CA1.

      Weaknesses:

      (1) There is essentially no information in the text or figures about what the actual learning was, how it was done, how individual animals performed, and how any of these metrics related to learning. Looking at the methods, the authors did a number of things never mentioned anywhere in the text or figures, including novel arena exposure, contextual re-exposure in extinction after learning, etc. It seems that this is a very rich dataset that has not been presented at all. I would recommend at the very least:
      a) Plot all of the behavioural training data, and show how each mouse relates to the others. Did the mice learn? At this stage, we don't know!
      b) Explain in the text in detail exactly what was done and why, and what this tells us about the neuronal activity.
      c) If there is variance in learning and/or conditioning, does this relate to features in the analysis, such as the GLM result?

      (2) Along similar lines, a key metric for most of the paper is that neurons most coupled with ACC are more likely to be inhibited during training. However, there is nothing anywhere in the paper showing these data. How do neurons in general respond to contextual shocks? The methods describe this as the average firing rate during training, normalised to pre-sleep activity. This metric seems a bit coarse and may obscure really important task-relevant dynamics. Are the neurons active at specific times, are they tuned to relevant parts of the task, and do any of these features of the cell activity also relate to the coupling with ACC? Similarly, how did the authors mitigate the influence of electrical artefacts caused by the foot shock in their recordings? Again, there is a huge amount of data here that is not being described, and likely holds very valuable information about what is actually happening. The paper would really benefit from the inclusion of these data in an accessible form, such as heatmaps of spiking, how these patterns change over time, and around e.g., foot shock, etc. Also key is how these features are altered by the variability of learning across subjects.

      (3) A number of the effects are presented by comparing a statistically significant effect to a non-significant effect (e.g., in Figure 2b, Figure 2d, Figure 4b,c, and others). This isn't really valid: the key test that the two groups are different is either a direct test of the difference or an interaction term in, e.g., an ANOVA. In some places, I am not sure the same conclusions will be drawn from the data with these tests.

      (4) To what extent is defining superficial and deep CA1 neurons solely by ripple waveform an accepted method? Of the two papers referenced for this approach, one is a 2-photon calcium imaging paper that does not do electrical recordings (as far as I am aware), and the second uses this as a descriptor after defining the positions of units on an array. It would be good to clarify how accepted this is, and also how robust it is. At the very least, the supplement should include some kind of metric or walkthrough of how this was done and how confidently each cell was classified, or some metric of how distinct and separate the two populations were (or was it just a smudge?).

      (5) In the optogenetic experiment in Figure 5, the effect on the CA1 sup neurons seems to be driven by changes in a small subpopulation of this group, with no change in the others. Related to point 2, is there anything else in the data that can pull out what these cells are? More detailed analysis of the firing of these neurons might pull out something really interesting.

      (6) Related to this - a number of comparisons simply pool neurons across mice and analyse them as if independent. This is done a lot in the past, but it would be better if an approach that included the interdependence of neurons recorded from the same mouse at the same time were used (such as a hierarchical model). While this is complex, a simpler approach would just be to plot the summary data also per mouse. For example, in Figure 5, how do the neurons inhibited by ACC activation spread across the different mice? Is the level of inhibition related to how well the mice learned the CS-US association?

      (7) Figure 6 is interesting, but very preliminary. None of the effects are quantified, and one of the cell types is not identified. I think some proper analysis needs to be done, again across mice, to be able to draw conclusions from these data.

      (8) Finally, in general, I felt that the way the paper was written was very hard to follow, often relying on very processed levels of analysis that were hard to relate back to the raw traces and their biological meaning. In general, taking more words to simply and fully explain each analysis, and using the text and figures to walk through how each analysis was done and what it tells us about the neuronal data/biology, would be really beneficial, especially to someone who is not an extracellular electrophysiologist or immersed in the immediate field.

      In summary, while this manuscript explores an intriguing hypothesis about pre-learning circuit dynamics, it is currently held back by insufficient clarity in behavioural analysis, data presentation, and statistical quantification. Addressing these core issues would greatly improve interpretability and confidence in the findings.

    1. Reviewer #1 (Public review):

      Summary:

      Zhang et al. addressed the question of whether hyperaltruistic preference is modulated by decision context and tested how oxytocin (OXT) may modulate this process. Using an adapted version of a previously well-established moral decision-making task, healthy human participants in this study made decisions in which one option gains more money (or loses less; gain versus loss is termed the context) but delivers more painful shocks to either themselves or another person (the recipient). The alternative option always yields less gain (or more loss) but less pain. Through a series of regression analyses, the authors report that hyperaltruistic preference is found only in the gain context and not in the loss context; however, OXT re-established the hyperaltruistic preference in the loss context, similar to that in the gain context.

      Strengths:

      This is a solid study that directly adapted a previously well-established task and the analytical pipeline to assess hyperaltruistic preference in separate decision contexts. Context-dependent decisions have gained more and more attention in literature in recent years, hence this study is timely. It also links individual traits (via questionnaires) with task performance, to test potential individual differences. The OXT study is done with great methodological rigor, including pre-registration. Both studies have proper power analysis to determine the sample size.

      Weaknesses:

      Despite the strengths, multiple analytical decisions have to be explained, justified, or clarified. Also, there is scope to enhance the clarity and coherence of the writing - as it stands, readers will have to go back and forth to search for information. Last, it would be helpful to add line numbers in the manuscript during the revision, as this will help all reviewers to locate the parts we are talking about.

      Introduction:
      (1) The introduction is somewhat unmotivated, with key terms/concepts left unexplained until relatively late in the manuscript. One of the main focuses in this work is "hyperaltruistic", but how is this defined? It seems that the authors take the meaning of "willing to pay more to reduce other's pain than their own pain", but is this what the task is measuring? Did participants ever need to PAY something to reduce the other's pain? Note that some previous studies indeed allow participants to pay something to reduce other's pain. And what makes it "HYPER-altruistic" rather than simply "altruistic"? Plus, in the intro, the authors mention that the "boundary conditions" remain unexplored, but this idea is never touched on again. What do boundary conditions mean in this task? How do the results/data help in identifying the boundary conditions? Can this be discussed within the wider literature in the Discussion section? Last, what motivated the authors to examine decision context? It comes somewhat out of the blue that the opening paragraph states that "We set out to [...] decision context", but why? Are there other important factors? Why is decision context more important to study than those others?

      Experimental design:
      (2) The experiment per se is largely solid, as it followed a previously well-established protocol. But I am curious about how the participants were instructed. Did the experimenter ever mention the word "help" or "harm" to the participants? It would be helpful to include the exact instructions in the SI.

      (3) Relatedly, the experimental details were not quite comprehensive in the main text. Indeed, Methods come after the main text, but to be able to guide readers to understand what was going on, it would be very helpful if the authors could include some necessary experimental details at the beginning of the Results section.

      Statistical analysis
      (3) One of the main analyses uses the harm aversion model (Eq1), and the Results section keeps referring to one of its key parameters (i.e., k). However, it is difficult to understand the text without going to the Methods section below. Hence it would be very helpful to repeat the equation in the main text as well. The same applies to the delta_m and delta_s terms: it will be very helpful to give a clear meaning for them, as nearly all analyses rely on knowing what they mean.

      (4) There is one additional parameter gamma (choice consistency) in the model. Did the authors also examine the task-related difference of gamma? This might be important as some studies have shown that the other-oriented choice consistency may differ in different prosocial contexts.

      (5) I am not fully convinced by the authors' decision to include two types of models: the harm aversion model and logistic regression models. Indeed, the models look similar, and the authors have acknowledged that. But I wonder whether there is a way to combine them. For example:
      Choice ~ delta_V * context * recipient (*Oxt_v._placebo)
      The calculation of delta_V follows Equation 1.
      Or, the conceptual question is: if the authors were interested in the specific and independent contributions of delta_m and delta_s to behavior, as in their logistic model, why did they first examine the harm aversion model, where a parameter k controls the trade-off? One way to find out is to properly run the different models and perform model comparison. In the end, it would be beneficial to focus only on the "winning" model to draw inferences.

      (6) The interpretation of the main OXT results needs to be more cautious. According to the operationalization, "hyperaltruistic" is the reduction of pain of others (higher % of choosing the less painful option) relative to the self. But relative to the placebo (as baseline), OXT did not increase the % of choosing the less painful option for others, rather, it decreased the % of choosing the less painful option for themselves. In other words, the degree of reducing other's pain is the same under OXT and placebo, but the degree of benefiting self-interest is reduced under OXT. I think this needs to be unpacked, and some of the wording needs to be changed. I am not very familiar with the OXT literature, but I believe it is very important to differentiate whether OXT is doing something on self-oriented actions vs other-oriented actions. Relatedly, for results such as that in Fig5A, it would be helpful to not only look at the difference, but also the actual magnitude of the sensitivity to the shocks, for self and others, under OXT and placebo.

      Comments on revisions:

      I did not change my original public review, as I think it can still be helpful for the field to see the reasoning and argument.

      For the revision, the authors have done a thorough job of addressing my previous comments and questions.

      The only remaining request is a clear definition of hyperaltruism. As it stands, hyperaltruism refers to "people's willingness to pay more to reduce other's pain than their own pain", i.e., the "hyper" part is defined with respect to "self". But shouldn't hyperaltruism be defined in contrast to "normal" altruism?

      It is fine that it follows a previously published work (Crockett et al., 2014), but it would still be necessary to explain/define the construct being tested in a standalone fashion rather than requiring readers to go back to the original work.

    2. Reviewer #3 (Public review):

      Summary:

      In this study, the authors aimed to index individual variation in decision-making when decisions pit the interests of the self (gains in money, potential for electric shock) against the interests of an unknown stranger in another room (potential for unknown shock). In addition, the authors conducted an additional study in which male participants were either administered intranasal oxytocin or placebo before completing the task to identify the role of oxytocin in moderating task responses. Participants' choice data was analyzed using a harm aversion model in which choices were driven by the subjective value difference between the less and more painful options.

      Strengths:

      Overall, I think this is a well-conducted, interesting, and novel set of research studies exploring decision-making that balances outcomes for the self versus a stranger, and the potential role of the hormone oxytocin (OT) in shaping these decisions. The pain component of the paradigm is well designed, as is the decision-making task, and overall the analyses were well suited to evaluating and interpreting the data. Advantages of the task design include the absence of deception, e.g., the use of a real study partner and real stakes, as a trial from the task was selected at random after the study and the choice the participant made was actually executed.

      Weaknesses:

      The primary weakness of the paper concerns its framing. Although it purports to be measuring "hyper-altruism," which is the same term used in prior similar (although not identical) designs, I do not believe the task constitutes altruism, but rather the decision to engage, or not engage, in instrumental aggression.

      I continue to believe that when, in the "other" trials, the only outcome possible for the study partner is pain, and the only outcome possible for the participant is monetary gain, these trials measure decisions about instrumental aggression. That is the exact definition of instrumental aggression: causing others harm for personal gain. Altruism is not equivalent to refraining from engaging in instrumental aggression, although some similar mechanisms may support both. True altruism would be to accept shocks to the self for the other's benefit (e.g., money). The interpretation of this task as assessing instrumental aggression is supported by the fact that only the Instrumental Harm subscale of the OUS was associated with outcomes in the task, but not the Impartial Benevolence subscale. By contrast, the IB subscale is the one more consistently associated with altruism (e.g., Kahane et al., 2018; Amormino et al., 2022). I believe it is important for scientific accuracy for the paper, including the title, to be rewritten to reflect what it is testing.

      Although I recognize similar tasks have been previously characterized as "hyper-altruism" I do not believe that is sufficient justification for continuing to promulgate this descriptor without any caveats. I hope the authors will engage more seriously with the idea that this is what the task is measuring.

      Relatedly, in the introduction, I believe it would be important to discuss the non-symmetry of moral obligations related to help/harm--we have obligations not to harm strangers but no obligation to help strangers. This is another reason I do not think the term "hyper altruism" is a good description for this task--given it is typically viewed as morally obligatory not to harm strangers, choosing not to harm them is not "hyper" altruistic (and again, I do not view it as obviously altruism at all).

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      General Statement

      *Our lab was totally destroyed on June 15th by an Iranian missile. All stocks, equipment and reagents were lost. While we performed many of the experiments requested by the reviewers, unfortunately some were never completed. We thank you for your understanding.*

      We thank the three reviewers for their thoughtful comments and useful suggestions on how to improve our paper. Some of the reviewers claimed that the paper is “preliminary”. We would like to highlight that, in our opinion, “preliminary” has two possible meanings in this context: 1) the data do not yet support the claims that the authors wrote; 2) the story is short and should be extended. While we fully agree that type 1 “preliminary” should be addressed (and we have addressed it to the best of our abilities), type 2 “preliminary” is a matter of scope, the length of the paper/project, and the publication home. We believe that this story, which has been led by an outstanding master’s student (and as such has had a limited timespan), is worthy of publication in its current scope.

      2. Point-by-point description of the revisions

      Reviewers’ comments are in BLUE while our responses are in BLACK.

      Reviewer 1 Summary: This study reports a role for matrix metalloproteinases (MMPs) in the developmental pruning of gamma Kenyon cells (KCs) in the fruit fly Mushroom Body during larval-pupal metamorphosis. The authors show through gene expression studies that MMP genes are upregulated in late larval stages as part of the early program for this type of neuronal pruning. They show through cell-targeted RNAi studies of both secreted MMP-1 and membrane-anchored MMP-2 that both genes are required in glial cells and, to a lesser extent, within KCs.

      Both MMPs have secreted and membrane-anchored isoforms (see, e.g., LaFever et al. 2017), and we did not assess whether the secreted or the anchored isoforms are involved.

      The authors show that MMP secreted from glia is required for normal levels of Mushroom Body developmental neuronal pruning. They mention that MMP genes have been identified in screens of schizophrenia patients, and that a comparable pruning mechanism could perhaps be involved in the loss of grey matter (loss of synapses) in patients. The authors propose that MMP levels may be a potential therapeutic marker in the future.

      We thank the reviewer for his comments. We find it important to clarify that we do not think our work suggests that MMP levels could be a potential therapeutic marker; establishing this would require much additional work. In the original text we cited a claim from another paper suggesting MMPs as a therapeutic target. However, given the resulting confusion, we decided to delete this statement from the text (original line 198). We also added a general disclaimer towards the end of the discussion regarding the genetic power of Drosophila but its limited implications for human health (new lines 276-278).

      Major Comments: Overall, the work is of a reasonable standard, but very preliminary.

      Please see our general note on the two types of “preliminary” – we thank the reviewer for helping us substantiate our claims and strengthen our paper, but we do not plan to significantly increase its scope.

      The study lacks the substance to completely convince me of any of the results. There is SUBSTANTIAL work that needs to be done to make this publishable. There are a lot of writing mistakes, so many that I do not list them in detail here.

      We are not absolutely sure which mistakes this reviewer is alluding to. However, we carefully rewrote the manuscript, streamlined many of our claims and added many new and more recent references.

      The reference citations are fairly old, but I do not list updated replacements here.

      Thanks – we added many newer and relevant citations.

      The text is very brief, and the overall writing needs to include significantly more description and detail.

      We have included more descriptions and details, as will be elaborated below, but, again, this is a short report and will remain as such.

      This is evident in all aspects of the manuscript, but especially notable in the Methods and Figure Legends.

      Thanks for raising this comment, which was echoed by other reviewers – we have now included more details, with a particular focus on the genotypes (Table 2), which were erroneously omitted from the original submission, as well as more detailed figure legends.

      None of the Figure Legends include full genotypes of any of the fly lines, and these full genotypes are also not included in the Methods. This is vital to compare the experimental lines to the controls.

      True – our apologies for this mistake; we have now added the full genotypes in Table 2.

      Major points are listed below:

      1. Figure 2: It is important to note the specific age of the animals in these images when talking about the loss of genes in development. Are all the animals age-matched? High levels of synaptic pruning occur post-eclosion, and it is important to understand when these pruning defects occur. It is mentioned that the overlapping genes in the expression data are upregulated during 6-18 h APF; is this when these images were taken? This is very important in the context of pruning, as SCZ symptom presentation is very late relative to these early events.

      We thank the reviewer for this comment, which suggests we were not clear enough in our description. We do not claim to have generated an SCZ model and have clarified this in the text (lines 275-278). Furthermore, axon pruning happens during pupal development, but in all the main figures in this manuscript we dissected young adult flies (3-5 days post eclosion) and show the remnants of unpruned axons (as we have done in numerous studies). To make sure that initial development occurred normally, we also include larval brains in Figure S7. We have now clarified that we image adult brains as a readout of whether pruning occurred during metamorphosis (lines 124-126).

      1. Figure 2: In the figure legend, it is indicated that the arrows mark unpruned axons; however, in the controls these areas appear to be highly innervated. Further explanation is needed about the context of the arrows, as there are clear visual differences between these images and the controls, but they appear to have a more expansive phenotype than "unpruned axons". The data do not match the visual representation in comparison to the control.

      We apologize for this confusion. Unfortunately, the driver that we use to label the γ axons, R71G10-QF2, is not strictly specific to γ-type KCs but is sometimes also expressed in ɑ/β KCs. As the ɑ/β axons are very stereotypic in shape and also express high levels of FasII (which we stain for), we can easily distinguish between the ɑ lobe and unpruned γ axons. To clarify this point, we now clearly demarcate all lobes in the control images and specifically the ɑ lobe in all panels. Additionally, we added new schemes in Figure 2A and 2O to better clarify the anatomy and experimental design.

      1. Figure 2: The defects labeled in panel K need more descriptive definitions and clarification. This could be done in the figure legend, but it would be more useful to label the images provided. For example, if Mmp2 has a "mild pruning effect", put that in the pie chart somewhere, to help relate the description of the phenotype to what those confocal images look like.

      We understand that the pie chart in Figure 2 was confusing and have therefore simplified it in the current version (Fig. 2B and 2P). Also, thanks to this great point, we now include a new Figure S3 that shows examples of the ranking categories; the ranking was now performed by two independent investigators in a blinded manner.

      Figure 3: The time points of the images of the Mushroom Body (MB) are vital to understanding the process and regulation of these genes.

      Please see our comment to point #1 – unless specifically stated otherwise, all images are MBs of adult flies, as now clearly mentioned in the figure legends, in the text and in the Material and Methods section.

      1. Figure 3D: Significant description of this graph needs to be added for clarity. What parameters separate each phenotypic defect? Labeling the images and showing images that belong in different groups would be very helpful and improve the paper significantly.

      We now included a new Figure S3 (also see our response to comment #3).

      1. Figure S1: Additional experiments would help establish the strength of the phenotype for the ALG-GAL4 driver. The authors need to perform a rescue experiment: use an MMP-2 null and then drive MMP-2 back in with ALG-GAL4 to see if this is sufficient to rescue neuronal pruning. This would also isolate the mechanism to one subtype of glia.

      These are excellent suggestions that are, unfortunately, not doable. To perform a rescue experiment, one would need a viable loss-of-function phenotype of an Mmp2 mutant. There is one published Mmp2 loss-of-function null allele which is lethal during pupal development (Page-McCaw et al, 2003). Our previous data, using tissue specific (ts)CRISPR, suggested the involvement of Mmp2 in neurons for their remodeling (Meltzer et al, 2019). We therefore independently generated an Mmp2 germline mutant using CRISPR (harboring an indel resulting in a premature stop codon and predicted to encode a truncated, 77 amino-acid long protein), now described in Fig. S5A (and in the Materials and Methods). This allele is, as expected, unfortunately also lethal. We attempted to overcome lethality by generating MARCM (mosaic) clones in neurons, but as expected, because Mmp2 is largely secreted, there was no pruning defect phenotype (Fig. S5B-C). Unfortunately, it is not yet possible to generate glial clones.

      Figure 3 and 4: The other glial subtypes need to be analyzed to draw any conclusion about their involvement, as well as about the involvement of astrocytes. Running these exact same experiments on cortex glia and ensheathing glia will provide essential insight into which glial subtype is involved. The presumed lack of phenotypes in these other glial subtypes would also strengthen the argument that astrocytes are specifically involved in this process. These are vital experiments.

      We have currently limited our analysis (and conclusions) to astrocytes. Although this experiment is beyond our initial scope, we obtained reagents and performed preliminary experiments (using the R77A03-Gal4 driver for cortex glia and R83E12-Gal4 for ensheathing glia). In both cases, we observed extremely mild pruning defects, not comparable to those with Repo- or Alrm-Gal4. These preliminary experiments lacked a proper control, and now, unfortunately, due to the loss of our lab, we are unable to complete them in a reasonable amount of time.

      1. Figure 4: Again, description of the phenotypes and examples of these would improve the quality of this figure substantially.

      Absolutely agree – see our response to comment #3 (and Fig. S3).

      1. Figure 5: An improvement on the quantifications of these phenotypes would strengthen the paper substantially. More detailed description of the phenotypes and how they related to the control would significantly improve the overall quality of the work.

      Thanks again for highlighting that we neglected to include the full genotypes that are now added (Table 2). We also thank the reviewer for raising the point regarding quantification. First, we generated a new Fig. S3A-E to show examples of the ranking by two independent rankers. Second, ranking was performed by looking at TdTomato positive vertical axons that are outside of the ɑ lobe (high FasII) – this is now better explained in the materials and methods. Additionally, while we would love to have a better scoring, and automatic, system – and even published a semi-automated scoring algorithm in Alyagor et al. 2018 (Figure 3O in the Alyagor paper), because the driver also labels vertical axons (ɑ/β) and because unpruned γ axons often express FasII, this quantification method does not always work. What we have done in previous cases, as we have also done here, is to provide independent ranking by two investigators and compare their ranking (Fig. S3F-G). Finally, we are working with our AI hub to develop automatic scoring systems that will not require human ranking – however this is beyond the scope for this manuscript.

      Minor Comments: 1. Figure 1A: I would suggest labeling the KC (gamma) and potentially one of the others (a/B, a'/B') to orient the reader to the differences between these two subsets of the KCs, and to emphasize which neurons are undergoing pruning and where the cell bodies are and where the axons project.

      Thanks for the suggestions – we have now better annotated the scheme in Figure 1A, added schematics in Figure 2 and improved the annotations in selected panels. Specifically, the ɑ lobe is outlined in magenta throughout all relevant panels.

      1. Figure 1C: This panel needs further labeling to explain the findings in the heat map. Labeling some of the genes that were found and where they were would be helpful. This could also be done in the figure legend, however without any further labeling or context the heatmap is confusing.

      We apologize for the incomplete figure. We did not want to overload the figure with data, which is why we showed only the important clusters and did not include gene names. To keep the figure simple while providing the complete information, we now include the full data in Fig. S1 (the original heatmap with all the dynamic clusters I-IX and all the gene names). For the full raw data, including non-dynamic clusters, the reader is referred to Supplemental Excel file 1. We hope this provides the clarity that this reviewer rightfully asks for.

      1. Figure 3B,C: The full genotypes need to be labeled. What is the exact genotype used for the control?

      The full genotypes of all figure panels are now included in Table 2 in the Materials and Methods.

      1. Figure S1: The stock number for the ALG-GAL4 is missing. There are multiple different drivers, and some are better than others, so this information could be helpful in understanding this phenotype.

      Indeed, Alrm-Gal4 is available on two chromosomes – we used BDSC #67032, which is on chromosome III, and this is now clearly mentioned in the Materials and Methods section.

      1. Figures 3 and 4: Labeling needs to remain consistent; Figure 3 "Glia-Gal4", Figure 4 "glia-gal4".

      Thanks, done.

      Reviewer #1 (Significance (Required)):

      General Assessment: An interesting study on MMP function during an unusual type of neural development (axon pruning). Most of the MMP function appears to be in glia, although the MMP role in this context is unclear. The MMP function in the neurons being pruned is unexpected and even less clear. The study is somewhat poorly described in terse language lacking essential information, which gives the overall impression of a preliminary report.

      Advance: Glial MMP function has been described for neuronal clearance mechanisms following injury. The main advance here is to describe a similar function during normal development. Audience: Developmental neuroscientists, MMP biologists, possibly schizophrenia clinician researchers

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Neuropsychiatric conditions are often influenced by genetic factors. Schizophrenia is a complex mental disorder characterised by a mixture of hallucinations, delusions and disorganised thinking that causes lifelong problems in daily life. GWAS have identified a number of genes associated with the risk of developing schizophrenia, although genetic predisposition alone is not sufficient and additional environmental factors are required. In the current manuscript, the authors aim to exploit the strength of the Drosophila system to explore a link between schizophrenia-associated genes and neuronal remodelling during development. They focus on the mushroom body in the adult brain, where pronounced neuronal remodelling occurs during metamorphosis. To assess the potential role of the genes identified by the GWAS, they performed a targeted RNAi-based screen. They focus on the role of metalloproteases and find that they are required in neurons and in glia for the pruning of mushroom body axons. The study starts with a selection of 32 genes, 29 of which are listed (a bit hidden) in the Materials and Methods, and the identification of the Drosophila orthologs. The expression patterns of these genes in Kenyon cells are presented in Figure 1, but unfortunately no information is given on which gene is expressed when.

      We apologize for the confusion. We attempted to keep Figure 1 simple but this resulted in the absence of critical information, as the reviewer suggests. We now include a Figure S1 that includes the entire heatmap of the dynamically expressed clusters I-IX with all the gene names. Additionally, we now augmented the information in Table 1 to include the screen phenotypes. Finally, Supplemental excel file 1, also included in our original submission, includes all the data, and is now better referred to throughout the text.

      In a next step, Kenyon cell-specific RNAi knockdown experiments are shown that identify a pruning phenotype for several genes. They demonstrate that Mmp2 (and similarly Mmp1) is also required in glia. Although Mmp2 was identified by neuronal RNAi-based knockdown, double knockdown experiments led the authors to conclude that its primary function is in glia. The study emphasises the use of the advanced genetic model to understand complex human diseases. However, the paper does not go far enough in making use of the excellent genetics available. Basically, the report is about the identification of a few hits in a small RNAi screen, which is fine in itself, but leaves many questions unanswered. Do mmp1/2 mutants have a phenotype?

      This is a very important question that, unfortunately, cannot be answered. There is one published Mmp2 loss-of-function null allele, which is lethal during pupal development (Page-McCaw et al, 2003). Our previous data, using tissue-specific (ts)CRISPR, suggested the involvement of Mmp2 in neurons for their remodeling (Meltzer et al, 2019). We therefore independently generated an Mmp2 germline mutant using CRISPR (harboring an indel resulting in a premature stop codon and predicted to encode a truncated, 77 amino-acid long protein), now described in Fig. S5A (and in the Materials and Methods). This allele is, as expected, unfortunately also lethal. We attempted to overcome lethality by generating MARCM (mosaic) clones in neurons, but as expected, because Mmp2 is largely secreted, there was no pruning defect phenotype (Fig. S5B-C). Unfortunately, it is not yet possible to generate glial clones. Additionally, available Mmp1 mutants are, sadly, also homozygous lethal. That said, in our revised manuscript we now include data demonstrating that expression of a dominant negative variant of Mmp1 inhibits pruning (Fig. 3J-K). We strengthened the evidence regarding the reliability of Mmp1 RNAi using an antibody mix (Fig. S4), and for Mmp2 we refer to a manuscript that tested its efficiency (Harmansa et al., 2023). Lastly, we added new data using an additional RNAi line targeting Mmp2 from the VDRC collection (Fig. 3L).

      Can the phenotype be rescued?

      Unfortunately, without a viable mutant LOF phenotype, a rescue experiment is impossible. Regardless, in an attempt to rescue the RNAi phenotype, we designed and generated an RNAi-resistant Mmp2 overexpression transgene. Unfortunately, due to the destruction of our lab – several days after we received this transgenic line from Bestgene – this experiment is not included in the revision.

      Does TIMP expression lead to similar phenotypes?

      This is an interesting question which we addressed in our experiments but did not include in the text. Unfortunately, overexpression of TIMP did not have any effect on MB development. We are adding this figure here as Reviewer Figure 1, but we think that adding this information to the paper will not improve it for several reasons. The lack of phenotype by overexpression of Timp can result from a technical issue such as low expression or mislocalization of the protein, or a biological issue such as more complicated involvement of TIMP or other MMP inhibitors.

      What is the temporal requirement for Mmp1/2?

      This is an excellent suggestion, not an easy experiment, but one that we initiated, using a temperature-sensitive Gal80 to restrict RNAi expression to metamorphosis. However, due to the unfortunate destruction of our lab, this experiment was never completed.

      What are the target proteins of Mmp2?

      This is the million-dollar question – but unfortunately is beyond the scope of this short report.

      Is Mmp2 still required when astrocyte motility is blocked? What is the morphology of glia after Mmp1/2 knockdown?

      Thank you for this wonderful suggestion. We initiated two types of experiments using sparse labeling techniques (both MARCM and SPARC) to identify the morphology of single astrocytes in WT vs. MMP KD. However, these are complicated crosses that were not completed prior to the destruction of our lab.

      Reviewer #2 (Significance (Required)):

      The strength of the study is the identification of a pruning phenotype after RNAi-based knockdown. The limitation is that this study is very superficial; it is the beginning of a paper. The initial claim to use Drosophila because of its advanced genetics is not met. The results section is shorter than the discussion.

      While we agree with much of the reviewer’s statement, this also relates to our general comment about type 1 and type 2 “preliminary”. True, this could be the beginning of a big paper, and it would then be a more comprehensive and deep story. Most of the papers from my lab are indeed a 5-year endeavor. However, this short report (which is now longer, more detailed, and includes additional experiments) is the result of the work of an outstanding master’s student who came up with the idea for the project entirely by herself. Thus, given the data that she has acquired, and the fact that my lab will not continue to study MMPs or schizophrenia, the question should be whether the data support the claims and whether this is an advance of science worthy of publication in a respectable journal. Our clear and decisive opinion is that the answer to that question is yes.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this work, Schuldiner and colleagues explore the role of Mmp1 and Mmp2 in neuronal remodeling in the mushroom body of Drosophila. Overall, this work is very interesting, but in its current form seems quite preliminary. The biggest limitation of the study is that single RNAi lines are used with no validation that the lines are working, despite the fact that Mmp antibodies are available as are endogenously tagged Mmp lines that could have been used to validate the genetic manipulations. Specific concerns are listed below.

      We thank reviewer 3 for his generally positive assessment of our work. We have now performed additional experiments to strengthen and validate the original RNAi findings – for specifics, see our replies to the points below.

      Major concerns 1) The scoring system for pruning of mushroom body neurons seems very variable, even in controls (where scoring can range from very mild to moderate), and it is very hard to assess from the images what one is looking at (rather than using our own judgment, we rely on the authors' words). It would be necessary to have better labeling and examples of what phenotypes are considered "mild", "severe", "wild type-like". It would also help to understand how phenotype assessment is guided by the overlap between the signals from TdTomato fluorescence and FasII stain.

      We thank the reviewer for raising this point, which has also been highlighted in some form by the other reviewers. First, we have generated Figure S3A-E to show examples of the ranking, which was now performed by two independent investigators. Second, ranking was performed by looking at TdTomato-positive vertical axons that are outside of the α lobe (high FasII) – this is now better explained in the Materials and Methods. Additionally, while we would love to have a better, automatic scoring system – and even published a semi-automated scoring algorithm in Alyagor et al. 2018 (Figure 3O in the Alyagor paper) – this quantification method does not always work, because the driver also labels vertical axons (ɑ/β) and because unpruned γ axons often express FasII. What we have done in previous cases, as we have also done here, is to provide independent ranking by two investigators and compare their ranking (Fig. S3F-G). Finally, we are working with our AI hub to develop automatic scoring systems that will not require human ranking – however, this is beyond the scope of this manuscript.

      2) The biggest limitation of the approach is that single RNAi lines are used to screen, with no accompanying validation of the tool (see above).

      We agree. Unfortunately, not all RNAis are “equal”, and thus not all of them work. To support the RNAi data, we have better clarified previous experiments that demonstrate the importance of neuronal Mmp2 via tissue-specific (ts)CRISPR (Meltzer, et al, 2019). Unfortunately, the available Mmp2 null mutant is lethal during pupal development (Page-McCaw et al, 2003). We therefore independently generated an Mmp2 germline mutant using CRISPR (harboring an indel resulting in a premature stop codon and predicted to encode a truncated, 77 amino-acid long protein), now described in Fig. S5A (and in the Materials and Methods). This allele is, as expected, unfortunately also lethal. We attempted to overcome lethality by generating MARCM (mosaic) clones in neurons, but as expected, because Mmp2 is largely secreted, there was no pruning defect phenotype (Fig. S5B-C). Unfortunately, it is not yet possible to generate glial clones. Additionally, available Mmp1 mutants are, sadly, also homozygous lethal. That said, in our revised manuscript we now include data demonstrating that expression of a dominant negative variant of Mmp1 inhibits pruning (Fig. 3J-K). We strengthened the evidence regarding the reliability of Mmp1 RNAi using an antibody mix (Fig. S4), and for Mmp2 we refer to a manuscript that tested its efficiency (Harmansa et al., 2023). Lastly, we added new data using an additional RNAi line targeting Mmp2 from the VDRC collection (Fig. 3L).

      3) RNAi-based knockdown is used to infer epistatic information; this is not appropriate, as epistasis experiments need to be done with null alleles to make firm conclusions. Additional concerns: ● Even with the same driver, knockdown efficiency for 2 different genes could be variable and dependent on the specific RNAi used. ● The comparison between drivers is even harder, as driver strength varies greatly. ● The knockdown efficiency drops with increasing numbers of RNAi used. ● The specific genotypes used for this experiment should be clarified, as it would be very important to ensure that the UAS dosage is equal across conditions.

      We agree that RNAi is not optimal to assess epistasis. Indeed, we did not mean to claim an epistatic relationship between Mmp1 and Mmp2, nor between neurons and glia, and we now use more careful language to clarify this. Defining epistatic relationships would require mutants; unfortunately, null alleles cannot be used here because they are lethal and the proteins are secreted (which precludes mosaic analyses). We agree that increasing the number of RNAi lines is expected to reduce their efficiency – this is why it is even more significant that we see an increased defective phenotype in the double knockdown experiments. Finally, we fully agree with the comment about genotypes and apologize that they were erroneously omitted from the original submission; all have now been added (Table 2 in Materials and Methods).

      4) To further deepen the rigor of this work, a few simple yet important things could have been done. First, it would be important to rule out that knocking down Mmps affects astrocyte numbers and health (this could be assessed by counting their numbers and observing their morphology). Also, the authors previously showed that astrocytes actively infiltrate the axon bundle prior to pruning to facilitate axon defasciculation and pruning (Marmor-Kollet et al., 2023). It would have provided important insight to examine whether astrocytes can infiltrate the axon bundle if Mmp2 and/or Mmp1 are knocked down.

      Thank you for these wonderful suggestions. We embarked on a few experiments as detailed below, unfortunately these are complicated crosses that were not completed prior to the destruction of our lab. 1) We initiated two types of experiments using sparse labeling techniques (both MARCM and SPARC) to identify the morphology of single astrocytes in WT vs. MMP KD. 2) Testing astrocytic infiltrations requires three binary systems, we obtained and generated stocks required for these experiments, but these were prematurely terminated. 3) We initiated experiments to count the number of glial nuclei in the vicinity of the degenerating axonal lobe (at the onset of pruning). Preliminary experiments with a small n (3 controls, 4 Mmp1 RNAi, and 5 Mmp2 RNAi) suggest that the number of glial nuclei is not significantly different between these conditions.

      Minor: The introduction puts a big emphasis on the role of glia, but then, to narrow down candidate genes for the screen, a γ-KC transcriptional data set is used, and the initial screen is done via knockdown of those candidates in neurons (there is a disconnect between rationale and approach).

      We totally agree with this reviewer which is why we now changed the paper to include both neuronal and glial loss-of-function screens. Figure 1 is now augmented with the glial data.

      Rationale for looking into axon pruning and how that translates into insights about synaptic pruning defects in schizophrenia should be more clearly stated.

      Indeed, our belief that synapse pruning and axon pruning share molecular mechanisms remains unproven. However, both are steps during neuronal remodeling, which has been previously implicated in schizophrenia. That said, we have now added a further disclaimer to acknowledge the limitations of our findings in the context of human disease and synapse elimination (lines 275-279).

      Figure 1C: data visualization for this heat map should be improved. Parts of the data are faded, and the differences between gene clusters are unclear.

      We apologize for the incomplete figure. We did not want to overload the figure with data, which is why we showed only the important clusters and did not include gene names. To keep the figure simple while providing the complete information, we now include the full data in Fig. S1 (the original heatmap with all the dynamic clusters I-IX and all the gene names). For the full raw data, including non-dynamic clusters, the reader is referred to Supplemental Excel file 1. We hope this provides the clarity that this reviewer rightfully asks for.

      Reviewer #3 (Significance (Required)):

      In this work, Schuldiner and colleagues explore the role of Mmp1 and Mmp2 in neuronal remodeling in the mushroom body of Drosophila. Overall, this work is very interesting, but in its current form seems quite preliminary. The biggest limitation of the study is that single RNAi lines are used with no validation that the lines are working, despite the fact that Mmp antibodies are available as are endogenously tagged Mmp lines that could have been used to validate the genetic manipulations.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We thank the reviewers for providing us the opportunity to revise our manuscript titled “Identifying regulators of associative learning using a protein-labelling approach in C. elegans.” We appreciate the insightful feedback that we received to improve this work. In response, we have extensively revised the manuscript with the following changes: we have (1) clarified the criteria used for selecting candidate genes for behavioural testing, presenting additional data from ‘strong’ hits identified in multiple biological replicates (now testing 26 candidates, previously 17), (2) expanded our discussion of the functional relevance of validated hits, including providing new tissue-specific and neuron class-specific analyses, and (3) improved the presentation of our data, including visualising networks identified in the ‘learning proteome’, to better highlight the significance of our findings. We also substantially revised the text to indicate our attempts to address limitations related to background noise in the proteomic data and outlined potential refinements for future studies. All revisions are clearly marked in the manuscript in red font. A detailed, point-by-point response to each comment is provided below.

      1. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary:

      Rahmani et al. utilize the TurboID method to characterize the global proteome changes in the worm's nervous system induced by a salt-based associative learning paradigm. Altogether, Rahmani et al. uncover 706 proteins that are tagged by the TurboID method specifically in samples extracted from worms that underwent the memory-inducing protocol. Next, the authors conduct a gene enrichment analysis that implicates specific molecular pathways in salt-associative learning, such as MAP-kinase and cAMP-mediated pathways. The authors then screen a representative group of the hits from the proteome analysis. They find that mutants of candidate genes from the MAP-kinase pathway, namely dlk-1 and uev-3, do not affect performance in the learning paradigm. Instead, multiple acetylcholine signaling mutants significantly affected performance in the associative memory assay, e.g., acc-1, acc-3, gar-1, and lgc-46. Finally, the authors demonstrate that the acetylcholine signaling mutants did not exhibit a phenotype in similar but distinct conditioning paradigms, such as aversive salt conditioning or appetitive odor conditioning, suggesting their effect is specific to appetitive salt conditioning.

      Major comments:

      1. The statistical approach and analysis of the behavior assay: The authors use a two-way ANOVA, which assumes normally distributed data. However, the chemotaxis index (CI) used in the study is bounded between -1 and 1, which prevents values near the boundaries from being normally distributed.

      Since most of the control data in this study are very close to 1, this strongly suggests that the CI data are not normally distributed and that a two-way ANOVA is therefore expected to give skewed results.

      I am aware this is a common mistake and I also anticipate that most conclusions will still hold also under a more fitting statistical test.
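      The bounds the reviewer describes follow from how a chemotaxis index is usually computed. The sketch below assumes the common form CI = (A - B) / (A + B), where A and B are the numbers of worms scored near the salt spot and near the control spot. This definition is an illustrative assumption and is not quoted from the manuscript's Methods. Because |A - B| can never exceed A + B, every CI lies between -1 and 1. Replicate values for a group performing near the ceiling can therefore only spread downwards.

```python
def chemotaxis_index(n_salt: int, n_control: int) -> float:
    """Chemotaxis index from worm counts near the salt and control spots.

    Assumed (illustrative) definition: CI = (n_salt - n_control) / (n_salt + n_control).
    """
    if n_salt < 0 or n_control < 0:
        raise ValueError("counts must be non-negative")
    total = n_salt + n_control
    if total == 0:
        raise ValueError("no worms scored at either spot")
    return (n_salt - n_control) / total


# |n_salt - n_control| <= n_salt + n_control, so every CI is in [-1, 1].
# A group whose replicates sit near +1 cannot vary upwards (ceiling effect).
for counts in [(100, 0), (95, 5), (90, 10), (0, 50)]:
    print(counts, chemotaxis_index(*counts))
```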

      We appreciate the point raised by Reviewer 1 and understand the importance of performing the correct statistical tests.

      The statistical tests used in this study were chosen because parametric tests, particularly ANOVA tests to assess differences between multiple groups, are commonly used to assess behaviour in the C. elegans learning and memory field. As examples, below is a summary of the tests used by studies cited in this work that perform similar behavioural tests:

      Table 1 | A summary of the statistical tests performed by similar studies on chemotaxis assay data. The references (listed in the leftmost column) were observed to (A) use parametric tests only or (B) perform either a parametric or a non-parametric test on each chemotaxis assay dataset, depending on whether the data passed a normality test. Listings for ANOVA tests are in bold to demonstrate their common use in the C. elegans learning and memory field.

      | Reference | Parametric test/s used in the reference | Non-parametric test/s used in the reference |
      |---|---|---|
      | Beets et al., 2020 | **Two-way ANOVA** | None |
      | Hiroki & Iino 2022 | **One-way ANOVA** | None |
      | Hiroki et al., 2022 | **One-way ANOVA** | None |
      | Hukema et al., 2006 | T-tests | None |
      | Hukema et al., Learn. Mem. 2008 | T-tests | None |
      | Jang et al., 2019 | **ANOVA** | None |
      | Kitazono et al., 2017 | **Two-way ANOVA** and t-tests | None |
      | Lans et al., 2004 | **One-way ANOVA** | None |
      | Lim et al., 2018 | **Two-way ANOVA** | Wilcoxon rank sum test adjusted with the Benjamini–Hochberg method |
      | Lin et al., 2010 | **Two-way or three-way ANOVA** | None |
      | Nagashima et al., 2019 | **One-way ANOVA** | None |
      | Ohno et al., 2014 | None | |
      | Sakai et al., 2017 | **One-way ANOVA** or t-tests | None |
      | Stein & Murphy 2014 | **Two-way ANOVA** and t-tests | None |
      | Tang et al., 2023 | **One-way ANOVA** or t-tests | None |
      | Tomioka et al., 2006 | T-tests | None |
      | Watteyne et al., 2020 | **One-way ANOVA** | Two-sided Kruskal–Wallis |

      We note Reviewer 1's concern that this may stem from a common mistake. As stated, two-way ANOVA generally relies on normally distributed data. We used GraphPad Prism to perform the Shapiro-Wilk normality test on our chemotaxis assay data, as this test is generally appropriate for small sample sizes.

      Table 2 | Shapiro-Wilk normality test results for chemotaxis assay data in Figure S8C. Chemotaxis assay data were generated to assess salt associative learning capacity for wild-type (WT) versus lgc-46(-) mutant C. elegans. Three experimental groups were prepared for each C. elegans strain (naïve, high-salt control, and trained). From top to bottom, the table below displays the W value, the P value, a binary yes/no for whether the data pass the Shapiro-Wilk normality test, and a P value summary (ns = non-significant). W values measure the similarity between a normal distribution and the chemotaxis assay data. Data are considered normal in the Shapiro-Wilk normality test when the W value is near 1.0 and the null hypothesis is not rejected (i.e., P value > 0.05).

      | | WT naïve | WT high-salt control | WT trained | lgc-46 naïve | lgc-46 high-salt control | lgc-46 trained |
      |---|---|---|---|---|---|---|
      | W | 0.9196 | 0.9114 | 0.8926 | 0.8334 | 0.8151 | 0.8769 |
      | P value | 0.5272 | 0.4758 | 0.3705 | 0.1475 | 0.1070 | 0.2954 |
      | Passed normality test (alpha=0.05)? | Yes | Yes | Yes | Yes | Yes | Yes |
      | P value summary | ns | ns | ns | ns | ns | ns |

      The revised manuscript now describes the use of the Shapiro-Wilk normality test to assess chemotaxis assay data before applying two-way ANOVA (page 51).
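      The screening step described above can be written as a small decision rule: test every group with Shapiro-Wilk, and proceed to two-way ANOVA only if no group rejects normality. This sketch takes the P values reported in Table 2 as input rather than recomputing the test. In practice GraphPad Prism supplies them, or, for example, `scipy.stats.shapiro` in Python. The rule only checks that normality is not rejected at alpha = 0.05, matching the criterion in the Table 2 caption. The fallback label is a placeholder, since no specific non-parametric alternative is named here.

```python
ALPHA = 0.05

# Shapiro-Wilk P values for the Figure S8C data, as reported in Table 2.
TABLE2_P_VALUES = {
    "WT naive": 0.5272,
    "WT high-salt control": 0.4758,
    "WT trained": 0.3705,
    "lgc-46 naive": 0.1475,
    "lgc-46 high-salt control": 0.1070,
    "lgc-46 trained": 0.2954,
}


def choose_test(p_values: dict, alpha: float = ALPHA) -> tuple:
    """Return the planned test and the groups (if any) that reject normality."""
    # Shapiro-Wilk H0: the sample comes from a normal distribution.
    # A group fails when P <= alpha, i.e. normality is rejected for that group.
    failing = sorted(group for group, p in p_values.items() if p <= alpha)
    if failing:
        return "non-parametric alternative", failing  # placeholder label
    return "two-way ANOVA", []


planned_test, failing_groups = choose_test(TABLE2_P_VALUES)
print(planned_test, failing_groups)
```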

      Nevertheless, an appropriate statistical analysis should be performed. Since I assume the authors would wish to take into account both the different conditions and the biological repeats, I can suggest two options:

      • Using a generalized linear mixed model, which can be done in R.
      • Using a custom bootstrapping approach.

      We thank Reviewer 1 for suggesting these two options. We carefully considered both approaches and consulted the in-house statistician at our institution (Dr Pawel Skuza, Flinders University) for expert advice to guide our decision. In summary:

      • Generalised linear mixed models: Generalised linear mixed models (GLMMs) are generally most appropriate for nested/hierarchical data. However, our chemotaxis assay data do not exhibit such nesting. Each biological replicate (N) consists of three technical replicates, which are averaged to yield a single chemotaxis index per N. Our statistical comparisons are based solely on these averaged values across experimental groups, making GLMMs less applicable in this context.

      • __Bootstrapping:__ Based on advice from our statistician, while bootstrapping can be a powerful tool, its effectiveness is limited when applied to datasets with a low number of biological replicates (N). Bootstrapping relies on resampling existing data to simulate additional observations, which may artificially inflate statistical power and potentially suggest significance where the biological effect size is minimal or not meaningful. Increasing the number of biological replicates to accommodate bootstrapping could introduce additional variability and compromise the interpretability of the results.

      The total number of assays, especially controls, varies quite a bit between the tested mutants. For example, compare the acc-1 experiment in Figure 4A with gap-1 or rho-1 in Figure S4A and D. It is hard to know the exact N of the controls, but I assume that, for example, lowering the number of wild-type controls for acc-1 to match that of gap-1 would have made the difference non-significant. Perhaps the best approach would be to conduct a power analysis to determine what N should be acquired for all samples.

      We thoroughly evaluated performing a power analysis; however, this is typically performed on the assumption that N = 1 represents a single individual. In this study, N = 1 is one biological replicate that includes hundreds of worms, which is why power analysis is not typically employed in our field for this type of behavioural test.
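      For illustration only, the sketch below shows what a power calculation looks like when the unit of analysis is the biological replicate (one population of worms yielding one averaged CI). It uses the standard normal-approximation formula for a two-sided, two-group comparison: n per group = 2((z_{1-alpha/2} + z_{power}) / d)^2. Here d is the expected difference in mean CI divided by the standard deviation of per-replicate CIs. The effect sizes are hypothetical inputs that would have to come from pilot data, and an exact t-based calculation gives slightly larger numbers.

```python
import math
from statistics import NormalDist


def replicates_per_group(effect_size_d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate biological replicates per group for a two-sided two-group test.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2, rounded up.
    """
    if effect_size_d <= 0:
        raise ValueError("effect size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


# Hypothetical standardized effect sizes (difference in mean CI / SD of replicate CIs).
for d in (1.0, 2.0, 3.0):
    print(d, replicates_per_group(d))
```

      Under these assumptions, d = 1 requires 16 biological replicates per group and d = 2 requires 4. The answer is therefore driven almost entirely by how variable per-replicate CIs are relative to the expected learning effect.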

      Considering these factors, we have opted to continue using a two-way ANOVA for our statistical analysis. This choice aligns with recent publications that employ similar experimental designs and data structures. Crucially, we have verified that our data meet the assumptions of normality, addressing key concerns regarding the suitability of parametric testing. We believe this approach is sufficiently rigorous to support our main conclusions. This rationale is now outlined on page 51.
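      The low-N limitation of bootstrapping raised in this response can be made concrete. Resampling n biological replicates with replacement can produce only C(2n-1, n) distinct multisets. With N = 5, that means at most 126 distinct bootstrap samples, and usually fewer distinct means, however many resamples are drawn. The sketch below computes this bound and runs a plain resampling of the mean. The chemotaxis index values are hypothetical.

```python
import math
import random
from statistics import fmean


def distinct_bootstrap_samples(n: int) -> int:
    """Number of distinct multisets of size n drawn with replacement from n items."""
    return math.comb(2 * n - 1, n)


def bootstrap_means(values: list, n_boot: int = 10_000, seed: int = 0) -> list:
    """Means of n_boot resamples (with replacement) of the per-replicate values."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    k = len(values)
    return [fmean(rng.choices(values, k=k)) for _ in range(n_boot)]


# Hypothetical per-replicate chemotaxis indices for one group (N = 5).
cis = [0.62, 0.71, 0.55, 0.80, 0.68]
means = bootstrap_means(cis)
print(distinct_bootstrap_samples(len(cis)))  # upper bound on distinct resamples
print(len(set(means)))                      # distinct means actually observed
```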

      To be fully transparent, our aim is to present differences between wild-type and mutant strains that are clearly visible in the graphical data, such that the choice of statistical test does not become a limiting factor in interpreting biological relevance. We hope this rationale is understandable, and we sincerely appreciate the reviewer’s comment and the opportunity to clarify our analytical approach.

      We hope that Reviewer 1 will appreciate these considerations as sufficient justification to retain the statistical tests used in the original manuscript. Nevertheless, to constructively address this comment, we have performed the following revisions:

      1. __Consistent number of biological replicates:__ We performed additional biological replicates of the learning assay to confirm the behavioural phenotypes for the key candidates described (KIN-2, F46H5.3, ACC-1, ACC-3, LGC-46). We chose N = 5, since most studies cited in this paper that perform similar behavioural tests do the same (see the table below).

      Table 3 | A summary of sample sizes generated by similar studies for chemotaxis assay data. The references (listed in the leftmost column) were observed to use the sample sizes (N) below, corresponding to biological replicates of chemotaxis assay data. N values are in bold when the study uses N ≤ 5.

      | Reference | N used in the study for chemotaxis assay data |
      |---|---|
      | Beets et al., 2020 | 8 |
      | Hiroki & Iino 2022 | **5-8** |
      | Hiroki et al., 2022 | 6-7 |
      | Hukema et al., 2006 | **≥ 4** |
      | Hukema et al., Learn. Mem. 2008 | **≥ 4** |
      | Jang et al., 2019 | **≥ 4** |
      | Kitazono et al., 2017 | **≥ 4** |
      | Kauffman et al., 2010 | **≥ 3** |
      | Kauffman et al., J. Vis. Exp. 2011 | **≥ 3** |
      | Lans et al., 2004 | **2** |
      | Lim et al., 2018 | **2-4** |
      | Lin et al., 2010 | **≥ 4** |
      | Nagashima et al., 2019 | ≥ 7 |
      | Ohno et al., 2014 | ≥ 11 |
      | Sakai et al., 2017 | **≥ 4** |
      | Stein & Murphy 2014 | **3-5** |
      | Tang et al., 2023 | ≥ 9 |
      | Watteyne et al., 2020 | ≥ 10 |

      2. __Grouped presentation of behavioural data:__ We now present all behavioural data by grouping genotypes tested within the same biological replicate, including wild-type controls, rather than combining genotypes tested separately. This ensures that each graph displays data from genotypes sharing the same N, which is also an important consideration for performing parametric tests. Accordingly, we re-performed statistical analyses using this reduced N for the relevant graphs. As anticipated, this rendered some comparisons non-significant. All statistical comparisons are clearly indicated on each graph.

      3. __Improved clarity of figure legends:__ We revised the figure legends for Figures 5, 6, S7, S8, & S9 to make clear how many biological replicates have been performed for each genotype, by adding N numbers for each genotype in all figures.
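      A minimal sketch of a balanced two-way ANOVA (genotype × condition) shows why a shared N matters for the parametric analysis. Following the data structure described in this response, each biological replicate's technical replicates are first averaged into one CI. The ANOVA then runs on these per-replicate values and rejects unbalanced input, because the simple sum-of-squares partition used here holds only when every genotype-condition cell has the same N. This is an illustration, not the GraphPad Prism procedure used in the manuscript. It returns F statistics only; P values could be obtained from the F distribution, for example with `scipy.stats.f.sf(F, df_effect, df_error)`.

```python
from statistics import fmean


def average_technical(replicates: dict) -> list:
    """Collapse {biological replicate ID: [technical CIs]} into one CI per replicate."""
    return [fmean(tech) for tech in replicates.values()]


def two_way_anova_balanced(data: dict) -> dict:
    """F statistics for a balanced two-way ANOVA.

    data maps (genotype, condition) -> list of per-replicate CIs; every cell
    must contain the same number of biological replicates (N >= 2).
    """
    a_levels = sorted({g for g, _ in data})
    b_levels = sorted({c for _, c in data})
    a, b = len(a_levels), len(b_levels)
    if a < 2 or b < 2:
        raise ValueError("each factor needs at least two levels")
    sizes = {len(v) for v in data.values()}
    if len(data) != a * b or len(sizes) != 1:
        raise ValueError("design must be complete and balanced (same N in every cell)")
    n = sizes.pop()
    if n < 2:
        raise ValueError("need at least two replicates per cell")

    grand = fmean(x for v in data.values() for x in v)
    cell = {k: fmean(v) for k, v in data.items()}
    a_mean = {g: fmean(cell[(g, c)] for c in b_levels) for g in a_levels}
    b_mean = {c: fmean(cell[(g, c)] for g in a_levels) for c in b_levels}

    # Sum-of-squares partition (valid for balanced designs only).
    ss_a = b * n * sum((m - grand) ** 2 for m in a_mean.values())
    ss_b = a * n * sum((m - grand) ** 2 for m in b_mean.values())
    ss_cells = n * sum((m - grand) ** 2 for m in cell.values())
    ss_ab = ss_cells - ss_a - ss_b
    ss_err = sum((x - cell[k]) ** 2 for k, v in data.items() for x in v)

    ms_err = ss_err / (a * b * (n - 1))  # error df = ab(n - 1)
    return {
        "genotype": (ss_a / (a - 1)) / ms_err,
        "condition": (ss_b / (b - 1)) / ms_err,
        "interaction": (ss_ab / ((a - 1) * (b - 1))) / ms_err,
    }
```

      In use, each cell would be filled as, e.g., `data[("lgc-46", "trained")] = average_technical({...})`, with the same number of biological replicates in every cell.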

      The authors use the phrasing "a non-significant trend"; I find such claims uninterpretable, and they should be avoided. Examples: page 16, line 7 and page 18, line 16.

      This is an important point. While we were not able to find the specific phrasing "a non-significant trend" from this comment in the original manuscript, we acknowledge that referring to a phenotype as both a trend and non-significant may confuse readers, which was originally stated in the manuscript in two locations.

      The main text has been revised on pages 27 & 28 when describing comparisons between trained groups between two C. elegans lines, by removing mentions of trends and retaining descriptions of non-significance.

      Neuron-specific analysis and rescue of mutants:

      Throughout the study, the authors avoid focusing on specific neurons. This is understandable, as the authors aim at a systems biology approach; however, in my view this limits the impact of the study. I am aware that the proteome changes analyzed in this study were extracted from a pan-neuronally expressed TurboID. Yet, neuron-specific changes may nevertheless be found. For example, running the protein lists from Table S2 in the WormBase gene enrichment tool, I found, across several biological replicates, enrichment for the NSM, CAN and RIG neurons. A more careful analysis may uncover specific neurons that take part in this associative memory paradigm. In addition, analyzing the expression of the final gene list across different neurons, looking for overlap and connectivity, would also help point towards specific circuits.

      This is an important and useful suggestion. We appreciate the benefit in exploring the data from this study from a neuron class-specific lens, in addition to the systems-level analyses already presented.

      The WormBase gene enrichment tool is indeed valuable for broad transcriptomic analyses (the findings from using this tool are now on page 16); however, its Anatomy Ontology (AO) terms also include annotations from more abundant non-neuronal tissues in the worm. To strengthen our analysis and complement the WormBase tool, we also used the CeNGEN database, as suggested by Reviewer 3 Major Comment 1 (Taylor et al., 2021), which uses single-cell RNA-Seq data to profile gene expression across the C. elegans nervous system. We input our learning proteome data into CeNGEN as a systemic analysis, identifying neurons highly represented by the learning proteome (pages 16-20). To do this, we specifically compared genes/proteins from high-salt control worms and trained worms to identify potential neurons that may be involved in this learning paradigm. Briefly, we found:

      • WormBase gene enrichment tool: Enrichment for anatomy terms corresponding to specific interneurons (ADA, RIS, RIG), ventral nerve cord neurons, pharyngeal neurons (M1, M2, M5, I4), PVD sensory neurons, DD motor neurons, serotonergic NSM neurons, and CAN.
      • CeNGEN analysis: Representation of neurons previously implicated in associative learning (e.g., AVK interneurons, RIS interneurons, salt-sensing neuron ASEL, CEP & ADE dopaminergic neurons, and AIB interneurons), as well as neurons not previously studied in this context (pharyngeal neurons I3 & I6, polymodal neuron IL1, motor neuron DA9, and interneuron DVC). Methods are detailed on pages 50 & 51. These data are summarised in the revised manuscript as Table S7 & Figure 4.

      To further address the reviewer’s suggestion, we examined the overlap in expression patterns of the validated learning-associated genes acc-1, acc-3, lgc-46, kin-2, and F46H5.3 across the neuron classes above, using the CeNGEN database. This was done to explore neuron classes in which these regulators may act to regulate learning. This analysis revealed both shared and distinct expression profiles, suggesting potential functional connectivity or co-regulation among subsets of neurons. To summarise, we found:

      • All five learning regulators are expressed in RIM interneurons and DB motor neurons.
      • KIN-2 and F46H5.3 share the same neuron expression profile and are present in many neurons, so they may play a general function within the nervous system to facilitate learning.
      • ACC-3 is expressed in three sensory neuron classes (ASE, CEP, & IL1).
      • In contrast, ACC-1 and LGC-46 are expressed in neuron classes (in brackets) implicated in gustatory or olfactory learning paradigms (AIB, AVK, NSM, RIG, & RIS) (Beets et al., 2012, Fadda et al., 2020, Wang et al., 2025, Zhou et al., 2023, Sato et al., 2021), neurons important for backward or forward locomotion (AVE, DA, DB, & VB) (Chalfie et al., 1985), and neuron classes whose function is not yet detailed in the literature (ADA, I4, M1, M2, & M5). These neurons form a potential neural circuit that may underlie this form of behavioural plasticity, which we now describe in the main text on pages 16-20 & 34-35 and summarise in Figure 4.
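      The overlap analysis described above amounts to set operations on per-gene expression profiles. In this sketch, each gene maps to the set of neuron classes in which it is detected. Classes shared by all genes come from the intersection, and classes specific to one gene come from a set difference against all other genes. The gene names and profiles below are hypothetical placeholders, not CeNGEN output. The expression threshold used to call a gene "expressed" in a neuron class is assumed to be set in the database query.

```python
def expression_overlap(profiles: dict) -> tuple:
    """Return (classes shared by every gene, {gene: classes unique to that gene})."""
    sets = list(profiles.values())
    shared = set.intersection(*sets) if sets else set()
    unique = {}
    for gene, classes in profiles.items():
        # Union of every other gene's neuron classes.
        others = set().union(*(s for g, s in profiles.items() if g != gene))
        unique[gene] = classes - others
    return shared, unique


# Hypothetical profiles for illustration only (not CeNGEN data).
profiles = {
    "gene_a": {"RIM", "DB", "ASE", "CEP"},
    "gene_b": {"RIM", "DB", "AIB", "AVK"},
    "gene_c": {"RIM", "DB", "AIB", "NSM"},
}
shared, unique = expression_overlap(profiles)
print(sorted(shared))
print({g: sorted(c) for g, c in unique.items()})
```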

      OPTIONAL: A rescue of the mutant phenotypes by re-expression of the gene is missing; such a rescue would help rule out false-positive results arising from background mutations. For example, a pan-neuronal or endogenous promoter rescue would help the authors substantiate their claims, and this can be done for the most promising genes. The ideal experiment would be a neuron-specific rescue, but this can be saved for future work.

      We appreciate this suggestion and recognise its potential to strengthen our manuscript. In response, we made many attempts to generate pan-neuronal and endogenous promoter re-expression lines. However, we faced several technical issues in transgenic line generation, including poor survival following microinjection likely due to protein overexpression toxicity (e.g., C30G12.6, F46H5.3), and reduced animal viability for chemotaxis assays, potentially linked to transgene-related reproductive defects (e.g., ACC-1). As we have previously successfully generated dozens of transgenic lines in past work (e.g. Chew et al., Neuron 2018; Chew et al., Phil Trans B 2018; Gadenne/Chew et al., Life Science Alliance 2022), we believe the failure to produce most of these lines is not likely due to technical limitations. For transparency, these observations have been included in the discussion section of the manuscript on pages 39 & 40 as considerations for future troubleshooting.

      Fortunately, we were able to generate a pan-neuronal promoter line for KIN-2 that has been tested and included in the revised manuscript. These new data are shown in Figure 5B and described on pages 23 & 24. Briefly, this shows that pan-neuronal expression of KIN-2 from the ce179 mutant allele is sufficient to reproduce the enhanced learning phenotype observed in kin-2(ce179) animals, confirming the role of KIN-2 in gustatory learning.

      To address the potential involvement of background mutations (also indicated by Reviewer 4 under ‘cross-commenting’), we have also performed experiments with backcrossed versions of several mutants. These experiments aimed to confirm that salt associative learning phenotypes are due to the expected mutation. Namely, we assessed kin-2(ce179) mutants that had been backcrossed previously by another laboratory, as well as C30G12.6(-) and F46H5.3(-) animals backcrossed in this study. Although not all backcrossed mutants retained their original phenotype (i.e., C30G12.6) (Figure 6D, a newly added figure), we found that backcrossed versions of KIN-2 and F46H5.3 both robustly showed enhanced learning (Figures 5A & 6B). This is described in the text on pages 23-26.

      __Minor comments:__

      1. Lack of clarity regarding the validation of the biotin tagging of the proteome. The authors show in Figure 1 that they validated that the combination of the transgene and biotin allows them to find more biotin-tagged proteins. However, there is significant biotin background also in control samples, as is common for this method. The authors mention they validated biotin tagging in all their experiments, but it was unclear in the text whether they validated it in comparison to no-biotin controls and checked the fold-change difference.

      This is an important point: We validated our biotin tagging method prior to mass spectrometry by comparing ‘no biotin’ and ‘biotin’ groups. This is shown in Figure S1 in the revised manuscript, which includes a western blot comparing untreated and biotin-treated animals that are non-transgenic or expressing TurboID. As expected, by comparing biotinylated protein signal for untreated and treated lanes within each line, biotin treatment increased the signal 1.30-fold for non-transgenic and 1.70-fold for TurboID C. elegans. This is described on page 8 of the revised manuscript.
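      The fold changes quoted above are ratios of whole-lane biotinylation signal between biotin-treated and untreated lanes of the same line. A minimal sketch follows, assuming the lane intensities have already been quantified (for example by densitometry). Whether and how background is subtracted is not specified here. The intensity values are hypothetical and were chosen only so that the ratios have the same size as those reported.

```python
def fold_change(treated_signal: float, untreated_signal: float) -> float:
    """Ratio of whole-lane biotinylation signal, treated / untreated, for one line."""
    if untreated_signal <= 0:
        raise ValueError("untreated signal must be positive")
    return treated_signal / untreated_signal


# Hypothetical whole-lane intensities (arbitrary units).
lanes = {
    "non-transgenic": {"untreated": 1000.0, "biotin": 1300.0},
    "TurboID": {"untreated": 1000.0, "biotin": 1700.0},
}
for line, signal in lanes.items():
    print(f"{line}: {fold_change(signal['biotin'], signal['untreated']):.2f}-fold")
```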

      To clarify, for mass spectrometry experiments, we tested a no-TurboID (non-transgenic) control, but did not perform a no-biotin control. We included the following four groups: (1) No-TurboID ‘control’, (2) No-TurboID ‘trained’, (3) pan-neuronal TurboID ‘control’ and (4) pan-neuronal TurboID ‘trained’, where trained versus control refers to whether ‘no salt’ was used as the conditioned stimulus or not, respectively (illustrated in Figure 1A). Due to the complexity of the learning assay (which involves multiple washes and handling steps, including a critical step where biotin is added during the conditioning period), and the need to collect sufficient numbers of worms for protein extraction (>3,000 worms per experimental group), adding ‘no-biotin’ controls would have doubled the number of experimental groups, which we considered unfeasible for practical reasons. This is explained on pages 8 & 9 of the revised manuscript.

      Also, it was unclear which exact samples were tested per replicate. In Page 9, Lines 17-18: "For all replicates, we determined that biotinylated proteins could be observed ...", But in Page 8, Line 24 : "We then isolated proteins from ... worms per group for both 'control' and 'trained' groups,... some of which were probed via western blotting to confirm the presence of biotinylated proteins".

      • Could the authors specify which samples were verified and clarify how?

      Thank you for pointing out these unclear statements: we have clarified the experimental groups used for mass spectrometry experiments, as detailed in the response above, on pages 8 & 9. In addition, western blots corresponding to each biological replicate of mass spectrometry data are described in the main text on page 10 and have been added to the revised manuscript (as Figure S3). These western blots compare biotinylation signal for proteins extracted from (1) No-TurboID ‘control’, (2) No-TurboID ‘trained’, (3) pan-neuronal TurboID ‘control’ and (4) pan-neuronal TurboID ‘trained’. These blots confirm that there were biotinylated proteins in TurboID samples before enrichment by streptavidin-mediated pull-down for mass spectrometry.

      OPTIONAL: include the fold changes of biotinylated proteins of all the ones that were tested. Similar to Figure 1.C.

      This is an excellent suggestion. As recommended by the reviewer, we have included fold changes for biotinylated protein levels between high-salt control and trained groups (on pages 9 & 10 for replicate #1 and in Table S2 for replicates #2-5). This was done by measuring protein levels in whole lanes for each experimental group per biological replicate within western blots (Figure 1C for replicate #1 and Figure S3 for replicates #2-5) of protein samples generated for mass spectrometry (N = 5).

      Figure 2 does not add much to the reader, it can be summarized in the text, as the fraction of proteins enriched for specific cellular compartments.

      • I would suggest moving Figure 2 (originally written as Figure 3) into the text, or transferring it to the supplementary material.

      As noted in the cross-comment response to Reviewer 4, there were typos in the original figure references; we have corrected them above. Essentially, this comment refers to Figure 2.

      We appreciate this feedback from Reviewer 1. We agree that the original Figure 2 functions as a visual summary from analysis of the learning proteome at the subcellular compartment level. However, it also serves to highlight the following:

      • Representation for neuron-specific GO terms is relatively low, but even this small percentage represents entire protein-protein networks that are biologically meaningful, but that are difficult to adequately describe in the main text.
      • TurboID was expressed in neurons so this figure supports the relevance of the identified proteome to biological learning mechanisms.
      • Many of these candidates could not be assessed by learning assay using single mutants, since related mutations are lethal or substantially affect locomotion. These networks therefore highlight the benefit of using strategies like TurboID to study learning.

      We have chosen to retain this figure, moving it to the supplementary material as Figure S4 in the revised manuscript, as suggested.

      • OPTIONAL: I would suggest that the authors mark the results from the behavior assay of the genetic screen in a pathway summary figure similar to Figure 3 (originally written as Figure 4). This would allow the reader to better see the bigger picture and to connect it to the systemic approach taken in Figures 2 and 3.

      We think this is a fantastic suggestion and thank Reviewer 1 for this input. In the revised manuscript, we have added Figure 7, which summarises the tested candidates that displayed an effect on learning, mapped onto potential molecular pathways derived from networks in the learning proteome. This figure provides a visual framework linking the behavioural outcomes to the network context. This is described in the main text on pages 32-33.

      Typo in Figure 3: the circle of PPM1: The blue right circle half is bigger than the left one.

      We thank the Reviewer for noticing this, the node size for PPM-1.A has been corrected in what is now Figure 2 in the revised work.

      Lack of clarity in the discussion. On page 24, line 14, the authors raise this question: "why are the proteins we identified not general learning regulators?" The phrasing and logic of the argument for the possible answers were hard to follow. Can you clarify?

      We appreciate this feedback regarding clarity, as we strive to explain the data as clearly and transparently as possible. Our goal in this paragraph was to discuss why some candidates were seen to affect only salt associative learning, as opposed to showing effects in multiple learning paradigms (i.e., what we defined as a ‘general learning regulator’). We have adjusted the wording in several places in this paragraph, now on pages 36 & 37, to address this comment. We hope the rephrased paragraph provides sufficient rationale for the discussion of the selection strategy used to isolate our list of potential learning regulators, and of its potential limitations.

      ***Cross-Commenting***

      Firstly, we would like to express our appreciation for the opportunity for reviewers to cross-comment on feedback from other reviewers. We believe this is an excellent feature of the peer review process, and we are grateful to the reviewers for their thoughtful engagement and collaborative input.

      I would like to thank Reviewer #4 for the great cross comment summary, I find it accurate and helpful.

      I also would like to thank Reviewer #4 for spotting the typos in my minor comments, their page and figure numbers are the correct ones.

      We have corrected these typos in the relevant comments, and have responded to them accordingly.

      Small comment on common point 1: My feeling is that it is challenging to do quantitative mass spectrometry, especially with TurboID. In general, the nature of MS data is that it hints towards a direction, but a follow-up validation work is required in order to assess it. For example, I am not surprised that the fraction of repeats in which a hit appeared does not predict well whether this hit would be validated behaviorally. Given these limitations, I find the authors' approach reasonable.

      We thank Reviewer 1 for this positive and thoughtful feedback. We also appreciate Reviewer 4’s comment regarding quantitative mass spectrometry and have addressed this in detail below (see response to Reviewer 4). However, we agree with Reviewer 1 that there are practical challenges to performing quantitative mass spectrometry with TurboID, primarily due to the enrichment for biotinylated proteins that is a key feature of the sample preparation process.

      Importantly, we whole-heartedly agree with Reviewer 1’s statement that “In general, the nature of MS data is that it hints towards a direction but a follow-up validation work is required in order to assess it”. This is the core of our approach: however, we appreciate that there are limitations to a qualitative ‘absent/present’ approach. We have addressed some of these limitations by clarifying the criteria used for selecting candidate genes, based additionally on the presence of the candidate in multiple biological replicates (categorised as ‘strong’ hits). Based on this method, we were able to validate the role of several novel learning regulators (Figures 5, 6, & S7). We sincerely hope that this manuscript can function as a direction for future research, as suggested by this Reviewer.

      I also would like to highlight this major comment from reviewer 4:

      "In Experimental Procedures, authors state that they excluded data in which naive or control groups showed average CI 0.5499 for N2 (page 36, lines 5-7). "

      This threshold seems arbitrary to me too, and it requires the clarifications requested by reviewer 4.

      As detailed in our response to Reviewer 4, Major Comment 2, data were excluded only in rare cases, specifically when N2 worms failed to show strong salt attraction prior to training, or when trained N2 worms did not exhibit the expected behavioural difference compared to untrained controls – this can largely be attributed to clear contamination or over-population issues, which are visible prior to assessing CTX plates and counting chemotaxis indices.

      These criteria were initially established to provide an objective threshold for excluding biological replicates, particularly when planning to assay a large number of genetic mutants. However, after extensive testing across many replicates, we found that N2 worms that were neither starved nor contaminated consistently displayed the expected phenotype, rendering these thresholds unnecessary. We acknowledge that emphasizing these criteria may have been misleading, and have therefore removed them from page 50 in the revised manuscript to avoid confusion and ensure clarity.

      Reviewer #1 (Significance (Required)):

      This study does a great job of effectively utilizing the TurboID technique to identify new pathways implicated in salt-associative learning in C. elegans. This technique was used in C. elegans before, but not in this context. The salt-associative memory-induced proteome list is a valuable resource that will help future studies on associative memory in worms. Some of the implicated molecular pathways, like cAMP, were previously found to be involved in memory in worms, as correctly referenced in the manuscript. The implication of the acetylcholine pathway is novel for C. elegans, to the best of my knowledge. The finding that the uncovered genes are specifically required for salt associative memory and not for other memory assays is also interesting.

      Overall, however, I find the impact of this study limited. The premise of this work is to use the TurboID method to conduct a systems analysis of the proteomic changes. The work starts by conducting network analysis and gene enrichment, which fit a systemic approach. However, since the authors find that ~30% of the tested hits affect the phenotype, and since only 17/706 proteins were assessed, it is challenging to draw conclusive broad systemic claims. Alternatively, the authors could have focused on the positive hits, understood them better, and found the specific circuits where these genes act. This could have increased the impact of the work. Since neither of these two options is satisfied, I view this work as solid but not wide in its impact, and I therefore estimate that the audience of this study would be more specialized.

      My expertise is in C. elegans behavior, genetics and neuronal activity, as well as programming and machine learning.

      We thank the Reviewer for these comments and appreciate the recognition of the value of the proteomic dataset and the identification of novel molecular pathways, including the acetylcholine pathway, as well as the specificity of the uncovered genes to salt-associative memory.

      Regarding the reviewer’s concern about the overall impact and scope of the study, we respectfully offer the following clarification. Our aim was to establish a systems-level approach for investigating learning-related proteomic changes using TurboID, and we acknowledge that only a subset of the identified proteins was experimentally tested (now 26/706 proteins in the revised manuscript). Only five of the tested single-gene mutants showed a robust learning phenotype in the revised work, following backcrossing, more stringent candidate selection, and improved statistical analysis in response to reviewer comments. Even so, our proteomic data provide a unique opportunity to place these candidates within protein-protein networks (as illustrated in Figure 7). Importantly, our functional testing focused on single-gene mutants, which may not reveal phenotypes for genes that act redundantly (now mentioned on pages 28-30). This limitation is inherent to many genetic screens and highlights the value of our proteomic dataset, which enables the identification of broader protein-protein interaction networks and molecular pathways potentially involved in learning.

      To support this systems-level perspective, we have added Figure 7, which visually integrates the tested candidates into molecular pathways derived from the learning proteome for learning regulators KIN-2 and F46H5.3. We also emphasise more explicitly in the text (on pages 32-33) the value of our approach by highlighting the functional protein networks that can be derived from our proteomics dataset.

      We fully acknowledge that the use of TurboID across all neurons limits the resolution needed to pinpoint individual neuron contributions, and understand the benefit of further experiments to explore specific circuits. Many circuits required for salt sensing and salt-based learning are highly explored in the literature and defined explicitly (see Rahmani & Chew, 2021), so our intention was to complement the existing literature by exploring the protein-protein networks involved in learning rather than neuron-neuron connectivity. However, we recognise the benefit of integrating circuit-level analyses, given that our proteomic data suggest hundreds of candidates potentially involved in learning. While validating each of these candidates is beyond the scope of the current study, we have taken steps to suggest candidate neurons/circuits by incorporating tissue enrichment analyses and single-cell transcriptomic data (Table S7 & Figure 4). These additions highlight neuron classes of interest and suggest possible circuits relevant to learning.

      We hope this clarification helps convey the intended scope and contribution of our study. We also believe that the revisions made in response to Reviewer 1’s feedback have strengthened the manuscript and enhanced its significance within the field.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      In this study by Rahmani and colleagues, the authors sought to define the "learning proteome" for a gustatory associative learning paradigm in C. elegans. Using a cytoplasmic TurboID expressed under the control of a pan-neuronal promoter, the authors labeled proteins during the training portion of the paradigm, followed by proteomics analysis. This approach revealed hundreds of proteins potentially involved in learning, which the authors describe using gene ontology and pathway analysis. The authors performed functional characterization of some of these genes for their requirement in learning using the same paradigm. They also compared the requirement for these genes across various learning paradigms, and found that most hits they characterized appear to be specifically required for the training paradigm used for generating the "learning proteome".

      Major Comments:

      1. The definition of a "hit" from the TurboID approach does not appear stringent enough. According to the manuscript, a hit was defined as one unique peptide detected in a single biological replicate (out of 5), which could give rise to false positives. In Figure S2, it is clear that there is relatively little overlap in the proteins detected between replicates. While perhaps unintentional, defining a hit by a single unique peptide appears to be an attempt to inflate the number of hits. Defining hits as present in more than one sample would be more rigorous. Changing the definition of hits would only require the time to re-list genes and change the data presented in the manuscript accordingly.

      We thank Reviewer 2 for this valuable comment and the related suggestion. We agree that “Defining hits as present in more than one sample would be more rigorous”. To address this comment, we have now separated candidates into two categories in Table 2 of the revised manuscript: ‘strong’ candidates (present in 3 or more biological replicates) and ‘weak’ candidates (present in 2 or fewer biological replicates). However, we think the weaker candidates should still be included in the manuscript, given that we did observe relationships between these proteins and learning. For example, ACC-1, which influences salt associative learning in C. elegans, was detected as a potential learning regulator in one replicate of mass spectrometry (Figure S8A). We describe this classification in the main text on pages 21-22.

      We also agree with Reviewer 2 that the overlap between individual candidate hits is low between biological replicates; the inclusion of Figure S2 in the original manuscript serves to highlight this limitation. However, it is also important to consider that there is notable overlap in whole molecular pathways between biological replicates of mass spectrometry data, as shown in Figure 2 of the revised manuscript (this consideration is now mentioned on pages 13-14). As an additional visual aid for the overlap between mass spectrometry replicates, we have included Figure 3. This figure shows how two metabolic processes that are normally indispensable to animal health are represented across several biological replicates. We provide this figure (described on pages 13 & 15) to demonstrate the strength of our approach: it can detect candidates not easily accessible by conventional forward or reverse genetic screens.

      We also appreciate the opportunity to explain our approach. The criterion of “at least one unique peptide” was adapted for this manuscript from previous work (Prikas et al., 2020). It was not intended to inflate the number of hits but rather to ensure sensitivity in detecting low-abundance neuronal proteins. We have clarified this in our Methods (page 46).
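      The following minimal Python sketch makes the replicate-frequency classification concrete. For each protein, it counts the biological replicates in which that protein is detected in the "TurboID, trained" sample but in neither the "Non-Tg, trained" nor the "TurboID, control" sample. It then applies the ‘strong’ (3 or more replicates) versus ‘weak’ (2 or fewer) cut-off. The data layout, the function names (count_unique_detections, classify) and the protein identifiers are hypothetical illustrations. They are not the pipeline used to build Table 2.

```python
from collections import Counter


def count_unique_detections(replicates: list[dict[str, set[str]]]) -> Counter:
    """Count, per protein, the replicates in which it is unique to TurboID-trained samples.

    Hypothetical layout: each replicate maps the three sample types
    ("turbo_trained", "nontg_trained", "turbo_control") to sets of protein IDs.
    """
    counts: Counter = Counter()
    for rep in replicates:
        # Present in "TurboID, trained" but absent from both control samples.
        unique = rep["turbo_trained"] - rep["nontg_trained"] - rep["turbo_control"]
        counts.update(unique)  # each unique protein gains one replicate
    return counts


def classify(counts: Counter, strong_min: int = 3) -> dict[str, str]:
    """Label a protein 'strong' if counted in >= strong_min replicates, else 'weak'."""
    return {protein: ("strong" if n >= strong_min else "weak")
            for protein, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy data for three replicates (not real mass spectrometry results).
    example = [
        {"turbo_trained": {"A", "B", "C"}, "nontg_trained": {"C"}, "turbo_control": set()},
        {"turbo_trained": {"A", "B"}, "nontg_trained": set(), "turbo_control": {"B"}},
        {"turbo_trained": {"A"}, "nontg_trained": set(), "turbo_control": set()},
    ]
    tallies = count_unique_detections(example)
    print(dict(tallies), classify(tallies))
```

      In this toy example, protein "A" is unique in all three replicates and is labelled ‘strong’. Protein "B" is unique in only one replicate and is labelled ‘weak’. Protein "C" is never counted because it also appears in a control sample.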

      The "hits" that the authors chose to functionally characterize do not seem like strong candidate hits based on the proteomics data that they generated. Indeed, most of the hits are present in a single, or at most 2, biological replicates. It is unclear why the strongest hits were not characterized. If mutant strains are publicly available, this would not be a difficult experiment to perform.

      We thank the reviewer for this important suggestion. To address this, we have described in Figure 3 (main text on page 13) two molecular pathways with multiple components that appear in more than one biological replicate of mass spectrometry data. In addition, we have included Figures 6 & S7, in which 9 additional single mutants were tested for salt associative learning. Each of these mutants corresponds to a candidate detected in three or more biological replicates of mass spectrometry. Briefly, we found the following (the number of replicates in which a protein was unique to TurboID-trained animals is given in brackets):

      • Novel arginine kinase F46H5.3 (4 replicates) displays an effect in both salt associative learning and salt aversive learning in the same direction (Figures 6A, 6B, & S9A, pages 31-32 & 37-38).
      • Worms with a mutation in the armadillo-domain protein C30G12.6 (3 replicates) displayed an enhanced learning phenotype only in the non-backcrossed strain, not after backcrossing. This suggests the enhanced learning phenotype was caused by a background mutation (Figure 6, pages 24-25).
      • We did not observe an effect on salt associative learning when assessing mutations for the ciliogenesis protein IFT-139 (5 replicates), guanyl nucleotide factors AEX-3 or TAG-52 (3 replicates), p38/MAPK pathway interactor FSN-1 (3 replicates), IGCAM/RIG-4 (3 replicates), and acetylcholine components ACR-2 (4 replicates) and ELP-1 (3 replicates) (Figure S7, on pages 27-30). However, throughout the section describing these candidates, we note that only single-gene mutants were tested. Genes that function in redundant or compensatory pathways may therefore not exhibit a detectable phenotype.

      Given the lack of strong proteomic evidence that these proteins are indeed regulated in the context of learning, including evidence of changes in the proteins would increase confidence that these hits are genuine. Such evidence could come from imaging expression changes of fluorescent reporters or from a biochemical approach.

      We thank Reviewer 2 for this suggestion – we agree that it would have been ideal to have additional evidence suggesting that changes in candidate protein levels are associated directly with learning. Ideally, we would have explored this aspect further; however, as outlined in response to Reviewer 1 Major Comment 2 (OPTIONAL), this was not feasible within the scope of the current study due to several practical challenges. Specifically, we attempted to generate pan-neuronal and endogenous promoter rescue lines for several candidates, but encountered significant challenges, including poor survival post-microinjection (likely due to protein overexpression toxicity) and reduced viability for behavioural assays, potentially linked to transgene-related reproductive defects. This information is now described on pages 39 & 40 of the revised work.

      To address these limitations, we performed additional behavioural experiments where possible. We successfully generated a pan-neuronal promoter line for kin-2, which was tested and included in the revised manuscript (Figure 5B, pages 30 & 31). In addition, to confirm that observed learning phenotypes were due to the expected mutations and not background effects, we conducted experiments using backcrossed versions of several mutant lines as suggested by Reviewer 4 Cross Comment 3 (Figure 6, pages 23-24 & 24-26). Briefly, these experiments show that pan-neuronal expression of KIN-2 from the ce179 mutant allele is sufficient to reproduce the enhanced learning phenotype observed in backcrossed kin-2(ce179) animals, providing additional evidence that the identified hits are required for learning. We also confirmed that F46H5.3 modulates salt associative learning, given that both non-backcrossed and backcrossed F46H5.3(-) mutants display a learning enhancement phenotype. The revised text now describes these data on the pages mentioned above.

      Minor Comments:

      1. The authors highlight that the proteins they discover seem to function uniquely in their gustatory associative paradigm, but this is not completely accurate. kin-2, which they characterize in Figure 4, is required for positive butanone association in Stein and Murphy, 2014 (the authors even say as much in the manuscript).

      We appreciate this correction and thank the Reviewer for pointing this out. We have amended the wording on page 31 to clarify our meaning:

      “Although kin-2(ce179) mutants were not shown to impact salt aversive learning, they have been reported previously to display impaired intermediate-term memory (but intact learning and short-term memory) for butanone appetitive learning (Stein and Murphy, 2014).”

      Reviewer #2 (Significance (Required)):

      • General Assessment: The approach used in this study is interesting and has the potential to further our knowledge about the molecular mechanisms of associative behaviors. Strengths of the study include a design with carefully thought-out controls, and the premise of combining proteomics with behavioral analysis to better understand the biological significance of the proteomic findings. However, the criteria for defining hits and the prioritization of hits for behavioral characterization were major weaknesses of the paper.
      • Advance: There have been multiple transcriptomic studies in the worm looking at gene expression changes in the context of behavioral training (Lakhina et al., 2015; Freytag et al., 2017). This study complements and extends those studies by examining how the proteome changes in a different training paradigm. The approach used here could be employed for multiple different training paradigms, presenting a new technical advance for the field.
      • Audience: This paper would be of interest to the broader field of behavioral and molecular neuroscience. Though it uses an invertebrate system, many findings in the worm regarding learning and memory translate to higher organisms.
      • I am an expert in molecular and behavioral neuroscience in both vertebrate and invertebrate models, with experience in genetics and genomics approaches.

      We appreciate Reviewer 2’s thoughtful assessment and constructive feedback. In response to concerns regarding the definition and prioritisation of hits, we have revised our approach as detailed above to place more weight on ‘strong’ hits present in multiple biological replicates. We have also added new behavioural data for additional mutants that fall into this category (Figures 6 & S7). We hope these revisions strengthen our study and enhance its relevance to the behavioural/molecular neuroscience community.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      In the manuscript titled "Identifying regulators of associative learning using a protein-labelling approach in C. elegans" the authors attempted to generate a snapshot of the proteomic changes that happen in the C. elegans nervous system during learning and memory formation. They employed the TurboID-based protein labeling method to identify the proteins that are uniquely found in samples that underwent training to associate no-salt with food, and consequently exhibited lower attraction to high salt in a chemotaxis assay. Using this system they obtained a list of target proteins that included proteins represented in molecular pathways previously implicated in associative learning. The authors then further validated some of the hits from the assay by testing single gene mutants for effects on learning and memory formation.

      Major Comments:

      In the discussion section, the authors comment on the sources of "background noise" in their data and ways to improve the specificity. They provide some analysis on this aspect in Supplementary figure S2. However, a better visualization of non-specificity in the sample could be a GO analysis of tissue-specificity, and presented as a pie chart as in Figure 2A. Non-neuronal proteins such as MYO-2 or MYO-3 repeatedly show up on the "TurboID trained" lists in several biological replicates (Tables S2 and S3). If a major fraction of the proteins after subtraction of control lists are non-specific, that increases the likelihood that the "hits" observed are by chance. This analysis should be presented in one of the main figures as it is essential for the reader to gauge the reliability of the experiment.

      We agree with this assessment and thank Reviewer 3 for this constructive suggestion. In response, we have now incorporated a comprehensive tissue-specific analysis of the learning proteome in the revised manuscript. Using the single neuron RNA-Seq database CeNGEN, we identified the proportion of neuronal vs non-neuronal proteins from each biological replicate of mass spectrometry data. Specifically, we present Table 1 on page 17, which we originally intended to include in the manuscript but inadvertently left out. This table shows that 87-95% (i.e. a large majority) of the proteins identified across replicates corresponded to genes detected in neurons, supporting that the TurboID enzyme was able to target the neuronal proteome as expected. Table 1 is now described in the main text of the revised work on page 16.

      In addition, we performed neuron-specific analyses using both the WormBase gene enrichment tool and the CeNGEN single-cell transcriptomic database, which we describe in detail in our response to Reviewer 1 Major Comment 2. To summarise, these analyses revealed enrichment of several neuron classes (summarised in Table S7). These include neurons previously implicated in associative learning (e.g., ASEL, AIB, RIS, AVK) as well as neurons not previously studied in this context (e.g., IL1, DA9, DVC). By examining expression overlap across neuron types, we identified shared and distinct profiles that suggest potential functional connectivity and candidate circuits underlying behavioural plasticity (Figure 4). Taken together, these data show that the proteins identified in our dataset are (1) neuronal and (2) expressed in neurons that are known to be required for learning. Methods are detailed on pages 50-51.

      Other than the above, the authors have provided sufficient details of their experimental and analysis procedures. They have performed appropriate controls, and their data have sufficient biological and technical replicates for statistical analysis.

      We appreciate this positive feedback and thank the Reviewer for acknowledging the clarity of our experimental and analysis procedures.

      Minor Comments:

      There is an error in the first paragraph of the discussion, in the sentences discussing the learning effects in gar-1 mutant worms. The sentences in lines 12-16 on page 22 say that gar-1 mutants have improved salt-associative learning and defective salt-aversive learning, while in fact the data and figures state the opposite.

      We appreciate the Reviewer noting this discrepancy. As clarified in our response to Reviewer 1, Major Comment 1 above, we reanalysed the behavioural data to ensure consistency across genotypes by comparing only those tested within the same biological replicates (thus having the same N for all genotypes). Upon this reanalysis, we found that the previously reported phenotype for gar-1 mutants in salt-associative learning was not statistically different from wild-type controls. Therefore, we have removed references to GAR-1 from the manuscript.

      Reviewer #3 (Significance (Required)):

      Strengths and limitations: This study used neuron-specific TurboID expression with transient biotin exposure to capture a temporally restricted snapshot of the C. elegans nervous system proteome during salt-associative learning. This is an elegant method to identify proteins temporally specific to a certain condition. However, there are several limitations in the way the experiments and analyses were performed which affect the reliability of the data. As the authors themselves have noted in the discussion, background noise is a major issue. Several steps could be taken to reduce the noise at the experimental or analysis stages: use of integrated C. elegans lines to ensure uniformity of samples, flow cytometry to isolate neurons, and quantitative mass spec to detect fold change rather than strict presence/absence.

      Advance: Several studies have demonstrated the use of proximity labeling to map the interactome by using a bait protein fusion. In fact, expressing TurboID not fused to a bait protein is often used as a negative control in proximity labeling experiments. However, this study demonstrates the use of free TurboID molecules to acquire a global snapshot of the proteome under a given condition.

      Audience: Even with the significant limitations, this study is of specific interest to researchers seeking to understand learning and memory formation. More broadly, the methods used in this study could be modified to gain insights into the proteomic profiles at other transient developmental stages.

      The reviewer's field of expertise: Cell biology of C. elegans neurons.

      We thank the reviewer for their thoughtful evaluation of our work. We appreciate the recognition of the novelty and potential of using neuron-specific TurboID to capture a temporally restricted snapshot of the C. elegans nervous system proteome during learning. We agree that this approach offers a unique opportunity to identify proteins associated with specific behavioural states in future studies.

      We also appreciate the reviewer’s comments regarding limitations in experimental and analytical design. In revising the manuscript, we have taken several steps to address these concerns and improve the clarity, rigour, and interpretability of our data. Specifically:

      • We now provide a frequency-based representation of proteomic hits (Table 2), which helps clarify how candidate proteins were selected and highlights differences between trained and control groups.
      • We have added neuron-specific enrichment analyses using both the WormBase and CeNGEN databases (Table S7 & Figure 4), which help identify candidate neurons and potential circuits involved in learning (methods on pages 50-51).
      • We have clarified the rationale for using qualitative proteomics in the context of TurboID, and we acknowledge the challenges of integrating quantitative mass spectrometry with biotin-based enrichment (page 39). Additional methods for improving sample purity, such as using integrated lines or FACS-enrichment of neurons, could further refine this approach in future studies. For transparency, we did attempt to integrate the TurboID transgenic line to improve the strength and consistency of biotinylation signals. However, despite four rounds of backcrossing, this line exhibited unexpected phenotypes, including a failure to respond reliably to the established training protocol. As a result, we were unable to include it in the current study. Nonetheless, we believe our current approach provides a valuable proof-of-concept and lays the groundwork for future refinement.

      By addressing the major concerns of peer reviewers, we believe our study makes a significant and impactful contribution by demonstrating the feasibility of using TurboID to capture learning-induced proteomic changes in the nervous system. The identification of novel learning-related mutants, including those involved in acetylcholine signalling and cAMP pathways, provides new directions for future research into the molecular and circuit-level mechanisms of behavioural plasticity.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      Summary:

      In this manuscript, the authors used a learning paradigm in C. elegans: when worms are fed on a saltless plate, their chemotaxis to salt is greatly reduced. To identify learning-related proteins, the authors employed nervous system-specific proteome analysis to compare whole proteins in neurons between high-salt-fed animals and saltless-fed animals. The authors identified "learning-specific genes", which are observed only after saltless feeding. They categorized these proteins by GO and pathway analyses, and went on to test mutants in selected genes identified by the proteome analysis. They find several mutants that are defective or hyper-proficient for learning, including acc-1/3 and lgc-46 acetylcholine receptors, gar-1 acetylcholine receptor GPCR, glna-3 glutaminase involved in glutamate biosynthesis, and kin-2, a cAMP pathway gene. These mutants were not previously reported to have abnormalities in this learning paradigm.

      Major comments:

      1) There are problems in the data processing and presentation of the proteomics data in the current manuscript which reduce the utility of the data. First, as the authors discuss (page 24, lines 5-12), the current approach does not consider the amount of the peptides. The authors state that their current approach is "conservative", because some of the proteins may be present in both control and learned samples but in different amounts. This reviewer has a concern in the opposite direction: some of the identified proteins may be false-positive artifacts caused by analytical noise. To identify the "learning proteome" (706 proteins, page 8, last line - page 9, line 8; page 32, lines 21-22), the authors included peptides that are "present" in the "TurboID, trained" sample but "absent" in the "Non-Tg, trained" and "TurboID, control" samples in any one of the biological replicates. The word "present" implies that they included even peptides whose amounts are just above the detection threshold. Such peptides are subject to random noise arising from the detector or from sample collection and preparation. This consideration is partly supported by the fact that only a small fraction of the proteins are common between biological replicates (honestly and commendably shown in Figure S2). Because of this problem, there is no statistical estimate of the identity of the "learning proteome" in the current manuscript. Therefore, the presentation style in Tables S2 and S3 is not very useful for readers, especially because the authors already subtracted proteins identified in Non-Tg samples, which must also suffer from stochastic noise. I suggest quantifying the MS/MS signal. Alternatively, if the authors need to keep the "present"/"absent" description of the MS/MS data, they could use the number of biological replicates in which each protein appears as an estimate of its quantity: for example, found in 2 replicates in "TurboID, learned" and in 0 replicates in "Non-Tg, trained". One can apply statistics to these counts. This said, I would like to stress that proteins related to the acquisition of memory may be very rare, especially because learning-related changes likely occur in a small subset of neurons. Therefore, 1 vs 0 detections may still be important, as may something like 5 vs 1. In summary, a quantitative description of the proteomics results is desired.
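      As an illustration of the reviewer's point that statistics can be applied to replicate counts, the sketch below computes a one-sided Fisher's exact test from the number of replicates that detect a protein in each group. Fisher's exact test is our choice; the reviewer does not name a test. The sketch assumes five biological replicates per group and uses hypothetical counts that mirror the examples above. It is not an analysis performed in the manuscript, and the function name fisher_one_sided is illustrative.

```python
from math import comb


def fisher_one_sided(hits_a: int, n_a: int, hits_b: int, n_b: int) -> float:
    """One-sided Fisher's exact test on replicate-level detection counts.

    hits_a of n_a replicates detect the protein in group A (e.g. "TurboID, trained");
    hits_b of n_b replicates detect it in group B (e.g. "Non-Tg, trained").
    Returns P(group A has >= hits_a detections) under the null hypothesis that the
    total number of detections is spread at random over all replicates
    (hypergeometric distribution with fixed table margins).
    """
    total_hits = hits_a + hits_b
    total_n = n_a + n_b
    denom = comb(total_n, total_hits)
    # math.comb(n, k) returns 0 when k > n, so impossible tables contribute nothing.
    tail = sum(comb(n_a, k) * comb(n_b, total_hits - k)
               for k in range(hits_a, min(total_hits, n_a) + 1))
    return tail / denom


if __name__ == "__main__":
    # Hypothetical counts out of 5 replicates per group.
    for a, b in [(5, 0), (5, 1), (2, 0), (1, 0)]:
        print(f"{a}/5 vs {b}/5: p = {fisher_one_sided(a, 5, b, 5):.3f}")
```

      With five replicates per group, 5 vs 0 detections gives p ≈ 0.004 and 5 vs 1 gives p ≈ 0.024. In contrast, 2 vs 0 (p ≈ 0.22) and 1 vs 0 (p = 0.5) cannot reach conventional significance. Rare learning-related proteins, which the reviewer notes may still matter, would therefore be missed by a strict significance filter. Such counts may be more useful for ranking candidates.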

      We thank the reviewer for these valuable comments and suggestions.

      We acknowledge that quantitative proteomics would provide beneficial information; however, as also indicated by Reviewer 1 (in cross-comment), it is practically challenging to perform with TurboID. We have included discussion of potential future experiments involving quantitative mass spectrometry, as well as a comprehensive discussion of some of the limitations of our approach as summarised by this Reviewer, in the Discussion section (page 39). However, we note that our qualitative approach also provides beneficial knowledge, such as the identification of functional protein networks acting within biological pathways previously implicated in learning (Figure 2), and novel learning regulators ACC-1/3, LGC-46, and F46H5.3.

      We agree with the assessment that the frequency of occurrence for each candidate we test per biological replicate is useful to disclose in the manuscript as a proxy for quantification. This was also highlighted by Reviewer 2 (Major Comment 1). As detailed above in response to R2, we have now separated candidates into two categories: ‘strong’ (present in 3 or more biological replicates) and ‘weak’ candidates (present in 2 or fewer biological replicates). We have also added behavioural data after testing 9 of these strong candidates in Figures 6 & S7.

      We have also added Table 2 to the revised manuscript, which summarises the frequency-based representation of the proteomics results, as suggested. This is described on pages 22-23. Briefly, this shows the range of candidates further explored using single mutant testing. Specifically, this data showed that many of the tested candidates were more frequently detected in trained worms compared to high-salt controls. This includes both strong and weak candidates, providing a clearer view of how proteomic frequency informed our selection for functional testing.

      2) There is another problem in the treatment of the behavioural data. In Experimental Procedures, the authors state that they excluded data in which naive or control groups showed average CI 0.5499 for N2 (page 36, lines 5-7). How were these values determined? One common example for judging a data point as an outlier is > mean + 1.5, 2 or 3 SD, or …

      Thank you for pointing this out. As mentioned by both Reviewer 1 and Reviewer 4, the original manuscript states the following: “Data was excluded for salt associative learning experiments when wild-type N2 displayed (1) an average CI ≤ 0.6499 for naïve or control groups and/or (2) an average CI either 0.5499 for trained groups.”

      To clarify, we only excluded experiments in rare cases where N2 worms did not display robust high salt attraction before training, or where trained N2 did not display the expected behavioural difference compared to untrained or high-salt control N2. These anomalies were typically attributable to clear contamination or starvation issues that could clearly be observed prior to counting chemotaxis indices on CTX plates.

      We established these exclusion criteria in advance of conducting multiple learning assays to ensure an objective threshold for identifying and excluding assays affected by these rare but observable issues. However, these criteria were later found to be unnecessary, as N2 worms robustly displayed the expected untrained and trained phenotypes for salt associative learning when not compromised by starvation or contamination.

      We understand that the original criteria may have appeared to introduce arbitrary bias in data selection. To address this concern, we have removed these criteria from page 50 of the revised manuscript.

      Minor comments:

      1) Related to Major comment 1), the effectiveness of the neuron-specific TurboID procedure was not evaluated. The authors obtained both TurboID and Non-Tg proteome data. Do they see enrichment of neuron-specific proteins? This can be easily tested, for example by using the list of neuron-specific genes by Kaletsky et al. (http://dx.doi.org/10.1038/nature16483 or http://dx.doi.org/10.1371/journal.pgen.1007559), or by referring to the CeNGEN data.

      We thank this Reviewer for this helpful suggestion, which was echoed by Reviewer 3 (Major Comment 1). As indicated in the response to R3 above, the revised manuscript now includes Table 1 as a tissue-specific analysis of the learning proteome, using the single neuron RNA-Seq database CeNGEN to identify the proportion of neuronal proteins from each biological replicate of mass spectrometry data. Generally, we observed a range of 87-95% of proteins corresponded to genes from the CeNGEN database that had been detected in neurons, providing evidence that the TurboID enzyme was able to target the neuronal proteome as expected. Table 1 is now described in the main text of the revised work on pages 16 & 17.

      2) The behavioural paradigm needs to be described accurately. Page 5, lines 16-17 state: "C. elegans normally have a mild attraction towards higher salt concentration". In fact, C. elegans raised on NGM plates, which contain approximately 50 mM NaCl, are attracted to around 50 mM NaCl (Kunitomo et al., Luo et al.) but not to 100-200 mM.

      We thank the Reviewer for pointing this out. We agree that clarification is necessary. The revised text reads as follows on page 5: “C. elegans are typically grown in the presence of salt (usually ~ 50 mM) and display an attraction toward this concentration when assayed for chemotaxis behaviour on a salt gradient (Kunitomo et al., 2013, Luo et al., 2014). Training/conditioning with ‘no salt + food’ partially attenuates this attraction (group referred to ‘trained’).”

      The authors call this assay "salt associative learning", which refers to the fact that worms associate salt concentration (CS) with either the presence or absence of food (appetitive or aversive US) during conditioning (Kunitomo et al., Luo et al., Nagashima et al.). However, they only examine the association with the presence of food, and for the proteome analysis they only change the CS (NaCl concentration, as discussed in the Discussion, p24, lines 4-5). It would be better to avoid confusing readers on this point.

      Thank you Reviewer 4 for highlighting this clarity issue. We clarify our definition of “salt associative learning” for the purpose of this study in the revised manuscript on page 6 with the following text:

      “Similar behavioural paradigms involving pairings between salt/no salt and food/no food have been previously described in the literature (Nagashima et al. 2019). Here, learning experiments were performed by conditioning worms with either ‘no salt + food’ (referred to as ‘salt associative learning’) or ‘salt + no food’ (called ‘salt aversive learning’).”

      3) page 32, line 23: the wording "excluding" is obscure and misleading because the elo-6 gene was included in the analysis.

      We thank the Reviewer for pointing out this misleading wording, which was unintentional. We have now removed it from the text (page 21).

      4) Typo at page 24, line 18: "that ACC-1" -> "than ACC-1".

      This has been corrected (on page 37).

      5) Reference. In "LEO, T. H. T. et al.", given names and surnames are flipped for all authors. Also, the paper has been formally published (http://dx.doi.org/10.1016/j.cub.2023.07.041).

      We appreciate the Reviewer drawing our attention to this – the reference has been corrected and updated.

      I would like to express my modest cross comments on the reviews:

      1) Many of the reviewers comment on the shortage in the quantitative nature of the proteome analysis, so it seems to be a consensus.

      Thank you Reviewer 4 for this feedback. We appreciate the benefit in performing quantitative mass spectrometry, in that it provides an additional way to parse molecular mechanisms in a biological process (e.g., fold-changes in protein expression induced by learning). However, we note that quantitative mass spectrometry is challenging to integrate with TurboID due to the requirement to enrich for biotinylated peptides during sample processing (we now mention this on page 39). Nevertheless, it would be exciting to see this approach performed in a future study.

      To address the limitations of our original qualitative approach and enhance the clarity and utility of our dataset, we have made the following revisions in the manuscript:

      • Candidate selection criteria: We now clearly define how candidates were selected for functional testing, based on their frequency across biological replicates. Specifically, “strong candidates” were detected in three or more replicates, while “weak candidates” appeared in two or fewer.
      • Frequency-based representation (Table 2): We appreciate the suggestion by Reviewer 4 (Major Comment 1) to quantify differences between high-salt control and trained groups. We now provide the frequency-based representation of the candidates tested in this study within our proteomics data in Table 2. These data showed that many of the tested candidates, both strong and weak, were more frequently detected in trained worms than in high-salt controls.

      We hope these additions help clarify our approach and demonstrate the value of the dataset, even within the constraints of qualitative proteomics.
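      The strong/weak split described in the first bullet amounts to counting, for each protein, the number of biological replicates in which it was detected. A minimal Python sketch, assuming each replicate is represented as a set of protein identifiers (the function name, the threshold parameter name and the identifiers are illustrative, not taken from the manuscript):

```python
from collections import Counter


def classify_candidates(replicates, strong_min=3):
    """Label each protein 'strong' if it was detected in at least
    `strong_min` biological replicates, otherwise 'weak'."""
    # set(rep) ensures a protein is counted at most once per replicate
    counts = Counter(protein for rep in replicates for protein in set(rep))
    return {p: ("strong" if n >= strong_min else "weak") for p, n in counts.items()}


# Hypothetical detections across four replicates of one group.
replicates = [{"P1", "P2"}, {"P1"}, {"P1", "P2"}, {"P3"}]
print(classify_candidates(replicates))
```

      The same per-protein counts, computed separately for trained and high-salt control worms, are the kind of frequency values a table such as Table 2 could compare.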

      2) Also, the tissue- or cell-specificity of the identified proteins was commonly discussed. In reviewer #3's first Major comment, the appearance of non-neuronal proteins in the list was pointed out, which corroborates my (#4 reviewer's) question on the successful identification of neuronal proteins by this method. On the other hand, reviewer #1 pointed out subsets of neuron-specific proteins in the list. Obviously, these issues need to be systematically described by the authors.

      We agree with Reviewer 4 that these analyses provide a critical angle of analysis that is not explored in the original manuscript.

      Tissue analysis (Reviewer 3 Major Comment 1): We used the single neuron RNA-Seq database CeNGEN to show that 87-95% (i.e. a large majority) of proteins identified across replicates corresponded to genes detected in neurons. These findings support that the TurboID enzyme was able to target the neuronal proteome as expected. Table 1 provides this information, as now described in the main text of the revised work on page 16.

      Neuron class analyses (Reviewer 1 Major Comment 2): In response, we used the suggested WormBase gene enrichment tool and CeNGEN. We specifically input proteins from the learning proteome into WormBase, after filtering for proteins unique to TurboID trained animals. For CeNGEN, we compared genes/proteins from control worms and trained worms to identify potential neurons that may be involved in this learning paradigm.

      Briefly, our analyses highlight a range of neuron classes: neurons known to function in learning (e.g., RIS interneurons), cells that affect behaviour but have not been explored in learning (e.g., IL1 polymodal neurons), and neurons whose functions are unknown (e.g., pharyngeal neuron I3). Corresponding text for this new analysis has been added on pages 16-20, with a new table and figure added to illustrate these findings (Table S7 & Figure 4). Methods are detailed on pages 50-51.

      3) Regarding reviewer #1's OPTIONAL Major comment, as an expert in behavioral assays in C. elegans, I would like to comment, based on my experience, that mutants received from the Caenorhabditis Genetics Center or other labs often lose the phenotype after outcrossing with the wild type, indicating that a background mutation was responsible for the observed behavioral phenotype. Therefore, outcrossing may be helpful and easier than rescue experiments, though the latter are of course more accurate.

      Thank you for this suggestion. To address the potential involvement of background mutations, we have done experiments with backcrossed versions of mutants tested where possible, as shown in Figure 6. We found that F46H5.3(-) mutants maintained enhanced learning capacity after backcrossing with wild type, compared to their non-backcrossed mutant line. This was in contrast to C30G12.6(-) animals which lost their enhanced learning phenotype following backcrossing using wild type worms. This is described in the text on pages 24-26.

      4) Just let me clarify the first Minor comment by reviewer #2. Authors described that the kin-2 mutant has abnormality in "salt associative learning" and "salt aversive learning", according to authors' terminology. In this comment by reviewer #2, "gustatory associative learning" probably refers to both of these assays.

      Reviewer 4 is correct. We have amended the wording appropriately on page 31 to clarify our meaning to address Reviewer 2’s comment.

      • “Although kin-2(ce179) mutants were not shown to impact salt aversive learning, they have been reported previously to display impaired intermediate-term memory (but intact learning and short-term memory) for butanone appetitive learning (Stein and Murphy, 2014).”

      5) There seem to be several typos in reviewer #1's Minor comments.

      "In Page 9, Lines 17-18" -> "Page 8, Lines 17-18".

      "Page 8, Line 24" -> "Page 7, Line 24".

      "I would suggest to remove figure 3" -> "I would suggest to remove figure 2"

      "summary figure similar to Figure 4" -> "summary figure similar to Figure 3"

      "In the discussion Page 24, Line 14" -> "In the discussion Page 23, Line 14"

      (I note that because a top page was inserted in the "merged" file but not in art file for review, there is a shift between authors' page numbers and pdf page numbers in the former.)

      It would be nice if reviewer #1 can confirm on these because I might be wrong.

      We appreciate Reviewer 4 noting this, and can confirm that these are the correct references (as indicated by Reviewer 1 in their cross-comments).

      Reviewer #4 (Significance (Required)):

      1) Total neural proteome analysis has not been conducted before for learning-induced changes, though transcriptome analysis has been performed for odor learning (Lakhina et al., http://dx.doi.org/10.1016/j.neuron.2014.12.029). This guarantees the novelty of this manuscript, because for some genes, protein levels may change even though mRNA levels remain the same. We note an example in which a proteome analysis utilizing TurboID, though not the comparison between trained/control, has led to finding of learning related proteins (Hiroki et al., http://dx.doi.org/10.1038/s41467-022-30279-7). As described in the Major comments 1) in the previous section, improvement of data presentation will be necessary to substantiate this novelty.

      We appreciate this thoughtful feedback. We agree that while the neuronal transcriptome has been explored in Lakhina et al., 2015 for C. elegans in the context of memory, our study represents the first to examine learning-induced changes in the total neuronal proteome. We particularly agree with the statement that “for some genes, protein levels may change even though mRNA levels remain the same”. This is essential rationale that we now discuss on page 42.

      Additionally, we acknowledge the relevance of the study by Hiroki et al., 2022, which used TurboID to identify learning-related proteins, though not in a trained versus control comparison. Our work builds on this by directly comparing trained and control conditions, thereby offering new insights into the proteomic landscape of learning. This is now clarified on page 36.

      To substantiate the novelty and significance of our approach, we have revised the data presentation throughout the manuscript, including clearer candidate selection criteria, frequency-based representation of proteomic hits (Table 2), and neuron-specific enrichment analyses (Table S7 & Figure 4). We hope these improvements help convey the unique contribution of our study to the field.

      2) The authors found six mutants that show abnormalities in salt learning (Fig. 4). These genes have not previously been described to have such abnormalities, providing novel knowledge to readers, especially those who work on C. elegans behavioural plasticity. In particular, the involvement of acetylcholine neurotransmission has not been addressed. Although the site of action (neurons involved) has not been tested in this manuscript, the work will open avenues to further determine how acetylcholine receptors, the cAMP pathway, etc. influence the learning process.

      Thank you, Reviewer 4, for this encouraging feedback. To further strengthen the study and expand its relevance, we have tested additional mutants in response to Reviewer 3’s comments, as shown in Figures 6 & S7. These results provide even more candidate genes and pathways for future exploration, enhancing the significance and impact of our study.

  3. www.tripleeframework.com
    1. where the technology may simply be replacing a traditional method of instruction

      I think it is very important to remember this as an educator and parent. We have to be sure to maximize use and make it beneficial and worthwhile, not just replacing other instruction.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      A) The presentation of the paper must be strengthened. Inconsistencies, mislabelling, duplicated text, typos, and inappropriate colour code should be changed.

      We spotted and corrected several inconsistencies and mislabelling issues throughout the text and figures. Thanks!  

      B) Some claims are not supported by the data. For example, the sentence that says that "adolescent mice showed lower discrimination performance than adults (l.22) should be rewritten, as the data does not show that for the easy task (Figure 1F and Figure 1H).

      We carefully reviewed the specific claims and fixed some of the wording so it adheres to the data shown.

      C) In Figure 7 for example, are the quantified properties not distinct across primary and secondary areas?

      We now carried out additional analysis to test this. We found that while AUDp and AUDv exhibit distinct tuning properties, they show similar differences between adolescent and adult neurons (see Supplementary Table 6, Fig. S7-1a-h). Note that TEa and AUDd could not be evaluated due to low numbers of modulated neurons in this protocol.

      D) Some analysis interpretations should be more cautious. (..) A lower lick rate in general could reflect a weaker ability to withhold licking- as indicated on l.164, but also so many other things, like a lower frustration threshold, lower satiation, more energy, etc).

      That is a fair comment, and we refined our interpretations. Moreover, we also addressed whether impulsiveness impacted lick rates. In the Educage, we found that adolescent mice had shorter ITIs only after FAs (Fig. S2-1). In the head-fixed setup, we examined (1) the proportion of ITIs where licks occurred (Fig. S3-1c) and (2) the number of licks in these ITIs (Fig. S3-1d). We found no differences between adolescents and adults, indicating that the differences observed in the main task are not due to general differences in impulsiveness (Fig. S2-1, Fig. S3-1c, d). Finally, we note that potential differences in satiation were already addressed in the original manuscript by carefully examining the number of trials completed across the session. See also Review 3, comment #1 below.

      Reviewer #2 (Public review):

      A) For some of the analyses that the authors conducted it is unclear what the rationale behind them is and, consequently, what conclusion we can draw from them.

      We reviewed the manuscript carefully and revised the relevant sections to clarify the rationale behind the analyses. See detailed responses to all the reviewer’s specific comments.

      B) The results of optogenetic manipulation, while very interesting, warrant a more in-depth discussion.

      We expanded our discussion on these experiments (L495-511) and also added an additional analysis to strengthen our findings (Fig. S3-2e).

      Reviewer #3 (Public review):

      (1) The authors report that "adolescent mice showed lower auditory discrimination performance compared to adults" and that this performance deficit was due to (among other things) "weaker cognitive control". I'm not fully convinced of this interpretation, for a few reasons. First, the adolescents may simply have been thirstier, and therefore more willing to lick indiscriminately. The high false alarm rates in that case would not reflect a "weaker cognitive control" but rather, an elevated homeostatic drive to obtain water. Second, even the adult animals had relatively high (~40%) false alarm rates on the freely moving version of the task, suggesting that their behavior was not particularly well controlled either. One fact that could help shed light on this would be to know how often the animals licked the spout in between trials. Finally, for the head-fixed version of the task, only d' values are reported. Without the corresponding hit and false alarm rates (and frequency of licking in the intertrial interval), it's hard to know what exactly the animals were doing.

      First, as requested, we added the hit rates and FA rates for the head-fixed task (Fig. S3-1a). Second, as requested by the reviewer, we performed additional analyses in both the Educage and head-fixed versions of the task. Specifically, we analyzed the ITI duration following each trial outcome. We found that adolescent mice had shorter ITIs only after FAs (Fig. S2-1). In the head-fixed setup, we examined (1) the proportion of ITIs during which licks occurred (Fig. S3-1c) and (2) the number of licks in these ITIs (Fig. S3-1d). We found no differences between adolescents and adults, indicating that the differences observed in the main task are not due to general differences in impulsiveness (Fig. S2-1, Fig. S3-1c, d). See also comment #D of reviewer #1 above.

      B) There are some instances where the citations provided do not support the preceding claim. For example, in lines 64-66, the authors highlight the fact that the critical period for pure tone processing in the auditory cortex closes relatively early (by ~P15). However, one of the references cited (ref 14) used FM sweeps, not pure tones, and even provided evidence that the critical period for this more complex stimulus occurred later in development (P31-38). Similarly, on lines 72-74, the authors state that "ACx neurons in adolescents exhibit high neuronal variability and lower tone sensitivity as compared to adults." The reference cited here (ref 4) used AM noise with a broadband carrier, not tones.

      We carefully checked the text to ensure that each claim is accurately supported by the corresponding reference.

      C) Given that the authors report that neuronal firing properties differ across auditory cortical subregions (as many others have previously reported), why did the authors choose to pool neurons indiscriminately across so many different brain regions?

      We appreciate the reviewer’s concern. While we acknowledge that pooling neurons across auditory cortical subregions may obscure region-specific effects, our primary focus in this study is on developmental differences between adolescents and adults, which were far more pronounced than subregional differences.

      To address this potential limitation: (1) We analyzed firing differences across subregions during task engagement (see Fig. S4-1, S4-2, S4-3; Supplementary Tables 2 and 3). (2) We have now added new analyses for the passive listening condition in AUDp and AUDv (Fig. S7-1; Supplementary Table 6).

      These analyses support our conclusion that developmental stage has a greater impact on auditory cortical activity than subregional location in the contexts examined. For clarity and cohesion, the main text emphasizes developmental differences, while subregional analyses are presented in the Supplement.

      D) And why did they focus on layers 5/6? (Is there some reason to think that age-related differences would be more pronounced in the output layers of the auditory cortex than in other layers?)

      We agree that other cortical layers, particularly supragranular layers, are important for auditory processing and plasticity. Our focus on layers 5/6 was driven by both methodological and biological considerations. Methodologically, our electrode penetrations were optimized to span multiple auditory cortical areas, and deeper layers provided greater mechanical stability for chronic recordings. Biologically, layers 5/6 contain the principal output neurons of the auditory cortex and are well-positioned to influence downstream decision-making circuits. We acknowledge the limitation of our recordings to these layers in the manuscript (L268; L464-8).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The presentation of the paper must be strengthened. As it is now, it makes it difficult to appreciate the strengths of the results. Here are some points that should be addressed:

      a) The manuscript is full of inconsistencies that should be fixed to improve the reader's understanding. For example, the description on l.217 and the Figure. S3-1b, the D' value of 0 rounded to 0.01 on l. 735 (isn't it rather the z-scored value that is rounded? A D' of 0 is not a problem), the definition of lick bias on l. 750 and the values in Fig.2, the legend of Figure 7F and what is displayed on the graph (is it population sparseness or responsiveness?), etc.

      We adjusted the legend and description of former Fig. S3-1b (now Fig. S3-2b).

      We now clarify that the rounded values refer to z-scored hit and false alarm rates that we used in the d’ calculation. We adjusted the definition of the lick bias in Fig. 2 and Fig. S3-1b (L804).
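      For readers less familiar with the measure, d′ is the difference between the z-transformed hit and false alarm rates, and a rate of exactly 0 or 1 would make the z-transform infinite, which is why some bound is needed. A minimal Python sketch, assuming the common convention of clipping the rates to [0.01, 0.99] before the z-transform; these bounds are an assumption for illustration, and the authors' exact correction is the one given in their Methods:

```python
from statistics import NormalDist


def d_prime(hit_rate, fa_rate, floor=0.01, ceil=0.99):
    """Sensitivity index d' = Z(hit rate) - Z(false alarm rate).

    Rates are clipped to [floor, ceil] (assumed bounds) so that
    perfect scores do not produce infinite z-values.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    h = min(max(hit_rate, floor), ceil)
    f = min(max(fa_rate, floor), ceil)
    return z(h) - z(f)


print(d_prime(0.5, 0.5))  # 0.0: chance-level discrimination
print(d_prime(1.0, 0.0))  # finite (about 4.65) because of clipping
```

      Note that a d′ of 0 itself needs no correction; only the extreme rates that feed into it do.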

      We replaced ‘population responsiveness’ with ‘population sparseness’ throughout the figures, legend and the text.

      b) References to figures are sometimes wrong (for example on l. 737,739).

      c) Some text is duplicated (for example l. 814 and l. 837).

      d) Typos should be corrected (for example l. 127, 'the', l. 787, 'upto').

      We deleted the incorrect references of this section, removed the duplicated text, and corrected the typos.

      e) Color code should be changed (for example the shades of blue for easy and hard tasks - they are extremely difficult to differentiate).

      After consideration, we decided to retain the blue color code (i.e., Fig. 1d, Fig. 3d, Fig. 4e-g, Fig. 5c, Fig. 6d–g), where the distinction between the shades of blue appears sufficiently clear and maintains visual consistency and aesthetic appeal. We did, however, make changes to the other color codes (Fig. 4, Fig. 5, Fig. 6, Fig. 7).

      f) Figure design should be improved. For example, why is a different logic used for displaying Figure 5A or B and Figure 1E?

      We adjusted the color scheme in Fig. 5. We chose to represent the data in Fig. 5 according to task difficulty, as this arrangement best illustrates the more pronounced deficits in population decoding in adolescents during the hard task.

      f) Why use a 3D representation in Figure 4G? (2)

      The 3D representation in Fig. 4g was chosen to illustrate the 3-way interactions between onset-latency, maximal discriminability, and duration of discrimination.

      g) Figure 1A, lower right panel- should "response" not be completed by "lick", "no lick"?

      We changed the labels to “Lick” and “No Lick” in Fig. 1a.

      h) l.18 the age mentioned is misleading, because the learning itself actually started 20 days earlier than what is cited here.

      Corrected.

      i) Explain what AAV5-... is on l.212.

      We added an explanation of virus components (see L216-220).

      (2) The comparison of CV in Figure 2 H-J is interesting. I am curious to know whether the differences in the easy and hard tasks could be due to a decrease in CV in adults, rather than an increase in CV in adolescents? Also, could the difference in J be due to 3 outliers?

      We agree that the observed CV differences may reflect a reduction in variability in adults rather than an increase in adolescents. We have revised the Results section accordingly to acknowledge this interpretation.

      Regarding the concern about potential outliers in Fig. 2J, we tested the data for outliers using the isoutlier function in MATLAB (defining outliers as values exceeding three standard deviations from the mean) and found no such cases.
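      The criterion stated above (more than three standard deviations from the mean) corresponds to the 'mean' method of MATLAB's isoutlier; the function's default method is instead median-based (three scaled MADs). A minimal Python equivalent of the mean/SD criterion, assuming the sample standard deviation (N−1 normalization) is used, might look like this:

```python
from statistics import mean, stdev


def mean_sd_outliers(values, n_sd=3.0):
    """Indices of values lying more than `n_sd` sample standard
    deviations from the mean."""
    m = mean(values)
    s = stdev(values)  # sample SD, N-1 normalization
    if s == 0:
        return []  # all values identical: nothing can be an outlier
    return [i for i, v in enumerate(values) if abs(v - m) > n_sd * s]


# Twenty identical values plus one extreme value (illustrative data).
print(mean_sd_outliers([1.0] * 20 + [100.0]))  # [20]
```

      One caveat of this criterion: for N samples no point can lie more than (N−1)/√N sample SDs from the mean, so with roughly ten or fewer values the 3-SD rule cannot flag anything at all.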

      (3) Figure 2c shows that there is no difference in perceptual sensitivity between adolescents and adults, whereas the conclusion from Figure 4 is that adolescents exhibit lower discriminability in stimulus-related activity. Aren't these results contradictory?

      This is a nuanced point. The similar slopes of the psychometric functions (Fig. 2c), which indicate comparable perceptual sensitivity, and the lower AUC observed in the ACx of adolescents (Fig. 4) do not necessarily contradict each other. These two measures capture related but distinct issues: psychometric slopes reflect behavioral output, which integrates both sensory encoding and processing downstream of ACx, while the AUC analysis reflects stimulus-related neural activity in ACx, which may still include decision-related components.

      Note that stimulus-related neural discriminability outside the context of the task is not different between adolescent and adult experts (Fig. 7h; p = 0.9374, Kruskal-Wallis test with Tukey-Kramer correction for multiple comparisons; not discussed in the manuscript). This suggests that the differences emerge when we measure during behavior. Also note that behavior may rely on processing beyond ACx, and it is possible that downstream areas compensate for weaker cortical discriminability in adolescents, but this issue merits further investigation.
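      As background for the single-neuron AUC discussed above: the area under the ROC curve equals the probability that a randomly chosen response from one trial type exceeds a randomly chosen response from the other (ties counting one half), i.e. the normalized Mann-Whitney U statistic. A minimal Python sketch, assuming one spike count per trial for each of two trial types; the response window, binning and significance testing used in the manuscript are not specified here:

```python
def roc_auc(responses_a, responses_b):
    """Area under the ROC curve separating two sets of single-trial
    responses: P(a > b) + 0.5 * P(a == b) over all trial pairs."""
    if not responses_a or not responses_b:
        raise ValueError("both trial types need at least one trial")
    wins = 0.0
    for a in responses_a:
        for b in responses_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5  # ties count as half
    return wins / (len(responses_a) * len(responses_b))


# Illustrative spike counts for two stimuli.
print(roc_auc([4, 5, 6], [1, 2, 3]))  # 1.0: perfectly separable
print(roc_auc([1, 2], [1, 2]))        # 0.5: no discriminability
```

      Because the value is direction-sensitive, a neuron responding more strongly to the second stimulus gives an AUC below 0.5; analyses often fold such values (for example |AUC − 0.5|) when only the magnitude of discriminability matters.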

      (4) Why do you think that the discrimination in hard tasks decreases with learning (Figure 6D vs Figure 6F)?

      This is another nuanced point, and we can only speculate at this stage. While it may appear counterintuitive that single-neuron discriminability (AUC) for the hard task is reduced after learning (Fig. 6D vs. 6F), we believe this may reflect a shift in sensory coding in expert animals. In a recent study (Haimson et al., 2024; Science Advances), we found that learning alters single-neuron responses in the easy versus hard task in complex and distinct ways, which may account for this result. It is also possible that, in expert mice, top-down mechanisms such as feedback from higher-order areas act to suppress or stabilize sensory responses in auditory cortex, reducing the apparent stimulus selectivity of single neurons (e.g., AUC), even as behaviorally relevant information is preserved or enhanced at the population level.

      Reviewer #2 (Recommendations for the authors):

      This is very interesting work and I enjoyed reading the manuscript. See below for my comments, queries and suggestions, which I hope will help you improve an already very good paper.

      We thank the reviewer for the meticulous and thoughtful review.

      (1) Line 107: x-axis of panel 1e says 'pre-adolescent'.

      (2) Line 130: replace 'less' with 'fewer'.

      (3) Line 153: 'both learned and catch trials': I find the terminology here a bit confusing. I would typically understand a catch trial to be a trial without a stimulus but these 'catch' trials here have a stimulus. It's just that they are not rewarded/punished. What about calling them probe trials instead?

      We corrected the labelling (1), reworded to ‘fewer’ and ‘probe trials’ (2,3).

      (4) Line 210: The results of the optogenetics experiments are very interesting. In particular, because the effect is so dramatic and much bigger than what has been reported in the literature previously, I believe. Lick rates are dramatically reduced suggesting that the mice have pretty much stopped engaging in the task and the authors very rightly state that the 'execution' of the behavior is affected. I think it would be worth discussing the implications of these results more thoroughly, perhaps also with respect to some of the lesion work. Useful discussions on the topic can be found, for instance, in Otchy et al., 2015; Hong et al., 2018; O'Sullivan et al., 2019; Ceballo et al., 2019 and Lee et al., 2024. Are the mice unable to hear anything in laser trials and that is why they stopped licking? If they merely had trouble distinguishing them then we would perhaps expect the psychometric curves to approach chance level, i.e. to be flat near the line indicating a lick rate of 0.5. Could the dramatic decrease in lick rate be a motor issue? Can we rule out spillover of the virus to relevant motor areas? (I understand all of the 200nL of the virus were injected at a single location) Or are the effects much more dramatic than what has been reported previously simply because the GtACR2 is much more effective at silencing the auditory cortex? Could the effect be down to off-target effects, e.g. by removing excitation from a target area of the auditory cortex, rather than the disruption of cortical processing?

      We have now expanded the discussion in the manuscript to more thoroughly consider alternative interpretations of the strong behavioral effect observed during ACx silencing (L495–511). In particular, we acknowledge that the suppression of licking may reflect not only impaired sensory discrimination but also broader disruptions to arousal, motivation, or motor readiness. We also discuss the potential impact of viral spread, circuit-level off-target effects, and the potency of GtACR2 as possible contributors. We highlight the need for future work using more graded or temporally precise manipulations to resolve these issues.

      (5) Line 226: Reference 19 (Talwar and Gerstein 2001) is not particularly relevant as it is mostly concerned with microstimulation-induced A1 plasticity. There are, however, several other papers that should be cited (and potentially discussed) in this context. In particular, O'Sullivan et al., 2019 and Ceballo et al., 2019 as these papers investigate the effects of optogenetic silencing on frequency discrimination in head-fixed mice and find relatively modest impairments. Also relevant may be Kato et al., 2015 and Lee et al., 2024, although they look at sound detection rather than discrimination.

      We changed the references and pointed the reader to the (new section) Discussion.

      (6) Line 253: 'engaged [in] the task.

      (7) Figure 4: It appears that panel S4-1d is not referred to anywhere in the main text.

      Fixed.

      (8) Line 260: Might be useful to explain a bit more about the motivation behind focusing on L5/L6. Are there mostly theoretical considerations, i.e. would we expect the infragranular layers to be more relevant for understanding the difference in task performance? Or were there also practical considerations, e. g. did the data set contain mostly L5/L6 neurons because those were easier to record from given the angle at which the probe was inserted? If those kinds of practical considerations played a role, then there is nothing wrong with that but it would be helpful to explain them for the benefit of others who might try a similar recording approach.

      There were no deep theoretical considerations for targeting L5/6. Our focus on layers 5/6 was driven by both methodological and biological considerations. Methodologically, our electrode penetrations were optimized to span multiple auditory cortical areas, and deeper layers provided greater mechanical stability for chronic recordings. Biologically, layers 5/6 contain the principal output neurons of the auditory cortex and are well-positioned to influence downstream decision-making circuits. We acknowledge the limitation of our recordings to these layers in the manuscript (L268; L463–467). See also comment D of reviewer 3.

      (9) Supplementary Table 2: The numbers in brackets indicate fractions rather than percentages.

      Fixed.

      (10) Figure S4-3: The figure legend implies that the number of neurons with significant discriminability for the hard stimulus and significant discriminability for choice was identical. (adolescent neurons = 368, mice = 5, recordings = 10; adult n = 544, mice = 6, recordings = 12 in both cases). Presumably, that is not actually the case and rather the result of a copy/paste operation gone wrong. Furthermore, I think it would be helpful to state the fractions of neurons that can discriminate between the stimuli and between the choices that the animal made in the main text.

      Thank you for spotting the mistake. We corrected the n’s and added the percentage of neurons that discriminate stimulus and choice in the main text and the figure legend.

      (11) Line 301: 'We used a ... decoder to quantify hit versus correct reject trial outcomes': I'm not sure I understand the rationale here. For the single unit analysis hit and false alarm trials were compared to assess their ability to discriminate the stimuli. FA and CR trials were compared to assess whether neurons can encode the choice of the mice. But the hit and CR trials which are contrasted here differ in terms of both stimulus and behavior/choice so what is supposed to be decoded here, what is supposed to be achieved with this analysis?

      Thank you for this important point. You are correct that hit and CR trials differ in both stimulus and choice, that is, in overall task-related variables. We chose this contrast for the population decoding analysis because it yields higher and more balanced trial counts per session, which are necessary for a reliable analysis. While this approach does not isolate stimulus encoding from choice encoding, it provides an overall measure of how well population activity distinguishes task-relevant outcomes. We explicitly acknowledge this limitation in L313-314.

      (12) Line 332: What do you mean when you say the novice mice were 'otherwise fully engaged' in the task when they were not trained to do the task and are not doing the task?

      By "otherwise fully engaged," we mean that novice mice were actively participating in the task environment, similar to expert mice — they were motivated by thirst and licked the spout to obtain water. The key distinction is that novice mice had not yet learned the task rules and likely relied on trial-and-error strategies, rather than performing the task proficiently.

      (13) Line 334: 'regardless of trial outcome': Why is the trial outcome not taken into account? What is the rationale for this analysis? Furthermore, in novice mice a substantial proportion of the 'go' trials are misses. In expert mice, however, the proportion of 'miss trials' (and presumably false alarms) will by definition be much smaller. Given this, I find it difficult to interpret the results of this section.

      We chose this approach to obtain a sufficient number of trials for each task difficulty, because trial outcomes were distributed very differently across learning stages (e.g., expert mice predominantly made CRs on No-Go trials, whereas novice mice often made FAs). Using all trial outcomes ensured that we had enough trials for each stimulus type to estimate the AUCs accurately, and it avoids biases arising from uneven trial numbers across learning stages.
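      Why trial counts matter for an AUC estimate can be seen from a minimal sketch of the ROC AUC computed by pairwise (Mann-Whitney) comparison of single-trial responses. The function name and spike counts below are hypothetical, and this is not necessarily the exact estimator used in the study.

```python
from itertools import product


def roc_auc(responses_a, responses_b):
    """Area under the ROC curve for separating group A from group B.

    Equals the probability that a randomly drawn trial from A has a larger
    response than one from B, with ties counted as 0.5 (Mann-Whitney form).
    """
    if not responses_a or not responses_b:
        raise ValueError("both groups need at least one trial")
    score = 0.0
    for a, b in product(responses_a, responses_b):
        if a > b:
            score += 1.0
        elif a == b:
            score += 0.5
    return score / (len(responses_a) * len(responses_b))


# Hypothetical spike counts of one neuron on two stimulus types
go_counts = [5, 7, 6, 9, 8]
nogo_counts = [3, 4, 6, 2, 5]
print(roc_auc(go_counts, nogo_counts))  # 0.92
```

      With n_a and n_b trials the estimate moves in steps of 1/(2·n_a·n_b), so when one outcome is rare the AUC becomes coarse and noisy, which is one reason sufficient and balanced trial numbers help.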

      (14) Line 378: 'differences between adolescents and adults arise primarily from age': Are there differences in any of the metrics shown in 7e-h between adolescents and adults?

      We confirm that differences between adolescents and adults are present in some, but not all, of the metrics in Figure 7e–h. Specifically, tuning bandwidth was similar in novice animals but significantly lower in adult experts (Fig. 7e; novice: p = 0.0882; expert: p = 0.0001, Kruskal-Wallis test with Tukey-Kramer correction for multiple comparisons; not discussed in the manuscript). Population sparseness was similar in both novice and expert adolescent and adult neurons (Fig. 7f; novice: p = 0.2873; expert: p = 0.1017, Kruskal-Wallis test with Tukey-Kramer correction for multiple comparisons; not discussed in the manuscript). The distance to the easy go stimulus was similar in novice animals but lower in adult experts (Fig. 7g; novice: p = 0.7727; expert: p = 0.0001, Kruskal-Wallis test with Tukey-Kramer correction for multiple comparisons; not discussed in the manuscript). The neuronal d-prime was similar in both novice and expert adolescent and adult neurons (Fig. 7h; novice: p = 0.7727; expert: p = 0.0001, Kruskal-Wallis test with Tukey-Kramer correction for multiple comparisons; not discussed in the manuscript).

      (15) Line 475: '...well and beyond...': something seems to be missing in this statement.

      (16) Line 487: 'onto' should be 'into', I think.

      (17) Line 610 and 613: '3 seconds' ... '2.5 seconds': Was the response window 3s or 2.5s?

      (18) Line 638: 'set' should be 'setup', I believe.

      All the mistakes mentioned above were fixed. Thanks.

      (19) Line 643: 'Reward-reinforcement was delayed to 0.5 seconds after the tone offset': Presumably, if they completed their fifth lick later than 0.5 seconds after the tone, the reward delivery was also delayed?

      Apologies for the lack of clarity. In the head-fixed version, there was no lick threshold. Mice were reinforced after a single lick. If that lick occurred after the 0.5-second reinforcement delay following tone offset, the reward or punishment was delivered immediately upon licking.
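      The head-fixed reinforcement timing described above can be summarized as a small function. This is a hypothetical sketch: it assumes that a lick during the 0.5-second delay is reinforced at the end of the delay and that a trial without a lick has no outcome event; neither detail is stated explicitly in the reply.

```python
def outcome_time(tone_offset_s, first_lick_s, delay_s=0.5):
    """Time at which reward or punishment is delivered on a head-fixed trial.

    Hypothetical helper. first_lick_s is the time of the first lick in the
    response window, or None if the mouse did not lick. A single lick
    triggers the outcome, but never earlier than delay_s after tone offset.
    """
    if first_lick_s is None:
        return None  # no lick, no outcome event (assumption)
    earliest = tone_offset_s + delay_s
    return max(first_lick_s, earliest)


print(outcome_time(1.0, 1.2))  # 1.5: lick during the delay (assumed rule)
print(outcome_time(1.0, 2.0))  # 2.0: lick after the delay, delivered immediately
```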

      (20) Line 661: 'effect [of] ACx'.

      (21) Line 680: 'a base-station connected to chassis'. The sentence sounds incomplete.

      (22) Line 746: 'infliction', I believe, should say 'inflection'.

      (23) Line 769: 'non-auditory responsive units': Shouldn't that simply say 'non-responsive units'? The way it is currently written I understand it to mean that these units were responsive (to some other modality perhaps) but not to auditory stimulation.

      (24) Line 791: 'bins [of] 50ms'.

      (25) Line 811: 'all of' > 'of all'.

      (26) Line 814: Looks like the previous paragraph on single unit analysis was accidentally repeated under the wrong heading.

      (27) Line 817: 'encoded' should say 'calculated', I believe.

      All the mistakes mentioned above were fixed. Thanks.

      (28) Line 869: 'bandwidth of excited units': Not sure I understand how exactly the bandwidth, i.e. tuning width was measured.

      We acknowledge that our previous answer was unclear and have expanded the Methods section. To calculate bandwidth, we identified significant tone-evoked responses by comparing activity during the tone window to baseline firing rates at 62 dB SPL (p < 0.05). For each neuron, we counted the number of contiguous frequencies with significant excitatory responses, subtracting isolated false positives to correct for chance. We then converted this count into an octave-based bandwidth by multiplying the number of frequency bins by the octave spacing between them (0.1661 octaves per step).
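      The bandwidth computation described above can be sketched as follows. The input format (one boolean per frequency, in ascending order) is an assumption, and so is the way isolated false positives are handled here: the longest run of consecutive significant bins is taken, and a run of a single bin is treated as a chance detection. The exact correction used in the Methods may differ.

```python
OCTAVES_PER_STEP = 0.1661  # frequency spacing stated in the reply


def bandwidth_octaves(significant, octaves_per_step=OCTAVES_PER_STEP):
    """Octave bandwidth from per-frequency significance at a fixed level.

    significant: sequence of bools, one per tone frequency (ascending), True
    where the excitatory response at 62 dB SPL was significant.
    """
    longest = current = 0
    for is_sig in significant:
        current = current + 1 if is_sig else 0
        longest = max(longest, current)
    if longest < 2:
        # Assumption: a lone significant bin is discarded as a false positive.
        longest = 0
    return longest * octaves_per_step


print(round(bandwidth_octaves([False, True, True, True, False, True]), 4))  # 0.4983
```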

      (29) Line 871: 'population sparseness': Is that the fraction of tone frequencies that produced a significant response? I would have thought that this measure is very highly correlated to your measure of bandwidth, to the point of being redundant, but I may have misunderstood how one or the other is calculated. Furthermore, the Y label of Figure 7f says 'responsiveness' rather than sparseness and that would seem to be the more appropriate term because, unless I am misunderstanding this, a larger value here implies that the neuron responded to more frequencies, i.e. in a less sparse manner.

      We have clarified the use of the term "population sparseness" and updated the Y-axis label in Figure 7f to better reflect this measure. This metric reflects the fraction of tone–attenuation combinations that elicited a significant excitatory response across the entire population of neurons, not within individual units.

      While this measure is related to bandwidth, it captures a distinct property of the data. Bandwidth quantifies how broadly or narrowly a single neuron responds across frequencies at a fixed intensity, whereas population sparseness reflects how responsiveness is distributed across the population as a whole. The two measures are related, since broadly tuned neurons often contribute to lower population sparseness, but they capture distinct aspects of neural coding and are not redundant.

      (30) Line 881: I think this line should refer to Figure 7h rather than 7g.

      Fixed.

      Reviewer #3 (Recommendations for the authors):

      (1) In the Educage, water was only available when animals engaged in the task; however, there is no mention of whether/how animal weight was monitored.

      In the Educage, mice had continuous access to water by voluntarily engaging in the task, which they could perform at any time. Although body weight was not directly monitored, water access was essentially ad libitum, and mice performed hundreds of trials per day, thereby ensuring sufficient daily intake. This approach allowed us to monitor hydration (food was available ad libitum in the home cage). The 24/7 setup, including automated monitoring of trial counts and water consumption, was reviewed and approved by our institutional animal care and use committee (IACUC).

      (2) In Figure 2B-C and Figure 2E, the y-axis reads "lick rate". At first glance, I took this to mean "the frequency of licking" (i.e. an animal typically licks at a rate of 5 Hz). However, what the authors actually are plotting here is the proportion of trials on which an animal elicited >= 5 licks during the response window (i.e. the proportion of "yes" responses). I recommend editing the y-axis and the text for clarity.

      We replaced the y-label and adjusted the figure legend (Fig. 2).

      (3) I didn't see any examples of raw (filtered) voltage traces. It would be worth including some to demonstrate the quality of the data.

      We have added an example of a filtered voltage trace aligned to tone onset in Fig. S4-1a to illustrate data quality. In addition, all raw and processed voltage traces, along with relevant analysis code, are available through our GitHub repository and the corresponding dataset on Zenodo.

      (4) The description of the calculation of bias (C) in the methods section (lines 749-750) is incorrect. The correct formula is C = -0.5 * [z(hit rate) + z(fa rate)]. I believe this is the formula that the authors used, as they report negative C values. Please clarify or correct.

      Thanks for spotting this. It is now corrected.
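      For reference, the corrected bias formula, together with the standard sensitivity index d' = z(hit rate) - z(fa rate), can be computed as in the sketch below. The function name is hypothetical, and rates of exactly 0 or 1 (which give infinite z-scores) are simply rejected here; any correction applied to such rates in the study is not described.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def sdt_measures(hit_rate, fa_rate):
    """Return (d_prime, criterion C) from hit and false-alarm rates."""
    for rate in (hit_rate, fa_rate):
        if not 0.0 < rate < 1.0:
            raise ValueError("rates must lie strictly between 0 and 1")
    d_prime = _z(hit_rate) - _z(fa_rate)
    criterion = -0.5 * (_z(hit_rate) + _z(fa_rate))
    return d_prime, criterion


d, c = sdt_measures(0.9, 0.3)
print(round(d, 2), round(c, 2))  # 1.81 -0.38
```

      A negative C, as in this example, indicates a liberal bias toward "yes" (lick) responses.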

      (5) The authors use the terms 'naïve' and 'novice' interchangeably. I suggest sticking with one term to avoid potential confusion.

      (6) Multiple instances: "less trials/day" should be "fewer trials/day"

      (7) Supplementary Table 2: The values reported are proportions, not percentages. Please correct.

      (8) Line 270: Table 2 does not show the number of neurons in the dataset categorized by region. Perhaps the authors meant Supplementary Table 2?

      Fixed. Thank you for pointing these mistakes out.

      (9) Figure 5C: the data from the hard task are entirely obscured by the data from the easy task. I recommend splitting it into two different plots.

      We agree and split the decoding of the easy and the hard task into two graphs (left: easy task; right: hard task). Thank you!

      (10) How many mice contributed to each analyzed data set? Could the authors provide a breakdown in a table somewhere of how many neurons were recorded in each mouse and which ones were included in which analyses?

      We added an overview of the analyzed datasets in Supplementary Table 7. Please note that the number of mice and neurons used in each analysis is also reported in the main text and legends. Importantly, all primary analyses were conducted using LME models, which explicitly account for hierarchical data structure and inter-mouse variability, thereby addressing potential concerns about data imbalance or bias.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary:

      The authors use analysis of existing data, mathematical modelling, and new experiments, to explore the relationship between protein expression noise, translation efficiency, and transcriptional bursting.

      Strengths:

      The analysis of the old data and the new data presented is interesting and mostly convincing.

      Thank you for the constructive suggestions and comments. We address the individual comments below. 

      Weaknesses:

      (1) My main concern is the analysis presented in Figure 4. This is the core of mechanistic analysis that suggests ribosomal demand can explain the observed phenomenon. I am both confused by the assumptions used here and the details of the mathematical modelling used in this section. Firstly, the authors' assumption that the fluctuations of a single gene mRNA levels will significantly affect ribosome demand is puzzling. On average the total level of mRNA across all genes would stay very constant and therefore there are no big fluctuations in the ribosome demand due to the burstiness of transcription of individual genes. Secondly, the analysis uses 19 mathematical functions that are in Table S1, but there are not really enough details for me to understand how this is used, are these included in a TASEP simulation? In what way are mRNA-prev and mRNA-curr used? What is the mechanistic meaning of different terms and exponents? As the authors use this analysis to argue ribosomal demand is at play, I would like this section to be very much clarified.

      Thank you for raising two important points. Regarding the first point, we agree that the overall ribosome demand in a cell will remain mostly the same even with fluctuations in mRNA levels of a few genes. However, what we refer to in the manuscript is the demand for ribosomes for translating mRNA molecules of a single gene. This demand will vary with the changes in the number of mRNA molecules of that gene. When the mRNA copy number of the gene is low, the number of ribosomes required for translation is low. At a subsequent timepoint when the mRNA number of the same gene goes up rapidly due to transcriptional bursting, the number of ribosomes required would also increase rapidly. This would increase ribosome demand. The process of allocation of ribosomes for translation of these mRNA molecules will vary between cells, and this process can lead to increased expression variation of that gene among cells. We have now rephrased the section between the lines 321 and 331 to clarify this point.

      Regarding the second point, each of the 19 mathematical functions was individually tested in the TASEP model and in the stochastic simulations. The parameters ‘mRNA-curr’ and ‘mRNA-prev’ are the mRNA copy numbers at the present time point and the previous time point in the stochastic simulations, respectively. These numbers were calculated from the rate of production of mRNA, which is influenced by the transcriptional burst frequency and the burst size, as well as the rate of mRNA removal. We have now incorporated more details about the modelling part along with explanation for parameters and terms in the revised manuscript (lines 390 to 411; lines 795 to 807). 

      (2) Overall, the paper is very long and as there are analytical expressions for protein noise (e.g. see Paulsson Nature 2004), some of these results do not need to rely on Gillespie simulations. Protein CV (noise) can be written as three terms representing protein noise contribution, mRNA expression contribution, and bursty transcription contribution. For example, the results in panel 1 are fully consistent with the parameter regime, protein noise is negligible compared to transcriptional noise. 

      Thank you for referring us to the paper on analytical expressions for protein noise. We introduced translational bursting and ribosome demand in our model, and these are linked to stochastic fluctuations in mRNA and ribosome numbers. In addition, our model couples transcriptional bursting with translational bursting and ribosome demand. Since these processes are all stochastic in nature, we reasoned that stochastic simulations would better capture the fluctuations in mRNA and protein expression levels originating from these processes. For consistency, we used stochastic simulations throughout, even when the coupling between transcription and translation was not considered. 

      Reviewer #1 (Recommendations for the authors):  

      (1) Figure 1B shows noise as Distance to Median (DM) that can be positive or negative. It is therefore misleading that the authors say there is a 10-fold increase in noise (this would be relevant if the quantity was strictly positive). How is the 10-fold estimated? Similar comments apply to Figure 1F and the estimated 37-fold. I also wonder if the datasets combined from different studies are necessarily compatible.

      We have now changed the statements and report the actual noise values for different classes of genes rather than fold-changes (lines 111-113 and 143-145). We agree that the measurements of mRNA expression levels, protein synthesis rates and protein noise were obtained from experiments performed by different research labs, and this could introduce additional variation in the data. However, these experimental variations are likely to be random and are unlikely to bias any specific class of genes (in Fig. 1B and Fig. 1F) more than others.  

      (2)   How Figure 1D has been generated seems confusing, the authors state this is based on the Gillespie algorithm, but in panel 1C and also in the methods, they are writing ODEs and Equations 3 and 4 stating the Euler method for the solution of ODEs. Also, I am concerned if this has been done at steady-state. The protein noise for the two-state model can be analytically obtained, and instead of simulations, the authors could have just used the expression. Also, Figure 1D shows CV while the corresponding data Figure 1B is showing mean adjusted DM. So, I am not sure if the comparison is valid. I am also very confused about the fact that the authors show CV does not depend on the mean expression of proteins and mRNA. Analytical solutions suggested there is always an inverse relationship exists between CV and mean and this has also been experimentally observed (see for example Newman et al 2006).

      We used the Gillespie algorithm for stochastic simulations to identify the time points at which an event (for example, switching to the ON or OFF state during transcriptional bursting) occurred. When an event occurred at a time point, the reaction rates were given by Equations 3 and 4, as they depended on the number of mRNA (or protein) molecules present and on the production and removal rates. 

      For all published datasets with measurements from many genes/promoters, we used adjusted noise (for mRNA noise) and Distance-to-median (DM, for protein noise). These measures correct for the mean dependence of expression noise (Newman et al., 2006). For simulations, which we performed for a single gene, and for experiments that we performed on a limited number of promoters, we used the coefficient of variation (CV) to quantify noise, as adjusted noise or DM cannot be calculated for a single gene. 

      The work of Newman et al. (2006) measures noise values of different genes with different transcriptional burst characteristics and different mRNA and protein removal rates. We see similar results in our simulations (Fig. 1E): as we increase the mean expression by changing the transcriptional burst frequency, the protein noise decreases.
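      To make the simulation scheme concrete, below is a minimal Gillespie sketch of the textbook two-state (telegraph) model with mRNA and protein. It is not the authors' model: translational bursting, ribosome demand and the TASEP component are deliberately left out, and all rate names and values are hypothetical. It illustrates the baseline trend described above, namely that at fixed burst size (beta_m / k_off) a higher burst frequency (k_on) raises mean protein and lowers the CV.

```python
import math
import random


def simulate_telegraph(k_on, k_off, beta_m, gamma_m, beta_p, gamma_p,
                       t_end, t_burn=0.0, seed=0):
    """Gillespie simulation of a two-state promoter with mRNA and protein.

    Returns (mean, CV) of the protein copy number, time-weighted over the
    interval (t_burn, t_end]. Requires k_on > 0, k_off > 0, t_burn < t_end.
    """
    rng = random.Random(seed)
    # State change of (promoter, mRNA, protein) for each reaction below
    updates = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    t, on, m, p = 0.0, 0, 0, 0
    w_sum = w_p = w_p2 = 0.0
    while True:
        rates = [
            k_on * (1 - on),  # promoter OFF -> ON
            k_off * on,       # promoter ON -> OFF
            beta_m * on,      # transcription, only in the ON state
            gamma_m * m,      # mRNA decay
            beta_p * m,       # translation from each mRNA
            gamma_p * p,      # protein decay
        ]
        total = sum(rates)
        tau = rng.expovariate(total)  # waiting time to the next event
        # The current state persists until t + tau; accumulate time-weighted stats.
        t_next = min(t + tau, t_end)
        if t_next > t_burn:
            seg = t_next - max(t, t_burn)
            w_sum += seg
            w_p += p * seg
            w_p2 += p * p * seg
        if t + tau >= t_end:
            break
        t += tau
        # Choose the reaction that fires, with probability proportional to its rate.
        k = rng.choices(range(len(rates)), weights=rates)[0]
        d_on, d_m, d_p = updates[k]
        on, m, p = on + d_on, m + d_m, p + d_p
    mean = w_p / w_sum
    var = max(w_p2 / w_sum - mean ** 2, 0.0)
    return mean, (math.sqrt(var) / mean if mean > 0 else float("nan"))


# Same burst size (beta_m / k_off = 40), different burst frequency; hypothetical values
low = simulate_telegraph(0.05, 0.5, 20, 1.0, 5, 0.1, t_end=1000, t_burn=100)
high = simulate_telegraph(0.5, 0.5, 20, 1.0, 5, 0.1, t_end=1000, t_burn=100)
print(low, high)
```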

      (3) Estimating parameters of gene expression using reference 44 ignores the effect of variability in capture efficiency and cell size. In a recent paper, Tang et al Bioinformatics 39 (7), btad395 2023 addressed this issue.

      Thank you for referring us to the work of Tang et al. (2023). We note that cell size and capture efficiency have a small effect on the burst frequency (Kon) but a more pronounced effect on burst size (Tang et al., 2023). In our analysis, we considered only burst frequency, and even with likely small inaccuracies in our estimates of Kon, we could capture an interesting association between burst frequency and noise trends. 

      (4) In the methods "αp = 0.007 per mRNA molecule per unit time", I believe it should be per protein molecule per unit time.

      Corrected.

      (5)  Figure 3 uses TASEP modelling but the details of this modelling are not described well.

      We have now expanded the description of the modelling approach in the revised manuscript (lines 391-412; lines 693-776 and lines 797-809). We have also added more details to the figure captions. 

      (6) Another overall issue is that when the authors talk about changes in burst frequency or changes in translation efficiency, it is not always clear, is this done while keeping all the other parameters constant therefore changing mean expressions, or is this done by keeping the mean expressions constant?

      To test for the association between mean protein expression and protein noise, throughout most of the manuscript we varied the mean expression by changing the translation initiation rate (TLinit) while keeping other parameters constant. In Figure 5, where we decoupled TLinit from the ribosome traversal rate (V), we changed the mean protein expression by changing the ribosome traversal rate while keeping other parameters constant. We now state this explicitly in the manuscript. 

      (7)   I believe Figures 5 and 6 present the same data in different ways, I wonder if these can be combined or if some aspect of the data in Figure 5 could go to supplementary. Also, the statistical tests in Figure 5E and F are not clear what they are testing.

      We have now moved Figures 5E and 5F to the supplement (Fig. S20). We have also added details of the statistical test to the figure caption. 

      Reviewer #2 (Public review): 

      This work by Pal et al. studied the relationship between protein expression noise and translational efficiency. They proposed a model based on ribosome demand to explain the positive correlation between them, which is new as far as I realize. Nevertheless, I found the evidence of the main idea that it is the ribosome demand generating this correlation is weak. Below are my major and minor comments.

      Thank you for your helpful suggestions and comments. We note that the direct experimental support required for the ribosome demand model would need experimental setups that are beyond the currently available methodologies. We address the individual comments below. 

      Major comments: 

      (1) Besides a hypothetical numerical model, I did not find any direct experimental evidence supporting the ribosome demand model. Therefore, I think the main conclusions of this work are a bit overstated.

      Direct experimental evidence for the hypothesis would require ribosome occupancy maps of mRNA molecules of specific genes in single cells, at time intervals that closely match the burst frequency of these genes. This is beyond currently available methodologies. However, other lines of evidence support our model. For example, earlier work in cell-free systems has shown that constraining the cellular resources required for transcription or translation can increase expression heterogeneity (Caveney et al., 2017). In addition, the ribosome demand model made two predictions, each of which could be validated through modelling as well as in our experiments. 

      To further investigate whether removing ribosome demand from our model could eliminate the positive mean-noise correlation for a gene, we have now tested two additional sets of models in which we decoupled the translation initiation rate (TLinit) from the ribosome traversal speed (V). In the first model, we changed the mean protein expression by changing the translation initiation rate while keeping the ribosome traversal speed constant. Thus, in this scenario, ribosome demand varied with the translation initiation rate. As expected, the positive correlation between mean expression and protein noise was maintained in this condition (Fig. 5B). In the second model, we changed the mean expression by changing the ribosome traversal speed while keeping the translation initiation rate (and therefore the ribosome demand) constant. In this situation, the relationship between mean expression and protein noise turned negative (Fig. 5B and fig. S16). These results further indicated that ribosome demand was indeed driving the positive relationship between mean expression and protein noise. 

      (2) I found that the enhancement of protein noise due to high translational efficiency is quite mild, as shown in Figure 6A-B, which makes the biological significance of this effect unclear.

      We agree with the reviewer’s comment that the effect of translational efficiency on protein noise may not be as substantial as the effect of transcriptional bursting, but it has been observed in studies across bacteria, yeast, and Arabidopsis (Ozbudak et al., 2003; Blake et al., 2003; Wu et al., 2022). In addition, the relationship between translational efficiency and protein noise is in contrast with the inverse relationship observed between mean expression and noise (Newman et al., 2006; Silander et al., 2012). We also note that the goal of the manuscript was not to evaluate the relative strength of these associations, but to understand the molecular basis of the influence of translational efficiency on protein noise. 

      (3) The captions for most of the figures are short and do not provide much explanation, making the figures difficult to read.

      We have revised the figure captions to include more details as per the reviewer’s suggestion. 

      (4)  It would be helpful if the authors could define the meanings of noise (e.g., coefficient of variation?) and translational efficiency in the very beginning to avoid any confusion. It is also unclear to me whether the noise from the experimental data is defined according to protein numbers or concentrations, which is presumably important since budding yeasts are growing cells. 

      For all published datasets with measurements from many genes/promoters, we used adjusted noise (for mRNA noise) and Distance-to-median (DM, for protein noise). These measures correct for the mean dependence of expression noise. For simulations, which we performed for a single gene, and for experiments that we performed on a limited number of promoters, we used the coefficient of variation (CV) to quantify noise, as adjusted noise or DM cannot be calculated for a single gene. We now mention this in lines 123-124. We used the protein synthesis rate per mRNA as the measure of translational efficiency (Riba et al., 2019; line 100). Alternatively, we also used the tRNA adaptation index (tAI) as a measure of translational efficiency, as codon choice can also influence the translation rate per mRNA molecule (Tuller et al., 2010) (line 193). 

      The protein noise was quantified from the signal intensity of GFP tagged proteins (Newman et al., 2006; and our data), which was proportional to protein numbers without considering cell volume. For quantification of noise at the mRNA level, single-cell RNA-seq data was used, which provided mRNA numbers in individual cells.  
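      The tRNA adaptation index mentioned above is commonly defined as the geometric mean, over the codons of a gene, of each codon's relative adaptiveness value w (0 < w ≤ 1). The sketch below shows only that final averaging step; deriving w from tRNA gene copy numbers and wobble-pairing penalties is omitted, and the w values and function name are hypothetical.

```python
import math


def trna_adaptation_index(cds, w):
    """Geometric mean of relative adaptiveness values w[codon] over a CDS.

    Codons absent from `w` (e.g. stop codons, or codons deliberately
    excluded) are skipped. Hypothetical helper for illustration.
    """
    cds = cds.upper()
    log_w = [math.log(w[cds[i:i + 3]])
             for i in range(0, len(cds) - 2, 3)
             if cds[i:i + 3] in w]
    if not log_w:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(log_w) / len(log_w))


# Hypothetical w values for two alanine codons; TAA (stop) is skipped
w_example = {"GCT": 1.0, "GCC": 0.25}
print(round(trna_adaptation_index("GCTGCCTAA", w_example), 3))  # 0.5
```

      Averaging in log space means a few poorly adapted codons pull the index down more than an arithmetic mean would.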

      (5) The conclusions from Figures 1D and 1E are not new. For example, the constant protein noise as a function of mean protein expression is a known result of the two-state model of gene expression, e.g., see Equation (4) in Paulsson, Physics of Life Reviews 2005.

      Yes, they may not be new, but we included these results to set a baseline for comparison with the simulation results in the later part of the manuscript, where we included translational bursting and ribosome demand in our models. 

      (6) In Figure 4C-D, it is unclear to me how the authors changed the mean protein expression if the translation initiation rate is a function of variation in mRNA number and other random variables.

      The translation initiation rate varied from a basal translation initiation rate depending on the mRNA numbers and other variables. We changed the basal translation initiation rate to alter the mean protein expression levels. We have now elaborated the modelling section to incorporate these details in the revised manuscript (lines 404 to 412). 

      (7) If I understand correctly, the authors somehow changed the translation initiation rate to change the mean protein expression in Figures 4C-D. However, the authors changed the protein sequences in the experimental data of Figure 6. I am not sure if the comparison between simulations and experimental data is appropriate.

      This is an important observation. Even though we changed the basal translation initiation rate to change the mean expression (Fig. 4C-D), we noted in the description of the model that changes in the translation initiation rate were also linked to changes in the translation elongation rate (Fig. 3D). Thus, an increase in the translation initiation rate was associated with faster ribosome traversal along an mRNA molecule, as also observed in an experimental study by Barrington et al. (2023). Therefore, the models can equally be expressed in terms of the translation elongation rate or ribosome traversal speed instead of the translation initiation rate, and, because the initiation and elongation rates are interconnected, this modification would not change the results of the simulations.  

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      (1) The discussion from lines 180 to 182 appears consistent with Figure 1E. It seems that the two-state model can already explain why the genes with high burst frequency and high protein synthesis rate showed a small protein noise. It is unclear to me the purpose of this discussion.

      Yes, the results from Fig. 1E were from stochastic simulations, whereas the results discussed in the lines 191 to 193 (in the revised manuscript) were based on our analysis of experimental data that is shown in Fig. 2D.

      (2)  If I understand correctly, "translational efficiency" is the same as "protein synthesis rate" in this work. It would be helpful if the authors could keep the same notation throughout the paper to avoid confusion.

      The protein synthesis rate per mRNA molecule is the best measure of translational efficiency, and we used the experimental data from Riba et al. (2019) for this purpose (lines 99-100). Alternatively, we also used the tRNA Adaptation Index (tAI) as a measure of translational efficiency, as codon choice also influences the rate at which an mRNA molecule is translated (Tuller et al., 2010) (line 192). 

      (3) On line 227, does "higher translation rate" mean "higher translation initiation rate"? The same issues happen in a few places in this paper.

      Corrected now (line 243 in the revised manuscript and throughout the manuscript). 

      (4) The discussion from lines 296 to 301 is unclear. It is not obvious to me how the authors obtained the conclusion that lowering translational efficiency would decrease the protein expression noise.

      High translational efficiency will require more ribosomes and hence, will increase ribosome demand. If ribosome demand is the molecular basis of high expression noise for genes with bursty transcription and high translational efficiency, then we can expect a reduction in ribosome demand and a reduction in noise if we lower the translational efficiency. We have rephrased this section for clarity between the lines 334 and 339 in the revised manuscript.   

      (5)  On line 324, should slower translation mean a shorter distance between neighboring ribosomes? One can imagine the extreme limit in which ribosomes move very slowly so that the mRNA is fully packed with ribosomes. 

      Slower translation or ribosome traversal rate would also lower the translation initiation rate (Barrington et al., 2023). Slower traversal of ribosomes reduces the chances of collision in case of transient slow-down of ribosomes due to occurrence of one or more non-preferred codons. We have now clarified this part in the lines 360 to 369 in the revised manuscript.

      (6) The text from lines 423 to 433 can be put in Methods.

      We have added this part to the Methods section (lines 900 to 910) and have now shortened this discussion in the Results section. 

      (7)  The discussion from lines 128 to 130 is unclear, and the statement appears to be consistent with the two-state model (see Figure 1E). The meaning of "initial mRNA numbers" is also unclear.

      An earlier study proposed that essential genes in yeast employ a high-transcription, low-translation strategy for expression, likely to maintain low expression noise in these genes and to prevent the detrimental effects of high expression noise (Fraser et al., 2004). However, there has been no direct supporting evidence. We therefore used stochastic simulations to test whether differences in mRNA levels and translational efficiency among genes can lead to differences in protein noise. The discussion between lines 130 and 132 in the revised manuscript summarises the results of these simulations. 

      "Initial mRNA numbers" refers to the mRNA copy numbers present in the cell at the start of the stochastic simulations. For clarity, we have now changed this term to "mRNA levels" in the revised manuscript (line 131).

      (8)  On line 212, is the translation initiation rate TL_init the same thing as beta_p in Figure 3A?

      βp refers to the rate of protein synthesis, which is influenced by the translational burst kinetics as well as the translation initiation rate, whereas TLinit refers to the translation initiation rate. So, these parameters are related, but are not the same.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, Floeder et al. report that dopamine ramps in both Pavlovian and instrumental conditions are shaped by reward interval statistics. Dopamine ramps are an interesting phenomenon because at first glance they do not represent the classical reward prediction errors associated with dopamine signaling. Instead, they seem to bridge the gap between tonic and phasic dopamine, and there is still intense discussion in the field about what their actual behavioral role is. Here, in tests with head-fixed mice, with dopamine recorded using a genetically encoded fluorescent sensor in the nucleus accumbens, the authors find that dopamine ramps were only present when intertrial intervals were relatively short and the structure of the task (Pavlovian cue or progression in a VR corridor) contained elements that indicated progression towards the reward (e.g., a dynamic cue). The authors show that these findings are well explained by their previously published model of Adjusted Net Contingency of Causal Relation (ANCCR).

      Strengths:

      This descriptive study delineates some fundamental parameters that define dopamine ramps in the studied conditions. The short, objective, and to-the-point format of the manuscript is great and really does a service to potential readers. The authors are very careful with the scope of their conclusions, which is appreciated by this reviewer.

      We thank the reviewer for their overall support of the formatting and scope of the manuscript. 

      Weaknesses:

      The discussion of the results is largely limited to the conceptual framework of the authors' preferred model (which the authors do recognize, but it is still a limitation). The correlation analysis presented in panel l of Figure 3 seems unnecessary at best and could be misleading, as it is driven by the categorical differences between the two conditions that were grouped for this analysis. Some key aspects of the data, their relationship with each other, with the previous literature, and with the methods used to collect them, could have been better discussed and explored.

      We agree with the reviewer that a weakness of the discussion was the limited framing of the results within the ANCCR model. To address this, we have expanded our introduction and discussion sections to provide a more thorough explanation of our model and possible leading alternatives.

      We thank the reviewer for pointing out that Figure 3l may be misleading for readers; we removed this panel from the revised Figure 4.

      We have further addressed the specific concerns raised by the reviewer in their comments to the authors. Indeed, we agree with the reviewer that the original manuscript was narrow in its focus regarding relationships between different aspects of the data. To more thoroughly explore how key variables – including dopamine ramp slope and onset response as well as licking behavior slope – could relate to each other, we have added Extended Data Figure 8. In this figure, we show that no correlations exist between any of these key variables in either dynamic tone condition; it is our hope that this additional analysis highlights the significance of the clear relationship between dopamine ramp slope and ITI duration. 

      Reviewer #2 (Public Review):

      In this manuscript by Floeder et al., the authors report a correlation between ITI duration and the strength of a dopamine ramp occurring between a predictive conditioned stimulus and a subsequent reward. They found this relationship in two different tasks with mice: a Pavlovian task and an instrumental virtual visual navigation task. Additionally, they observed this relationship only when using a dynamic predictive stimulus. The authors relate this finding to their previously published model ANCCR, in which the time constant of the eligibility trace is proportional to the reward rate within the task.

      The relationship that the authors report between ITI duration and the extent of a dopamine ramp is very intriguing and certainly provides an important constraint for models of dopamine function. As such, these findings are potentially highly impactful to the field. I do have a few questions for the authors, which are written below.

      We thank the reviewer for their interest in our findings and belief in their potential to be impactful in the field. 

      (1) I was surprised to see a lack of counterbalancing within the Pavlovian design for the order of the long vs. short ITI conditions. Ramping of the lick rate does increase from the long-duration ITI sessions to the short-duration ITI sessions. Although this increase in lick ramping across the two conditions is of course not necessarily a function of learning, it does not argue against the possibility that the timing of the dynamic CS had not reached asymptotic learning by the end of the long-duration ITI sessions. The authors do reference papers in which overtraining tends to reduce ramping, which would argue against this possibility, yet differential learning of the dynamic CS would presumably be required to observe this effect. Do the authors have any evidence that the effect is not due to heightened learning of the timing of the dynamic CS across the experiment?

      We appreciate the reviewer expressing their surprise regarding the lack of counterbalancing in our Pavlovian experimental design. We did not originally counterbalance because the ramps disappeared in the short ITI/fixed tone condition, indicating that their presence is not simply a matter of total experience in the task. However, we agree that this is indirect rather than direct evidence. To address this drawback, we repeated the Pavlovian experiment in a new cohort of animals with a revised training order, in which the short ITI/dynamic tone (SD) condition preceded the long ITI/dynamic tone (LD) condition (see revised Figure 2a). Despite this change in training order, the main findings remain consistent: positive dLight slopes (i.e., dopamine ramps) are only observed in the SD condition (Figure 2b-d).

      We thank the reviewer for raising these questions regarding licking behavior and learning and their relationship with dopamine ramps. Indeed, a closer look at the average licking behavior reveals subtle differences across conditions (Figure 1f and Extended Data Figure 5a). While the average lick rate during the ramp window does not differ across conditions (Extended Data Figure 5c), the ramping of the lick rate during this window is higher for dynamic tone conditions compared to fixed tone conditions (Extended Data Figure 5d). Despite these differences, we still believe that the main comparison between the dopamine slope in the SD vs LD condition remains valid given their similar lick ramping slopes. Furthermore, our primary measure of learning is not lick slope, but anticipatory lick rate during the 1 s trace preceding reward delivery, which is robustly nonzero across cohorts and conditions (Figure 1g and Extended Data Figure 5b). 

      Taken together, we hope that the results from our counterbalanced Pavlovian training and more rigorous analysis of lick behavior across conditions provide sufficient evidence to assuage concerns that the differences in ramping dopamine simply reflect differences in learning. 

      (2) The dopamine response, as measured by dLight, seems to drop after the reward is delivered. This reduction in responding also tends to be observed with electrophysiological recordings of dopamine neurons. It seems possible that during the short ITI sessions, particularly on the shorter ITI trials, dopamine levels may still be reduced from the previous trial at the onset of the CS on the subsequent trial. Perhaps the authors could examine the dynamics of the recovery of the dopamine response following reward delivery on longer-duration ITIs in order to determine how quickly dopamine recovers. Are the trials with very short ITIs occurring within the period in which dopamine is still recovering from the previous trial? If so, how much of the effect may be due to this? It should be noted that the absence of a ramp in the condition with short-duration ITIs and fixed CSs provides a potential control for this effect, yet the extent to which a natural ramp might occur following sucrose deliveries should be investigated.

      We thank the reviewer for highlighting the possibility that ramps may be due to the dopamine response recovering after reward delivery. Because peak reward dopamine responses tend to be larger in long ITI conditions, however, we felt it was inappropriate to compare post-reward dopamine recovery times across conditions. Instead, we directly compared the dLight slope in the 2 s before cue onset ("pre-cue window," a proxy for recovery from the previous trial) with the dLight slope during our ramp window from 3 to 8 s after cue onset (Extended Data Figure 6a). There were no significant differences in pre-cue dLight slope across conditions (Extended Data Figure 6b); this suggests that the ramping slopes seen in the SD condition, but not in other conditions, are not simply due to the natural recovery of dopamine following reward delivery. Furthermore, if the dopamine ramps observed in the SD condition were a continuation of post-reward dopamine recovery from the previous trial, we would expect a positive correlation between the dLight slopes before and during the cue. However, there is no such correlation between the dLight slopes in the ramp window and the pre-cue window in the SD condition (Extended Data Figure 6c-d). We believe that this observation, along with the built-in control of the SF condition mentioned by the reviewer, serves as evidence against the possibility that our ramp results reflect a natural ramp after reward delivery.

      (3) The authors primarily relate the finding of the correlation between the ITI and the slope of the ramp to their ANCCR model by suggesting that shorter time constants of the eligibility trace will result in more precisely timed predictors of reward across discrete periods of the dynamic cue. Based on this prediction, would the change in slope be more gradual, and perhaps be more correlated with a broader cumulative estimate of reward rate than just a single trial?

      To clarify, we do not propose that a smaller eligibility trace time constant results in more precise timing per se. Instead, we believe that the rapid eligibility trace decay from smaller time constants gives greater causal predictive power for later periods in the dynamic cue (see Extended Data Figure 1) since the memory of the earlier periods of the cue is weaker. 

      We appreciate the reviewer's curiosity regarding the influence of a broader cumulative estimate of reward rate, as opposed to only the immediately preceding ITI, on dopamine ramp slopes. Indeed, in several instrumental tasks (e.g., Krausz et al., Neuron, 2023), recent reward rate modulates the magnitude of dopamine ramps, making this an important variable to investigate. We used linear regression, fit separately for each mouse, to analyze the relationship between the per-trial dopamine slope and the average previous ITI over the 1 through 10 most recent trials. In the SD condition, as reported in our earlier manuscript, there was a significantly negative dependence of the per-trial dopamine slope on the single previous ITI (i.e., if the previous ITI was long, the next trial tended to have a weaker ramp). This negative dependence, however, held only for a single previous trial; there was no clear relationship between the per-trial dopamine slope and the average of the past 2 through 10 ITIs (Extended Data Figure 7a). In the LD condition, by contrast, there was no clear relationship between the per-trial dopamine slope and the average previous ITI for any of the past 1 through 10 trials, with one exception: a significantly negative dependence of the per-trial dopamine slope on the average ITI of the previous 2 trials (Extended Data Figure 7b). This longer-timescale relationship in the LD condition suggests that the adaptation of the eligibility trace time constant is nuanced and depends on the general ITI length.

      In general, although we reason that the eligibility trace time constant should depend on overall event rates, we do not currently propose a real-time rule for updating it based on recent event rates. Accordingly, we are currently agnostic about the timescale over which recent event rates are integrated to set the eligibility trace time constant. Our experimental results suggest that when the ITI is generally short in Pavlovian conditioning, the eligibility trace time constant adapts to the ITI on a rapid timescale. However, recent ITI history captures only a small fraction of the variability in this rapid fluctuation. A more thorough investigation of this real-time update rule will require future work.

      Reviewer #3 (Public Review):

      Summary:

      Floeder and colleagues measure dopamine signaling in the nucleus accumbens core using fiber photometry of the dLight sensor, in Pavlovian and instrumental tasks in mice. They test some predictions from a recently proposed model (ANCCR) regarding the existence of "ramps" in dopamine that have been seen in some previous research, the characteristics of which remain poorly understood.

      They find that cues signaling a progression toward rewards (akin to a countdown) specifically promote ramping dopamine signaling in the nucleus accumbens core, but only when the intertrial interval just experienced was short. This work is discussed in the context of ongoing theoretical conceptions of dopamine's role in learning.

      Strengths:

      This work is the clearest demonstration to date of concrete training factors that seem to directly impact whether or not dopamine ramps occur. The existence of ramping signals has long been a feature of debates in the dopamine literature and this work adds important context to that. Further, as a practical assessment of the impact of a relatively simple trial structure manipulation on dopamine patterns, this work will be important for guiding future studies. These studies are well done and thoughtfully presented.

      We thank the reviewer for recognizing the context that our study adds to the dopamine literature and the potential for our experiments to guide future work. 

      Weaknesses:

      It remains somewhat unclear what limits are in place on the extent to which an eligibility trace is reflected in dopamine signals. In the current study, a specific set of ITIs was used, and one wonders if the relative comparison of ITI/history variables ("shorter" or "longer") is a factor in how the dopamine signal emerges, in addition to the explicit length ("short" or "long") of the ITI. Another experimental condition, where variable ITIs were intermingled, could perhaps help clarify some remaining questions.

      Though we used ITIs of fixed means, due to the exponential nature of their distribution, we did intermingle ITIs of various durations in both our long and short ITI conditions. The distribution of ITI durations is visualized in Figure 1c for Pavlovian conditioning and Extended Data Figure 9b for VR navigation. 

      We had not originally explored the relative comparison between consecutive ITIs, so we thank the reviewer for raising how it might affect the dopamine signal. To investigate this, we quantified both the change in ITI (+ or - Δ ITI for relatively longer or shorter, respectively) and the change in dopamine ramp slope between consecutive trials in the SD condition (Figure 3d). For each mouse separately, we found a significantly negative relationship between Δ slope and Δ ITI (Figure 3e-f). In addition, the average Δ slope was significantly greater for consecutive trials with a Δ ITI below -1 s than for trials with a Δ ITI above +1 s (Figure 3g). Altogether, these findings suggest that the relative comparison of ITIs does correlate with changes in the dopamine signal: a relatively longer ITI tends to be followed by a weaker ramp, which fits nicely with the inverse relationship between ITI and dopamine ramp slope expected from our ANCCR model.

      In both tasks, cue onset responses are larger and longer on long ITI trials. One concern is that this larger signal makes a ramp during the cue-reward interval harder to see, especially with a fluorescence method like photometry. In the traces in Figure 1i, the dopamine trace in the long, dynamic cue condition has not returned to baseline at the onset of the "ramp" window, whereas the short dynamic trace has. So one wonders whether the overall return-to-baseline trend in the long dynamic conditions might wash out a ramp.

      This is a good point, and we thank the reviewer for raising it. Certainly, the cue onset response is significantly larger in long ITI conditions (see Figure 1i-j and Figure 4h-j). To avoid any bleed-over effect, we intentionally chose ramp windows during later portions of the trial (in line with work from others, e.g., Kim et al., Cell, 2020). While the cue onset dopamine pulse seems to have flattened out by the start of the ramp window, dopamine levels clearly remain elevated relative to the pre-cue baseline. This type of signal has been observed with fiber photometry in other Pavlovian conditioning paradigms with long cue durations (e.g., Jeong et al., Science, 2022). Because dopamine levels remain persistently elevated, it is certainly possible that a ramping signal during the cue is being washed out; with the bulk fluorescence photometry technique we employed in this study, this possibility is unfortunately difficult to rule out completely. However, the long ITI/fixed tone (LF) condition could serve as a potential control given the overall similarity of the dopamine signal between the LF and LD conditions: both have large cue onset responses with elevated dopamine throughout the cue (see Extended Data Figures 2c and 3c). Critically, the LD condition lacks a noticeable ramp even though the dynamic tone provides information about temporal proximity to reward, which is thought to be necessary for dopamine ramps to occur. Importantly, regardless of whether a ramp is masked in the long ITI dynamic condition, most studies investigate such a condition in isolation and would report the absence of dopamine ramps. Thus, at a descriptive level, we believe it remains true that observable dopamine ramps are only present when the ITI is short.

      Not a weakness of this study, but the current results certainly make one ponder the potential function of cue-reward interval ramps in dopamine (assuming there is a determinable function). In the current data, licking behavior was similar on different trial types, and that is described as specifically not explaining ramp activity.

      We agree that this work naturally raises the question of the function of dopamine ramps. However, selective and precise manipulation of only the dopamine ramps without altering other features such as phasic responses, or inducing dopamine dips, is highly technically challenging at this moment; due to this challenge, we intentionally focused on the conditions that determine the presence or absence of dopamine ramps rather than their function. We agree with the reviewer that studying the specific function of dopamine ramps is an interesting future question. 

      Reviewing Editor:

      The reviewers felt that the results are of considerable and broad interest to the neuroscience community, but that the framing in terms of ANCCR undermined the scope of the findings, as did the brief format of the manuscript. In addition, the reviewers felt that the relationship between ramp dynamics, behavior, and ITI conditions requires more in-depth analysis. Relatedly, the lack of counterbalancing of the ITI durations was considered a drawback that needs to be addressed, as it may affect the baseline. Addressing these issues satisfactorily would improve the assessment of the manuscript to important/convincing.

      We truly appreciate the valuable feedback provided on this manuscript by all three reviewers and the reviewing editor. Based on this input, we have significantly revised the manuscript to address the issues raised by the reviewers. First, we conducted additional experiments to counterbalance the ITI conditions for Pavlovian conditioning; these strengthened our results by confirming our original finding that ITI duration, rather than training order, is the key variable controlling the presence or absence of dopamine ramps. Second, we completed more rigorous analyses to further explore the relationship between dopamine dynamics, animal behavior, and ITI duration; we generally found no significant correlations between these variables, the notable exception being our main finding of a relationship between ITI duration and dopamine ramp slope. Finally, we revised and expanded our writing both to explain the predictions of our ANCCR model in less technical language and to explore how alternative theoretical frameworks could potentially explain our findings. In doing so, we hope that our manuscript is now more accessible and of interest to a broad audience of neuroscience readers.

      Reviewer #1 (Recommendations For The Authors):

      The study could be improved if the authors performed a more detailed comparison of how theoretical frameworks other than ANCCR could account for the observed findings. Also, the correlation analysis presented in panel l of Figure 3 seems unnecessary and potentially spurious, as the slope of the correlation is clearly driven mostly by the categorical differences between the two ITI conditions, which were combined for the analysis. It is not clear what value this analysis adds beyond the group comparison presented in the following panel.

      Again, we thank the reviewer for elaborating on their concern regarding Figure 3l – we have removed it from the revised Figure 4. 

      The relationship of ramp dynamics to behavior, and the large differences in cue onset responses between short and long ITI conditions, could have been better explored. If I understand the overarching proposal of this and other publications by this group correctly, the differences in cue responses are determined by the spacing of rewards in a way somewhat similar to the ramps. So, is there a trial-by-trial correlation between the amplitude of the cue responses and the slope of the ramps? Is either of these two measures correlated with licking behavior, and if so, does this change with the ITI condition? A more thorough exploration of these relationships would help support the proposal of the primacy of inter-event spacing in determining the different types of dopamine responses in learning.

      There are certainly interesting relationships between dopamine dynamics, behavior, and ITI that we failed to explore in our original manuscript – we appreciate the reviewer bringing them up. We found no correlation between dopamine ramp slope and cue onset response in either the SD or LD condition (Extended Data Fig 8a-b). Moreover, we found no correlation between either of these variables and the trial-by-trial licking behavior (Extended Data Fig 8c-f). Finally, there is no relationship between licking behavior and previous ITI duration (Extended Data Fig 8g-h), suggesting that behavioral differences do not account for differences in the dopamine ramp slope. Together, the lack of significant relationships between these other variables highlights the specific, clear relationship between ITI duration and dopamine ramp slope. 

      Finally, another issue I feel could have been better discussed is how the particular settings of both tasks might be biasing the results. For example, it is worth considering how the dopamine ramp dynamics reported here, especially the requirement of a dynamic cue for ramps to be present, square with the previously published results of one of the authors (Mohebi et al., Nature, 2019). In that manuscript, rats were executing a bandit task where, to this reviewer's understanding, there was no explicit dynamic cue aside from the standard sensory feedback of the rats moving around in the behavior boxes to approach a nose poke port. Is the idea that this sensory feedback could function as a dynamic cue? If that's the case, then this short-scale, movement-related feedback should also function as a dynamic cue in a freely moving Pavlovian condition, when the animals must also move towards a reward delivery port, right? Therefore, could it be that the experimental "requirement" of a dynamic cue applies only in a head-fixed condition? One could phrase this differently to steelman and potentially further the authors' proposal: perhaps in any slightly more naturalistic setting, the interaction of the animals with their environment always functions as a dynamic cue indicating proximity to reward, and this relationship was experimentally isolated by the use of head fixation (but not explicitly compared with a freely moving condition) in the present study. I think that would be an interesting alternative to consider and discuss, and perhaps explore experimentally at some point.

      We thank the reviewer for raising this important point regarding the influence of our experimental settings on our results. At first glance, it could appear that our results demonstrating the necessity of a dynamic cue for ramps in a head-fixed setting do not fit neatly with other results in a freely moving setup (e.g., Collins et al., Scientific Reports, 2016; Mohebi et al., Nature, 2019). Exactly as the reviewer states though, we believe that sensory feedback from the environment in freely moving preparations serves the same function as a dynamic progression of cues. We have considered the implications of methodological differences between head-fixed and freely moving preparations in the discussion section. 

      Reviewer #2 (Recommendations For The Authors):

      This comment relates indirectly to comment 3, in that the authors intermix theory throughout the manuscript. I think this would be fine if the experiment was framed directly in terms of ANCCR, but the authors specifically mention that this experiment wasn't developed to distinguish between different theories. As such, it seems difficult to assess the scope of the comments regarding theory within the paper because they tend to be specifically related to ANCCR. For instance, the last comment has broad implications of how the ramp might be related to the overall reward rate, an interesting finding that constrains classes of dopamine models rather than evidence just for ANCCR. Perhaps adding a discussion section that allows the authors to focus more on theory would be beneficial for this manuscript.

      We appreciate this suggestion by the reviewer. We have updated both our introduction and discussion sections to elaborate more thoroughly on theory.

      Reviewer #3 (Recommendations For The Authors):

      The paper could potentially benefit from the use of more accessible language to describe the conceptual basis of the work, and the predictions, and a bit of reformatting away from the brief structure with lots of supplemental discussion.

      For example, in the introduction, the line - "Varying the ITI was critical because our theory predicts that the ITI is a variable controlling the eligibility trace time constant, such that a short ITI would produce a small time constant relative to the cue-reward interval (Supplementary Note 1)". As far as I can tell, this is meant to get across the notion that dopamine represents some aspect of the time between rewards - dopamine signals will differ for cues following short vs long intervals between rewards.

      As written, the language of the paper takes a fair bit of parsing, but the notions are actually pretty simple. This is partly due to the brief format the paper is written in, where familiarity with the previous papers describing ANCCR is assumed.

      From a readability standpoint, and the potential impact of the paper on a broad audience, perhaps this could be considered as a point for revision.

      We thank the reviewer for pointing out the drawbacks of our technical language and brief formatting. To address this, we have removed the majority of the supplementary notes and expanded our introduction and discussion sections. In doing so, we hope that the conceptual foundations of this work, and potential alternative theoretical explanations, are accessible and impactful for a broad audience of readers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      Early and accurate diagnosis is critical to treating N. fowleri infections, which often lead to death within 2 weeks of exposure. Current methods, which rely on sampling cerebrospinal fluid, are invasive, slow, and sometimes unreliable. Therefore, there is a need for a new diagnostic method. Russell et al. address this need by identifying small RNAs secreted by Naegleria fowleri (Figure 1) that are detectable by RT-qPCR in multiple biological fluids, including blood and urine. smallRNA-1 and smallRNA-2 were detectable in plasma samples of mice experimentally infected with 6 different N. fowleri strains, and were not detected in uninfected mouse or human samples (Figure 4). Further, smallRNA-1 is detectable in the urine of experimentally infected mice as early as 24 hours post-infection (Figure 5). The study culminates with testing human samples (obtained from the CDC) from patients with confirmed N. fowleri infections; smallRNA-1 was detectable in cerebrospinal fluid in 6 out of 6 samples (Figure 6B), and in whole blood in 2 out of 2 samples (Figure 6C). These results suggest that smallRNA-1 could be a valuable diagnostic marker for N. fowleri infection, detectable in cerebrospinal fluid, blood, or potentially urine.

      Strengths: 

      This study investigates an important problem and arrives at a potential solution: a new diagnostic test for N. fowleri infection that is fast, less invasive than current methods, and seemingly robust across multiple N. fowleri strains. The work in mice convincingly shows that smallRNA-1 is detectable in blood and urine early in infection. Analysis of patient blood samples suggests that whole blood (but not plasma) could be tested for smallRNA-1 to diagnose N. fowleri infections.

      Thank you for the comments regarding the strengths of this study. We agree that our data on detecting the biomarker in biofluids from mice are convincing. In addition, our spike-in studies with human cerebrospinal fluid, plasma, and urine (Figure 6) suggest that these biofluids from humans could be used for diagnosis.

      We appreciate the comment regarding plasma and recognize this was not fully explained in the manuscript. We do believe that plasma can be used to assess the biomarker. First, we demonstrated equivalent sensitivity of the method to detect smallRNA-1 in plasma and urine in mice with end-stage PAM (Figure 5). In addition, spike-in samples of human plasma, cerebrospinal fluid, and urine demonstrated equivalent sensitivity of detecting the biomarker (Figure 6). 

      The negative result for human plasma in Figure 6C requires clarification; this sample was convalescent plasma from a survivor. The patient presented to the hospital on August 7, 2016, was treated, made a remarkable recovery, and was released from the hospital later that month. The plasma sample in Figure 6C was collected September 7, 2016, which is a month after treatment was initiated and weeks after the patient was symptom free. Our interpretation of the convalescent plasma result is the patient had cleared the active amoeba infection and that is why we did not detect the biomarker. We have added text in the discussion and in the legend for Figure 6 to clarify the convalescent plasma result. 

      One additional caveat for consideration is that many of the samples we received from amoebae-infected humans were stored at room temperature for undefined periods of time before being moved to <-20°C (see details in Table S9). We cannot rule out possible sample degradation, but this is an unfortunate reality of obtaining human samples from individuals later confirmed to be infected with pathogenic free-living amoebae.

      Weaknesses: 

      (1) There are not many N. fowleri cases, so the authors were limited in the human samples available for testing. It is difficult to know how robust this biomarker is in whole blood (only 2 samples were tested, both had detectable smallRNA-1), serum (1 out of 1 sample tested negative), or human urine (presumably there is no material available for testing). This limitation is openly discussed in the last paragraph of the discussion section. 

      We agree the extremely limited availability of human samples is a limitation of this study. Given the rarity of these infections in the United States, even prospective studies to systematically collect samples would be very challenging. We hope that by publishing the details of this biomarker detection method, it can be used by diagnostic reference centers, especially in areas where outbreaks of multiple cases per year have been reported.

      (2) There seems to be some noise in the data for uninfected samples (Figures 4B-C, 5B, and 6C), especially for those with serum (2E). While this is often orders of magnitude lower than the positive results, it does raise questions about false positives, especially early in infection when diagnosis would be the most useful. A few additional uninfected human samples may be helpful. 

      We agree; however, we would like to point out that the progression of disease in humans and mice is similar. Typically, patients survive 10-14 days after presumed exposure, and mice have similar survival times following instillation of N. fowleri amoebae into a naris. Therefore, detection of this biomarker as early as 72 h in mice is roughly equivalent to the onset of initial symptoms in humans.

      Reviewer #2 (Public review): 

      Summary: 

      The authors sought to develop a rapid and non-invasive diagnostic method for primary amoebic meningoencephalitis (PAM), a highly fatal disease caused by Naegleria fowleri. Due to the challenges of early diagnosis, they investigated extracellular vesicles (EVs) from N. fowleri, identifying small RNA biomarkers. They developed an RT-qPCR assay to detect these biomarkers in various biofluids. 

      Strengths: 

      (1) This study has a clear methodological approach, which allows for the reproducibility of the experiments. 

      (2) Early and Non-Invasive Diagnosis - The identification of a small RNA biomarker that can be detected in urine, plasma, and cerebrospinal fluid (CSF) provides a non-invasive diagnostic approach, which is crucial for improving early detection of PAM. 

      (3) High Sensitivity and Rapid Detection - The RT-qPCR assay developed in the study is highly sensitive, detecting the biomarker in 100% of CSF samples from human PAM cases and in mouse urine as early as 24 hours post-infection. Additionally, the test can be completed in ~3 hours, making it feasible for clinical use. 

      (4) Potential for Disease Monitoring - Since the biomarker is detectable throughout the course of infection, it could be used not only for early diagnosis but also for tracking disease progression and monitoring treatment efficacy. 

      (5) Strong Experimental Validation - The study demonstrates biomarker detection across multiple sample types (CSF, urine, whole blood, plasma) in both animal models and human cases, providing robust evidence for its clinical relevance. 

      (6) Addresses a Critical Unmet Need - With a >97% case fatality rate, PAM urgently requires improved diagnostics. This study provides one of the first viable liquid biopsy-based diagnostic approaches, potentially transforming how PAM is detected and managed. 

      Thank you for summarizing the strengths of the study.

      Weaknesses: 

      (1) Limited Human Sample Size - While the biomarker was detected in 100% of CSF samples from human PAM cases, the number of human samples analyzed (n=6 for CSF) is relatively small. A larger cohort is needed to validate its diagnostic reliability across diverse populations. 

      As noted in response to Reviewer #1 above, we agree this is a limitation of the study; however, we were fortunate to obtain even 15 µL samples of cerebrospinal fluid, plasma, serum, or whole blood from as many patients as we did. There is an urgent need for more systematic collection and storage of samples for rare diseases like primary amoebic meningoencephalitis so that advancements in diagnostics and biomarker discovery can be made. It is our sincere hope that by publishing our detailed methods and experimental results in this manuscript, additional hospitals and research centers can replicate our studies and help advance this or other techniques for early diagnosis of PAM.

      (2) Lack of Pre-Symptomatic or Early-Stage Human Data - Although the biomarker was detected in mouse urine as early as 24 hours post-infection, there is no data on whether it can be reliably detected before symptoms appear in humans, which is crucial for early diagnosis and treatment initiation. 

      It is difficult to envision a method to obtain these biofluids from infected humans prior to onset of symptoms. More likely the best we can hope for is that physicians include primary amoebic meningoencephalitis in their assessment of patients that present with prodromal symptoms of meningitis.

      (3) Plasma Detection Challenges - While the biomarker was detected in whole blood, it was not detected in human plasma, which could limit the ease of clinical implementation since plasma-based diagnostics are more common. Further investigation is needed to understand why it is absent in plasma and whether alternative blood-based approaches (e.g., whole blood assays) could be optimized. 

      See response to Reviewer #1 above.

      Reviewer #1 (Recommendations for the authors): 

      (1) What is the evidence that these small RNAs are secreted specifically in EVs? I believe that they are, and ultimately it doesn't impact the conclusions, but I think the evidence here could be either stronger or presented in a more obvious way. 

      Our data demonstrate that smallRNA-1 is present in N. fowleri-derived EVs (Figure 2 and Supplemental Figure 7) and in the intact amoebae (Figure 3B). Initial sequencing data to identify these smallRNA biomarkers came from PEG-precipitated EVs (Figure S1), using methods we previously published (22). The PEG-precipitated EVs were extracted specifically for spike-in studies. Finally, the smallRNAs in EVs were confirmed after extraction of EVs from 7 N. fowleri strains (Figure 2). We do not have evidence that they are secreted outside of EVs.

      (2) The figure legends would be more useful with some additional information. For example: why are there two points for Nf69 in Fig 2B? In Figure 3A-B, please add more detail as to what the graphs are showing (are they histograms binned by a number of amoebae? This does not seem obvious to me). 

      We agree the Figure legends should be edited for clarity and to add additional information. Both Figure legends have been updated.

      In Figure 2B, each point represents the mean of three technical replicates of EV preps for each N. fowleri strain.

      In Figure 3, each point indicates the copy number per µL (copies/µL) measured in one well of a 96-well plate. The histograms show the mean of these observations for each condition. 

      (3) In Figure 2E, the FBS seems like it has near detectable levels of smallRNA-1 compared to Ac and Bm (albeit N. fowleri has 4 orders of magnitude higher levels than the FBS). Because cows are likely exposed to N. fowleri and have documented infections (e.g. doi: 10.1016/j.rvsc.2012.01.002), is it possible this signal is real? 

      Thank you for making this interesting observation. We agree that cows are likely to have significant exposure to N. fowleri, yet documented infections are rare. In this case, we do not believe the near detectable levels of smallRNA-1 in FBS were due to an infected donor animal. This noise was likely due to extracting RNA from concentrated FBS rather than FBS diluted in cell culture media. In addition, as shown in Supplemental Figure 4, the qPCR product from EVs extracted from FBS was not the same as that from the N. fowleri-derived EVs. Please note we used a PEG extraction reagent that separates lipid particles, so this is additional evidence the smallRNAs are present in EVs.

      (4) In Figure 6A, why was the sample size greater for water and unspiked urine? Similarly, why is the number of infected mice so variable in Figure 4B? 

      In Figure 6A we assayed de-identified biofluids provided by Advent Hospital in Orlando, Florida. The plasma and serum samples were pooled from multiple individuals; whereas, individual urine samples (n=8) were provided for this experiment. We have updated the legend for Figure 6A to include these details.

      For Figure 4B, we used plasma collected at the end-stage of disease following infections with five different strains of N. fowleri. The sample sizes varied for two reasons. First, Nf69 was the strain used most by our lab, and we had plasma from several in vivo experiments. Second, the lower sample sizes for the other strains came from an experiment with 8 mice per group. Some of these strains were less virulent, and not all mice succumbed to disease with the number of amoebae inoculated in this experiment. Thus, plasma was only collected from animals that were euthanized due to severe N. fowleri infections. In follow-up studies (e.g., Figure 5B), plasma was collected every 24 hr for analysis.

      Very minor points: 

      (1) The number of acronyms (FLA, PAM, EVs, CNS, CSF, LOD) could be reduced to make this paper more reader-friendly. 

      Acronyms that were used infrequently in the manuscript (FLA, CNS, LOD, mNGS, UC) have been edited to spell out the complete names. We kept the acronyms EVs and CSF because they are each used more than twenty times in the manuscript.

      (2) The decimal point in the Cq values is formatted strangely. 

      The decimal points have been edited to normal format in both the manuscript and supplementary material.

      (3) Figure 3C is not intuitive. I do not understand the logic for the placement of the different samples (was row A only amoebae, B only Veros, C blank, D a mix, and F more Veros?). 

      Thank you for this comment; we agree the microtiter plate schematic (Fig 3C) was misleading. We have revised Figure 3C to make the point that we tested amoebae alone, Vero cells alone, and we combined supernatants from Vero cells (alone) plus amoebae (alone) to confirm that 1) smallRNA-1 was only detected in amoeba-conditioned media, and 2) that Vero-conditioned media does not affect detection of smallRNA-1.

      Reviewer #2 (Recommendations for the authors): 

      Minor corrections: 

      The abbreviation 'Nf' for Naegleria fowleri is not appropriate in a scientific publication. According to taxonomic conventions, the correct way to abbreviate a scientific name is as follows: 

      The first mention should be written in full: Naegleria fowleri. 

      In subsequent mentions, the genus name should be abbreviated to its initial in uppercase, followed by a period, while the species name remains in lowercase: N. fowleri. 

      The same rule applies to Balamuthia mandrillaris and Acanthamoeba species, which should be abbreviated as B. mandrillaris and Acanthamoeba spp. after their first mention. 

      We agree and each of the scientific names have been updated to the proper format. Please note Nf69 is the accepted nomenclature for this N. fowleri strain, so no changes were made when referring to this specific strain.

      Temperatures should be expressed in international units (°C). Please update the temperatures reported in Fahrenheit (°F) in the 'Materials and Methods' section, specifically in the 'Animal Studies' subsection. 

      These changes were made in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      This paper summarises responses from a survey completed by around 5,000 academics on their manuscript submission behaviours. The authors find several interesting stylised facts, including (but not limited to):

      - Women are less likely to submit their papers to highly influential journals (*e.g.*, Nature, Science and PNAS).

      - Women are more likely to cite the demands of co-authors as a reason why they didn't submit to highly influential journals.

      - Women are also more likely to say that they were advised not to submit to highly influential journals.

      Recommendation

      This paper highlights an important point, namely that the submission behaviours of men and women scientists may not be the same (either due to preferences that vary by gender, selection effects that arise earlier in scientists' careers or social factors that affect men and women differently and also influence submission patterns). As a result, simply observing gender differences in acceptance rates---or a lack thereof---should not be automatically interpreted as evidence for or against discrimination (broadly defined) in the peer review process. I do, however, make a few suggestions below that the authors may (or may not) wish to address.

      We thank the referee for this comment and for the following suggestions, which we take into account in our revision of the manuscript.

      Major comments

      What do you mean by bias?

      In the second paragraph of the introduction, it is claimed that "if no biases were present in the case of peer review, then 'we should expect the rate with which members of less powerful social groups enjoy successful peer review outcomes to be proportionate to their representation in submission rates.'" There are a couple of issues with this statement.

      - First, the authors are implicitly making a normative assumption that manuscript submission and acceptance rates *should* be equalised across groups. This may very well be the case, but there can also be important reasons why not -- e.g., if men are more likely to submit their less ground-breaking work, then one might reasonably expect that they experience higher rejection rates compared to women, conditional on submission.

      We do assume that normative statement: unless we believe that men's papers are intrinsically better than women's papers, the acceptance rate should be the same. But the referee is right: we have no way of controlling for the intrinsic quality of the work of men and women. That said, our manuscript does not show that there is a different acceptance rate for men and women; it shows that women are less likely to submit papers to a subset of journals with a higher Journal Impact Factor, controlling for their most cited paper in an attempt to account for the intrinsic quality of the manuscripts.

      - Second, I assume by "bias", the authors are taking a broad definition, i.e., they are not only including factors that specifically relate to gender but also factors that are themselves independent of gender but nevertheless disproportionately are associated with one gender or another (e.g., perhaps women are more likely to write on certain topics and those topics are rated more poorly by (more prevalent) male referees; alternatively, referees may be more likely to accept articles by authors they've met before, most referees are men and men are more likely to have met a given author if he's male instead of female). If that is the case, I would define more clearly what you mean by bias. (And if that isn't the case, then I would encourage the authors to consider a broader definition of "bias"!)

      Yes, the referee is right that we are taking a broad definition of bias. We provide a definition of bias on page 3, line 92. This definition is focused on differential evaluation which leads to differential outcomes. We also hedge our conversation (e.g., page 3, line 104) to acknowledge that observations of disparities may only be an indicator of potential bias, as many other things could explain the disparity. In short, disparities are a necessary but insufficient indicator of bias. We add a line in the introduction to reinforce this. The only other reference to the term bias comes on page 10, line 276. We add a reference to Lee here to contextualize.

      Identifying policy interventions is not a major contribution of this paper

      In my opinion, the survey evidence reported here isn't really strong enough to support definitive policy interventions to address the issue and, indeed, providing policy advice is not a major -- or even minor -- contribution of your paper, so I would not mention policy interventions in the abstract. (Basically, I would hope that someone interested in policy interventions would consult another paper that much more thoughtfully and comprehensively discusses the costs and benefits of various interventions!)

      We thank the referee for this comment. While we agree that our results do not lead to definitive policy interventions, we believe that our findings point to a phenomenon that should be addressed through policy interventions. Given that some interventions are proposed in our conclusion, we feel that mentioning them in the abstract is consistent.

      Minor comments

      - What is the rationale for conditioning on academic rank and does this have explanatory power on its own---i.e., does it at least superficially potentially explain part of the gender gap in intention to submit?

      The referee is right: academic rank was added to control for the career age of researchers, with the assumption that this variable would influence submission behavior. However, the rank information we collected was for the time that the individual respondent took the survey, which could be different from the rank they held at the time of the submission behaviors mentioned in the survey. That is why we did not consider rank as an independent variable of interest. We do, however, agree with the referee that it could be related to submission behaviors in some cases. Our initial analysis shows that academic rank is not a significant predictor of whether researchers submitted to SNP, but it does contribute significantly to the SNP acceptance rates and desk rejection rates of individuals in Medical Sciences.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Basson et al. study the representation of women in "high-impact" journals through the lens of gendered submission behavior. This work is clear and thorough, and it provides new insights into gender disparities in submissions, such as that women were more likely to avoid submitting to one of these journals based on advice from a colleague/mentor. The results have broad implications for all academic communities and may help toward reducing gender disparities in "high-impact" journal submissions. I enjoyed reading this article, and I have several recommendations regarding the methodology/reporting details that could help to enhance this work.

      We thank the referee for their comments.

      Strengths:

      This is an important area of investigation that is often overlooked in the study of gender bias in publishing. Several strengths of the paper include:

      (1) A comprehensive survey of thousands of academics. It is admirable that the authors retroactively reached out to other researchers and collected an extensive amount of data.

      (2) Overall, the modeling procedures appear thorough, and many different questions are modeled.

      (3) There are interesting new results, as well as a thoughtful discussion. This work will likely spark further investigation into gender bias in submission behavior, particularly regarding the possible gendered effect of mentorship on article submission.

      Thank you for those comments.

      Weaknesses:

      (1) The GitHub page should be further clarified. A detailed description of how to run the analysis and the location of the data would be helpful. For example, although the paper says that "Aggregated and de-identified data by gender, discipline, and rank for analyses are available on GitHub," I was unable to find such data.

      We added the link to the GitHub page, as well as more details on how to run the statistical analysis. Unfortunately, our IRB approval does not allow for the sharing of the raw data.

      (2) Why is desk rejection rate defined as "the number of manuscripts that did not go out for peer review divided by the number of manuscripts rejected for each survey respondent"? For example, in your Grossman 2020 reference, it appears that manuscripts are categorized as "reviewed" or "desk-rejected" (Grossman Figure 2). If there are gender differences in the denominator, then this could affect the results.

      We thank the referee for pointing this out. The definition the referee proposes is in fact how we calculated the desk rejection rate; the definition stated in the manuscript was a mistake. We have corrected the manuscript.

      (3) Have you considered correcting for multiple comparisons? Alternatively, you could consider reporting P-values and effect sizes in the main text. Otherwise, sometimes the conclusions can be misleading. For example, in Figure 3 (and Table S28), the effect is described as significant in Social Sciences (p=0.04) but not in Medical Sciences (p=0.07).

      We highly appreciate the suggestion. We've added odds ratios and p-values to the main manuscript.

      (4) More detail about the models could be included. It may be helpful to include this in each table caption so that it is clear what all the terms of the model were. For instance, I was wondering if journal or discipline are included in the models.

      We appreciate the suggestion. We’ve added model details to the figure and table captions in the manuscript and the supplemental materials.

      Reviewer #3 (Public Review):

      Summary:

      This is a strong manuscript by Basson and colleagues which contributes to our understanding of gender disparities in scientific publishing. The authors examine attitudes and behaviors related to manuscript submission in influential journals (specifically, Science, Nature and PNAS). The authors rightly note that much attention has been paid to gender disparities in work that is already published, but this fails to capture the unseen hurdles that occur prior to publication (which include decisions about where to publish, desk rejections, revisions and resubmissions, etc.). They conducted a survey study to address some of these components and their results are interesting:

      They find that women are less likely to submit their manuscript to Science, Nature or PNAS. While both men and women feel their work would be better suited for more specialized journals, women were more likely to think their work was 'less novel or groundbreaking.'

      A smaller proportion of respondents indicated that they were actively discouraged from submitting their manuscripts to these journals. In this instance, women were more likely to receive this advice than men.

      Lastly, the authors also looked at self-reported acceptance and rejection rates and found that there were no gender differences in acceptance or rejection rates.

      These data are helpful in developing strategies to mitigate gender disparities in influential journals.

      We thank the referee for their comments.

      Comments:

      The methods the authors used are appropriate for this study. The low response rate is common for this type of recruitment strategy. The authors provide a thoughtful interpretation of their data in the Discussion.

      We thank the referee for their comments.

      Reviewer #4 (Public Review):

      This manuscript covers an important topic of gender biases in the authorship of scientific publications. Specifically, it investigates potential mechanisms behind these biases, using a solid approach, based on a survey of researchers.

      Main strengths

      The topic of the MS is very relevant given that across sciences/academia representation of genders is uneven, and identified as concerning. To change this, we need to have evidence on what mechanisms cause this pattern. Given that promotion and merit in academia are still largely based on the number of publications and impact factor, one part of the gap likely originates from differences in publication rates of women compared to men.

      Women are underrepresented compared to men in journals with high impact factors. While previous work has detected this gap, as well as some potential mechanisms, the current MS provides strong evidence, based on a survey of close to 5000 authors, that this gap might be due to lower submission rates of women compared to men, rather than to rejection rates. The data analysis is appropriate to address the main research aims. The results interestingly show that there is no gender bias in rejection rates (desk rejection or overall) in three high-impact journals (Science, Nature, PNAS). However, submission rates are lower for women compared to men, indicating that gender biases might act through this pathway. The survey also showed that women are more likely to rate their work as not groundbreaking, and to be advised not to submit to prestigious journals.

      With these results, the MS has the potential to inform actions to reduce gender bias in publishing, and actions to include other forms of measuring scientific impact and merit.

      We thank the referee for their comments.

      Main weakness and suggestions for improvement

      (1) The main message/further actions: I feel that the MS fails to sufficiently emphasise the need for a different evaluation system for researchers (and their research). While we might act to support women to submit more to high-impact journals, we could also (and several initiatives do this) consider a broader spectrum of merits (e.g. see https://coara.eu/ ). Thus, I suggest more space to discuss this route in the Discussion. Also, I would suggest replacing terms that imply that prestigious journals have better quality research or the highest scientific impact (line 40: journals of the highest scientific impact) with terms that state what we definitely know (i.e. that they have the highest impact factor). I think this could broaden the impact of the MS.

      We agree with the referee. We changed the wording on impact and added a few lines on this to the Discussion.

      (2) Methods: while methods are all sound, in places it is difficult to understand what has been done or measured. For example, only quite late (as far as I can find, it's in the supplement) we learn the type of authorship considered in the MS is the corresponding authorship. This information should be clear from the very start (including the Abstract).

      We performed the suggested edits.

      Second, I am unclear about the question on the perceived quality of research work. Was this quality defined for researchers, as quality can mean different things (e.g. how robust their set-up was, how important their research question was)? If researchers have different definitions of what quality means, this can cause additional heterogeneity in responses. Given that the survey cannot be repeated now, maybe this can be discussed as a limitation.

      We agree that this can mean something different for researchers—probably varies by discipline, but also by gender. But that was precisely the point: whether men/women considered their “best work” to be published in higher impact venue. While there may be heterogeneity in those perceptions, the fact that 1) men and women rate their research at the same level and 2) we control for disciplinary differences should mitigate some of that.

      I was surprised to see that discipline was considered as a moderator for some of the analyses but not for the main analysis on the acceptance and rejection rates.

      We appreciate the attention to detail. In our analysis of acceptance and rejection rates, we conducted separate regression analyses for each discipline to capture any field-specific patterns that might otherwise be obscured.

      We added more details on this to clarify.

      I was also surprised not to see publication charges as one of the reasons given for not submitting to selected journals. Low- and middle-income countries often have more women in science but are also less likely to support high publication charges.

      That is a good point. However, both Science and Nature have subscription options, which do not require any APCs.

      Finally, academic rank was asked of respondents but was not taken as a moderator.

      Academic rank is included in the regression as a control variable (Figure 1).

      Reviewer #2 (Recommendations For The Authors):

      In addition to the points in the "Weaknesses" section of my Public Review above, I have several suggestions to improve this work.

      (1) Can you please indicate what the error bars mean in each plot? I am assuming that they are 95% confidence intervals.

      We appreciate the attention to detail. Yes, they are 95% confidence intervals. We’ve clarified this in the captions of the corresponding figures. 

      (2) Can you provide a more detailed explanation for why the 7 journals were separated? I see that on page 3 of the supporting information you write that "Due to limited responses, analysis per journal was not always viable. The results pertaining to the journals were aggregated, with new categories based on the shared similarities in disciplinary foci of the journals and their prestige." Specifically, why did you divide the data into (somewhat arbitrary) categories as opposed to using all the data and including a journal term in your model?

      The survey covered 7 journals:

      • Science, Nature, and PNAS (S.N.P.)

      • Nature Communications and Science Advances (NC.SA.)

      • NEJM and Cell (NEJM.C.)

      We believe that the first three are a class of their own: they cover all fields (while NEJM and Cell are limited to (bio)medical sciences), and have a much higher symbolic capital than both Nature Comms and Science Advances (which are receiving cascading papers from Nature and Science, respectively). We believe that factors leading to submission to S.N.P. are much different than those leading to submission to the other groups of journals, which is why we separated the analysis in that manner.

      (3) You included random effects for linear regression but not for logistic regression. Please justify this choice or include additional logistic regression models with random effects.

      We used mixed-effects models for the linear regressions (where the number of submissions, acceptance rate, or rejection rate is the dependent variable). As mentioned in our previous response, we tested rank as a control variable and found that, in some disciplines, it potentially affected the variables we analyzed with linear regressions. We therefore introduced it as a random effect in all the linear regression models.

      Reviewer #3 (Recommendations For The Authors):

      The limitations of this work are currently described in the Supplement. It may be helpful to bring several of these items into the Discussion so that they can be addressed more prominently.

      Added content

      Reviewer #4 (Recommendations For The Authors):

      (1) Line 40: add 'as leading authors of papers published in' before 'journals'

      Done

      (2) Explain the direction of the 'relationship between' in line 62

      Added

      (3) Lines 101-102: this is a bit unclear. Please provide more information, including what these studies found.

      Added

      (4) Is 'sociodemographic' the best term in line 120?

      Yes, we believe so.

      (5) Results would benefit from a short intro with the info on the number of respondents, also by gender.

      Those are present at the end of the intro (and in the methods, at the end). We nonetheless added gender.

      (6) Line 134: add how many women and men submitted to Science, Nature, and PNAS

      Added. Across all disciplines combined, 552 women and 1,583 men had ever submitted to these three elite journals. More details can be found in SI Table 9.

      (7) Add 'Self-' before reported, line 141

      Added

      (8) Add sample sizes to Figs 1 and 2

      Those are in the appendix

      (9) Line 168: unclear whether this means ever submitting or submitting as their first choice

      We do not distinguish between the two: the question is whether they considered it at all.

      (10) Add sample size in line 177

      Added. 480 women and 1404 men across all disciplines reported desk rejections by S.N.P. journals.

      (11) I would like to see some discussion of the fact that the most highly cited paper is likely to be one that the authors submitted earlier in their careers, given that citations accumulate over time.

      Those are actually quite evenly distributed. We modified the supplementary materials.

      (12) Data availability: be clear that the supporting information contains only summary data. Also, while the Data availability statement refers to de-identified data on GitHub, the GitHub page only contains the code and the note 'The STAT code used for our analyses is shared. We are unable to share the survey response details publicly per IRB protocols.' Why were de-identified data not shared? This is extremely important to allow for the reproducibility of the manuscript's results. I would also suggest sharing data in a trusted repository (e.g. Dryad, Zenodo) rather than on GitHub, as per current recommendations on best practices for data sharing.

      Thank you for your careful reading and for highlighting the importance of clear data availability. We will revise our Data Availability Statement to explicitly state that the supporting information contains only summary data and that the complete analysis code is available on GitHub.

      We understand the importance of sharing de-identified data for reproducibility. However, our IRB strictly prohibits the sharing of any individual-level data, including de-identified files, to protect participant confidentiality. Consequently, the summary data included in the supporting information, together with the provided code, is intended to facilitate the verification of our core findings. Our previous statement regarding “de-identified” data sharing was inaccurate and thus has been removed. We apologize for the confusion.

      In light of your suggestion, we are also exploring depositing the summary data and code in a trusted repository (e.g., Dryad or Zenodo) to further align with current best practices for data sharing.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Functional lateralization between the right and left hemispheres is reported widely in animal taxa, including humans. However, it remains largely speculative whether lateralized brains confer a cognitive gain or some kind of fitness advantage. In the present study, by making use of the advantages of domestic chicks as a model, the authors succeed in revealing that the lateralized brain is advantageous for number sense, in which numerosity is associated with the spatial arrangement of items. The behavioral evidence is strong enough to support their arguments. Brain lateralization was manipulated by light exposure during the terminal phase of incubation, and the left-to-right numerical representation appeared when the distance between items gave a reliable spatial cue. Light-exposure-induced lateralization, which is quite specific to avian species, together with the lack of intense direct inter-hemispheric connections (such as the corpus callosum in the mammalian cerebrum), was critical for the successful analysis in this study. Identification of the responsible neural substrates in the presumed right hemisphere is expected in future research. The development of a comparable experimental manipulation in the mammalian brain, to address the general question of the functional significance of brain laterality, is also expected.

      We sincerely appreciate the Reviewer's insightful feedback and his/her recognition of the key contributions of our study.

      Reviewer #2 (Public review):

      Summary:

      This is the first study to show how an L-R bias in the relationship between numerical magnitude and space depends on brain lateralisation and, moreover, how it is modulated by in ovo conditions.

      Strengths:

      Novel methodology for investigating the innateness and neural basis of an L-R bias in the relationship between number and space.

      We would like to thank the Reviewer for their valuable feedback and for highlighting the key contributions of our study.

      Weaknesses:

      I would query the way the experiment was contextualised. They ask whether culture or innate pre-wiring determines the 'left-to-right orientation of the MNL [mental number line]'.

      We thank the Reviewer for raising this point, which has allowed us to provide a more detailed explanation of this aspect. Rather than framing the left-to-right orientation of the mental number line (MNL) as exclusively determined by either cultural influences or innate pre-wiring, our study highlights the role of environmental stimulation. Specifically, prenatal light exposure can shape hemispheric specialization, which in turn contributes to spatial biases in numerical processing. Please see lines 115-118.

      The term, 'Mental Number Line' is an inference from experimental tasks. One of the first experimental demonstrations of a preference or bias for small numbers in the left of space and larger numbers in the right of space, was more carefully described as the spatial-numerical association of response codes - the SNARC effect (Dehaene, S., Bossini, S., & Giraux, P. (1993). The mental representation of parity and numerical magnitude. Journal of Experimental Psychology: General, 122, 371-396).

      We have refined our description of the MNL and SNARC effect to ensure conceptual accuracy in the revised manuscript; please see lines 53-59.

      This has meant that the background to the study is confusing. First, the authors note, correctly, that many other creatures, including insects, can show this bias, though in none of these has neural lateralisation been shown to be a cause. Second, their clever experiment shows that an experimental manipulation creates the bias. If it were innate and common to other species, the experimental manipulation shouldn't matter: there would always be an L-R bias. Third, they seem to be asserting that humans have a left-to-right (L-R) MNL. This is highly contentious: in some studies, reading direction affects it, as the original study by Dehaene et al. showed, and in others, task affects direction (e.g. Bachtold, D., Baumüller, M., & Brugger, P. (1998). Stimulus-response compatibility in representational space. Neuropsychologia, 36, 731-735; not cited). Moreover, a very careful study of adult humans found no L-R bias (Karolis, V., Iuculano, T., & Butterworth, B. (2011). Mapping numerical magnitudes along the right lines: Differentiating between scale and bias. Journal of Experimental Psychology: General, 140(4), 693-706; not cited). In addition, Rugani et al. claim, incorrectly, that the L-R bias was first reported by Galton in 1880. There are two errors here: first, Galton was reporting what he called 'visualised numerals', which are now typically referred to as 'number forms' (spontaneous and habitual conscious visual representations), not an inference from a number line task. Second, Galton reported right-to-left, circular, and vertical visualised numerals, and no simple left-to-right examples (Galton, F. (1880). Visualised numerals. Nature, 21, 252-256). So, in fact, did Bertillon, J. (1880). De la vision des nombres. La Nature, 378, 196-198, and, more recently, Seron, X., Pesenti, M., Noël, M.-P., Deloche, G., & Cornet, J.-A. (1992). Images of numbers, or "When 98 is upper left and 6 sky blue". Cognition, 44, 159-196, and Tang, J., Ward, J., & Butterworth, B. (2008). Number forms in the brain. Journal of Cognitive Neuroscience, 20(9), 1547-1556.

      We sincerely appreciate the opportunity to discuss numerical spatialization in greater detail. We have clarified that an innate predisposition to spatialize numerosity does not necessarily exclude the influence of environmental stimulation and experience. We have proposed an integrative perspective, incorporating both cultural and innate factors, suggesting that numerical spatialization originates from neural foundations while remaining flexible and modifiable by experience and contextual influences. Please see lines 69–75.

      We have incorporated the Reviewer’s suggestions and cited all the recommended papers; please see lines 47–75.

      If the authors are committed to a mental number line in chicks, they should test a series of numbers showing that the bias to the left is greater for 2 and 3 than for 4, etc.

      What does all this mean? I think that the paper should be shorn of its misleading contextualisation, including the term 'Mental Number Line'. The authors also speculate, usefully, on why chicks and other species might have an L-R bias. I don't think the speculations are convincing, but if there is an evolutionary basis for the bias, it should at least be discussed.

      In the revised version of the manuscript, we have adopted the term Spatial Numerical Association (SNA) instead. We thank the Reviewer for this valuable comment.

      We appreciated the Reviewer’s suggestion regarding the evolutionary basis of lateralization and have included considerations of its relevance in chicks and other species; please see lines 143-151 and 381-386.

      This paper is very interesting with its focus on why the L-R bias exists, and where and why it does not.

      We wish to thank the Reviewer again for his/her work.

      Reviewer #1 (Public review):

      (1) The Introduction needs to be edited to make it much more concise and shorter. The hypotheses (lines 67 to 81) and predictions (lines 107 to 124) must be thoroughly rephrased, because (a) general readers are not familiar with the hypotheses (emotional valence and BAFT), (b) the hypotheses may or may not be mutually exclusive, and therefore (c) the logical linkage between the hypotheses and the predicted results is not necessarily clear. Most general readers may be confused by the apparently complicated logical constructs of this study. Instead, it is recommended that the focus be placed on the issue of the functional contributions of brain lateralization to the cognitive development of number sense.

      We thank the Reviewer for these comments, which allowed us to improve the clarity of our hypotheses and predictions. We thoroughly rephrased them to ensure they are accessible to general readers and specified that the models may or may not be mutually exclusive. Additionally, we highlighted the functional contributions of brain lateralization to the cognitive development of number sense, addressing the suggested focal point. While we have shortened the introduction, we opted to retain essential background information to ensure readers are well-informed about the relevant scientific literature. Please review the entire introduction, particularly lines 84–118 and 218.

      (2) In relation to (a) above, abbreviations need to be re-examined. MNL (mental number line) appears early, on lines 27 and 49, whereas the possibly related conceptual term SNA first appears on line 213, without being spelled out as "spatial numerical association".

      We thank the Reviewer for bringing this to our attention. We have addressed the suggestions, and the term SNA has been used specifically to refer to numerical spatialization in non-human animals. Please see lines 27-30.

      (3) By the way, what is the difference between MNL and SNA? Please specify the difference if it is important. If not, could one of the two be used consistently in this report, at least in the Introduction?

      We clarified the distinction between MNL and SNA and have consistently used SNA in this report; please see lines 47-75.

      (4) In relation to the above (a and b), clarification of the hypotheses and their abbreviations in the form of a table or a graphical representation will strongly reinforce the general readers' understanding. It is also possible that some of these hypotheses are discussed later in the Discussion, rather than in Introduction.

      We appreciated this suggestion and have now clarified the hypotheses, also providing a table/graphical representation, aiming to enhance accessibility for general readers; please see lines 110-118, and 218.

      (5) Figures 1 and 2 are transparent and easily understandable; however, the statistical details in the Results may bother the readers as the main points are doubly represented in Figures 1, 2, and Table 1. These (statistics and Table 1) may go to the supplementary file, if the editor agrees.

      We would prefer to keep Table 1 and the statistical details in the main article to provide readers with a comprehensive overview of the experimental results. However, if the editors also suggest moving them to the supplementary file, we are open to making this adjustment.

      (6) Figure 1D and E, and text lines 139-140: Figure 1D shows the chick looking monocularly with the right eye, but the text (line 139) says "left eye in use". Is this correct?

      We thank the reviewer for pointing out this incongruity. We have corrected the text to align with Figure 1D and E; please see lines 180-181.

      (7) Methods. The behavioral experiment was initiated on Wednesday (8 a.m.; line 479), but at what age? At what post-hatch day was the experiment terminated? A simple graphical illustration of the schedule will be quite helpful.

      We have added the requested details, specifying that experiments began on the third post-hatch day and ended on the fifth day; please see lines 533-539.

      Additionally, we have included a graphical illustration of the schedule to enhance clarity; please see line 666.  

      (8) Methods. How many chicks were excluded from the study in the course of Pre-training (line 525) and Training (line 535-536)? Was the exclusion rate high, or just negligible?

      We appreciate the reviewer's suggestion. We have now included the number of subjects excluded during the training phase; please see lines 593-597.

      We wish to thank the Reviewer again for his/her work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This work integrates two timepoints from the Adolescent Brain Cognitive Development (ABCD) Study to understand how neuroimaging, genetic, and environmental data contribute to the predictive power of mental health variables in predicting cognition in a large early adolescent sample. Their multimodal and multivariate prediction framework involves a novel opportunistic stacking model to handle complex types of information to predict variables that are important in understanding mental health-cognitive performance associations. 

      Strengths: 

      The authors are commended for incorporating and directly comparing the contribution of multiple imaging modalities (task fMRI, resting state fMRI, diffusion MRI, structural MRI), neurodevelopmental markers, environmental factors, and polygenic risk scores in a novel multivariate framework (via opportunistic stacking), as well as for interpreting mental health-cognition associations with latent factors derived from partial least squares. The authors also use a large, well-characterized and diverse cohort of adolescents from the ABCD Study. The paper is further strengthened by commonality analyses that characterise the shared and unique contributions of different categories of factors (e.g., neuroimaging vs mental health vs polygenic scores vs sociodemographic and adverse developmental events) in explaining variance in cognitive performance.

      Weaknesses: 

      The paper is framed with an over-reliance on the RDoC framework in the introduction, despite deviations from the RDoC framework in the methods. The field is also learning more about RDoC's limitations when mapping cognitive performance to biology. The authors also focus on a single general factor of cognition as the core outcome of interest as opposed to different domains of cognition. The authors could consider predicting mental health rather than cognition. Using mental health as a predictor could be limited by the included 9-11 year age range at baseline (where many mental health concerns are likely to be low or not well captured), as well as the nature of how the data was collected, i.e., either by self-report or from parent/caregiver report. 

      Thank you so much for your encouragement.

      We appreciate your comments on the strengths of our manuscript.

      Regarding the weaknesses, the reliance on the RDoC framework is by design. Even with its limitations, following RDoC allows us to investigate mental health holistically. In our case, RDoC enabled us to focus on a) a functional domain (i.e., cognitive ability), b) the biological units of analysis of this functional domain (i.e., neuroimaging and polygenic scores), c) potential contribution of environments, and d) the continuous individual deviation in this domain (as opposed to distinct categories). We are unaware of any framework with all these four features.

      Focusing on modelling biological units of analysis of a functional domain, as opposed to mental health per se, has some empirical support from the literature. For instance, in Marek and colleagues’ (2022) study, as mentioned by a previous reviewer, fMRI is shown to have a more robust prediction for cognitive ability than mental health. Accordingly, our reasons for predicting cognitive ability instead of mental health in this study are motivated theoretically (i.e., through RDoC) and empirically (i.e., through fMRI findings). We have clarified this reason in the introduction of the manuscript.

      We are aware of the debates surrounding the actual structure of functional domains, where the specific constructs originally proposed by RDoC might not fit the data as well as data-driven structures do (Beam et al., 2021; Quah et al., 2025). However, we consider this debate an attempt to improve the characterisation of RDoC's functional domains, not an effort to invalidate its holistic, neurobiological and basic-functioning approach. Our latent-variable modelling approach through factor analyses moves in a data-driven direction. We changed the second-to-last paragraph of the introduction to make this point clear:

      “In this study, inspired by RDoC, we a) focused on cognitive abilities as a functional domain, b) created predictive models to capture the continuous individual variation (as opposed to distinct categories) in cognitive abilities, c) computed two neurobiological units of analysis of cognitive abilities: multimodal neuroimaging and PGS, and d) investigated the potential contributions of environmental factors. To operationalise cognitive abilities, we estimated a latent variable representing behavioural performance across various cognitive tasks, commonly referred to as general cognitive ability or the g-factor (Deary, 2012). The g-factor was computed from various cognitive tasks pertinent to RDoC constructs, including attention, working memory, declarative memory, language, and cognitive control. However, using the g-factor to operationalise cognitive abilities caused this study to diverge from the original conceptualisation of RDoC, which emphasises studying separate constructs within cognitive abilities (Morris et al., 2022; Morris & Cuthbert, 2012). Recent studies suggest an improvement to the structure of functional domains by including a general factor, such as the g-factor, in the model, rather than treating each construct separately (Beam et al., 2021; Quah et al., 2025). The g-factor in children is also longitudinally stable and can forecast future health outcomes (Calvin et al., 2017; Deary et al., 2013). Notably, our previous research found that neuroimaging predicts the g-factor more accurately than predicting performance from separate individual cognitive tasks (Pat et al., 2023). Accordingly, we decided to conduct predictive models on the g-factor while keeping the RDoC’s holistic, neurobiological, and basic-functioning characteristics.”

      Reviewer #2 (Public review):

      Summary: 

      This paper by Wang et al. uses rich brain, behaviour, and genetics data from the ABCD cohort to ask how well cognitive abilities can be predicted from mental-health-related measures, and how brain and genetics influence that prediction. They obtain an out-of-sample correlation of 0.4, with neuroimaging (in particular task fMRI) proving the key mediator. Polygenic scores contributed less. 

      Strengths: 

      This paper is characterized by the intelligent use of a superb sample (ABCD) alongside strong statistical learning methods and a clear set of questions. The outcome - the moderate level of prediction between the brain, cognition, genetics, and mental health - is interesting. Particularly important is the dissection of which features best mediate that prediction and how developmental and lifestyle factors play a role. 

      Thank you so much for the encouragement. 

      Weaknesses: 

      There are relatively few weaknesses to this paper. It has already undergone review at a different journal, and the authors clearly took the original set of comments into account in revising their paper. Overall, while the ABCD sample is superb for the questions asked, it would have been highly informative to extend the analyses to datasets containing more participants with neurological/psychiatric diagnoses (e.g. HBN, POND) or extend it into adolescent/early adult onset psychopathology cohorts. But it is fair enough that the authors want to leave that for future work. 

      Thank you very much for providing this valuable comment and for your flexibility.

      For the current manuscript, we have drawn inspiration from the RDoC framework, which emphasises the variation from normal to abnormal in normative samples (Morris et al., 2022). The ABCD samples align well with this framework.

      We hope to extend this framework to include participants with neurological and psychiatric diagnoses in the future. We have begun applying neurobiological units of analysis for cognitive abilities, assessed through multimodal neuroimaging and polygenic scores (PGS), to other datasets containing more participants with neurological and psychiatric diagnoses. However, this is beyond the scope of the current manuscript. We have listed this as one of the limitations in the discussion section:

      “Similarly, our ABCD samples were young and community-based, likely limiting the severity of their psychopathological issues (Kessler et al., 2007). Future work needs to test whether the results found here generalise to adults and to participants with more severe psychopathology.”

      In terms of more practical concerns, much of the paper relies on comparing r or R2 measures between different tests. These are always presented as point estimates without uncertainty. There would be some value, I think, in incorporating uncertainty from repeated sampling to better understand the improvements/differences between the reported correlations. 

      This is a good suggestion. We have now included bootstrapped 95% confidence intervals in all of our scatter plots, showing the uncertainty of predictive performance.

      The focus on mental health in a largely normative sample leads to the predictions being largely based on the normal range. It would be interesting to subsample the data and ask how well the extremes are predicted. 

      We appreciate this comment. Similar to our response to Reviewer 2’s Weakness #1, our approach has drawn inspiration from the RDoC framework, which emphasises the variation from normal to abnormal in normative samples (Morris et al., 2022). Subsampling the data would make us deviate from our original motivation. 

      Moreover, we used 17 mental health variables in our predictive models: 8 CBCL subscales, 4 BIS/BAS subscales and 5 UPPS subscales. It is difficult to subsample on all of them at once. Perhaps a better approach is to test the applicability of our neurobiological units of analysis for cognitive abilities (multimodal neuroimaging and PGS) in other datasets that include more extreme samples. We are currently working on this line of studies and hope to report it in future work. 

      Reviewer 2’s Weakness #4

      A minor query - why are only cortical features shown in Figure 3? 

      We presented both cortical and subcortical features in Figure 3. The cortical features are shown on the surface space, while the subcortical features are displayed on the coronal plane. Below is an example of these cortical and subcortical features from the ENBack contrast. The subcortical features are presented in the far-right coronal image.

      We separated the presentation of cortical and subcortical features because the ABCD uses the CIFTI format (https://www.humanconnectome.org/software/workbenchcommand/-cifti-help). CIFTI-format images combine cortical surface (in vertices) with subcortical volume (in voxels). For task fMRI, the ABCD parcellated cortical vertices using Freesurfer’s Destrieux atlas and subcortical voxels using Freesurfer’s automatically segmented brain volume (ASEG).

      Due to the size of the images in Figure 3, it may have been difficult for Reviewer 2 to see the subcortical features clearly. We have now added zoomed-in versions of this figure as Supplementary Figures 4–13.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the abstract, could the authors mention which imaging modalities contribute most to the prediction of cognitive abilities (e.g., working memory-related task fMRI)? 

      Thank you for the suggestion. Following this advice, we now mention which imaging modalities led to the highest predictive performance. Please see the abstract below.

      “Cognitive abilities are often linked to mental health across various disorders, a pattern observed even in childhood. However, the extent to which this relationship is represented by different neurobiological units of analysis, such as multimodal neuroimaging and polygenic scores (PGS), remains unclear. 

      Using large-scale data from the Adolescent Brain Cognitive Development (ABCD) Study, we first quantified the relationship between cognitive abilities and mental health by applying multivariate models to predict cognitive abilities from mental health in children aged 9-10, finding an out-of-sample r = .36. We then applied similar multivariate models to predict cognitive abilities from multimodal neuroimaging, polygenic scores (PGS) and environmental factors. Multimodal neuroimaging was based on 45 types of brain MRI (e.g., task fMRI contrasts, resting-state fMRI, structural MRI, and diffusion tensor imaging). Among these MRI types, the fMRI contrast, 2-Back vs. 0-Back, from the ENBack task provided the highest predictive performance (r = .4). Combining information across all 45 types of brain MRI led to a predictive performance of r = .54. The PGS, based on previous genome-wide association studies on cognitive abilities, achieved a predictive performance of r = .25. Environmental factors, including socio-demographics (e.g., parent’s income and education), lifestyles (e.g., extracurricular activities, sleep) and developmental adverse events (e.g., parental use of alcohol/tobacco, pregnancy complications), led to a predictive performance of r = .49. 

      In a series of separate commonality analyses, we found that the relationship between cognitive abilities and mental health was primarily represented by multimodal neuroimaging (66%) and, to a lesser extent, by PGS (21%). Additionally, environmental factors accounted for 63% of the variance in the relationship between cognitive abilities and mental health. The multimodal neuroimaging and PGS then explained 58% and 21% of the variance due to environmental factors, respectively. Notably, these patterns remained stable over two years. 

      Our findings underscore the significance of neurobiological units of analysis for cognitive abilities, as measured by multimodal neuroimaging and PGS, in understanding both a) the relationship between cognitive abilities and mental health and b) the variance in this relationship shared with environmental factors.”

      (2) Could the authors clarify what they mean by "completing the transdiagnostic aetiology of mental health" in the introduction? (Second paragraph). 

      Thank you. 

      We intended to convey that understanding the transdiagnostic aetiology of mental health would be enhanced by knowing how neurobiological units of cognitive abilities, from the brain to genes, capture variations due to environmental factors. We realise this sentence might be confusing. Removing it does not alter the intended meaning of the paragraph, as we clarified this point later. The paragraph now reads:

      “According to the National Institute of Mental Health’s Research Domain Criteria (RDoC) framework (Insel et al., 2010), cognitive abilities should be investigated not only behaviourally but also neurobiologically, from the brain to genes. It remains unclear to what extent the relationship between cognitive abilities and mental health is represented in part by different neurobiological units of analysis -- such as neural and genetic levels measured by multimodal neuroimaging and polygenic scores (PGS). To fully comprehend the role of neurobiology in the relationship between cognitive abilities and mental health, we must also consider how these neurobiological units capture variations due to environmental factors, such as sociodemographics, lifestyles, and childhood developmental adverse events (Morris et al., 2022). Our study investigated the extent to which a) environmental factors explain the relationship between cognitive abilities and mental health, and b) cognitive abilities at the neural and genetic levels capture these associations due to environmental factors. Specifically, we conducted these investigations in a large normative group of children from the ABCD study (Casey et al., 2018). We chose to examine children because, while their emotional and behavioural problems might not meet full diagnostic criteria (Kessler et al., 2007), issues at a young age often forecast adult psychopathology (Reef et al., 2010; Roza et al., 2003). Moreover, the associations among different emotional and behavioural problems in children reflect transdiagnostic dimensions of psychopathology (Michelini et al., 2019; Pat et al., 2022), making children an appropriate population to study the transdiagnostic aetiology of mental health, especially within a framework that emphasises normative variation from normal to abnormal, such as the RDoC (Morris et al., 2022).“

      (3) It is unclear to me what the authors mean by this statement in the introduction: "Note that using the word 'proxy measure' does not necessarily mean that the predictive model for a particular measure has a high predictive performance - some proxy measures have better predictive performance than others". 

      We added this sentence to address a previous reviewer’s comment: “The authors use the phrasing throughout 'proxy measures of cognitive abilities' when they discuss PRS, neuroimaging, sociodemographics/lifestyle, and developmental factors. Indeed, the authors are able to explain a large proportion of variance with different combinations of these measures, but I think it may be a leap to call all of these proxy measures of cognition. I would suggest keeping the language more objective and stating these measures are associated with cognition.” 

      Because of this comment, we assumed that the reviewers wanted us to avoid the misinterpretation that a proxy measure implies high predictive performance. This term is used in the machine learning literature (for instance, Dadi et al., 2021). We added the aforementioned sentence to assure readers that using the term 'proxy measure' does not necessarily mean that the predictive model for a particular measure has high predictive performance. However, it seems that the sentence conveyed an even more confusing message. Therefore, we decided to delete that sentence but keep an earlier sentence that explains the meaning of a proxy measure (see below).

      “With opportunistic stacking, we created a ‘proxy’ measure of cognitive abilities (i.e., predicted value from the model) at the neural unit of analysis using multimodal neuroimaging.”

      (4) Overall, despite comments from reviewers at another journal, I think the authors still refer to RDoC more than needed in the intro given the restructuring of the manuscript. For instance, at the end of page 4 and top of page 5, it becomes a bit confusing when the authors mention how they deviated from the RDoC framework, but their choice of cognitive domains is still motivated by RDoC. I think the chosen cognitive constructs are consistent with what is in ABCD and what other studies have incorporated into the g factor and do not require the authors to further justify their choice through RDoC. Also, there is emerging work showing that RDoC is limited in its ability to parse apart meaningful neuroimaging-based patterns; see for instance, Quah et al., Nature Communications 2025 (https://doi.org/10.1038/s41467-025-55831-z). 

      Thank you very much for your comment. We have addressed it in our Response to Reviewer 1’s summary, strengths, and weaknesses above. We have rewritten the paragraph to clarify the relevance of our work to the RDoC framework and to recent studies aiming to improve RDoC constructs (including that from Quah and colleagues).

      (5) I am still on the fence about the use of 'proxy measures of cognitive abilities' given that it is defined as the predictive performance of mental health measures in predicting cognition - what about just calling these mental health predictors? Also, it would be easier to follow this train of thought throughout the manuscript. But I leave it to the authors if they decide to keep their current language of 'proxy measure of cognition'. 

      Thank you so much for your flexibility. As we explained previously, the term ‘proxy measure’ is used in the machine learning literature (for instance, Dadi et al., 2021). We considered other terms, such as “score”, which is used in genetics, i.e., polygenic scores (Choi et al., 2020), and has recently been used in neuroimaging, i.e., neuroscore (Rodrigue et al., 2024). However, ‘score’ is awkward when applied to mental health, sociodemographics, lifestyle and developmental adverse events. Accordingly, we decided to keep the term ‘proxy measures’.

      (6) It is unclear which cognitive abilities are being predicted in Figure 1, given the various domains that authors describe in their intro. Is it the g-factor from CFA? This should be clarified in all figure captions. 

      Yes, cognitive abilities are operationalised using a second-order latent variable, the g-factor from a CFA. We have now added the following sentence to the captions of Figures 1, 2 and 4 to make this point clearer. Thank you for the suggestion:

      “Cognitive abilities are based on the second-order latent variable, the g-factor, based on a confirmatory factor analysis of six cognitive tasks.”

      (7) I think it may also be worthwhile to showcase the explanatory power cognitive abilities have in predicting mental health or at least comment on this in the discussion. Certainly, there may be a bidirectional relationship here. The prediction direction from cognition to mental health may be an altogether different objective than what the paper currently presents, but many researchers working in psychiatry may take the stance (with support from the literature) that cognitive performance may serve as premorbid markers for later mental health concerns, particularly given the age range that the authors are working with in ABCD. 

      Thank you for this comment. 

      It is important to note that we do not make a directional claim in these cross-sectional analyses. The term "prediction" is used in a machine learning sense, implying only that we made an out-of-sample prediction (Yarkoni & Westfall, 2017). Specifically, we built predictive models on some samples (i.e., training participants) and applied our models to test participants who were not part of the model-building process. Accordingly, our predictive models cannot determine whether mental health “causes” cognitive abilities or vice versa, regardless of whether we treat mental health or cognitive abilities as feature/explanatory/independent variables or as target/response/outcome variables in the models. To demonstrate directionality, we would need to conduct a longitudinal analysis with many more repeated samples and use appropriate techniques, such as a cross-lagged panel model. It is beyond the scope of this manuscript and will need future releases of the ABCD data.

      We decided to use cognitive abilities as a target variable here, rather than a feature variable, mainly for theoretical reasons. This work was inspired by the RDoC framework, which emphasises functional domains. Cognitive abilities constitute the functional domain in the current study. We created predictive models to predict cognitive abilities based on a) mental health, b) multimodal neuroimaging, c) polygenic scores, and d) environmental factors. We could not treat cognitive abilities as a functional domain if we used them as a feature variable. For instance, if we predicted mental health (instead of cognitive abilities) from multimodal neuroimaging and polygenic scores, we would no longer capture the neurobiological units of analysis for cognitive abilities.

      We have now made it clearer in the Discussion that our predictive models cannot establish the direction of the effects:

      “Our predictive modelling revealed a medium-sized predictive relationship between cognitive abilities and mental health. This finding aligns with recent meta-analyses of case-control studies that link cognitive abilities and mental disorders across various psychiatric conditions (Abramovitch et al., 2021; East-Richard et al., 2020). Unlike previous studies, we estimated the predictive, out-of-sample relationship between cognitive abilities and mental disorders in a large normative sample of children. Although our predictive models, like other cross-sectional models, cannot determine the directionality of the effects, the strength of the relationship between cognitive abilities and mental health estimated here should be more robust than when calculated using the same sample as the model itself, known as in-sample prediction/association (Marek et al., 2022; Yarkoni & Westfall, 2017). Examining the PLS loadings of our predictive models revealed that the relationship was driven by various aspects of mental health, including thought and externalising symptoms, as well as motivation. This suggests that there are multiple pathways—encompassing a broad range of emotional and behavioural problems and temperaments—through which cognitive abilities and mental health are linked.”

      (8) There is a lot of information packed into Figure 3 in the brain maps; I understand the authors wanted to fit this onto one page, and perhaps a higher resolution figure would resolve this, but the brain maps are very hard to read and/or compare, particularly the coronal sections. 

      Thank you for this suggestion. We agree with Reviewer 1 that the feature-importance brain maps need better visualisation. To ensure that readers can clearly see the feature importance, we added zoomed-in versions of the feature-importance brain maps as Supplementary Figures 4–13.

      (9) It would be helpful for authors to cluster features in the resting state functional connectivity correlation matrices, and perhaps use shorter names/acronyms for the labels. 

      Thank you for this suggestion. 

      We have now added zoomed-in versions of the feature importance for rs-fMRI as Supplementary Figures 7 (baseline) and 12 (follow-up).

      (10) Figures 4a) and 4b): please elaborate on "developmental adverse" in the title. I am assuming this is referring to childhood adverse events, or "developmental adversities". 

      Thank you so much for pointing this out. We meant ‘developmental adverse events’. We have made changes to this figure in the current manuscript.

      (11) For the "follow-up" analyses, I would recommend the authors present this using only the features that are indeed available at follow-up, even if the list of features is lower, otherwise it becomes a bit confusing with the mix of baseline and follow-up features. Or perhaps the authors could make this more clear in the figures by perhaps having a different color for baseline vs follow-up features along the y-axis labels. 

      Thank you for this advice. We have now added an indicator in the plot to show whether each feature was collected at baseline or at follow-up. We also added colours to indicate the type of environmental factor. The plot now makes clear that most of the features collected at baseline but used in the follow-up predictive model were developmental adverse events.

      (12) Minor: Makowski et al 2023 reference can be updated to Makowski et al 2024, published in Cerebral Cortex. 

      Thank you for pointing this out. We have updated the citation accordingly. 

      References

      Abramovitch, A., Short, T., & Schweiger, A. (2021). The C Factor: Cognitive dysfunction as a transdiagnostic dimension in psychopathology. Clinical Psychology Review, 86, 102007. https://doi.org/10.1016/j.cpr.2021.102007

      Beam, E., Potts, C., Poldrack, R. A., & Etkin, A. (2021). A data-driven framework for mapping domains of human neurobiology. Nature Neuroscience, 24(12), 1733–1744. https://doi.org/10.1038/s41593-021-00948-9

      Calvin, C. M., Batty, G. D., Der, G., Brett, C. E., Taylor, A., Pattie, A., Čukić, I., & Deary, I. J. (2017). Childhood intelligence in relation to major causes of death in 68 year follow-up: Prospective population study. BMJ, j2708. https://doi.org/10.1136/bmj.j2708

      Casey, B. J., Cannonier, T., Conley, M. I., Cohen, A. O., Barch, D. M., Heitzeg, M. M., Soules, M. E., Teslovich, T., Dellarco, D. V., Garavan, H., Orr, C. A., Wager, T. D., Banich, M. T., Speer, N. K., Sutherland, M. T., Riedel, M. C., Dick, A. S., Bjork, J. M., Thomas, K. M., … ABCD Imaging Acquisition Workgroup. (2018). The Adolescent Brain Cognitive Development (ABCD) study: Imaging acquisition across 21 sites. Developmental Cognitive Neuroscience, 32, 43–54. https://doi.org/10.1016/j.dcn.2018.03.001

      Choi, S. W., Mak, T. S.-H., & O’Reilly, P. F. (2020). Tutorial: A guide to performing polygenic risk score analyses. Nature Protocols, 15(9), Article 9. https://doi.org/10.1038/s41596-020-0353-1

      Dadi, K., Varoquaux, G., Houenou, J., Bzdok, D., Thirion, B., & Engemann, D. (2021). Population modeling with machine learning can enhance measures of mental health. GigaScience, 10(10), giab071. https://doi.org/10.1093/gigascience/giab071

      Deary, I. J. (2012). Intelligence. Annual Review of Psychology, 63(1), 453–482. https://doi.org/10.1146/annurev-psych-120710-100353

      Deary, I. J., Pattie, A., & Starr, J. M. (2013). The Stability of Intelligence From Age 11 to Age 90 Years: The Lothian Birth Cohort of 1921. Psychological Science, 24(12), 2361–2368. https://doi.org/10.1177/0956797613486487

      East-Richard, C., R. -Mercier, A., Nadeau, D., & Cellard, C. (2020). Transdiagnostic neurocognitive deficits in psychiatry: A review of meta-analyses. Canadian Psychology / Psychologie Canadienne, 61(3), 190–214. https://doi.org/10.1037/cap0000196

      Insel, T., Cuthbert, B., Garvey, M., Heinssen, R., Pine, D. S., Quinn, K., Sanislow, C., & Wang, P. (2010). Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. American Journal of Psychiatry, 167(7), 748–751. https://doi.org/10.1176/appi.ajp.2010.09091379

      Kessler, R. C., Amminger, G. P., Aguilar-Gaxiola, S., Alonso, J., Lee, S., & Üstün, T. B. (2007). Age of onset of mental disorders: A review of recent literature. Current Opinion in Psychiatry, 20(4). https://journals.lww.com/co-psychiatry/fulltext/2007/07000/age_of_onset_of_mental_disorders_a_review_of.10.aspx

      Marek, S., Tervo-Clemmens, B., Calabro, F. J., Montez, D. F., Kay, B. P., Hatoum, A. S., Donohue, M. R., Foran, W., Miller, R. L., Hendrickson, T. J., Malone, S. M., Kandala, S., Feczko, E., Miranda-Dominguez, O., Graham, A. M., Earl, E. A., Perrone, A. J., Cordova, M., Doyle, O., … Dosenbach, N. U. F. (2022). Reproducible brain-wide association studies require thousands of individuals. Nature, 603(7902), 654–660. https://doi.org/10.1038/s41586-022-04492-9

      Michelini, G., Barch, D. M., Tian, Y., Watson, D., Klein, D. N., & Kotov, R. (2019). Delineating and validating higher-order dimensions of psychopathology in the Adolescent Brain Cognitive Development (ABCD) study. Translational Psychiatry, 9(1), 261. https://doi.org/10.1038/s41398-019-0593-4

      Morris, S. E., & Cuthbert, B. N. (2012). Research Domain Criteria: Cognitive systems, neural circuits, and dimensions of behavior. Dialogues in Clinical Neuroscience, 14(1), 29–37.

      Morris, S. E., Sanislow, C. A., Pacheco, J., Vaidyanathan, U., Gordon, J. A., & Cuthbert, B. N. (2022). Revisiting the seven pillars of RDoC. BMC Medicine, 20(1), 220. https://doi.org/10.1186/s12916-022-02414-0

      Pat, N., Riglin, L., Anney, R., Wang, Y., Barch, D. M., Thapar, A., & Stringaris, A. (2022). Motivation and Cognitive Abilities as Mediators Between Polygenic Scores and Psychopathology in Children. Journal of the American Academy of Child and Adolescent Psychiatry, 61(6), 782-795.e3. https://doi.org/10.1016/j.jaac.2021.08.019

      Pat, N., Wang, Y., Bartonicek, A., Candia, J., & Stringaris, A. (2023). Explainable machine learning approach to predict and explain the relationship between task-based fMRI and individual differences in cognition. Cerebral Cortex, 33(6), 2682–2703. https://doi.org/10.1093/cercor/bhac235

      Quah, S. K. L., Jo, B., Geniesse, C., Uddin, L. Q., Mumford, J. A., Barch, D. M., Fair, D. A., Gotlib, I. H., Poldrack, R. A., & Saggar, M. (2025). A data-driven latent variable approach to validating the research domain criteria framework. Nature Communications, 16(1), 830. https://doi.org/10.1038/s41467-025-55831-z

      Reef, J., Diamantopoulou, S., van Meurs, I., Verhulst, F., & van der Ende, J. (2010). Predicting adult emotional and behavioral problems from externalizing problem trajectories in a 24-year longitudinal study. European Child & Adolescent Psychiatry, 19(7), 577–585. https://doi.org/10.1007/s00787-010-0088-6

      Rodrigue, A. L., Hayes, R. A., Waite, E., Corcoran, M., Glahn, D. C., & Jalbrzikowski, M. (2024). Multimodal Neuroimaging Summary Scores as Neurobiological Markers of Psychosis. Schizophrenia Bulletin, 50(4), 792–803. https://doi.org/10.1093/schbul/sbad149

      Roza, S. J., Hofstra, M. B., Van Der Ende, J., & Verhulst, F. C. (2003). Stable Prediction of Mood and Anxiety Disorders Based on Behavioral and Emotional Problems in Childhood: A 14-Year Follow-Up During Childhood, Adolescence, and Young Adulthood. American Journal of Psychiatry, 160(12), 2116–2121. https://doi.org/10.1176/appi.ajp.160.12.2116

      Yarkoni, T., & Westfall, J. (2017). Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning. Perspectives on Psychological Science, 12(6), 1100–1122. https://doi.org/10.1177/1745691617693393

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Perlee et al. sought to generate a zebrafish line where CRISPR-based gene editing is exclusively limited to the melanocyte lineage, allowing assessment of cell-type restricted gene knockouts. To achieve this, they knocked in Cas9 to the endogenous mitfa locus, as mitfa is a master regulator of melanocyte development. The authors use multiple candidate genes - albino, sox10, tuba1a, ptena/ptenb, tp53 - to demonstrate their system induces lineage-restricted gene editing. This method allows researchers to bypass embryonic lethal and non-cell autonomous phenotypes emerging from whole body knockout (sox10, tuba1a), drive directed phenotypes, such as depigmentation (albino), and induce lineage-specific tumors, such as melanomas (ptena/ptenb, tp53, when accompanied with expression of BRAFV600E). While the genetic approaches are solid, the argued increase in efficiency of this model compared to current tools was untested, and therefore unable to be assessed. Furthermore, the mechanistic explanations proposed to underlie their phenotypes are mostly unfounded, as discussed further in the Weaknesses section. Despite these concerns, there is still a clear use for this genetic methodology and its implementation will be of value to many in vivo researchers.

      Strengths:

      The strongest component of this manuscript is the genetic control offered by the mitfa:Cas9 system and the ability to make stable, lineage-specific knockouts in zebrafish. This is exemplified by the studies of tuba1a, where the authors nicely show non-cell autonomous mechanisms have obfuscated the role of this gene in melanocyte development. In addition, the mitfa:Cas9 system is elegantly straightforward and can be easily implemented in many labs. Mostly, the figures are clean, controls are appropriate, and phenotypes are reproducible. The invented method is a welcomed addition to the arsenal of genetic tools used in zebrafish.

      Weaknesses:

      The major weaknesses of the manuscript include the overly bold descriptions of the value of the model and the superficial mechanistic explanations for each biological vignette.

      The authors argue that a major advantage of this system is its high efficiency. However, no direct comparison is made with other tools that achieve the same genetic control, such as MAZERATI. This is a missed opportunity to provide researchers the ability to evaluate these two similar genetic approaches. In addition, Fig.1 shows that not all melanocytes express Cas9. This is a major caveat that goes unaddressed. It is of paramount importance to understand the percentage of mitfa+ cells that express Cas9. The histology shown is unclear and too zoomed out to support any insightful conclusions, especially in Fig.S1. It would also be beneficial to see data regarding Cas9 expression in adult melanocytes, which are distinct from embryonic melanocytes in zebrafish. Moreover, this system still requires the injection of a plasmid encoding gRNAs of interest, which will yield mosaicism. A prime example of this discrepancy is in Fig.6, where sox10 is clearly still present in "sox10 KO" tumors.

      We agree with these points. While our method has the advantage of endogenous knockin (thus keeping all regulatory elements), you are correct that we did not make a direct comparison with existing technologies like MAZERATI, and therefore we cannot make comparative claims about efficiency. Based on this, we have revised the manuscript to remove these points, reduce the strength/boldness of the claims, and make it more clear what our system achieves in comparison to existing systems. In reference to the other specific points you raise above about mosaicism and extent of Cas9 expression:

      - We have added a paragraph to the Discussion addressing the advantages and disadvantages of mitfaCas9 compared with expressing Cas9 from lineage-specific promoters, including MAZERATI.

      - Figure 1C has been revised to more clearly show the overlap of mitfa and Cas9 in melanocytes. 

      - We then quantified the percentage of mitfa+ cells expressing Cas9 from the in situ hybridizations (Supplemental Figure S1D). We did attempt to examine Cas9 protein expression in both embryonic and adult melanocytes by immunofluorescence. Unfortunately, the commercially available Cas9 antibodies did not work on zebrafish embryos or adult tailfins, so our quantification is limited to the in situ hybridizations in embryos.

      The authors argue that their model allows rapid manipulation of melanocyte gene expression. Enthusiasm for the speed of this model is diminished by minimal phenotypes in the F0, as exemplified in Fig.2. Although the authors say >90% of fish have loss of pigmentation, this is misleading as the phenotype is a very weak, partial loss. Only in the F1 generation do robust phenotypes emerge, which takes >6 months to generate. How this is more efficient than other tools that currently exist is unclear and should be discussed in more detail.

      This needed clarification, and we have now modified the Discussion to reflect this more accurately. What we were trying to show is that both F0 and F1 fish can be useful in screening for the effect of any given gene. In the F0, while you are correct that the phenotype is indeed weak/partial, it is also quantifiable and therefore can be used as a rapid screen for potential effects of knockout, so it can help with speed. The major advantage of the F1 generation is that we can generate fully penetrant phenotypes for recessive genes since the fish just needs to have 1 copy of the Cas9/sgRNA instead of 2. This means we do not have to go to F2 or F3 generations, which really does save time. But we agree this could be achieved using MAZERATI, and so we have added these considerations to the manuscript, as we feel these are important.

      In Figure 3, the authors find that melanocyte-specific knockout of sox10 leads to only a 25% reduction in melanocytes in the F1 generation. This is in contradiction to prior literature cited describing sox10 as indispensable for melanocyte development. In addition, the authors argue that sox10 is required for melanocyte regeneration. This claim is not accurate, as >50% of melanocytes killed upon neocuproine treatment can regenerate. This data would indicate that sox10 is required for only a subset of melanocytes to develop (Fig.3C) and for only a subset to regenerate (Fig.3G). This is an interesting finding that is not discussed or interrogated further.

      We too were initially very puzzled by this result. We do not completely understand it, but we have two thoughts about it. The first is timing. sox10 usually starts to be expressed around the 1-somite stage, and so in the original sox10/colourless mutant (which truly has no melanocytes), sox10 will be lost during those early stages. In contrast, mitfa comes on later (around 18hpf), which might indicate that a subset of melanocytes depends on this early expression of sox10. This may indicate that sox10 has different functions early in melanocyte development versus later timepoints after melanocytes have already been specified. This might also help explain our findings during regeneration. The second is genetic compensation. Since in other parts of the paper we see a somewhat reciprocal relationship between sox10 and sox9, it is conceivable that loss of sox10 in the melanocytes could be compensated for by sox9 (or even other genes) in our CRISPR approach (as opposed to the ENU allele in colourless). Since we do not fully understand this, we have added a section to the Discussion about this issue, mentioning these possibilities but leaving open other yet to be defined mechanisms.

      Tumor induction by this model is weak, as indicated by the tumor curves in Figs.5,6. This might be because these fish are mitfa heterozygous. Whereas the avoidance of mitfa overexpression driven by other models including MAZERATI is a benefit of this system, the effect of mitfa heterozygosity on tumor incidence was untested. This is an essential question unaddressed in the manuscript.

      We agree that in the BRAF;p53 group especially, tumor incidence is very low, although PTEN loss does accelerate it. One possibility is exactly as you stated, that mitfa heterozygosity is the cause. The other possibility is that the MAZERATI approach (https://pubmed.ncbi.nlm.nih.gov/30385465/) used the casper background, as opposed to the wild-type T5D background we used in our study. In unpublished observations, we have found that casper (with miniCoopR rescue) is markedly more sensitive to melanoma induction than WT fish in this setting. In fact, our BRAF;p53 curves are fairly similar to those in the original Patton paper (https://pubmed.ncbi.nlm.nih.gov/15694309/), which were also generated in a WT background without miniCoopR. This might indicate that casper + miniCoopR particularly sensitizes the fish to melanoma. However, because we do not fully know the reasons for this, we have now included both of these possible reasons in the Discussion.

      In Fig.6, the authors recapitulate previous findings with their model, showing sox10 KO inhibits tumor onset. The tumors that do develop are argued to be highly invasive, have mesenchymal morphology, and undergo phenotypic switching from sox10 to sox9 expression. The data presented do not sufficiently support these claims. The histology is not readily suggestive of invasive, mesenchymal melanomas. Sox10 is still present in many cells and sox9 expression is only found in a small subset (<20%). Whether sox10-null cells are the ones expressing sox9 is untested. If sox9-mediated phenotypic switching is the major driver of these tumors, the authors would need to knockout sox9 and sox10 simultaneously and test whether these "rare" types of tumors still emerge. Additional histological and genetic evaluation is required to make the conclusions presented in Fig.6. It feels like a missed opportunity that the authors did not attempt to study genes of unknown contribution to melanoma with their system.

      We did not mean to overstate the admittedly early observations from these fish. Invasiveness in the fish models can be difficult to precisely quantify, and therefore is somewhat qualitative. While we did not mean to imply that every cell that loses sox10 will become sox9 positive (which is clearly not the case), the human single-cell RNA-seq data does suggest these are somewhat mutually exclusive populations (https://pubmed.ncbi.nlm.nih.gov/32753671/). This phenomenon has also long been observed even prior to single-cell approaches (https://pubmed.ncbi.nlm.nih.gov/25629959/). So while we agree our data is not definitive in this regard, it is consistent with the literature and was presented mainly to provide areas for future exploration with the model. 

      Overall, this manuscript introduces a solid method to the arsenal of zebrafish genetic tools but falls short of justifying itself as a more efficient and robust approach than what currently exists. The mechanisms provided to explain observed phenotypes are tenuous. Nonetheless, the mitfa:Cas9 approach will certainly be of value to many in vivo biologists and lays the foundation to generate similar methods using other tissue-specific regulators and other Cas proteins.

      We hope that by toning down the language around what we have observed, and providing as honest an assessment as possible as to what might be occurring, that the manuscript will be helpful for future studies aiming to knock out genes in the melanocyte lineage.

      Reviewer #2 (Public review):

      Summary:

      This manuscript describes a genetic tool utilizing mutant mitfa-Cas9 expressing zebrafish to knockout genes to analyze their function in melanocytes in a range of assays from developmental biology to tumorigenesis. Overall, the data are convincing and the authors cover potential caveats from their model that might impact its utility for future work.

      Strengths:

      The authors do an excellent job of characterizing several gene deletions that show the specificity and applicability of the genetic mitfa-Cas9 zebrafish to studying melanocytes.

      Weaknesses:

      Variability across animals not fully analyzed.

      To more clearly show variability across animals, we calculated the percentage of mitfa+ cells that express Cas9 across n=7 mitfaCas9 embryos. We also expanded Supplemental Figure 2 to show loss of pigmentation across n=7 individual adult MG-albino F2 fish instead of one representative image.

      Reviewer #3 (Public review):

      Summary:

      Perlee et al. present a method for generating cell-type restricted knockouts in zebrafish, focusing on melanocytes. For this method, the authors knock-in a Cas9 encoding sequence into the mitfa locus. This mitfaCas9 line has restricted Cas9 expression, allowing the authors to generate melanocyte-specific knockouts rapidly by follow-up injection of sgRNA expressing transposon vectors.

      The paper presents some interesting vignettes to illustrate the utility of their approach. These include 1) a derivation of albino mutant fish as a demonstration of the method's efficiency, 2) an interrogation and novel description of tuba1a as a potential non-autonomous contributor to melanocyte dispersion, and 3) the generation of sox10 deficient melanoma tumors that show "escape" of sox10 loss through upregulation of sox9. The latter two examples highlight the usefulness of cell-type targeted knockouts (Body-wide sox10 and tuba1a loss elicit developmental defects). Additionally, the tumor models involve highly multiplexed sgRNAs for tumor initiation which is nicely facilitated by the stable Cas9.

      Strengths:

      The approach is clever and could prove very useful for studying melanocytes and other cell types. As the authors hint at in their discussion, this approach would become even more powerful with the generation of other Cas9-restricted lineages so a single sgRNA construct can be screened across many lineages rapidly (or many sgRNA and fish lines screened combinatorially).

      The biological findings used to demonstrate the power of the approach are interesting in their own right. If it proves true, tuba1a's non-autonomous effects on melanosome dispersion are striking, and this example demonstrates very nicely how one could use Perlee et al.'s approach to search for other non-autonomous mechanisms systematically. Similarly, the observation of the sox9 escape mechanism with sox10 loss is a beautiful demonstration of the relevance of SOX10/SOX9's reciprocal regulation in vivo. This system would be a very nice model for further interrogating mechanisms/interventions surrounding Sox10 in melanoma.

      Finally, the figure presentation is very nice. This work involves complex genetic approaches including multiple fish generations and multiplexed construct injections. The vector diagrams and breeding schemes in the paper make everything very clear/"grok-able," and the paper was enjoyable to read.

      Weaknesses:

      The mitfa-driven GFP on their sgRNA-expressing cassette is elegant, but it makes one wonder why the endogenous knock-in is necessary. It would strengthen the motivation of the work if the authors could detail the potential advantages and disadvantages of their system compared to expressing Cas9 with a lineage-specific promoter from a transposon in their introduction or discussion.

      We agree this needed a better and clearer explanation. There are many excellent examples of promoter-driven Cas9 approaches. Within melanocytes, Ablain and others have developed the MAZERATI system (https://pubmed.ncbi.nlm.nih.gov/30385465/), which is very powerful, especially for melanoma development. In our minds, the major advantage of endogenous knock-in is that we retain all of the natural regulatory elements (many of which are not known), whereas small promoter fragments always run the risk of missing certain types of regulation. While these regulatory elements may not matter under homeostatic conditions, they may become very important under perturbation, stress or disease states. This is why it is common in the mouse field, for example, to knock Cre into endogenous loci. We have now added a clarification of this to the manuscript.

      Related to the above - is mitfa haplosufficient? If the mitfaCas9/+ fish have any notable phenotypes, it would be worth noting for others interested in using this approach to study melanoma and pigmentation.

      In normal melanocytes, mitfa is haplosufficient. There are no visible differences between mitfaCas9/+ and wild-type fish at any stages of development (Figure S1C). Although we did not directly compare tumor growth in mitfa-/+ and mitfa+/+ fish in this study, it is possible that the disruption of mitfa in mitfaCas9/+ fish affects melanoma development. Most zebrafish melanoma models involve the overexpression of mitfa with MiniCoopR vectors and it would be interesting in future studies to determine how mitfa heterozygosity affects melanoma initiation or progression. 

      A core weakness (and also potential strength) of the system is that introduced edits will always be non-clonal (Fig 2H/I). When no phenotype is observed, the activity of individual sgRNAs should always be validated before a negative result is interpreted. Additionally, caution should be taken when interpreting results from rare events involving positive outgrowth (like tumorigenesis), to account for the fact that many cells in the population might not have biallelic null alleles (i.e., 100% of the gene product removed).

      Along those lines: in my opinion, the tuba1a results are the most provocative finding in the paper, but they lack key validation. With respect to cutting activity, the Alt-R and transgenic sgRNA expression approaches are not directly comparable. Since there is no phenotype in the melanocyte specific tuba1a knockouts, the authors must confirm high knockout efficiency with this set of reagents before making the claim there is a non-autonomous phenotype. This can be achieved with GFP+ sorting and NGS like they performed with their albino melanocytes.

      The whole-body tuba1a knockout phenotype is expected to be pleiotropic, and this expectation might mask off-target effects. Controls for knockout specificity should be included. For instance, confidence in the claims would greatly increase if the dispersed melanosome phenotype could be rescued with guide-resistant tuba1a re-expression and if melanocyte-restricted tuba1a re-expression failed to rescue. As a less definitive but adequate alternative, the authors could also test whether another guide or a morpholino against tuba1a phenocopies the described Alt-R edited fish.

      Thank you for your thoughtful suggestions, which led us to an important discovery. While validating the original tuba1a guide RNA, we found that tuba1a sg1 also targets tuba1c, a gene that shares 99.78% homology with tuba1a in zebrafish. To determine which gene was responsible for the melanocyte phenotype, we designed multiple new guide RNAs specifically targeting either tuba1a or tuba1c and used Alt-R to globally knock them out in zebrafish embryos. However, none of these guides successfully replicated the phenotype (Sanger sequencing validation for the most efficient tuba1a and tuba1c guides is provided below).

      Ultimately, we identified a new guide RNA (5’-GGTCTACAAAGACAGCCCTA-3’) that successfully phenocopied the original tuba1a sg1 melanocyte phenotype. This guide was predicted to have a mismatch with tuba1c, but not with tuba1a, at the 3’ end of the guide sequence, which is typically expected to inhibit target cleavage. Surprisingly, despite this mismatch, we observed robust cleavage in both tuba1a and tuba1c. Because the melanocyte phenotype was reproducible only when both tuba1a and tuba1c were targeted, these highly similar genes may compensate for each other. We have updated the text and figures to reflect this finding and have included validation of this second guide RNA (tuba1a/c sg2) in Supplemental Figure 3.

      As you suggested, we also conducted GFP+ sorting and NGS to confirm knockout of both tuba1a and tuba1c in melanocytes of mitfaCas9 fish (Figure S3G). The knockout percentages were comparable to those observed in our previous experiment with MG-albino fish. This also confirms that this method can be used to sort and sequence GFP+ cells even when pigmentation is retained, which was not the case for albino fish. 

      I have similar questions about the sox10 escapers, but these suggestions are less critical for supporting the authors claims (especially given the nice staining). Are the sox10 tumors relatively clonal with respect to sox10 mutations? And are the sox10 tumor mutations mostly biallelic frameshifts or potential missense mutations/single mutations that might not completely remove activity? I am particularly curious as SOX10 doesn't seem to be completely absent (and is still very high in some nuclei) in the immunohistochemistry.

      We attempted to address this question by performing DNA sequencing on the FFPE blocks that we had retained from the original study. While our sequencing facility said this should be possible, we could not consistently generate high enough quality DNA to make a definitive statement either way. While we are very curious to know what the nature of the mutations are in these “escapers”, the student who performed these studies has now graduated, and it would take us several additional months to a year to fully address it. Given this, we would prefer to leave this open question to a future paper, but have addressed this limitation in the Discussion.

      Recommendations for the authors:

      Reviewing Editor:

      Overall, the reviewers felt and eLife concurs that your manuscript is insightful and appropriate for publication. Reviewers were impressed by your generating a zebrafish line where CRISPR-based gene editing is exclusively limited to the melanocyte lineage, allowing assessment of cell-type restricted gene knockouts. Your use of multiple candidate genes to demonstrate that your system induces lineage-restricted gene editing is compelling and will be of interest to the broad readership of eLife. This method will allow researchers to bypass embryonic lethal and non-cell autonomous phenotypes emerging from whole-body knockout, drive directed phenotypes, such as depigmentation, and induce lineage-specific tumors, such as melanomas. That said, the argued increase in efficiency of this model compared to current tools was untested, and therefore it remains difficult for a reader to assess the extent to which your new model represents a major advance over prior ones. Of additional concern are the mechanistic explanations proposed to underlie the phenotypes, as these are largely unfounded. Thus, in preparing your final publication version of the paper, eLife strongly encourages you to fully address the reviewers' thoughtful comments. In particular, the boldness of the claims made in the manuscript should be reduced. Terms like "highly efficient" and "rapid" are unsupported due to the lack of comparison with other well-established methods, like MAZERATI.

      As discussed in our responses to each of the reviewer points above, we agree with both of these points. We have reduced the boldness of the claims, with a better discussion of the different approaches. We also address the potential mechanisms of our observations, and where and why we still lack an understanding of what gives rise to those phenotypes. 

      There are also some minor discrepancies that should be edited in the manuscript: Fig.2A plasmid description is written oppositely in text; Fig.3 labels G-H are swapped in the legend description; Fig.5A MTdT is unexplained. This is a non-exhaustive list, and the authors are encouraged to carefully read through their manuscript to revise other minor mistakes and formatting errors.

      Figure 2A was revised to show the correct orientation of mitfa:GFP and the guide RNA cassette as described in the text. Figure 3 legend was fixed. We have gone through the manuscript again to make sure we have not made any other errors, to the best of our knowledge.

      The biggest concern is the expression of cas9 and the weak histological support shown in Fig.1 and Fig.S1. It would be a benefit to all readers and potential future users to know how robust cas9 expression is in the melanocyte lineage. It would be helpful if there is a way to analyze the percentage of cells that are mutated in each animal to understand the variability that can exist across animals with the method.

      We have revised Figure 1C to show additional melanocytes and added a new quantification of Cas9 RNA expression in melanocytes (S1D). 

      The analysis of the scRNA sequencing could also be described more fully.

      More details have been added to the scRNA sequencing analysis including the functions that were used. 

      The final major concern is whether this model is genuinely more valuable than MAZERATI. A more elaborate discussion would benefit potential future users to guide their decisions regarding which tool best suits their experimental goals.

      As noted above, we agree with this statement. The reviewers are correct in that we did not directly compare our system to MAZERATI, and therefore cannot make any claims about efficiency in a comparative regard. Therefore, in our revised Discussion, we talk about the relative strengths and weaknesses of each approach, and emphasize that our approach mainly has the advantage of retaining endogenous regulatory elements for mitfa, but that each user should decide which is the best approach for their problem.

      There are also some minor concerns that should be addressed.

      Are the mitfaCas9 fish used as homozygotes before the first cross? If so, might be nice to include their nacre-like phenotype in diagrams like Fig 2A.

      For these studies, heterozygous mitfaCas9 fish were used for all breedings and progeny were sorted for BFP+ eyes. This enabled the comparison to sibling controls without Cas9 expression. 

      BFP+ eye screening for mitfaCas9 is elegant and included nicely in the diagrams. Are germline sgRNA integrants identified in F1 with melanocyte GFP? Or present at a high enough efficiency that this is not relevant? This would be good to include in the diagrams.

      Germline sgRNA integrants are identified with melanocyte GFP in embryos. Figure 2A has been edited to show GFP expression. 

      Most cells are GFP positive in S3C (the F0 "mosaic"). It might be nice to show a single GFP stripe like in the other panels for direct comparison of edited/non-edited in the same fish.

      This figure (now S3E) has been edited to show a clear comparison between GFP+ and GFP- cells in the same fish. 

      177 - CRISPR-Seq is basically amplicon sequencing. This would measure efficiency but not "specificity" as described. Off-target activity would have to be measured at other loci etc. Not necessary to do, but I don't think measured.

      In this case, “specificity” refers to cell type specificity, not genomic specificity. We are measuring cell type specificity by comparing on-target cutting in GFP+ cells (melanocytes) versus GFP- cells (non-mitfa expressing cells). We did not look at off-target activity of Cas9 in this study and have edited the text to make this clearer. 

      219 -"several gaps were visible"

      Fixed

      286 - TUBA1A should be italicized

      Fixed

      399 - SOX9's most enriched dependency in DepMap is cutaneous melanoma and its top coessential gene is SOX10. I'm not sure the SOX9/SOX10 interaction couldn't be parsed from DepMap alone.

      This is true, and the DepMap was actually somewhat of an inspiration for our own studies. We have modified the line to acknowledge this and explain the main advantage of our system is in vivo confirmation of what the DepMap had alluded to.

      433 - "fewer animals since all F1 animals (even those for recessive alleles) are informative."

      The fact that this approach is faster and more efficient per animal is important to highlight (and very believable), but is this technically true given that not all F1 fish will have Cas9 or a germline sgRNA integration?

      In considering this statement, we agree with you and decided to remove it from the text.

      We hope the comments in both the public and private reviews will help improve the manuscript.

      Reviewer #1 (Recommendations for the authors):

      Overall, the boldness of the claims made in the manuscript should be reduced. Terms like "highly efficient" and "rapid" are unsupported due to the lack of comparison with other well-established methods, like MAZERATI.

      As discussed above, we agree with this and have now modified the manuscript to better reflect what our system achieves in comparison to well-developed systems such as MAZERATI. Because we have not done a direct comparison, we are not able to make any claims about comparative efficiency, and instead focus on the potential benefits of a knock-in approach, which is the maintenance of endogenous regulatory elements.

      There are some minor discrepancies that should be edited in the manuscript: Fig.2A plasmid description is written oppositely in text; Fig.3 labels G-H are swapped in the legend description; Fig.5A MTdT is unexplained. This is a non-exhaustive list, and the authors are encouraged to carefully read through their manuscript to revise other minor mistakes and formatting errors.

      Figure 2A was revised to show the correct orientation of mitfa:GFP and the guide RNA cassette as described in the text. Figure 3 legend was fixed. We have gone through the manuscript again to make sure we have not made any other errors, to the best of our knowledge.

      The biggest concern is the expression of cas9 and the weak histological support shown in Fig.1 and Fig.S1. It would be a benefit to all readers and potential future users to know how robust cas9 expression is in the melanocyte lineage.

      We have revised Figure 1C to show additional melanocytes and added a new quantification of Cas9 RNA expression in melanocytes (S1D). 

      The second major concern is whether this model is genuinely more valuable than MAZERATI. A more elaborate discussion would benefit potential future users to guide their decision regarding which tool best suits their experimental goals.

      As noted above, we agree with this statement. The reviewers are correct in that we did not directly compare our system to MAZERATI, and therefore cannot make any claims about efficiency in a comparative regard. Therefore, in our revised Discussion, we talk about the relative strengths and weaknesses of each approach, and emphasize that our approach mainly has the advantage of retaining endogenous regulatory elements for mitfa, but that each user should decide which is the best approach for their problem.

      We hope the comments in both the public and private reviews will help improve the manuscript.

      Reviewer #2 (Recommendations for the authors):

      The authors show the indel charts for the CRISPR mutations in the supplement. However, I wonder if there is a way to analyze the percentage of cells that are mutated in each animal to understand the variability that can exist across animals with the method.

      We have revised Figure 1C to show additional melanocytes and added a new quantification of Cas9 RNA expression in melanocytes (S1D). 

      The analysis of the scRNA sequencing could be described more fully.

      More details have been added to the scRNA sequencing analysis including the functions that were used. 

      Reviewer #3 (Recommendations for the authors):

      This was an excellent read, and I'm very interested in seeing it in its final form. Congratulations! My larger critiques are outlined in the public reviews. A few smaller points:

      Are the mitfaCas9 fish used as homozygotes before the first cross? If so, might be nice to include their nacre-like phenotype in diagrams like Fig 2A.

      For these studies, heterozygous mitfaCas9 fish were used for all breedings and progeny were sorted for BFP+ eyes. This enabled the comparison to sibling controls without Cas9 expression. 

      BFP+ eye screening for mitfaCas9 is elegant and included nicely in the diagrams. Are germline sgRNA integrants identified in F1 with melanocyte GFP? Or present at a high enough efficiency that this is not relevant? This would be good to include in the diagrams.

      Germline sgRNA integrants are identified with melanocyte GFP in embryos. Figure 2A has been edited to show GFP expression. 

      Most cells are GFP positive in S3C (the F0 "mosaic"). It might be nice to show a single GFP stripe like in the other panels for direct comparison of edited/non-edited in the same fish.

      This figure (now S3E) has been edited to show a clear comparison between GFP+ and GFP- cells in the same fish. 

      177 - My understanding is that CRISPR-Seq is basically amplicon sequencing. This would measure efficiency but not "specificity" as described. Off-target activity would have to be measured at other loci etc. Not necessary to do in my opinion, but I don't think measured.

      In this case, “specificity” refers to cell type specificity, not genomic specificity. We are measuring cell type specificity by comparing on-target cutting in GFP+ cells (melanocytes) versus GFP- cells (non-mitfa expressing cells). We did not look at off-target activity of Cas9 in this study and have edited the text to make this clearer. 

      219 -"several gaps were visible"

      Fixed

      286 - TUBA1A should be italicized

      Fixed

      399 - I think I understand the logic of the DepMap argument, and the importance of studying tumor initiation in vivo stands for itself. But here is maybe not the best example (or might need clarification)? - SOX9's most enriched dependency in DepMap is cutaneous melanoma and its top co-essential gene is SOX10. I'm not sure the SOX9/SOX10 interaction couldn't be parsed from DepMap alone.

      This is true, and the DepMap was actually somewhat of an inspiration for our own studies. We have modified the line to acknowledge this and explain the main advantage of our system is in vivo confirmation of what the DepMap had alluded to.

      433 - "fewer animals since all F1 animals (even those for recessive alleles) are informative."

      The fact that this approach is faster and more efficient per animal is important to highlight (and very believable), but is this technically true given that not all F1 fish will have Cas9 or a germline sgRNA integration?

      In considering this statement, we agree with you and decided to remove it from the text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors present a new protocol to assess social dominance in pairs and triads of C57BL/6J mice, based on a competition to access a hidden food pellet. Using this new protocol, the authors identified stable rankings among male and female pairs, while reporting more fluctuating hierarchies among triads of males. Ranking readouts obtained with this new apparatus were compared to the outcomes obtained with the same animals competing in the tube and warm spot tests, which have both been commonly used during the last decade to identify social ranks in rodents under laboratory conditions.

      Strengths:

      FPCT allows for easy and fast identification of a winner and a loser in the context of food competition. The apparatus and the protocol are relatively easy and quick to implement in the lab and free from any complex post-processing/analysis, which qualifies it for wide distribution, particularly within laboratories that do not have the resources to implement more sophisticated protocols. Hierarchical readouts identified through the FPCT correlate with social ranks identified with the tube and the warm spot tests, which have been widely adopted during the last decade and allow for study comparison.

      Weaknesses:

      While the FPCT is validated by the tube and warm spot tests, this paper would have gained strength by providing a more ethologically based validation. Tube and warm spot tests have been shown to provide conflicting results and might not be a sufficient measurement of social rank (see Varholik et al, Scientific reports, 2019; Battivelli et al, Biological psychiatry, 2024). Instead, a general consensus pushing toward more ethological approaches in neuroscience studies is emerging.

      We thank all the reviewers for recognizing the strengths of the FPCT setup and the data. We also thank the reviewers for pointing out weaknesses and giving us valuable suggestions that helped us improve the quality of our manuscript through revision.

      In this manuscript, we found that the ranking results of the FPCT were largely consistent with those of the tube and warm spot tests. This finding was unexpected, because we had reasoned that the different competitive targets of each paradigm would appeal to the mice differently and allow each mouse to exert its specific advantages. However, consistency between the FPCT and the tube test was observed in pairs of female mice, pairs of male mice and triads of male mice, and consistency among the FPCT, tube test and warm spot test was observed in pairs and triads of male mice. Thus, we concluded that the social rank order of mice is stable across these paradigms. 

      We acknowledge that this conclusion would be strengthened by validation with more ethological approaches, such as urine-marking analysis and the water competition test. However, we do not rule out inconsistency of ranking results between two or more paradigms; indeed, there were inconsistent cases in our experiments. Inconsistency between paradigms, even between the FPCT and the tube test, could be amplified if the tests were run with different experimental protocols and conditions, because many factors can affect the readouts, such as colony formation, task, test protocol, habituation and training. Even with the tube test alone, both stable [1,2] and unstable [3] ranking results have been reported.

      Other papers have already identified social ranks using dyadic food competition with relatively simple scoring protocols (see for example Merlot et al., 2006), within a more naturalistic set-up that allows the two opponents to interact directly while competing for the food. A potential issue with the FPCT is that, because the opponents are isolated from each other, the inhibition normally expected in subordinates accessing food in the presence of a dominant could be diminished, and subordinates that would usually avoid the dominant could be more motivated to push for access to the food pellet.

      The hierarchical structure of a mouse colony could be established on the basis of physical aspects (such as muscular strength and vigor in fighting) and psychological aspects (such as boldness, focused motivation and active self-awareness of status). In currently available food contest paradigms, where the mice compete with bodily interaction, the physical and psychological aspects are intermingled in the interpretation of the mice's winning or losing. In the FPCT, the opponents are isolated from each other so that the role of direct bodily interaction in the competition is minimized, which helps expose the psychological factors contributing to the establishment and/or expression of social status. In this study, the overall stable ranking results across the FPCT, tube test and warm spot test indicate that the animals' sense of status is part of a comprehensive self-recognized identity of individuals in an established mouse colony.

      There are issues with use of the English language throughout the text. Some sentences are difficult to understand and should be clarified and/or synthesized.

      We thank the reviewer for pointing out language issues. We have carefully corrected the grammar errors.

      Open question:

      Is food restriction mandatory? Is a palatable food pellet not sufficient to trigger competition? Food restriction has numerous behavioral and physiological consequences that would ideally be avoided so that behavioral outcomes in the FPCT can be interpreted clearly (see for example Tucci et al., 2006).

      We thank the reviewer for raising this question. In preliminary experiments, we found that food restriction was necessary and that a palatable food pellet alone was not sufficient to trigger competition. To limit the potential influence of food restriction on competitive behavior, the mice underwent only a 24-hour food deprivation period at the beginning of training, followed by mild restriction of food supply that still met their basic energy requirements.

      Conclusive remarks:

      Although this protocol attempts to provide a novel approach to evaluate social ranks in mice, it is not clear how it really brings a significant advance in neuroscience research. The FPCT dynamic is very similar to the one observed in the tube test, where mice compete to navigate forward in a narrow space, constraining the opponent to go backward. The main difference between the FPCT and the tube test is the presence of food between the opponents. In the tube test, a food reward was initially used to increase motivation to cross the tube and push the opponent upon the testing day. This component has been progressively abandoned, precisely because it was not necessary for the mice to compete in the tube.

      This paper would really bring a significant contribution to the field by providing a neuronal imaging or manipulation correlate to the behavioral outcome obtained by the application of the FPCT.

      We thank the reviewer for this comment on the significance of the FPCT paradigm. We think it is of interest to report that the ranking results were consistent across the FPCT, tube test and warm spot test. This finding indicates that the animals' sense of status might be part of a comprehensive self-recognized identity of individuals in an established social colony. 

      Moreover, we are conducting research on the biological consequences and mechanisms of social competition. We hope to publish the results of this ongoing project in the near future.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors have devised a novel assay to measure relative social rank in mice that is aimed at incorporating multiple aspects of social competition while minimizing direct contact between animals. Forming a hierarchy often involves complex social dynamics related to competitive drives for different fundamental resources including access to food, water, territory, and sexual mates. This makes the study of social dominance and its neural underpinnings hard, warranting the development of new tools and methods that can help understand both social functions as well as dysfunction.

      Strengths:

      This study showcases an assay called the Food Pellet Competition Test (FPCT), in which cagemate mice compete for food without direct contact by pushing a block in a tube from opposite directions. The authors have attempted to quantify motivation to obtain the food independent of other factors such as age, weight and sex by running the assay under two conditions: one where the food is accessible and one where it isn't. The assay yields impressively consistent outcomes across days for pair-housed females and males and for male groups of three. Further, the determined social ranks correlate strongly with two common assays: the tube test and the warm spot test.

      Weaknesses:

      This new assay has limited ethological validity, since mice do not naturally compete for food without touching each other and with a block in the middle. In addition, the assay may only be valid for a single trial per day, limiting its utility for neural recordings and manipulations to a single sample per mouse per day. Although the authors attempt to measure motivation as a factor driving who wins the social competition, the data are limited. This novel assay requires training across days, with some mice reaching criterion before others. From the data reported, it is unclear what effects training can have on the outcome of social competition. Beyond the data shown, the language used throughout the manuscript and the rationale for the design of this novel assay are difficult to understand.

      We thank the reviewers for their valuable comments on the strengths and weaknesses of our manuscript. 

      The design rationale of the FPCT was to (1) provide researchers with a new food competition paradigm and (2) expose psychological factors influencing the establishment and/or expression of social status in mice by avoiding direct physical competition between contenders (see revised Abstract and the last paragraph of the Introduction).

      As a result, the consistent ranking across the FPCT, tube test and warm spot test might indicate that the animals' sense of status is part of a comprehensive self-recognized identity of individuals in an established social colony. 

      We suggest performing the FPCT as one trial per mouse per day, as the mice might lose interest in the food pellet if tested repeatedly within a day; however, it is practical to run the assay over several days. 

      Regarding training, we suggest 4-5 days, as we used. In this revision, we have added training data showing how the mice's latency to obtain the food changed over the course of training (Figure 1). By the last day of training, the mice went directly to push the block and eat the food after entering the arena.

      We thank the reviewer for pointing out language issues. We have carefully corrected the errors.

      Reviewer #3 (Public review):

      Summary:

      The laboratory mouse is an ideal animal to study the neural and psychological underpinnings of social dominance behavior because of its low cost and the animals' readiness to display dominant and subordinate behaviors in simple and testable environments. Here, a novel method for measuring dominance and the individual social status of mice is presented using a food competition assay. Historically, food competition assays have been avoided because they occur in an open arena or the home cage, where it can be difficult to assess who gets priority access to the resource and to avoid aggressive interactions such as bite wounding. Now, the authors have designed a narrow rectangular arena separated in half by a sliding floor-to-ceiling obstacle, where the mice placed at opposite sides of the obstacle compete by pushing the obstacle to gain priority access to a food pellet resting on the arena floor under the obstacle. One can also place the food pellet within the obstacle to restrict priority access to the food and measure the time or effort spent pushing the obstacle back and forth. As hypothesized, the outcomes in the food competition test were significantly consistent with those of the more common tube test (space competition) and warm spot competition test. This suggests that these animals have a stereotypic dominance organization that exists across multiple resource domains (i.e., food, space, and temperature). Only male and female C57 mice in same-sex pairs or triads were tested.

      Strengths:

      The design of the apparatus and the inclusion of females are significant strengths within the study.

      Weaknesses:

      There are at least two major weaknesses of the study: neglecting the value of test inconsistency and not providing the mice time to recognize who they are competing with.

      Several studies have demonstrated that although inbred mice in laboratory housing share similar genetics and environment, they can form diverse types of hierarchical organizations (e.g., loose, stable, despotic, linear, etc.) and there are multiple resource domains in the home cage that mice compete over (e.g., space, food, water, temperature, etc.). The advantage of using multiple dominance assays is to better understand the nuances of hierarchical organizations. For example, some groups may have clear dominant and subordinate individuals when competing for food, but the individuals may "change or switch" social status when competing for space. Indeed, social relationships are dynamic, not static. Here, the authors have provided another test to measure another dimension of dominance: food competition. Rather than highlight this advantage, the authors highlight that the test is in agreement with the standard tube test and warm spot test and that C57 mice have stereotypic dominance across multiple domains. While some may find this great, this framing may lead many to keep using only the tube test (which measures the dimension of space competition) and avoid measuring food competition. If the reader looks at Figures 6E, F, and G, they will see examples of inconsistency across the food competition test, tube test, and warm spot test in triads of mice. These groups are quite interesting and demonstrate the diversity of social dynamics in groups of inbred mice in highly standardized environmental conditions. Scientists interested in dominance should study groups that are consistent and inconsistent across multiple dimensions of dominance (e.g., space, food, mates, etc.).

      Unlike the tube test and warm spot test, the food competition test presented here provides no opportunity for the animals to identify their opponent. That is, they cannot sniff their opponent's fur or anogenital region, which would allow them an opportunity to identify them individually. Thus, as the authors state, the test only measures psychological motivation to get a food reward. Notably, the outcome in the direct and indirect testing of food competition is in agreement, leaving many to wonder whether they are measuring the social relationship or the effort an individual puts forth in attaining a food reward regardless of the social opponent. Specifically, in the direct test, an individual can retrieve the food reward by pushing the obstacle out of the way first. In the indirect test, the animals cannot retrieve the reward and can only push the obstacle back and forth, which contains the reward inside. In Figure 4E, you can see that winners spent more time pushing the block in the indirect test. Thus, whether the test measures a social relationship or just the likelihood of gaining priority access to food is unclear. To rectify this issue, the authors could provide an opportunity for the animals to interact before lowering the obstacle and raising(?) a food reward. They may also create a very long one-sided apparatus to measure the amount of effort an individual mouse puts forth in the indirect test with only one individual, or any situation with just one mouse where the moving obstacle is not pushed back and the animal can just keep pushing until it stops. This would require another experiment. It also may not tell us much more, since it remains unclear whether inbred mice can individually identify one another (see https://doi.org/10.1098/rspb.2000.1057 for more details).

      A minor issue is that the write-up of the history of food competition assays and female dominance research is inaccurate. Food competition assays have a long history since at least the 1950s and many people study female dominance now.

      Food competition: https://doi.org/10.1080/00223980.1950.9712776, https://psycnet.apa.org/fulltext/1953-03267-001.pdf, https://doi.org/10.1016/j.bbi.2003.11.007, https://doi.org/10.1038/s41586-022-04507-5

      Female dominance history: https://doi.org/10.1016/j.cub.2023.03.020, https://doi.org/10.1016/S0031-9384(01)00494-2, https://doi.org/10.1037/0735-7036.99.4.411

      We thank the reviewers very much for so many helpful comments and suggestions.

      In this manuscript, we wanted to highlight the overall, average consistency of ranking results across the FPCT, tube test and warm spot test as an unexpected finding. We agree that the inconsistency in social ranking between trials and between paradigms should not be ignored. In the revision, we have added a description and discussion of the inconsistencies among the test paradigms (paragraph 2 of section 3 in the Results; last two sentences of paragraph 4 in the Discussion).

      Although the two opponents were separated from each other, they were able to see and sniff each other because the block is transparent, there are holes in the lower portion of the block, and there is a gap between the block and the chamber (Supplementary Figures 1 and 2). In the female but not the male groups, the presence of a cagemate opponent during test 1 significantly disturbed the female mice and increased their latency to get the food, compared with the last day of training when there was no opponent (Figure 3A). This indicates that a mouse, at least a female mouse, can detect the presence of the opponent on the opposite side of the chamber. To further examine whether the social relationship influenced the readouts of the FPCT, we performed additional experiments in which two groups of non-cagemate mice competed. We did not detect obviously different ranks between the two groups (Figure 1H-1J), suggesting that establishment of a social colony is necessary for the FPCT to distinguish the social ranks of mice.

      We thank the reviewer for reminding us to acknowledge the history of food competition assays. We have added citations and discussion of the related literature on both male (paragraph 2 in the Introduction; paragraph 3 in the Discussion) and female (paragraph 1 of section 3 in the Results; paragraph 4 in the Discussion) mice.

      Reviewer #1 (Recommendations for the authors):

      There are issues with use of the English language throughout the text. Some sentences are difficult to understand and should be clarified and/or synthesized.

      We thank the reviewer for the constructive comments and helpful corrections.

      “Despite that 6 in 9 groups of mice display some extent of flipped ranking (Figures 6B-6G) and only 3 in 9 groups displayed continuously unaltered ranking (Figure 6H) during a total of 9 trials consisting of 3 trials of FPCT, 3 trials of tube test and 1 trial of WST, an obvious stable linear intragroup hierarchy was observed throughout all the trials and tasks"

      The above sentence has been re-written as: The ranking result showed that 6 in 9 groups of mice displayed some extent of flipped ranking (Figures 4B-4G), and only 3 in 9 groups displayed continuously unaltered ranking (Figure 4H). Averagely, in the totally 27 trials consisting of 12 trials of FPCT, 12 trials of tube test and 3 trials of WST, an obvious stable linear intragroup hierarchy was observed across all the trials and tasks (paragraph 1 of section 4 in the Results).

      "it is hard to attribute winning a competition in a shared space to stronger motivation rather than muscular superiority".

      The above sentence has been deleted and re-written in paragraph 1 of section 4 in the Results and paragraph 3 in the Discussion.

      "Unexpectedly, in most of the trials the mice preserved the winner or loser identity acquired in FPCT into tube test and WST (Figures 5L-5O)".

      Why is this unexpected? Instead, this result appears to be expected (the tube test has been successfully applied to identify ranks in females; see Leclair et al., eLife, 2021).

      We thank the reviewer for raising this point. The FPCT differs from the tube test and warm spot test in at least two respects: competition for food vs. space, and absence vs. presence of direct bodily interaction during competition. Some mice might be active in food competition but not in space competition, while others might show the opposite pattern. Some mice might be good at physical contests, while others might be better at using tricks. These differences led us to expect task-specific ranking outcomes.

      Vocabulary issues:

      "Stereotypic", used to describe rank stability in a different context, does not look appropriate. In behavioral neuroscience, stereotypy usually refers to abnormal repetitive behaviors. The stability that the authors seem to indicate with the word "stereotype" refers rather to the concept of "consistency" or "stability".

      We thank the reviewer for this detailed explanation. We have chosen to use "stability" to describe the data.

      "Society", to talk about groups or colonies of animals sounds a bit odd. Society evokes more abstract concepts more likely to fit with human organization. I suggest the use of "group" or "colony".

      "Hide" to qualify the block preventing access to the food pellet. It is said that the block is transparent. We suggest the use of "inaccessible" instead of hidden.

      We strongly encourage the authors to further edit the entire manuscript to improve the language.

      We thank the reviewer for the kind corrections. We have corrected the vocabulary misuse noted above.

      Technical issues / typos:

      Figure 1. The picture does not seem optimal to visualize the apparatus.

      Missing unit legend in Figure 4E.

      Supplementary videos 2 and 4 are missing.

      We have added a frontal view of the apparatus in the figure (Supplementary Figure 1), added a unit to the Figure 2F (previous Figure 4E), and we will make sure to upload the missing videos.

      Reviewer #2 (Recommendations for the authors):

      While the assay shows promise as a tool for studying social dominance, the study suffers from some limitations such as lack of ethological relevance. In addition, there is a lack of rationale and methodological clarity in the manuscript that can impact the ability of other scientists to be able to perform this novel assay.

      (1) Related to lack of scientific rigor:

      a. In the first paragraph of the introduction, the authors mention that "disability in social recognition and unsatisfied social status are associated with brain diseases such as autism, depression and schizophrenia". Both papers that they cited refer to mouse models, not humans (which is the species that is attributed these diagnoses clinically). In addition, neither citation discusses schizophrenia. While social dysfunctions can indeed be related to these diseases, to my knowledge this is not caused by a change in "social status" and there is no human data with patient populations and social status. Therefore, this sentence is inaccurate and there is no research that demonstrates that.

      We thank the reviewer for raising this point. To express our view and cite the literature more accurately, we revised the sentence in the 1st paragraph of the Introduction as follows: “Impaired awareness of social competition has been documented in individuals with autism spectrum disorder (ASD)4,5, and reduced social interaction has been characterized in corresponding animal models6. Similarly, maladaptive responses to social status loss has been associated with patient depressive disorders7,8 and animal models of depression1,9”. The reviewer is right that no disease has been shown to be causally related to social status; only depression has been proposed to be associated with changes in social status7,8.

      b. In the second paragraph of the introduction, the authors mention a scarcity of research papers with designs for food competition-based social hierarchy assays for mice. At least two such papers have been published in the past few years (DOIs https://doi.org/10.1038/s41586-021-04000-5 and https://doi.org/10.1038/s41586-022-04507-5). The authors should acknowledge the existence of these and other assays and discuss how their work would be related. In the same paragraph, they also mention that existing assays suffer from "hierarchy instability" and "complex calculations" without showing any citations or details for these claims.

      We thank the reviewer for raising this point. We acknowledge that several food competition assays are available to measure social hierarchy in mice. However, relative to space competition, food competition tests have not been used as commonly or widely, and no food competition paradigm has been accepted as generally as space competition paradigms such as the tube test and warm spot test. To improve the language and scientific expression, we revised the sentences as follows: “Relative to space competition, food competition tests for mice have been designated and applied less commonly in animal studies despite its long history 28-30. Several issues could be thought to be the underlying limitations for the application of food competition paradigms. First, there are methodological issues in some of these approaches, such as long video recording duration and difficulty in analyzing animal’s behaviors during competitive physical interaction in videos, hindering their application by laboratories that cannot afford sophisticated equipment and analysis”. The corresponding citations have been updated (see paragraph 3 in the Introduction).

      c. The authors say that their study is the first to demonstrate that female mice follow social ranks. This is not the first study to do so and the authors should acknowledge existing publications that have done the same (eg DOI https://doi.org/10.7554/eLife.71401).

      We have followed the reviewer’s suggestion to add citations regarding the social ranking of female mice tested with competition paradigms, especially food competition paradigms (see paragraph 1 of section 3 in the Results; paragraph 4 in the Discussion).

      (2) Related to problems with interpretation of data:

      a. The authors showed that the assay works for females and males in pairwise housing, but two mice don't make a hierarchy, as hierarchies require a minimum of three individuals. Therefore, whether the assay works for females caged in groups of three is an important question that is unaddressed in this study and is a caveat. The authors extended the competition assay to male mice that are housed in cages of three. It would be important to show whether the assay generalizes well to female mice in this three-animal housing, as well as to discuss the effect of using even bigger groups of mice on the results of the assay.

      We thank the reviewer for raising questions related to the interpretation of the data and for the insightful suggestions. We agree that it is interesting and important to test whether the FPCT works for groups of three female mice. Although the social rankings of pairs of male and female mice were not significantly different (new Figures 2D-2F and 3F-3H), those of triads of male and female mice could differ. We have tested triads of male mice and found that they displayed an overall linear hierarchical ranking. We would like to use the FPCT to investigate the rankings of triads of female mice, and of even bigger groups of mice, in the future. In the present manuscript, we focus on the feasibility of applying the FPCT to smaller groups. In the Discussion, we have added a comment on the effect of group size on social competition tests (see paragraph 4 in the Discussion).

      b. The authors claim that "test 2" of their assay helps assert the motivation of mice for social competition as in Figure 4E. This could simply be a readout of how strong the mice are (muscle mass). To claim that this is indeed related to motivation during the FPCT assay, the authors should show the correlation of this readout with the latency to push the block during the social competition task.

      We thank the reviewer for raising this question. The dimensions establishing social structures include physical and psychological factors. In the FPCT paradigm, the two contenders are separated, so physical factors are minimized and psychological factors should play a more important role in competition than in previously reported food competition paradigms. Therefore, in the revised manuscript we attribute the ranking results mainly to psychological factors, rather than to motivation alone, which is just one of numerous psychological factors (paragraph 3 of the Discussion). Moreover, in the Discussion we point out that we cannot exclude a contribution of physical factors to competitive outcomes, since some pairs of mice pushed the block simultaneously (paragraph 3 of the Discussion).

      c. The authors mention that they are interested in understanding which factors lead to the outcome of the competition, such as age, sex, physical strength, training level, and intensity of psychological motivation. However, in all their runs of the assay, they always matched these variables between the competitors. They should clarify that they were instead controlling for these variables. Another thing to note here is that while they controlled the body mass of the animals, that isn't the same as physical strength, as a lighter mouse can have more muscle mass than a heavier mouse. They should either specify this limitation or quantify the additional metric of "muscle mass", which is a much better proxy for physical strength. Thus, the claim that the outcome of the competition is solely affected by motivation is not convincing, since they did not rule out other factors, for example by quantifying the rate of learning during training and physical strength.

      We thank the reviewer for raising this question. As in our response to point (b), we acknowledge that it is not accurate to ascribe the outcomes of the FPCT to psychological motivation alone. In the revised manuscript, the contributing factors to the outcomes of the FPCT have been simplified to physical and psychological factors. We consider that psychological factors could be the main driver of the mice's behavior in the FPCT (see paragraph 3 of the Discussion).

      d. In the discussion, the authors mention that their task only requires a single day of food deprivation (the day before the first trial) while other assays suffer from a continued food deprivation protocol. However, the authors also use 10g per cage as the amount of food instead of giving them ad libitum access. Limited food is a food deprivation method. Thus, this is an inaccurate claim.

      We thank the reviewer for raising this point. We have clarified the food restriction requirement for the FPCT in the revision. To enhance the appeal of the food pellet, the mice were deprived of food for 24 hours with free access to water. After the 24 hours of food deprivation, each cage of mice was given 10 g of food every morning to meet their daily food requirements until the end of the test (see the FPCT procedure section in Methods and materials).

      e. In the second section of the results, the authors run their assay with female mice that are housed in cages of two. This section suffers from the same limitations as the first and can be improved by showing the training data, correlations of competition outcome with "motivation", and ruling out the other factors that could contribute to the outcome. Further, the authors' claim that their FPCT assay by itself is enough to show that female mice follow a social hierarchy is weak. They should instead include their cross-validation with the other assays to strengthen it.

      We thank the reviewer for raising this question. We have taken the reviewer’s suggestion and shown the training data (Figures 1E, 2A and 3A). As the factors contributing to the outcomes of the FPCT are diverse, we do not attempt to control for or identify the exact contributing factor in the current manuscript. We agree with the reviewer that cross-validation with different paradigms is advisable in studies ranking social hierarchy, as ranking results can vary with tasks, procedures and operations.

      f.  In the last paragraph of the introduction, the authors mention how their assay involves "peaceful competition" since the mice are not in direct contact and hence cannot exhibit aggression. The authors do not address the limitation that a lack of physical contact actually makes the assay less ethological. Further, since the mice are housed in groups of two and three, it is not guaranteed that the mice will not be aggressive during their time in the home cage, which could affect their behavior during the competition assay. Whether the assay causes more aggression in the cage due to the lack of physical contact during the competition is not addressed in this study.

      We thank the reviewer for raising this point. Diverse factors affect the outcomes of a food competition test, some psychological and others physical. We agree that a lack of physical contact makes the assay less ethologically natural. However, once social status has been established by housing a group of mice together for sufficient time, the win/lose outcomes in the FPCT could serve as a readout of the expression of social status, since the mice cannot exhibit aggression during the test. We have revised the Introduction and Discussion accordingly (paragraph 3 of the Discussion). Thank you.

      (3) Related to lack of methodological rigor and rationale clarity:

      a. In the first section of the results, the authors run their assay with male mice that are housed in cages of two. While the data that they display are promising, we do not see how mice change behavior across days of training and how that relates to the outcome of the competition. It would be valuable to also show the training data for the mice, answering questions related to competency and any inter-animal variability prior to rank assessment. Plotting the training data across all days would be helpful for the other parts of the results as well. This is especially important because the methods mention that mice are trained until they reach the criterion, which means that different individuals receive different amounts of training.

      We thank the reviewer for highlighting the importance of showing training data. We have taken the reviewer’s suggestion and shown the training data (Figures 1E, 2A and 3A).

      b.  It is unclear why the assay was run only once per mouse pair per day since most protocols for the tube test involve multiple repetitions each day while alternating the side from which the mice enter. The authors should address whether a single trial per day is enough to show consistent results and that it wouldn't vary with more.

      Under the mild food restriction, training and test procedures used in this manuscript, we suggest running the FPCT once or twice per mouse per day. More frequent testing might gradually diminish the mice’s interest in the food pellet, because food is not fully deprived. In our data, the outcomes of the FPCT over 4 consecutive days were overall stable.

      c. In the results, the authors say that they "raised 3 male mice", which may be incorrect because the methods report that the mice were purchased, and they housed all their mice for only three days before running the assay, which might be too little time for the hierarchy to stabilize. The authors should comment on the range of cohabitation durations across different cages and whether it had an impact on the results.

      According to our experiments, housing the mice together for 3 days is enough to establish a social colony with a relatively stable status structure. Prolonged housing may produce a similar, more stable, or more dynamic social colony.

      d. There are also some formatting and/or convention issues in the results. The first figure callout in the results is for Figure 4 instead of Figure 1 (which is the standard). This is because the authors do not explain how the mice are trained for the task in the results section and show limited data about the training of the task. Not showing comprehensive training data would make replication of this study very difficult.

      We thank the reviewer for raising this point. We have rearranged the figures. The new arrangement begins with a schematic drawing of the FPCT procedure and the training data (Figure 1).

      e. The authors don't report the exact p-values in the figures.

      We have reported the significance levels in the figures of the revised manuscript. Thank you.

      (4) The writing of the manuscript suffers from a lack of clarity in most sections of the manuscript.

      Here are several examples that are critical:

      a. In the title and abstract, it isn't clear what the authors mean by "stereotype". It could be a behavior during the competition, or that the social ranks across assays are correlated or that the rank for the new assay is consistent across days.

      b. There are several instances where the authors anthropomorphize mice using human features such as "urbanization" and "society" which are not established factors affecting mouse hierarchy. This further extends to anthropomorphizing mice in ways that are not standard such as an animal being "timid" or "bold" which would be hard to measure in mice, if not impossible.

      c. Across the social dominance literature, relative social rank is described using more general "dominant" and "subordinate" titles instead of "superior" and "inferior" that are sometimes used in the manuscript. The authors should follow the standard language so that readers understand.

      d. In the third paragraph of the introduction, the authors say "Thus, it is more likely expected that different paradigms to weigh the social competency and status may lead to diverse readouts, given that competitive factors are included in competition paradigms." This sentence suffers from multiple syntax errors, thereby reducing clarity.

      e. There are several typos in the manuscript such as using "dominate" instead of "dominant", "grades" instead of "outcomes" and "forth" instead of "fourth", to give a few examples.

      We thank the reviewer for the careful reading of the manuscript and the very helpful comments. We have taken the above suggestions and improved the writing of the manuscript. For example, "stereotype" was replaced by “stability”, mouse "society" was replaced by "colony", and the sentence “Thus, it is more.... in competition paradigms” has been deleted.

      Reviewer #3 (Recommendations for the authors):

      (1) The justification for the design of this new test paradigm is unclear. In the abstract, you state that the field needs a reliable, valid, and easily executable test. Your test provides this, as you state, but how is it better than the tube test? Does the tube test suffer from task-specific win-or-lose outcomes? Can you provide evidence for this? The Nature Protocols article on the tube test (https://doi.org/10.1038/s41596-018-0116-4) states: "strongly suggest using more than two dominance measures, for example, by also carrying out the warm spot test, or territory urine marking or ultrasonic courtship vocalization assays." This would suggest that results from the tube test can be task-specific, but I am not convinced that you have demonstrated that results from your food competition test are not task-specific. Indeed, by your title, one must run multiple tests.

      This same problem is apparent in the introduction. In the second paragraph, there is a discussion of the tube test, warm spot test, and food competition tests. What is the problem with these tests?

      I believe that social dominance relationships are complex and dynamic social relationships indicating who has priority access to a resource among multiple animals that live together. In these living situations, several resources are often competed over: for example, space, food, mates, temperature, etc. Currently, we have tests to measure space via the tube test or urine marking, mates via ultrasonic vocalization, temperature via the warm spot test, and food via food competition assays. The tube test, urine marking assay, and ultrasonic vocalization test have been demonstrated to be reliable, valid, and easily executable. However, food competition assays are often difficult to execute because it is difficult to interpret the dominant behaviors, and aggressive behaviors like bite wounding can occur during the test. Here, you present a new food competition assay to address these issues and show that it can be used in conjunction with other assays to measure social dominance across multiple resources easily. In doing so, you revealed that many same-sex groups of C57 mice have a stereotypic pattern of dominance behavior when competing across multiple types of resources: space, temperature, and food.

      I ask that you please rebut if you disagree with me, and adjust your abstract, introduction, and discussion accordingly.

      We thank the reviewer for all the constructive comments. We have adjusted the Abstract, Introduction and Discussion of the manuscript.

      We recognize the value of the tube test, warm spot test and many other competition tests, including food competition tests. The tube test and warm spot test are space competition tasks. Relative to space competition, food competition tests for mice have been designed and applied less commonly in animal studies. Several issues (such as methodological difficulties, aggressive behaviors occurring during competition, and prolonged food deprivation) may have limited the application of food competition paradigms (paragraph 3 in the Introduction). Therefore, we clarify that the justification for the design of the FPCT was “to have a new choice of food competition paradigm for mice, and to facilitate the exposure of psychological aspects contributing to the winning/losing outcomes in competitions” (last paragraph in the Introduction).

      The FPCT differs from the tube test and warm spot test in at least two ways. The FPCT is a food competition task in which the mice need no physical contact during competition, whereas the tube test and WST are space competition tasks in which the mice have direct physical contact during competition. Therefore, we expected inconsistent evaluations of competitiveness and rankings when comparing the FPCT with the commonly available competition paradigms, the tube test and WST (last paragraph in the Introduction).

      (2) The design of the test needs to be described before the results. You can either move the methods section before the results or add a paragraph in the introduction to better describe the test. Here, you can also reference Figures 1 through 3 so that the figures are presented in the order in which they are mentioned in the paper. (It is very confusing that the first reference to a figure is Figure 4, when it should be Figure 1.)

      We thank the reviewer for raising this point and for the suggestions. We have added a new section (section 1) to the Results. In the revised manuscript, the figures in the Results start with Figure 1, which shows a schematic drawing of the FPCT procedure, the training data and some test results.

      (3)  The sentence describing Figure 4H. You argue that this shows that the mice are well and equally trained. It also shows that they have the same motivation or preference for the food.

      We thank the reviewer for this helpful comment. The data in previous Figures 4H and 5I are now presented as new Figures 2A and 3A, respectively, in the revised manuscript. This retrospective analysis of the training data showed similar levels of food-getting training and similar craving for food (sections 2 and 3 in the Results).

      (4)  "Social ranking of multiple cagemate mice using FPCT, tube test and WST"

      Here, you claim that "comparison of inter-task consistency revealed that the ranks evaluated by FPCT, tube test and WST did not differ from each other...Figure 6K." Okay, however, it is important to discuss the three cases when there wasn't consistency between the tests! Figure 6E-G.

      We thank the reviewer for raising this point. In the revised manuscript, we have added a description and discussion of the cases in which the different test paradigms were inconsistent (paragraph 2 in section 3 of the Results; last two sentences of paragraph 4 in the Discussion).

      (5)  Replace all instances of "gender" with "sex". Animals do not have a gender.

      (6)  Adjust the strain of the mice to C57BL/6JNifdc.

      We have replaced "gender" with "sex" and “C57BL/6J” with “C57BL/6JNifdc”. Thank you for your careful correction.

      (7)  What is the justification for running the warm spot test for one day and the other tests for four days?

      From the consecutive FPCT and tube tests, we already knew that the ranking results were stable overall. This stability was also observed on the day of the warm spot test. A drawback of repeated warm spot tests is that the mice experience considerable stress from exposure to an ice-cold environment. We therefore ended the competition testing after a single trial of the warm spot test.

      (8)  Grammar

      The second sentence of the abstract: ...recognized as a valuable...

      Results, sentence after "...was observed (Figure 4G)." it should be "Fourth"

      We have corrected these and other grammar errors. We appreciate the reviewers for very careful review and all helpful comments.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      The authors survey the ultrastructural organization of glutamatergic synapses by cryo-ET and image processing tools using two complementary experimental approaches. The first approach employs so-called "ultra-fresh" preparations of brain homogenates from a knock-in mouse expressing a GFP-tagged version of PSD-95, allowing Peukes and colleagues to specifically target excitatory glutamatergic synapses. In the second approach, direct in-tissue (using cortical and hippocampal regions) targeting of the glutamatergic synapses employing the same mouse model is presented. In order to ascertain whether the isolation procedure causes any significant changes in the ultrastructural organization (and possibly synaptic macromolecular organization) the authors compare their findings using both of these approaches. The quantitation of the synaptic cleft height reveals an unexpected variability, while the STA analysis of the ionotropic receptors provides insights into their distribution with respect to the synaptic cleft.

      The main novelty of this study lies in the continuous claims by the authors that the sample preservation methods developed here are superior to any others previously used. This leads them as well to systematically downplay or directly ignore a substantial body of previous cryo-ET studies of synaptic structure. Without comparisons with the cryo-ET literature, it is very hard to judge the impact of this work in the field. Furthermore, the data does not show any better preservation in the so-called "ultra-fresh" preparation than in the literature, perhaps to the contrary as synapses with strangely elongated vesicles are often seen. Such synapses have been regularly discarded for further analysis in previous synaptosome studies (e.g. Martinez-Sanchez 2021). Whilst the targeting approach using a fluorescent PSD95 marker is novel and seems sufficiently precise, the authors use a somewhat outdated approach (cryo-sectioning) to generate in-tissue tomograms of poor quality. To what extent such tomograms can be interpreted in molecular terms is highly questionable. The authors also don't discuss the physiological influence of 20% dextran used for high-pressure freezing of these "very native" specimens.

      Lastly, a large part of the paper is devoted to image analysis of the PSD which is not convincing (including a somewhat forced comparison with the fixed and heavy-metal staining room temperature approach). Despite being a technically challenging study, the results fall short of expectations. 

      Our manuscript contains a discussion of both conventional EM and cryoET of synapses. We apologise if we have omitted referencing or discussing any earlier cryoET work. This was certainly not our intention, and we include a more complete discussion of published cryoET work on synapses in our revised manuscript.

      The reviewer is concerned that the synaptic vesicles in some synapse tomograms are “stretched” and that this may reflect poor preservation. We would like to point out that such non-spherical synaptic vesicles have also been reported previously in cryoET of primary neurons grown on EM grids (Tao et al., J. Neurosci., 2018). Indeed, there is no reason per se to suppose that synaptic vesicles are always spherical, and many diverse families of proteins that shape membrane curvature are expressed at the synapse (BAR domain proteins, synaptotagmin, epsins, endophilins and others). We will add further discussion of this issue in the revised manuscript.

      The reviewer regards ‘cryo-sectioning’ as outdated and cryoET data from these preparations as “poor quality”. We respectfully disagree. Preparing brain tissues for cryoET is generally considered to be challenging. The first successful preparation of such samples, by Zuber et al. (Proc. Natl. Acad. Sci., 2005), who prepared cryo-sections (CEMOVIS) of in vitro brain cultures, predates the cryoEM resolution revolution brought about by electron-counting detectors. We followed this technique to prepare tissue cryo-sections for cryoET in our manuscript. More recently, cryoFIB-SEM lift-out has been developed as an alternative method to prepare tissue samples for cryoET (Mahamid et al., J. Struct. Biol., 2015), and this method has only recently become available to more laboratories. Both techniques introduce damage, as has been described (Han et al., J. Microsc., 2008; Lucas et al., Proc. Natl. Acad. Sci., 2023). Importantly, no like-for-like, quantitative comparison of these two methodologies has yet been performed. We have recently demonstrated that the molecular structure of amyloid fibrils within human brain is preserved down to the protein fold level in samples prepared by cryo-sectioning (Gilbert et al., Nature, 2024). We will add further detail on the process by which we excluded poor-quality tomograms from our analysis, which is described in our methods section.

      The reviewer asks what the physiological effect is of adding 20% w/v ~40,000 Da dextran. This is a reasonable concern, since this could in principle exert osmotic pressure on the tissue sample. While we did not investigate this ourselves, earlier work (Zuber et al., 2005) showed that this concentration of dextran neither damaged cell membranes nor had any detectable effect on cell structure.

      The reviewer is not convinced by our analysis of the apparent molecular density of macromolecules in the postsynaptic compartment that in conventional EM is called the postsynaptic density. However, the reviewer provides no reasoning for this assessment nor alternative approaches that could be attempted. We would like to add that we have tested multiple different approaches to objectively measure molecular crowding in cryoET data, that give comparable results. We believe that our conclusion – that we do not observe an increased molecular density conserved at the postsynaptic membrane, and that the PSD that we and others observed by conventional EM does not correspond to a region of increased molecular density - is well supported by our data.  We and the other reviewers consider this an important and novel observation.

      Reviewer #2 (Public review)

      Summary: 

      The authors set out to visualize the molecular architecture of adult forebrain glutamatergic synapses in a near-native state. To this end, they use a rapid workflow to extract and plunge-freeze mouse synapses for cryo-electron tomography. In addition, the authors use knock-in mice expressing PSD95-GFP to perform correlated light and electron microscopy, allowing clear identification of pre- and postsynaptic membranes. By thorough quantification of tomograms from plunge- and high-pressure-frozen samples, the authors show that the previously reported 'post-synaptic density' does not occur at high frequency and is therefore not a defining feature of a glutamatergic synapse.

      Subsequently, the authors are able to reproduce the frequency of the post-synaptic density when preparing conventional electron microscopy samples, indicating that its prevalence is an artifact of sample preparation. The authors go on to describe the arrangement of cytoskeletal components, membranous compartments, and ionotropic receptor clusters across synapses.

      Demonstrating that the frequency of the post-synaptic density in prior work is likely an artifact and not a defining feature of glutamatergic synapses is significant. The descriptions of the distributions and morphologies of proteins and membranes in this work may serve as a basis for future investigation by readers interested in these features.

      Strengths: 

      The authors perform a rigorous quantification of the molecular density profiles across synapses to determine the frequency of the post-synaptic density. They prepare samples using two cryogenic electron microscopy sample preparation methods, as well as one set of samples using conventional electron microscopy methods. The authors can reproduce previous reports of the frequency of the post-synaptic density by conventional sample preparation, but not by either of the cryogenic methods, thus strongly supporting their claim. 

      We thank the reviewer for their generous assessment of our manuscript.

      Reviewer #3 (Public review): 

      Summary: 

      The authors use cryo-electron tomography to thoroughly investigate the complexity of purified, excitatory synapses. They make several major interesting discoveries: polyhedral vesicles that have not been observed before in neurons; analysis of the intermembrane distance, and a link to potentiation, essentially updating distances reported from plastic-embedded specimen; and find that the postsynaptic density does not appear as a dense accumulation of proteins in all vitrified samples (less than half), a feature which served as a hallmark feature to identify excitatory plastic-embedded synapses. 

      Strengths: 

      (1) The presented work is thorough: the authors compare purified, endogenously labeled synapses to wild-type synapses to exclude artifacts that could arise through the homogenization step, and, in addition, analyse plastic-embedded, stained synapses prepared using the same quick workflow, to ensure their findings have not been caused by the purification of the synapses. Interestingly, the 'thick lines of PSD' are evident in most of their stained synapses.

      (2) I commend the authors on the exceptional technical achievement of preparing frozen specimens from a mouse within two minutes.

      (3) The approaches highlighted here can be used in other fields studying cell-cell junctions.

      (4) The tomograms will be deposited upon publication, which will enable neurobiologists and researchers from other fields to carry out data evaluation in their own areas of expertise. This is valuable because tomography is still a specialized skill, and the authors collected and reconstructed over 100 excellent tomograms of synapses, a wealth of information that can also be used in future studies.

      (5) The authors have identified ionotropic receptor positions and that they are linked to actin filaments, and appear to be associated with membrane and other cytosolic scaffolds, which is highly exciting.

      (6) The authors achieved their aims to study neuronal excitatory synapses in great detail, were thorough in their experiments, and made multiple fascinating discoveries. They challenge dogmas that have been in place for decades and highlight the benefit of implementing and developing new methods to carefully understand the underlying molecular machines of synapses.

      Weaknesses: 

      The authors show informative segmentations in their figures, but none have been overlaid on the tomograms in the submitted videos. It would help a broad audience evaluate the data if these could be viewed together as videos, allowing readers to study the tomograms and extract more information. Deposition of the segmentations associated with the tomograms would be tremendously helpful to neurobiologists, cryo-ET method developers, and others seeking to push the boundaries.

      Impact on community: 

      The findings presented by Peukes et al. pertaining to synapse biology change dogmas about the fundamental understanding of synaptic ultrastructure. The work presented by the authors, particularly the associated change of intermembrane distance with potentiation and the distinct appearance of the PSD as an irregular amorphous 'cloud' will provide food for thought and an incentive for more analysis and additional studies, as will the discovery of large membranous and cytosolic protein complexes linked to ionotropic receptors within and outside of the synaptic cleft, which are ripe for investigation. The findings and tomograms available will carry far in the synapse fields and the approach and methods will move other fields outside of neurobiology forward. The method and impactful results of preparing cryogenic, unlabelled, unstained, near-native synapses may enable the study of how synapses function at high resolution in the future.

      We thank the reviewer for their supportive assessment of our manuscript.  We thank the reviewer for suggesting overlaying segmentations with videos of the raw tomographic volumes. We will include this in our revised manuscript.

      Reviewer #1 (Recommendations for the authors): 

      Major comments: 

      (1) The previous literature on synaptic cryo-ET studies is systematically ignored. The results presented here (and their novelty) must be compared directly with this body of work, rather than with classical EM.

      Our submitted manuscript included a 3-paragraph discussion of earlier synaptic cryoET studies, albeit we apologize that a seminal citation was missing, which we have corrected in our revised manuscript. We have now also included an additional brief discussion related to several more recent cryoET studies (see citations below) that were published after our pre-print was first deposited in 2021.

      (1) Held, R.G., Liang, J., and Brunger, A.T. (2024). Nanoscale architecture of synaptic vesicles and scaffolding complexes revealed by cryo-electron tomography. Proc. Natl. Acad. Sci. 121, e2403136121. https://doi.org/10.1073/pnas.2403136121.

      (2) Held, R.G., Liang, J., Esquivies, L., Khan, Y.A., Wang, C., Azubel, M., and Brunger, A.T. (2024). In-Situ Structure and Topography of AMPA Receptor Scaffolding Complexes Visualized by CryoET. bioRxiv, 2024.10.19.619226. https://doi.org/10.1101/2024.10.19.619226.

      (3) Matsui, A., Spangler, C., Elferich, J., Shiozaki, M., Jean, N., Zhao, X., Qin, M., Zhong, H., Yu, Z., and Gouaux, E. (2024). Cryo-electron tomographic investigation of native hippocampal glutamatergic synapses. eLife 13, RP98458. https://doi.org/10.7554/elife.98458.

      (4) Glynn, C., Smith, J.L.R., Case, M., Csöndör, R., Katsini, A., Sanita, M.E., Glen, T.S., Pennington, A., and Grange, M. (2024). Charting the molecular landscape of neuronal organisation within the hippocampus using cryo electron tomography. bioRxiv, 2024.10.14.617844. https://doi.org/10.1101/2024.10.14.617844.

      We discuss the above papers in our revised manuscript with the following:

      “Since submission of our manuscript, several reports have described cryoET of synapses within cultured primary neurons (Held et al., 2024a, 2024b) and mouse brain (Glynn et al., 2024; Matsui et al., 2024) prepared by cryoFIB milling. These new datasets are largely consistent with the data reported here. CryoFIB-SEM has the advantage of overcoming the local knife damage caused by cryo-sectioning but introduces amorphization across the whole sample that diminishes the information content (Al-Amoudi et al., 2005; Lovatt et al., 2022; Lucas and Grigorieff, 2023). We have recently shown that cryoET data are capable of revealing subnanometer-resolution in-tissue protein structure from vitreous cryo-sections (Gilbert et al., 2024), and near-atomic structures within cryo-sections have recently been demonstrated (Elferich et al., 2025).”

      Although there is variation between individual synapses, PSDs are clearly visible in several previous cryo-ET studies (even if it's not as striking as in heavy-metal stained samples). In fact, although the contrast of the images is generally poor, PSDs are also visible in several examples shown in Figure 1 - Supplement 3. Not being able to detect them seems more of a problem of the workflow used here than of missing features. The authors should also discuss why heavy-metal stains would accumulate on a non-existing structure (PSD) in conventional EM.

      We agree that apparent higher molecular density can be observed in example tomographic data of earlier cryoET studies. We also report individual examples of similar synapses in our dataset. A key strength of our approach is that we have assessed the molecular architecture of large numbers of adult brain synapses acquired by an unbiased approach (solely guided by PSD95 cryoCLEM), which indicate that a higher molecular density proximal to the postsynaptic membrane is not a conserved feature of glutamatergic synapses in the adult brain. There is no rationale for our cryoCLEM approach being a ‘problem of the workflow’.

      The reviewer misunderstands the weaknesses of conventional/room-temperature EM workflows (including resin embedding and freeze substitution). It is unavoidable that most proteins are damaged by denaturation and/or washed away when samples are washed in organic solvents (methanol/acetone, which directly denature most proteins) during tissue preparation for conventional EM. It is therefore conceivable that in such preparations a relative increase in contrast proximal to the postsynaptic membrane (‘PSD’) would appear if cytoplasmic proteins were washed away during these harsh organic solvent washing steps, leaving only those denatured proteins that are tethered to the postsynaptic membrane. It is not that the PSD is absent in cryoEM; rather, this difference in molecular crowding is not evident when tissues are imaged directly by cryoEM without the harsh sample preparation required for conventional/room-temperature EM.

      (2) Whether the synapses examined here are in a more physiological state than those analyzed in other papers remains absolutely unclear. For example, the quality of the tomographic slice shown in Figure 1C is poor, with the majority of synaptic vesicles looking suspiciously elongated. 

      We addressed this in our public reviews.

      (3) How were actin filaments segmented and quantified (e.g. for Fig 1E)? Apart from actin, can the authors show some examples of other macromolecular complexes (e.g. ribosomes) that they are able to identify in synapses (based on the info in supplementary tables)? Also, the mapping of glutamatergic receptors is not convincing, as the molecules were picked manually. To analyze their distribution, they should be mapped as comprehensively as possible by e.g. template matching.

      Actin filaments identified by ~7 nm diameter with ~70° branch points were manually segmented in IMOD. The number of filaments was counted per postsynaptic compartment. We have amended the methods section to include this description.

      “In the PoSM, F-actin formed a network with ~70° branch points (Figure 1–figure supplement 1C), likely formed by Arp2/3, as expected (Pizarro-Cerdá 2017, Fäßler 2020). Putative filament copy number in the PoSM was estimated by manual segmentation in IMOD.” Manual picking was validated by the quality of the subtomogram average, which, although it reached only modest resolution (25 Å), is consistent with the identification of ionotropic glutamate receptors.

      (4) In the section "Synaptic organelles" the authors should provide some general information on the average number and size of synaptic vesicles (for the in-tissue tomograms).

      We have provided this information in the methods section:

      “The average diameter of synaptic vesicles was 40.2 nm and the minimum and maximum dimensions ranged from 20 to 57.8 nm, measured from the outside of the vesicle that included ellipsoidal synaptic vesicles similar to those previously reported (Tao et al., 2018).” A detailed survey of the presynaptic compartment, including the number of presynaptic vesicles was not the focus of our manuscript. We have deposited all tomograms from our dataset for any further data mining.

      Can the "flat tubular membranes compartments" be attributed to ER? The angular vesicles certainly have a typical ER appearance, as such morphology has been seen in several cryo-ET studies of neuronal and non-neuronal cells.

      In neuronal cells we regard it as unsafe to describe an intracellular organelle as endoplasmic reticulum on the basis of morphology alone (e.g., smooth ER, described widely in conventional EM) because of the apparent diversity of distinct organelles. As described in our methods section, we can be confident that a membrane compartment is ER when we observe ribosomes tethered to the membrane. In instances where flat/tubular membranes did not have associated ribosomes, we take the cautious view that there is not sufficient evidence to define these as ER.

      Importantly, polyhedral vesicles were distinct from the flat/tubular membranes that resembled ER and are at present organelles of unknown identity. It will be important in future experiments to determine what are the protein constituents of these distinct organelle types to understand both their functions and how these distinct membrane architectures are assembled.

      Therefore, the sentences in lines 198-199 are simply wrong. Additionally, features of even higher membrane curvature are common in the ER (e.g. Collado et al., Dev Cell 2019). 

      We thank the reviewer for bringing our attention to this excellent paper (Collado et al.). We agree that the sentence describing the curvature being higher than all other membranes except mitochondrial cristae is wrong. We have removed this sentence in the revised manuscript.

      (5)The quality of the tomographic data for the in-tissue sample is low, likely due to cryo-sectioning-induced artifacts, as extensively documented in the literature. Additionally, the authors used 20% dextran as cryo-protectant for high-pressure freezing, which contrasts with statements like those in lines 342-344. Given that several publications describing the in-tissue targeting of synapses (e.g. from Eric Gouaux's lab) are available, the quality of the tomographic data presented in this work is underwhelming and limits the conclusions that can be drawn, not providing a solid basis for future studies of in-tissue synapse targeting. However, the complete workflow (excluding the sectioning part) can be adapted for a cryo-FIB approach. The authors should discuss the limitations of their approach. 

      Our manuscript preprint was deposited in bioRxiv several years before Matsui/Gouaux’s recent eLife paper that reported a novel workflow for in-tissue cryoET. It is difficult to directly compare data from our approach and Matsui/Gouaux’s because the latter reported a dataset of only 3 tomograms. Note also that Matsui/Gouaux followed our approach of using 20% dextran 40,000 as a cryo-preservative. The use of 20% dextran 40,000 as a cryo-protectant was first established by Zuber et al., 2005 (PMID: 16354833) and shown to avoid hyper-osmotic pressure and cell membrane rupture. However, Matsui/Gouaux additionally included 5% sucrose in their cryoprotectant. We did not include sucrose as a cryo-preservative because it exerts osmotic pressure and was not necessary to achieve vitreous tissues in our workflow.

      Before high-pressure freezing, Matsui/Gouaux also incubated tissue slices in a HEPES-buffered artificial cerebrospinal fluid (that included 2 mM CaCl2 but did not include glucose as an energy source) for 1 h at room temperature to label AMPA receptors with Fab fragment-Au conjugates. Under these conditions, neurons can elicit both physiological and excitotoxic action potentials (even though AMPARs were themselves antagonised with ZK-200775). The absence of glucose is a concern, and it is unclear to what extent tissue viability is affected by this incubation step. In contrast, we chose to use an NMDG-based artificial cerebrospinal fluid for slice preparation and high-pressure freezing that is a well-established method for preserving neuronal viability (Ting et al., 2018).

      We addressed the supposed limitations of cryo-sectioning versus cryoFIB-SEM in our public response. In particular, we have recently shown that cryo-sectioning produced a subnanometer-resolution in-tissue structure of a protein, which has so far only been achieved for ribosomes in cryoFIB-SEM sample preparations. A discussion of cryo-sectioning versus cryoFIB-SEM must be informed by new data that directly compare these methods, which is not the subject of our eLife paper. We also cite a recent preprint directly comparing cryoFIB-milled lamellae with cryo-sections and showing that near-atomic-resolution structures can also be obtained from the latter sample preparations (Elferich et al., 2025).

      (6) The authors show (in Supplementary) putative tethers connecting SV and the plasma membrane. Is it possible to improve the image quality (e.g. some sort of filtering or denoising) so that the tethers appear more obvious? Can the authors observe connectors linking synaptic vesicles? 

      We have tested multiple iterative reconstruction and denoising approaches, including SIRT and noise2noise filtering in Isonet. We observed instances of macromolecular complexes linking one synaptic vesicle with another. However, there was no question we sought to answer by performing a quantitative analysis of these linkers.

      (7) Figure 4F is missing. 

      Thank you for spotting this omission. We have corrected this in the revised manuscript.

      (8) Most quantifications lack statistical analyses. These need to be included, and only statistically significant findings should be discussed. Terms like "significantly" (e.g. Line 144) should only be used in these cases.

      We used the term ‘significantly’ in the Results section (lines 143 and 166 in the revised text), where we cite Figures 1H and 2F, which show analyses in which we did perform statistical tests (t-tests with Bonferroni correction) comparing the voxel intensities in regions of the cytoplasm that are proximal versus distal to the postsynaptic membrane. We have amended the main text to include the details of the statistical test that we performed. We also neglected to include a description of the statistical test in line 241, which cites Figure 3G. We have corrected this in the revised text.

      Minor comments: 

      (1) Can the authors comment on why only 1-2 grids are prepared per mouse brain (in M&M -section)?

      We prepared only two grids in order to have prepared samples within 2 minutes, to limit deterioration of the sample.

      (2) Figure 1 Supplement 2 and its legend are confusing (averaging of non-aligned versus aligned post-synaptic membrane). Can the authors describe more clearly their molecular density profile analysis?

      We apologise that this figure legend was insufficient. We have included a detailed description of our molecular density profile analysis in the methods section entitled ‘Molecular density profile analysis’. In the revised manuscript we have now also cited this methods section in the Figure 1–figure supplement 2 legend.

      (3) Please clarify with higher precision the areas were recorded in relation to the fluorescent spots (e.g. Figures 3A-C).

      We have included a white rectangular annotation in the cryoCLEM inset panels of Figures 3A-C to indicate the field of view of each corresponding tomographic slice. This shows that PSD95-GFP puncta localise to the postsynaptic compartments in each tomogram.

      (4) Figure 4 Supplement 2D is not clear: the connection between receptors and actin should be shown in a segmentation.

      We agree with the reviewer. A ‘connection’ is not clear, which is expected because the cytoplasmic domain of ionotropic glutamate receptor subunits is composed of a non-globular/intrinsically disordered sequence. We have amended our description of the proximity of the actin cytoskeleton to ionotropic glutamate receptor clusters in the main text, replacing “associated with” with “adjacent to”.

      (5) Line 341: the reference is referred to by a number (56) at the end of the sentence, rather than by name.

      Good spot. We have corrected this in the revised manuscript.

      (6) Line 968: tomograms is misspelled. 

      Good spot. We have corrected this error (line 1018 in our revised manuscript).

      Reviewer #2 (Recommendations for the authors): 

      (1) On page 11: "The position of (i)onotropic receptor...". 

      Good spot. We have corrected this.

      (2) On page 13: "Slightly higher relative molecular density..." this line ends with a citation to reference '56', but the works cited are not numbered.

      Good spot. We have corrected this in the revised manuscript.

      (3) On page 46: "as described in (69)..." the works cited are not numbered. 

      Good spot. We have corrected this in the revised manuscript.

      Reviewer #3 (Recommendations for the authors): 

      (1) The title does not do the work justice. The authors make many exciting discoveries, e.g. PSD appearance, new polyhedral vesicles, ionotropic receptor positions, and intermembrane distance changes even within the synaptic cleft, but title their manuscript "The molecular infrastructure of glutamatergic synapses in the mammalian forebrain". It is also a bit misleading, since one would have expected more molecular detail and molecular maps as part of the work, so the authors may think about updating the title to reflect their exciting work. 

      We thank the reviewer for recognising the exciting discoveries in our manuscript. Summarising all these in a title is challenging. We intend ‘molecular infrastructure’ to mean a structure composed of many molecules including proteins (by analogy ‘transport infrastructure’ is composed of many roads, ports and train lines).

      (2) It would be in the spirit of eLife and open science if the authors could submit their segmentations alongside the tomographic data to either EMPIAR or pdb-dev (if they accept it) or the new CZII cryoET data portal for neurobiologists, method developers, and others to use. 

      We agree with the reviewer. We have deposited the subtomogram-averaged map of the AMPA receptor in EMDB, and all tilt series and 4x-binned tomographic reconstructions described in our manuscript (Figure 1–table 1 and Figure 2–table 2), together with segmentations, in EMPIAR.

      (3) Methods: the authors establish an exciting new workflow to get from living mice to frozen specimens within 2 minutes and perform many unique analyses that would be useful to different fields. Their methods section overall is well described and contains criteria and details that should allow others to apply experiments to their scientific problems. However, it would be very helpful to expand on the methods in the 'annotation and analysis [...]' and "Subtomogram averaging" sections, to at least in short describe the steps without having to embark on a reference journey for each method and generally provide more detail. For the annotation section, the software used for annotation is not listed. Table 1 only contains the list of the counts of organelles etc. identified in each tomogram, no processing details. 

      We have revised the methods section ‘annotation and analysis’ including software used (IMOD). We have also included a slightly more detailed description of subtomogram averaging. We did not include ‘processing details’ because there are none - identification of constituents in each tomogram was carried out manually, as described in the methods section.

      (4) Some of the tomograms submitted as videos may have slipped through as an early version, since they appear to originate from imperfectly aligned tilt series; vesicles and membranes can be observed 'rubberbanding'. The authors should go through and check their videos.

      We thank the referee for suggesting we double-check our tomogram videos. All movies are representative tomographic reconstructions from ultra-fresh synapse preparations (Figure 1 – videos 1-7) and synapses in tissue cryo-sections (Figure 2 – videos 1-2). We have double-checked that the videos correspond to tomograms that were aligned as well as possible. In general, tissue cryo-section tomograms reconstructed less well than ultra-fresh synapse tomograms, which, as expected, limits the information content of these data. The reconstructions shown in these videos were therefore all made as well as we could (testing multiple approaches in IMOD and more recent software packages, e.g. AreTomo). While we think it is important to share all tomograms regardless of quality, we were careful to exclude from analysis any tomograms that did not contain sufficient information (as described in the Methods section).

      Minor suggestions: 

      (1) Page 13, line 341, reference 56, but references are not numbered. Please update.

      Good spot. We have corrected this in the revised manuscript.

      (2) Page 33, line 746: the figure legend does not reference the correct figure panels; G-K should be I-K.

      We have amended the Figure 3 legend to “(G-K) Snapshots and quantification of membrane remodeling within glutamatergic synapses”.

      (3) Page 33, line 750; reads 'same as E', but should be 'same as G'. 

      Good spot. We have corrected this in the revised manuscript.

      (4) Page 35, Figure 4: Please use more labels: Figure 4B: it would be helpful to use different colors for each view and match to the tomogram - then non-experts could easily relate the projections and real data; Figure 4C: please label domains; Figure 4F: the figure panel got lost. 

      This is an interesting idea. While our subtomogram average of 2522 subvolumes provided decent evidence that these are ionotropic receptors, we are reluctant to label specific putative domains of individual subvolumes in the raw tomographic slice, because the resolution of the raw tomogram (particularly in the Z-direction) is worse and may not be sufficient to resolve each domain layer definitively. We hope the reviewer appreciates our cautious approach.

      (5) Page 42, line 933: incomplete sentence. 

      Good spot. We have corrected this in the revised manuscript.

      (6) Page 46, line 1038; Reference 69 is in brackets, but references are not numbered. Please update.

      Good spot. We have corrected this in the revised manuscript.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Note: The original preprint version of our manuscript was reviewed by three subject experts for Review Commons. All three reviewers’ comments on the original version of our manuscript have been fully addressed. Their input was extremely valuable in helping us clarify and refine the presentation of our results and conclusions. Their feedback contributed to making the study both more thoroughly developed and more accessible to a broad readership, while preserving its mechanistic depth. We believe that this revised version more effectively highlights the conceptual advances brought by our findings.

      Reviewer #1

      Evidence, reproducibility and clarity

      The manuscript "Key roles of the zona pellucida and perivitelline space in promoting gamete fusion and fast block to polyspermy inferred from the choreography of spermatozoa in mice oocytes" by Dr. Gourier and colleagues explores the poorly understood process of gamete fusion and the subsequent block to polyspermy by live-cell imaging of mouse oocytes with intact zona pellucida in vitro. The new component in this study is the presence of the ZP, which had been removed in prior live-cell imaging studies. This allowed the authors to examine contributions of the ZP to the block to polyspermy in relation to the timing of sperm penetrating the ZP and sperm fusing with the oocyte. By carefully analysing the timing of the cascade of events, the authors find that the first sperm that reaches the membrane of the mouse oocyte is not necessarily the one that fertilizes the oocyte, revealing that other mechanisms post-ZP-penetration influence the success of individual sperm. While the rate of ZP penetration remains constant in unfertilized oocytes, it decreases upon fertilization for subsequent sperm, providing direct evidence for the known 'slow block to polyspermy' provided by changes to the ZP adhesion/ability to be penetrated. Careful statistical analyses allow the authors to revisit the role of the ZP in preventing polyspermy: they show that the ZP block resulting from the cortical reaction is too slow (in the range of an hour) to contribute to the immediate prevention of polyspermy in mice. The presented analyses reveal that the ZP does contribute to the block to polyspermy in two other ways, namely by effectively limiting the number of sperm that reach the oocyte surface in a fertilization-independent manner, and by retaining components such as JUNO and CD9, which are shed from the oocyte plasma membrane after fertilization, in the perivitelline space, which may help neutralize surplus spermatozoa that are already present in the PVS.
Lastly, the authors report that the ZP may also contribute to channeling the flagellar oscillations of spermatozoa in the PVS to promote their fusion competence.

      Major comments:

      • Are the key conclusions convincing?

      The authors provide a careful analysis of the dynamics of events, though the analyses are correlative, and can only be suggestive of causation. While this is a limitation of the study, it provides important analysis for future research. Moreover, by analysing also control oocytes without fertilization and the timing of events, the authors have in some instances clear 'negative controls' for comparison.

      Some claims would benefit from rewording or rephrasing to put the findings better in the context of what is already known and what is novel:

      • the phrasing 'challenging prior dogma' might be too strong since it had been observed before that it is not necessarily the first sperm that gets through the ZP that fertilizes the egg (though I am afraid that I do not have any citations or references for this). However, given that in the field people generally think it is not necessarily and always the first sperm, the authors may want to consider weakening this claim.

      Only real-time imaging of in vitro fertilization of zona pellucida-intact oocytes, as performed in our study, is capable of determining which spermatozoon crossing the zona pellucida fuses with the oocyte. However, such studies are rare, and most do not specifically address this question. Like Reviewers 1 and 3, we have not found any citation or reference stating or showing that it is not necessarily the first spermatozoon to penetrate the zona pellucida that fertilizes the egg. In contrast, at least one reference (Sato et al., 1979) explicitly reports the opposite. If, as suggested by Reviewers 1 and 3, it has indeed been observed before that the first sperm to pass the ZP is not always the one that fertilizes, and if this idea is generally accepted in the field, then it is all the more important that a study demonstrates and publishes this point. This is precisely what our study makes possible. However, in case we have overlooked a previous reference making the same observation as ours, we have removed the phrasing ‘challenging prior dogma’. That being said, the key issue is not so much that it is not necessarily the first spermatozoon penetrating the perivitelline space that fertilizes, but rather why spermatozoa that successfully reach the PVS of an unfertilized oocyte may fail to achieve fertilization. This is one of the central questions our study sought to address.

      • I do think the cortical granule release could still contribute to the block to polyspermy though - as the authors here nicely show - at a later time-point only, and thus not the major and not the immediate block as previously thought. The wording in the abstract should therefore be adjusted (since it could still contribute...)

      We respectfully disagree on this point. The penetration block resulting from cortical granule release progressively reduces the permeability of the zona pellucida to spermatozoa, relative to its baseline permeability prior to sperm–oocyte fusion. Any decrease in this baseline permeability occurring before the fusion block becomes fully effective can contribute to the prevention of polyspermy by limiting the number of sperm that can access the oolemma at a time when fusion is still possible. In contrast, once the fusion block is fully established, limiting the number of spermatozoa traversing the ZP becomes irrelevant to the block to polyspermy, as the fusion block alone is sufficient to prevent additional fertilizations, rendering the penetration block obsolete. The only scenario that could challenge this obsolescence is if the fusion block were transient. In that case, as Reviewer 1 suggests, the penetration block could indeed play a role at a later time-point. However, taken together, our study and that of Nozawa et al. (2018) support the conclusion that this is not the case in mice:

      • Our in vitro study using kinetic tracking shows that the time constant for completion of the fusion block is typically 6.2 ± 1.3 minutes. During this time window, we observe that the permeability of the zona pellucida to spermatozoa does not yet decrease significantly from the baseline level it exhibited prior to sperm–oocyte fusion (see Figures 5B and S1B in the revised manuscript, and Figures 5A and 5B in the initial version). Consequently, before the fusion block is fully established, the penetration block can contribute only marginally, if at all, to the prevention of polyspermy. In contrast, the naturally low baseline permeability of the ZP, which is independent of any fertilization-triggered penetration block, as well as the relatively long timing of fusion ( minutes on average) after sperm penetration into the perivitelline space, are factors that contribute to the preservation of monospermy while the fusion block is still being established.
      • Our in vitro study using kinetic tracking shows that once the fusion block is completed following the first fusion event, no additional spermatozoa are able to fuse with the oocyte until the end of the experiment, 4 hours post-insemination (see blue points and fitting curve in Figure 5C). Meanwhile, one or more additional spermatozoa—most of them motile and therefore viable—are present in the perivitelline space in 50% of the oocytes analyzed (purple point in Figure 5C). This demonstrates that, once established, the fusion block remains effective for at least the entire duration of the experiment, supporting the idea of a fully functional and long-lasting fusion block.
      • Nozawa et al. (2018) found that female mice lacking ovastacin—the protease released during the cortical reaction that renders the zona pellucida impenetrable—are normally fertile. They additionally reported that the oocytes recovered from these females after mating are monospermic despite the systematic presence of additional spermatozoa in the perivitelline space. These findings further support the conclusion that in mice the fusion block is both permanent and sufficient to prevent polyspermy. For all these reasons, we believe that even at a later time-point, the penetration block does not contribute to the prevention of polyspermy in mice.
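      To make the time-constant argument above more concrete, the short Python sketch below assumes, as a simplification that the response itself does not state, that establishment of the fusion block follows single-exponential kinetics with the reported mean time constant τ ≈ 6.2 min. Under that assumption, the fraction of the block still incomplete at time t after the first fusion is exp(-t/τ). The function name `fusion_block_incomplete` is hypothetical and used for illustration only.

```python
import math

# Reported mean time constant for completion of the fusion block (minutes).
TAU_FUSION_BLOCK_MIN = 6.2


def fusion_block_incomplete(t_min: float, tau: float = TAU_FUSION_BLOCK_MIN) -> float:
    """Fraction of the fusion block NOT yet established t_min minutes after the
    first fusion, under an ASSUMED single-exponential model exp(-t / tau)."""
    if t_min < 0:
        raise ValueError("time after first fusion must be non-negative")
    return math.exp(-t_min / tau)


if __name__ == "__main__":
    # Print the incomplete fraction at multiples of tau and at 30 min.
    for t in (0.0, 6.2, 12.4, 18.6, 30.0):
        print(f"t = {t:5.1f} min -> incomplete fraction = {fusion_block_incomplete(t):.3f}")
```

      Under this assumed model, roughly 95% of the block would be in place about 3τ (≈ 19 min) after the first fusion; a different kinetic model would shift these numbers.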

      To clarify that the penetration block does not necessarily contribute to preventing polyspermy, which indeed challenges the commonly accepted view, we have substantially revised the Discussion. Furthermore, Figure 9 from the initial version of the manuscript has been replaced by Figure 8 in the revised version. This new figure provides a more didactic illustration of the inefficacy of the penetration block in preventing polyspermy in mice, by showing the respective impact of the fusion block, the penetration block, fusion timing and the natural baseline permeability of the zona pellucida on the occurrence of polyspermy.

      As for the abstract, it has also been thoroughly revised. The content related to this section is now expressed in a way that emphasizes the factors that actively contribute to the prevention of polyspermy in mice, rather than those with no or marginal contribution (such as the penetration block in this case).

      • release of OPM components - in the abstract it's unclear what the authors mean by this - in the results part it becomes clear. Please already make it clear in the abstract that it is the fertility factors JUNO/CD9 that could bind to sperm heads upon their release and thus 'neutralize' them? I would also recommend not referring to it as 'outer' plasma membrane (there is no 'inner plasma membrane'). Moreover, in the abstract please clarify that this release is happening only after fusion of the first sperm and not all the time. In the abstract it sounds as if this was a completely new idea, but there is good prior evidence that this is in fact happening (as also then cited in the results part) - maybe frame it more as the retention inside the PVS as new finding.

      We thank reviewer 1 for pointing out the lack of precision in the abstract regarding the “components” released from the oolemma, and the fact that our phrasing may have given the impression that the post-fertilization release of CD9 and JUNO is a novel observation. The new observation is that CD9 and JUNO, which are known to be massively released from the oolemma after fertilization, bind to spermatozoa in the perivitelline space. However, we cannot rule out the possibility that other oocyte-derived molecules not investigated here may undergo a similar process. This is why we employed the broader term “components”, which encompasses both CD9 and JUNO as well as potential additional molecules. That said, we acknowledge the lack of precision introduced by this terminology. To address this, we have revised the corresponding sentence in the abstract to better reflect our new findings relative to previous ones, and to eliminate the ambiguity introduced by the word “component”.

      The revised sentence of the abstract reads as follows:

      “Our observation that non-fertilizing spermatozoa in the perivitelline space are coated with CD9 and JUNO oocyte’s proteins, which are known to be massively released from the oolemma after gamete fusion, supports the hypothesis that the fusion block involves an effective perivitelline space-block contribution consisting in the neutralization of supernumerary spermatozoa in the perivitelline space by these and potentially other oocyte-derived factors.”

      Moreover, we cannot state in the abstract that the release of CD9 and JUNO occurs only after the fusion of the first spermatozoon and not before, since some CD9 and JUNO are already detectable in the perivitelline space (PVS) prior to fusion. What our study shows is that, before fertilization, CD9 and JUNO are predominantly localized at the oocyte membrane. In contrast, after fusion (four hours post-insemination), oocyte CD9 is distributed between the membrane and the PVS, and the only JUNO signal detectable in the oocyte is found in the PVS. This is what we describe in the Results section on page 15.

      Regarding the acronym “OPM” in the initial version of the manuscript, although it was defined in the introduction as referring to the oocyte plasma membrane and not the outer plasma membrane (which, indeed, would not be meaningful), we acknowledge that it may have caused confusion to people in the field due to its resemblance to the commonly used meaningful acronym “OAM” for outer acrosomal membrane. To avoid any ambiguity, we have replaced the acronym “OPM” throughout the revised manuscript with the term “oolemma”, which unambiguously refers to the plasma membrane of the oocyte.

      It is unclear to me what the relevance is of dividing the post-fusion/post-engulfment period into different phases as done in Fig 2 (phase 1 and phase 2). For the conclusions of this paper this also seems rather irrelevant and overly complicated, since the authors never come back to it and don't need it (it's not related to the polyspermy block analyses). I would remove it from the main figures and not divide it into those phases, since it distracts from the main focus.

      Sperm engulfment and PB2 extrusion are two processes that follow sperm–oocyte fusion. As such, they are clear indicators that fusion has occurred and that meiosis has resumed. Their progression over time is readily identifiable in bright-field imaging: sperm engulfment is characterized by the gradual disappearance of the spermatozoon head from the oolemma, whereas PB2 extrusion is observed as the progressive emergence of a rounded protrusion from the oocyte membrane (Figure 2 in the initial manuscript and Figure S2 A&B in the revised version). The kinetics of these events, measured from the arrest of “push-up–like” movement of the sperm head against the oolemma —assumed to coincide with sperm-oocyte fusion, as further justified in a later response to Reviewer 1—provide reliable temporal landmarks for estimating the timing of fusion when the fusion event itself is not directly observed in real time (Figure S2 C&D).

      The four landmarks used in this estimation are:

      (i) the disappearance of the sperm head from the oolemma due to internalization (28 ± 2 minutes post-arrest, mean ± SD);

      (ii) the onset of PB2 protrusion from the oolemma (28 ± 2 minutes post-arrest);

      (iii) the moment when the contact angle between the PB2 protrusion and the oolemma shifts from greater than to less than 90° (49 ± 6 minutes post-arrest);

      (iv) the completion of PB2 extrusion (73 ± 10 minutes post-arrest).

      The approach used to determine the fusion time window of a fertilizing spermatozoon from these landmarks is detailed in the “Determination of the Fertilization Time Windows” section of the Materials and Methods. Compared to the initial version of the manuscript, we have added a paragraph explaining the rationale for using the arrest of the push-up–like movement as a reliable indicator for sperm–oocyte fusion and have clarified the description of the approach used to determine fertilization timing.
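      As an illustration only, the four landmarks could be turned into a fusion time window by subtracting each landmark's post-arrest delay from the time at which it was observed. The sketch below is hypothetical: it takes the reported mean ± SD delays, widens each by k standard deviations (k = 2 is an arbitrary choice), and intersects the resulting windows. The landmark keys and `fusion_window` are invented names, and the procedure actually used is the one described in the Materials and Methods, which may differ.

```python
from typing import Dict, Optional, Tuple

# Post-arrest delays (mean, SD) in minutes, as reported for landmarks (i)-(iv).
LANDMARK_DELAYS: Dict[str, Tuple[float, float]] = {
    "head_internalized": (28.0, 2.0),   # (i)
    "pb2_onset": (28.0, 2.0),           # (ii)
    "pb2_angle_below_90": (49.0, 6.0),  # (iii)
    "pb2_complete": (73.0, 10.0),       # (iv)
}


def fusion_window(observed: Dict[str, float], k: float = 2.0) -> Optional[Tuple[float, float]]:
    """HYPOTHETICAL back-calculation of a fusion time window.

    `observed` maps a landmark key to the time (minutes post-insemination) at
    which it was seen. Each landmark t gives [t - (mean + k*SD), t - (mean - k*SD)];
    windows from all observed landmarks are intersected. Returns None if the
    windows do not overlap (inconsistent observations).
    """
    lo, hi = float("-inf"), float("inf")
    for name, t in observed.items():
        mean, sd = LANDMARK_DELAYS[name]
        lo = max(lo, t - (mean + k * sd))
        hi = min(hi, t - (mean - k * sd))
    if lo > hi:
        return None
    return lo, hi


if __name__ == "__main__":
    # PB2 onset seen at 90 min and PB2 extrusion completed at 140 min.
    print(fusion_window({"pb2_onset": 90.0, "pb2_complete": 140.0}))  # (58.0, 66.0)
```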

      The timed characterization of sperm engulfment and PB2 extrusion kinetics is highly relevant to the analysis of the penetration and fusion blocks; however, we agree that it belongs in the Supplementary Information rather than in the main text. In accordance with the reviewer’s recommendation, this section has therefore been moved to Supplementary Information SI2.

      For the statistical analysis, I am not sure whether the assumption "assumption that the probability distribution of penetration or fertilization is uniform within a given time window" is in fact true since the probability of fertilizing decreases after the first fertilization event.... Maybe I misunderstood this, but this needs to be explained (or clarified) better, or the limitation of this assumption needs to be highlighted.

      During in vitro fertilization experiments with kinetic tracking, each oocyte is observed sequentially in turn. As a result, sperm penetration into the perivitelline space or fusion with the oolemma may occur either during an observation round or in the interval between two rounds. In the former case, penetration or fusion is directly observed in real time, allowing high temporal precision in determining the moment of the event. In contrast, when penetration or fusion occurs between two observation rounds, the precise timing cannot be directly determined; we can only ascertain that the event took place within the time window we have determined. Because we do not know the exact moment within a given penetration or fusion time window at which the event occurred, there is no reason to favor one time over another. This justifies the assumption that all time points within the window are equally probable. This explanation has been added to the section "Statistical treatment of penetration and fertilization chronograms to study the kinetics of fertilization, penetration block and fusion block" of the main text and to the section "Statistical treatment of penetrations and fertilizations chronograms to study penetration and fusion blocks" of the Materials and Methods.
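      A minimal sketch of how the uniform-within-window assumption can be used: each event known only to have occurred between two observation rounds contributes a uniform cumulative distribution over its window, and summing these gives the expected cumulative number of events (the chronogram) at any time t. The assumption concerns only the unknown timing inside each window, not the overall probability of fertilization, which may still change after a first fusion. The function name is hypothetical, and the manuscript's own equations may be formulated differently.

```python
from typing import List, Tuple


def expected_cumulative_count(windows: List[Tuple[float, float]], t: float) -> float:
    """Expected number of events that have occurred by time t.

    Each (start, end) window bounds one event whose exact time is unknown and
    is ASSUMED uniformly distributed within the window. A zero-width window
    (start == end) represents an event observed directly in real time.
    """
    total = 0.0
    for start, end in windows:
        if end < start:
            raise ValueError("window end precedes start")
        if t >= end:
            total += 1.0  # event certainly occurred by time t
        elif t > start:
            total += (t - start) / (end - start)  # uniform CDF inside the window
    return total


if __name__ == "__main__":
    # One event seen in real time at 10 min, two known only within windows.
    events = [(10.0, 10.0), (12.0, 20.0), (15.0, 25.0)]
    for t in (8.0, 12.0, 16.0, 20.0, 25.0):
        print(t, expected_cumulative_count(events, t))
```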

      -Suggestion for additional experiments:

      If I understood correctly, the onset of fusion in Fig 2C is defined by the stopping of sperm beating? If it is defined by the sudden stop of the beating flagellum, it should be confirmed that this criterion correctly defines the time-point of fusion in this situation (with the ZP intact), since, as far as I understand, this has not been measured in this set-up before. To obtain those numbers (time from fusion to end of engulfment) accurately, the authors would need to measure fusion directly, e.g. by pre-loading the oocyte with Hoechst so that the dye is transferred to the fusing sperm upon membrane fusion.

      The nuclear dye Hoechst is widely used as a marker of gamete fusion, as it transfers from the ooplasm—when preloaded with the dye—into the sperm nucleus upon membrane fusion, thereby signaling the happening of the fusion event. This technique is applicable in the context of in vitro fertilization using ZP-free oocytes. However, it is not suitable when cumulus–oocyte complexes are inseminated, as is the case in both in vitro experimental conditions of the present study (standard IVF and IVF with kinetic tracking). Indeed, when cumulus–oocyte complexes are incubated with Hoechst to preload the oocytes, the numerous surrounding cumulus cells also take up the dye. Consequently, upon insemination, spermatozoa acquire fluorescence while traversing and dispersing the cumulus mass—before reaching the ZP—thus rendering Hoechst labeling ineffective as a specific marker of membrane fusion. This remains true even under optimized conditions involving brief Hoechst incubation of cumulus–oocyte complexes ( Nonetheless, we have strong evidence supporting the use of the arrest of sperm movement as a surrogate marker for the moment of fusion. In our previous study (Ravaux et al., 2016; ref. 4 in the revised manuscript), we investigated the temporal relationship between the abrupt cessation of sperm head movement on the oolemma—resulting from strong flagellar beating arrest—and the fusion event, using ZP-free oocytes preloaded with Hoechst. That study revealed a temporal delay of less than one minute between the cessation of sperm oscillations and the actual membrane fusion, thereby supporting the conclusion that in ZP-free oocytes, the arrest of vigorous sperm movement at the oolemma is a reliable indicator of the moment at which fusion occurs. In the same study, the kinetics of sperm head internalization into the ooplasm were also characterized, typically concluding within 20–30 minutes after movement cessation. 
These findings are fully consistent with our current observations in ZP-intact oocytes, where sperm head engulfment was completed approximately 24 ± 3 minutes after the arrest of sperm oscillations. Taken together, these results strongly support the conclusion that, in both ZP-free and ZP-intact oocytes, the arrest of sperm movement is a reliable indicator of the fusion event. This assumption formed the basis for our determination of fertilization time points in the present study.

      These justifications were not fully detailed in the original version of the manuscript. We have addressed this in the revised version by explicitly presenting this rationale in the Materials and Methods section under Determination of the Fertilization Time Windows.

      Fig 8: 2 comments

      • To better show JUNO/CD9 pre-fusion attachment to the oocyte surface and post-fusion loss from the oocyte surface (but persistence in the PVS), an image after removal of the ZP (both for pre-fertilization and post-fertilization) would be helpful - the combination of those images with the ones you have (ZP intact) would make your point more visible.

      We have followed this recommendation. Figure 8 of the initial manuscript has been replaced by Figure 6 in the revised manuscript, which illustrates the four situations encountered in this study: fertilized and unfertilized oocytes, each with and without unfused spermatozoa in their PVS. To better show the pre-fusion presence of JUNO/CD9 at the oocyte plasma membrane, as well as their post-fusion partial (for CD9) and near-complete (for JUNO) loss from the oocyte membrane (but persistence in the PVS), paired images of the same oocyte before and after ZP removal are now provided, both for unfertilized (Figure 6A) and fertilized oocytes (Figure 6C).

      • You show that the heads of spermatozoa post fusion are covered in CD9 and JUNO, yet I was missing an image of sperm in the PVS pre-fertilization (which should then not yet be covered).

      As staining and confocal imaging of the oocytes were performed 4 hours after insemination, images of sperm in the PVS of an oocyte “pre-fertilization” cannot be strictly obtained. However, we can have images of spermatozoa present in the PVS of oocytes that remained unfertilized. This situation, now illustrated in Figure 6B of the revised manuscript, shows that these spermatozoa are also covered in JUNO and CD9, which they may have progressively acquired over time from the baseline presence of these proteins in the PVS of unfertilized oocytes. This also may provide a mechanistic explanation for their inability to fuse with the oolemma, and, consequently, for the failure of fertilization in these oocytes.

      Minor comments:

      • The videos were remarkable to look at, and great to view in full. However, for the sake of time, the authors might want to consider cropping them for the individual phases to have a shorter video (with clear crop indicators) with the most important different stages visible in a for example 1 min video (e.g. video.

      We have followed this recommendation. The videos have been cropped and annotated in order to highlight the key events that support the points made in the result section from page 9 to 11 in the revised manuscript.

      • In general, given that the ZP, PVS and oocyte membrane are important components, a general scheme at the very beginning outlining the relative positioning of each before and during fertilization (and then possibly also including the second polar body release) would be extremely helpful for the reader to orient themselves.

      A general scheme addressing Reviewer 1’s request, summarizing the key components and concepts discussed in the article and intended to help guide the reader, has been added to the introduction of the revised manuscript as Figure 1.

      • first header results "Multi-penetration and polyspermy under in vivo conditions and standard and kinetics in vitro fertilization conditions" is hard to understand - simplify/make clearer (comparison of in vivo and in vitro conditions? Establishing the in vitro condition as assay?)

      The title of the first Results section has been revised in accordance with Reviewer 1’s suggestion. It now reads: Comparative study of penetration and fertilization rates under in vivo and two distinct in vitro fertilization conditions.

      • Large parts of the statistical analysis (the more technical parts) could be moved to the methods part since it disrupts the flow of the text.

      In the revised version of our manuscript, we have restructured this part of the analysis to ensure that more technical or secondary elements do not disrupt the flow of the main text. Accordingly, the equations have been reduced to only what is strictly necessary to understand our approach, their notation has been greatly simplified, and the statistical analysis of unfertilized oocytes whose zona pellucida was traversed by one or more spermatozoa has been moved to the Supplementary Information (SI1).

      • To me, one of the main conclusions was given in the text of the results part, namely that "This suggests that first fertilization contributes effectively to the fertilization-block, but less so to the penetration block". I would suggest that the authors use this conclusion to strengthen their rationale and storyline in the abstract.

      We agree with Reviewer 1’s suggestion. Accordingly, we have thoroughly revised not only our abstract but also the introduction and discussion, in order to better highlight the rationale of our study, its storyline, and the new findings, which not only challenge certain established views but also open new research directions in the mechanisms of gamete fusion and polyspermy prevention.

      • Wording: To characterize the kinetics with which penetration of spermatozoa in the PVS falls down after a first fertilization," falls down should be replaced with decreases (page 10 and page 12)

      “Falls down” has been replaced with “decreases” in the revised version.


      Significance

      Overall, this manuscript provides very interesting and carefully obtained data which provides important new insights particularly for reproductive biology. I applaud the authors on first establishing the in vivo conditions (how often do multiple sperm even penetrate the ZP in vivo) since studies have usually just started with in vitro condition where sperm at much higher concentration is added to isolated oocyte complexes. Thank you for providing an in vivo benchmark for the frequency of multiple sperm being in the PVS. While this frequency is rather low (somewhat expectedly, with 16% showing 2-3 sperm in the PVS), this condition clearly exists, providing a clear rationale for the investigation of mechanisms that can prevent additional sperm from entering.

      My own expertise is experimental, so I do not have sufficient expertise to evaluate the statistical methods employed here.



      Reviewer #2

      Evidence, reproducibility and clarity

      Overall, this is a very interesting and relevant work for the field of fertilization. In general, the experimental strategies are adequate and well carried out. I have some questions and suggestions that should be considered before the work is published.

      1) Why are the cumulus cells not mentioned with regard to whether the AR is triggered before or while the sperm cross them? It seems the paper assumes from previous work that all sperm that reach ZP and the OPM have carried out the acrosome reaction. This, though probably correct, is still a matter of controversy and should be discussed. It is somewhat strange that the authors did not perform controls using sperm from mice expressing GFP in the acrosome, as they have used in their previous work.

      We do not mention the cumulus cells or whether the acrosome reaction is triggered before, during, or after their traversal (i.e., upon sperm binding to the ZP), as this question, while scientifically relevant, pertains to a distinct line of investigation that lies beyond the scope of the present study. Even with the use of spermatozoa expressing GFP in the acrosome, addressing this question would require a complete redesign of our kinetic tracking protocol, which was specifically conceived to monitor in bright field the dynamic behavior of spermatozoa from the moment they begin to penetrate the perivitelline space of an oocyte. Accordingly, we imaged oocytes that were isolated 15 minutes after insemination of the cumulus–oocyte complexes, by which time most (if not all) cumulus cells had detached from the oocytes, as explained in the fourth paragraph of the Materials and Methods of both the initial and revised versions of the manuscript. The spermatozoa we had access to were therefore already bound to the zona pellucida at the time of removal from the insemination medium, and had thus necessarily passed through the cumulus layer. It is unclear to us why Reviewer 2 believes that we “assume from previous work that all sperm that reach ZP has carried out the acrosome reaction”. We could not find any statement in our manuscript suggesting, let alone asserting, such an assumption, which we know to be incorrect. Based on both published work from Hirohashi’s group in 2011 (Jin et al., 2011, DOI: 10.1073/pnas.1018202108) and our own unpublished observations (both involving cumulus–oocyte masses inseminated with spermatozoa expressing GFP in the acrosome), it is established that only a subset of the spermatozoa reaching the ZP after crossing the cumulus layer have undergone the acrosome reaction.
Moreover, from the same sources, as well as from a recent publication by Buffone’s group (Jabloñsky et al., 2023, DOI: 10.7554/eLife.93792), which is the one to which Reviewer 2 refers in her/his 3rd comment, it is also well established that all spermatozoa have undergone the acrosome reaction by the time they enter the PVS. To the best of our knowledge, this latter point has long been widely accepted and is not questioned. Stating it in the first paragraph of the Discussion of the revised manuscript, with reference to the two aforementioned published studies, therefore seems appropriate. What remains a matter of ongoing debate, however, is the timing and the physiological trigger(s) of the acrosome reaction in fertilizing spermatozoa. The 2011 study by Hirohashi’s group challenged the previously accepted view that ZP binding induces the acrosome reaction, showing instead that most spermatozoa capable of crossing the ZP and fertilizing the oocyte had already undergone the acrosome reaction prior to ZP binding. However, as this issue lies beyond the scope of our study, we do not consider it appropriate to include a discussion of it in the manuscript.

      2) In the penetration block equations, it is not clear to me why $(t_{PF1})$ refers to both $PI_{PF1}$ and $\sigma_{PI_{PF1}}$. Does it mean "as a function of"?

      That is correct: $(t_{PF1})$ means "as a function of the time post-first fertilization". Both the post-first-fertilization penetration index ($PI_{PF1}$) and its uncertainty ($\sigma_{PI_{PF1}}$) vary as a function of this time. However, as mentioned in a previous response to Reviewer 1, this section has been rewritten to improve clarity and readability. The equations have been limited to those strictly necessary for understanding our approach, and their notation has been significantly simplified.

      3) Why do the authors think that the flagellum stops? The submission date was 2024-10-01 07:27:26, and there has been a paper on bioRxiv for a while that merits mention and discussion in this work (bioRxiv [Preprint]. 2024 Jul 2:2023.06.22.546073. doi: 10.1101/2023.06.22.546073. PMID: 37904966).

      Our experimental approach allows us to determine when the spermatozoon stops moving, but not why it stops. We thank Reviewer 2 for pointing out this very relevant paper from Buffone’s group (doi: 10.7554/eLife.93792), which shows the existence of two distinct populations of live, acrosome-reacted spermatozoa. These correspond to two successive states, motile and immotile: in a subset of spermatozoa, the transition from the motile to the immotile state occurs immediately upon the acrosome reaction, whereas in others it occurs after a variable delay. This transition was shown to follow a defined sequence: an increase in the sperm calcium concentration, followed by midpiece contraction associated with a local reorganization of the helical actin cortex, and ultimately the arrest of sperm motility. For fertilizing spermatozoa in the PVS, this transition was shown to occur upon fusion. However, it was also reported to take place within the PVS in some non-fertilizing spermatozoa. These findings are consistent with the requirement for sperm motility to achieve fusion with the oolemma. The fact that some spermatozoa may prematurely transition to the immotile state within the PVS can therefore be added to the list of possible reasons why a spermatozoon that penetrates the PVS of an oocyte might fail to fuse.

      This discussion has been added to the first paragraph of the Discussion section of our revised manuscript.

      4) Please correct the beginning of Materials and Methods: in "Sperm was obtained from WT male mice", it should say "were".

      Thank you, the correction has been done.

      5) This is also the case in the fourth paragraph of this section: it should say "oocytes were", not "was".

      The sentence in question has been modified as follows: “In the in vitro fertilization experiments with kinetic tracking, a subset of oocytes—together with their associated ZP-bound spermatozoa—was isolated 15 minutes post-insemination and transferred individually into microdrops of fertilization medium to enable identification.”


      Significance

      Our understanding of mammalian gamete fusion and polyspermy inhibition remains incomplete. The authors examined real-time brightfield and confocal images of inseminated ZP-intact mouse oocytes and used statistical analyses to accurately determine the dynamics of the events that lead to fusion and to polyspermy prevention, under conditions as physiological as possible. Their kinetic observations of mouse gamete interactions challenge present paradigms, as they document that the first sperm is not necessarily the one that fertilizes, suggesting the existence of other post-penetration fertilization factors. The authors find that the zona pellucida (ZP) block triggered by the cortical reaction is too slow to prevent polyspermy in this species. In contrast, their findings indicate that the ZP contributes directly to the polyspermy block by acting as a naturally effective barrier that prevents components released from the oocyte plasma membrane (OPM) from exiting the perivitelline space (PVS), where they neutralize unwanted sperm fusion, in addition to any block caused by fertilization. Furthermore, the authors unveil an important new role of the ZP in regulating flagellar beat during fertilization, thereby promoting sperm fusion in the PVS.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      SUMMARY: This study by Dubois et al. utilizes live-cell imaging studies of mouse oocytes undergoing fertilization. A strength of this study is their use of three different conditions for analyses of events of fertilization: (1) eggs undergoing fertilization retrieved from females at 15 hr after mating (n = 211 oocytes); (2) cumulus-oocyte complexes inseminated in vitro (n = 220 oocytes); and (3) zona pellucida (ZP)-intact eggs inseminated in vitro, transferred from insemination culture once sperm were observed bound to the ZP, for subsequent live-cell imaging (n = 93 oocytes). This dataset and these analyses are valuable for the field of fertilization biology. The limitations of this manuscript are that challenges arise with some conclusions and with the presentation of the manuscript. There are some factual errors, and also some places where clearer explanations should be provided in the text, potentially augmented with illustrations to provide more clarity on the models that the authors infer from their data.

      MAJOR COMMENTS:

      The authors are congratulated on their impressive collection of data from live-cell imaging. However, the writing in several sections is challenging to understand or seems to be of questionable accuracy. The lack of accuracy is suspected to be more an effect of overly ambitious attempts with writing style than an attempt to mislead readers. Nevertheless, these aspects of the writing should be corrected. There also are multiple places where the manuscript contradicts itself. These contradictions should be corrected. Finally, there are factual points from previous studies that need correction.

      Second, certain claims and the conclusions as presented are not always clearly supported by the data. This may be connected to the issues with writing style, word and phrasing choices, etc. The conclusions could be expressed more clearly, and thus may not require additional experiments or analyses to support them. The authors might also consider illustrations as ways to highlight the points they wish to make. (Figure 7 is a strong example of how they use illustrations to complement the text).

      In response to Reviewer 3's concern about the writing style, which made several sections difficult to understand, we have thoroughly revised the entire manuscript to improve clarity and precision. To further enhance comprehension, we have added illustrations in the revised version of the manuscript:

      • Figure 1A presents the gamete components; Figure 1B depicts the main steps of fertilization considered in the present study; and Figure 1C illustrates the penetration and fusion blocks, along with the respective contributing mechanisms: the ZP-block for the penetration block, and the membrane-block and PVS-block for the fusion block.

      • Figure 2A provides a description of the three experimental protocols used in this study: Condition 1, in vivo fertilization after mating; Condition 2, standard in vitro fertilization following insemination of cumulus-oocyte complexes; and Condition 3, in vitro fertilization with kinetic tracking of oocytes isolated from the insemination medium 15 min after insemination of the cumulus-oocyte complexes.

      • Figure 4 (formerly Figure 7 in the initial version) now highlights all fusing and non-fusing situations documented in videos 1-6 and associated paragraphs of the Results section.

      • In the Discussion, Figure 9 from the original version has been replaced by Figure 8, which now provides a more pedagogical illustration of the inefficacy of the penetration block in preventing polyspermy in mice. This figure illustrates the respective contributions of the fusion block, the penetration block, fusion timing, and the intrinsic permeability of the zona pellucida to the occurrence of polyspermy.

      We hope that this revised version of the article will guide the reader smoothly throughout, without causing confusion.

      Regarding the various points that Reviewer 3 perceives as contradictions or factual errors, or as claims and conclusions that, as presented, are not always supported by the data, we will provide our perspective on each of them as they are raised in the review.

      SPECIFIC COMMENTS:

      (1) The authors should use greater care in describing the blocks to polyspermy, particularly because they appear to wish to reframe views about the prevention of polyspermic fertilization. The title mentions "the fast block to polyspermy"; this is problematic for a couple of different reasons. There is no strong evidence for a block to polyspermy in mammals that occurs quickly, particularly not on the same time scale as the first-characterized fast block to polyspermy. To many biologists, the term "fast block to polyspermy" refers to the block that has been described in species like sea urchins and frogs, meaning a rapid depolarization of the egg plasma membrane. However, such depolarization events of the egg membrane have not been detected in multiple mammalian species. Moreover, the change in the egg membrane after fertilization does not occur on as fast a time scale as the membrane block in sea urchins and frogs (i.e., is not "fast" per se), and instead occurs in a time frame comparable to that of the conversion of the ZP associated with the cleavage of ZP2. Thus, it is misleading to use the terms "fast block" and "slow block" when talking about mammalian fertilization. This also is an instance where the authors contradict themselves in the manuscript, stating, "the membrane block and the ZP block are established in approximatively the same time frame" (third paragraph of Introduction). This statement is indeed accurate, unlike the reference to a fast block to polyspermy in mammals.

      We fully agree with Reviewer 3 on the importance of clearly defining the two blocks examined in the present study (the penetration block and the fusion block, as they are referred to in the revised version) and of situating them in relation to the three blocks described in the literature: the ZP-block, the membrane-block, and the PVS-block. We acknowledge that this distinction was not sufficiently clear in the original version of the manuscript. In the revised version, these two blocks and their relationship to the ZP-, membrane-, and PVS-blocks are now clearly introduced in the second paragraph of the Introduction section and illustrated in the first figure of the manuscript (Fig. 1C). They are then discussed in detail in two dedicated paragraphs of the Discussion, entitled Relation between the penetration block and the ZP-block and Relation between the fusion block and the membrane- and PVS-blocks.

      The penetration block refers to the time-dependent decrease in the number of spermatozoa penetrating the perivitelline space (PVS) following fertilization, whereas the fusion block refers to the time-dependent decrease in sperm-oolemma fusion events after fertilization. Our in vitro fertilization experiments with kinetic tracking give us access precisely to the characterization of these two blocks.

      In this study, as in the literature, fusion-triggered modifications of the ZP that hinder sperm traversal of the ZP are referred to as the ZP-block (also known as ZP hardening). The ZP-block thus contributes to the post-fertilization reduction in sperm penetration into the PVS and thereby underlies the penetration block. Similarly, fusion-triggered alterations of the PVS and the oolemma that reduce the likelihood that spermatozoa that have reached the PVS will successfully fuse with the oolemma are referred to as the PVS-block and membrane-block, respectively. These two blocks act together to reduce the probability of sperm-oolemma fusion after fertilization, and thus contribute to the fusion block.

      The time constant of the penetration block was found to be 48.3 ± 9.7 minutes, which is consistent with the typical timeframe of ZP-block completion (approximately one hour post-fertilization in mice) reported in the literature. By contrast, the time constant of the fusion block was determined to be 6.2 ± 1.3 minutes, markedly shorter than the time typically reported in the literature for completion of the membrane-block (more than one hour in mice). This strongly suggests that the kinetics of the fusion block are not primarily governed by its membrane-block component, but rather by its PVS-block component, about which little to nothing was previously known.
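
      To make the difference between these two time constants concrete, the following Python sketch assumes, as an illustrative simplification that is not necessarily the model used in the manuscript, that each block behaves as a single-exponential decay, P(t) = exp(-t/τ), where t is the time after the first fertilization and P(t) is the probability of penetration (or fusion) relative to its value at t = 0. The function names are hypothetical, and the reported uncertainties (± 9.7 and ± 1.3 minutes) are ignored.

```python
import math

# Time constants reported in this reply (minutes).
TAU_PENETRATION = 48.3  # penetration block
TAU_FUSION = 6.2        # fusion block


def remaining_fraction(t_min: float, tau_min: float) -> float:
    """Relative probability left t_min minutes after the first fertilization,
    ASSUMING a single-exponential decay exp(-t/tau) (illustrative only)."""
    return math.exp(-t_min / tau_min)


def time_to_fraction(fraction: float, tau_min: float) -> float:
    """Minutes needed for the assumed exponential to fall to `fraction`."""
    return -tau_min * math.log(fraction)


if __name__ == "__main__":
    for name, tau in (("penetration", TAU_PENETRATION), ("fusion", TAU_FUSION)):
        half_life = time_to_fraction(0.5, tau)  # equals tau * ln 2
        left_at_30 = remaining_fraction(30.0, tau)
        print(f"{name} block: half-life {half_life:.1f} min, "
              f"fraction remaining at 30 min {left_at_30:.3f}")
```

      Under this assumed model, the half-life (τ ln 2) is about 33 minutes for the penetration block and about 4 minutes for the fusion block; 30 minutes after the first fertilization, roughly 54% of the initial penetration probability would remain, versus less than 1% of the initial fusion probability.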

      Contrary to what Reviewer 3 appears to have understood from our initial formulation, there is therefore no contradiction or error in stating that "the membrane block and the ZP block are established within approximately the same timeframe", while the fusion block, which proceeds much more rapidly, is likely to rely predominantly on the PVS-block. We have thoroughly revised the manuscript to clarify this key message of the study.

      However, we understand Reviewer 3’s objection to referring to the fusion block (or the PVS-block) as a fast block, given that this term is conventionally reserved for the immediate fertilization-triggered membrane depolarization occurring in sea urchins and frogs. Although the kinetics we report for the fusion block are considerably faster than those of the penetration block, they occur on the scale of minutes, and not seconds. In line with the reviewer's recommendation, we have therefore modified both the title and the relevant passages in the text to remove all references to the term fast block in the revised version.

      (2) The authors aim to make the case that events occurring in the perivitelline space (PVS) prevent polyspermic fertilization, but the data that they present are not strong enough to support this conclusion. Additional experiments would be optional for this study, but data from such additional experiments are needed to support the authors' claims regarding these functions in fertilization. Without additional data, the authors need to be much more conservative in the interpretation of their data. The authors have indeed observed phenomena (the presence of CD9 and JUNO in the PVS) that could be consistent with a molecular basis for preventing fertilization by a second sperm. However, the authors would need data from additional experimental studies, such as interfering with the release of CD9 and JUNO and showing that this manipulation leads to increased polyspermy, or creating an experimental situation that mimics the presence of CD9 and JUNO (in essence, what the authors call "sperm inhibiting medium" on page 20) and showing that this prevents fertilization.

      A major section of the Results section here (starting with "The consequence is that ... ") is speculation. Rather than being in the Results section, this should be in the Discussion. The language regarding the roles of these proteins in the perivitelline space should also be softened in other portions of the manuscript, such as the abstract and the introduction.

      Finally, the authors should do more to discuss their results with the results of Miyado et al. (2008), which interestingly, posited that CD9 is released from the oocytes and that this facilitates fertilization by rendering sperm more fusion-competent. There admittedly are two reports that present data that suggest lack of detection of CD9-containing exosomes from eggs (as proposed by Miyado et al.), but nevertheless, the authors should put their results in context with previous findings.

      We generally agree with all the remarks and suggestions made here. In the revised version of the manuscript, we have retained in the Results section (pp. 14–15) only the factual data concerning the localization of CD9 and JUNO in unfertilized and fertilized oocytes, as well as in the spermatozoa present in the PVS of these oocytes. We have taken care not to include any interpretive elements in this section, which are now presented exclusively in a dedicated paragraph of the Discussion, entitled “Possible molecular bases of the membrane-block and ZP-block contributing to the fusion block” (p. 21). There, we develop our hypothesis and discuss it in light of both the findings from the present study and previous work by other groups. In doing so, we also address the data reported by Miyado et al. (2008, https://doi.org/10.1073/pnas.0710608105), as well as subsequent studies by two other groups—Gupta et al. (2009, https://doi.org/10.1002/mrd.21040) and Barraud-Lange et al. (2012, https://doi.org/10.1530/REP-12-0040)—that have challenged Miyado’s findings.

      We are fully aware that our interpretation of the coverage of unfused sperm heads in the perivitelline space (PVS) by CD9 and JUNO, released from the oolemma—as a potential mechanism of sperm neutralization contributing to the PVS block—remains, at this stage, a plausible hypothesis or working model that, as such, warrants further experimental investigation. It is precisely in this spirit that we present it—first in the abstract (p.1), then in the Discussion section (p. 21), and subsequently in the perspective part of the Conclusion section (p. 22).

      (3) Many of the authors' conclusions focus on their prior analyses of sperm interaction - beautifully illustrated in Figure 7. However, the authors need to be cautious in their interpretations of these data and generalizing them to mammalian fertilization as a whole, because mouse and other rodent sperm have sperm head morphology that is quite different from most other mammalian species.

      In a similar vein, the authors should be cautious in their interpretations regarding the extension of these results to mammalian species other than mouse, given data on numbers of perivitelline sperm (ranging from 100s in some species to virtually none in other species), suggesting that different species rely on different egg-based blocks to polyspermy to varying extents. While these observations of embryos from natural matings are subject to numerous nuances, they nevertheless suggest that conclusions from mouse might not be able to be extended to all mammalian species.

      It is not clear to us whether Reviewer 3’s comment implies that we have, at some point in the manuscript, generalized conclusions obtained in mice to other mammalian species—which we have not—or whether it is simply a general, common-sense remark with which we fully agree: that findings established in one species cannot, by default, be assumed to apply to another.

      We would like to emphasize that throughout the manuscript, we have taken care to restrict our interpretations and conclusions to the mouse model, and we have avoided any unwarranted extrapolation to other species.

      To definitively close this matter—if there is indeed a matter—we have added the following clarifying statements in the revised version of the manuscript:

      In the introduction, second paragraph (pp. 2–3):"The variability across mammalian species in both the rate of fertilized oocytes with additional spermatozoa in their PVS (from 0 to more than 80%) after natural mating and the number of spermatozoa present in the PVS of these oocytes (from 0 to more than a hundred) suggests that the time for completion of the penetration block and thus its efficiency to prevent polyspermy can vary significantly between species."

      At the end of the preamble to the Results section (p. 4):"This experimental study was conducted in mice, which are the most widely used model for studying fertilization and polyspermy blocks in mammals. While there are many interspecies similarities, the findings presented here should not be directly extrapolated to humans or other mammalian species without species-specific validation."

      In the Conclusion (p. 22), the first sentence is: “This study sheds new light on the complex mechanisms that enable fertilization and ensure monospermy in mouse model.”

      Within the Conclusion section, among the perspectives of this work (p. 22):"In parallel, comparative studies in other mammalian species will be needed to assess the generality of the PVS-block and its contribution relative to the membrane-block and ZP-blocks, as well as the generality of the mechanical role played by flagellar beating and ZP mechanical constraint in membrane fusion."

      (4) Results, page 4 - It is very valuable that the authors clearly define what they mean by a penetrating spermatozoon and a fertilizing spermatozoon. However, they sometimes appear not to adhere to these definitions in other parts of the manuscript. An example of this is on page 10; the description of penetration of spermatozoon seems to be referring to membrane fusion with the oocyte plasma membrane, which the authors have alternatively called "fertilizing" or fertilization - although this is not entirely clear. The authors should go through all parts of the manuscript very carefully and ensure consistent use of their intended terminology.

      Overall, while these definitions on page 4 are valuable, it is still recommended that the authors explicitly state when they are addressing penetration of the ZP and when they are addressing fertilization via fusion of the sperm with the oocyte plasma membrane. This would significantly help readers' comprehension. An example is the section header in the middle of page 9 - this could be "Spermatozoa can penetrate the ZP after the fertilization, but have very low chances to fertilize."

      We chose to define our use of the term penetration at the beginning of the Results section because, as readers of fertilization studies, we have encountered on multiple occasions ambiguity as to whether this term was referring to sperm entry into the perivitelline space following zona pellucida traversal, or to the fusion of the sperm with the oolemma. To avoid such ambiguity, we were particularly careful throughout the writing of our original manuscript to use the term penetration exclusively to describe sperm entry into the PVS. The terms fertilizing and fusion were reserved specifically for membrane fusion between the gametes. However, as occasional lapses are always possible, we followed Reviewer 3’s recommendation and carefully re-examined the entire manuscript to ensure consistent use of our intended terminology. We did not identify any inconsistencies, including on page 10, which was cited as an example by Reviewer 3. We therefore confirm that, in accordance with our predefined terminology, all uses of the term penetration, on that page and anywhere else in our original manuscript, refer exclusively to sperm entry into the PVS and do not pertain to fusion with the oolemma.

      That said, it is important that all readers, including those who may only consult selected parts of the article, are able to understand it clearly. Therefore, despite the potential risk of slightly overloading the text, Reviewer 3’s suggestion to systematically associate the term penetration with ZP seems to us a sound one. However, we have opted instead to associate penetration with PVS, as our study focuses on the timing of sperm penetration into the perivitelline space, rather than on the traversal of the zona pellucida itself. Accordingly, except in a few rare instances where ambiguity seemed impossible, we have systematically used the phrasing “penetration into the PVS” throughout the revised version of the manuscript.

      Another variation of this is in the middle of page 9, where the authors use the terms "fertilization block" and "penetration block." These are not conventional terms, and venture into being jargon, which could leave some readers confused. The authors could clearly define what they mean, particularly with respect to "penetration block."

      This point has already been addressed in our response to Comment 1 from Reviewer 3. We invite Reviewer 3 to refer to that response.

      This extends to other portions of the manuscript as well, such as Figure 2C, with the label on the y-axis being "Time after fertilization." It seems that what the authors actually observed here was the cessation of sperm tail motility. (It is not evident that they did an assessment of sperm-oocyte fusion here.)

      Regarding Figure 2C (original version), it has been merged with Figure 2B (original version) to form a single figure (Figure S2D), now included in Supplementary Information SI2. This new figure retains all the information originally presented in Figure 2C and indicates the time axis origin as the time when oscillatory movements of the sperm cease.

      That said, for the reasons detailed in our response to Reviewer 1 and in the Materials and Methods, we explain why it is legitimate to use the cessation of sperm head oscillations on the oolemma as a marker for the timing of the fusion event. We invite the reviewers to refer to that response for a full explanation of our rationale.

      (5) Several points that the authors try to make with several pieces of data do not come across clearly in the text, including Figure 2 on page 6, Figure 4 on page 9, and the various states used for the statistical treatment ("post-first penetration, post-first fertilization, no fertilization, penetration block and polyspermy block") on page 10. Either rewriting with clearer definitions/explanations is needed, and/or schematic illustrations could be considered to augment the rewritten text. Illustrations could be a valuable way to present the intended concepts to readers more clearly and accurately. For example, Figure 4 and the associated text on page 9 become particularly confusing, although this sounds like a quite impressive dataset with observations of 138 sperm. Illustrations could be helpful, in the spirit of "a picture is worth 1000 words," to show what seem to be three different sequences of events in the sperm they observed. Finally, the text in the Results about the 138 sperm is quite difficult to follow. It might also help comprehension to augment the percentages with the actual numbers of sperm; e.g., does 48.6% refer to 67 of the total 138 sperm analyzed? Does 85.1% refer to 57 of these 67 sperm?

      Figure 2 in the original version of our manuscript concerns sperm engulfment and PB2 extrusion. As already mentioned in our response to Reviewer 1, the characterization of sperm engulfment and PB2 extrusion kinetics is highly relevant to the analysis of the penetration and fusion blocks. However, we agree that its presence in the main text may distract the reader from the main focus of the study. Therefore, this figure and the associated text have been moved to the Supplementary Information in the revised manuscript (SI 2, pages 26–27).

      Regarding Figure 4 (original version), in response to Reviewer 3’s concern about the difficulty in grasping the message conveyed in its three graphs and associated text, we have completely rethought the way these data are presented. Since the three graphs of Figure 4 were directly derived from the experimental timing data of sperm entry into the PVS and fusion with the oolemma in fertilized oocytes (originally shown in Figure 3A), we have combined them into a single figure in the revised manuscript: Figure 3 (page 8). This new Figure 3 now comprises three components:

      • Figure 3A remains unchanged from the original version and shows the timing of sperm penetration and fusion in fertilized oocytes. Each sperm category (fused or non-fused, penetrated into the PVS before or after fusion) is represented using a color code clearly explained in the main text (last paragraph of page 7).
      • Figure 3B focuses specifically on the first spermatozoon to penetrate the PVS of each oocyte. It reports how many of these first-penetrating spermatozoa succeeded in fusing versus how many failed to do so, highlighting that being the first to arrive is not sufficient for fusion—other factors are involved. This is explained simply in the first paragraph of page 9.
      • Figure 3C considers all spermatozoa that entered the PVS of fertilized oocytes, classifying them into three categories: those that penetrated the PVS before fertilization, those that did so after fertilization, and those for which the timing could not be precisely determined. This classification makes it apparent that the numbers of spermatozoa penetrating before and after fertilization are of the same order of magnitude, indicating that fertilization is not very effective at preventing further sperm entry into the PVS over the duration of our observations (~4 hours). To facilitate the identification of these three categories, the same color code used in Figure 3A is applied. In addition, within each category, the number of spermatozoa that successfully fused is indicated in black. This allows the reader to quickly assess the fertilization probability for each category: high for sperm entering before fertilization, very low or null for those entering after fertilization. This analysis shows that fertilization is far more effective at blocking sperm fusion than at blocking sperm penetration. This is clearly explained in the second paragraph of page 9.

      Regarding statistical analysis, as already mentioned in our responses to Reviewers 1 and 2, this section has been rewritten to improve clarity and readability. The notation has also been significantly simplified. To improve the overall fluidity of the text related to the statistical analysis, Figure 3B (original version), which presented the timing of penetration into the perivitelline space of oocytes that remained unfertilized, and its associated statistical analysis (previously Figure 5B) have been revised and combined into a single Figure S1 of the Supplementary Information (SI1, pages 26; now Figures S1A and S1B).

      (6) Introduction, page 2 - it is inaccurate to state that only diploid zygotes can develop into a "new being." Triploid zygotes typically fail early in development, but can survive and, for example, contribute to molar pregnancies. Additionally, it would be beneficial to use a more scientifically precise term than "development into a new being." This is recommended not only for scientific accuracy, but also due to current debates, including in lay public circles, about what defines "life" or human life.

      In response to Reviewer 3’s comment, we no longer state in the revised version of the manuscript that only diploid zygotes can develop into a new being. We have modified our wording as follows, on page 2, second paragraph: “In mammals, oocytes fertilized by more than one spermatozoon cannot develop into viable offspring.”

      (7) Introduction, page 2 - The mammalian sperm must pass through three layers, not just two as stated in the first paragraph of the Introduction. The authors should include the cumulus layer in this list of events of fertilization.

      The sentence in the introduction of the original manuscript mentioned by Reviewer 3 was: “To fertilize, a spermatozoon must successively pass two oocyte’s barriers.” This statement is accurate in the sense that the cumulus cell layer is not part of the oocyte itself, unlike the two oocyte barriers: the zona pellucida and the oolemma. Moreover, traversal of the cumulus layer is not within the scope of our study, unlike traversal of the zona pellucida and fusion with the oolemma. However, it is also correct that in our study the spermatozoa passed through the cumulus layer before reaching the oocyte. Therefore, in response to Reviewer 3’s comment, we have revised the sentence to clarify this point as follows:

      “Once a spermatozoon has passed through the cumulus cell layer surrounding the oocyte, it still must overcome two oocyte’s barriers to complete fertilization.”

      (8) Introduction, page 2 - While there is evidence that zinc is released from the mouse egg upon fertilization, the evidence that zinc is released from cortical granules or via cortical granule exocytosis is not convincing or conclusive.

      To better highlight the rationale, storyline, and scope of our study, the introduction has been thoroughly streamlined. In this context, the section discussing the cortical reaction and zinc release seemed more appropriate in the Discussion, specifically within the paragraph titled “Relationship between the penetration block and the ZP-block.”

      To address the uncertainty raised by Reviewer 3 regarding the origin of the zinc spark release, we have rephrased this part as follows:

      “The fertilization-triggered processes responsible for the changes in ZP properties are generally attributed to the cortical reaction—a calcium-induced exocytosis of secretory granules (cortical granules) present in the cortex of unfertilized mammalian oocytes—and to zinc sparks. As a result, proteases, glycosidases, lectins, and zinc are released into the perivitelline space (PVS), where they act on the components of the zona pellucida. This leads to a series of modifications collectively referred to as ZP hardening or the ZP-block”.

      (9) The authors inaccurately state, "only if monospermic multi-penetrated oocytes are able to develop normally, which to our knowledge has never been proven in mice" (page 4) - This was demonstrated with the Astl knockout, assuming that the authors' use of "multi-penetrated oocytes" here refers to the definition of penetration that they use, namely penetrating the ZP. This also is one of the instances where the authors contradict themselves, as they note the results with this knockout on page 18.

      Thank you for bringing this point to our attention. Nozawa et al. (2018) found that female mice lacking ovastacin (Astl), the protease released during the cortical reaction that plays a key role in rendering the zona pellucida impenetrable, are normally fertile. They also reported that oocytes recovered from these females after mating were monospermic, despite the consistent presence of additional spermatozoa in the perivitelline space. Taken together, these findings indeed demonstrate that the presence of multiple spermatozoa in the PVS does not impair normal development, as long as the oocyte remains monospermic. In our study, we demonstrated this in a different way, by reimplanting monospermic oocytes with additional spermatozoa in their PVS, in the more physiological context of WT oocytes. However, we agree that we cannot state “which to our knowledge has never been proven in mice,” and this part of the sentence has therefore been removed. In the revised version of the manuscript, the sentence in the first paragraph of page 5 now reads: “However, the contribution of the fusion block to prevent polyspermy has physiological significance only if monospermic oocytes with additional spermatozoa in their PVS can develop into viable pups.”

      Minor comments:

      There are numerous places in the text that this reader marked as confusing. A sample of these:

      Below, we indicate how we have modified the text for the specific examples provided by Reviewer 3. Beyond these, we would like to emphasize that we have thoroughly revised the entire manuscript to improve clarity and precision.

      Page 4 - "continuously relayed by other if they detach" - don't know what this means

      Now replaced (page 5) by “can be replaced by others if they detach”.

      Page 6 - "hernia" - do the authors mean "protrusion" on the oocyte surface?

      The paragraph from the Results section in question has now been moved to the Supplementary Information, on pages 26 and 27. The term hernia has been systematically replaced with protrusion, including in the Materials and Methods section on page 24.

      Page 10 - "penetration of spermatozoa in the PVS falls down" - don't know what this means

      “Falls down” has been replaced with “decreases” in the revised version.

      Page 12 - "spermatozoa linked to the oocyte ZP" - not clear what "linked" means here

      Now replaced (page 16) by “spermatozoa bound to the oocyte ZP”.

      Page 14 - "by dint of oscillations" - don't know what this means

      Now replaced (page 10) by “the persistent flagellum movements”.

      Specifics for Materials and Methods:

      Exact timing of females receiving hCG and then being put with males for mating - assume this was immediate but this is an important detail regarding the timing for the creation of embryos in vivo.

      That is correct: females were placed with males for mating immediately after receiving hCG. This clarification has been added in the revised version of the manuscript.

      Please provide the volumes in which inseminations occurred, and how many eggs were placed in this volume with the 10^6 sperm/ml.

      The number of eggs may vary from one cumulus–oocyte complex to another, so it is not possible to specify exactly how many eggs were inseminated. However, we now indicate on page 23 the number of cumulus–oocyte complexes inseminated (4 per experiment), the volume in which insemination was performed (200 mL), and the sperm concentration used (10^6 sperm/mL).

      **Referees cross-commenting**

      I concur with Reviewer 1's comment that the 'challenging prior dogma' about the first sperm not always being the one to fertilize the egg is too strong. As Reviewer 1 notes, "it had been observed before that it is not necessarily the first sperm that gets through the ZP that fertilizes the egg." I even thought about adding this comment to my review, but held off (I was hoping to find references, but that was taking too long).

      Please refer to our response to Reviewer 1 regarding this point.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewing Editor Comments:

      Focus and Scope:

      The paper attempts to address too many topics simultaneously, resulting in a lack of focus and insufficient depth in the treatment of individual components.

      We have moved the selective clinical review section, previously Part I of the paper, to Part II, because it is important to lead with the meta-analysis and resource, which are now Part I, before the selective review. In the lead-in to Part II, we now indicate that the review is not intended to be comprehensive, because other recent comprehensive reviews exist, which we cite. This part of the paper aims only to generate testable hypotheses on the directionality of effects, that is, on how TUS could be used to excite or suppress function, illustrated with specific clinical examples. Although not comprehensive, this section provides the reader with examples of how the directionality of TUS could be exploited in a range of clinical applications. The reader will find that the same hypotheses do not apply across clinical disorders. Therefore, patient-specific hypotheses, such as those motivated in Part II, need to be developed and then tested empirically with TUS.

      Part II, “Selective TUS clinical applications review and TUS directionality hypotheses,” starts at line 458. Part I, the meta-analysis and resource section, starts at line 199, after the Introduction on TUS and on the importance of better understanding the directionality of TUS effects.

      Strengthening the Meta-Analysis:

      The meta-analysis is the strongest aspect of the paper and should be expanded to include the relevant statistics. However, it currently omits several key concepts, studies, and discussion points, particularly related to replication and the dominance of results from specific groups. These omissions should be addressed even with a focus on meta-analysis.

      We thank the reviewer for their enthusiasm about the meta-analysis, which we have now promoted to Part I in the revised paper. We have substantially updated the database (inTUS_DATABASE_1-2025.csv) and ensured that the R markdown script can regenerate all of the results and statistical values. We have inserted additional statistical values in the main manuscript, as requested. The inTUS Resource is located at https://osf.io/arqp8/ (under Cafferatti_et_al_inTUS_Resource), and we have aimed to make it as easy as possible to use and to contribute to. For instance, all statistical values are summarized in the HTML rendering of the R markdown output, which is part of the inTUS resource: https://rpubs.com/BenSlaterNeuro/1268823.

      Since the last submission, there has been a tremendous increase in the number of TUS studies in healthy participants. We have curated all of the relevant studies we could find and included them in the 1-2025 database, the next large expansion of the database (now including 52 experiments in healthy participants). We then reran the statistical tests via the R markdown script and report the results (starting at line 336). The online database (inTUS_DATABASE_1-2025.csv) also has additional columns suggested by the reviewers, including one that identifies, based on a social network analysis, studies conducted by the same group. The manuscript tables (Table 1 and Table 2) did not have space for these additional columns, but they are available in the online database. Finally, we have made the resource as easy to use as possible (line 862 has the Introduction to the inTUS Resource, which is also the online README file), and we have been in contact with the ITRUSST consortium leads, who are interested in discussing hosting the resource and helping it to become self-sustaining.

      Conceptual Development:

      The more conceptual part of the paper is underdeveloped. It lacks sufficient supporting data, a well-articulated argument, and a clear derivation or development of a concrete model.

      To ensure that the conceptual sections are well developed, we have revised the introduction, including the background on TUS and the bases for interest in the directionality of effects. We have also revised the background on TUS mechanisms, as suggested by the reviewers. For Part I, the meta-analysis, we have clarified the rationale and hypotheses, which are based on several lines of research in animal models and the human literature, as cited (starting at line 211). For Part II, the selective clinical review, we have revised each section on low-intensity TUS so that it ends with a hypothesis on the directionality of TUS effects. Starting at line 199, we have clarified the scope of the review and ensured that all relevant experiments in healthy participants (n = 52 experiments) are included in this key update of the resource and meta-analysis.

      Database Curation:

      The authors should provide more detailed information about how the database will be curated and made accessible. They may consider collaborating with ITRUSST.

      We have expanded the information in the Resource documents (starting at line 862) to make the resource as user-friendly as possible. At the beginning of the resource development stage, we had contacted the ITRUSST consortium but had not heard back. Encouraged by this comment, we reached out again and are now in contact with the ITRUSST consortium leads, who are interested in discussing how to sustain the resource. It would be wonderful to have the resource linked to other ITRUSST tools, since it was inspired by the organization. In practice, this means that the resource, rather than being hosted on the Open Science Framework, could be hosted on the ITRUSST website (https://itrusst.com/). These discussions are in progress, but the next key update to the database (1-2025) is already available and reported in this update to our original paper.

      Reviewer #1: (Public Review)

      Summary:

      This paper is a relevant overview of the currently published literature on low-intensity focussed ultrasound stimulation (TUS) in humans, with a meta-analysis of this literature that explores which stimulation parameters might predict the directionality of the physiological stimulation effects.

      The pool of papers to draw from is small, which is not surprising given the nascent technology. It seems nevertheless relevant to summarize the current field in the way done here, not least to mitigate and prevent some of the mistakes that other non-invasive brain stimulation techniques have suffered from, most notably the theory- and data-free permutation of the parameter space.

      The meta-analysis concludes that there are, at best, weak trends toward specific parameters predicting the direction of the stimulation effects. The data have been incorporated into an open database that will ideally continue to be populated by the community and thereby become a helpful resource as the field moves forward.

      Strengths:

      The current state of human TUS is concisely and well summarized. The methods of the meta-analysis are appropriate. The database is a valuable resource.

      Weaknesses:

      These are not so much weaknesses but rather comments and suggestions that the authors may want to consider.

      We thank the reviewer for their support of the resource and meta-analysis. We have implemented the suggestions as follows.

      I may have missed this, but how will the database be curated going forward? The resource will only be as useful as the quality of data entry, which, given the complexity of TUS can easily be done incorrectly.

      We have added a paragraph on how authors can use the Qualtrics form to submit their data and on the curation process involved (from line 891). Currently, this process cannot be automated because many published papers still do not report the TUS parameters that ITRUSST has encouraged the community to report (Martin et al., 2024). We can dedicate a TUS expert to curate and expand the database every 6 or 12 months; the current version is the latest 1-2025 update. In the longer term, we are discussing with ITRUSST whether the resource could become self-sustaining once TUS papers regularly report all the relevant parameters, so that expanding the database becomes trivial; the Resource R markdown script and other tools can then be used to re-evaluate the statistical tests, and users can conduct secondary hypothesis testing on the data.

      It would be helpful to report the full statistics and effect sizes for all analyses. At times, only p-values are given. The meta-analysis only provides weak evidence (judged by the p-values) for two parameters having a predictive effect on the direction of neuromodulation. This reviewer thinks a stronger statement is warranted that there is currently no good evidence for duty cycle or sonication direction predicting outcome (though I caveat this given the full stats aren't reported). The concern here is that some readers may gallop away with the impression that the evidence is compelling because the p-value is on the correct side of 0.05.

      We have ensured that the R script generates the full statistics and effect sizes for all analyses, and we now report more of the key statistical values in the revised paper (starting at line 336). As suggested, we have also ensured that the interpretation is sufficiently nuanced given the small sample sizes, and p-values between 0.05 and 0.1 are interpreted only as statistical trends.

      This reviewer thinks the issue of (independent) replication should be more forcefully discussed and highlighted. The overall motivation for the present paper is clearly and thoughtfully articulated, but perhaps the authors agree that the role that replication has to play in a nascent field such as TUS is worth considering.

      We completely agree and have added columns to the online database that identify unique groups (using a social network analysis) and independent replications. These expanded tables did not fit in the manuscript versions of Tables 1 and 2 but are fully available in the Resource data tables, ready for further analysis by interested resource users.

      A related point is that many of the results come from the same groups (the so-called theta-TUS protocol being a clear example). The analysis could factor this in, but it may be helpful to either signpost independent replications, which studies come from the same groups, or both.

      In the expanded database tables (inTUS_DATABASE_1-2025.csv: https://osf.io/arqp8/ under Cafferatti_et_al_inTUS_Resource) we have added a column to identify independent replication.

      The recent study by Bao et al 2024 J Phys might be worth including, not least because it fails to replicate the results on theta TUS that had been limited to the same group so far (by reporting, in essence, the opposite result).

      Thank you. We have added this study and over a dozen recent TUS studies in healthy participants to the database and redone the analyses.

      The summary of TUS effects is useful and concise. Two aspects may warrant highlighting, if anything to safeguard against overly simplistic heuristics for the application of TUS from less experienced users. First, could the effects of sonication (enhancing vs suppressing) depend on the targeted structure? Across the cortex, this may be similar, but for subcortical structures such as the basal ganglia, thalamus, etc, the idiosyncratic anatomy, connectivity, and composition of neurons may well lead to different net outcomes. Do the models mentioned in this paper account for that or allow for exploring this? And is it worth highlighting that simple heuristics that assume the effects of a given TUS protocol are uniform across the entire brain risk oversimplification or could be plain wrong? Second, and related, there seems to be the implicit assumption (not necessarily made by the authors) that the effects of a given protocol in a healthy population transfer like for like to a patient population (if TUS protocol X is enhancing in healthy subjects, I can use it for enhancement in patient group Y). This reviewer does not know to which degree this is valid or not, but it seems simplistic or risky. Many neurological and psychiatric disorders alter neurotransmission, and/or lead to morphological and structural changes that would seem capable of influencing the impact of TUS. If the authors agree, this issue might be worth highlighting.

      We agree that, given the divergence in circuits and cellular constituents between cortical and subcortical areas, it is important to distinguish studies that targeted cortical from those that targeted subcortical brain areas. The online data tables identify the target region, so analyses can focus on cortical or subcortical sites, although the current version of the database contains too few subcortical experiments to analyze them separately. On the second point, that pathology may affect the results, we completely agree and have clarified that the current database includes only experiments in healthy participants for this reason. We are considering including clinical patient results in future updates of the resource (line 247).

      Reviewer #1 (Recommendations for the authors):

      Minor edits (I wouldn't call them "corrections").

      We sincerely appreciate the constructive comments and have aimed to address them all as suggested.

      Perhaps the most relevant edit pertains to the statistics.

      We now report the more complete statistical results (line 336) and the R markdown script can re-generate all the statistical values for the tests.

      The issue of replication also seems relevant and ought to be raised. This reviewer does not want to prescribe what to do or impose the view the authors ought to adopt.

      In the online version of the data tables for the latest dataset, we have added a column in the data table as suggested that identifies independent groups and replications.

      The other points are left to the authors' discretion.

      We have aimed to address all of the reviewer’s points. Thank you for the constructive input which has helped to improve the expanded database and resource.

      Reviewer #2: (Public Review)

      Summary:

      This paper describes a number of aspects of transcranial ultrasound stimulation (TUS) including a generic review of what TUS might be used for; a meta-analysis of human studies to identify ultrasound parameters that affect directionality; a comparison between one postulated mechanistic model and results in humans; and a description of a database for collecting information on studies.

      Strengths:

      The main strength was a meta-analysis of human studies to identify which ultrasonic parameters might result in enhancement or suppression of modulation effects. The meta-analysis suggests that none of the US parameters correlate significantly with effects. This is a useful result for researchers in the field in trying to determine how the parameter space should be further investigated to identify whether it is possible to indeed enhance or suppress brain activity with ultrasound.

      The database is a good idea in principle but would be best done in collaboration with ITRUSST, an international consortium, and perhaps should be its own paper.

      Weaknesses:

      The paper tries to cover too many topics and some of the technical descriptions are a bit loose. The review section does not add to the current literature. The comparison with a mechanistic model is limited to comparing data with a single model at a time when there is no general agreement in the field as to how ultrasound might produce a neuromodulation effect. The comparison is therefore of limited value.

      We appreciate the reviewer’s assessment and interest in the meta-analysis and database to guide the development of TUS toward more systematic control of the directionality of neuromodulation. In this next key expansion of the database (inTUS_DATABASE_1-2025.csv), we have added over a dozen new studies published since our original submission (n = 52 experiments). We have also moved the ‘review’ part of the paper below the meta-analysis and resource description. We have clarified that the clinical review section (now Part II in the revised manuscript) is not intended as a comprehensive review but as a selective review showing how hypotheses on the directionality of TUS effects need to be carefully developed for specific patient groups, which require different effects to be induced in specific brain areas. Finally, as suggested, we have contacted the ITRUSST consortium leads and are discussing whether the inTUS resource could be hosted by ITRUSST. These discussions are ongoing; in practice, this might mean moving the resource from the Open Science Framework to the ITRUSST webpages, which would only require updating the link to the resource on OSF.

      We also sincerely appreciate the time and care the reviewer has given to provide us with the below guidance, all of which we have aimed to take on board in the revised paper.

      Reviewer #2 (Recommendations for the authors):

      Line 24/25 - I suggest avoiding using the term "deep brain stimulation" in reference to TUS as the term is normally used to describe electrically implanted electrodes.

      We have removed the term “deep” brain stimulation in reference to TUS to avoid confusion with electrical DBS for patient treatment [Line 24].

      Line 25 - I don't think "computational modelling" has changed how TUS can be done. There is still much to be understood about mechanisms. I think the modelling aspects of the paper should be toned down. Indeed the NICE data that is presented later appears to have a weak, if any, correlation to the outcomes.

      We have revised the manuscript text throughout to ensure that the contributions of computational modeling are not overstated, given that the meta-analysis, including the latest results from the more extensive database (n = 52), shows no strong correlation with the NICE model outcomes.

      Line 32 - "exponentially increasing" is a well-defined technical term and the increase in studies should be quantified to ensure it is indeed exponential. I agree that TUS studies in humans are increasing but a quick tally of the data by year in the meta-analysis reported here doesn't suggest that it follows an "exponential" growth.

      We have changed “exponential” to “to increase”. [Line 32]

      Line 50 - I would suggest using the term sub-MHz rather than 100-1,000 kHz as it is challenging to deliver ultrasound at 1 MHz through the skull. The highest frequency in the meta-analysis is 850 kHz; but the majority are in the 200-500 kHz range.

      We have made this correction to sub-MHz. [Line 54]

      Line 58/59 - Is the FDA publication on diagnostic imaging relevant for saying that 50 W/cm2 is a low-intensity TUS? I think it's perhaps reasonable to say that intensities below diagnostic thresholds are "low intensities" but that is not clear in the text. I would refer to ITRUSST on what is appropriate for defining what is low, medium, or high.

      We have cut the reference to the FDA here since it is, as noted, not as relevant as pointing to the ITRUSST definition.

      Line 65/66 - I agree that ultrasound for neuromodulation is gaining traction and there is an increase in activity, but it also has a long history with the work of the Fry brothers published in the 1950s; and extensive work of Gavrilov in humans starting in the 1970s.

      We have added citations to the Fry brothers and Gavrilov to the text in this section. [Line 69/70]

      Line 75 - I think the intermembrane cavitation mechanism is unlikely to be due to "microbubbles" in a lipid membrane. The predicted displacements are on the order of nanometres, so they are unlikely to generate microbubbles. The work on comparing with NICE is limited. Note there are a number of experimental papers that have reported an absence of intra-membrane cavitation, including the Yoo et al 2022 which is referenced later in the paragraph. Also, there are other models, such as Liao et al 2021 (https://www.nature.com/articles/s41598020-78553-2).

      As suggested, we have removed this phrase on microbubble formation as a likely mechanism. We have also added the Liao paper to this paragraph as it is relevant.

      Line 83 - "At the lower intensities..." it is not clear whether this means all TUS intensities or the lower end of intensities used in TUS.

      We now use the following wording here: “low intensities”. [Line 86]  

      Line 85/86 - "more continuous stimulation" the modulation paradigms haven't been described yet and so pulse vs continuous hasn't been made clear to the reader. Also "more continuous" is very loose terminology. Something is either continuous or it isn't.

      We agree and have removed “more” to be clear that the stimulation is continuous. [Line 88]

      Line 87/88 - "TUS does not .. cavitation ..when ..ISPTA...<14 W/cm2". You can't use ISPTA to determine cavitation. It is the peak negative pressure which is the key driver for cavitation and the MI which is the generally accepted (although grudgingly by some) metric for assessing cavitation risk. You can link the negative pressure to ISPPA but not really to ISPTA. In histotripsy for example the ISPTA is low due to the low duty cycles to avoid heating but the cavitation is a huge effect. Technical terminology is loose.

      We have corrected this to “TUS does not appear to cause significant heating or cavitation of brain tissue when the intensity remains low, based on Mechanical and Thermal Index values and recommendations of use”. [Line 90/91]
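The reviewer's point that ISPTA cannot stand in for a cavitation metric can be illustrated with a short sketch. It is not part of the manuscript's analyses and assumes: the common definition MI = p_r / sqrt(f_c) (peak rarefactional pressure in MPa, centre frequency in MHz); the plane-wave relation ISPPA = p0^2 / (2Z) for a symmetric sinusoid; an assumed soft-tissue acoustic impedance Z of 1.5 MRayl; and no skull attenuation or derating. All protocol values and helper names are hypothetical. Two protocols with the same ISPTA can differ in MI by a factor of sqrt(10), because lowering the duty cycle lets the peak pressure rise while the time-averaged intensity stays fixed.

```python
import math

# Assumption: acoustic impedance of soft tissue, in Rayl (kg m^-2 s^-1)
TISSUE_IMPEDANCE_RAYL = 1.5e6


def mechanical_index(pnp_mpa: float, freq_mhz: float) -> float:
    """MI = peak negative pressure (MPa) / sqrt(centre frequency in MHz)."""
    return pnp_mpa / math.sqrt(freq_mhz)


def isppa_w_cm2(pnp_mpa: float, z_rayl: float = TISSUE_IMPEDANCE_RAYL) -> float:
    """Plane-wave pulse-average intensity p0^2 / (2 Z), in W/cm^2.

    Assumes a symmetric sinusoid whose amplitude equals the peak negative pressure.
    """
    p_pa = pnp_mpa * 1e6
    return p_pa ** 2 / (2.0 * z_rayl) / 1e4  # W/m^2 -> W/cm^2


def pnp_for_isppa(target_w_cm2: float, z_rayl: float = TISSUE_IMPEDANCE_RAYL) -> float:
    """Inverse of isppa_w_cm2: pressure amplitude (MPa) giving the target ISPPA."""
    return math.sqrt(2.0 * z_rayl * target_w_cm2 * 1e4) / 1e6


freq = 0.5  # MHz, hypothetical sub-MHz frequency

# Hypothetical protocol A: pulsed at 10 % duty cycle, 0.8 MPa peak negative pressure
isppa_a = isppa_w_cm2(0.8)
ispta_a = 0.1 * isppa_a  # ISPTA = DC x ISPPA
mi_a = mechanical_index(0.8, freq)

# Hypothetical protocol B: continuous (DC = 1), so its ISPPA equals its ISPTA
pnp_b = pnp_for_isppa(ispta_a)  # chosen so that ISPTA matches protocol A
mi_b = mechanical_index(pnp_b, freq)

print(f"A: ISPTA = {ispta_a:.2f} W/cm^2, MI = {mi_a:.2f}")
print(f"B: ISPTA = {isppa_w_cm2(pnp_b):.2f} W/cm^2, MI = {mi_b:.2f}")
```

With these made-up values both protocols have an ISPTA of roughly 2.1 W/cm², while the MI of the pulsed protocol is about three times higher, which is why pressure-based indices, not ISPTA, are used to judge cavitation risk.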

      Line 89 - What is meant by "low intensity TUS"? I think all TUS used in the literature counts as low intensity - in that it is below the level allowed for diagnostic imaging.

      We have ensured that the text is focused on TUS being low-intensity and only in the introduction do we distinguish low intensity TUS from moderate and high intensity TUS, such as used for thermal ablation [Lines 62-66].

      Line 88/89 - Most temperature rises in brain tissue in TUS are well below 1 C - will this really change membrane capacitance significantly? If so it would have been good to consider a model for it.

      We have revised this statement as “thermal effects could at least minimally alter cell membrane capacitance…”. [Line 93]

      Line 111 - The text refers to "recent studies" but then the next two references are from 1990 and 2005 which I would argue don't count as "recent".

      We have corrected this wording to “previous studies”. [Line 114]

      Lines 122/129 - This paragraph on TMS pulsing should be linked to the TUS paragraph on pulsing (lines 109/116). The intervening paragraph on anaesthesia is relevant but breaks the flow.

      We have merged the paragraph on anesthesia with the prior one on TUS so that the TMS paragraph is linked more closely to it [starting on line 112].

      Line 130/131 - It is not clear to me that current studies are being guided by computational models. I think there is still no generally accepted theory for mechanisms. If the authors want to do a mechanisms paper then they should compare a few.

      We have revised this as suggested to not overstate the contribution of the limited computational modeling studies throughout the manuscript.

      Line 132 on - There are a number of studies that suggest that NICE is likely not the mechanism by which TUS produces neuromodulation.

      We have revised this sentence as follows: “Although it remains questionable whether intramembrane cavitation is a key mechanism for TUS, the NICE model simulations explored a broad set of TUS parameters, including TUS intensity and the continuity of stimulation (duty cycle) on modelled neuronal responses.” [Lines 139/142]

      Lines 137-140 - Terms are defined after their use. Things like ISPPTA, PRF, TI, and MI have been discussed already and so the terms should have been defined earlier. The authors should think carefully about how the material is presented to make it more logical for the reader.

      We have ensured that the definitions precede the use of abbreviations and have added abbreviations to the tables.

      Part I Line 180-437 - The review of potential applications for TUS reads like an introductory chapter of a thesis. It is entirely proper for a thesis to have a chapter like this, but it is not really relevant for a peer-reviewed research article. There are also numerous applications, e.g. mapping areas associated with decisions, or treating patients with addiction, which are not included, so it is not exhaustive. I would suggest this part be removed.

      We have moved the ‘review’ part of the paper to Part II, given that the meta-analysis and resource should be more prominent as Part I. In the review, now Part II of the paper, we also make it clear that there are recent comprehensive reviews of the clinical literature (lines 465/467). Namely, the purpose of our selective review is to demonstrate how the directionality of TUS effects needs to be specific to the intended clinical application, given the great variability in the clinical effects that might be desired, the brain areas targeted and the pathology being treated. We have also aimed to ensure that each section summary is scholarly and academically written to a high level. All the co-authors contributed to these sections, so we have also edited them for consistency across sections, with each section ending with hypotheses on the directionality of TUS effects that could be developed for empirical testing.

      Line 453 - It is stated that "ISPTA, which mathematically integrates ISSPA by the sonication DC". It sounds rather grand to mathematically integrate, but you can't integrate with respect to DC; you can integrate with respect to time. If you integrate intensity with respect to time over the pulse and over the sonication time, then one finds that ISPTA = DC x ISPPA; multiplication is also an important mathematical function and should be given its due. Lastly, I think there is a typo and ISSPA should read ISPPA.

      We have corrected the typo and the statement to “mathematically multiplies ISPPA by the continuity of sonication”. [Line 221/222]
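      For reference, the reviewer's point can be written out as a worked relation. This is a sketch assuming a pulsed protocol with pulse duration PD repeated at pulse repetition frequency PRF, so that DC = PD × PRF, and with the spatial-peak intensity I_SP(t) equal to zero between pulses:

$$
I_{\mathrm{SPTA}}
= \mathrm{PRF}\int_{0}^{1/\mathrm{PRF}} I_{\mathrm{SP}}(t)\,dt
= (\mathrm{PD}\cdot\mathrm{PRF})\,\frac{1}{\mathrm{PD}}\int_{0}^{\mathrm{PD}} I_{\mathrm{SP}}(t)\,dt
= \mathrm{DC}\times I_{\mathrm{SPPA}}
$$

      For example, a hypothetical protocol with ISPPA = 10 W/cm² and DC = 10% gives ISPTA = 0.1 × 10 W/cm² = 1 W/cm². The integration is over time; DC enters only as the fraction of each period during which sound is on.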

      Line 454 - I don't think ISPTA is a good measure of "dose." In radiation physics dose is well defined in terms of absorbed energy. The equivalent has yet to be defined for TUS so I would avoid using dose. The ISPTA does relate to TI - although it depends not just on the spatial peak but also on the spatial distribution and the frequency-dependent absorption coefficient of the tissue. I would just avoid the use of "dose" until the field has a better idea of what is going on.

      We have cut this phrase on dose as suggested.

      Page 16 Box 1 - TI is defined as for diagnostic ultrasound imaging, on which it is based. Also, I think TI is dimensionless; it is referenced to a 1-degree temperature rise and so it can be interpreted in terms of Celsius or kelvin, but to be technically accurate it is dimensionless.

      We have made TI dimensionless in Box 1.

      Page 17 Box 2 - Here you have no units for TI, which is correct but inconsistent with Box 1. But the legend suggests a 2 K temperature rise, whereas your Box allows for 6 K. The value of 6 is consistent with FDA, but my understanding of the BMUS guidelines is that the TI must be less than or equal to 0.7 for unlimited time, or less than 3 if the duration is less than 1 minute. I accept that the table is labelled FDA limits, but the bold table caption is "Recommendations for TUS parameters"; I think you should give the ITRUSST values rather than FDA.

      We have revised this Box legend to better distinguish the FDA and ITRUSST recommendation where they differ (e.g., the importance of ISPTA and the TI values). See revised legend for Box 2.

      Page 18 Box 3 - Not sure what this is trying to show? Also, what are "higher intensity" and "lower intensity"?

      Why not just give a range of values in each box?

      We agree that the higher and lower intensities likely to lead to enhancement or suppression are poorly defined and have noted this in the legend: “Note that the threshold for ISPPA qualifying as ‘higher’ or ‘lower’ intensity is currently poorly understood, or may non-linearly interact with other factors” [Line 751/754, Box 3].

      Line 444 - The hypotheses should be stated more clearly. Maybe I am just dense, but it is not obvious to me from Box 3.

      We provide the basis for the hypotheses in the manuscript text [Lines 106-179].

      Line 481/482 - The intensity of a diagnostic ultrasound system is very well characterised. It just might be that the authors didn't report it. It is not clear what is meant by the "continuity." I guess it's to do with pulsing - which is also well defined but perhaps also not reported.

      We agree and have revised this as follows: “For the meta-analysis, we only included studies that either reported a basic set of TUS stimulation parameters or those sufficient for estimating the required parameters necessary for the meta-analysis” [Lines 256/258].

      Figure 2 - What is the purpose of this figure? Did you carry out simulations for all the studies? It doesn't seem to be relevant to the data here.

      This figure illustrates the TUS targeting approach and simulations, in this case conducted in k-Plan. These simulations were used to approximate in-brain ISPPA values for the studies that did not report them [Lines 264/268].

      Figure 4 - The data in these figures is nice (and therefore doesn't need a NICE curve). To me, it clearly shows that the data in the literature does not obviously segment into enhancement vs suppression with DC. I suspect it is the same with PRF. I think it would have been better if C and D had PRF on the horizontal axis for on-line and off-line so that the effect could be seen more clearly.

      We have kept the NICE curve only as a reference for readers familiar with the NICE model who might want to see it overlaid in the figure, but have ensured that the text throughout makes clear that the NICE model predictions are not as statistically robust as initially, anecdotally, thought. PRF results are not significant, but we do show a panel with the PRF measures on one axis (Fig. 4D). Figure 5 also shows box plot results for PRF as well as the other key TUS parameters. Moreover, in the inTUS resource we have provided an app for users to explore the data (https://benslaterneuro.shinyapps.io/Caffaratti_inTUS_Resource/).

      Figure 5 - The text on the axes is too small to read. Was the DC significant for both on-line and off-line? What about ISPPA for off-line? At least by eye, it looks as different as DC. Figure 5C doesn't add anything.

      We have boosted the font size for Figure 5 and have cut panel 5C, since it was not adding much. We have also checked whether the DC parameter was significant separately for on-line and off-line effects, but the sample sizes were too small for significance, and the difference between on-line and off-line effects was not significant even in the 12025 database. Therefore, although DC might appear to have a stronger effect for off-line studies in some of the plots in Figure 5, the on-line and off-line effects are currently statistically indistinguishable [Lines 347/348].

      Table 1 - There is a typo in the 3rd column. FF should have units of kHz, not KHz. In addition, SD should have units of s as that is the SI symbol for seconds. I would swap columns 9 and 10 so that ISPPA in water and ISPPA in the brain are next to each other.

      We have corrected the typo in the 3rd column and ensured that units are kHz. SD in the tables now has units of ‘s’ for seconds, and we have placed ISPPA in water and in brain next to each other in the data tables.

      Line 767 - "M.K. was supported..." There are TWO MKs in the author list.

      We have changed this to M.Ka. for Marcus Kaiser.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers and editors for their careful consideration of our work and pointing out areas where the current version lacked clarity or necessary experiments. Based on the reviews we have made the following significant changes to the revised version:

      (1) Revised the text to focus on the distinct pathogen responses to indole in isolation versus fecal material.

      We believe the key takeaway from this work is that the native context of a given effector, in this case indole, can elicit markedly different bacterial responses compared to the pure compound in isolation. This is because natural environments contain multiple, often conflicting, stimuli that complicate predictions of overall chemotactic behavior. For example, while indole has been proposed to mediate chemorepulsion and contribute to colonization resistance against enteric pathogens, our findings challenge this model. We provide evidence that feces, the intestinal source of indole, actually induces attraction, and that indole taxis may in fact benefit the pathogen by prioritizing niches with low microbial competition. Put another way, the biological reservoir of indole, fecal material, generates an attraction response, but indole regulates the degree of attraction.

      Most current understanding of chemotaxis is based on responses to individual, purified effectors. Our study highlights the need to investigate chemotactic responses in the presence of native mixtures, which better reflect the complexity of natural environments and may reveal new functional insights relevant for disease.

      Reviewer comments indicated that these core points above were not clearly conveyed in the previous version, and that the manuscript's logical flow needed improvement. In this revised version, we have substantially rewritten the text and removed extraneous content to sharpen the focus on these central findings. We have also aligned our discussion more closely with the experimental data. While we appreciated the reviewers’ thoughtful suggestions, we chose not to expand on topics that fall outside the scope of our current experiments.

      (2) Provide new chemotaxis data with mixtures of fecal effectors (Fig. 5).

      Related to the above, the reviewers and editors raised concerns that our discovery of pathogen fecal attraction was underexplored. Although we showed Tsr to be important for mediating fecal attraction, even the tsr mutant showed attraction, albeit to a lesser degree, and the reviewers noted that we did not identify what other fecal attractants could be involved.

      Fecal material is a complex biological material (as noted by Reviewer 3) and contains effectors already characterized as chemoattractants and chemorepellents. It would be ideal to be able to perform some experiment where individual effectors are removed from fecal material and then quantify chemotaxis. We considered methods to do this but ultimately found this approach unfeasible. Instead, we employed a reductionist approach and developed a synthetic approximate of fecal material containing a mixture of known chemoeffectors at fecal-relevant concentrations (Fig. 5). We used this defined system as a way to test the specific roles of the Tsr effectors L-Ser (attractant) and indole (repellent) in relation to glucose, galactose, and ribose (sensed through the chemoreceptor Trg), and L-Asp (sensed through the chemoreceptor Tar). We chose these effectors as they have reasonable structure-function relationships established in prior work, and had information available about their concentrations in fecal material. We present these data as a new Figure 5, and also provide videos clearly showing the responses to each treatment (Movies 7-10).

      This defined system provided several new insights that help understand and model indole taxis amidst other fecal effectors. First, the complete effector mixture, like fecal treatment, elicits attraction. Second, L-Ser is able to negate indole chemorepulsion in cotreatments of the two effectors, and other chemoattractants also negate this repulsion in the absence of L-Ser, albeit to a lesser degree, helping to explain why the tsr mutant still shows attraction to fecal material. Lastly, we show that the degree of attraction in this system is controlled by indole, with mixtures containing more indole eliciting less attraction. We feel this is an important addition to the study because it provides a new view on how indole taxis functions in pathogen colonization: rather than causing the pathogen to swim away (as pure indole does), indole helps the pathogen rank and prioritize its attraction to fecal effector mixtures, biasing navigation toward niches with lower indole content.

      We also acknowledge that this defined system does not capture all possible interactions. Indeed, there are even a few chemoreceptors in Salmonella for which the sensing functions remain poorly understood. Nonetheless, we believe the data offer mechanistic context for understanding fecal attraction and suggest that factors beyond Tsr, L-Ser, and indole also contribute to the observed behaviors, aligning with other data we present.

      (3) Provide new data that show that E. coli MG1655, and disease-causing clinical isolate strains of the Enterobacteriaceae Tsr-possessing species E. coli, Citrobacter koseri, and Enterobacter cloacae exhibit fecal attraction (Fig. 4).

      An important new finding from this study is our direct test of whether indole-rich fecal material elicits repulsion. Contrary to expectations, given that indole is a well-characterized strong chemorepellent for E. coli, we show that fecal material instead elicits attraction in non-typhoidal Salmonella.

      Reviewers raised the question of whether our observations regarding indole taxis and attraction to indole-rich feces in Salmonella are similar or relevant to E. coli. While a full dissection of indole taxis in E. coli is beyond the scope of this study and has been the focus of extensive prior research, we sought to address this point by examining whether other enteric pathogens respond similarly to the native indole reservoir, fecal material. To this end, we present new data demonstrating that, like S. Typhimurium, E. coli and other representative enteric pathogens and pathobionts possessing Tsr are also attracted to indole-rich feces (Fig. 4, Movies 4–6, Fig. S4).

      Notably, these new results represent some of the first characterizations of chemotactic behavior in the clinical isolates we examined, including E. coli NTC 9001 (a urinary tract infection isolate), Citrobacter koseri, and Enterobacter cloacae, adding another element of novelty to this work.

      (4) Repeated all of the explant Salmonella Typhimurium infection studies and added a new experimental control competition between WT and an invasion-deficient mutant (invA).

      Although our new colonic explant system was noted as a novelty and strength of this work, it was also seen as a weakness in that some of the results were surprising and difficult to link to chemotactic behavior. Reviewer 3 also brought up the need to be clear about our usage of the term ‘invasion’ in reference to S. Typhimurium entering nonphagocytic host cells, and requested we test an invasion-inhibited mutant (which we do in new experiments, now Fig. S1). We also note that some of the interpretations of these data were made challenging by result variability.

      To help address these issues, we performed additional replicates for all of our explant experiments (contained within Figure 1, Fig. S1-S2, and Data S1) to provide greater power for our analyses. These new data provide a clearer view of this system and revise our interpretations from the prior version of this study. While treatment with indole alone does suppress the WT advantage over chemotactic mutants for both total colonization and cellular invasion, essentially all other treatments have a similar result, with a time-dependent increase in both colonization and invasion that depends on chemotaxis and Tsr. A remaining unique feature of fecal treatment is an increase in the cellular-invaded population at 3 h post-infection. As requested by Reviewer 3, we provide new experimental data showing that in competitions between WT and an invasion-deficient mutant (invA) with fecal material pretreatment, the WT has an advantage only for the gentamicin-treated quantifications, providing some support that our model selects for the invaded sub-population. We note, however, that the invA mutant can still invade through alternative mechanisms (as discussed in earlier work such as here: https://doi.org/10.1111/1574-6968.12614), so the relative amount of presumed cellular invasion is lower than for WT, but not zero, in our experiments (Fig. S1).

      One point of confusion in the previous version of the text was the assay design for the explant experiments, which is important to understand in order to interpret the results. During the explant infection, bacteria are not immersed in the effector treatment solution; rather, the tissue is soaked in the effector solution beforehand and then exposed to a 300 µl buffer solution containing the bacteria. This means that the bacteria experience only the residue of that treatment, at far lower concentrations. We have clarified this by revising Fig. 1 to include a conceptual diagram of the assay (Fig. 1C), and added a new supplementary Fig. S5 that summarizes the explant data in this same conceptual model. We provide detail on the method in the text in lines 115-137. In describing the results, and synthesizing them in the discussion, we now state:

      Line 112: “This establishes a chemical gradient which we can use to quantify the degree to which different effector treatments are permissive of pathogen association with, and cellular invasion of, the intestinal mucosa (Fig. 1C).”

      And, a new section in the discussion devoted to describing the explant infections:

      Line 366: “Our explant experiments can be thought of as testing whether a layer of effector solution is permissive to pathogen entry to the intestinal mucosa, and whether chemotaxis provides an advantage in transiting this chemical gradient to associate with, and invade, the tissue (Fig. 1C, Fig. S5).”

      As mentioned above, we have honed the text to focus on the disparity between the effects of indole alone versus treatments with indole-rich feces to help clarify how these data advance our understanding of the indole taxis in directing pathogenesis. While our explant studies still confirm the role of factors other than L-Ser, indole, and Tsr in directing Salmonella infection and cellular invasion, we now include further analyses of other fecal effectors (described above) that provide some insights into how fecal effectors have some redundancy in their impact.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study shows, perhaps surprisingly, that human fecal homogenates enhance the invasiveness of Salmonella Typhimurium into cells of a swine colonic explant. This effect is only seen with chemotactic cells that express the chemoreceptor Tsr. However, two molecules sensed by Tsr that are present at significant concentrations in the fecal homogenates, the repellent indole and the attractant serine, do not, either by themselves or together at the concentrations in which they are present in the fecal homogenates, show this same effect. The authors then go on to study the conflicting repellent response to indole and attractant response to serine in a number of different in vitro assays.

      Strengths:

      The demonstration that homogenates of human feces enhance the invasiveness of chemotactic Salmonella Typhimurium in a colonic explant is unexpected and interesting. The authors then go on to document the conflicting responses to the repellent indole and the attractant serine, both sensed by the Tsr chemoreceptor, as a function of their relative concentration and the spatial distribution of gradients.

      Thank you for your summary and acknowledgement of the strengths of this work. We hope the revised text and additional data we provide further improve your view of the study.

      Weaknesses:

      The authors do not identify what is the critical compound or combination of compounds in the fecal homogenate that gives the reported response of increased invasiveness. They show it is not indole alone, serine alone, or both in combination that have this effect, although both are sensed by Tsr and both are present in the fecal homogenates. Some of the responses to conflicting stimuli by indole and serine in the in vitro experiments yield interesting results, but they do little to explain the initial interesting observation that fecal homogenates enhance invasiveness.

      Thank you for noting these weaknesses. We have provided new data using a defined mixture of fecal effectors to further investigate the roles of L-Ser, indole, and other effectors present in feces that we did not initially study. We have refined our discussion of these results to hopefully improve the clarity of our conclusions. We now show, in explant studies (Fig. 1I) and in chemotaxis responses to a defined fecal effector system (Fig. 5), respectively, that L-Ser is able to abolish both the indole-mediated suppression of the WT advantage and indole chemorepulsion. We also show that the latter can be accomplished by other fecal chemoattractants (Fig. 5). This is in line with our earlier finding that Tsr, the sensor of indole and L-Ser, is an important mediator of fecal attraction but not the sole mediator.

      As this reviewer points out, there are indeed other factors mediating invasion that we do not elucidate here, but we do note these possibilities in the text (lines 125-127):

      “This benefit may arise from a combination of factors, including sensing of host-emitted effectors, redox or energy taxis, and/or swimming behaviors that enhance infection [5,30,31,35].”

      Reviewer #2 (Public review):

      Summary:

      The manuscript presents experiments using an ex vivo colonic tissue assay, clearly showing that fecal material promotes Salmonella cell invasion into the tissue. It also shows that serine and indole can modulate the invasion, although their effects are much smaller. In addition, the authors characterized the direct chemotactic responses of these cells to serine and indole using a capillary assay, demonstrating repellent and attractant responses elicited by indole and serine, respectively, and that serine can dominate when both are present. These behaviors are generally consistent with those observed in E. coli, as well as with the observed effects on cell invasion.

      Strengths:

      The most compelling finding reported here is the strong influence of fecal material on cell invasion. Also, the local and time-resolved capillary assay provides a new perspective on the cell's responses.

      Thank you for acknowledging these aspects of the study.

      Weaknesses:

      The weakness is that indole and serine chemotaxis does not seem to control the fecal-mediated cell invasion and thus the underlying cause of this effect remains unclear.

      In addition, the fact that serine alone, which clearly acts as a strong attractant, did not affect cell invasion (compared to buffer) is somewhat puzzling. Additionally, wild-type cells showed nearly a tenfold advantage even without any ligand (in buffer), suggesting that factors other than chemotaxis might control cell invasion in this assay, particularly in the serine and indole conditions. These observations should probably be discussed.

      Addressed above.

      Final comment. As shown in reference 12, Tar mediates attractant responses to indole, which appear to be absent here (Figure 3J). Is it clear why? Could it be related to receptor expression?

      Thank you for noting this. We now mention this in the discussion. In the course of this work, we encountered a number of apparent inconsistencies, or differences, between what we observed with S. Typhimurium and what had been reported previously in studies of Tsr function in E. coli. We indeed noted that some studies had investigated a role of Tar in indole taxis (in E. coli), which is why we tested, and confirmed, that Tsr is required for indole taxis in S. Typhimurium (Fig. 6).

      We do not know the reason for this apparent difference between the two bacteria, but we have previously shown, with the same S. Typhimurium strain (IR715) and the same growth, assay, and preparation protocols, that L-Asp is a strong chemoattractant for both WT and the tsr mutant (see Glenn et al. 2024, eLife, Fig. 5G: https://iiif.elifesciences.org/lax:93178%2Felife-93178-fig5-v1.tif/full/1500,/0/default.jpg).

      This supports that this strain of Salmonella indeed has a functional Tar that is expressed at a level sufficient for sensing L-Asp. So, if Tar generally mediates indole sensing, we do not know why we would not see that in Salmonella. In short, we see no role for Tar in indole chemorepulsion in our strain of study, which differs from what has been reported for E. coli, but we cannot confirm the reason.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Franco and colleagues describe careful analyses of Salmonella chemotactic behavior in the presence of conflicting environmental stimuli. By doing so, the authors describe that this human pathogen integrates signals from a chemoattractant and a chemorepellent into an intermediate "chemohalation" phenotype.

      Strengths:

      The study was clearly well-designed and well-executed. The methods used are appropriate and powerful. The manuscript is very well written and the analyses are sound. This is an interesting area of research and this work is a positive contribution to the field.

      Thank you for your comments.

      Weaknesses:

      Although the authors do a great job in discussing their data and the observed bacterial behavior through the lens of chemoattraction and chemorepulsion to serine and indole specifically, the manuscript lacks, to some extent, a deeper discussion on how other effectors may play a role in this phenomenon. Specifically, many other compounds in the mammalian gut are known to exhibit bioactivity against Salmonella. This includes compounds with antibacterial activity, chemoattractants, chemorepellers, and chemical cues that control the expression of invasion genes. Therefore, authors should be careful when making conclusions regarding the effect of these 2 compounds on invasive behavior.

      Thank you for this comment, and we agree with your point. We hope we have revised the text and provided new data to address your concern. We have also chosen for clarity to keep our text close to our experimental data and so have refrained from speculating about some topics, even though you are absolutely correct about the immense complexity of these systems.

      It is important that the word invasion is used in the manuscript only in its strictest sense, the ability displayed by Salmonella to enter non-phagocytic host cells. With that in mind, authors should discuss how other signals that feed into the control of Salmonella invasion can be at play here.

      Thank you for your recommendation. We have revised the text to hopefully be clearer on our meaning of invasion in regard to Salmonella entering non-phagocytic host cells, essentially changing our usage to ‘cellular invasion’ throughout.

      The phrase ‘invading pathogens’ is also commonly used in reference to enteric infections and the colonization resistance conferred by the microbiome (i.e., invasion in the sense of a new microbe colonizing the intestines). For instance, this recent review on Salmonella uses the term invading pathogen (https://www.nature.com/articles/s41579-021-00561-4). We acknowledge the confusion caused by this dual use of the term and have mostly removed statements using invasion in this sense. We hope our language is clearer in this revised version.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      It was difficult to understand the true intent or importance of the study described in this manuscript. The first figure in the paper showed that a Salmonella Typhimurium strain lacking either CheY, and thus incapable of any chemotaxis, or the Tsr chemoreceptor, and thus incapable of sensing serine or indole, was modestly inferior to the wild-type version of that strain in invading the cells of a swine colonic explant. It then showed that, in the presence of a human fecal homogenate, the wild-type strain had a much greater advantage in invading the colonic cells. Thus, the presence of the fecal homogenate significantly increased invasiveness in a way that depends on chemotaxis and the Tsr chemoreceptor.

      As human feces were determined to contain 882 micromolar indole and 338 micromolar serine, the effects of those concentrations of either indole or serine alone or in combination were tested. The somewhat surprising finding was that neither indole nor serine alone nor in combination changed the result from the experiment done with just buffer in the colonic explant.

      The clear conclusion of this initial study is that both chemotaxis in general and chemotaxis mediated by Tsr improve the invasiveness of S. Typhimurium. They provide a much bigger advantage in the presence of human feces. However, two molecules present in the feces that are sensed by Tsr, serine and indole, seem to have no effect on invasiveness either alone or in combination.

      At this point, the parsimonious interpretation is that there is something else in human feces that is responsible for the increased invasiveness, and the authors acknowledge this possibility. However, they do not take what appears to be the obvious approach: to look for additional factors in human feces that might be responsible, either by themselves or in combination with indole and/or serine, for the increased invasiveness. Instead, they carry out a detailed examination of the counteracting effects of indole as a repellent and of serine as an attractant as a function of their relative concentrations and their spatial distributions.

      Thank you for your comments. In our revised version, we have undertaken some additional studies of other fecal effectors that help better understand the relationship between L-Ser and indole, but also the roles of other chemoattractants (glucose, galactose, ribose, L-Asp) in mediating fecal attraction (Fig. 5). We agree with the reviewer and conclude that fecal attraction and the cell invasion phenotype mediated by fecal treatment are influenced by factors other than only Tsr, indole, and L-Ser. Our new data do show that L-Ser is sufficient to block both the invasion suppression effects of indole (negating the WT advantage) and also indole chemorepulsion, therefore making our detailed examination of the counteracting effects more relevant for understanding this system.

      What they find is what other studies have shown, primarily with S. Typhimurium's relative, the gamma-proteobacterium Escherichia coli.

      At high indole and low serine concentrations, the repulsion by indole wins out. At low indole and high serine concentrations, attraction by serine wins out. What is perhaps novel is what happens at an intermediate ratio of concentrations. Repulsion by indole dominates at short distances from the source, so there is a zone of clearing. At longer distances, attraction by serine dominates, so there is an accumulation of cells in a "halo" around the zone of clearing. Thus, assuming that serine and indole diffuse equally, the repulsive effect of indole dominates until its concentration falls below some critical level at which the concentration of serine is still high enough to exert an attractive effect.
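      The reviewer's threshold argument can be illustrated with a toy model. The sketch below is an illustration only, not the authors' analysis: it assumes one-dimensional diffusion from a constant-concentration source with the same diffusion coefficient for both molecules, uses the fecal concentrations quoted above (882 µM indole, 338 µM serine) as source values, and applies a hypothetical diffusion coefficient, elapsed time, and response thresholds (INDOLE_REPEL, SERINE_ATTRACT) chosen purely for illustration.

```python
import math

# Hypothetical parameters for illustration only (not from the study):
D = 8e-10             # diffusion coefficient (m^2/s), assumed equal for both molecules
T = 300.0             # elapsed time since the source was introduced (s)
INDOLE_0 = 882.0      # source indole concentration (uM), fecal value quoted above
SERINE_0 = 338.0      # source serine concentration (uM), fecal value quoted above
INDOLE_REPEL = 100.0  # assumed indole level (uM) above which repulsion dominates
SERINE_ATTRACT = 1.0  # assumed serine level (uM) above which attraction is felt


def concentration(c0: float, x: float, d: float = D, t: float = T) -> float:
    """Concentration at distance x (m) for 1D diffusion from a constant source at x = 0."""
    return c0 * math.erfc(x / (2.0 * math.sqrt(d * t)))


def response(x: float) -> str:
    """Toy decision rule: repulsion wins wherever indole exceeds its threshold;
    otherwise serine attracts if it is above its own threshold."""
    if concentration(INDOLE_0, x) > INDOLE_REPEL:
        return "repel"    # zone of clearing
    if concentration(SERINE_0, x) > SERINE_ATTRACT:
        return "attract"  # halo of accumulated cells
    return "neutral"


if __name__ == "__main__":
    for mm in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
        print(f"{mm:.1f} mm: {response(mm * 1e-3)}")
```

      Because both profiles share the same shape, the indole:serine ratio is identical at every distance; in this sketch the halo arises only because the two assumed thresholds are crossed at different distances from the source (roughly 1 mm for indole and 2 mm for serine under the stated assumptions).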

      They go on to show, using ITC, that serine binds to the periplasmic ligand-binding domain (LBD) of Tsr, something that has been studied extensively with very similar E. coli Tsr.

      They also show that indole does not bind to the Tsr LBD, which also is known for E. coli Tsr.

      This would be newsworthy only if the results were different for S. Typhimurium than for E. coli. As it is, it is merely confirmatory of something that was already known about Tsr of enteric bacteria.

      An idea that the authors introduce, if I understand it correctly, is that a repellent response to something in feces, perhaps indole, drives chemotactically competent S. Typhimurium cells out of the colonic lumen and promotes invasion of the bacteria into the cells of the colonic lining. If the feces contain both an attractant and a repellent, bacteria might be attracted by the feces to the lining of the intestine and then enter the colonic cells to escape a repellent, perhaps indole. That is an interesting proposition.

      In summary, I think that the initial experimental approach is fine. I do not understand the failure to follow up on the effect of the fecal homogenates in promoting invasion by chemotactic bacteria possessing Tsr. It seems there must be something else in the homogenates that is sensed by Tsr. Other amino acids and related compounds are also sensed by Tsr. Perhaps it is energy or oxygen taxis, which is partially mediated by Tsr, as the authors acknowledge.

      Much of the work reported here is quasi-repetitive with work done with E. coli Tsr. Minimally, previous work on E. coli Tsr should be explained more thoroughly rather than dealt with only as a citation.

      Thank you for your comments.

      We agree that E. coli and S. enterica share many similarities: both are Gammaproteobacteria that inhabit or infect the gut. However, we note that they diverged evolutionarily near the Jurassic-Cretaceous boundary (ca. 140 million years ago, see: PMC94677). In the context of colonizing humans, the former is a pathobiont, an indole producer, and a native member of the microbiome, whereas the latter is a frank pathogen and does not produce indole. Hence, there are many reasons to believe that one is not an approximation of the other, especially when it comes to causing disease.

      We agree that much of what is known about indole taxis has come from excellent studies in well-behaved laboratory strains of E. coli, a powerful model. We believe that expanding this work to include clinically relevant pathogens is important for understanding its role in human disease. In this study, we contribute to that broader understanding by providing new mechanistic insights into Tsr-mediated indole taxis in S. Typhimurium, along with data demonstrating fecal attraction in other enteric pathogens and pathobionts. These findings help define a more general role for Tsr in enteric colonization and disease. While some of our results indeed confirm and extend prior findings, we respectfully believe that such confirmation in relevant pathogenic strains adds value to the field.

      Regarding our ITC studies, to our knowledge no other study has used ITC to investigate whether indole binds the LBD (which we show it does not), or whether indole interferes with L-Ser sensing (which we show it does not). Hence, these are not duplicate findings, although we acknowledge that they leave the mechanism of indole sensing undiscovered. If we are incorrect in this regard, please provide us with a citation and we will be happy to include it and revise our comments.

      We now clarify in the text on lines 378-381: “While these leave the molecular mechanism of indole-sensing unresolved, it does eliminate two possibilities that have not, to our knowledge, been tested previously. Overall, our data add support to the hypothesis that a non-canonical sensing mechanism is employed by Tsr to respond to indole [8,18,69].”

      Lastly, as the reviewer notes, and as we mention in the text, essentially all prior studies of indole taxis were conducted in E. coli. That is not what is new in the work we present, which focuses on S. Typhimurium and tests the prediction that fecal indole protects against pathogen invasion. We have added a few additional points of comparison between our results and prior studies. While we appreciate that much understanding has come from E. coli as a model for indole taxis, we feel that discussing prior work in extensive detail would be more suitable for a review and would obscure our new findings about Salmonella and other enterics.

      In an earlier version of the manuscript, we included more background on E. coli indole taxis. However, we found that the historical literature in this area was somewhat inconsistent, with different assays using varying time points and indole concentrations, often leading to results that were difficult to reconcile. Providing sufficient context to explain these discrepancies required considerable space and, ultimately, detracted from the focus of our current study. Hence, we have only brought in comparisons with E. coli where they are most relevant to the present work. We also provide new data showing that E. coli exhibits fecal attraction, so there is reason to believe the mechanisms we study here are also relevant to that system.

      Some minor points

      (1) Hyphens are not needed with constructs like "naturally occurring" or "commonly used".

      Thank you. Revisions made throughout.

      (2) The word "frank" as in "frank pathogen" seems odd. It seems "potent" would be better.

      Thank you for this comment. Per your recommendation, we have removed this term.

      The term ‘frank pathogen’ is standard usage in the field of bacterial pathogenesis in reference to a microbe that always causes disease in its host (in this case humans) and causes disease in otherwise healthy hosts (example: https://www.sciencedirect.com/science/article/pii/S1369527420300345). We actually used this specific term to distinguish an aspect of novelty of our study because E. coli can, sometimes, be a pathogen (i.e. a pathobiont) and of course E. coli indole taxis has been previously studied. Ours is the first study of indole taxis in a frank pathogen.

      (3) It is unnecessary to coin a new word, chemohalation, to describe a phenomenon that is a simple consequence of repulsion by higher concentrations of a repellent and attraction by lower concentrations of attractant to generate a halo pattern of cell distribution.

      Thank you for your opinion on this. We have softened our statements on this point, and in the newly revised version of the text less space is devoted to this idea. We now state in lines 304-307:

      “There exists no consensus descriptor for taxis of this nature, and so we suggest expanding the lexicon with the term “chemohalation,” in reference to the halo formed by the cell population, and which is congruent with the commonly used terms chemoattraction and chemorepulsion.”

      We appreciate the reviewer’s perspective and agree that the behavior we describe can be viewed as the result of competing attractant and repellent cues. However, we find that the traditional framework of “chemoattraction” and “chemorepulsion” is often insufficient to describe the spatial positioning behaviors we observe in our system. In our experience presenting and discussing this work, especially with audiences outside the chemotaxis field, it has been challenging to convey these dynamics clearly using only those two terms.

      For this reason, we introduced the term chemohalation to describe this more nuanced behavior, which appears to reflect a balance of signals rather than a simple unidirectional response. More bacteria enter the field of view, but they are clearly positioned differently than in regular ‘chemoattraction.’ We also note that Reviewers 2 and 3 did not raise concerns about the term, and after careful consideration, we have opted to retain it in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      Lines 143-156 seem somewhat overcomplicated and may be confusing. For example: in line 143: "However, when colonic tissue was treated with purified indole at the same concentration, the competitive advantage of WT over the chemotactic mutants was abolished compared to fecal-treated tissue...". But indole was tested alone, so it did not abolish the response; rather the absence of fecal material did.

      We appreciate your point. We have made revisions throughout to help improve the clarity of how we discuss the explant infection data and provide new visuals to help explain the experiment and data (Fig. 1C, Fig. S5).

      Reviewer #3 (Recommendations for the authors):

      (1) Line 46 - Are references 9-11 really about topography?

      Thank you. You are correct. Revised and eliminated this statement.

      (2) Lines 87-89 - It seems to me that a bit more information on this would be helpful to the reader.

      In our revision of the text, to make it more centered on our primary findings of the differences between indole taxis when indole is the sole effector versus amidst other effectors, we have removed this section.

      (3) Line 112 - When mentioning the infection of the cecum and colon, authors should specify that this is in mice.

      Thank you for this comment. In our revised version we provide references both for animal model infections and work in human patients (ex: https://www.sciencedirect.com/science/article/abs/pii/S0140673676921000)

      We have revised our statement to read (lines 99-100): “Salmonella Typhimurium preferentially invades tissue of the distal ileum but also infects the cecum and colon in humans and animal models [42–46].”

      (4) Lines 122-123 - Authors state that "This experimental setup simulates a biological gradient in which the effector concentration is initially highest near the tissue and diffuses outward into the buffer solution.". Was this experimentally demonstrated? If not, authors should tone this down.

      We have removed this comment and instead present a conceptual diagram illustrating this idea (Fig. 1C). This point is also addressed above.

      (5) When looking at the results in Figure 1, I wonder what the results of this experiment would be if the authors tested an invasion mutant of Salmonella. In a strain that is able to perform chemotaxis (attraction and repulsion) but unable to actively invade, would there be a phenotype here? Is it possible that the fecal material affects cellular uptake of Salmonella, independently of active invasion? I don't think the authors necessarily need to perform this experiment, but I think it could be informative and this possibility should at least be discussed.

      Thank you for your comments and suggestions. We have included new data from an explant co-infection experiment with WT and an invasion-deficient invA mutant (Fig. S1). Under these conditions, WT exhibits an advantage in the gentamicin-treated homogenate, but not in the untreated homogenate, suggestive of an advantage in cellular invasion.

      However, we did not repeat all experiments with this genetic background. We felt that would be outside the scope of this work, and would probably require dual chemotaxis/invA deletions to assess the impact of each, which also could be difficult to interpret. The hypothesis mentioned by the Reviewer is possible, but we were not able to devise a way to test this idea, as it seems we would need to deactivate all other mechanisms of Salmonella invasion.

      (6) Lines 137-140 - Because this is a competition experiment and results are plotted as CI, the reader can't readily assess the impact of human feces on invasion by WT Salmonella.

      Thank you for pointing this out. We want to mention that the data are plotted as CI in the main text, but the supplemental contains the disaggregated CFU data (Fig. S1-2) and the numerical values (Data S1).

      Please include the magnitude of induction in this sentence, compared to the buffer control.

      The text of this section has been changed to account for new data.

      Additionally, although unlikely, the presence of the chemotaxis mutants in the same infection may be a confounding factor. In order to irrefutably ascertain that feces induces invasion, I suggest authors perform this experiment with the wildtype strain (and mutant) alone in different conditions.

      Thank you for this suggestion, although after careful consideration we have decided not to repeat these explant studies with monoinfections. Coinfections are a common tool in Salmonella pathogenesis studies, including prior chemotaxis studies on which our work builds (ex: https://pmc.ncbi.nlm.nih.gov/articles/PMC3630101/). The explant experiments, even with as many aspects controlled as we could manage, still show considerable variability, and one way to mitigate this is through competition experiments, so that each strain experiences the same environment.

      We agree that a cost of this approach is that one strain may affect the other, or may alter the environment in a way that impacts the other. Thus, the resulting data must also be understood through this lens. We have revised the text to stay closer to the competitive advantage phenotype.

      (7) Line 150 - Authors state that bacterial loads are similar. However, authors should perform and report statistical analyses of these comparisons, at least in the supplementary data.

      We have removed this statement as requested. We do note, however, that the mean CFU values across treatments at identical time points appear qualitatively similar, which is an observation that does not require statistical testing.

      (8) Lines 154-154 - This seems incorrect, as the effect observed with the mixture of indole and serine is very similar to the addition of serine alone. Therefore, there was no "neutralization" of their individual effects.

      We have revised this statement.

      (9) Line 159-161 - I strongly suggest authors reword this sentence. I don't think this is the best way to describe these results. The stronger phenotype observed was with the fecal material. Therefore, it is the indole (alone) condition that does not "elicit a response". Focusing on indole too much here ignores everything else that is present in feces and also the fact that there was a drastic phenotype when feces were used.

      Thank you for your opinion on this. We believe this is one of the ways in which our earlier draft was unclear. It was actually a primary motivation of this work to test whether there were differences in pathogen infection, mediated by chemotaxis, in the presence of indole as a singular effector or in its near-native context in fecal material, and our revised text centers our study around this question. We believe this distinction is important for the reasons mentioned earlier.

      Relative to buffer treatment, indole changes the behavior of the system, eliminating the WT advantage, and this is the effect we refer to. We have made many revisions to the text of these sections and hope it better conveys this idea. We expect we may still have differences regarding the interpretation of these results, but regardless, thank you for your suggestions and we have tried to implement them to improve the clarity of the text.

      (10) Line 162 - Again, I disagree with this. Indole does not have an effect to be cancelled out by serine.

      Addressed above, and this text has been changed. Also, we provide new chemotaxis data that at fecal-relevant concentrations of indole and L-Ser, indole chemorepulsion is overridden (Fig. 5).

      (11) Lines 166-168 - Again, this is a skewed analysis. Indole and serine could not possibly provide an "additive effect" since they do not provide an effect alone. There is nothing to be added.

      This text has been deleted.

      (12) Lines 168-170 - Most of the citations provided to this sentence are inadequate. Our group has previously shown that the mammalian gut harbors thousands of small molecules (Antunes LC et al. Antimicrob Agents Chemother 2011). You obviously do not have to cite our work, but there is significant literature out there about the complexity of the gut metabolome.

      Thank you for this comment. We have revised this particular text, but do make mention of potential other effectors driving these effects, which was also requested by the other reviewers.

      Your work and that of others indeed support there being thousands of molecules in the gut, but our work centers on chemotaxis, and bacteria have a small number of chemoreceptors that sense only a tiny fraction of these molecules as effectors. Since the infection phenotypes in the explants depend on chemotaxis, we restrict our comments to those effectors, but we agree that there are likely many other interactions involved, such as those impacting gene expression.

      Please note our more detailed description of the explant infection assay (and shown in Fig. 1C) that may change your view on the significance of non-chemotaxis effects. The bacteria only experience the effectors at low concentration, not the high concentration that is used to soak and prepare the tissue prior to infection.

      (13) Figure 2 - The letter 'B' from panel B is missing.

      Thank you very much for bringing this oversight to our attention. We have fixed this.

      (14) Legend of Figure 3 - Panel J is missing a proper description. Figure legends need improvement in general, to increase clarity.

      Thank you for noting this. This is now Fig. 6E. We have provided an additional description of what this panel shows. We have edited the legend text to read: “E. Shows a quantification of the relative number of cells in the field of view over time following treatment with 5 mM indole for a competition experiment with WT and tsr (representative image shown in F).”

      We also have made other edits to figure legends to improve their clarity and add additional experimental details and context. By breaking up larger figures into smaller figures, we also hope to have improved the clarity of our data presentation.

      (15) Lines 264-265 - Maybe I am missing something, but I do not see the ITC data for serine alone.

      We have clarified in the text that this was measured in our previous study (https://elifesciences.org/articles/93178). The present study is a ‘Research Advance’ article, and so builds on our prior observation.

      We have revised the text to read: “To address these possibilities, we performed ITC of 50 μM Tsr LBD with L-Ser in the presence of 500 μM indole and observed a robust exothermic binding curve and KD of 5 µM, identical to the binding of L-Ser alone, which we reported previously (Fig. 6H) [36].”

      (16) Lines 296-297 - What is the effect of these combinations of treatments on bacterial cells? I commend the authors for performing the careful growth assays, but I wonder if bacterial lysis could be a factor here. I am not doubting the effect of chemotaxis, but I am wondering if toxic effects could be a confounding factor. For instance, could it be that the "avoidance" close to the compound source and subsequent formation of a halo suggest bacterial death and lysis? I suggest the authors perform a very simple experiment, where bacteria are exposed to the compounds at various concentrations and combinations, and cells are observed over time to ensure that no bacterial lysis occurs.

      Thank you for mentioning this possibility. If we understand correctly, the Reviewer is asking if the chemohalation effect we report could be from the bacteria lysing near the source. Our data actually argue against this possibility through a few lines of evidence.

      First, if lysis were responsible, we would also expect to see an effect near the source in experiments with the cheY mutant. Instead, in experiments with either the cheY mutant or the tsr mutant, neither of which can sense indole, the bacteria ignore the stimulus and show an even distribution (see current Fig. 6F).

      Second, our calculations suggest that in the chemotaxis assay (CIRA), the bacteria experience only a low local concentration of indole, mostly in the nM range, because the effector treatment is diluted as soon as it is injected into the larger volume. This means the local concentration is far below the concentrations we find inhibit growth of the cells over longer periods, and is unlikely to be toxic (Fig. 7, Fig. S3).

      Lastly, in the representative video presented we can observe individual cells approach and exit the treatment (Movie 11). Due to the above we have not performed additional experiments to test for lysis.

      (17) Lines 310-311 - Isn't this the opposite of the model you propose in Figure 5? The higher the concentration of indole in the lumen the more likely Salmonella is to swim away from it and towards the epithelium, favoring invasion, no?

      We appreciate the opportunity to clarify this point and apologize for any confusion caused. In response, we have revised the text to place less emphasis on chemohalation, and the specific statement and model in question have now been removed. Instead, we provide a summary of our explant data in light of the other analyses in the study (Fig. S5).

      What we meant here referred to the microscopic level, not to whether a host intestine is colonized. To put it another way, we think our data support the view that the pathogen colonizes and infects the host regardless of indole presence, but uses indole as a means to prioritize which tissues are optimal for colonization at the microscopic level. The prediction made by others was that bacteria swim away from indole sources and that this could therefore prevent or inhibit pathogen colonization of the intestines, which our data do not support.

      (18) Lines 325-326 - Maybe, but feces also contain several compounds with antibacterial activity, as well as other compounds that could elicit chemorepulsion. This should be stated and discussed.

      We have removed this statement since we did not explicitly test the growth of the bacteria with fecal treatments. We have refrained from speculating further in the text since we do not have direct knowledge of how that relationship with differing effectors could play out.

      We agree with the reviewer that the growth assays are reductionist and give insight only into the two effectors studied. We provide evidence from several different enterics that they all exhibit fecal attraction, and it seems unlikely that the bacteria would be attracted to something deleterious, but we have not confirmed this.

      (19) Lines 371-374 - How preserved (or not) is the mucus layer in this model? The presence of an inhibitory molecule in the lumen does not necessarily mean that it will protect against invasion. It is possible that by sensing indole in the lumen Salmonella preferentially swims towards the epithelium, thus resulting in enhanced invasion.

      The text in question has been removed. However, we acknowledge the reviewer’s point, and that these explant tissues do not fully model an in vivo intestinal environment. Other than a gentle washing with PBS to remove debris prior to the experiment the tissue is not otherwise manipulated, and feasibly the mucus layer is similar to its in vivo state.

      In mentioning this hypothesis about indole, which our data do not support, we were echoing a prediction from the field, proposed in the studies we cite. We agree with the reviewer that indole could impact chemotaxis and invasion in other ways, and indeed our data support this.

      (20) Lines 394-395 - The authors need to remember that the ability to invade the intestinal epithelium is not only a product of chemoattraction and repulsion forces. Several compounds in the gut are used by Salmonella as cues to alter invasion gene expression. See PMID: 25073640, 28754707, 31847278, and many others.

      Thank you for this point; we now include these citations. We have revised the text in question, stating:

      “In addition to the factors we have investigated, it is already well-established in the literature that the vast metabolome in the gut contains a complex repertoire of chemicals that modulate Salmonella cellular invasion, virulence, growth, and pathogenicity [79–81].”

      Our intent is not to diminish the role of other intestinal chemicals but rather to put our new findings into the context of bacterial pathogenesis. We do provide evidence that specific chemoeffectors present in fecal material alter where bacteria localize through chemotaxis, which is one method of control over colonization.

      (21) Line 408 - I think it could be hard to observe this using your experimental approach.

      Because you need to observe individual cells, the number of cells you observe is relatively small. If, in a bet-hedging strategy, the proportion of cells that were chemoattracted to indole was relatively low you likely would not be able to distinguish it from an occasional distribution close to the repellent source. You may or may not want to discuss this.

      Thank you for this observation. It is indeed challenging to both observe large scale population behaviors and also the behaviors of individual cells in the same experiment. Our ability to make this distinction is similar to the approach used in the study we cite, so that is our comparison.

      But, if there was a subpopulation that was attracted we would predict a ‘bull’s-eye’ population structure, with some cells attracted and other avoiding the source, which we do not see - we see the halo. So, we find no evidence of the bet-hedging response seen in a different study using E. coli and using different time scales than we have.

      (22) Lines 410-411 - What could the other attractants be? Would it be possible/desirable to speculate on this?

      We have changed the text here, but we present new data that examines some of these other attractants (Fig. 5).

      (23) Line 431 - What exactly do you mean by "running phenotype"? Please, provide a brief explanation.

      We have removed this text, but a running phenotype means the swimming bacteria rarely make direction changes (i.e. tumbles), which has been associated with promoting contact with the epithelium, described in the references we cite. Hence, this type of swimming behavior could contribute to the effects we observe in the explant studies, potentially explaining some of the Tsr-mediated advantage that was not dependent on L-Ser/indole.

      (24) Line 441 - Other work has shown that feces contain inhibitors of invasion gene expression. The authors should integrate this knowledge into their model. In fact, indole has been shown to repress host cell invasion by Salmonella, so it is important that authors understand and discuss the fact that the impact of indole is multifaceted and not only a reflection of its action as a chemorepellent. PMID: 29342189, 22632036.

      We agree with the reviewer about this point, and mention this in the text (lines 55-57): “Indole is amphipathic and can transit bacterial membranes to regulate biofilm formation and motility, suppress virulence programs, and exert bacteriostatic and bactericidal effects at high concentrations [16–18,20–22].”

      We have added the suggested references.

      What we test here is the specific hypothesis made by others in the field about indole chemorepulsion serving to dissuade pathogens from colonizing.

      For instance, the statement from: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0190613

      “Since indole is also a chemorepellent for EHEC [23], it is intriguing to speculate that in addition to attenuating Salmonella virulence, indole also attenuates the recruitment and directed migration of Salmonella to its infection niche in the GI tract.”

      And from: https://doi.org/10.1073/pnas.1916974117

      “We propose that indole spatially segregates cells based on their state of adaptation to repel invaders while recruiting beneficial resident bacteria to growing microbial communities within the GI tract.”

      And

      “Thus, foreign ingested bacteria, including invading pathogens such as E. coli O157:H7 and S. enterica, are likely to be prevented by indole from gaining a foothold in the mucosa.”

      As shown by others, indole certainly has many roles in controlling pathogenesis, and there are other chemicals we do not investigate that control invasion and bacterial growth, but we keep our statements here restricted to chemotaxis since that is what our experiments and data show.

      (25) Line 472 - "until fully motile". How long did this take, how variable was it, and how was it determined?

      Thank you for asking for this clarification. We have added that this took between 1 and 2 h, and that motility was confirmed visually. Our methods are similar to those described in earlier chemotaxis studies (ex: 10.1128/jb.182.15.4337-4342.2000).

      (26) Line 487 - I worry that the fact fecal samples were obtained commercially means that compound stability/degradation may be a factor to consider here. How long had the sample been in storage? Is this information available?

      Thank you for this question. We agree that the fecal sample we used serves as a model system and we cannot rule out that handling by the supplier could potentially alter its contents in some way that would impact bacterial chemosensing. However, we note that the measurements of L-Ser and indole we obtained are in the appropriate range for what other studies have shown.

      The fecal sample used for all work in the study was from a single healthy human donor, obtained from Lee Biosolutions (https://www.leebio.com/product/395/fecal-stool-samplehuman-donor-991-18). The supplier did not state the explicit date of collection, nor indicate any specific handling or storage methods that would obviously degrade its native metabolites, but we cannot rule that out. In our hands, the fecal sample was kept frozen at -20 °C. For research purposes, portions were removed and thawed as needed, keeping the original sample frozen to limit degradation from freeze-thaw cycles.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Recommendations for the authors:

      Reviewing Editor Comments:

      The resubmitted version of the manuscript adequately addressed several initial comments made by reviewing editors, including a more detailed analysis of the results (such as those of bilayer thickness). This version was seen by 2 reviewers. Both reviewers recognize this work as being an important contribution to the field of BK and voltage-dependent ion channels in general. The long trajectories and the rigorous/novel analyses have revealed important insights into the mechanisms of voltage-sensing and electromechanical coupling in the context of a truncated variant of the BK channel. Many of these observations are consistent with structural and functional measurements of the channel, available thus far. The authors also identify a novel partially expanded state of the channel pore that is accessed after gating-charge displacement, which informs the sequence of structural events accompanying voltage-dependent opening of BK.

      However, there are key concerns regarding the use of the truncated channel in the simulations. While many gating features of BK are preserved in the truncated variant, studies have suggested that coupling of voltage-sensing domain rearrangements to opening of the channel pore is impaired upon gating-ring deletion. So the inferences made here might represent only a partial view of the mechanism of electromechanical coupling.

      It is also not entirely clear whether the partially expanded pore represents a functionally open, sub-conductance, or another closed state. Although the authors provide evidence that the inner pore is hydrated in this partially open state, in the absence of additional structural/functional restraints, a confident assignment of a functional state to this structural state is difficult. Functional measurements of the truncated channel seem to suggest not only that its single-channel conductance is lower than that of full-length channels, but also that it has a voltage-independent step that causes the gates to open. It is unclear whether it is this voltage-independent step that remains to be captured in these MD trajectories. A clean-cut resolution of this conundrum might not be feasible at this time, but it would help to present the various possibilities to readers.

      We appreciate the positive comments and agree that there will likely be important differences between the mechanistic details of voltage activation between the Core-MT and full-length constructs of BK channels. We also agree that the dilated pore observed in the simulation may not be the fully open state of Core-MT.

      Nonetheless, the notion that the simulation may not have captured the full pore opening transition or the contribution of the CTD should not render the current work “incomplete”, because a complete understanding of BK activation would be an unrealistic goal beyond the scope of this work. We respectfully emphasize that the main insights of the current simulations are the mechanisms of voltage sensing (e.g., the nature of VSD movements, contributions of various charged residues, how small charge movements allow voltage sensing, etc.) as well as the role of the S4-S5-S6 interface in VSD-pore coupling. As noted by the Editor and reviewers, these insights represent important steps towards establishing a more complete understanding of BK activation.

      Below are the specific comments of the two experts who have assessed the work and made specific suggestions to improve the manuscript.

      Reviewer #1 (Recommendations for the authors):

      (1) Although the successful simulation of V-dependent K+ conduction through the BK channel pore and analysis of associated state dependent VSD/pore interactions and coupling analysis is significant, there are two related questions that are relevant to the conclusions and of interest to the BK channel community which I think should be addressed or discussed.

      One key feature of BK channels is their extraordinarily large conductance compared to other K+ selective channels. Do the simulations of K+ conductance provide any insight into this difference? Is the predicted conductance of BK larger than that of other K+ channels studied by similar methods? Is there any difference in the conductance mechanism (e.g., the hard and soft knock-on effects mentioned for BK)?

      The molecular basis of the large conductance of BK channels is indeed an interesting and fundamental question. Unfortunately, this is beyond the scope of this work, and the current simulation does not appear to provide any insight into the basis of large conductance. It is interesting to note, though, that the conductance is apparently related to the level of pore dilation and the pore hydration level, as increasing the hydration level from ~30 to ~40 waters in the pore increases the simulated conductance from ~1.5 to ~6 pS (page 8). This is consistent with previous atomistic simulations (Gu and de Groot, Nature Communications 2023; ref. 33) showing that the pore hydration level is strongly correlated with observed conductance. As noted in the manuscript, the conductance mechanism through the filter appears highly similar to previous simulations of other K+ channels (page 8). Given the limited number of conductance events observed in the current simulations, we will refrain from discussing the possible basis of the large conductance in BK channels, except to comment on the role of pore hydration (page 8; also see below in response to #5).
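      For readers less familiar with how simulated conductance values such as ~1.5 or ~6 pS arise, a common estimate, assuming ohmic behaviour, divides the ion current (net permeation events × elementary charge ÷ simulated time) by the applied voltage. The sketch below is illustrative only: the function name `conductance_ps` and the input counts are hypothetical, and it is not the analysis code used in the study.

```python
# Elementary charge in coulombs (exact SI value).
ELEMENTARY_CHARGE_C = 1.602176634e-19


def conductance_ps(n_crossings: int, sim_time_us: float, voltage_mv: float) -> float:
    """Estimate single-channel conductance in picosiemens from the net number
    of monovalent ions (e.g. K+) crossing the pore during a constant-voltage run.

    Assumes an ohmic channel: G = I / V, with I = n * e / t.
    """
    current_a = n_crossings * ELEMENTARY_CHARGE_C / (sim_time_us * 1e-6)  # amperes
    conductance_s = current_a / (voltage_mv * 1e-3)                       # siemens
    return conductance_s * 1e12                                           # pS


if __name__ == "__main__":
    # Hypothetical example: 30 K+ crossings in 1 us at 750 mV.
    print(f"{conductance_ps(30, 1.0, 750.0):.2f} pS")
```

Because only a few dozen permeation events are typically observed per microsecond, the estimate carries large counting uncertainty, which is one reason to be cautious about interpreting absolute conductance values from such simulations.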

      The pore in the MD simulations does not open as wide as the Ca-bound open structure, which (as the authors note) may mean that full opening requires longer than 10 µs. I think that is highly likely given that the two 750 mV simulations yielded different degrees of opening and that in BK channels opening is generally much slower than charge movement. Therefore, a question is - do any of the conclusions illustrated in Figures 6, S5, S6 differ if the Ca-bound structure is used as the open state? For example, I expect the interactions between S5 and S6 might at least change to some extent as S6 moves to its final position. In this case, would conclusions about which residues interact, and get stronger or weaker, be the same as in Figures S6 b,c? Providing a comparison may help indicate to what extent the conclusions are dependent on achieving a fully open conformation.

      We appreciate the reviewer’s suggestion and have further analyzed the information flow and coupling pathways using the simulation trajectory initiated from the Ca<sup>2+</sup>-bound cryo-EM structure (sim 7, Table S1). The new results are shown in two new SI Figures S7 and S8, and new discussion has been added to pages 14-15. Comparing Figures 5 and S7, we find that dynamic community, coupling pathways, and information flow are highly similar between simulations of the open and closed states, even though there are significant differences in S5 contacts in the simulated open state vs Ca<sup>2+</sup>-bound open state (Figure S8). Interestingly, there are significant differences in S4-S5 packing in the simulated and Ca<sup>2+</sup>-bound open states (Figure S8 top panel), which likely reflect important differences in VSD/pore interactions during voltage vs Ca<sup>2+</sup> activation.

      (2) P4 Significance -"first, successful direct simulation of voltage-activation"

      This statement may need rewording. As noted above, Carrasquel-Ursulaez et al., 2022 (reference 39) simulated voltage sensor activation under conditions comparable to those of the current manuscript (3.9 µs simulation at +400 mV), and drew some similar conclusions regarding R210, R213 movement, and electric field focusing within the VSD. However, they did not report what happens to the pore or simulate K+ movement. So do the authors here mean something like "first, successful direct simulation of voltage-dependent channel opening"?

      We agree with the reviewer and have revised the statement to “ … the first successful direct simulation of voltage-dependent activation of the big potassium (BK) channel, ..”

      (3) P5 "We compare the membrane thickness at 300 and 750 mV and the results reveal no significant difference in the membrane thickness (Figure S2)"

      The figure also shows membrane thickness at 0 mV and indicates it is 1.4 Angstroms less than that at 300 or 750 mV. Whether or not this difference is significant should be stated, as the question being addressed is whether the structure is perturbed owing to the use of non-physiological voltages (which would include both 300 and 750 mV).

      We have revised the Figure S2 caption to clarify that one-way ANOVA suggests the difference is not significant.

      (4) P7 "It should be noted that the full-length BK channel in the Ca2+ bound state has an even larger intracellular opening (Figure 2f, green trace), suggesting that additional dilation of the pore may occur at longer timescales."

      As noted above, I agree it is likely that additional pore dilation may occur at longer timescales. However, for completeness, I suppose an alternative hypothesis should be noted, e.g. "...suggesting that additional dilation of the pore may occur at longer timescales, or in response to Ca-binding to the full length channel."

      This is a great suggestion. Revised as suggested.

      (5) Since the authors raise the possibility that they are simulating a subconductance state, some more discussion on this point would be helpful, especially in relation to the hydrophobic gate concept. Although the Magleby group concluded that the cytoplasmic mouth of the (fully open) pore has little impact on single channel conductance, that doesn't rule out that it becomes limiting in a partially open conformation. The simulation in Figure 3A shows an initial hydration of the pore with ~15 waters with little conductance events, suggesting that hydration per se may not suffice to define a fully open state. Indeed, the authors indicate that the simulated open state (w/ ~30-40 waters) has 1/4th the simulated conductance of the open structure (w/ ~60 waters). So is it the degree of hydration that limits conductance? Or is there a threshold of hydration that permits conductance and then other factors that limit conductance until the pore widens further? Addressing these issues might also be relevant to understanding the extraordinarily large conductance of fully open BK compared to other K channels.

      We agree with the reviewer’s proposal that pore hydration seems to be a major factor that can affect conductance. This is also well in line with the previous computational study by Gu and de Groot (2023). We have now added a brief discussion on page 8, stating “Besides the limitation of the current fixed charge force fields in quantitatively predicting channel conductance, we note that the molecular basis for the large conductance of BK channels is actually poorly understood (78). It is noteworthy that the pore hydration level appears to be an important factor in determining the apparent conductance in the simulation, which has also been proposed in a previous atomistic simulation study of the Aplysia BK channel (33).”

      Minor points

      (1) P5 "the fully relaxed pore profile (red trace in Figure S1d, top row) shows substantial differences compared to that of the Ca2+-free Cryo-EM structure of the full-length channel."

      For clarity, I suggest indicating which is the Ca-free profile - "... Ca2+-free Cryo-EM structure of the full-length channel (black trace)."

      We greatly appreciate the thoughtful suggestion. Revised as suggested.

      (2) P8 "Consistent with previous simulations (78-80), the conductance follows a multi-ion mechanism, where there are at least two K+ ions inside the filter"

      For clarity, I suggest indicating these are not previous simulations of BK channels (e.g., "previous simulations of other K+ channels ...").

      Author response: Revised as suggested. Thank you.

      (3) Figure 2, S1 - grey traces representing individual subunits are very difficult to see (especially if printed). I wonder if they should be made slightly darker. Similar traces in Figure 3 are easier to see.

      The traces in Figure S1 are actually the same thickness as those in Figure 3; they appear lighter due to the size of the figure. Figure 2 panels a-c have been updated to improve the resolution.

      (4) Figure 2 - suggest labeling S6 as "S6 313-324" (similar to S4 notation) to indicate it is not the entire segment.

      Figure 2 panel d) has been updated as suggested.

      (5) Figure 2 legend - "Voltage activation of Core-MT BK channels. a-d)..."

      It would be easier to find details corresponding to individual panels if they were referenced individually. For example:

      "a-d) results from a 10-μs simulation under 750 mV (sim2b in Table S1). Each data point represents the average of four subunits for a given snapshot (thin grey lines), and the colored thick lines plot the running average. a) z-displacement of key side chain charged groups from initial positions. The locations of charged groups were taken as those of guanidinium CZ atoms (for Arg) and sidechain carboxyl carbons (for Asp/Glu). b) z-displacement of centers-of-mass of VSD helices from initial positions, c) backbone RMSD of the pore-lining S6 (F307-L325) to the open state, and d) tilt angles of all TM helices. Only residues 313-324 of S6 were included in the tilt angle calculation, and the values in the open and closed Cryo-EM structures are marked using purple dashed lines. "

      We appreciate the thoughtful suggestion and have revised the caption as suggested.

      (6) Figure S1 - column labels a,b,c, and d should be referenced in the legend.

      The references to column labels have been added to Figure S1 caption.

      (7) References need to be double-checked for duplicates and formatting.

      a) I noticed several duplicate references, but did not do a complete search: Budelli et al 2013 (#68, 100), Horrigan Aldrich 2002 (#22,97), Sun Horrigan 2022 (#40, 86), Jensen et al 2012 (#56,81).

      b) Reference #38 is incorrectly cited with the first name spelled out and the last name abbreviated.

      We appreciate the careful proofreading of the reviewer. The duplicated references were introduced by mistake due to the use of multiple reference libraries. We have gone through the manuscript and removed a total of 5 duplicated references.

      Response to additional reviewer comments

      My only new comment is that the numbering of residues in Fig. S8 does not match the standard convention for hSlo and needs to be double-checked. For the residues I checked, the numbers appear to be shifted by 3 compared with hSlo (e.g. Y315, P317, E318, G324 should be Y318, P320, E321, G327).

      We greatly appreciate the reviewer for catching the errors in residue labels. Figure S8 has now been updated to include correct residue labels. Thanks!

      Reviewer #2 (Recommendations for the authors):

      This manuscript has been through a previous level of review. The authors have provided their responses to the previous reviewers, which appear to be satisfactory, and I have no additional comments, beyond the caveats concerning interpretations based on the truncated channel, which are noted above.

      We greatly appreciate the constructive comments and insightful advice. Please see above response to the Reviewing Editor’s comments for response and changes regarding the caveats concerning interpretations of the current simulations.

    1. Background: Characterising genetic and epigenetic diversity is crucial for assessing the adaptive potential of populations and species. Slow-reproducing and already threatened species, including endangered sea turtles, are particularly at risk. Those species with temperature-dependent sex determination (TSD) have heightened climate vulnerability, with sea turtle populations facing feminisation and extinction under future climate change. High-quality genomic and epigenomic resources will therefore support conservation efforts for these flagship species with such plastic traits.

      Findings: We generated a chromosome-level genome assembly for the loggerhead sea turtle (Caretta caretta) from the globally important Cabo Verde rookery. Using Oxford Nanopore Technology (ONT) and Illumina reads followed by homology-guided scaffolding, we achieved a contiguous (N50: 129.7 Mbp) and complete (BUSCO: 97.1%) assembly, with 98.9% of the genome scaffolded into 28 chromosomes and 29,883 annotated genes. We then extracted the ONT-derived methylome and validated it via whole genome bisulfite sequencing of ten loggerheads from the same population. Applying our novel resources, we reconstructed population size fluctuations and matched them with major climatic events and niche availability. We identified microchromosomes as key regions for monitoring genetic diversity and epigenetic flexibility. Isolating 191 TSD-linked genes, we further built the largest network of functional associations and methylation patterns for sea turtles to date.

      Conclusions: We present a high-quality loggerhead sea turtle genome and methylome from the globally significant East Atlantic population. By leveraging ONT sequencing to create genomic and epigenomic resources simultaneously, we showcase this dual strategy for driving conservation insights into endangered sea turtles.

      This work has been peer reviewed in GigaScience (https://doi.org/10.1093/gigascience/giaf054), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer: Victor Quesada

      This work offers an improved version of the reference genome for the loggerhead sea turtle. The authors have also analyzed the methylation patterns of blood obtained from different individuals and with two methods. The resulting data set includes gene annotations, methylation levels and the specific analysis of methylation levels of genes involved in temperature-dependent sex determination (TSD). While the improvements offered by this work seem modest, I think that the data sets may provide important resources for future work.

      - In my opinion, the use of a previous version of the same genome in the assembly process should be noted in the abstract. It would be enough to write "... followed by homology-guided scaffolding to GSC_CCare_1.0...".
      - If possible, the authors should clarify the taxonomic relationship between the reference individual in this work and the reference individual for the previous version of the genome (ref. 26). Is it the same NCBI taxid?
      - There is a mention of "lateral terminal repeats" in the "Genome annotation" section (page 7). I think it is a typo and it should read "long terminal repeats".
      - In the same section, on page 9, reference 73 refers to StringTie, not gffread. In addition, it is not clear how "in-frame stop codons were removed". A simple way to explain this unambiguously would be to provide the options that were used, as with other programs.
      - I would revise the use of "coverage" versus "depth". For instance, the expression "...a coverage of 9.2(...)X" would be more precise as "...a sequencing depth of 9.2(...)X". Coverage should be a fraction or a percentage. However, this is only a piece of advice, as there is no strong consensus at the moment.
      - The interpretation of methylation patterns is always difficult. In my opinion, the manuscript should discuss several limitations of the results. First, using blood as the starting tissue is convenient but not ideal, as many methylation patterns are tissue-specific. The authors may want to add a reference to preliminary evidence that some methylation changes in blood cells are related to TSD (Bock et al., Mol Ecol. 2022; 31:5487-5505). Second, the work examines broad patterns of methylation (all promoters, all coding sequences,...). While this may be interesting for descriptive purposes, it may also drown out significant signals. The manuscript should mention this limitation.
      - Figure 2B shows methylation per gene. If the aim is to compare both kinds of sequencing, there should be at least one comparison of methylation per CpG, which might even be categorical or downsampled.
      - The origin of the duplication of EP300 seems outside the scope of the manuscript. Nevertheless, given that the question is posed, the authors may want to perform a simple phylogenetic analysis of the sequences. Even the basic analysis of the annotated copies plus an outgroup is likely to give a robust answer to this question.
      - For the benefit of non-specialists, the manuscript might include a brief mention of how microchromosomes allow a larger number of combinations of variants without chromosome recombination.
      - Some expressions may be edited for clarity and precision. Examples are "which should be verified whether they are true" (page 17) and "microchromosomes have greater methylation potential and realised levels...".
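      The reviewer's distinction between depth and coverage can be made concrete. Mean sequencing depth is the total number of sequenced bases divided by genome length (a multiple such as 9.2X), whereas coverage, in the sense of breadth, is the fraction of genome positions covered by at least one read. The toy sketch below assumes reads are given as hypothetical half-open `[start, end)` intervals; the function name is illustrative.

```python
def depth_and_breadth(reads: list[tuple[int, int]], genome_length: int) -> tuple[float, float]:
    """Return (mean sequencing depth, breadth of coverage) for aligned reads
    given as half-open [start, end) intervals on a genome of known length."""
    # Depth: total aligned bases divided by genome length (reported as "nX").
    total_bases = sum(end - start for start, end in reads)
    depth = total_bases / genome_length

    # Breadth: fraction of positions hit by at least one read.
    covered = [False] * genome_length
    for start, end in reads:
        for pos in range(start, end):
            covered[pos] = True
    breadth = sum(covered) / genome_length
    return depth, breadth


if __name__ == "__main__":
    # Three identical reads on a 100-bp toy genome: 1.5X depth, yet only half covered.
    print(depth_and_breadth([(0, 50), (0, 50), (0, 50)], 100))
```

The example shows why the two terms are not interchangeable: a dataset can have a depth above 1X while leaving much of the genome uncovered.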

    1. Background: Variant Call Format (VCF) is the standard file format for interchanging genetic variation data and associated quality control metrics. The usual row-wise encoding of the VCF data model (either as text or packed binary) emphasises efficient retrieval of all data for a given variant, but accessing data on a field or sample basis is inefficient. Biobank scale datasets currently available consist of hundreds of thousands of whole genomes and hundreds of terabytes of compressed VCF. Row-wise data storage is fundamentally unsuitable and a more scalable approach is needed.

      Results: Zarr is a format for storing multi-dimensional data that is widely used across the sciences, and is ideally suited to massively parallel processing. We present the VCF Zarr specification, an encoding of the VCF data model using Zarr, along with fundamental software infrastructure for efficient and reliable conversion at scale. We show how this format is far more efficient than standard VCF based approaches, and competitive with specialised methods for storing genotype data in terms of compression ratios and single-threaded calculation performance. We present case studies on subsets of three large human datasets (Genomics England: n=78,195; Our Future Health: n=651,050; All of Us: n=245,394) along with whole genome datasets for Norway Spruce (n=1,063) and SARS-CoV-2 (n=4,484,157). We demonstrate the potential for VCF Zarr to enable a new generation of high-performance and cost-effective applications via illustrative examples using cloud computing and GPUs.

      Conclusions: Large row-encoded VCF files are a major bottleneck for current research, and storing and processing these files incurs a substantial cost. The VCF Zarr specification, building on widely used, open-source technologies, has the potential to greatly reduce these costs, and may enable a diverse ecosystem of next-generation tools for analysing genetic variation data directly from cloud-based object stores, while maintaining compatibility with existing file-oriented workflows.
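      A back-of-envelope sketch of why row-wise storage is inefficient for per-sample access: in a VCF-like layout every variant record holds all samples, so extracting one sample's genotypes means decoding every record, whereas a two-dimensional (variants × samples) chunk grid only requires reading the column of chunks that contains that sample. The function names and chunk sizes below are hypothetical, and the sketch ignores compression and partial edge chunks.

```python
def cells_scanned_row_wise(n_variants: int, n_samples: int) -> int:
    # Each variant record stores genotypes for all samples, so one sample's
    # genotypes can only be extracted by decoding every record in full.
    return n_variants * n_samples


def cells_scanned_chunked(n_variants: int, n_samples: int, sample_chunk: int) -> int:
    # With a 2-D chunk grid, only the column of chunks containing the requested
    # sample is decoded; every variant chunk in that column must still be read,
    # so the variant-axis chunk size does not change the total.
    return n_variants * min(sample_chunk, n_samples)


if __name__ == "__main__":
    # Hypothetical biobank-scale matrix: 1 million variants x 100,000 samples,
    # stored with 1,000 samples per chunk along the sample axis.
    row = cells_scanned_row_wise(1_000_000, 100_000)
    chunked = cells_scanned_chunked(1_000_000, 100_000, 1_000)
    print(row, chunked, row // chunked)
```

Under these assumptions the chunked layout decodes 100-fold fewer genotype cells for a single-sample query; the ratio is simply the number of sample-axis chunks.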

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf049), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer: Nezar Abdennur

      The authors present VCF Zarr, a specification that translates the variant call format (VCF) data model into an array-based representation for the Zarr storage format. They also present the vcf2zarr utility to convert large VCFs to Zarr. They provide data compression and analysis benchmarks comparing VCF Zarr to existing variant storage technologies using simulated genotype data. They also present a case study on real-world Genomics England aggV2 data.

      The authors' benchmarks overall show that VCF Zarr has superior compression and computational analysis performance at scale relative to data stored as row-oriented VCF, and that VCF Zarr is competitive with specialized storage solutions that require similarly specialized tools and access libraries for querying. An attractive feature is that VCF Zarr allows for variant annotation workflows that do not require full dataset copy and conversion. Another key point is that Zarr is a high-level spec and data model for the chunked storage of n-d arrays, rather than a byte-level encoding designed specifically around the genomic variant data type. I personally have used Zarr productively for several applications unrelated to statistical genetics. While VCF Zarr mildly underperforms some of the specialized formats (Savvy in compute, Genozip in compression) in a few instances, I believe the accessibility, interoperability, and reusability gains of Zarr make the small tradeoff well worthwhile.

      Because Zarr has seen heavy adoption in other scientific communities like the geospatial and Earth sciences, and is well integrated in the scientific Python stack, I think it holds potential for greater reusability across the ecosystem. As such, I think the VCF Zarr spec is a highly valuable if not overdue contribution to an entrenched field that has recently been confronted by a scalability wall. Overall, the paper is clear, comprehensive, and well written.

      Some high-level comments:

      - The benefits for large scientific datasets of being analysis-ready cloud-optimized (ARCO) have been well articulated by Abernathey et al., 2021. However, I do think that the "local"/HPC single-file use case is still important and won't disappear any time soon, and for some file system use cases, expansive and deep hierarchies can be performance limiting (this was hinted at in one of the benchmarks). In this scenario, would a large VCF Zarr perform reasonably well (or even better on some file systems) via a single local zip store?
      - The description of the intermediate columnar format (ICF) used by vcf2zarr is missing some detail. At first I got the impression it might be based on something like Parquet, but running the provided code showed that it consists of a similar file-based chunk layout to Zarr. This should be clarified in the manuscript.
      - The authors discuss the possibility of storing an index mapping genomic coordinates to chunk indexes. Have Zarr-based formats in other fields like geospatial introduced their own indexing approaches to take inspiration from?
      - Since VCF Zarr is still a draft proposal, it could be useful to indicate where community discussions are happening and how potential new contributors can get involved, if possible. This doesn't need to be in the paper per se, but perhaps documented in the spec repo.

      Minor comments:

      - In the background: "For the representation to be FAIR, it must also be accessible,"; the A stands for "accessible", so "also" doesn't make sense.
      - "There is currently no efficient, FAIR representation...". Just a nit and feel free to ignore, but the solution you present is technically "current".
      - In Figure 2, the zarr line is occluded by the sav line and hard to see.
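      As a sketch of the kind of coordinate-to-chunk index the reviewer mentions, one simple possibility is to record the first genomic position of each fixed-size variant chunk and binary-search that list for a query region. This assumes a single coordinate-sorted contig and fixed-size chunks along the variant axis; the function names are hypothetical, and this is not the indexing scheme defined by the VCF Zarr specification.

```python
import bisect


def build_chunk_index(positions: list[int], variant_chunk_size: int) -> list[int]:
    """Record the first genomic position of each variant chunk.
    `positions` must be sorted, as in a coordinate-sorted contig."""
    return positions[::variant_chunk_size]


def chunks_overlapping(index: list[int], start: int, end: int) -> range:
    """Return the chunk numbers that may contain variants with start <= pos <= end.
    Conservative: a returned chunk can end before `start`, because the index
    stores only each chunk's first position."""
    first = max(bisect.bisect_right(index, start) - 1, 0)
    last = max(bisect.bisect_right(index, end) - 1, 0)
    return range(first, last + 1)


if __name__ == "__main__":
    # Hypothetical contig with 7 variants stored 3 per chunk.
    idx = build_chunk_index([100, 200, 300, 400, 500, 600, 700], 3)
    print(idx, list(chunks_overlapping(idx, 450, 650)))
```

Storing only one position per chunk keeps such an index tiny; the cost is that the first and last chunks returned may need to be filtered after loading.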

    1. Author response:

      Reviewer #1 (Public review):

      The usefulness of the proposed new metric of "variant consistency" and how it can guide users in selecting demultiplexing methods seems a little unclear. It correlates with the level of ambient RNA/DNA contamination, which makes it look like a metric of data quality. However, it does depend on the exact demultiplexing method, yet it's not clear how it directly connects to the "accuracy" of each demultiplexing method, which is the most important property that users of these methods care about. Since the simulated data has ground truth of donor identities available, I would suggest using the simulated data to show whether "variant consistency" directly indicates the accuracy of each method, especially the accuracy within those "C2" reads.

      I also think the tool and analyses presented in this paper need some further clarification and documentation on the details, such as how the cell-type gene and peak probabilities are determined in the simulation, and how doublets from different cell types are handled in the simulation and analysis. A few analyses and figures also need a more detailed description of the exact methods used. 

      We thank the reviewer for their suggestions. We plan on revising the manuscript to reflect their suggestions, which will include clarification of the variant consistency metric and its relationship with demultiplexing accuracy based on the simulations and additional detail regarding ambisim’s generation of multiplexed snRNA/snATAC.

      Reviewer #2 (Public review):

      (1) Throughout the manuscript, the figure legends are difficult to understand, and this makes it difficult to interpret the graphs.

      (2) Since this is both a new tool and a benchmark, it would be worthwhile in the Discussion to comment on which demultiplexing tools one may want to choose for their dataset, especially given the warning against ensemble methods. From this extensive benchmarking, one may want to choose a tool based on the number of donors one has pooled, the modalities present, and perhaps even the ambient RNA (if it has been estimated previously).

      (3) What are the minimal computational requirements for running ambisim? What is the time cost? 

      We thank the reviewer for their suggestions. We plan on updating the manuscript to better clarify figure legends. We will also outline a set of concrete recommendations in our discussion section based on different multiplexed experimental designs. Finally, we will also include extra computational benchmarks for ambisim.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #2:

      Minor reviews:

      The caveats are (1) the particular point will perhaps only be interesting to a small slice of the eQTL research community; (2) the authors provide no statistical controls/error estimate or independent validation of the variance partitioning analysis in Figure 3, and (3) the authors don't seem to use the single-cell growth/fitness estimates for anything else, as Figure 4 uses loci mapped to growth from a previously published, standard culture-by-culture approach. It would be appropriate for the manuscript to mention these caveats.

      We have added two brief mentions of these caveats: mainly, that the study may not generalize, and that the study does not attempt the variance partitioning on other traits or other systems where the values of the partitions are better established.

      I also think it is not appropriate for the manuscript to avoid a comparison between the current work and Boocock et al., which reports single-cell eQTL mapping in the same yeast system. I recommend a citation and statement of the similarities and differences between the papers.

      We have added this reference and a clear statement of similarities between the two studies. It was not our intention to avoid this; we had simply not seen that study in the initial submission.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, Reproducibility, and Clarity)

      Reviewer comment: This is a very well conceived study of responses to plasma membrane stresses in yeast that signal through the conserved TORC2 complex. Physical stress through small-molecule intercalators in the plasma membrane is shown to be independent of their biochemistry and is then studied for its effect on plasma membrane morphology and the distribution of free ergosterol (the yeast equivalent of cholesterol), with free being the pool of cholesterol that is available to probes and/or sterol transfer proteins. Experiments nicely demonstrate a negative feedback loop consisting of: stress -> increased free sterol and TORC2 inhibition -> activation of LAM proteins (as demonstrated by Relents and co-workers previously) -> removal of free sterol -> return to unstressed state of PM and TORC2.

      Author response: We thank the reviewer for their positive and encouraging feedback. We are pleased to submit our revised manuscript and have addressed all points raised below.

      Comment: Fig 2A: Is detection of PIP/PIP2/PS linear for target, or possibly just showing availability that is increased due to local positive curvature?

      Response: This is an excellent and fundamental question. While FLARE signal likely reflects lipid availability, its detection is indeed influenced by factors such as membrane curvature and lipid composition, due to varying insertion depths of the lipid-binding domains. For example, studies using NMR suggest that the PLCδ PH domain partially inserts into membranes, potentially conferring curvature sensitivity (Flesch et al., 2005; Uekama et al., 2009). Similarly, curvature influences lactadherin binding, though it's unclear if this extends to its isolated C2 domain (Otzen et al., 2012; Shao et al., 2008; Shi et al., 2004). We could not find direct evidence for curvature sensitivity of P4C(SidC), but assume some influence exists.

      To avoid overinterpreting these limitations, we now describe our data based solely on the FLAREs used, rather than inferring enrichment of specific lipid species. We refer to these PM structures as "PI(4,5)P₂-containing", consistent with prior literature (Riggi et al., 2018) and have revised our manuscript accordingly.

      Comment: Can any marker be identified for the D4H spots at 2 minutes? In particular, are they early endosomes (shown by brief pre-incubation with FM4-64)?

      Response: We appreciate the reviewer's suggestion and have now added new data (Fig. S2E-H). We tested colocalization of D4H spots with FM4-64 (early endosomes), GFP-VPS21 (early endosome marker), and LipidSpot™ 488 (lipid droplets), but found no overlap. This latter observation was not unexpected given that D4H does not recognize sterol esters. D4H foci also did not overlap with the ER (dsRED-HDEL), though they were frequently adjacent to it. While their exact identity remains unknown, we agree this is an intriguing direction for future investigation.

      Comment: Is there any functional (& direct) link between Arp inhibition (as in the S. pombe study of LAMs by the lab of Sophie Martin) and PM disturbance by amphipathic molecules?

      Response: We have explored this connection and now present new data (see final paragraph of Results). Briefly, we show that CK-666 induces internalization of PM sterols in a Lam2/4-dependent manner, and that TORC2 activity is more strongly reduced in lam2Δ lam4Δ cells compared to WT. These findings support the idea that, like PalmC, Arp2/3 inhibition triggers a PM stress that is counteracted by sterol internalization.

      Minor Comment: Fig 2A: Labels not clear. Specify for each panel which FP is used for PIP2.

      Response: As noted above, we revised image labels to clarify which FLAREs were used, and refer to data accordingly throughout.

      Minor Comment: Move Fig. S2D to the main manuscript. The 1 min and 2 min data are integral to the story.

      Response: We agree and have incorporated the 1-min and 2-min data into the main figures. Vehicle-treated controls were moved to Fig. S2.

      Minor Comment: The role of Lam2 and Lam4 in retrograde sterol transport has in vivo been linked to only one of their two StART domains, not both as stated in the text.

      Response: Thank you for pointing this out. We have corrected the text to:

      "[...]Lam2 and Lam4[...] contain two START domains, of which at least one has been demonstrated to facilitate sterol transport between membranes (Gatta et al., 2015; Jentsch et al., 2018; Tong et al., 2018)."

      Minor Comment: Throughout, images of tagged D4H should be labelled as such, not as "Ergosterol".

      Response: We have updated all relevant figure labels and text to refer to "D4H" rather than "Ergosterol", in line with this recommendation.

      Reviewer #1 (Significance):

      These results in budding yeast are likely to be directly applicable to a wide range of eukaryotic cells, if not all of them. I expect this paper to be a significant guide for research in this area. The paper specifically points out that the current experiments do not distinguish the precise causation between the two outcomes of stress: increased free sterol and TORC2 inhibition. Which of these two outcomes causes the other is not yet known. If data were added that shed light on this causation, this work would be much more significant, but I fully understand that this extra step lies beyond the scope of the present work and belongs in a later study for which the current one forms the bedrock.

      Response:

      We thank the reviewer for their generous assessment. We agree that understanding the causality between increased free sterol and TORC2 inhibition is a critical next step.

      Based on our current data, we believe the increase in free ergosterol precedes TORC2 inhibition. For example, TORC2 inhibition alone (e.g., via pharmacological means) does not initially increase free sterol, while it does enhance Lam2/4 activity, promoting sterol internalization (Fig. 3A). Baseline TORC2 activity also inversely correlates with free PM sterol levels in lam2Δ lam4Δ versus LAM2T518A LAM4S401A cells (Figs. 2D, S2C).

      Additionally, during sterol depletion, we observe an initial increase in TORC2 activity before growth inhibition occurs, after which activity declines, likely due to compromised PM integrity (Fig. S2M). We now also show that adaptation to several other stresses (e.g., osmotic shock, heat shock, CK-666) partially depends on sterol internalization, which correlates with TORC2 activation (Fig. 4, S4B).

      While these findings strengthen the model that PM stress perturbs sterol availability and secondarily impacts TORC2, we cannot yet definitively demonstrate causality. As suggested by Reviewer 3, we tested cholesterol-producing yeast (Souza et al., 2011), but found their response to PalmC indistinguishable from WT, making it difficult to draw mechanistic conclusions (Rebuttal Fig. 2).

      Taken together, we favour a model in which sterols affect PM properties sensed by TORC2, probably lipid packing, rather than acting as direct effectors. We hope our revised manuscript more clearly conveys this model and serves as a strong foundation for future mechanistic studies.

      Reviewer #2 (Evidence, Reproducibility, and Clarity)

      Reviewer comment: This manuscript describes multiple effects of positively-charged membrane-intercalating amphipaths (palmitoylcarnitine, PalmC, in particular) on TORC2 in yeast plasma membranes. It is a "next step" in the Loewith laboratory's characterization of the effect of this agent on this system. The study confirms the findings of Riggi et al. (2018) that PalmC inhibits TORC2 and drives the formation of membrane invaginations that contain phosphatidylinositol-bis-phosphate (PIP2) and other anionic phospholipids. It also demonstrates that PalmC intercalates into the membrane, acts directly (rather than through secondary metabolism) and is representative of a class of cationic amphipaths. The interesting finding here is that PalmC causes a rapid initial increase in the plasma membrane ergosterol accessible to the D4H sterol probe followed by a decrease caused by its transfer to the cytoplasm through its transporter, LAM2/4. TORC2 is implicated in these processes. Loewith et al. have pioneered this area, and this study clearly shows their expertise. Several of the findings reported here are novel. However, I am concerned that PalmC may not be revealing the physiology of the system but rather adding tangential complexity. (This concern applies to the precursor studies using PalmC to probe the TORC2 system.) In particular, I am not confident that the data justify the authors' conclusions "...that TORC2 acts in a feedback loop to control active sterol levels at the PM and [the results] introduce sterols as possible TORC2 signalling modulators."

      Author response:

      We thank Reviewer #2 for the constructive and critical evaluation of our work. We appreciate the acknowledgment of the novelty and technical strength of several of our findings, and we understand the concern that PalmC could be eliciting non-physiological effects. Our study was designed precisely to use PalmC and similar membrane-active amphipaths as tools to strongly perturb the plasma membrane (PM) in a controlled and tractable way. We now state this intention explicitly in both the Introduction and Discussion sections. To address concerns about the specificity and physiological relevance of PalmC, we have expanded our dataset to include additional PM stressors (hyperosmotic shock, Arp2/3 inhibition, and heat shock), all of which reproduce key features observed with PalmC, namely TORC2 inhibition, PM invaginations, and retrograde sterol transport (Fig. 4, S4).

      We hope this more comprehensive dataset, along with revised discussion and clarified claims, addresses the reviewer's concerns regarding physiological interpretation and artifact.

      Major issues 1 and 2:

      1. The invaginations induced by PalmC may not be physiologic but simply the result of the well-known "bilayer couple" bending of the bilayer due to the accumulation of cationic amphipaths in the inner leaflet of the plasma membrane bilayer, which is rich in anionic phospholipids. Such unphysiological effects make the observed correlation of invagination with TORC2 inhibition etc. hard to interpret.

      2. Electrostatic/hydrophobic association of PIP2 with PalmC could sequester the anionic phospholipid(s). Such associations could also drive the accumulation of PIP2 in the invaginations. This could explain PalmC inhibition of TORC2 through a simple physical rather than biological process. So, it is difficult to draw any physiological conclusion about PIP2 from these experiments.

      Response to major issues 1 and 2:

      We agree that amphipath-induced bilayer stress, including via the bilayer-couple mechanism, may contribute to PM curvature changes. However, the reviewer's assumption that PalmC inserts preferentially into the inner leaflet appears inconsistent with both literature and our observations. PalmC is zwitterionic, not cationic, and is unlikely to electrostatically sequester anionic lipids such as PIP2. For clarification, we included a short summary of our proposed mechanism of PalmC in the context of the current literature in our Discussion:

      "[...] study it was also demonstrated that addition of phospholipids to the outer PM leaflet causes an excess of free sterol at the inner PM leaflet, and its subsequent retrograde transport to lipid droplets (Doktorova et al., 2025). Although we cannot exclude that it is the substrate of a flippase or scramblase, PalmC is not a metabolite found in yeast, nor, given its charged headgroup, is it likely to spontaneously flip to the inner leaflet (Goñi, Requero and Alonso, 1996). Thus, we propose that PalmC accumulates in the outer leaflet, disrupts the lipid balance with the inner leaflet which is, similarly to the mammalian cell model (Doktorova et al., 2025), rectified by sterol mobilization, flipping and internalization (Fig. 5B)."

      While we agree that PM invaginations per se are not the central focus of this study, they are indeed a reproducible and biologically intriguing phenomenon. We emphasize that similar invaginations occur not only during PalmC treatment but also in response to other physiological stresses, such as hyperosmotic shock and Arp2/3 inhibition (Fig. 4), and have been reported independently by others (Phan et al., 2025). Furthermore, related structures have been documented in yeast mutants with altered PIP2 metabolism or TORC2 hyperactivity (Rodríguez-Escudero et al., 2018; Sakata et al., 2022; Stefan et al., 2002), and even in mammalian neurons with SJ1 phosphatase mutations (Stefan et al., 2002). These observations support our interpretation that the observed invaginations represent an exaggerated manifestation of a physiologically relevant stress-adaptive process. In our previous study, we indeed proposed that PI(4,5)P2 enrichment in PM invaginations was important for PalmC-induced TORC2 inactivation, using the heat-sensitive PI(4,5)P2 kinase allele mss4ts, a rather blunt tool (Riggi et al., 2018). We have now come to the conclusion that mechanisms other than, or in addition to, PIP2 changes drive TORC2 inhibition in our system. In this study, we use the 2xPH(PLC) FLARE exclusively as a generic PM marker, not as a readout of PIP2 biology. Rather, we propose that sterol redistribution and/or its biophysical impact on the PM are central drivers, with TORC2 acting as a signaling node that senses and adjusts PM composition accordingly.

      We now clarify these arguments in the revised Discussion and have reframed our use of PalmC as a probe to explore the capacity of the PM to adapt to acute stress via dynamic lipid rearrangements.

      Major issue 3:

      As the authors point out, a large number of intercalated amphipaths displace sterols from their association with bilayer phospholipids. This unphysiologic mechanism can explain how PalmC causes the transient increase in the availability of plasma membrane ergosterol to the D4H probe and its subsequent removal from the plasma membrane via LAM2/4. TORC2 regulation may not be involved. In fact, the authors say that "TORC2 inhibition, and thereby Lam2/4 activation, cannot be the only trigger for PalmC induced sterol removal." Furthermore, the subsequent recovery of plasma membrane ergosterol could simply reflect homeostatic responses independent of the components studied here.

      Response:

      We agree that increased free sterols in the inner leaflet likely initiate retrograde transport. Our results suggest that TORC2 inhibition facilitates this process by disinhibiting Lam2/4, allowing more efficient clearance of ergosterol from the PM (Fig. 3A, S2C). However, the process is not exclusively dependent on TORC2, and we state this explicitly.

      We do not observe recovery of PM ergosterol on the timescales measured, while TORC2 activity recovers, suggesting that restoration likely occurs later via biosynthetic or anterograde trafficking pathways, which are outside the scope of this study. These points are clarified in the revised Discussion.

      Major issue 3a:

      The data suggest that LAM2/4 mediates the return of cytoplasmic ergosterol to the plasma membrane. To my knowledge, this is a nice finding that has not been reported previously and is worth confirming more directly.

      Response:

      We thank the reviewer for this observation but would like to clarify a misunderstanding: our data do not suggest that Lam2/4 mediates anterograde sterol transport. Our results and prior work (Gatta et al., 2015; Roelants et al., 2018) show that Lam2/4 mediate retrograde transport from the PM to the ER, and TORC2 inhibits this process. We now clarify this point in the revised manuscript, stating:

      "In vivo, Lam2/4 seem to predominantly transport sterols from the PM to the ER, following the concentration gradient (Gatta et al., 2015; Jentsch et al., 2018; Tong et al., 2018)."

      Major issue 4:

      I agree with the authors that "It is unclear if the excess of free sterols itself is part of the inhibitory signal to TORC2..." Instead, the inhibition of TORC2 by PalmC may simply result from its artifactual aggregation of the anionic phospholipids (especially, PIP2) needed for TORC2 activity. This would not be biologically meaningful. If the authors wish to show that accessible ergosterol inhibits TORC2 activity or vice versa, they should use more direct methods. For example, neutral amphipaths that do not cause the aforementioned PalmC perturbations should still increase plasma membrane ergosterol and send it through LAM2/4 to the ER.

      Response:

      We now provide evidence that three orthogonal treatments (hyperosmotic shock, heat shock and Arp2/3 inhibition) similarly cause sterol mobilization and, in the absence of sterol clearance from the PM, prolonged TORC2 inhibition. These results do not support the reviewer's contention that the inhibition of TORC2 by PalmC simply results from artifactual aggregation of anionic phospholipids. Furthermore, PalmC is zwitterionic, and its interaction with anionic lipids should be somewhat limited.

      In our experimental setup, neutral amphipaths did not trigger TORC2 inhibition or D4H redistribution. While this differs from prior in vitro work (Lange et al., 2009), we attribute the discrepancy in part to differences in experimental setup, including flow chamber artifacts that we discuss in the Methods section.

      Importantly, only amphipaths with a charged headgroup, including zwitterionic (PalmC) and positively charged analogs, produced robust effects. A negatively charged derivative (palmitoylglycine) also seemed to have a minor effect on TORC2 activity and PM sterol internalization (Fig. 1D, Rebuttal Fig. 1). This suggests that in vivo, charge-based membrane perturbation is required to alter PM sterol distribution and TORC2 activity.

      Major issue 5:

      The mechanistic relationship between TORC2 activity and ergosterol suggested in the title, abstract, and discussion is not secure. I agree with the concluding section of the manuscript called "Limitations of the study". It highlights the need for a better approach to the interplay between TORC2 and ergosterol.

      Response:

      This may have been true of the previous submission, but we now demonstrate that provoking PM stress in four orthogonal ways triggers mobilization of sterols, which, if left uncleared, prevents normal (re)activation of TORC2. We thus conclude that free sterols, directly or, more likely, indirectly, inhibit TORC2. The role that TORC2 plays in sterol retrotranslocation has been demonstrated previously (Roelants et al., 2018). We believe our expanded data and clarified framework make a compelling case for a stress-adaptive role of sterol retrograde transport that is supervised and modulated, but not fully driven, by TORC2 activity.

      Thus, we feel that, in the present version of this manuscript, the title is justified.

      Minor issue: Based on earlier work using the reporter Flipper-TR, the authors claim that PalmC reduces membrane tension. They should consider that this intercalated dye senses many variables, including membrane tension but also lipid packing. I suspect that, by intercalating into and thereby altering the bilayer, PalmC is affecting the latter rather than the former.

      Response:

      We thank the reviewer for this important point regarding the multifactorial sensitivity of intercalating dyes such as Flipper-TR®, including to membrane tension and lipid packing.

      We respectfully note, however, that our current study does not include any new data generated using Flipper-TR®. We referred to earlier work (Riggi et al., 2018) for context, where Flipper-TR® was used as a membrane tension reporter.

      We fully agree that the response of such "smart" membrane probes integrates multiple biophysical parameters, including tension, packing, and hydration, which are themselves interrelated as consequences of membrane composition (Colom et al., 2018; Ragaller et al., 2024; Torra et al., 2024). Indeed, this interconnectedness is central to our interpretation of PalmC's pleiotropic effects on the plasma membrane (PM). In our previous study, we observed that PalmC treatment not only reduced apparent PM tension (as measured by Flipper-TR®) but also increased membrane order (Riggi et al., 2018; see laurdan GP, Fig. 6C), and here we show that it promotes the redistribution of free sterol away from the PM.

      Furthermore, PalmC's effect on membrane tension was supported by orthogonal in vitro data: its addition to giant unilamellar vesicles (GUVs) led to a measurable increase in membrane surface area and decreased tension, as shown by pipette aspiration (Riggi et al., 2018, Fig. 3F). This provides complementary evidence that the membrane tension reduction is not merely an artifact of Flipper-TR® reporting.

      That said, we agree with the reviewer that in the case of TORC2 inhibition or hyperactivation, the observed changes in PM tension are based solely on Flipper-TR® data, without additional orthogonal validation. To address this concern, we have revised the relevant text in the manuscript to more cautiously reflect this complexity. The revised sentence now reads:

      "Consistent with this role, data generated with the lipid packing reporter dye Flipper-TR® suggest that acute chemical inhibition of TORC2 increases PM tension, while Ypk1 hyperactivation decreases it."

      This revised phrasing acknowledges both the utility and the limitations of Flipper-TR® as a probe of membrane biophysics.

      Reviewer #2 Significance:

      This is an interesting topic. However, use of the exogenous probe, palmitoylcarnitine, could be causing multiple changes that complicate the interpretation of the data.

      Reviewers #1 and #3 were much more impressed by this study than I was. I am not a yeast expert and so I may have missed or confused something. I would therefore welcome their expert feedback regarding my comments (#2). Ted Steck

      Response:

      Thank you for your constructive feedback.

      We believe that the manuscript is now much improved, and we hope to have convinced you that the mechanisms that we've elucidated using PalmC represent a general adaptation response to physiological PM stressors.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Reviewer comment: The authors describe the effects of surfactant-like molecules on the plasma membrane (PM) and its associated TORC2 complex. Addition of surfactants with a positively charged headgroup and a hydrocarbon tail of at least 16 carbons caused the rapid clustering of PI-4,5P2 together with PI-4P and phosphatidylserine in large membrane invaginations. The authors convincingly demonstrate that this effect of the surfactants on the PM is likely caused by a direct disturbance of PM organization and/or lipid composition. Interestingly, upon PalmC treatment, free ergosterol of the PM was found to first concentrate in the clusters, but within The kinetics of the changes in free ergosterol levels and the changes in TORC2 activity do not match. Ergosterol is rapidly depleted after PalmC treatment (The Lam2/4 data support the idea that ergosterol transport plays a role in TORC2 recovery, but what role this is, is not clear to me. I think the data fit better with a model in which PalmC causes low PM tension, which in turn disrupts normal lipid organization and thus causes TORC2 to shut down, maybe not through changes in free ergosterol but through changes in, for instance, lipid raft formation (which is in part affected by ergosterol levels). The transport of ergosterol is only one mechanism involved in restoring PM tension and TORC2 activity. However, sensing free ergosterol alone is most likely not the mechanism explaining how TORC2 senses PM tension.

      Therefore, I recommend that the model is revised (or supported by more data), reflecting the fact that free ergosterol levels do not directly correlate with the TORC2 activity, but instead might be only one of the PM parameters that regulate TORC2.

      Author response:

      We thank the reviewer for their thoughtful assessment and constructive suggestions. As described in more detail above, we have included in the revised version of this manuscript a variety of new data, including the sterol internalization-dependent adaptation of the PM and regulation of TORC2 during additional stresses. We think that these data vastly improve on the previous version of our manuscript. We have addressed each point raised by the reviewer below and revised the manuscript accordingly, including a rewritten Discussion and an updated model that better reflects the limitations of our current understanding of how TORC2 senses changes in the plasma membrane (PM). It is true that the appearance of PM invaginations tracks well with TORC2 inhibition, but it is not clear to us whether they are upstream of this inhibition or merely another symptom of the preceding PM perturbation: the PalmC-induced increase in free sterol can be observed after 10 s (Fig. S2A) and near-complete TORC2 inhibition after 30 s, whereas PM invaginations become visible only after ~1 min. In this study, we are mostly interested in the role of PM sterol redistribution in the stress response. Indeed, we think that the role of free sterol clearance during stress is to adapt the PM to that stress, thereby restoring PM parameters, which in turn reactivates TORC2. This can be seen for hyperosmotic stress and for the newly introduced PM stressors, Arp2/3 inhibition and heat shock (Fig. 4). We have therefore softened our model and updated the Discussion and final figure (Fig. 5) to reflect that TORC2 likely responds to broader changes in PM organization or tension, with sterol redistribution representing one of several contributing factors rather than the sole signal.

      Comment: - If TORC2 is indeed inhibited by free ergosterol, the addition of ergosterol to the growth medium should be able to trigger similar effects as PalmC. If this detection of free ergosterol is very specific (e.g. if TORC2 has a binding pocket for ergosterol), we would expect that addition of other sterols such as cholesterol or ergosterol precursors should not inhibit TORC2.

      Response:

      We appreciate this suggestion and agree that testing whether exogenous ergosterol can mimic PalmC effects would help assess specificity. However, yeast do not readily take up sterols under aerobic conditions, which renders artificial sterol enrichment at the yeast PM rather difficult. We have now included additional data characterizing our Lam2/4 mutants (see below), and pharmacological sterol synthesis inhibition, showing that a depletion of free sterols from the PM correlates with lower TORC2 activity (Fig. 2D, S2C). Additionally, as suggested, we tried to probe if ergosterol directly interacts with TORC2 through a specific binding pocket, by treating a yeast strain expressing cholesterol rather than ergosterol (Souza et al., 2011) with PalmC. However, the response of TORC2 activity in these cells was very similar to that of WT cells (Rebuttal Fig. 2). In conclusion, we agree that at present we do not know mechanistically how sterols affect TORC2 activity, although it does indeed seem more likely to be through an indirect mechanism linked to changes in PM parameters. The nature of such a mechanism will be subject to further studies. We hope that the introduced changes to the manuscript adequately reflect these considerations.

      Rebuttal Fig. 2: WT yeast cells which produce ergosterol as main sterol, and mutant cells which produce cholesterol instead were treated with 5 µM PalmC, and TORC2 activity was assessed by relative phosphorylation of Ypk1 on WB. One representative experiment out of two replicates.

      Comment: - The experiment in Figure 1C is not controlled for differences in membrane intercalation of the different compounds. For instance, do C16 choline and C16 glycine accumulate at the same rate in the PM (measured as in the experiment in Figure 1B)? Maybe the positive charge at the headgroup of the surfactants increases the local concentration at the PM and therefore can explain the difference in effect on the PM.

      Response:

      We agree with the reviewer that the effects of the various PalmC derivatives are not directly controlled for differences in membrane intercalation. Our structure-activity screen was intended to demonstrate the general biophysical mode of action of PalmC-like compounds and to define minimal structural requirements for activity.

      We now note in the manuscript that differential membrane insertion could contribute to the observed variation in efficacy, particularly in relation to tail length. While we considered this additional suggested experiment, it was ultimately judged to be outside the scope of this study due to its complexity and limited impact on the central conclusions.

      A clarifying sentence has been added to the relevant results section to explicitly acknowledge this limitation:

      "We did not control for differences in PM intercalation efficiency."

      We also include a discussion here to further clarify our interpretation. Prior in vitro studies have shown that while intercalation is necessary, it is not sufficient for PM perturbation. For example, palmitoyl-CoA intercalates into membranes but does not induce the same biophysical effects as PalmC (Goñi et al., 1996; Ho et al., 2002). Thus, we believe that intercalation is only part of the story, and that the intrinsic propensity of different headgroups to perturb the PM plays a key role in the disruption of PM lipid organization.

      Comment: - Are the intracellular ergosterol structures associated (or in close proximity) with lipid droplets (ergosterol being modified and delivered into a lipid droplet)?

      Response:

      We thank the reviewer for raising this point. We now include additional data (Fig. S2H) showing that intracellular D4H-positive structures neither reside near nor colocalize with lipid droplets (LDs). The latter is not entirely unexpected, as D4H does not recognize esterified sterols. However, we do observe an increase in overall LD volume following PalmC treatment, consistent with the idea that internalized PM sterols may be stored in LDs as sterol esters over time, although we did not test whether this increase in LD volume is Lam2/4-dependent. This increase is mentioned in the revised Results text. An increase in cellular LDs has also been recently reported during hyperosmotic shock (Phan et al., 2025).

      For further attempts to identify a marker for intracellular D4H foci, see our reply to Reviewer 1.

      Comment:

      • How do the AA and DD mutations in Lam2/4 change the localization of the ergosterol sensor (before and after PalmC treatment)?

      Response:

      We thank the reviewer for this question, as in the course of generating these data we realized that our "inhibited" DD mutant was in fact not phosphomimetic but displayed the same D4H distribution as the "hyperactive" AA mutant, i.e. a marked inward shift of D4H signal away from the PM to internal structures due to increased PM-ER retrograde transport of sterols (Fig. S2C). This led us to critically re-evaluate and ultimately repeat our TORC2 activity WB experiments for PalmC treatment in LAM2/4 mutants. In this new set of experiments, the faster TORC2 recovery after PalmC treatment in the LAM2T518A LAM4S401A mutant unfortunately could not be robustly reproduced. It is possible that such differences can be observed under specific conditions. Nevertheless, the improved overall quality of the Western blot data allowed us to observe that baseline activity already differed slightly between these strains. The Lam2/4-centered part of the Results section has been updated accordingly in the manuscript:

      "Using a phosphospecific antibody, we did not observe an increase in baseline TORC2 activity in lam2Δ lam4Δ cells, which had been previously reported by electrophoretic mobility shift (Murley et al., 2017). Instead, baseline TORC2 activity was consistently slightly decreased in these cells (Fig. 2D). Ypk1, activated directly by TORC2, inhibits Lam2 and Lam4 through phosphorylation on Thr518 and Ser401, respectively (Roelants et al., 2018; Topolska et al., 2020). We substituted these residues with alanine, generating a strain in which Lam2/4 were no longer inhibited by phosphorylation (Roelants et al., 2018). In these cells, yeGFP-D4H showed that free sterols were constitutively shifted away from the PM to intracellular structures (Fig. S2C, bottom panel). Intriguingly, in opposition to lam2Δ lam4Δ cells, basal TORC2 activity was increased in LAM2T518A LAM4S401A cells (Fig. 2D). This suggests that a decrease in free PM sterols stimulates TORC2 activity [...]"

      "In LAM2T518A LAM4S401A cells, TORC2 activity recovers with similar kinetics as the WT (Fig. 2D, bottom blot), suggesting that Lam2/4 release from TORC2 dependent inhibition during PalmC treatment is a fast and efficient process in WT cells, not further expedited by these constitutively active Lams."

      As suggested, we also examined D4H localization in LAM2T518A LAM4S401A cells after PalmC treatment and included these data to further demonstrate that PalmC causes an increase in the fraction of free ergosterol at the PM, which is subsequently removed:

      "PalmC addition to LAM2T518A LAM4S401A cells likewise resulted first in a transient increase and then a further decrease in PM yeGFP-D4H signal (Fig. 3C, S3D)."

      Comment: - Does Lam2/4 localize to ER-PM contact sites near the large PM invaginations, which could allow for efficient transport of the free ergosterol that accumulates in these structures?

      Response:

      We were curious about this too, and have now added the requested data in our supplementary material and added a sentence in our results:

      "Indeed, in cells expressing GFP-Lam2, we observed that PalmC-induced PM invaginations often formed at sites with preexisting GFP-Lam2 foci (Fig. S2K, cyan arrow), although GFP-Lam2 foci did not always colocalize with invaginations (Fig. S2K, yellow arrow) and vice versa."

      Additionally, in our effort to characterize intracellular D4H foci during PalmC treatment, as requested by Reviewer 1, we also examined the localization of these foci relative to the ER and found that:

      "During early timepoints, intracellular foci are usually in close proximity to the ER (Fig. S2E)."

      Reviewer #3 (Significance (Required)): The manuscript describes the effects of small-molecule surfactants on PM organization and on TORC2 activity. This is an important set of observations that helps in understanding the response of cells to environmental stressors that affect the PM. This field of study is very challenging because of the limited tools available to directly observe lipids and their movements. I consider the data and most of their interpretations to be of high importance, but I am not convinced by the larger model that tries to link the ergosterol data with TORC2 activity. With adjustments to the model or additional experimental support, this manuscript will be of general interest to cell biologists, especially researchers studying membrane stress response pathways.

      Response:

      We thank the reviewer for highlighting the importance of studying PM stress responses and acknowledging the technical challenges involved. We hope the applied changes and additional data succeed in softening our claims about TORC2 regulation while convincing the reviewer that free sterol levels at the PM are one of several contributing factors that correlate with changes in TORC2 activity.

      Colom, A., Derivery, E., Soleimanpour, S., Tomba, C., Dal Molin, M., Sakai, N., González-Gaitán, M., Matile, S., Roux, A., 2018. A Fluorescent Membrane Tension Probe. Nat. Chem. 10, 1118-1125. https://doi.org/10.1038/s41557-018-0127-3

      Flesch, F.M., Yu, J.W., Lemmon, M.A., Burger, K.N.J., 2005. Membrane activity of the phospholipase C-δ1 pleckstrin homology (PH) domain. Biochem. J. 389, 435-441. https://doi.org/10.1042/BJ20041721

      Gatta, A.T., Wong, L.H., Sere, Y.Y., Calderón-Noreña, D.M., Cockcroft, S., Menon, A.K., Levine, T.P., 2015. A new family of StART domain proteins at membrane contact sites has a role in ER-PM sterol transport. eLife 4. https://doi.org/10.7554/eLife.07253

      Goñi, F.M., Requero, M.A., Alonso, A., 1996. Palmitoylcarnitine, a surface-active metabolite. FEBS Lett. 390, 1-5. https://doi.org/10.1016/0014-5793(96)00603-5

      Ho, J.K., Duclos, R.I., Hamilton, J.A., 2002. Interactions of acyl carnitines with model membranes: a (13)C-NMR study. J. Lipid Res. 43, 1429-1439. https://doi.org/10.1194/jlr.m200137-jlr200

      Jentsch, J.-A., Kiburu, I., Pandey, K., Timme, M., Ramlall, T., Levkau, B., Wu, J., Eliezer, D., Boudker, O., Menon, A.K., 2018. Structural basis of sterol binding and transport by a yeast StARkin domain. J. Biol. Chem. 293, 5522-5531. https://doi.org/10.1074/jbc.RA118.001881

      Murley, A., Yamada, J., Niles, B.J., Toulmay, A., Prinz, W.A., Powers, T., Nunnari, J., 2017. Sterol transporters at membrane contact sites regulate TORC1 and TORC2 signaling. J. Cell Biol. 216, 2679-2689. https://doi.org/10.1083/jcb.201610032

      Otzen, D.E., Blans, K., Wang, H., Gilbert, G.E., Rasmussen, J.T., 2012. Lactadherin binds to phosphatidylserine-containing vesicles in a two-step mechanism sensitive to vesicle size and composition. Biochim. Biophys. Acta BBA - Biomembr. 1818, 1019-1027. https://doi.org/10.1016/j.bbamem.2011.08.032

      Phan, J., Silva, M., Kohlmeyer, R., Ruethemann, R., Gay, L., Jorgensen, E., Babst, M., 2025. Recovery of plasma membrane tension after a hyperosmotic shock. Mol. Biol. Cell 36, ar45. https://doi.org/10.1091/mbc.E24-10-0436

      Ragaller, F., Sjule, E., Urem, Y.B., Schlegel, J., El, R., Urbancic, D., Urbancic, I., Blom, H., Sezgin, E., 2024. Quantifying Fluorescence Lifetime Responsiveness of Environment-Sensitive Probes for Membrane Fluidity Measurements. J. Phys. Chem. B 128, 2154-2167. https://doi.org/10.1021/acs.jpcb.3c07006

      Riggi, M., Niewola-Staszkowska, K., Chiaruttini, N., Colom, A., Kusmider, B., Mercier, V., Soleimanpour, S., Stahl, M., Matile, S., Roux, A., Loewith, R., 2018. Decrease in plasma membrane tension triggers PtdIns(4,5)P2 phase separation to inactivate TORC2. Nat. Cell Biol. 20, 1043-1051. https://doi.org/10.1038/s41556-018-0150-z

      Rodríguez-Escudero, I., Fernández-Acero, T., Cid, V.J., Molina, M., 2018. Heterologous mammalian Akt disrupts plasma membrane homeostasis by taking over TORC2 signaling in Saccharomyces cerevisiae. Sci. Rep. 8, 7732. https://doi.org/10.1038/s41598-018-25717-w

      Roelants, F.M., Chauhan, N., Muir, A., Davis, J.C., Menon, A.K., Levine, T.P., Thorner, J., 2018. TOR complex 2-regulated protein kinase Ypk1 controls sterol distribution by inhibiting StARkin domain-containing proteins located at plasma membrane-endoplasmic reticulum contact sites. Mol. Biol. Cell 29, 2128-2136. https://doi.org/10.1091/mbc.E18-04-0229

      Sakata, K.-T., Hashii, K., Yoshizawa, K., Tahara, Y.O., Yae, K., Tsuda, R., Tanaka, N., Maeda, T., Miyata, M., Tabuchi, M., 2022. Coordinated regulation of TORC2 signaling by MCC/eisosome-associated proteins, Pil1 and tetraspan membrane proteins during the stress response. Mol. Microbiol. 117, 1227-1244. https://doi.org/10.1111/mmi.14903

      Shao, C., Novakovic, V.A., Head, J.F., Seaton, B.A., Gilbert, G.E., 2008. Crystal Structure of Lactadherin C2 Domain at 1.7Å Resolution with Mutational and Computational Analyses of Its Membrane-binding Motif. J. Biol. Chem. 283, 7230-7241. https://doi.org/10.1074/jbc.M705195200

      Shi, J., Heegaard, C.W., Rasmussen, J.T., Gilbert, G.E., 2004. Lactadherin binds selectively to membranes containing phosphatidyl-L-serine and increased curvature. Biochim. Biophys. Acta 1667, 82-90. https://doi.org/10.1016/j.bbamem.2004.09.006

      Souza, C.M., Schwabe, T.M.E., Pichler, H., Ploier, B., Leitner, E., Guan, X.L., Wenk, M.R., Riezman, I., Riezman, H., 2011. A stable yeast strain efficiently producing cholesterol instead of ergosterol is functional for tryptophan uptake, but not weak organic acid resistance. Metab. Eng. 13, 555-569. https://doi.org/10.1016/j.ymben.2011.06.006

      Stefan, C.J., Audhya, A., Emr, S.D., 2002. The yeast synaptojanin-like proteins control the cellular distribution of phosphatidylinositol (4,5)-bisphosphate. Mol. Biol. Cell 13, 542-557. https://doi.org/10.1091/mbc.01-10-0476

      Tong, J., Manik, M.K., Im, Y.J., 2018. Structural basis of sterol recognition and nonvesicular transport by lipid transfer proteins anchored at membrane contact sites. Proc. Natl. Acad. Sci. 115, E856-E865. https://doi.org/10.1073/pnas.1719709115

      Topolska, M., Roelants, F.M., Si, E.P., Thorner, J., 2020. TORC2-Dependent Ypk1-Mediated Phosphorylation of Lam2/Ltc4 Disrupts Its Association with the β-Propeller Protein Laf1 at Endoplasmic Reticulum-Plasma Membrane Contact Sites in the Yeast Saccharomyces cerevisiae. Biomolecules 10, 1598. https://doi.org/10.3390/biom10121598

      Torra, J., Campelo, F., Garcia-Parajo, M.F., 2024. Tensing Flipper: Photosensitized Manipulation of Membrane Tension, Lipid Phase Separation, and Raft Protein Sorting in Biological Membranes. J. Am. Chem. Soc. 146, 24114-24124. https://doi.org/10.1021/jacs.4c08580

      Uekama, N., Aoki, T., Maruoka, T., Kurisu, S., Hatakeyama, A., Yamaguchi, S., Okada, M., Yagisawa, H., Nishimura, K., Tuzi, S., 2009. Influence of membrane curvature on the structure of the membrane-associated pleckstrin homology domain of phospholipase C-δ1. Biochim. Biophys. Acta BBA - Biomembr. 1788, 2575-2583. https://doi.org/10.1016/j.bbamem.2009.10.009

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank all the reviewers for their helpful and constructive comments and for their time.


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: Dady et al have developed fluorescent reporters to enable live imaging of cell behaviour and morphology in human pluripotent stem cell lines (PSCs). These reporters target 3 main features, the plasma membrane, nucleus and cytoskeleton. Reporter PSCs have been generated using a piggyBac transposon-mediated stable integration strategy, using a hyperactive piggyBac transposase (HyPBase). The same constructs were also used for mosaic labelling of cells within 2D cultures using lipofectamine transfection.

      The reporters used are tagged with either eGFP or mKate2 (far red) and tag the plasma membrane (pm) via the addition of a 20 amino-acid sequence from rat GAP-43 to the N-terminus of the fluorescent protein, the nucleus via Histone 2B with a laser-mediated photo-conversion option (H2B-mEos3.2), and the cytoskeleton via F-Tractin. In total, the authors produced lines with the following:

      • pm-mKate2 (far red)
      • pm-eGFP (green)
      • H2B-mEos3.2 (green to red)
      • F-tractin-mKate2 (far red)
      • H2B-mEos3.2 and pm-mKate2 (green to red, plus far red)

      The cell lines used to generate these were the human embryonic stem cell line H9 and the human induced pluripotent stem cell line ChiPS4. The constructs were also used to label cells in a mosaic fashion, using lipofectamine transfection of the original cell lines once they had formed neural rosettes.

      Using these cells, Dady et al then performed live imaging in vitro of human spinal cord rosettes and assessed cell behaviour. In particular, they analysed mitotic cleavage planes and apical positioning of neural progenitor cells (NPCs), and assessed actin dynamics within these cells. They showed a lengthening of the cell cycle after the initial expansion phase, an increase in the rate of asymmetric division of these NPCs, and abscission of the apical membrane during these divisions. The F-tractin reporter showed enrichment at the basal nuclear membrane during these cell divisions, suggested to help prevent basal chromosome displacement during mitosis.

      Major comments: The data presented are convincing and could be strengthened by the following additions and clarifications:

      1. How long do the fluorescent reporters take to become visible when transfected via lipofectamine? How efficiently are they expressed? And what concentrations were tested to enable the mosaic expression presented?

      We followed the manufacturer’s instructions for Lipofectamine 3000 transfection, using the protocol recommended for a 6-well plate. We detected fluorescence the following morning (~16 h post-lipofection). We did not assess earlier time points or optimise efficiency, as we observed the mosaic pattern of expression we set out to achieve, with small groups of labelled cells and single cells, as shown in Figure 3 and movies 2 and 3. This information and the detailed protocol provided below are now included in the Methods section “Labelling individual cells in human spinal cord rosettes by lipofection”.

      Manufacturer’s instructions for Lipofectamine 3000 transfection (6-well plate):

      • 1 tube containing 125 ul of Opti-MEM and 7.5 ul of Lipofectamine 3000
      • 1 tube containing 250 ul of Opti-MEM with 5 ug of DNA (DNA mix at a total concentration of 2 ug/ul) and P3000 Reagent
      • Add diluted DNA to diluted Lipofectamine 3000 (ratio 1:1) and incubate for 10 to 15 min at room temperature.
      • 20 ul of DNA-lipid complex was added per well to neural rosettes growing in 8-well IBIDI dishes.
      • The ratio of DNA (piggyBac plasmid) to HyPBase transposase was kept at 5:1 (for a final concentration of 2 ug/ul).
      • Cells in IBIDI dishes were left to develop in a sterile incubator overnight, and mosaic fluorescence was observed the following morning (~16 h post-lipofection).
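      For readers reproducing this protocol, the DNA arithmetic implied by the list above can be sketched as follows. This is a minimal illustration, not part of the authors' Methods: it assumes the 5:1 ratio is a mass ratio of piggyBac reporter plasmid to HyPBase plasmid, and that 2 ug/ul is the concentration of the combined DNA mix. The function name split_dna is hypothetical.

```python
# Hypothetical helper (not from the authors' Methods): split a total DNA
# mass between the piggyBac reporter plasmid and the HyPBase transposase
# plasmid, ASSUMING the stated 5:1 ratio is a mass ratio and that the
# combined DNA mix is at 2 ug/ul.

def split_dna(total_ug: float,
              reporter_parts: float = 5.0,
              transposase_parts: float = 1.0,
              mix_conc_ug_per_ul: float = 2.0) -> dict:
    """Return ug of each plasmid and the volume (ul) of DNA mix needed."""
    parts = reporter_parts + transposase_parts
    reporter_ug = total_ug * reporter_parts / parts
    transposase_ug = total_ug * transposase_parts / parts
    # Volume of the combined mix that delivers total_ug at the assumed concentration.
    mix_volume_ul = total_ug / mix_conc_ug_per_ul
    return {
        "reporter_ug": reporter_ug,
        "transposase_ug": transposase_ug,
        "mix_volume_ul": mix_volume_ul,
    }


if __name__ == "__main__":
    # 5 ug total DNA, as in the second tube of the protocol above.
    for name, value in split_dna(5.0).items():
        print(f"{name}: {value:.2f}")
```

      Under these assumptions, 5 ug of DNA corresponds to roughly 4.17 ug of reporter plasmid, 0.83 ug of HyPBase plasmid and 2.5 ul of DNA mix; if the ratio was instead defined by molarity, plasmid sizes would also need to be taken into account.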

      • Will these cell lines and constructs be made publicly available after publication?

      The cell lines can be made available: for those reporters made in the H9 WiCell line an MTA will first have to be signed between the requesting PI and WiCell and permission for us to share the line(s) confirmed by WiCell; similarly, for reporters in ChiPS4 line an MTA will first need to be signed between the requesting PI and Cellartis/TakaraBio Europe. We will need to make a charge to cover costs. Constructs will be deposited with Addgene.

      • Were the H9 and ChiPS4 lines characterised after the reporters were added to show they still proliferate/differentiate as they did prior to reporter integration?

      In the Results we make clear that all lines created are polyclonal, with the exception of a pm-eGFP ChiPS4 line, which is a monoclonal line (lines 145-150). We do not have direct data measuring cell proliferation, but collected cell passaging data for all the reporter lines. These showed that they grow to similar densities at each passage compared to the parental line (these metadata are now provided as Supplementary data 1 and are cited in the Methods, line 348).

      As a proof of principle for this approach, we created one monoclonal line from the polyclonal line ChiPS4-pm-eGFP. This monoclonal line was made by selecting an individual clone, which was then expanded and characterised for expression of pluripotency markers (immunocytochemistry data, Figure S4) and for the ability to differentiate into 3 germ layers (qPCR, Supplementary data 1). This information is already cited in the Methods (lines 358-362).

      • Can the novel actin dynamics described be quantified? How many cells imaged show these novel dynamics?

      Some of these quantification data were already reported in the paper (in the Figure 4 legend and in the Methods); we have now updated them and provide the detailed metadata in an Excel spreadsheet, Supplementary data 4 (cited in the Methods, line 489).

      Minor comments: 1. Some images in the figures and supplemental movies are low in resolution, for example the DAPI in Fig 4B, making it hard to distinguish individual cells. Please increase this.

      We consider the DAPI labelling in Figure 4b to be clear, however, we wonder whether the reviewer was expecting to also see this combined with the other markers. We have therefore now provided these merged additional images in a revised Figure 4.

      • Please show a merge of Phalloidin and F-Tractin in Fig 4; this will help the colocalization to be fully appreciated.

      This has now been provided in revised Figure 4B.

      • Some additional annotation on the supplemental movies would be useful to indicate to the reader exactly which cell to follow.

      We have added indicative arrows to the movies, and note that more detailed labelling of the series of still images from these movies are provided in the main figures (Figures 3D and 4E & F).

      Reviewer #1 (Significance (Required)):

      Human neurogenesis is currently poorly understood compared to many model systems used, yet key differences have already been identified between the human and the mouse, prompting the need for further investigation of human neural development. A major reason that human neurogenesis has been difficult to study is a lack of tools to enable cell morphology and behaviours to be analysed in real time.

      The reporters and reporter PSC lines generated by Dady et al will allow many of these cell characteristics to be observed using live imaging. For example, the morphology of neural progenitors during and after cell divisions, how the apical and basal processes and membranes are divided, and how the actin cytoskeleton helps to regulate these processes.

      Importantly, PSC lines can be very heterogeneous, making the generation of reporter lines costly and time-intensive. The use of these reporters with lipofectamine transfection for mosaic labelling allows the visualisation of the plasma membrane, nucleus and cytoskeleton in any human PSC/NPC line, or even in human tissue cultures, without the need to generate each specific reporter line, making it a valuable tool for many labs in the field.

      We strongly agree with this final point; this is a major reason for our study.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The manuscript describes the generation of novel lines of human pluripotent stem cells bearing fluorescent reporters, engineered through piggyBac transposon-mediated integration. The cells are differentiated into neuronal organoids, allowing the capture of cellular behaviors associated with cell division. A replating protocol allows the observation of aging neurons by reducing the thickness of the tissue, thereby facilitating live imaging. The authors also leverage the transposon technology to create mosaically labelled organoids, which allows visualizing aspects of neuronal delamination, notably cytoskeleton dynamics. They discover a previously undescribed pattern of F-actin enrichment at the basal nuclear membrane prior to nuclear envelope breakdown.

      L104-109: "Moreover, the transposon system obviates drawbacks of directly engineering endogenous proteins...". Despite the risk of endogenous protein dysfunction, direct tagging preserves the full regulation of gene expression (including the promoter, enhancers and other regulatory regions, rather than relying on a strong constitutive promoter such as CAG). In addition, with PB the number of integrated copies and the genomic integration sites are variable, which does not reflect endogenous expression. This could be rephrased to nuance the advantages and drawbacks of each approach. The PiggyBac method is easier and faster, but it results in overexpression of a tagged protein that is expressed from the hESC stage onward and might not reflect the expression dynamics of the endogenous protein.

      We agree and have now revised this in the Introduction L109-118.

      L124-126: "To monitor cell shape and dynamics we used a plasma membrane (pm) localized protein tagged with eGFP or mKate2 (pm-eGFP or pm-mKate2)." Could the authors provide more details and a reference on the palmitoylated rat peptide used to force membrane localization?

      This information, including the peptide sequence, is provided in the Methods (L330-331), we have now added a reference addressing its role in membrane localisation PMID: 2918027.

      L132-133: " Finally, to observe actin cytoskeletal dynamics we selected F-tractin, for its minimal impact on cytoskeletal homeostasis".

      A recent JCB paper (https://doi.org/10.1083/jcb.202409192) suggests that "F-tractin alters actin organization and impairs cell migration when expressed at high levels". Whether the overexpression of F-tractin in hESC using a CAG promoter reflects the physiological F-actin dynamics and/or whether the high levels could lead to an alteration of cell behavior should be addressed or at least discussed.

      The paper we cite in this sentence (Belin et al 2014) evaluates F-tractin expression against other approaches to labelling and monitoring the actin cytoskeleton and concludes that, in comparison, F-tractin has minimal impact.

      We do appreciate that expression above the endogenous level has the potential to alter cell behaviour and have revised the paper to more explicitly acknowledge this: in the Introduction (L109-112), and in the Discussion/conclusion (L289-293) where we now note the recent advances reported in Shatskiy et al. 2025 PMID: 39928047.

      “A further potential limitation of this approach is that over-expression driven by the CAG promoter might not reflect physiological protein dynamics and/or might alter cell behaviour; for example, high levels of F-Tractin can impair cell migration and induce actin bundling. Interestingly, this can now be minimised by removing the N-terminal region (Shatskiy et al 2025)”.

      L146-147: "...to generate polyclonal cell lines selected for expression of easily detectable (medium level) fluorescence for live imaging studies". What are the criteria used to define medium level? Number of copies integrated into the genome? Or levels by FACS during clone selection?

      To clarify, all the lines presented here are polyclonal, except for one clonal line, pm-eGFP in ChiPS4. The numbers of copies integrated may vary from cell to cell in polyclonal lines. In this study, we selected cells for all lines with a FACS gate and this data is presented in Figure S1 (see line 147).

      L260-263: "Efficient stable integration and moderate expression levels were achieved by optimising, i) the quantity and ratio of piggyBac plasmids and transposase and ii) subsequent FACS to exclude high expressing cells, as well as iii) transfection methods, including temporally defined lipofection in hiPSC-derived tissues." The 5:1 ratio is classically used for PB transposase delivery; however, there is still high variability in the number of integrated copies. Lipofection in derived tissues has been shown to be challenging. Could the authors provide quantitative data regarding the efficiency of their approaches, notably the level of mosaicism one could expect?

      We provide quantitative data for the efficiency of transfection using nucleoporation assays (FACS data presented in Supplementary Figure S1), which show over 80% efficiency for eGFP (82.82% of cells) and mKate2 (92.74% of cells), and expression of H2B-mEos3.2 in 22.75% of cells, while 13.79% of cells co-expressed pm-mKate2 and H2B-mEos3.2. No comparative data regarding the efficiency of the tissue lipofection assay were collected: our goal was to label single cells or small numbers of cells in order to monitor individual cell behaviours, and this “inefficient labelling” was readily achieved by following the manufacturer’s instructions (please see response to Reviewer 1 point 1); further details are now provided in the Methods.

      L191-194: "We further wished to monitor sub-cellular behaviour within the developing neuroepithelium. To achieve this, we devised a strategy to target a mosaic of cells in established neural rosettes using lipofection. PiggyBac constructs and HyPBase transposase were transfected into D8/D9 human spinal cord neural progenitors using lipofectamine (Felgner, et al., 1987)(Fig. 3A)." The mosaicism is not all-or-nothing in this method but also leads to variations in expression levels among the positive cells. The protocol for lipofection could be better detailed to allow easy reproduction by other teams, and its expected efficiency should be discussed. It would be interesting to explore the relationship between individual cell phenotypes and expression levels.

      Please see the response to Reviewer 1 point 1 above for the more detailed lipofection protocol that generated mosaic expression; this is now also included in the Methods. We agree that investigating the relationship between individual cell phenotypes and expression levels would be interesting, but we think this is beyond the scope of this paper.

      Additional comments: -Did the authors perform karyotyping of the hPSCs prior to use in the differentiation protocol?

      As these are polyclonal lines, we did not undertake karyotyping. This could be done for the one monoclonal line described here (pm-eGFP ChiPS4 line): we lack funds for commercial options, but we are exploring other possibilities.

      -Were pluripotency assays performed after reporter lines generation?

      These were carried out for the clonal pm-eGFP ChiPS4 line (lines 145-150). The latter was made by selecting an individual clone and this was then expanded and characterised for expression of pluripotency markers by IF (Figure S4), and the ability to differentiate into 3 germ layers by qPCR (Supplementary data 2). This information is provided in the Methods (Lines 358-362).

      -Did the authors measure the cell proliferation rate in H2B-overexpressing cells and controls? Since H2B plays an important role in cytokinesis, it could interfere with cell division when H2B is overexpressed (see doi: 10.3390/cells8111391).

      We did not directly measure cell division when H2B is over-expressed. However, we assessed the cell passaging time of all the transfected cell lines. This showed that they grow to similar densities at each passage compared to the parental line (this is now provided as Supplementary data 1 and is cited in the Methods, line 348). We also found no difference in apical visiting time between progenitors in spinal cord rosettes expressing pm-eGFP or H2B-mEoS3.2, further supporting the conclusion that the levels of H2B-mEoS3.2 expression achieved in this line did not interfere with cell division (metadata provided in Supplementary data 3).

      The authors should provide data concerning the efficiency of expression of the distinct markers after electroporation.

      This is provided in Supplementary Figure S1 (FACS data) and detailed above for this reviewer.

      At Fig 1C, the schematic representation describes clone selection; however, in the methods it is stated (L348-349): "Sorted cells expressing medium levels of fluorescence were expanded and frozen then representative lots of each polyclonal cell line...". There is some confusion regarding which experiments were performed using polyclonal medium-level mixed populations or monoclonal populations.

      We apologise for any confusion and have revised the Figure 1C schematic to indicate that cells can be selected to either make polyclonal lines or clonal lines.

      Reviewer #2 (Significance (Required)):

      The study provides novel tools, as well as insights into neuroepithelium biology. It is well conducted and written, and the quality of the images is excellent. In its current version it reads more as a resource paper, since the observations regarding neural cell division and delamination are interesting but not deeply explored, so this review will focus on the technical aspects rather than the novelty of the biological findings.

      This study would be of interest to researchers in stem cells and organoids, developmental biology, and neurosciences.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In the manuscript, "Engineering fluorescent reporters in human pluripotent cells and strategies for live imaging human neurogenesis", the authors Dady et al. describe the adaptation of a recent advancement in transposase technology (HyPBase) as a method to integrate live reporters in human pluripotent stem cells. They show that these fluorescent reporters, paired with new imaging strategies, can be used to confirm the existence of cellular behaviours described in other species, such as the interkinetic nuclear migration (IKNM) of dividing progenitors in neural tube development. Finally, they demonstrate that this live imaging system can also reveal novel biology, by identifying previously undescribed actin polymerization at the basal nuclear surface of cortical progenitors undergoing cell division. Overall, the study presents two examples in which this adapted tool will aid live-imaging studies of cell biology.

      Major Concerns: 1. This work needs more controls to properly demonstrate the claim that their engineering strategy provides an advancement over current piggyBac methods. Their HyPBase strategy needs to be compared with other methods and its efficiency quantified to support their claims (increased detection and reduced phototoxicity).

      We do not make specific claims for our experiments with respect to the superiority of the HyPBase strategy. Our comments on this approach referred to by the reviewer here are in the Introduction (L 94-103), are supported by the literature (e.g. more stable gene expression than native piggyBac or the Tc1/mariner transposase Sleeping Beauty; Doherty et al., 2012; Yusa et al., 2011), and serve to explain our selection of HyPBase for our experiments. We make a case for using HyPBase as opposed to another transposase, and although it would be interesting to compare efficiencies, this comment does not specify what “other methods” might be informative.

      2.Throughout the manuscript more quantification is needed of the results. How many rosettes were examined? Were all the reported cells within one rosette? Were there differences between rosettes? This should be done for both the spinal and cortical differentiations.

      The reviewer appears to have missed this information: we placed detailed quantifications in the figure legends (numbers of independent experiments and rosettes) and in the Methods, in a specific section on Quantification of cell behaviour (L465-486), rather than in the main text. These have since been further updated, and we now also provide additional metadata in the form of Excel spreadsheets for quantifications and analyses made for both spinal cord and cortical rosettes (Supplementary data 3 and 4, respectively).

      Minor Comments: 1. Line 246 needs quantification shown in figures of the statements made. Specifically, how many cells were measured to get this number?

      This information was provided in the figure 4 legend and we have since added numbers to these data; we were able to monitor 169 divisions in 21 rosettes; 154/166 divisions had vertical cleavage planes (symmetric) and 12/166 had horizontal cleavage planes (asymmetric).

      These detailed observations were made in two independent experiments, along with observations of basal nuclear membrane F-Tractin localisation. This is noted in figure 4 legend, Methods and detailed metadata is provided in Supplementary data 4.

      2.How many cells in the cortical rosettes had the enriched actin at the basal nuclear surface?

      We confidently observed basal nuclear membrane F-Tractin enrichment in 141/146 divisions. For the remaining 20 cases (166-146), we could not tell whether F-Tractin was enriched at the basal nuclear membrane, either because of low expression levels or because the basal nuclear membrane was out of focus at NEB. In 5 cases, we did not see the basal nuclear enrichment despite sufficient F-Tractin expression levels and the nucleus being in focus. We have updated the Fig 4 legend to exclude the non-analysable cases, and detailed metadata are provided in Supplementary data 4.

      Reviewer #3 (Significance (Required)):

      General Assessment: This manuscript makes a very minor advancement in the field of stem cell engineering and developmental biology, but one that is worthy of publication with a few edits.

      Advance: While PiggyBac reporters are widely used in stem cell engineering, Dady et al. demonstrate a new workflow using HyPBase which would be beneficial to the field. However, to increase this benefit, much more description and quantification of the methods would be needed. The biological advances of this manuscript are also very minor, but interesting, as most of them confirm that human neural rosettes mimic many of the cell behaviours observed in animal models. Along these lines, the actin dynamics observation in cortical rosettes is interesting, but preliminary and in need of follow-up experiments.

      Audience: Regardless, this technique would be of interest to the wider field of stem cell engineering.

      My Expertise: Human Stem Cell Engineering, Neural Tube Development

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors have assembled a cohort of 10 SiNET, 1 SiAdeno, and 1 lung MiNEN samples to explore the biology of neuroendocrine neoplasms. They employ single-cell RNA sequencing to profile 5 samples (siAdeno, SiNETs 1-3, MiNEN) and single-nuclei RNA sequencing to profile seven frozen samples (SiNET 4-10).

      They identify two subtypes of siNETs, characterized by either epithelial or neuronal NE cells, through a series of DE analyses. They also report findings of higher proliferation in non-malignant cell types across both subtypes. Additionally, they identify a potential progenitor cell population in a single lung MiNEN sample.

      Strengths:

      Overall, this study adds interesting insights into this set of rare cancers that could be very informative for the cancer research community. The team probes an understudied cancer type and provides thoughtful investigations and observations that may have translational relevance.

      Weaknesses:

      The study could be improved by clarifying some of the technical approaches as currently presented, to strengthen support for the conclusions:

      (1) Methods: As currently presented, it is possible that the separation of samples by program may be impacted by tissue source (fresh vs. frozen) and/or the associated sequencing modality (single cell vs. single nuclei). For instance, two (SiNET1 and SiNET2) of the three fresh tissues are categorized into the same subtype, while the third (SiNET9) has very few neuroendocrine cells. Additionally, samples from patient 1 (SiNET1 and SiNET6) are separated into different subtypes based on fresh and frozen tissue. The current text alludes to investigations (i.e., "Technical effects (e.g., fresh vs. frozen samples) could also impact the capture of distinct cell types, although we did not observe a clear pattern of such bias."), but the study would be strengthened with more detail.

      We thank the reviewer for the thoughtful and constructive review. Due to the difficulty in obtaining enough SiNET samples, we used two platforms to generate data - single cell analysis of fresh samples, and single nuclei analysis of frozen samples. We opted to combine both sample types in our analysis while being fully aware of the potential for batch effects. We therefore agree that this is a limitation of our work, and that differences between samples should be interpreted with caution.

      Nevertheless, we argue that the two SiNET subtypes that we have identified are very unlikely to be due to such batch effect. First, the epithelial SiNET subtype was not only detected in two fresh samples but also in one frozen sample (albeit with relatively few cells, as the reviewer correctly noted). Second, and more importantly, the epithelial SiNET subtype was also identified in analysis of an external and much larger cohort of bulk RNA-seq SiNET samples that does not share the issue of two platforms (as seen in Fig. 2f). Moreover, the proportion of samples assigned to the two subtypes is similar between our data and the external data. We therefore argue that the identification of two SiNET subtypes cannot be explained by the use of two data platforms. However, we agree that the results should be further investigated and validated by future studies.

      The reviewer also commented that two samples from the same patient which were profiled by different platforms (SiNET1 and SiNET6) were separated into different subtypes. We would like to clarify that this is not the case, since SiNET6 was not included in the subtype analysis due to too few detected Neuroendocrine cells, and was not assigned to any subtype, as noted in the text and as can be seen by its exclusion from Figure 2 where subtypes are defined. We apologize that our manuscript may have given the wrong impression about SiNET6 classification (it was labeled in Fig. 4a in a misleading manner). In the revised manuscript, we corrected the labeling in Fig. 4a and clarified that SiNET6 is not assigned to any subtype. We also further acknowledge the limitation of the two platforms and the arguments in favor of the existence of two SiNET subtypes.     

      (Additional specific recommendations for the authors are provided below)

      (2) Results:

      Heterogeneity in the SiNET tumor microenvironment: It is unclear if the current analysis of intratumor heterogeneity distinguishes the subtypes. It may be informative if patterns of tumor microenvironment (TME) heterogeneity were identified between samples of the same subtype. The team could also evaluate this in an extension cohort of published SiNET tumors (e.g., revisiting additional analyses using the SiNET bulk RNAseq from Alvarez et al 2018, a subset of single-cell data from Hoffman et al 2023, or additional bulk RNAseq validation cohorts for this cancer type if they exist [if they do not, then this could be mentioned as a need in Discussion])

      We agree that analysis of an independent cohort will assist in defining the association between TME and the SiNET subtype. However, the sample size required for that is significantly larger than the data available. In the revised manuscript we note that as a direction for future studies.

      (3) Proliferation of NE and immune cells in SiNETs: The observed proliferation of NE and immune cells in SiNETs may also be influenced by technical factors (including those noted above). For instance, prior studies have shown that scRNA-seq tends to capture a higher proportion of immune cells compared to snRNA-seq, which should be considered in the interpretation of these results. Could the team clarify this element?

      We agree that different platforms could affect the observed proportions of immune cells, and more generally the proportions of specific cell types. However, the low proliferation of Neuroendocrine cells and the higher proliferation of immune cells (especially B cells, but also T cells and macrophages) is consistently observed in both platforms, as shown in Fig. 4a, and therefore appears to be reliable despite the limitations of our work. We clarify this consistency in the revised manuscript. 

      (4) Putative progenitors in mixed tumors: As written, the identification of putative progenitors in a single lung MiNEN sample feels somewhat disconnected from the rest of the study. These findings are interesting - are similar progenitor cell populations identified in SiNET samples? Recognizing that ideally additional validation is needed to confidently label and characterize these cells beyond gene expression data in this rare tumor, this limitation could be addressed in a revised Discussion.

      We do not find evidence for similar progenitors in the SiNET samples, but they also do not contain two co-existing lineages of cancer cells within the same tumor, so this is harder to define. We agree about the need for additional validation for this specific finding and have noted that in the revised Discussion.

      Reviewer #2 (Public review):

      Summary:

      The research identifies two main SiNET subtypes (epithelial-like and neuronal-like) and reveals heterogeneity in non-neuroendocrine cells within the tumor microenvironment. The study validates findings using external datasets and explores unexpected proliferation patterns. While it contributes to understanding SiNET oncogenic processes, the limited sample size and depth of analysis present challenges to the robustness of the conclusions.

      Strengths:

      The study effectively identified two subtypes of SiNET based on epithelial and neuronal markers. Key findings include the low proliferation rates of neuroendocrine (NE) cells and the role of the tumor microenvironment (TME), such as the impact of Macrophage Migration Inhibitory Factor (MIF).

      Weaknesses:

      However, the analysis faces challenges such as a small sample size, lack of clear biological interpretation in some analyses, and concerns about batch effects and statistical significance.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors set out to profile small intestine neuroendocrine tumors (siNETs) using single-cell/nucleus RNA sequencing, an established method to characterize the diversity of cell types and states in a tumor. Leveraging this dataset, they identified distinct malignant subtypes (epithelial-like versus neuronal-like) and characterized the proliferative index of malignant neuroendocrine cells versus non-malignant microenvironment cells. They found that malignant neuroendocrine cells were far less proliferative than some of their non-malignant counterparts (e.g., B cells, plasma cells, epithelial cells) and there was a strong subtype association such that epithelial-like siNETs were linked to high B/plasma cell proliferation, potentially mediated by MIF signaling, whereas neuronal-like siNETs were correlated with low B/plasma cell proliferation. The authors also examined a single case of a mixed lung tumor (neuroendocrine and squamous) and found evidence of intermediate/mixed and stem-like progenitor states that suggest the two differentiated tumor types may arise from the same progenitor.

      Strengths:

      The strengths of the paper include the unique dataset, which is the largest to date for siNETs, and the potentially clinically relevant hypotheses generated by their analysis of the data.

      Weaknesses:

      The weaknesses of the paper include the relatively small number of independent patients (n = 8 for siNETs), lack of direct comparison to other published single-cell NET datasets, mixing of two distinct methods (single-cell and single-nucleus RNA-seq), lack of direct cell-cell interaction analyses and spatially-resolved data, and lack of in vitro or in vivo functional validation of their findings.

      The analytical methods applied in this study appear to be appropriate, but the methods used are fairly standard to the field of single-cell omics without significant methodological innovation. As the authors bring forth in the Discussion, the results of the study do raise several compelling questions related to the possibility of distinct biology underlying the epithelial-like and neuronal-like subtypes, the origin of mixed tumors, drivers of proliferation, and microenvironmental heterogeneity. However, this study was not able to further explore these questions through spatially-resolved data or functional experiments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Methods:

      a) Could the team clarify the discrepancy in subtype assignment between two samples from the same patient? i.e. are these samples from the same tumor? If so, what does the team think is the explanation for the difference in subtype assignment?

      As noted above in response to the public review of reviewer #1, SiNET6 was in fact not assigned to any subtype (due to insufficient NE cells) and hence there was no discrepancy. We apologize for the misleading labeling of SiNET6 in the previous version and have corrected this in the revised version of Figure 4.

      b) What is the rationale for scoring tumor-derived programs on samples with no tumor cells? For instance, SiNET3 does not contain NE cells, and SiNET9 has a very low fraction of NE cells. Please clarify how the scoring was performed on these samples, as the program assignments may be driven by other cell types in samples with little to no NE cells.

      Scoring for tumor-derived programs was done only for the NE cells. Accordingly, SiNET3 was not scored or assigned to any of the programs. SiNET9 was included in this analysis: although it had a relatively small fraction of NE cells, the absolute number of profiled cells was particularly high in this sample, and therefore the number of NE cells was 130, higher than our cutoff of 100 cells.

      c) Given the heterogeneity of cell types within each sample, would there be a way to provide a refined sense of confidence for certain cell type annotations? This would be helpful given the heterogeneity in marker gene expression and the absence of gold-standard markers for fibroblasts and endothelial cells in this cancer type. Additionally, there seems to be an unusually large proportion of NK and T cells - was there selection for this (given that these tumors are largely not immune infiltrated)?

      Author Response: Apart from the Neuroendocrine cells, there are six TME cell types that we consistently find in multiple SiNET samples: macrophages, T cells, B/plasma cells, fibroblasts, endothelial and epithelial cells. Each of these cell types is identified as a discrete cluster in analysis of the respective tumors (as shown in Fig. 1a,b and Fig. S1), and these are exactly the six most common non-malignant cell types that we and others found in single cell analysis across various other tumor types (e.g. see Gavish et al. 2023, ref. #15). The signatures used to annotate these cell types are shown in Table S2, and they primarily consist of classical markers that are traditionally used to define those cell types. We therefore believe that the annotation of these typical tumor-associated cell types is robust and does not include major uncertainties. In addition to these five common cell types, there are three cell types that we find in only 1-2 of the samples: epithelial cells, plasma cells and NK cells. Again, we believe that their annotation is robust, and these cell types are mostly not used in further analyses.

      There was no selection for any specific cell types in this study. Nevertheless, single cell (or single nuclei) analysis may lead to biases towards specific cell types that we cannot evaluate directly from the data. NK cells were detected in only one tumor. T cells were detected in eight of the ten samples, but in four of those samples the frequency of T cells was lower than 5%, and in only one sample was the frequency above 20%. Therefore, while we cannot exclude a technical bias towards a high frequency of T/NK cells, we do not consider these frequencies high enough to suggest this specific type of bias. In the revised manuscript, we clarify that the commonly observed cell types in SiNETs are the same as those commonly observed in other tumors, and we acknowledge the possibility of a technical bias in cell type capture.

      d) Evaluating the expression of one gene at a time may not effectively demonstrate subtype-specific patterns, particularly when comparing NE cells from one tumor to non-NE cells from another, which may not be an appropriate approach for identifying differentially expressed genes. DE analysis coupled with concordance analysis, for example, could strengthen the results.

      We apologize, but we do not fully understand this comment. We note that the initial normalization by non-NE cells was done in order to decrease batch effects when combining the data of the two platforms. We also note that the two subtypes were identified by two distinct approaches, as shown in Fig. 2c and in Fig. 2f.

      (2) Results:

      See the above public review.

      (3) Minor Comments:

      a) Results: Single cell and single nuclei RNA-seq profiling of SiNETs

      The results say ten primary tumor samples from eight patients. Later in the paragraph it says, "After initial quality controls, we retained 29,198 cells from the ten patients." Please clarify to either ten samples or eight patients.

      Indeed, these are ten samples rather than ten patients. We corrected that in the revised version and thank the reviewer for noticing our error.

      b) Methods:

      - Please specify which computational tools were used to perform quality control, signature scoring, etc.

      The approaches for quality control, scoring etc. are described in the methods. We implemented these approaches with R code and did not use other computational tools.

      - Minor point but be consistent with naming convention (ie, siAdeno vs SiAdeno) throughout the paper. For example, under "Sample Normalization, Filtering and annotations" change "siAdeno" to "SiAdeno."

      Thank you for noting this, we corrected that.

      - Add processing and analysis of MiNEN sample to the methods section. It is not mentioned in the methods at all.

      As noted in the revised manuscript, the MiNEN sample was analyzed in the same way as the SiNET fresh samples.

      c) Supplementary Figures:

      Figure S1: Change (A-H) to (A-I) to account for all panels in the figure.

      Figure S4: Add (C) after "the siAdeno sample" in the legend.

      Thank you for noting this, we corrected that.

      (4) Font size is quite small in the main figures.

      We enlarged the font in selected figure panels.

      Reviewer #2 (Recommendations for the authors):

      (1) The small number of samples used in some analyses affects the robustness of the findings. Increasing the sample size or including more validation data could improve the statistical reliability and make the results more convincing. The authors should consider expanding the cohort size or integrating additional external datasets to increase statistical power.

      We agree with the reviewer that adding more samples would improve the reliability of the results. However, the external data that we found was not comparable enough to enable integration with our data, and we are unable to profile additional SiNET samples in our lab. We hope that future studies would support our results and extend them further.

      (2) The biological significance of differentially expressed genes needs more depth, limiting the insights into SiNET biology. The authors should perform a comprehensive pathway enrichment analysis and integrate findings with existing literature. Tools like Gene Set Enrichment Analysis (GSEA) or Overrepresentation Analysis (ORA) could provide a more holistic view of altered biological processes.

      We thank the reviewer for this suggestion. We did examine the functional enrichment of differentially expressed genes and did not find additional enrichments that we felt were important to highlight beyond what we described. We report the genes in supplementary tables, enabling other researchers to examine these lists further. 

      (3) The unexpected finding of higher proliferation in non-malignant cells requires further investigation and plausible biological explanation. The authors should perform additional analyses to explore potential mechanisms, such as investigating cell cycle regulators or performing in vitro validation experiments. The authors should consider single-cell trajectory analysis to explore these highly proliferative non-malignant cells' potential differentiation or activation states.

      We agree that our results are descriptive and that we do not fully explain the mechanism for the high level of non-malignant cell proliferation. We did attempt to perform follow up computational analysis. These analyses raised the hypothesis that high levels of MIF are causing the proliferation of immune cells. Additional analyses that we performed were not sufficient to conclusively identify a mechanism, and we felt that they were not informative enough to be included in the manuscript. Further in vitro (or in vivo) studies are beyond the scope of the current work.

      (3) More details are required on methods used for p-value adjustment, and criteria for statistical significance should be clearly defined. Additionally, integrating scRNA-seq and snRNA-seq data needs a more thorough explanation, including batch effect mitigation and more explicit cell clustering representation. The authors should clearly describe p-value adjustments (e.g., FDR) and batch correction methods (e.g., Harmony, FastMNN integration) and include additional figures showing corrected UMAP plots or heatmaps post-batch correction to enhance the confidence in results.

      We now clarify in the Methods our use of FDR for p-value adjustments. As for batch correction, we have avoided the use of integration methods as we believe that they tend to distort the data and decrease tumor-specific signals. Instead, we primarily analyzed one tumor at a time and never directly compared cell profiles across distinct tumors but only compared the differences between subpopulations; specifically, we normalized the expression of NE cells by subtracting the expression of reference non-NE cells from the same tumor as a method to decrease batch effects. We now clarify this point in the Methods section.
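      As an illustration only, the within-tumor normalization described here (subtracting the expression of reference non-NE cells of the same tumor from that of the NE cells) might be sketched as follows. This is a hypothetical Python sketch, not the authors' R implementation: the log-scale input, the use of a per-gene mean over non-NE cells, and the function name `relative_ne_expression` are all assumptions. The toy example shows why such a difference can cancel a batch effect, under the simplifying assumption that the batch shift is a constant offset within a tumor.

```python
import numpy as np

def relative_ne_expression(expr, is_ne):
    """Hypothetical sketch: NE-minus-non-NE expression for ONE tumor.

    expr  : cells x genes matrix of log-scale expression (assumed input format)
    is_ne : boolean flags marking neuroendocrine (NE) cells in that tumor
    Returns, per gene, mean NE expression minus mean non-NE expression.
    Any offset shared by all cells of the tumor (e.g. a platform shift)
    cancels in the subtraction.
    """
    expr = np.asarray(expr, dtype=float)
    is_ne = np.asarray(is_ne, dtype=bool)
    ne_mean = expr[is_ne].mean(axis=0)    # average over NE cells
    ref_mean = expr[~is_ne].mean(axis=0)  # average over reference non-NE cells
    return ne_mean - ref_mean

# Toy data (invented): the same tumor profile, with and without a constant shift.
tumor_a = np.array([[3.0, 1.0], [3.0, 1.0], [1.0, 1.0]])
tumor_b = tumor_a + 2.0
labels = [True, True, False]
print(relative_ne_expression(tumor_a, labels))  # [2. 0.]
print(relative_ne_expression(tumor_b, labels))  # [2. 0.]
```

Because only these within-tumor differences are compared across tumors, the cell profiles themselves are never directly compared between tumors or platforms.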

      (4) The lack of analysis of interactions between different cell types limits understanding of tumor microenvironment dynamics. The authors should employ cell-cell interaction analysis tools (e.g., CellPhoneDB, NicheNet) to explore potential communication networks within the tumor microenvironment. This could provide valuable insights into how different cell types influence tumor progression and maintenance.

      We thank the reviewer for this suggestion. We have tried to use such methods but found the results difficult to interpret since these approaches generated very long lists of potential cell-cell interactions that are largely not unique to the SiNET context and their relevance remains unclear without follow up experiments, which are beyond the scope of this work. We therefore focused only on ligand/receptors that came up robustly through specific analyses such as the differences between SiNET subtypes. In particular, MIF is highly expressed in the epithelial subtype, and remarkably, MIF upregulation is shared across multiple cell types. Thus, the cell-cell interactions that are suggested by the SiNET data as somewhat unique to this context are those involving MIF and its receptor (CD74 on immune cell types), while other interactions detected by the proposed methods primarily reflect the generic ligand/receptors expressed by corresponding TME cell types.   

      Reviewer #3 (Recommendations for the authors):

      (1) For a relatively small dataset, the mixing of single-cell versus single-nucleus RNA-seq should be discussed more. It would be nice to have 1-2 tumors that are analyzed by both methods to compare and increase our understanding of how these different approaches may affect the results. This could be accomplished by splitting a fresh tumor into two parts, processing it fresh for single-cell RNA-seq, and freezing the other part for single-nucleus RNA-seq.

      We agree with the reviewer that the different techniques may bias our results and we refer to this limitation in the Results and Discussion sections. However, it is important to note that we do not directly integrate the primary data across these modalities, but rather analyze each tumor separately and only combine the results across tumors. For example, we first compare the NE cells from each tumor to control non-NE cells from the same tumor and then only compare the sets of NE-specific genes across tumors. Moreover, the subtypes that we detect cannot be explained by these modalities, as the first subtype contains samples from both methods and these subtypes are further demonstrated in external bulk data. Similarly, the results regarding low proliferation of NE cells and high proliferation of B/plasma cells are observed across both modalities. We therefore argue that while the combination of methods is a limitation of this work it does not account for the main results.  

      (2) The authors state that they defined the siNET transcriptomic signature by comparing their siNET single-cell/nucleus data to other NETs profiled by bulk RNA-seq. Some of the genes in the signature, such as CHGA, are widely used as markers for NETs (and not specific for siNET). The authors should address this in more detail.

      To define the SiNET transcriptomic signature we first analyzed each tumor separately and compared the expression of Neuroendocrine (NE) cells to that of non-NE cells to detect NE-specific genes. Next, we compared the lists of NE-specific genes across the 8 SiNET patients and found a subset of 26 genes which were shared across most of the analyzed SiNET samples (Fig. 2a). Thus, the signature was defined only from analysis of SiNETs and not based on comparison to other types of NETs and hence it is expected that the signature could contain both SiNET-specific genes and more generic NET genes such as CHGA.
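      The second step of this procedure (keeping the NE-specific genes that recur across tumors) can be sketched as follows. This is a hypothetical illustration: the response states only that the retained genes were shared across "most" samples, so the strict-majority threshold (`min_fraction=0.5`), the function name `shared_signature`, and the toy gene lists (apart from CHGA, which the response names) are assumptions.

```python
from collections import Counter

def shared_signature(ne_specific_lists, min_fraction=0.5):
    """Hypothetical sketch: genes that are NE-specific in more than
    `min_fraction` of the tumors (assumed threshold for "most")."""
    n_tumors = len(ne_specific_lists)
    # Count each gene at most once per tumor.
    counts = Counter(g for genes in ne_specific_lists for g in set(genes))
    return sorted(g for g, c in counts.items() if c / n_tumors > min_fraction)

# Invented per-tumor lists of NE-specific genes (placeholders except CHGA).
per_tumor = [
    ["CHGA", "GENE_A", "GENE_B"],
    ["CHGA", "GENE_A"],
    ["CHGA", "GENE_C"],
]
print(shared_signature(per_tumor))  # ['CHGA', 'GENE_A']
```

Because the signature is built only from within-SiNET comparisons, a generic NET gene such as CHGA can pass the filter just as a SiNET-specific gene would.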

      Only after defining this signature did we go on to compare it between SiNETs and other types of NETs (pancreatic and rectal) based on external bulk RNA-seq data. In this comparison, we observed that the signature was clearly higher in SiNETs than in the other NETs (Fig. 2b). This result supports the accuracy of the signature and further suggests that it contains a fraction of SiNET-specific genes and not only generic NET genes such as CHGA. Thus, we would expect this signature to perform well also for distinguishing between SiNETs and other types of NETs, although it does contain a subset of genes that would be high in the other NETs. Finally, we note that even though CHGA is a generic NET marker, the bulk RNA-seq data suggest that, at least at the mRNA level, this gene is still more highly expressed in SiNETs than in other NETs. To avoid confusion regarding the definition and specificity of the SiNET transcriptomic signature, we have extended the description of this section in the revised manuscript.

      (3) The authors only compare their data to bulk transcriptomic data on NETs. While in some instances this makes sense given the bulk dataset has >80 tumors, they should at least cite and do some comparison to other published single-cell RNA-seq datasets of NETs (e.g., PMID: 37756410, 34671197). The former study listed has 3 siNETs, 4 pNETs, and 1 gNET. Do the epithelial-like and neuronal-like signatures show up in this dataset too?

      We examined these studies but concluded that their data was inadequate to identify the two SiNET subtypes. The latter study was of pNETs, while the former study had 3 SiNET samples but only from 2 patients, and furthermore it was enriching for immune cells with only very low amounts of NE cells. Therefore, we now cite this work in the discussion but cannot use it to extend the results from our work.

      (4) How did the authors statistically handle patients with more than one tumor sample (true for n = 2)? These tumor samples would not be truly independent.

      In both cases where we had two distinct samples of the same patient, only one sample had sufficient NE cells to be included in NE-related analysis and therefore the other samples (SiNET3 and SiNET6) were excluded from all analysis of NE differential expression and subtypes. These samples were only included in the initial analysis (Fig. 1) and in TME-related analysis (Fig. 3-4) in which there was no statistical analysis of differences between patients and hence no problem with the inclusion of 2 samples for the same patient. We clarified this issue in the revised version.

      (5) The association between siNET subtype and B/plasma cell proliferation is very interesting, as is the hypothesis regarding MIF signaling. It would be illuminating for the authors to perform cell-cell interaction analyses with methods such as CellChat in this context rather than just relying on DE. Spatial mapping would be helpful too and while this may be outside the scope of this study, it should at least be expounded upon in the Discussion section.

      Indeed, spatial transcriptomic analysis would add interesting insight to our data and to SiNET biology. Unfortunately, this is not within the scope of the current project, but we note this interesting possibility in the Discussion. Regarding additional methods for cell-cell interactions, we have performed such analysis but found it not informative, as it highlighted a large number of interactions that are not unique to SiNETs and are difficult to interpret; therefore, we do not include this in the revised version.

      (6) The authors note that in the mixed lung tumor, the NE component was more proliferative than that observed with siNETs. How does the proliferation compare to pNETs, gNETs, in other published studies? How about assessing the clonality of the SCC and LNET malignant cells with various genomic or combined genomic/transcriptomic methods?

      The percentage of proliferating NE cells in the mixed lung tumor was higher than 60%. This is extremely high, approximately four-fold higher than the average that we found in a pan-cancer analysis and higher than the average of any of the >20 cancer types that we analyzed (Gavish et al. 2023, ref. #15). This remarkably high proliferation serves as a control for the low proliferation that we found in SiNET NE cells.

      (7) In the Discussion on page 13, the authors write "Second, proliferation of NE cells may be inhibited by prior treatments with somatostatin analogues." How many patients were treated in this manner? This information should be made more explicit in the manuscript.

      Details on pretreatment with somatostatin analogues are provided in Table S1. All patients were pretreated with somatostatin analogues, with the possible exception of one patient (P8, SiNET10) for whom we could not confidently obtain this information.

      (8) On page 5, "bone-fide" is misspelled.

      (9) On page 8, "exact identify" is misspelled.

      We thank the reviewer and have corrected the typos.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors provide a study among healthy individuals, general medical patients and patients receiving haematopoietic cell transplants (HCT) to study the gut microbiome through shotgun metagenomic sequencing of stool samples. The first two groups were sampled once, while the patients receiving HCT were sampled longitudinally. A range of metadata (including current and previous (up to 1 year before sampling) antibiotic use) was recorded for all sampled individuals. The authors then performed shotgun metagenomic sequencing (using the Illumina platform) and performed bioinformatic analyses on these data to determine the composition and diversity of the gut microbiota and the antibiotic resistance genes therein. The authors conclude, on the basis of these analyses, that some antibiotics had a large impact on gut microbiota diversity, and could select opportunistic pathogens and/or antibiotic resistance genes in the gut microbiota.

      Strengths:

      The major strength of this study is the considerable achievement of performing this observational study in a large cohort of individuals. Studies into the impact of antibiotic therapy on the gut microbiota are difficult to organise, perform and interpret, and this work follows state-of-the-art methodologies to achieve its goals. The authors have achieved their objectives, and the conclusions they draw on the impact of different antibiotics on the gut microbiota and its antibiotic resistance genes (the 'resistome', in short) are supported by the data presented in this work.

      Weaknesses:

      The weaknesses are the lack of information on the different resistance genes that have been identified and which could have been supplied as Supplementary Data.

      We have now supplied a list of individual resistance genes as supplementary data.

      In addition, no attempt is made to assess whether the identified resistance genes are associated with mobile genetic elements and/or (opportunistic) pathogens in the gut. While this is challenging with short-read data, alternative approaches like long-read metagenomics, Hi-C and/or culture-based profiling of bacterial communities could have been employed to further strengthen this work.

      We agree this is a limitation, and we now refer to this in the discussion. Unfortunately, we did not have funding to perform additional profiling of the samples that would have provided more information about the genetic context of the AMR genes identified.

      Unfortunately, the authors have not attempted to perform corrections for multiple testing because many antibiotic exposures were correlated.

      The reviewer is correct that we did not perform formal correction for multiple testing. This was because correlation between antimicrobial exposures meant we could not determine what correction would be appropriate and not overly conservative. We now describe this more clearly in the statistical analysis section.

      Impact:

      The work may impact policies on the use of antibiotics, as those drugs that have major impacts on the diversity of the gut microbiota and select for antibiotic resistance genes in the gut are better avoided. However, the primary rationale for antibiotic therapy will remain the clinical effectiveness of antimicrobial drugs, and the impact on the gut microbiota and resistome will be secondary to these considerations.

      We agree that the primary consideration guiding antimicrobial therapy will usually be clinical effectiveness. However, antimicrobial stewardship to minimise microbiome disruption and AMR selection is an increasingly important consideration, particularly as choices can often be made between different antibiotics that are likely to be equally clinically effective.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript by Peto et al., the authors describe the impact of different antimicrobials on gut microbiota in a prospective observational study of 225 participants (healthy volunteers, inpatients and outpatients). Both cross-sectional data (all participants) and longitudinal data (a subset of 79 haematopoietic cell transplant patients) were used. Using metagenomic sequencing, they estimated the impact of antibiotic exposure on gut microbiota composition and resistance genes. In their models, the authors aim to correct for potential confounders (e.g. demographics, non-antimicrobial exposures and physiological abnormalities), and for differences in the recency and total duration of antibiotic exposure. I consider these comprehensive models an important strength of this observational study. Yet, the underlying assumptions of such models may have impacted the study findings (detailed below). Other strengths include the presence of both cross-sectional and longitudinal exposure data and the presence of both healthy volunteers and patients. Together, these observational findings expand on previous studies (both observational and RCTs) describing the impact of antimicrobials on gut microbiota.

      Weaknesses:

      (1) The main weaknesses result from the observational design. This hampers causal interpretation and makes correction for potential confounding necessary. The authors have used comprehensive models to correct for potential confounders and for differences between participants in duration of antibiotic exposure and time between exposure and sample collection. I wonder whether some of the choices made by the authors affected these findings. For example, the authors did not include travel in the final model, but travel (most importantly, to South Asia) may result in the acquisition of AMR genes [Worby et al., Lancet Microbe 2023; PMID 37716364]. Moreover, non-antimicrobial drugs (such as proton pump inhibitors) were not included, but these have a well-known impact on gut microbiota and might be linked with exposure to antimicrobial drugs. Residual confounding may underlie some of the unexplained discrepancies between the cross-sectional and longitudinal data (e.g. for vancomycin).

      We agree that the observational design means there is the potential for confounding, which, as the reviewer notes, we attempt to account for as far as possible in the multivariable models presented. We cannot exclude the possibility of residual confounding, and we highlight this as a limitation in the discussion. We have expanded on this limitation, and mention it as a possible explanation for inconsistencies between longitudinal and cross-sectional models. Conducting randomised trials to assess the impacts of multiple antimicrobials in sick, hospitalised patients would be exceptionally difficult, and so it is hard to avoid reliance on observational data in these settings.

      We did record participants’ foreign travel and diet, but these exposures were not included in our models as they were not independently associated with an impact on the microbiome and their inclusion did not materially affect other estimates. However, because most participants were recruited from a healthcare setting, few had recent foreign travel and so this study was not well powered to assess the effects of travel on AMR carriage. We have added this as a limitation.

      In addition, the authors found a disruption half-life of 6 days to be the best fit based on Shannon diversity. If I'm understanding correctly, this results in a near-zero modelled exposure of a 14-day-course after 70 days (purple line; Supplementary Figure 2). However, it has been described that microbiota composition and resistome (not Shannon diversity!) remain altered for longer periods of time after (certain) antibiotic exposures (e.g. Anthony et al., Cell Reports 2022; PMID 35417701). The authors did not assess whether extending the disruption half-life would alter their conclusions.

      The reviewer is correct that the best fit disruption half-life of 6 days means the model assumes near-zero exposure by 70 days. We appreciate that antimicrobials can cause longer-term disruption than is represented in our model, and we refer to this in the discussion (we had cited two papers supporting this, and we are grateful for the additional reference above, which we have added). We agree that it is useful to clarify that the longer term effects may be seen in individual components of the microbiome or AMR genes, but not in overall measures of diversity, so have added this to the discussion.
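      To make the decay assumption concrete, the sketch below computes a modelled exposure for a 14-day course under a 6-day disruption half-life. It assumes, for illustration only, that each treatment day adds one unit of exposure that then halves every 6 days; the function name `modelled_exposure` is hypothetical, and the exact weighting used in the manuscript (Supplementary Figure 2) may differ.

```python
def modelled_exposure(day, course_days=14, half_life=6.0):
    """Hypothetical exposure score on `day` for a course starting on day 0.

    Assumes each treatment day (0 .. course_days - 1) contributes one unit
    that halves every `half_life` days; treatment days after `day` are ignored.
    """
    total = 0.0
    for dose_day in range(course_days):
        if dose_day <= day:
            total += 0.5 ** ((day - dose_day) / half_life)
    return total


peak = modelled_exposure(13)  # last day of the 14-day course
late = modelled_exposure(70)
print(f"day 70 exposure is {late / peak:.2%} of the end-of-course peak")
```

      Under this assumption the day-70 value is exactly 0.5^(57/6) of the end-of-course peak (roughly 0.14%), consistent with the near-zero exposure both the reviewer and the authors describe; any longer-lasting effect on individual taxa or AMR genes would need a slower or separate decay term.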

      (2) Another consequence of the observational design of this study is the relatively small number of participants available for some comparisons (e.g. oral clindamycin was only used by 6 participants). Care should be taken when drawing any conclusions from such small numbers.

      We agree. Although our participants received a large number of different antimicrobial exposures, these were dependent on routine clinical practice at our centre and we lack data on many potentially important exposures. We had mentioned this in relation to antimicrobials not used at our centre, and have now clarified in the discussion that this also limits reliability of estimates for antimicrobials that were rarely used in study participants.

      (3) The authors assessed log-transformed relative abundances of specific bacteria after subsampling to 3.5 million reads. While I agree that some kind of data transformation is probably preferable, these methods do not address the compositional nature of microbiome data, and using a pseudocount (10<sup>-6</sup>) is necessary for absent (i.e. undetected) taxa [Gloor et al., Front Microbiol 2017; PMID 29187837]. Given the centrality of these relative abundances to their conclusions, a sensitivity analysis using compositionally-aware methods (such as a centred log-ratio (clr) transformation) would have added robustness to their findings.

      We agree that using a pseudocount is necessary for undetected taxa, which we have done assuming undetected taxa had an abundance of 10<sup>-6</sup> (based on the lower limit of detection at the depth we sequenced). We refer to this as truncation in the methods section, but for clarity we have now also described this as a pseudocount. Because our analysis focusses on major taxa that are almost ubiquitous in the human gut microbiome, a pseudocount was only used for 3 samples that had no detectable Enterobacteriaceae.
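      As an illustration of the truncation described above, the following sketch assumes a subsampled depth of 3.5 million reads and a floor of 10<sup>-6</sup> for undetected taxa, and contrasts it with the centred log-ratio (clr) transform the reviewer suggests. The function names are hypothetical; note that clr also needs strictly positive inputs, so it does not remove the need for a pseudocount.

```python
import math

SUBSAMPLED_READS = 3_500_000  # reads per sample after random subsampling
FLOOR = 1e-6                  # value assigned to undetected taxa (pseudocount)


def log10_rel_abundance(count, total_reads=SUBSAMPLED_READS, floor=FLOOR):
    """log10 relative abundance, truncated at `floor` so zero counts stay finite."""
    return math.log10(max(count / total_reads, floor))


def clr(abundances):
    """Centred log-ratio: natural log of each part minus the mean log of all parts.

    All parts must be strictly positive, so zeros need a pseudocount first.
    """
    logs = [math.log(a) for a in abundances]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


print(log10_rel_abundance(0))       # an undetected taxon sits at the floor
print(log10_rel_abundance(35_000))  # 1% of subsampled reads
print(clr([0.5, 0.3, 0.2]))         # clr values always sum to zero
```

      The clr value of each taxon is defined relative to the geometric mean of all parts in the same sample, which is why it is normally applied to a whole composition rather than to each taxon in a separate model.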

      We are aware that compositionally-aware methods are often used with microbiome data, and for some analyses these are necessary to avoid introducing spurious correlations. However the flaws in non-compositional analyses outlined in Gloor et al do not affect the analyses in this paper:

      (1) The problems related to differing sequence depths or inadequate normalisation do not apply to our dataset, as we took a random subset of 3.5 million reads from all samples (Gloor et al correctly point out that this method has the drawback of losing some information, but it avoids problems related to variable sequencing depth)

      (2) The remainder of Gloor et al critiques multivariate analyses that assess correlations between multiple microbiome measurements made on the same sample, starting with a dissimilarity matrix. With compositional data these can lead to spurious correlations, as measurements on an individual sample are not independent of other measurements made on the same sample. In contrast, our analyses do not use a dissimilarity matrix, but evaluate the association of multiple non-microbiome covariates (e.g. antibiotic exposures, age) with single microbiome measures. We use a separate model for each of 11 specified microbiome components, and display these results side by side. This does not lead to the same problem of spurious correlation as analyses of dissimilarity matrices. However, it does mean that estimates of effects on each taxon outcome have to be interpreted in the context of estimates on the other taxa. Specifically, in our models, the associations of antimicrobial exposure with different taxa/AMR genes are not necessarily independent of each other (e.g. if an antimicrobial eradicated only one taxon then it would be associated with an increase in others). This is not a spurious correlation, and makes intuitive sense when using relative abundance as outcome. However, we agree this should be made more explicit.
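      A toy example with hypothetical counts shows the relative-abundance point made above: if an antimicrobial removed taxon A while leaving the absolute abundance of B and C unchanged, the relative abundances of B and C would still rise.

```python
def relative_abundance(counts):
    """Convert absolute counts per taxon into proportions of the sample total."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical absolute abundances before and after an antimicrobial
# that eliminates taxon A and leaves B and C untouched.
before = {"A": 50, "B": 30, "C": 20}
after = {"A": 0, "B": 30, "C": 20}

print(relative_abundance(before))
print(relative_abundance(after))
```

      B and C double in relative abundance (0.3 to 0.6 and 0.2 to 0.4) although their absolute counts are unchanged, which is the expected, non-spurious dependence between outcomes described in the response.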

      For these reasons, at this stage we would prefer not to increase the complexity of the manuscript by adding a sensitivity analysis.

      (4) An overall description of gut microbiota composition and resistome of the included participants is missing. This makes it difficult to compare the current study population to other studies. In addition, for correct interpretation of the findings, it would have been helpful if the reasons for hospital visits of the general medical patients were provided.

      We have added a summary of microbiome and resistome composition in the results section (new Supplementary Table 2), and we also now include microbiome and resistome profiles of all samples in the supplementary data. We also provide some more detail about the types of general medical patients included. We are not able to provide a breakdown of the initial reason for admission as this was not collected.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Provide a supplementary table with information on the abundance of individual genes in the samples.

      This supplementary data is now included.

      (2) Engage with an expert in statistics to discuss how statistical analyses can be improved.

      An experienced biostatistician has been involved in this study since its conception, and was involved in planning the analysis and in the responses to these comments.

      (3) Typos and other minor corrections:

      Methods: it is my understanding that litre should be abbreviated with a lowercase l.

      Different journals have different house styles: we are happy to follow Editorial guidance.

      p. 9: abuindance should be corrected to abundance.

      Corrected

      p. 9: relative species should be relevant species?  

      Yes, corrected. Thank you.

      p. 9 - 10: can the apparent lack of effect of beta-lactams on beta-lactamase gene abundance be explained by the focus on a small number of beta-lactamase resistance genes that are found in Enterobacteriaceae and which are not particularly prevalent, while other classes of resistance genes (e.g. Bacteroidal beta-lactamases) were excluded?

      It is possible that including other beta-lactamases would have led to different results, but as a small number of beta-lactamases in Enterobacteriaceae are of major clinical importance we decided to focus on these (already justified in the Methods). A full list of AMR genes identified is now provided in the supplementary data.

      p. 10: beta-lactamse should be beta-lactamase

      Corrected

      Figure 3A: could the data shown for tetracycline resistance genes be skewed by tetQ, which is probably one of the most abundant resistance genes in the human gut and acts through ribosome protection?

      TetQ was included, but only accounted for 23% of reads assigned to tetracycline resistance genes so is unlikely to have skewed the overall result. We limited the analysis to a few major categories of AMR genes and, other than VanA, have avoided presenting results for single genes to limit the degree of multiple testing. We now include the resistome profile for each sample in the supplementary data so that readers can explore the data if desired.

      Reviewer #2 (Recommendations For The Authors):

      (1) Given the importance of obligate anaerobic gut microbiota for human health, it might be interesting to divide antibiotics into categories based on their anti-anaerobic activity and assess whether these antibiotics differ in their effects on gut microbiota.

      The large majority of antibiotics used in clinical practice have activity against aerobic bacteria and anaerobic bacteria, so it is not possible to easily categorise them this way. There are two main exceptions (metronidazole and aminoglycosides) but there was insufficient use of these drugs to clearly detect or rule out a difference between them, even when categorising antimicrobials by class, so we prefer not to frame the results in these terms. Also see our comments on this categorisation below.

      (2) For estimating the abundance of anaerobic bacteria, three major groups were assessed: Bacteroidetes, Actinobacteria and Clostridia. To me, this seems somewhat non-specific. For example, the phylum Bacteroidetes contains some aerobic bacteria (e.g. Flavobacteriia). Would it be possible to provide a more accurate estimation of anaerobic bacteria?

      We think that an emphasis on a binary aerobic/anaerobic classification is less biologically meaningful than the more granular genetic classification we use, and its use largely reflects the previous reliance on culture-based methods for bacterial identification. Although some important opportunistic human pathogens are aerobic, it is not clear that the benefit or harm of most gut commensals relates to their oxygen tolerance, and all luminal bacteria exist in an anaerobic environment. As such we prefer not to perform an additional analysis using this category. We are also not sure that this could be done reliably, as many of the taxa are characterised poorly, or not at all.

      We appreciate that Bacteroidetes, Actinobacteria and Clostridia are diverse taxa that include many different species, so may seem non-specific, but these were chosen because:

      i) they are non-overlapping with Enterobacteriaceae and Enterococcus, the major opportunistic pathogens of clinical relevance, so could be used in parallel, and

      ii) they make up the large majority of the gut microbiome in most people and most species are of low pathogenicity, so it is plausible that their disruption might drive colonisation with more pathogenic organisms (or those carrying important AMR genes).

      We have more clearly stated this rationale.

      (3) A statement on the availability of data and code for analysis is missing. I would highly recommend public sharing of raw sequence data and R code for analysis. If possible, it would be very valuable if processed microbiome data and patient metadata could be shared.

      We agree, and these have been submitted as supplementary data. We have added the following statement “The data and code used to produce this manuscript are available in the supplementary material, including processed microbiome data, and pseudonymised patient metadata. The sequence data for this study have been deposited in the European Nucleotide Archive (ENA) at EMBL-EBI under accession number PRJEB86785.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Cao et al. provides a compelling investigation into the role of mutational input in the rapid evolution of pesticide resistance, focusing on the two-spotted spider mite's response to the recent introduction of the acaricide cyetpyrafen. This well-documented introduction of the pesticide, and thus a clearly defined history of selection, offers a powerful framework for studying the temporal dynamics of rapid adaptation. The authors combine resistance phenotyping across multiple populations, extensive resequencing to track the frequency of resistance alleles, and genomic analyses of selection in both contemporary and historical samples. These approaches are further complemented by laboratory-based experimental evolution, which serves as a baseline for understanding the genetic architecture of resistance across mite populations in China. Their analyses identify two key resistance-associated genes, sdhB and sdhD, within which they detect 15 mutations in wild-collected samples. Protein modeling reveals that these mutations cluster around the pesticide's binding site, suggesting a direct functional role in resistance. The authors further examine signatures of selective sweeps and their distribution across populations to infer the mechanisms, such as de novo mutation or gene flow, driving the spread of resistance, a crucial consideration for predicting evolutionary responses to extreme selection pressure. Overall, this is a well-rounded, thoughtfully designed, and well-written manuscript. It shows significant novelty, as it is relatively rare to integrate broad-scale evolutionary inference from natural populations with experimentally informed bioassays. However, some aspects of the methods and discussion could be clarified and strengthened.

      Strengths:

      One of the most compelling aspects of this study is its integration of genomic time-series data in natural populations with controlled experimental evolution. By coupling genome sequencing of resistant field populations with laboratory selection experiments, the authors tease apart the individual effects of resistance alleles along with regions of the genome where selection is expected to occur, and compare these with the observed frequencies in the wild populations over space and time. Their temporal data clearly demonstrate the pace at which evolution can occur in response to extreme selection. This type of approach is a powerful roadmap for the rest of the field of rapid adaptation.

      The study effectively links specific genetic changes to resistance phenotypes. The identification of sdhB and sdhD mutations as major drivers of cyetpyrafen resistance is well-supported by allele frequency shifts in both field and experimental populations. The scope of their sampling clearly facilitated the remarkable number of observed mutations within these target genes, and the authors provide a careful discussion of the likelihood of these mutations from de novo or standing variation. Furthermore, the discovered cross-resistance that these mutations confer to other mitochondrial complex II inhibitors highlights the potential for broader resistance management and evolution.

      Weaknesses:

      (1) Experimental Evolution:

      - Additional information about the lab experimental evolution would be useful in the main text. Specifically, the dose of cyetpyrafen used should be clarified, especially with respect to the LD50 values. How does it compare to recommended field doses? This is expected to influence the architecture of resistance evolution. What was the sample size? This will help readers contextualize how the experimental design could influence the role of standing variation.

      The experimental design involved sampling approximately 6,000 individuals from the wild population ZJSX1, which were subsequently divided into two parallel cohorts under controlled laboratory conditions. The selection group (LabR) was subjected to continuous selection pressure using cyetpyrafen, while the control group (LabS) was maintained under identical laboratory conditions without exposure to cyetpyrafen. A dynamic selection regime was implemented wherein the acaricide dosage was systematically adjusted every two generations to maintain a consistent selection intensity, achieving a mortality rate of 60% ± 10% in the LabR population. This adaptive dosage strategy ensured sustained evolutionary pressure while preventing population collapse. LC<sub>50</sub> values were determined at generations F1, F32, F54, F60, F62, and F66 using standardized bioassay protocols to quantify resistance development trajectories and optimize dosage for subsequent selection cycles. We provided the additional information in subsection 4.1 of the materials and methods section.

      - The finding that lab-evolved strains show cross-resistance is interesting, but potentially complicates the story. It would help to know more about the other mitochondrial complex II inhibitors used across China and their impact on adaptive dynamics at these loci, particularly regarding pre-existing resistance alleles. For example, a comparison of usage data from 2013, 2017, and 2019 could help explain whether cyetpyrafen was the main driver of resistance or if previous pesticides played a role. What happened in 2020 that caused such rapid evolution 3 years after launch?

      Although the introduction of the other two SDHI acaricides complicates the story, we would like to provide a complete background on the usage of acaricides with this mode of action in China. Although cyflumetofen was released in 2013 before cyetpyrafen, and cyenopyrafen was released in 2019 after cyetpyrafen, their market share is minor (about 3.2%) compared to cyetpyrafen (about 96.8%, personal communication). Since cross-resistance is reported among SDHIs, we could not exclude the contribution of cyflumetofen to the initial accumulation of resistance alleles, but the effect should be minor, both because of their minimal market share and because of the independent evolution of resistance in the field as found in our study. Although the contribution of cyflumetofen and cyenopyrafen cannot be entirely excluded, the rapid evolution of resistance seems likely to be mainly explained by the intensive application of cyetpyrafen. To clarify this issue, we added relevant information in the first paragraph of the discussion section.

      (2) Evolutionary history of resistance alleles:

      - It would be beneficial to examine the population structure of the sampled populations, especially regarding the role of migration. Though resistance evolution appears to have had minimal impact on genome-wide diversity (as shown in Supplementary Figure 2), could admixture be influencing the results? An explicit multivariate regression framework could help to understand factors influencing diversity across populations, as right now much is left to the readers' visual acuity.

      The genetic structure of the populations was examined by Treemix analysis. We detected only one migration event, from JXNC to SHPD (no resistance data available for these two populations), suggesting a limited role for migration in resistance evolution. The multiple regression analysis revealed that overall genetic diversity and Tajima’s D across the genome were not significantly associated with resistance levels, genetic structure or geographic coordinates (P > 0.05), which all support a limited role of migration in resistance development.

      - It is unclear why lab populations were included in the migration/treemix analysis. We might suggest redoing the analysis without including the laboratory populations to reveal biologically plausible patterns of resistance evolution.

      Thank you for the constructive suggestion. The Treemix analysis was redone by removing laboratory populations and is now reported.

      - Can the authors explore isolation by distance (IBD) in the frequency of resistance alleles?

      Thank you for the constructive suggestion. No significant isolation-by-distance pattern was detected for resistance allele frequencies across all surveyed years (2020: P=0.73; 2021: P=0.52; 2023: P=0.16; Mantel test). We added these results to the text.
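      A Mantel test compares two distance matrices, for example pairwise differences in resistance allele frequency and pairwise geographic distance between populations. The response does not state which implementation or correlation statistic was used, so the sketch below is a minimal standard-library version that assumes a Pearson correlation of the upper-triangle entries and a one-sided permutation p-value. The function `mantel` and the toy positions are hypothetical.

```python
import math
import random


def _upper(m):
    """Upper-triangle entries (excluding the diagonal) of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(d1, d2, n_perm=999, seed=1):
    """One-sided permutation Mantel test for a positive association.

    Rows and columns of `d2` are permuted together, which shuffles population
    labels while keeping each matrix internally consistent.
    """
    n = len(d1)
    x = _upper(d1)
    r_obs = _pearson(x, _upper(d2))
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        permuted = [[d2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if _pearson(x, _upper(permuted)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Toy example: identical matrices from 1-D positions, so r is 1.
positions = [0, 1, 3, 7, 15, 31]
geo = [[abs(a - b) for b in positions] for a in positions]
r, p = mantel(geo, geo)
print(r, p)
```

      With real data, `d1` would hold geographic distances and `d2` pairwise allele-frequency differences for one sampling year; a large p-value, as reported for 2020, 2021 and 2023, indicates no detectable isolation by distance.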

      - Given the claim regarding the novelty of the number of pesticide resistance mutations, it is important to acknowledge the evolution of resistance to all pesticides (antibiotics, herbicides, etc.). ALS-inhibiting herbicides have driven remarkable repeatability across species based on numerous SNPs within the target gene.

      We appreciate this comment, which highlights the need to place our findings within the broader evolutionary context of pesticide resistance. We have investigated references relevant to the evolution of resistance to diverse pesticides. As far as we can tell, the 15 target mutations in eight amino acid residues are among the highest number of pesticide resistance mutations detected, especially within the context of animal studies. We have added relevant text to the second paragraph of the discussion.

      - Figure 5 A-B. Why not run a multivariate regression with status at each resistance mutation encoded as a separate predictor? It is interesting that focusing on the predominant mutation gives the strongest r2, but it is somewhat unintuitive and masks some interesting variation among populations.

      We conducted a multiple regression analysis to explore the influence of multiple mutations on resistance levels of field populations. However, of the 15 putative resistance mutations, only five were detected in more than three populations for which bioassay data are available, i.e. I260T, I260V, D116G, R119C, R119L. The frequencies of three of these mutations, I260T (P = 0.00128), I260V (P = 0.00423) and D116G (P = 0.00058), are significantly correlated with the resistance level of field populations. This has been added.

      (3) Haplotype Reconstruction (Line 271-):

      - We are a bit sceptical of the methods taken to reconstruct these haplotypes. It seems as though the authors did so with Sanger sequencing (this should be mentioned in the text), focusing only on homozygous SNPs. How many such SNPs were used to reconstruct haplotypes, along what length of sequence? For how many individuals were haplotypes reconstructed? Nonetheless, I appreciated that the authors looked into the extent to which the reconstructed haplotypes could be driven by recombination. Can the authors elaborate on the calculations in line 296? Is that the census population size estimate or effective?

      Because haplotypes cannot be determined unambiguously when two or more loci are heterozygous, we inferred haplotypes only from sequencing data with at most one heterozygous locus. In total, 844 individuals and 696 individuals were used to detect haplotypes of sdhB and sdhD, respectively. We detected 11 haplotypes (with 8 SNPs) and 24 haplotypes (with 11 SNPs) along 216 bp of the sdhB and 155 bp of the sdhD genes, respectively. Please see the fifth paragraph of subsection 2.4. We used ρ = 4 × Ne × d (genetic distance) (Li and Stephens, 2003) to calculate the number of effective individuals for one recombination event.
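      The filtering rule can be sketched as follows, assuming diploid genotype calls at the SNPs along each gene fragment (hemizygous males would simply contribute a single haplotype). The function name and input format are hypothetical; with one heterozygous site the two haplotypes are fully determined, whereas two or more heterozygous sites admit several phasings.

```python
def resolve_haplotypes(genotype):
    """Return the two haplotypes of a diploid genotype, or None if phase is ambiguous.

    `genotype` is a list of (allele_1, allele_2) pairs, one per SNP in order.
    """
    heterozygous = [i for i, (a, b) in enumerate(genotype) if a != b]
    if len(heterozygous) > 1:
        # e.g. A/G and C/T could be phased as AC + GT or as AT + GC
        return None
    hap_1 = "".join(a for a, _ in genotype)
    hap_2 = "".join(b for _, b in genotype)
    return tuple(sorted((hap_1, hap_2)))


print(resolve_haplotypes([("A", "A"), ("C", "T"), ("G", "G")]))  # one het site
print(resolve_haplotypes([("A", "G"), ("C", "T"), ("G", "G")]))  # ambiguous
```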

      (4) Single Mutations and Their Effect (line 312-):

      - It's not entirely clear how the breeding scheme resulted in near-isogenic lines. Could the authors provide a clearer explanation of the process and its biological implications?

      To investigate the effect of single mutations or their combination on resistance levels, we isolated the females and males with the same homozygous/hemizygous genotypes for creating homozygous lines. Females from these lines were not near-isogenic, but homozygous for the critical mutations. We revised the description in the methods section to clearly define these lines.

      - If they are indeed isogenic, it's interesting that individual resistance mutations have effects on resistance that vary considerably among lines. Could the authors run a multivariate analysis including all potential resistance SNPs to account for interactions between them? Given the variable effects of the D116G substitution (ranging from 4-25%), could polygenic or epistatic factors be influencing the evolution of resistance?

      We couldn’t conduct multivariate analysis because most lines have only one resistant SNP. The four lines homozygous for 116G were from the same population. The variable mortality may reflect other unknown mechanisms but these are beyond the scope of this study.

      - Why are there some populations that segregate for resistance mutations but have no survival to pesticides (i.e., the green points in Figure 5)? Some discussion of this heterogeneity seems required in the absence of validation of the effects of these particular mutations. Could it be dominance playing a role, or do the authors have some other explanation?

      We didn’t investigate the degree of dominance of each mutation. The mutation I260V shows incompletely dominant inheritance (Sun, et al. 2022). To investigate survival rate of different populations, the two-spotted spider mite T. urticae was exposed to 1000 mg/L of cyetpyrafen, higher than the recommended field dose of 100 mg/L. Such a high concentration may lead to death of an individual heterozygous for certain mutations, such as I260V.

      - The authors mention that all resistance mutations co-localized to the Q-site. Is this where the pesticide binds? This seems like an important point to follow their argument for these being resistance-related.

      Yes. We revised Fig. 3c to show the Q-site.

      (5) Statistical Considerations for Allele Frequency Changes (Figure 3):

      - It might be helpful to use a logistic regression model to assess the rate of allele frequency changes and determine the strength of selection acting on these alleles (e.g., Kreiner et al. 2022; Patel et al. 2024). This approach could refine the interpretation of selection dynamics over time.

      Thank you for this suggestion. A logistic regression model was used to track allele frequency trajectories. The selection coefficient of each allele and their joint effects were estimated.
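      The core of this approach is that, under simple genic selection, the log-odds of an allele's frequency changes linearly with time, so the fitted slope estimates selection strength. The sketch below uses ordinary least squares on noise-free, hypothetical frequencies strictly between 0 and 1; the response does not specify the exact model, and a binomial regression on sampled allele counts would additionally weight each time point by sample size and accommodate observed frequencies of 0 or 1. All names are hypothetical, and time is in generations (field sampling years would need an assumed number of generations per year).

```python
import math


def logit(p):
    """Log-odds of a frequency strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def fit_logit_trajectory(generations, freqs):
    """Least-squares fit of logit(frequency) against generation.

    Returns (slope, intercept). Under genic selection with relative fitness
    1 + s, the slope equals ln(1 + s), which is close to s when s is small.
    """
    ys = [logit(p) for p in freqs]
    n = len(generations)
    mean_x = sum(generations) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(generations, ys))
    sxx = sum((x - mean_x) ** 2 for x in generations)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical noise-free trajectory: starting frequency 1%, slope 0.3 per generation.
gens = list(range(11))
freqs = [1 / (1 + math.exp(-(logit(0.01) + 0.3 * g))) for g in gens]
slope, intercept = fit_logit_trajectory(gens, freqs)
print(f"slope = {slope:.3f}, implied s = {math.exp(slope) - 1:.3f}")
```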

      Reviewer #2 (Public review):

      Summary:

      This paper investigates the evolution of pesticide resistance in the two-spotted spider mite following the introduction of an SDHI acaricide, cyetpyrafen, in China. The authors make use of cyetpyrafen-naive populations collected before that pesticide was first used, as well as more recent populations (both sensitive and resistant) to conduct comparative population genomics. They report 15 different mutations in the insecticide target site from resistant populations, many reported here for the first time, and look at the mutation and selection processes underlying the evolution of resistance, through GWAS, haplotype mapping, and testing for loss of diversity indicating selective sweeps. None of the target site mutations found in resistant populations was found in pre-exposure populations, suggesting that the mutations may have arisen de novo rather than being present as standing variation, unless initially present at very low frequencies; a de novo origin is also supported by evidence of selective sweeps in some resistant populations. Furthermore, there is no significant evidence of migration of resistant genotypes between the sampled field populations, indicating multiple origins of common mutations. Overall, this indicates a very high mutation rate and a wide range of mutational pathways to resistance for this target site in this pest species. The series of population genomic analyses carried out here, in addition to the evolutionary processes that appear to underlie resistance development in this case, could have implications for the study of resistance evolution more widely.

      Strengths:

      This paper combines phenotypic characterisation with extensive comparative population genomics, made possible by the availability of multiple population samples (each with hundreds of individuals) collected before as well as after the introduction of the pesticide cyetpyrafen, as well as lab-evolved lines. This results in findings of mutation and selection processes that can be related back to the pesticide resistance trait of concern. Large numbers of mites were tested phenotypically to show the levels of resistance present, and the authors also made near-isogenic lines to confirm the phenotypic effects of key mutations. The population genomic analyses consider a range of alternative hypotheses, including mutations arising by de novo mutation or selection from standing genetic variation, and mutations in different populations arising independently or arriving by migration. The claim that mutations most likely arose by multiple repeated de novo mutations is therefore supported by multiple lines of evidence: the direct evidence of none of the mutations being found in over 2000 individuals from naive populations, and the indirect evidence from population genomics showing evidence of selective sweeps but not of significant migration between the sampled populations.

      Weaknesses:

      As acknowledged within the discussion, whilst evidence supports a de novo origin of the resistance-associated mutations, this cannot be proven definitively as mutations may have been present at a very low frequency and therefore not found within the tested pesticide-naive population samples.

      We agree that we could not definitively exclude the presence of a very low incidence of favoured mutations before the introduction of this novel acaricide.

      Near-isofemale lines were made to confirm the resistance levels associated with five of the 15 mutations, but otherwise, the genotype-phenotype associations are correlative, as confirmation by functional genetics was beyond the scope of this study.

      We hope that future functional studies will validate the effects of these mutations on resistance in both the two-spotted spider mite T. urticae and other spider mite species. This could be done by creating near-isogenic female lines or using CRISPR-Cas9 technology, as gene knockouts have recently been established for T. urticae.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Could the authors elaborate on the environmental context (e.g., climate, geography) of the sampled populations to give more nuance to the analysis of genetic differentiation and resistance evolution?

      We explored the influence of geographic isolation on the frequency of resistance alleles using Mantel tests (isolation by distance). We did not investigate the influence of climate, because most of the samples were collected from greenhouses, where the climate to which the pests are exposed is unclear.
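      A Mantel test asks whether two distance matrices over the same set of populations are correlated more strongly than expected when population labels are shuffled. The sketch below is a generic one-sided permutation Mantel test, assuming square symmetric matrices (for example, pairwise geographic distance and pairwise differentiation in resistance-allele frequency). The sampling positions, distances and function names are hypothetical and are not taken from the authors' analysis.

```python
import math
import random

def _upper_triangle(m):
    """Off-diagonal upper-triangle entries of a square matrix, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def _pearson(x, y):
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mantel_test(dist_a, dist_b, n_perm=999, seed=1):
    """One-sided Mantel test for a positive association between two matrices.

    The null distribution permutes rows and columns of dist_b together
    (relabelling populations), which keeps its internal structure intact.
    """
    rng = random.Random(seed)
    x = _upper_triangle(dist_a)
    r_obs = _pearson(x, _upper_triangle(dist_b))
    n = len(dist_b)
    order = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        permuted = [[dist_b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if _pearson(x, _upper_triangle(permuted)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)

positions_km = [0, 10, 25, 40, 60]  # hypothetical sites along a transect
geo = [[abs(a - b) for b in positions_km] for a in positions_km]
# Hypothetical differentiation that grows linearly with distance
gen = [[0.002 * d for d in row] for row in geo]
r, p = mantel_test(geo, gen)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

      With only a handful of populations the number of distinct permutations is small, so the smallest attainable p-value is limited by the sample size rather than by n_perm.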

      (2) Line 161: is this supposed to be one R and one S?

      Yes, we added this information (LabR and LabS).

      (3) Line 207: variation is not saturated at the first two sites because the different combinations are not seen. This is a bit misleading.

      What we wanted to indicate was that the two codon positions are saturated, rather than their combinations. We revised this sentence by adding “of each codon position”.

      (4) Line 376: continuous selection did not "result in a new mutation arising". Rather, the mutation arose and was subsequently selected on.

      We revised the expression of this de novo mutation and selection process.

      (5) Line 402: can the authors explore what Ne would be necessary to drive the number of mutational origins they observe, as in (Karasov et al. 2010)?

      It is challenging to estimate Ne, especially because mutation rate data for the two-spotted spider mite T. urticae are unavailable. We observed 2.7 resistance mutations per population in samples collected in 2024, seven years after the release of cyetpyrafen. Given 20 generations per year for T. urticae, the estimated population-scaled mutation rate (Θ) is 0.0193. An effective population size (Ne) of 2.29*10<sup>6</sup> would be necessary to reach the number of de novo mutations observed in this study, given Θ = 3Neμ (reflecting the haplodiploid sex determination of T. urticae) and a mutation rate of μ = 2.8*10<sup>-9</sup> per base pair per generation as estimated for Drosophila melanogaster (Keightley et al., 2014). The high reproductive capacity of T. urticae (> 100 eggs per female) and its short generation time make such a population size attainable in the field, as we now note.
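      The arithmetic behind this estimate can be reproduced in a few lines. This sketch assumes that Θ was obtained by dividing the 2.7 mutations per population by the 140 generations elapsed (7 years × 20 generations per year); the response does not spell out this step, so treat it as our reading of the calculation. The function name is hypothetical.

```python
# Back-of-envelope Ne estimate using the numbers given in the response.
MUTATIONS_PER_POPULATION = 2.7
YEARS_SINCE_RELEASE = 7
GENERATIONS_PER_YEAR = 20
MU = 2.8e-9  # per bp per generation, Drosophila estimate (Keightley et al., 2014)

def estimate_ne(mutations, years, gens_per_year, mu):
    # Assumption: theta = observed mutations per population / generations elapsed
    generations = years * gens_per_year
    theta = mutations / generations
    # Haplodiploid scaling: theta = 3 * Ne * mu, so Ne = theta / (3 * mu)
    ne = theta / (3 * mu)
    return theta, ne

theta, ne = estimate_ne(MUTATIONS_PER_POPULATION, YEARS_SINCE_RELEASE,
                        GENERATIONS_PER_YEAR, MU)
print(f"theta = {theta:.4f}, Ne = {ne:.3g}")
```

      This gives Θ ≈ 0.0193 and Ne ≈ 2.3 × 10<sup>6</sup>, which agrees with the reported 2.29 × 10<sup>6</sup> up to rounding.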

      (6) Line 482: how did the authors precisely kill 60% of samples with their selection? What was the applied rate? In general, listing the rates of insecticide used in dose response would be useful to decipher if LD50s are projected outside of the doses used (seems like they are). In this case, authors should limit their estimates to those > the highest rate used in the dose response.

      It is difficult to control mortality precisely. We applied cyetpyrafen every two generations but did not determine the LC<sub>50</sub> at each application. When mortality was lower than 60%, another round of spraying was applied at an increased dosage. The LC<sub>50</sub> values were determined at generations F<sub>1</sub>, F<sub>32</sub>, F<sub>54</sub>, F<sub>60</sub>, F<sub>62</sub>, and F<sub>66</sub> to establish the resistance trajectory.
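      The reviewer's concern about LC<sub>50</sub> values projected beyond the tested doses can be checked mechanically. The following is a minimal sketch, assuming a log-logistic dose-response model (logit of mortality linear in log10 dose) fitted by Newton-Raphson on the binomial likelihood; it flags any LC<sub>50</sub> that falls outside the tested dose range. The doses and mortality counts are hypothetical, and the authors' actual bioassay analysis (for example, probit analysis) may differ.

```python
import math

def fit_lc50(doses, dead, tested, iters=100):
    """Fit logit(mortality) = b0 + b1 * log10(dose) by Newton-Raphson.

    Returns (lc50, within_range); within_range is False when the LC50 lies
    outside the tested doses and is therefore an extrapolation.
    """
    xs = [math.log10(d) for d in doses]
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, k, n in zip(xs, dead, tested):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = k - n * p
            w = n * p * (1.0 - p)
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    if b1 <= 0:
        raise ValueError("mortality does not increase with dose; LC50 is undefined")
    lc50 = 10 ** (-b0 / b1)
    return lc50, min(doses) <= lc50 <= max(doses)

# Hypothetical bioassays with 50 mites per dose (mg/L)
lc50_s, ok_s = fit_lc50([1, 3, 10, 30, 100], [2, 9, 25, 40, 48], [50] * 5)
lc50_r, ok_r = fit_lc50([10, 100, 1000], [1, 3, 10], [50] * 3)
print(f"susceptible: LC50 = {lc50_s:.1f} mg/L, within tested doses: {ok_s}")
print(f"resistant: LC50 = {lc50_r:.0f} mg/L, within tested doses: {ok_r}")
```

      In the hypothetical resistant example the fitted LC<sub>50</sub> lies above the highest tested dose, so under the reviewer's suggestion it would be reported as "> 1,000 mg/L" rather than as a point estimate.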

      (7) The light pink genomic region in Figure 2 was distracting. Why is it included if there is no discussion of genomic regions outside the sdh genes? Generally, there was a lot going on in this figure, and some guiding categories (i.e., lab selected vs wild population) on the figure itself could help orient the reader.

      We included chromosome 2, colored light pink/red, to show the selection signal across a wider genomic region. In the figure legend, we added a description of the lab-selected, field resistant and field susceptible populations. Very little shared selection signal was detected among resistant populations on chromosome 2, indicating that this region is unlikely to be involved in the evolution of cyetpyrafen resistance in T. urticae. We also described this result briefly in the figure legend.

      Reviewer #2 (Recommendations for the authors):

      (1) The most significant aspect of this study is the use of multiple pest population samples taken before as well as after the introduction of a class of pesticides, allowing a thorough comparative population genomics study in a species where a range of resistance mutations have appeared within a few years. I would prefer to see a title conveying this significance, rather than the current title, which focuses on the total number of mutations and claimed notoriety of the (at that point unnamed) study species. Similarly, I would prefer an abstract that relies less on superlative claims and includes more details: the scientific name of the study species; the number of years in which resistance evolved; the number of historical specimens; how the resistance levels for single mutations were shown.

      (1) The title was changed by adding “the two-spotted spider mite Tetranychus urticae” and removing the “unprecedented number” to emphasize that “recurrent mutations drive rapid evolution”, i.e., “Recurrent Mutations Drive the Rapid Evolution of Pesticide Resistance in the Two-spotted Spider Mite Tetranychus urticae.”

      (2) The scientific name of the study species was added.

      (3) The number of years in which resistance evolved was added.

      (4) The number of historical specimens was added (2666).

      (5) Because we used homozygous lines but not isogenic lines or gene-edited lines, our bioassay data could not provide direct evidence on the level of resistance conferred by each mutation. We revised our description of the results and removed this content from the abstract.

      Line 29: if you want to claim the number is unprecedented, please specify the context: unprecedented for a pesticide target in an arthropod pest? (more resistance mutations may have been found in bacteria/fungi...).

      We revised the sentence by adding “in an arthropod pest”.

      Line 30: rather than a claim of notoriety, it may be better to specify what damage this pest causes.

      Revised by describing it as an arthropod pest.

      Line 34: please clarify, was this all in different haplotypes, or were some mutations found in combination?

      Done: We identified 15 target mutations, including six mutations on five amino acid residues of subunit sdhB, and nine mutations on three amino acid residues of subunit sdhD, with as many as five substitutions on one residue.

      (2) The introduction begins by framing the context as resistance evolution in invertebrate pests. However, the evolutionary processes examined in the study are applicable to resistance in other systems, and potentially to other cases of rapid contemporary evolution. The authors could show wider significance for their work beyond the subfield of invertebrate pests by including more of this wider context in their introduction and discussion: even if this means they can no longer claim novelty based on the number of mutations alone, the study is a strong example of the use of population genomics combined with functional and phenotypic characterisation to investigate the evolutionary processes underlying the emergence of resistance, so could have wider importance than within its current framing.

      The background was revised as mentioned above to take this into account.

      For example, in lines 48-50, please clarify what is meant by pesticides here (insects/arthropods? weeds and pathogens too?) In lines 69-73, the opposite is sometimes seen in fungal pathogens, with large numbers of mutations generated in lab-evolved strains.

      We extended the term pesticides to those targeting arthropods, weeds and pathogens, but we still focus mainly on arthropod pests.

      (3) Lines 91-93: how many modes of action? How recently were SDHI acaricides introduced?

      Added: at least 11 groups of acaricides based on their modes of action. SDHI was launched in 2007.

      (4) Line 98-102: Use in China is a useful background for the study populations, but the global context should be included too.

      Yes, we now introduce the four SDHI acaricides that have been developed worldwide.

      (5) Line 113: They show diverse mutations, but all within the mechanism of target-site point mutations.

      We agree with your suggestion. This sentence has been removed as it repeats information stated above it.

      (6) Line 115-116: Yes, agreed; I think this is the main strength of the current study and should be emphasised sooner.

      Thanks.

      (7) Line 158: Selective sweep signals were clear in half of the resistant populations but not in the others. The suggestion that the others had undergone soft sweeps, with multiple mutations increasing in frequency simultaneously but none reaching fixation, seems reasonable; but the authors could compare the populations that did show a sweep with those that did not (for example, was there greater diversity or evenness of genotypes in those that did not?).

      Five resistant populations with selection signals identified by PBE analysis (Figure 2b) showed corresponding decreases in π and Tajima’s D near the two SDH genes but not across the genome (Figure S1).
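      Both statistics mentioned here can be computed from aligned haplotypes. The sketch below uses the standard Tajima (1989) formulation: π is the mean number of pairwise differences, and D compares π with Watterson's estimator S/a<sub>1</sub>, so an excess of rare variants (as expected shortly after a sweep) gives negative D. Genome scans compute these per sliding window; the single-window function and toy haplotypes here are hypothetical.

```python
import math
from itertools import combinations

def diversity_stats(haplotypes):
    """Return (pi, S, D) for equal-length aligned haplotypes from one window."""
    n = len(haplotypes)
    length = len(haplotypes[0])
    # S: number of segregating (polymorphic) sites
    seg = sum(1 for i in range(length) if len({h[i] for h in haplotypes}) > 1)
    # pi: mean number of pairwise differences between haplotypes
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs) / len(pairs)
    if seg == 0:
        return pi, 0, math.nan  # D is undefined without segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (pi - seg / a1) / math.sqrt(e1 * seg + e2 * seg * (seg - 1))
    return pi, seg, d

# Toy window in which every polymorphism is a singleton
pi, s, d = diversity_stats(["AAAA", "AAAT", "AATA", "ATAA"])
print(f"pi = {pi:.2f}, S = {s}, Tajima's D = {d:.2f}")
```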

      (8) Line 313: please clarify "in combination with other mutations" within a mixed population or combined in one individual/haplotype? Also, the phrase "characterised the function" may be a little misleading, as this is a correlative analysis, not functional confirmation.

      None of the combinations of different resistant mutations was observed in a single haplotype. Here, we examine resistance levels associated with a single mutation or two mutations on sdhB and sdhD in one individual, i.e. sdhB_I260V and sdhD_R119C. We revised the sentences to avoid any implication of functional confirmation.

      (9) Line 358: again, please clarify the context: among arthropod pests?

      Done.

      (10) Line 360-363: please give some background on when and where these related compounds were introduced.

      Added.

      (11) Line 410: yes fitness costs may be a factor, but you could also give an example of a cost expressed in the absence of any pesticides, as well as the given example of negative cross-resistance.

      We added the example of the H258Y mutation which causes both fitness costs and negative cross-resistance.

      (12) Lines 419-438: this is one aspect where the situation for insecticides is in contrast with some other resistance areas.

      Yes, we restricted these statements to arthropod pests.

      (13) Line 466: some more detail could be given here: for example, SNP-specific monitoring would be less effective, but amplicon sequencing would be more suitable.

      Yes, revised.

      (14) Lines 472-475: Please list the numbers of field/lab, pre/post exposure, and sensitive/resistant populations within the main text.

      Done. The number of sensitive/resistant populations was reported in the result section.

      (15) Line 483: randomly selected individuals?

      Yes, added randomly selected individuals.

      (16) Line 556: Sanger sequencing to characterise populations? Or a number of individuals from each population?

      Revised.

      (17) References: there are some duplicate entries, please check this.

      Checked.

      (18) Figure 1e: consider a log(10) scale to better show large fold changes and avoid multiple axis breaks.

      Thanks for your suggestions. However, we did not log-scale the LC<sub>50</sub> values, because we wanted to show the specific relevance of 1,000 mg/L. The breaks in the y axis between 30 mg/L and 1,000 mg/L show that the LC<sub>50</sub> values of the resistant populations were all greater than 1,000 mg/L, while those of the susceptible populations were all below 30 mg/L. This justified the use of 1,000 mg/L as a discriminating dose to assess resistance status and level in subsequent work.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Azlan et al. identified a novel maternal factor called Sakura that is required for proper oogenesis in Drosophila. They showed that Sakura is specifically expressed in the female germline cells. Consistent with its expression pattern, Sakura functioned autonomously in germline cells to ensure proper oogenesis. In sakura KO flies, germline cells were lost during early oogenesis and often became tumorous before degenerating by apoptosis. In these tumorous germ cells, piRNA production was defective and many transposons were derepressed. Interestingly, Smad signaling, a critical signaling pathway for the GSC maintenance, was abolished in sakura KO germline stem cells, resulting in ectopic expression of Bam in whole germline cells in the tumorous germline. A recent study reported that Bam acts together with the deubiquitinase Otu to stabilize Cyc A. In the absence of sakura, Cyc A was upregulated in tumorous germline cells in the germarium. Furthermore, the authors showed that Sakura co-immunoprecipitated Otu in ovarian extracts. A series of in vitro assays suggested that the Otu (1-339 aa) and Sakura (1-49 aa) are sufficient for their direct interaction. Finally, the authors demonstrated that the loss of otu phenocopies the loss of sakura, supporting their idea that Sakura plays a role in germ cell maintenance and differentiation through interaction with Otu during oogenesis.

      Strengths:

      To my knowledge, this is the first characterization of the role of the CG14545 gene. Each experiment seems to be well-designed and adequately controlled.

      Weaknesses:

      However, the conclusions from each experiment are somewhat separate, and the functional relationships between Sakura's functions are not well established. In other words, although the loss of Sakura in the germline causes pleiotropic effects, the cause-and-effect relationships between the individual defects remain unclear.

      Comments on latest version:

      The authors have attempted to address my initial concerns with additional experiments and refutations. Unfortunately, my concerns, especially my specific comments 1-3, remain unaddressed. The present manuscript is descriptive and fails to describe the molecular mechanism by which Sakura exerts its function in the germline. Nevertheless, this reviewer acknowledges that the observed defects in sakura mutant ovaries and the possible physiological significance of the Sakura-Otu interaction are worth sharing with the research community, as they may lay the groundwork for future research in functional analysis.

      We thank the reviewer for valuable comments. We would like to investigate the molecular mechanism by which Sakura exerts its function in the germline in future studies.

      Reviewer #2 (Public review):

      In this study, the authors identified CG14545 (named it sakura), as a key gene essential for Drosophila oogenesis. Genetic analyses revealed that Sakura is vital for both oogenesis progression and ultimate female fertility, playing a central role in the renewal and differentiation of germ stem cells (GSC).

      The absence of Sakura disrupts the Dpp/BMP signaling pathway, resulting in abnormal bam gene expression, which impairs GSC differentiation and leads to GSC loss. Additionally, Sakura is critical for maintaining normal levels of piRNAs. Also, the authors convincingly demonstrate that Sakura physically interacts with Otu, identifying the specific domains necessary for this interaction, suggesting a cooperative role in germline regulation. Importantly, the loss of otu produces similar defects to those observed in sakura mutants, highlighting their functional collaboration.

      The authors provide compelling evidence that Sakura is a critical regulator of germ cell fate, maintenance, and differentiation in Drosophila. This regulatory role is mediated through modulation of pMad and Bam expression. However, the phenotypes observed in the germarium appear to stem from reduced pMad levels, which subsequently trigger premature and ectopic expression of Bam. This aberrant Bam expression could lead to increased CycA levels and altered transcriptional regulation, impacting piRNA expression. In this revised manuscript, the authors further investigated whether Sakura affects the function of Orb, a binding partner they identified, in deubiquitinase activity when Orb interacts with Bam.

      We appreciate the authors' efforts to address all our comments. While these revisions have greatly improved the clarity of certain sections, some of the concerns remain unclear, while details mentioned in the responses about these studies should be incorporated in the manuscript. Specifically, the manuscript still lacks the demonstration that Sakura co-localizes with Orb/Bam despite having the means for staining and visualization. This would bring insight into the selective binding of Orb with Bam vs. Sakura perhaps at different stages of oogenesis. Such analyses would allow for more specific conclusions, further alluding to the underlying mechanism, rather than the general observations currently presented.

      This elaborate study will be embraced by both germline-focused scientists and the developmental biology community.

      We thank the reviewer for valuable comments. We believe that the reviewer meant Otu, not Orb, as the binding partner of Sakura that we identified. We would like to investigate the colocalization of Sakura with other proteins, including Otu, and the molecular mechanism by which Sakura exerts its function in the germline in future studies.

      Reviewer #3 (Public review):

      In this very thorough study, the authors characterize the function of a novel Drosophila gene, which they name Sakura. They start with the observation that sakura expression is predicted to be highly enriched in the ovary and they generate an anti-sakura antibody, a line with a GFP-tagged sakura transgene, and a sakura null allele to investigate sakura localization and function directly. They confirm the prediction that it is primarily expressed in the ovary and, specifically, that it is expressed in germ cells, and find that about 2/3 of the mutants lack germ cells completely and the remaining have tumorous ovaries. Further investigation reveals that Sakura is required for piRNA-mediated repression of transposons in germ cells. They also find evidence that sakura is important for germ cell specification during development and germline stem cell maintenance during adulthood. However, despite the role of sakura in maintaining germline stem cells, they find that sakura mutant germ cells also fail to differentiate properly such that mutant germline stem cell clones have an increased number of "GSC-like" cells. They attribute this phenotype to a failure in the repression of Bam by dpp signaling. Lastly, they demonstrate that sakura physically interacts with otu and that sakura and otu mutants have similar germ cell phenotypes. Overall, this study helps to advance the field by providing a characterization of a novel gene that is required for oogenesis. The data are generally high-quality and the new lines and reagents they generated will be useful for the field.

      Comments on latest version:

      With these revisions, the authors have addressed my main concerns.

      We thank the reviewer for valuable comments.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The manuscript is much improved based on the changes made upon recommendations from the reviewers.

      Though most of our comments have been addressed, we have a few more we wish to recommend. For previous points we made, we replied with further clarification for the authors.

      Figure 1

      (1) B should be the supplemental figure.

      We moved the former Fig 1B to Supplemental Figure 1.

      • Previous Fig1B (sakura mRNA expression level) is now Fig S2, not S1. Please make this data as Fig S1.

      We moved Fig S1 to main Fig7A and renumbered Fig S2-S16 to Fig S1-S15.

      (2) C - How were the different egg chamber stages selected in the WB? Naming them 'oocytes' is deceiving. Recommend labeling them as 'egg chambers', since an oocyte is claimed to be just the one-cell of that cyst.

      We changed the labeling to egg chambers.

      • The labels on lanes for Stages 12-13 and Stage 14, still only say "chambers", not "egg chambers". Also there is no Stage 1-3 egg chamber. More accurately, the label should be "Germarium - Stage 11 egg chambers".

      We updated the labels on the lanes as suggested by the reviewer.

      (3) Is the antibody not detecting Sakura in IF? There is no mention of this anywhere in the manuscript.

      While our Sakura antibody detects Sakura in IF, it also seems to detect some other proteins. Since we have a Sakura-EGFP fly strain (which fully rescues sakura[null] phenotypes) to examine Sakura expression and localization without such non-specific signals, we relied on Sakura-EGFP rather than anti-Sakura antibodies for IF.

      • Please put this info into the Methods section.

      We added this info into the Methods section.

      (4) Expand on the reliance of the sakura-EGFP fly line. Does this overexpression cause any phenotypes?

      sakura-EGFP does not cause any phenotypes in the background of sakura[+/+] and sakura[+/-].

      • Please add this detail into the manuscript.

      We added this info into the Methods section.

      Figure 5

      (1) D - It might make more sense if this graph showed % instead of the numbers.

      We did not understand the reviewer's point. We think using numbers, not %, makes more sense.

      • Having a different 'n' number for each experiment does not allow one to compare anything except numbers of the egg chambers. This must be normalized.

      We still don’t agree with the reviewer. In Fig 5D, we are showing the numbers of stage 14 oocytes per fly (= per a pair of ovaries). ‘n’ is the number of flies (= number of a pair of ovaries) examined. We now clarified this in the figure legend. Different ‘n’ number does not prevent us from comparing the numbers of stage 14 oocytes per fly. Therefore, we would like to show as it is now.

      (2) Line 213 - explain why RNAi 2 was chosen when RNAi 1 looks stronger.

      The fly stock of RNAi line 2 is much healthier than that of RNAi line 1 (without a Gal4 driver) for unknown reasons. We were concerned that RNAi line 1 might carry an unwanted genetic background, so we chose RNAi line 2 to avoid this issue.

      • Please add this information to the manuscript.

      We added this info into the Methods section.

      Figure 7/8 - can go to Supplemental.

      We moved Fig 8 to supplemental. However, we think Fig 7 data is important and therefore we would like to present them as a main figure.

      • Current Fig S1 should go to Fig 7, to better understand the relationship between pMad and Bam expression.

      We moved Fig S1 to main Fig7A and renumbered Fig S2-S16 to Fig S1-S15.

      Figure 9C - Why the switch to S2 cells? Not able to use the Otu antibody in the IP of ovaries?

      We can use the Otu antibody for IP from ovaries. However, in anti-Sakura Western blots after anti-Otu IP, the light-chain bands of the Otu antibody overlap with the Sakura band. We therefore switched to S2 cells and used an epitope tag to avoid this issue.

      • Please add this info to the Methods section.

      We added this info into the Methods section.

      Figure 10- Some images would be nice here to show that the truncations no longer colocalize.

      We did not understand the reviewer's point. In our study, we have not shown any colocalization of Sakura and Otu in S2 cells or in ovaries, even for the full-length proteins, except that both are enriched in developing oocytes in egg chambers.

      • Based on your binding studies, we would expect them to colocalize in the egg chamber, and since there are antibodies and a GFP-line available, it would be important to demonstrate that via visualization.

      As we wrote in the response and now state in the manuscript, our antibodies are not optimal for immunostaining. We will try to optimize the experimental conditions in future studies.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment 

      The authors utilize a valuable computational approach to exploring the mechanisms of memory-dependent klinotaxis, with a hypothesis that is both plausible and testable. Although they provide a solid hypothesis of circuit function based on an established model, the model's lack of integration of newer experimental findings, its reliance on predefined synaptic states, and oversimplified sensory dynamics, make the investigation incomplete for both memory and internal-state modulation of taxis.

      We would like to express our gratitude to the editor for the assessment of our work. However, we respectfully disagree with the assessment that our investigation is incomplete, if the negative assessment is primarily due to the impact of AIY interneuron ablation on the chemotaxis index (CI) reported in Reference [1]. It is crucial to acknowledge that the experimentally determined CI incorporates contributions from both klinokinesis and klinotaxis [1]. It is plausible that the impact of AIY ablation was not adequately reflected in the CI value. Consequently, the experimental observation does not necessarily diminish the role of AIY in klinotaxis. Anatomical evidence provided by the database (http://ims.dse.ibaraki.ac.jp/ccep-tool/) substantiates that ASE sensory neurons and AIZ interneurons, which have been demonstrated to play a crucial role in klinotaxis [Matsumoto et al., PNAS 121 (5) e2310735121], have a particularly high number of synaptic connections with AIY interneurons. These findings provide substantial evidence supporting the validity of the presented minimal neural network responsible for salt klinotaxis.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This research focuses on C. elegans klinotaxis, a chemotactic behavior characterized by gradual turning, aiming to uncover the neural circuit mechanism responsible for the context-dependent reversal of salt concentration preference. The phenomenon observed is that the preferred salt concentration depends on the difference between the pre-assay cultivation conditions and the current environmental salt levels. 

      We would like to express our gratitude for the time and consideration you have dedicated to reviewing our manuscript.

      The authors propose that a synaptic-reversal plasticity mechanism at the primary sensory neuron, ASER, is critical for this memory- and context-dependent switching of preference. They build on prior findings regarding synaptic reversal between ASER and AIB, as well as the receptor composition of AIY neurons, to hypothesize that similar "plasticity" between ASER and AIY underpins salt preference behavior in klinotaxis. This plasticity differs conceptually from the classical one as it does not rely on any structural changes but rather synaptic transmission is modulated by the basal level of glutamate, and can switch from inhibitory to excitatory. 
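      The idea that basal glutamate alone can flip the sign of a synapse can be illustrated with a two-receptor toy calculation. Purely for illustration, assume that AIY carries a high-affinity inhibitory glutamate receptor and a low-affinity excitatory one, each with hyperbolic occupancy G/(G + K); the receptor identities, affinities and conductances below are hypothetical, not measured values. The sign of the slope of the net current with respect to glutamate then depends on the basal level G: the inhibitory receptor saturates first, after which the excitatory one dominates.

```python
def net_synaptic_gain(basal_glu, k_inh=0.1, k_exc=10.0, g_inh=1.0, g_exc=1.0):
    """Slope d(I_exc - I_inh)/dG at a given basal glutamate level G.

    Each receptor current is g * G / (G + K), so its slope is g * K / (G + K)**2.
    A negative result means a small glutamate increase hyperpolarizes AIY
    (inhibitory synapse); a positive result means it depolarizes AIY.
    """
    d_exc = g_exc * k_exc / (basal_glu + k_exc) ** 2
    d_inh = g_inh * k_inh / (basal_glu + k_inh) ** 2
    return d_exc - d_inh

for basal in [0.01, 0.1, 10.0, 100.0]:
    gain = net_synaptic_gain(basal)
    kind = "inhibitory" if gain < 0 else "excitatory"
    print(f"basal glutamate {basal:>6}: net gain {gain:+.4f} ({kind})")
```

      With equal conductances, the crossover occurs at G = sqrt(k_inh * k_exc), which is 1.0 for the hypothetical values used here.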

      To test this hypothesis, the study employs a previously established neuroanatomically grounded model [4] and demonstrates that reversing the ASER-AIY synapse sign in the model agent reproduces the observed reversal in salt preference. The model is parameterized using a computational search technique (evolutionary algorithm) to optimize unknown electrophysiological parameters for chemotaxis performance. Experimental validity is ensured by incorporating constraints derived from published findings, confirming the plausibility of the proposed mechanism. 
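      As a generic illustration of evolutionary parameter search, the sketch below shows a truncation-selection loop with Gaussian mutation. It is not the algorithm or settings used for the published model. The population size, mutation width and toy fitness function (closeness to an arbitrary target vector, standing in for chemotaxis performance) are assumptions made only for this example.

```python
import random

def evolve(fitness, n_params, pop_size=30, n_elite=5, generations=200,
           sigma=0.1, seed=0):
    """Maximize `fitness` with elitist truncation selection.

    Each generation keeps the n_elite best parameter vectors unchanged and
    refills the population with Gaussian-mutated copies of random elites.
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(n_params)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elites = pop[:n_elite]
        children = [
            [gene + rng.gauss(0.0, sigma) for gene in rng.choice(elites)]
            for _ in range(pop_size - n_elite)
        ]
        pop = elites + children
    return max(pop, key=fitness)

# Toy stand-in for "chemotaxis performance": closeness to an arbitrary target
TARGET = [0.5, -0.3, 0.8]

def toy_fitness(params):
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))

best = evolve(toy_fitness, n_params=len(TARGET))
print([round(p, 2) for p in best], round(toy_fitness(best), 4))
```

      Keeping the elites unchanged guarantees that the best fitness never decreases between generations, which is a common design choice when fitness evaluations are deterministic.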

      Finally, the circuit mechanism allowing C. elegans to switch behaviour to an exploration run when starved is also investigated. This extension highlights how internal states, such as hunger, can dynamically reshape sensory-motor programs to drive context-appropriate behaviors.

      We would like to thank the reviewer for the appropriate summary of our work. 

      Strengths and weaknesses: 

      The authors' approach of integrating prior knowledge of receptor composition and synaptic reversal with the repurposing of a published neuroanatomical model [4] is a significant strength. This methodology not only ensures biological plausibility but also leverages a solid, reproducible modeling foundation to explore and test novel hypotheses effectively.

      The evidence produced that the original model has been successfully reproduced is convincing.

      The writing of the manuscript needs revision as it makes comprehension difficult.  

      We would like to thank the reviewer for recognizing the usefulness of our approach. In the revised version, we improved the explanation according to your suggestions.  

      One major weakness is that the model does not incorporate key findings that have emerged since the original model's publication in 2013, limiting the support for the proposed mechanism. In particular, ablation studies indicate that AIY is not critical for chemotaxis, and other interneurons may play partially overlapping roles in positive versus negative chemotaxis. These findings challenge the centrality of AIY and suggest the model oversimplifies the circuit involved in klinotaxis.

      We would like to express our gratitude for the constructive feedback we have received. We agree with some of your points. Our model is the minimal network for salt klinotaxis: it includes only the interneurons that are connected to each other by the highest number of synaptic connections. It does not include redundant interneurons with overlapping roles, and therefore it cannot be used to study the effects of interneuron ablation. Reference [1] investigated the influence of interneuron ablations on the chemotaxis index (CI). The experimentally determined CI incorporates the contributions of both klinokinesis and klinotaxis, so it is plausible that the effect of AIY ablation was not substantially reflected in the CI value. This experimental observation therefore does not necessarily diminish the role of AIY in klinotaxis. 

      Reference [1] also shows that ASER neurons exhibit complex, memory- and context-dependent responses, which are not accounted for in the model and may have a significant impact on chemotactic model behaviour. 

      As the reviewer has noted, our model does not incorporate the context-dependent response of the ASER. Instead, the present study examined in detail the impact of salt-concentration-dependent glutamate release from the ASER [S. Hiroki et al. Nat Commun 13, 2928 (2022)], which results from the ASER responses.

      The hypothesis of synaptic reversal between ASER and AIY is not explicitly modeled in terms of receptor-specific dynamics or glutamate basal levels. Instead, the ASER-to-AIY connection is predefined as inhibitory or excitatory in separate models. This approach limits the model's ability to test the full range of mechanisms hypothesized to drive behavioral switching.  

      We would like to express our gratitude to the reviewer for their constructive feedback. As you correctly noted, the hypothesized synaptic reversal between ASER and AIY is not explicitly modeled in terms of the sensitivity of the AIY receptors and the basal level of glutamate released by the ASER. Instead, considering the substantial difference in sensitivity between the two glutamate receptors on the AIY, we sought to elucidate the impact of salt-concentration-dependent basal glutamate levels on klinotaxis. To this end, we examined the full range of gradual change in the ASER-to-AIY connection from inhibitory to excitatory, as illustrated in Figures S4 and S5.

      While the main results - such as response dependence on step inputs at different phases of the oscillator - are consistent with those observed in chemotaxis models with explicit neural dynamics (e.g., Reference [2]), the lack of richer neural dynamics could overlook critical effects. For example, the authors highlight the influence of gap junctions on turning sensitivity but do not sufficiently analyze the underlying mechanisms driving these effects. The role of gap junctions in the model may be oversimplified because, as in the original model [4], the oscillator dynamics are not intrinsically generated by an oscillator circuit but are instead externally imposed via $z_\text{osc}$. This simplification should be carefully considered when interpreting the contributions of specific connections to network dynamics. Lastly, the complex and context-dependent responses of ASER [1] might interact with circuit dynamics in ways that are not captured by the current simplified implementation. These simplifications could limit the model's ability to account for the interplay between sensory encoding and motor responses in C. elegans chemotaxis. 

      We may not have fully understood the substance of your point. We acknowledge that, in our model, the oscillator dynamics are not intrinsically generated by an explicitly modeled oscillator neural circuit. However, the present study focuses on how the sensory input and the resulting interneuron dynamics regulate the oscillatory behavior of SMB motor neurons to generate klinotaxis. The neuron dynamics via gap junctions result from the equilibration of the membrane potentials y<sub>i</sub> of the two neurons connected by gap junctions rather than from z<sub>i</sub>. We added this explanation to the revised manuscript as follows:

      “The hyperpolarization signals in the AIZL are transmitted to the AIZR via the gap junction (Figs. S1d and S1f and Fig. 3d). This is because the neuron dynamics via gap junctions result from the equilibration of the membrane potentials y<sub>i</sub> of the two neurons connected by gap junctions rather than from z<sub>i</sub>.”
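      A minimal sketch of why gap-junction coupling acts on membrane potentials. It assumes the continuous-time recurrent neuron form of the Izquierdo and Beer model [4], in which a gap junction of conductance g adds g(y_j - y_i) to neuron i. Chemical synapses are omitted, and the inputs, time constant and conductances are arbitrary illustrative values, not the study's parameters. The sigmoidal outputs are computed from y only after integration; they never enter the coupling term.

```python
import math


def sigma(x):
    """Logistic output function applied to a membrane potential."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_gap_pair(g, inputs=(1.0, 0.0), tau=0.1, dt=0.001, t_end=1.0, theta=0.0):
    """Euler-integrate two leaky neurons coupled ONLY by a gap junction.

    Assumed form: tau * dy_i/dt = -y_i + g * (y_j - y_i) + I_i.
    Returns the final membrane potentials y and the sigmoidal outputs.
    """
    y = [0.0, 0.0]
    for _ in range(int(round(t_end / dt))):
        # The gap-junction current is driven by the potential difference, not by outputs.
        gap = (g * (y[1] - y[0]), g * (y[0] - y[1]))
        dy = [(-y[i] + gap[i] + inputs[i]) / tau for i in range(2)]
        y = [y[i] + dt * dy[i] for i in range(2)]
    outputs = [sigma(yi + theta) for yi in y]
    return y, outputs
```

      At steady state the potentials differ by (I_1 - I_2)/(1 + 2g): with no coupling (g = 0) only the stimulated neuron depolarizes, while a strong gap junction pulls the unstimulated partner toward the same potential.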

      In the discussion of limitations, we added the following sentences:

      “In the present study, the oscillator components of the SMB are not intrinsically generated by an oscillator circuit but are instead externally imposed via 𝑧<sub>i</sub><sup>OSC</sup>. Furthermore, the complex and context-dependent responses of ASER {Luo:2014et} were not taken into consideration. It should be acknowledged as a limitation of this study that these omitted factors may interact with circuit dynamics in ways that are not captured by the current simplified implementation.”

      Appraisal: 

      The authors show that their model can reproduce memory-dependent reversal of preference in klinotaxis, demonstrating that the ASER-to-AIY synapse plays a key role in switching chemotactic preferences. By switching the ASER-AIY connection from excitatory to inhibitory they indeed show that salt preference reverses. They also show that the curving/turn rate underlying the preference change is gradual and depends on the weight between ASER-AIY. They further support their claim by showing that curving rates also depend on the cultivation concentration (set point).  

      We would like to thank the reviewer for assessing our work.

      Thus within the constraints of the hypothesis and the framework, the model operates as expected and aligns with some experimental findings. However, significant omissions of key experimental evidence raise questions on whether the proposed neural mechanisms are sufficient for reversal in salt-preference chemotaxis.  

      We agree with your opinion. The present hypothesis should be verified by experiments.

      Previous work [1] has shown that individually ablating the AIZ or AIY interneurons has essentially no effect on the Chemotactic Index (CI) toward the set point ([1] Figure 6). Furthermore, in [1] the authors report that different postsynaptic neurons are required for movement above or below the set point. The manuscript should address how this evidence fits with their model by attempting similar ablations. It is possible that the CI is rescued by klinokinesis but this needs to be tested on an extension of this model to provide a more compelling argument.  

      We would like to express our gratitude for the constructive feedback we have received. Reference [1] investigated the influence of interneuron ablations on the chemotaxis index (CI). It is important to note that the experimentally determined CI encompasses the contributions of both klinokinesis and klinotaxis, so it is plausible that the impact of AIY ablation was not reflected in the CI value. Consequently, these experimental observations do not necessarily diminish the role of AIY in klinotaxis. The neural circuit model employed in the present study is a minimal network for salt klinotaxis, comprising only the interneurons that are connected to each other by the highest number of synaptic connections. Anatomical evidence from the database (http://ims.dse.ibaraki.ac.jp/cceptool/) shows that ASE sensory neurons and AIZ interneurons, which have been demonstrated to play a crucial role in klinotaxis [Matsumoto et al., PNAS 121 (5) e2310735121], have a much higher number of synaptic connections with AIY interneurons. Because our model does not include redundant interneurons with overlapping roles, it cannot be applied to study the effects of interneuron ablation.

      The investigation of dispersal behaviour in starved individuals is rather limited to testing by imposing inhibition of the SMB neurons. Although a circuit is proposed for how hunger states modulate taxis in the absence of food, this circuit hypothesis is not explicitly modelled to test the theory or provide novel insights.  

      As the reviewer noted, the experimentally identified neural circuit that inhibits the SMB motor neurons in starved individuals is not incorporated in our model. Instead of incorporating this circuit explicitly, we examined whether our minimal network model could reproduce dispersal behavior under starvation conditions solely through the experimentally demonstrated inhibitory effect on SMB motor neurons.

      Impact: 

      This research underscores the value of an embodied approach to understanding chemotaxis, addressing an important memory mechanism that enables adaptive behavior in the sensorimotor circuits supporting C. elegans chemotaxis. The principle of operation - the dependence of motor responses to sensory inputs on the phase of oscillation - appears to be a convergent solution to taxis. Similar mechanisms have been proposed in Drosophila larvae chemotaxis [2], zebrafish phototaxis [3], and other systems. Consequently, the proposed mechanism has broader implications for understanding how adaptive behaviors are embedded within sensorimotor systems and how experience shapes these circuits across species.

      We would like to express our gratitude for this useful suggestion. We added this argument to the Discussion of the revised manuscript as follows:

      “The principle of operation, in which motor responses to sensory inputs depend on the phase of the motor oscillation, appears to be a convergent solution for taxis and navigation across species. In fact, analogous mechanisms have been postulated for chemotaxis in Drosophila larvae {Wystrach:2016bt} and phototaxis in zebrafish {Wolf:2017ei}. Consequently, the synaptic reversal mechanism highlighted in this study offers a framework for understanding how behaviors adapted to the environment are embedded within sensorimotor systems and how experience shapes these neural circuits across species.”
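      To make the phase-dependence principle concrete, here is a deliberately minimal toy; it is not the model of this study, nor those of Wystrach et al. or Wolf et al. Heading is assumed to oscillate as θ(t) = A sin(ωt), and a sensory event hypothetically boosts the turning rate by a factor (1 + gain) for one half cycle starting at oscillation phase φ0. The extra heading change is then -2·gain·A·sin(φ0), so the same stimulus produces opposite net turns depending on the phase at which it arrives.

```python
import math


def net_turn(phase0, gain=0.5, amp=1.0, omega=2.0 * math.pi, dt=1e-4):
    """Extra heading change from a half-cycle turning-gain boost starting at phase0.

    Toy assumption: baseline heading rate is amp * omega * cos(omega * t), and the
    stimulus adds gain times that rate during the half cycle that follows phase0.
    """
    t0 = phase0 / omega
    half_period = math.pi / omega
    steps = int(round(half_period / dt))
    extra = 0.0
    for n in range(steps):
        t = t0 + (n + 0.5) * dt  # midpoint rule
        extra += gain * amp * omega * math.cos(omega * t) * dt
    return extra
```

      With the defaults, a stimulus at phase π/2 yields a net turn of about -1 rad and one at 3π/2 about +1 rad, while a stimulus at phase 0 averages out to roughly zero.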

      Although the reported reversal of synaptic connection from excitatory to inhibitory is an exciting phenomenon of broad interest, it is not entirely new, as the authors acknowledge similar reversals have been reported in ASER-to-AIB signaling for klinokinesis (Hiroki et al., 2022). The proposed reversal of the ASER-to-AIY synaptic connection from inhibitory to excitatory is a novel contribution in the specific context of klinotaxis. While the ASER's role in gradient sensing and memory encoding has been previously identified, the current paper mechanistically models these processes, introducing a hypothesis for synaptic plasticity as the basis for bidirectional salt preference in klinotaxis.  

      The research also highlights how internal states, such as hunger, can dynamically reshape sensory-motor programs to drive context-appropriate behaviors.  

      The methodology of parameter search on a neural model of a connectome used here yielded the valuable insight that connectome information alone does not provide enough constraints to reproduce the neural circuits for behaviour. It demonstrates that additional neurophysiological constraints are required.  

      We would like to thank the reviewer for recognizing our work.

      Additional Context 

      Oscillators with stimulus-driven perturbations appear to be a convergent solution for taxis and navigation across species. Similar mechanisms have been studied in zebrafish phototaxis [3], Drosophila larvae chemotaxis [2], and have even been proposed to underlie search runs in ants. The modulation of taxis by context and memory is a ubiquitous requirement, with parallels across species. For example, Drosophila larvae modulate taxis based on current food availability and predicted rewards associated with odors, though the underlying mechanism remains elusive. The synaptic reversal mechanism highlighted in this study offers a compelling framework for understanding how taxis circuits integrate context-related memory retrieval more broadly.  

      We would like to express our gratitude for the insightful commentary. In the revised manuscript, we added to the Discussion the point that a similar oscillator mechanism with stimulus-driven perturbations has been observed in zebrafish phototaxis [3] and Drosophila larvae chemotaxis [2].

      As a side note, an interesting difference emerges when comparing C. elegans and Drosophila larvae chemotaxis. In Drosophila larvae, oscillatory mechanisms are hypothesized to underlie all chemotactic reorientations, ranging from large turns to smaller directional biases (weathervaning). By contrast, in C. elegans, weathervaning and pirouettes are treated as distinct strategies, often attributed to separate neural mechanisms. This raises the possibility that their motor execution could share a common oscillator-based framework. Re-examining their overlap might reveal deeper insights into the neural principles underlying these maneuvers. 

      We would like to acknowledge your thoughtfully articulated comment. As the reviewer pointed out, the anatomical database (http://ims.dse.ibaraki.ac.jp/ccep-tool/) shows that the neural circuits underlying weathervaning and pirouettes in C. elegans are predominantly distinct but partially overlapping. When we restrict our search to the neurons that are connected to each other by the highest number of synaptic connections, we identify projections from the weathervaning circuit to the pirouette circuit, but no projections in the reverse direction. This finding suggests that the weathervaning circuit, namely our minimal neural network, is unlikely to be affected by the pirouette circuit, which consists of AIB interneurons and the interneurons and motor neurons downstream of them. 

      (1) Luo, L., Wen, Q., Ren, J., Hendricks, M., Gershow, M., Qin, Y., Greenwood, J., Soucy, E.R., Klein, M., Smith-Parker, H.K., & Calvo, A.C. (2014). Dynamic encoding of perception, memory, and movement in a C. elegans chemotaxis circuit. Neuron, 82(5), 1115-1128. 

      (2) Antoine Wystrach, Konstantinos Lagogiannis, Barbara Webb (2016) Continuous lateral oscillations as a core mechanism for taxis in Drosophila larvae eLife 5:e15504. 

      (3) Wolf, S., Dubreuil, A.M., Bertoni, T. et al. Sensorimotor computation underlying phototaxis in zebrafish. Nat Commun 8, 651 (2017). 

      (4) Izquierdo, E.J. and Beer, R.D., 2013. Connecting a connectome to behavior: an ensemble of neuroanatomical models of C. elegans klinotaxis. PLoS computational biology, 9(2), p.e1002890. 

      Reviewer #2 (Public review): 

      Summary: 

      This study explores how a simple sensorimotor circuit in the nematode C. elegans enables it to navigate salt gradients based on past experiences. Using computational simulations and previously described neural connections, the study demonstrates how a single neuron, ASER, can change its signaling behavior in response to different salt conditions, with which the worm is able to "remember" prior environments and adjust its navigation toward "preferred" salinity accordingly.  

      We would like to express our gratitude for the time and consideration the reviewer has dedicated to reviewing our manuscript.

      Strengths: 

      The key novelty and strength of this paper is the explicit demonstration of computational neurobehavioral modeling and evolutionary algorithms to elucidate the synaptic plasticity in a minimal neural circuit that is sufficient to replicate memory-based chemotaxis. In particular, with changes in ASER's glutamate release and sensitivity of downstream neurons, the ASER neuron adjusts its output to be either excitatory or inhibitory depending on ambient salt concentration, enabling the worm to navigate toward or away from salt gradients based on prior exposure to salt concentration.

      We would like to thank the reviewer for appreciating our research. 

      Weaknesses: 

      While the model successfully replicates some behaviors observed in previous experiments, many key assumptions lack direct biological validation. As to the model output readouts, the model considers only endpoint behaviors (chemotaxis index) rather than the full dynamics of navigation, which limits its predictive power. Moreover, some results presented in the paper lack interpretation, and many descriptions in the main text are overly technical and require clearer definitions.  

      We would like to thank the reviewer for the constructive feedback. As the reviewer noted, the fundamental assumptions of the study have yet to be validated biologically and must be directly tested by experiments. The model performance for salt klinotaxis was evaluated using multiple measures, including not only the chemotaxis index but also the curving rate vs. bearing (Fig. 4a; the bearing is defined in Fig. A3) and the curving rate vs. normal gradient (Fig. 4c). These two relationships characterize the trajectory during salt klinotaxis. In the revised version, we carefully revised the manuscript according to the reviewer’s suggestions. We would like to express our sincere gratitude for your insightful review of our work.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors): 

      An interesting and engaging methodology combining theoretical and computational approaches. Overall I found the manuscript up to discussion a difficult read, and I would suggest revising it. I would also recommend introducing the general operating principle of the oscillator with sensory perturbations before jumping into the implementation details of signal propagation specific to C. elegans.  

      In order to elucidate the relation between the general operating principle of the oscillator with sensory perturbations and the results shown by the two graphs from the bottom in Fig. 3d, the following statement was added on page 12.

      “It is remarkable that this regulatory mechanism derived via the optimization of the CI has been observed in chemotaxis in Drosophila larvae {Wystrach:2016bt} and phototaxis in zebrafish {Wolf:2017ei}. The principle of operation, in which motor responses to sensory inputs depend on the phase of the motor oscillation, may therefore serve as a convergent solution for taxis and navigation across species.”

      The abstract could benefit from a clarification of terms to benefit a broader audience:  The term "salt klinotaxis" is used without prior introduction or definition. It would be beneficial to briefly explain this term, as it may not be familiar to all readers. 

      Because of the word limit for the abstract, an explanation of salt klinotaxis could not be included.

      Although ASER is introduced as a right-side head sensory neuron, AIY neurons are not similarly introduced. It may also benefit to introduce here that ASER integrates memory with current salt gradients, tuning its output to produce context-appropriate behaviour.  

      Because of the word limit for the abstract, we could not add further explanations. 

      "it can be anticipated that the ASER-AIY synaptic transmission will undergo a reversal due to alterations in the basal glutamate Release": Where is this expectation drawn from? Is it derived from biophysical or is it a functional expectation to explain the network's output constraints?  

      As delineated before this sentence, it is derived from a comprehensive consideration of the sensitivity of excitatory/inhibitory glutamate receptors expressed on the postsynaptic AIY interneurons, in conjunction with varying the basal level of glutamate transmission from ASER.

      The statement that the model "revealed the modular neural circuit function downstream of ASE" could be more explicit. What specific insights about the downstream circuit were uncovered?

      Highlighting one or two key findings would strengthen the impact.  

      Because of the word limit for the abstract, no further details could be added here; however, the sentence was revised to “revealed that the circuit downstream of ASE functions as a module that is responsible for salt klinotaxis.” This is because the salt-concentration-dependent behaviors in klinotaxis can be reproduced by modulating the ASER-AIY synaptic connections alone, without any alteration of the neural circuit downstream of AIY.

      I believe the authors should cite Luo et al. 2014, which also studies how chemotactic behaviours arise from neural circuit dynamics, including the dynamic encoding of salt concentration by ASER, and the crucial downstream interaction with AIY for chemotactic actions. 

      We would like to express our gratitude for this useful suggestion. We cited Luo et al. 2014 in the discussion of the limitations of our work. 

      The introduction could also be improved for clarity. Specifically in the last paragraph authors should clarify how the observed synchrony of ASER excitation to the AIZ (Matsumoto et al., 2024), validates the resulting network.  

      We would like to express our gratitude for this useful suggestion. We added the following explanation to the last paragraph of the introduction.

      “Specifically, the synchrony of the excitation of the ASER and AIZ {Matsumoto:2024ig} taken together with the experimentally identified inhibitory synaptic transmission between the AIY and AIZ revealed that the ASER-AIY synaptic connections should be inhibitory, which was consistent with the network obtained from the most evolved model.”

      In addition, we added the following explanation after “It was then hypothesized that the ASER-AIY inhibitory synaptic connections are altered to become excitatory due to a decrease in the baseline release of glutamate from the ASER when individuals are cultured under C<sub>cult</sub> < C<sub>test</sub>.”:

      “This is due to the substantial difference in the sensitivity of excitatory/inhibitory glutamate receptors expressed on the postsynaptic AIY interneurons.”

      I would also strongly recommend replacing the term "evolved model", with "Optimized Model" or "Best-Performing Model" to clarify this is a computational optimization process with limitations - optimization through GAs does not guarantee finding global optima.  

      We replaced "evolved model" with "optimized model" in the main and SI text.

      The text overall would benefit from editing for clarity and expression.  

      In line with the revisions mentioned above, we replaced “best optimized model” with “most optimized model” in the main and SI text.

      The font size on the plot axis in Figures 3 c&d should be increased for readability on the printed page. Label the left/right panel to indicate unconstrained / constrained evolution.  

      As you noted, the font size of the subscript on the vertical axis in Figs. 3c and 3d was too small. We have increased the font size of the subscript in Figs. 3c and 3d and also in Fig. 5e. At your suggestion, “unconstrained” and “constrained” have been added as labels to the left and right panels in Fig. 3.

      There is no input/transmission to AIYR to step input in either model shown in Figure 3? 

      As shown in Figs. S1e and S1f, AIYR does receive transmission from both ASEL and ASER. 

      Supplementary Figure 1 attempts to explain the interactions. There are inconsistent symbols used for inhibition and excitation between network schema (colours) and the z response plots (arrows vs circles), combined with different meanings for red/blue making it very confusing. 

      We could not resolve the inconsistency in the colors of the arrows and circle-ended lines between Figs. S1a/S1b and Figs. S1c/S1d. However, Figs. S1e and S1f were revised so that consistent symbols are used for inhibitory, excitatory, and electrical (gap junction) connections throughout Figs. S1c-S1f. The same revisions were made to Figs. S7c-S7f.

      Model parameters are given to 15 decimal precision, which seems excessive. Is model performance sensitive to that order? We would expect robustness around those values. The authors should identify relevant orders and truncate parameters accordingly. 

      We examined the influence of the parameter truncation on the trajectory and decided that the parameters with four decimal places were appropriate. According to this, we revised Table A4.
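      One way to run the truncation check described above is sketched below: simulate once with full-precision parameters and once with rounded ones, using identical initial conditions, and report the largest point-wise distance between the two trajectories. The `simulate` callable is a hypothetical stand-in for the model worm (it must return a list of (x, y) positions); the acceptance threshold is left to the user.

```python
import math


def round_params(params, decimals=4):
    """Return a copy of a name -> value parameter dict rounded to `decimals` places."""
    return {name: round(value, decimals) for name, value in params.items()}


def max_deviation(traj_a, traj_b):
    """Largest Euclidean distance between corresponding (x, y) points of two trajectories."""
    return max(math.hypot(xa - xb, ya - yb) for (xa, ya), (xb, yb) in zip(traj_a, traj_b))


def truncation_effect(simulate, params, decimals=4):
    """Compare trajectories from full-precision and rounded parameters.

    `simulate(params)` is hypothetical: it should be deterministic (same initial
    position, heading and noise seed) so that any difference comes from rounding.
    """
    full = simulate(params)
    rounded = simulate(round_params(params, decimals))
    return max_deviation(full, rounded)
```

      With a toy linear `simulate`, for example `lambda p: [(p["a"] * t, p["b"] * t) for t in range(10)]`, keeping four decimals gives a deviation on the order of 1e-4, and fewer decimals give a larger one.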

      Figure 3 caption typo "step changes I the salt concentration".  

      The typo was revised in Fig. 3 caption. 

      Reviewer #2 (Recommendations for the authors): 

      (1) Overall, the language of the paper is not properly organized, making the paper's logic and purpose hard to follow. In the Results Section, many observations or findings lack explicit interpretation. To address this issue, the authors should consider (1) adopting the context-content-conclusion scheme, (2) optimizing the logic flow by clearly identifying the context and goals prior to discussing their results and findings, (3) more explicitly interpreting their results, especially in a biological context.  

      We would like to express our gratitude for this helpful suggestion. Following your suggestions listed below, we revised the main and SI texts.

      (2) In Figure 2, trajectories from the model with AIY-AIZ constraints show a faster convergence than those from the constraint-free model. However, in the corresponding texts in the Results section, the authors claimed no significant difference. It seems that the authors made this argument only based on CI (Chemotaxis Index). Therefore, in order to address such inconsistency, the authors need more explanation on why only relying on CI, which is an endpoint metric, instead of the whole navigation.  

      I would like to thank you for the helpful comment. In the present study, both the CI and the curving rate shown in Fig. 4 were used to characterize klinotaxis behavior.

      According to your comments, we revised the related description in the main text as follows:

      “The difference between these CI values is slight, while the model optimized with the constraints exhibits a marginally accelerated attainment of the salt concentration peak, as shown by the trajectories. The slightly higher chemotaxis performance observed in the constrained model is not fundamentally attributable to the introduction of the AIY-AIZ synaptic constraints but rather depends on the specific individuals selected from the optimized individuals obtained from the evolutionary algorithm. In fact, even when the AIY-AIZ constraints are taken into consideration, the model retains a significant degree of freedom to reproduce salt klinotaxis due to the presence of a substantial parameter space. Consequently, the impact of the AIY-AIZ constraints on the optimization of the CI is expected to be negligible.”

      (3) In Figures 3a and b, some inter-neuron connections are relatively weak (e.g., AIYR to AIZR in Figure 3a) - thus it is unclear whether the polarity of such synapses would significantly influence the behavioral outcome or not. The authors could consider plotting the change of the connection strengths between neurons over the course of model optimization to get a sense of confidence in each inter-neuron connection. 

      In the evolutionary algorithm, the parameters of individuals vary discontinuously because of selection, crossover, and mutation. Because this variation is non-systematic, it is not straightforward to extract information about parameter optimization from the parameter changes.
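      The point about discontinuous parameter variation can be seen in a generic elitist genetic algorithm with truncation selection, uniform crossover and Gaussian mutation. This is an assumed, textbook-style GA for illustration only; the response names the operators but not their details, so it is not the optimizer used in the study, and `fitness` is a hypothetical stand-in for the CI. The function records the best individual in each generation; because a freshly recombined and mutated offspring can overtake the previous best, the recorded parameter values can jump between generations instead of drifting smoothly.

```python
import random


def evolve(fitness, n_params, pop_size=20, generations=30, mut_sd=0.2, seed=0):
    """Generic elitist GA; returns (best_fitness, best_params) for each generation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(n_params)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append((fitness(ranked[0]), list(ranked[0])))
        parents = ranked[: pop_size // 2]  # truncation selection: keep the top half
        children = [list(ranked[0])]       # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene is copied from one parent at random.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # Gaussian mutation: perturb every gene slightly.
            child = [gene + rng.gauss(0.0, mut_sd) for gene in child]
            children.append(child)
        pop = children
    return history
```

      Elitism keeps the best fitness non-decreasing, yet the identity of the best individual, and hence the parameter values one would plot, can switch abruptly from one generation to the next.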

      (4) In Figure 3, the order of individual figure panels is incorrect: in the main text, Figure 3 a and b were mentioned after c and d. Also, the caption of Figure 3c "negative step changes I the" should be "in".  

      The main text underwent revision, with the description of Figures 3a and 3b being presented prior to that of Figures 3c and 3d. The typo was revised.

      (5) In Figure 4, the order of individual figure panels is messed up: in the main text, Figure 4 a was mentioned after b.  

      The main text underwent revision, with the description of Figure 4a being presented prior to that of Figure 4b.

      (6) Also in Figure 4, the authors need to provide a definition/explanation of "Bearing" and "Translational Gradient". In Figure 4d, the definition of positive and negative components is not clear.  

      For the definitions of the bearing and the translational gradient, we now refer to the Methods section "Normal and Translational Salt Concentration Gradient." We added the following explanation of the positive and negative components.

      “The positive and negative components of the curving rate are sampled from the trajectory during leftward turns (as illustrated in Fig. 4b) and rightward turns, respectively.”
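      The sign convention above can be sketched as follows, assuming the curving rate is the signed change in heading per unit path length, with counterclockwise (leftward) turns positive in a standard x-right, y-up frame. This is an illustrative reconstruction, not the study's analysis code; it returns radians per unit length (apply `math.degrees` for degrees).

```python
import math


def curving_rates(xs, ys):
    """Signed heading change per unit path length at each interior point of a 2-D path.

    Positive values are counterclockwise (leftward) turns and negative values are
    clockwise (rightward) turns, assuming a standard x-right / y-up frame.
    """
    headings = [math.atan2(ys[i + 1] - ys[i], xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
    rates = []
    for i in range(len(headings) - 1):
        dtheta = headings[i + 1] - headings[i]
        dtheta = (dtheta + math.pi) % (2.0 * math.pi) - math.pi  # wrap into [-pi, pi)
        step = math.hypot(xs[i + 2] - xs[i + 1], ys[i + 2] - ys[i + 1])
        rates.append(dtheta / step)
    return rates


def split_by_turn_direction(rates):
    """Separate positive (leftward) and negative (rightward) curving-rate samples."""
    left = [r for r in rates if r > 0]
    right = [r for r in rates if r < 0]
    return left, right
```

      For a counterclockwise circle of radius R every sample is close to +1/R and falls in the leftward group; mirroring the path makes every sample negative.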

      (7) Figure 5: the authors need to explain why c has an error bar and how they were calculated, as this result is from a computational model. Figure 5d is experimental results - the authors need to add error bars to the data points and provide a sample size. 

      As explained in the Methods section "Analysis of the Salt Preference Behavior in Klinotaxis," the ensemble averages of these quantities were determined from 100,000 simulations with randomized initial orientation, each run for a simulation time of T<sub>sim</sub> = 200 s. Error bars for the experimental data were added in Figs. 5c, 6a, and S9a.
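      Why a deterministic model can still carry error bars is sketched below: each run starts from a randomly drawn initial heading, so a per-run summary varies across the ensemble. The response does not state whether the plotted error bars are standard deviations or standard errors, so the sketch returns both. `run_trial` is a hypothetical stand-in for one model-worm simulation returning a scalar summary.

```python
import math
import random
import statistics


def ensemble_stats(run_trial, n_trials, seed=0):
    """Mean, standard deviation and standard error of a scalar over randomized runs.

    Each trial draws an initial heading uniformly from [0, 2*pi) and passes it to
    the hypothetical `run_trial(heading)`, which returns one scalar per run.
    """
    rng = random.Random(seed)
    values = [run_trial(rng.uniform(0.0, 2.0 * math.pi)) for _ in range(n_trials)]
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    sem = sd / math.sqrt(n_trials)
    return mean, sd, sem
```

      With 100,000 runs the standard error is about 0.3% of the standard deviation, so SEM-based error bars would be nearly invisible; SD-based bars show the spread caused by the initial orientation.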

      (8) On Page 14, the authors said, "To this end, this end, we used the best evolved network with the constraints, in which we varied the synaptic connections between ASER and AIY from inhibitory to excitatory." How did the model change the ASER-AIY signaling specifically? The authors should provide more explanation or at least refer to the Methods Section.  

      We now refer to the caption of Fig. S4, which describes the method in detail. 

      (9) Page 15: "a subset a subset exhibited a slight curve...". This observation from the model simulation is contradictory to experiments. However, their explanation of that is hard to understand.  

      I would like to thank you for the helpful comment. To improve this, we added the following explanation:

      “In the case of step increases in z<sub>OFF</sub> as illustrated in the second right panel from the bottom in Fig. 3d, the turning angle φ is increased from its ideal oscillatory component to a value close to zero, causing the model worm to deviate from the ideal sinusoidal trajectory and gradually turn toward lower salt concentrations. On the other hand, in the case of step increases in z<sub>ON</sub> as illustrated in the second left panel from the bottom in Fig. 3d, the turning angle φ is again increased from its ideal oscillatory component to a value close to zero, causing the model worm to deviate from the ideal sinusoidal trajectory and gradually turn toward higher salt concentrations. The behaviors that are consistent with these analyses are observed in the trajectory illustrated in Fig. S8b.”

      (10) Last result session: inhibited SMB in starved worms is due to a mechanism unrelated to their neural network model upstream to SMB. Therefore, their results recapitulating the worms' dispersal behaviors cannot strengthen the validity of their model.  

      We agree with the reviewer. The findings from the study of starved worms do not provide evidence that validates the neural network model upstream of SMB.   

      (11) Discussion: "in contrast, the remaining neurons...". This argument lacks evidence or references.  

      This argument is based on the results obtained from the present study. This sentence was revised as follows:

      “This regulatory process enables the reproduction of salt concentration memory-dependent reversal of preference behavior in klinotaxis, even though the neurons further downstream of ASER remain unaltered and simply function as a modular circuit that transmits the received signals to the motor system. Consequently, the sensorimotor circuit allows simple and efficient bidirectional regulation of salt preference behavior in klinotaxis.”

      (12) To increase the predictive power of their model, can the authors perform simulations on mutant worms, like those with altered glutamate basal level expression in ASER?  

      We would like to express our gratitude for this useful suggestion. The simulations in which the weight of the ASER-AIY synaptic connection is increased from negative (inhibitory connection) to positive (excitatory connection), as illustrated in Figure S4, provide valuable insight into the relationship between varying basal glutamate levels from ASER and klinotaxis behavior, such as the chemotaxis index.

  4. Jun 2025
    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Comments for the authors of Review Commons Manuscript RC-2024-02804:

      The authors of the Review Commons manuscript "Antigen flexibility supports the avidity of hemagglutinin-specific antibodies at low antigen densities" present their recent work evaluating hemagglutinin interactions with cellular receptors and antibodies. This manuscript focuses specifically on the avidity of antibody binding to hemagglutinin, using a fluorescence-based assay to measure dissociation kinetics and steady-state binding of antibodies to virions. Their findings confirm that bivalent interactions can offset weak monovalent affinity and that HA ectodomain flexibility is an additional determinant of antibody avidity. These findings are key to our understanding of neutralizing antibodies. Below are some comments that I would like the authors to address as they revise the manuscript.

      Comments:

      1. Can the authors provide a justification for the two influenza viruses that they used?

      We selected the lab-adapted IAV strains A/WSN/1933 (H1N1) and A/Hong Kong/1968 (H3N2) for this work because they are well studied, including in the context of the antibodies used here, S139/1 and C05. While both antibodies bind to more contemporary H3N2 strains, they do not bind to HA from pandemic H1N1. In addition, the HAs of these strains bind both antibodies with high enough affinity to give a strong signal in our imaging assays. This context for our strain selection has been added in lines 85-88.

      1. The use of filamentous particles is a strength, but the authors should detail the roles of filamentous vs. spherical particles in nature and in laboratory settings. This will help researchers who plan to repeat these assays.

      We have revised the text (lines 336-339) to include more context on the biology of filamentous and spherical influenza viruses. In our experiments, HK68 naturally produces filaments in cell culture whereas WSN33 does not. To produce filaments artificially, we replace the M1 sequence from WSN33 with that of M1 from A/Udorn/1972, an H3N2 strain that is closely related to HK68.

      1. Did the authors add the Udorn M1 to the HK68 as well?

      Since HK68 naturally forms filaments, we did not introduce Udorn M1 into this strain. We note that the amino acid sequences of Udorn M1 and HK68 M1 differ only at position 167 (Alanine in Udorn, Threonine in HK68), and that this residue has previously been found to not correlate with virus morphology (10.1016/j.virol.2003.12.009).

      Reviewer #1 (Significance (Required)):

      This manuscript focuses specifically on the avidity of antibody binding to hemagglutinin, using a fluorescence-based assay to measure dissociation kinetics and steady-state binding of antibodies to virions. These findings confirm that bivalent interactions can offset weak monovalent affinity and that HA ectodomain flexibility is an additional determinant of antibody avidity. These findings are key to our understanding of neutralizing antibodies.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary

      In this study, Benegal et al. investigate the binding kinetics of HA-head-specific antibodies (S139/1 and C05) to intact influenza virus particles using a fluorescence microscopy-based technique to measure the dissociation rate (koff) of the antibodies. By applying their proposed equilibrium model for bivalent antibody binding to HA, the authors calculated the crosslinking rate (kx), which represents the rate at which a single-bound antibody crosslinks to an additional HA molecule. Their experiments revealed that antigen crosslinking significantly slows koff, reducing it by up to two orders of magnitude. The authors further utilized streptavidin-coated beads conjugated with biotinylated HA or biotinylated BSA at varying concentrations to control HA surface density. Their results demonstrated that the two tested HA-head-specific antibodies retained the ability to crosslink HAs even at ~10-fold lower HA surface densities. In a complementary experiment, they employed an HA-anchor-specific antibody to restrict HA flexibility, which led to reduced binding of S139/1 and C05 IgGs but not their Fab fragments. This finding suggests that HA flexibility, rather than density, is the primary determinant of antibody crosslinking and avidity. Overall, the authors present an innovative approach to elucidating the dissociation and crosslinking kinetics of antibodies targeting intact virions or nanoparticles. The study is well-designed, with alternative interpretations of the results carefully considered and addressed throughout. I have only a few minor comments and suggestions for clarification.

      Minor comments:

      1. In Figure 1, does the grey color of each IgG in panel C indicate the Fc domain? If so, please add the description of the colors to the figure legend. In fact, it may be better to explain all the colors used here (for HA1, HA2, Fab heavy chain, light chain, etc.).

      We have included this information in panel C and the caption for Figure 1.

      1. Under the section "Bivalent binding of S139/1 and C05 persists after ~10-fold reductions in HA surface densities", the beginning of the second paragraph reads, "For both S139/1 and C05 Fab, binding increases linearly with HA density, as expected for a monovalent interaction dictated by absolute HA availability rather than density (Fig. 3D). Interestingly, the same relationship is observed for S139/1 IgG."

      Visually, I think the same relationship also seems to hold for C05 IgG. Would it be better to perform some linear regression and report the R2 value for the fitting so that this assessment can be quantitative?

      We agree with the reviewer's point. In Figure 3 of the revised manuscript, we include the results from a linear regression analysis to make this assessment more quantitative.
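      As a minimal sketch of the kind of quantitative check requested here, the hypothetical function below fits an ordinary least-squares line to antibody signal versus HA surface density and reports R². It is not the authors' analysis code, and the example densities and intensities are illustrative placeholders rather than data from the manuscript.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r2), where r2 = 1 - SS_res / SS_tot.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2 points)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares for the coefficient of determination.
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y values must not all be identical")
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical example: % HA on beads vs. antibody intensity (arbitrary units).
density = [10, 25, 50, 100]
signal = [20, 45, 70, 80]
slope, intercept, r2 = linear_fit_r2(density, signal)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.3f}")
```

      An R² close to 1 supports a linear (monovalent-like) dependence on HA availability, while a saturating relationship lowers R². In practice a library routine such as Python's `statistics.linear_regression` could replace the hand-written fit.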

      1. At the end of the same page, in the same paragraph, the authors mentioned, "In contrast to the IgG, Fab binding measured at twice the molar concentration of the IgG is nearly undetectable under these conditions, confirming the IgG binding is not occurring through monovalent interactions (Fig. S2E)." What are the conditions you are referring to? In Fig. S2E, there is only the Ab intensity for the Ab binding at 100% HA (and not the other percentages). For the Ab intensity of S139/1 Fab, what is the concentration of the Fab used in Figure 3D? Why could the intensity in this experiment for S139/1 Fab reach ~100,000, whereas that of the 8 nM in Fig. S2E can only reach ~20,000?

      To clarify this point, we have updated Figure 3 to include the antibody concentration used for each experiment. The experiments in Fig 3 are conducted at concentrations near the respective K_D of each IgG or Fab to ensure both consistency and a strong signal-to-noise ratio. For S139/1, we use 4 nM of IgG and 25 nM of Fab. In Fig S2E, we use a Fab concentration twice that of the IgG, so that the concentration of binding sites is equivalent, to confirm that the IgG binding we see is indeed due to bivalent binding. In this case, we use 4 nM of IgG and 8 nM of Fab.

      1. Under the section, "Tilting of HA about its membrane anchor contributes to C05 and S139/1 avidity", in the second paragraph, the authors wrote, "If this is correct, we reasoned that avidity could be reduced by constraining tilting of the HA ectodomain. To test this hypothesis, we used FISW84, an antibody that binds to the HA anchor epitope and biases the ectodomain into a tilted conformation (Fig. 4B)."

      Can you use some computational models (maybe the same one you used for Figure 4A) to show that when an HA trimer is bound by FISW84 Fabs, the tilting of HA is constrained? I think this will help substantiate the assertion above.

      This is an important point. The model that we employ in Figure 4A is suited to predicting the angles sampled by HAs when they are bound by an IgG antibody, but it does not take into consideration clashes with the viral membrane. Based on published structures (reference 35 in the revised manuscript), we predict that it is these clashes that constrain HA tilting when FISW84 binds to the HA anchor. We have revised the text (Lines 247-249) to clarify these points.

      1. It would be good if you could mention the strain of HA used in the experiments in Figure 4 in the actual figure as well (as opposed to just in the figure legend).

      We have added this information to Figure 4 in the revised manuscript.

      1. I do not see a method section for the structure-based model you used in Figure 4. In the text, you cited your previous study (ref 28) for the model, but it would be good to write about this briefly (and how you specifically apply the model in this study) in this current manuscript.

      We have updated the methods to include a subsection ("Geometric Model for Preferred Crosslinking Geometry") on how the structure-based model was set up, along with a corresponding schematic in Fig S3 showing the angular degrees of freedom considered.

      1. In Figure S1 panel D, what is the unit of the antibody concentration? Could you please add it to the graph legend?

      We have updated the figure (S1E in the revised manuscript) to include this information.

      Reviewer #2 (Significance (Required)):

      Previously, this group utilized the same fluorescence-based method to investigate the potency of anti-HA IgG1 antibodies in preventing viral entry versus egress, as well as the tendency of antibodies targeting different HA epitopes to crosslink two HA trimers in cis or in trans (He et al., J Virol, 2024). In this study, they extend their work by evaluating, in-depth, how the density and flexibility of hemagglutinin (HA) on the viral surface influence the binding avidity of anti-HA antibodies. Using two human IgG1 antibodies targeting the HA head, the authors demonstrate that these antibodies can crosslink two HA trimers in cis, even when the trimers are further apart than adjacent HAs. Notably, the study reveals that HA flexibility, rather than density, is the key determinant modulating antibody crosslinking. Even at a 10-fold reduced HA density compared to the original, the antibodies retained their ability to crosslink trimers.

      This study provides critical insights into the relationship between HA density, flexibility, and antibody function, adding to the broader understanding of antibody crosslinking, a topic frequently discussed in the field of influenza research. These findings could have significant implications for vaccine design, particularly for strategies involving the display of the HA ectodomain on nanoparticles, potentially guiding the development of more effective influenza vaccines. Furthermore, the broader relevance of these findings may extend to other viruses with similar structural and immunological properties.

      My expertise lies in the structural determination of antibody-antigen complexes in influenza and other pathogens. While I may not have sufficient expertise to evaluate specific technical details of the fluorescence-based methods employed, the authors have convincingly demonstrated the robustness of their experimental design and interpretation, supported by appropriate controls.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      SUMMARY In "Antigen flexibility supports the avidity of hemagglutinin-specific antibodies at low antigen densities", Benegal et al. develop a microscopy-based assay to measure dissociation of HA head-binding antibodies from intact virions. This assay allows the authors to explore the contribution of IgG bivalent avidity to antibody interaction with native virions, which is not accessible using other methods such as BLI. Using this assay, the authors further explore the effect of HA density on IgG avidity with engineered low-HA virions and then with artificial HA-coated microspheres. In addition to measuring antibody dissociation, the authors perform structural analyses to predict the conformational preferences of many HA IgGs from published structures. The authors conclude that low HA densities (down to ~10%) still support high avidity binding for the 2 IgGs tested, and thus there would be little evolutionary pressure for IAV to reduce the HA density as a strategy to evade immune recognition.

      MAJOR COMMENTS

      The data presented are generally convincing for the two antibodies tested, with some caveats listed below. I believe the microscopy technique is valuable and provides a significant contribution to the field, and I believe that the finding that avidity persists at low densities for IAV is compelling and worth communicating to other virologists. Overall, with the incorporation of the suggested major revisions, this manuscript represents a significant advancement in the field.

      A major limitation of the current study is the small number of antibodies tested. Two antibodies are few, particularly since this work attempts to generalize these observations with structural predictions for dozens or hundreds of HA antibodies. While I believe that the resilience of IgG binding to lower epitope densities is likely common to many HA antibodies (or antibodies in general), this work alone does not support this. To this end, the authors should acknowledge their limited sample size in the text or discussion and state that the generalization to other antibodies is speculative. Alternatively, the authors could test additional antibodies (such as F045-092, which is highlighted in Fig S3A, and perhaps the group 'i' antibodies in Fig S3A).

      This is an important point, and we more explicitly acknowledge this limitation in lines 277-278.

      It seems to me lateral diffusion of HA in the viral membrane is an important discussion point that was missed in this manuscript. The authors should comment on what is known about the lateral mobility of HA on virions, and how this could impact the ability of an IgG to crosslink. The authors should comment about whether long range diffusion and/or short range "shuffling" of glycoproteins could contribute to crosslinking preferences of antibodies in addition to the tilt, which is the only movement discussed. As appropriate, the authors should then comment on how this may affect their interpretation of experiments using beads. In experiments on beads, there is certainly no lateral mobility of the HA trimers; what are the consequences of this on the analysis?

      We agree that this is an important consideration, and we have revised the manuscript (lines 296-298) to address these points. Briefly, we have previously performed fluorescence recovery after photobleaching of covalently labeled HA and NA on the surface of filamentous influenza particles (10.7554/eLife.43764; see Figure 1B of this reference for a representative example). This data indicates that long range diffusion does not seem to be occurring on the virion surface. Short range diffusion, or shuffling, has not been observed, but cannot be ruled out, and may increase conformations favorable to bivalent binding.

      Provided the authors qualify the limitations in the scope of their experimental results and the system of choice (beads vs. virions) as described in my previous comments, I suggest three experiments that I believe are essential to support the authors' claims. As an alternative to qualifying the limitations, two optional experiments are also listed that could support the authors' claims as they stand; these require a more extensive experimental undertaking and are thus labeled [OPTIONAL].

      1) The photobleaching experiment shown in Figure S1A. I am concerned that measuring photobleaching in steady state conditions does not properly control for the experimental conditions. In steady state, bleached antibody could unbind and be replaced by fluorescent antibody that has diffused into the field of view. This should be more thoroughly controlled by irreversibly capturing antibody (such as with biotin) and imaging after excess antibody is washed away, or by some other method such as capturing and imaging virus that has been directly labeled with AF555. This should be possible using reagents and techniques already demonstrated by the authors.

      We have updated the supplemental information with a more rigorous control for photobleaching; the revised figures are shown in Fig S1A. In this experiment, fluorescent S139/1 IgG was bound to HK68 virions. The antibody was washed away, and the loss of fluorescence signal was imaged separately under two conditions: 1) Dissociation only; an image was collected at 0s and one at 60s. 2) Dissociation and photobleaching; an image was collected at a rate of 1 frame per second for 60 seconds. The difference between the endpoint intensities from both conditions is not statistically significant. This supports our conclusion that, in the absence of antibodies in solution that can exchange with those bound to virions, photobleaching does not make a detectable contribution to the loss of signal we observe in our antibody dissociation experiments.

      2) In imaging, the authors analyzed only filamentous virions because they exhibit the best signal to noise ratio, which is a reasonable technical simplification. However, this relies on the assumption that glycoprotein presentation is relatively constant between virions of different sizes. It would be helpful to perform some analysis of small virions in any movie where there is sufficient signal. This would support the assumption that rates for small virions are comparable to those of filaments in the same experiment. This should be possible by performing additional analysis on existing data, without requiring additional experiments.

      Thank you for calling our attention to a point that needs clarification. Only the analysis of the SEP-HA binding experiments (shown in Fig 3A&B) was restricted to filaments. This was done to select only particles that were not diffraction-limited, so that we could control for any systematic differences in size between the two populations by measuring HA signal per unit particle length. For the dissociation experiments (Fig 2), data were taken from all virions in the fields of view. To account for the potential discrepancy that the reviewer points out, the normalized dissociation curves were averaged in two ways. In the first method, the average was taken with each virion weighted equally; in the second, the entire field of view was masked and normalized together. The two curves are very similar, suggesting that any differences between smaller virions and filaments are too small to make a quantifiable difference in dissociation rate. A representative dissociation curve with both analyses shown side by side has been added in Figure S1B.
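      The two averaging schemes described in this reply can be sketched as follows. This is a minimal illustration, assuming each virion's background-subtracted intensity trace is available as a list sampled at common time points; the function names are hypothetical, and summing all virion intensities before normalizing stands in for masking and normalizing the whole field of view together.

```python
def per_virion_average(traces):
    """Normalize each virion's trace to its first frame, then average with equal weight per virion."""
    n_frames = len(traces[0])
    if any(len(t) != n_frames for t in traces):
        raise ValueError("all traces must have the same number of frames")
    normalized = [[value / trace[0] for value in trace] for trace in traces]
    return [sum(t[i] for t in normalized) / len(normalized) for i in range(n_frames)]


def pooled_average(traces):
    """Sum intensity over all virions (one mask for the whole field), then normalize once."""
    n_frames = len(traces[0])
    if any(len(t) != n_frames for t in traces):
        raise ValueError("all traces must have the same number of frames")
    totals = [sum(trace[i] for trace in traces) for i in range(n_frames)]
    return [total / totals[0] for total in totals]


# Hypothetical traces: a bright filament and a dim small virion.
traces = [[1000, 500, 250], [100, 50, 25]]
print(per_virion_average(traces))
print(pooled_average(traces))
```

      When all virions lose signal at the same fractional rate, both schemes give the same curve. If small, dim virions dissociated differently, the pooled curve would be dominated by bright filaments and would diverge from the equally weighted curve, so close agreement between the two is consistent with comparable rates across particle sizes.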

      3) In figure 3, C05 fab binding is used to assay HA content of the SEP HA virions. An additional method of confirming HA content that is more independent from the imaging assay would be beneficial, such as a Western blot to quantify HA relative to NP, NA, or M1 etc.

      We have used western blotting to quantify the amount of HA relative to M1 in each population. This new data is discussed in lines 163-168 of the revised manuscript and shown in Figure S2C. As noted in the revised text, western blot analysis suggests that the density of native HA is decreased to ~31% of its normal level in SEP-HA virions, lower than the ~75% value determined by fluorescence microscopy. One possible reason for this disparity is the presence of virus-like particles in the SEP-HA sample that completely lack wildtype HA. These would be excluded from our imaging analysis but captured by the western blot.

      4) [OPTIONAL] In figure 4, it is depicted that FISW84 biases HA in a tilted conformation, and the authors reasonably propose the reduced flexibility discourages crosslinking by IgGs. From the modeling summarized in Figure S3A, are there any antibodies predicted to prefer crosslinking HA at the same angle FISW84 tilts the ectodomain? Would FISW84 enhance crosslinking by such an antibody?

      This is an interesting suggestion, and we have revised the manuscript (lines 247-249) to clarify our thinking on this point. Based on the structure of the FISW84 Fab (PDB ID 6HJQ), we conclude that binding of a single Fab fragment does not necessarily actively tilt the HA ectodomain in a specific direction. Rather, it restricts tilting in the direction that would cause a steric clash between the Fab and the membrane. As a result, HA can still sample a range of angles, but this range is no longer symmetrical about the ectodomain axis. By reducing the likelihood that two HA ectodomains would tilt towards each other at a favorable angle, we would expect all antibodies to be disadvantaged to some degree. A possible exception could be if three FISW84 Fab fragments manage to bind to a single HA trimer. In this case, the HA ectodomain would be forced to remain perpendicular to the membrane to accommodate them all. This would favor antibodies that prefer binding to HAs where the ectodomains are parallel to each other. In our analysis in Figure S3A, this includes primarily antibodies that bind to the HA central stalk, such as 31.b.09. However, we note that these antibodies may encounter barriers to bivalent binding that we do not consider here, including proximity to the FISW84 epitope and the high density of HA in the membrane.

      5) [OPTIONAL] In figure S3A, the authors display theoretical tilt and spacing preferences for many HA antibodies based on published structures. Interestingly, their group iii antibody is predicted to prefer greater spacing and tilt, and likewise the authors observe increased binding at lower densities (in figure 3E). It would be beneficial to the work to test group i antibodies (base binding) in the dissociation experiments. The behavior of a base binding antibody, particularly at low densities could reinforce the modeling performed for this work.

      This is an excellent suggestion which we are not currently able to pursue for technical reasons. In particular, it would be difficult to distinguish between increased binding of these antibodies at low antigen densities that is due to bivalent attachment (and thus reduced dissociation) versus increased accessibility of the epitope, which may be occluded at higher HA densities.

      The experiments are well explained and supported by methods that would enable reproducibility.

      The authors state "The statistical tests and the number of replicates used in specific cases are described in the figure legends" yet in many cases this information is absent. For the k values in fig 2D, some indication of error or confidence interval would be helpful.

      We have ensured that this information is included in each of the captions. Regarding the k values, formal error propagation is challenging due to the way the k values were derived. Specifically, these values were calculated by fitting the average of the three initial dissociation traces, rather than fitting each replicate individually and then averaging the rate constants. As a result, the usual methods for estimating confidence intervals or standard error of the mean are not directly applicable.

      MINOR COMMENTS

      o Some of the small details in fig 1A and fig S1 are lost due to small figure size - such as the sialic acid residues and lipid bilayer.

      We have resized the figure components.

      o Although described in the text, it could be helpful to indicate in figure 2 why the BLI data is shown for the S139/1 Fab. Perhaps indicate in 2C that that curve is "too fast to accurately measure", and near the table in 2D indicate that the blue data is from Lee et al. It may be fine to simply remove the BLI results from the figure and refer to them only in the discussion of the experiments. Even with the measured data alone, the difference between Fab and IgG is striking enough to support the paper, and the BLI data may add more confusion to the figure than value.

      We have updated the caption for Figure 2D to clarify that binding between the S139/1 Fab and A/WSN/1933 HA is approaching the limit of detection in our assay, and that the additional rates are from Lee et al. We have also updated the table to make the presentation of the kinetic parameters more clear.

      o In figure 3A, better describe the fluorescent components in the fluorescent images in the legend.

      We have updated the caption for Figure 3A to describe the fluorescent components shown in the image. Specifically, the panel labeled 'HA' shows signal from a fluorescent FI6v3 scFv, while the panel labeled 'decoy' shows signal from the SEP-HA construct.

      o From personal experience, the flexibility of HA ectodomain can be significantly affected by how much of the membrane proximal linker region is retained or removed. Could the authors comment on how they chose the cutoff for their HA ectodomain used in the bead experiments and their rationale?

      This is an important point, and while the precise impact of the linker on HA flexibility remains uncertain, we agree that it may increase the freedom of motion of the ectodomain relative to the HA membrane anchor. We mention this caveat in the revised text (lines 188-191) and we have added an AlphaFold2 prediction of how our recombinant HA might look to Figure S2D.

      o In Figure S1B, if I understand correctly, the black dashed line "IgG equivalent dissociation rate" is the experimental data, and the magenta "Crosslinking model fit" is the theoretical total antibody bound as described by the mathematical model. The gray lines "Double-/singly-bound antibodies" then plot the theoretical amounts of antibody bound once and bound twice. If this is correct, I believe it would be clearer if the singly- and doubly-bound curves were plotted in separate colors and explained more clearly in the legend.

      We have revised the figure to show doubly- and singly-bound curves using different line styles.
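      To illustrate how singly- and doubly-bound curves of this kind can arise and why crosslinking slows apparent dissociation, here is a minimal sketch of a washout experiment under an assumed scheme. It is not necessarily the model used in the manuscript. In this sketch, a singly bound antibody S either dissociates with rate koff or crosslinks a second HA with rate kx to become doubly bound (D). Either arm of D releases with rate koff, so D returns to S at rate 2·koff. No antibody remains in solution, so there is no rebinding. The function name, the factor of 2, and the parameter values are illustrative assumptions.

```python
def simulate_washout(koff, kx, t_end, dt=0.01, s0=1.0, d0=0.0):
    """Integrate an assumed singly/doubly bound washout scheme with fixed-step RK4.

    dS/dt = -(koff + kx) * S + 2 * koff * D
    dD/dt = kx * S - 2 * koff * D
    Returns (times, S values, D values); S + D is the total bound antibody.
    """
    def deriv(s, d):
        ds = -(koff + kx) * s + 2.0 * koff * d
        dd = kx * s - 2.0 * koff * d
        return ds, dd

    n_steps = int(round(t_end / dt))
    times, s_vals, d_vals = [0.0], [s0], [d0]
    s, d = s0, d0
    for i in range(1, n_steps + 1):
        # Classical fourth-order Runge-Kutta step.
        k1s, k1d = deriv(s, d)
        k2s, k2d = deriv(s + 0.5 * dt * k1s, d + 0.5 * dt * k1d)
        k3s, k3d = deriv(s + 0.5 * dt * k2s, d + 0.5 * dt * k2d)
        k4s, k4d = deriv(s + dt * k3s, d + dt * k3d)
        s += dt / 6.0 * (k1s + 2.0 * k2s + 2.0 * k3s + k4s)
        d += dt / 6.0 * (k1d + 2.0 * k2d + 2.0 * k3d + k4d)
        times.append(i * dt)
        s_vals.append(s)
        d_vals.append(d)
    return times, s_vals, d_vals


# Illustrative rates (per second): without crosslinking, total bound decays as exp(-koff * t).
for kx in (0.0, 1.0):
    _, s, d = simulate_washout(koff=0.1, kx=kx, t_end=10.0)
    print(f"kx={kx}: singly={s[-1]:.3f}, doubly={d[-1]:.3f}, total={s[-1] + d[-1]:.3f}")
```

      With kx = 0, the total bound fraction after 10 s matches exp(-1). With a fast crosslinking rate, most antibody accumulates in the doubly bound state, which can only dissociate after first releasing one arm, so the total signal decays far more slowly. This is the qualitative behavior the separate singly- and doubly-bound curves are meant to convey.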

      o Related to an earlier comment, if lateral diffusion may play a role, how might this differ between different types of antibodies?

      As mentioned in our previous response, we do not anticipate that lateral diffusion makes a significant contribution to antibody binding to the surface of virions, although it may be important on the cell surface.

      o Could the authors comment in the discussion on how their results on virions may translate to the surface of the infected cell, which is also decorated in viral glycoproteins? Early time points of infection could be an in vivo example of low-density HA. What extent may antibody binding and crosslinking affect viral proteins on the cell surface or the immune response?

      This is a very interesting point. Antibody binding to the infected cell surface has been shown to alter viral release and morphology, presumably at lower HA densities than those observed on the viral surface. We have added a brief discussion of this point (lines 291-295) to the revised manuscript.

      o The github link in the methods is incorrect or not yet available.

      Thank you for noting this. We have updated the link.

      o Reference 1 has an incorrect or expired link.

      These references have been updated.

      Reviewer #3 (Significance (Required)):

      • This work represents a conceptual advance in our understanding of antibody action on viral pathogens. The authors adapt existing microscopy methodologies to measure antibody avidity in a new way that is better representative of in vivo conditions.

      • To my knowledge, this is the first instance of direct measurement of antibody off-rates from intact virus particles, instead of immobilized protein as in BLI, SPR, or interferometry.

      • This work should be of interest to virologists and biophysicists interested in the cooperative binding of antibodies and the relation of virus structural organization to antibody recognition. Immunologists may also be influenced by this work. This work may be followed up by other researchers similarly measuring the association and dissociation rates of antibodies with single virions, or otherwise comparing Fab to IgG binding to gain insight into when crosslinking is or is not occurring.

      • Reviewer expertise: Single-virion imaging, protein complexes, biochemistry, influenza A.

      • I do not have sufficient expertise to evaluate the mathematical models and differential equations for modeling the k-on and k-off rates.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary

      In this study, Benegal et al. investigate the binding kinetics of HA-head-specific antibodies (S139/1 and C05) to intact influenza virus particles using a fluorescence microscopy-based technique to measure the dissociation rate (koff) of the antibodies. By applying their proposed equilibrium model for bivalent antibody binding to HA, the authors calculated the crosslinking rate (kx), which represents the rate at which a single-bound antibody crosslinks to an additional HA molecule. Their experiments revealed that antigen crosslinking significantly slows koff, reducing it by up to two orders of magnitude.

      The authors further utilized streptavidin-coated beads conjugated with biotinylated HA or biotinylated BSA at varying concentrations to control HA surface density. Their results demonstrated that the two tested HA-head-specific antibodies retained the ability to crosslink HAs even at ~10-fold lower HA surface densities. In a complementary experiment, they employed an HA-anchor-specific antibody to restrict HA flexibility, which led to reduced binding of S139/1 and C05 IgGs but not their Fab fragments. This finding suggests that HA flexibility, rather than density, is the primary determinant of antibody crosslinking and avidity.

      Overall, the authors present an innovative approach to elucidating the dissociation and crosslinking kinetics of antibodies targeting intact virions or nanoparticles. The study is well-designed, with alternative interpretations of the results carefully considered and addressed throughout. I have only a few minor comments and suggestions for clarification.

      Minor comments:

      1. In Figure 1, does the grey color of each IgG in panel C indicate the Fc domain? If so, please add the description of the colors to the figure legend. In fact, it may be better to explain all the colors used here (for HA1, HA2, Fab heavy chain, light chain, etc.).
      2. Under the section "Bivalent binding of S139/1 and C05 persists after ~10-fold reductions in HA surface densities", the beginning of the second paragraph reads, "For both S139/1 and C05 Fab, binding increases linearly with HA density, as expected for a monovalent interaction dictated by absolute HA availability rather than density (Fig. 3D). Interestingly, the same relationship is observed for S139/1 IgG."

      Visually, I think the same relationship also seems to hold for C05 IgG. Would it be better to perform a linear regression and report the R² value for each fit so that this assessment is quantitative?

      3. At the end of the same page, in the same paragraph, the authors state, "In contrast to the IgG, Fab binding measured at twice the molar concentration of the IgG is nearly undetectable under these conditions, confirming the IgG binding is not occurring through monovalent interactions (Fig. S2E)." What are the conditions you are referring to? Fig. S2E shows the antibody intensity only for binding at 100% HA (and not at the other percentages). What concentration of S139/1 Fab was used in Figure 3D? Why does the S139/1 Fab intensity in that experiment reach ~100,000, whereas at 8 nM in Fig. S2E it reaches only ~20,000?

      4. Under the section "Tilting of HA about its membrane anchor contributes to C05 and S139/1 avidity", in the second paragraph, the authors write, "If this is correct, we reasoned that avidity could be reduced by constraining tilting of the HA ectodomain. To test this hypothesis, we used FISW84, an antibody that binds to the HA anchor epitope and biases the ectodomain into a tilted conformation (Fig. 4B)."

      Can you use a computational model (perhaps the same one used for Figure 4A) to show that tilting of HA is constrained when an HA trimer is bound by FISW84 Fabs? This would help substantiate the assertion above.

      5. It would be good to state the HA strain used in the Figure 4 experiments in the figure itself (as opposed to only in the figure legend).

      6. I do not see a Methods section for the structure-based model used in Figure 4. The text cites your previous study (ref 28) for the model, but it would be good to describe it briefly in the current manuscript, including how you specifically apply it in this study.

      7. In Figure S1 panel D, what is the unit of the antibody concentration? Could you please add it to the graph legend?
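      The quantitative check suggested in comment 2 amounts to an ordinary least-squares fit of antibody intensity against HA surface density, summarized by the coefficient of determination. A minimal Python sketch, assuming paired lists of HA density and background-corrected intensity for one antibody; the function name is hypothetical and not part of the authors' analysis:

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r2). x could be % HA on the beads and y the
    corresponding mean antibody intensity (hypothetical inputs).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all y values are identical")
    return slope, intercept, 1.0 - ss_res / ss_tot
```

      Each antibody format (S139/1 and C05, IgG and Fab) would be fit separately. An R² close to 1 for C05 IgG would support the visual impression, although R² alone does not exclude curvature, so inspecting the residuals is also advisable.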

      Significance

      Previously, this group utilized the same fluorescence-based method to investigate the potency of anti-HA IgG1 antibodies in preventing viral entry versus egress, as well as the tendency of antibodies targeting different HA epitopes to crosslink two HA trimers in cis or in trans (He et al., J Virol, 2024). In this study, they extend their work by evaluating, in-depth, how the density and flexibility of hemagglutinin (HA) on the viral surface influence the binding avidity of anti-HA antibodies. Using two human IgG1 antibodies targeting the HA head, the authors demonstrate that these antibodies can crosslink two HA trimers in cis, even when the trimers are further apart than adjacent HAs. Notably, the study reveals that HA flexibility, rather than density, is the key determinant modulating antibody crosslinking. Even at a 10-fold reduced HA density compared to the original, the antibodies retained their ability to crosslink trimers.

      This study provides critical insights into the relationship between HA density, flexibility, and antibody function, adding to the broader understanding of antibody crosslinking, a topic frequently discussed in the field of influenza research. These findings could have significant implications for vaccine design, particularly for strategies involving the display of the HA ectodomain on nanoparticles, potentially guiding the development of more effective influenza vaccines. Furthermore, the broader relevance of these findings may extend to other viruses with similar structural and immunological properties.

      My expertise lies in the structural determination of antibody-antigen complexes in influenza and other pathogens. While I may not have sufficient expertise to evaluate specific technical details of the fluorescence-based methods employed, the authors have convincingly demonstrated the robustness of their experimental design and interpretation, supported by appropriate controls.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Walton et al. set out to isolate new phages targeting the opportunistic pathogen Pseudomonas aeruginosa. Using a ∆fliF ∆pilA double mutant strain, they were able to isolate 4 new phages, CLEW-1, -3, -6, and -10, which were unable to infect the parental PAO1F Wt strain. Further experiments showed that the 4 phages were only able to infect a ∆fliF strain, indicating a role for the MS-ring protein FliF of the flagellum complex. Through further mutational analysis of the flagellar apparatus, the authors were able to identify the involvement of c-di-GMP in phage infection. Depletion of c-di-GMP levels by an inducible phosphodiesterase renders the bacteria resistant to phage infection, while elevation of c-di-GMP through the Wsp system made the cells sensitive to infection by CLEW-1. Using TnSeq, the authors were able not only to reaffirm the involvement of c-di-GMP in phage infection but also to identify the exopolysaccharide PSL as a downstream target for CLEW-1. c-di-GMP is a known regulator of PSL biosynthesis. The authors show that CLEW-1 binds directly to PSL on the cell surface and that deletion of the pslC gene resulted in complete phage resistance. The authors also provide evidence that the phage-PSL interaction happens during the biofilm mode of growth and that the addition of the CLEW-1 phage specifically resulted in a significant loss of biofilm biomass. Lastly, the authors set out to test whether CLEW-1 could be used to resolve a biofilm infection using a mouse keratitis model. Unfortunately, while the authors noted a reduction in bacterial load assessed by GFP fluorescence, the keratitis did not resolve under the tested parameters. 

      Strengths: 

      The experiments carried out in this manuscript are thoughtful and rational, and sufficient explanation is provided for why the authors chose each specific set of experiments. The data presented strongly support their conclusions, and they present compelling explanations for any deviation. The authors have not only developed a new technique for screening for phages targeting P. aeruginosa, but also highlight the importance of looking for phages during the biofilm mode of growth, as opposed to the more standard techniques involving planktonic cultures. 

      Weaknesses: 

      While the paper is strong, I do feel that the decision to focus on CLEW-1 for the majority of the paper could have been discussed further. The paper also doesn't provide any detailed information on the genetic composition of the phages. It is unclear whether the phages isolated are temperate or virulent. Many temperate phages enter the lytic cycle in response to QS signalling, and while the data as they stand don't suggest that is the case, the paper might be strengthened by ruling out this possibility. At the very least it might be worth mentioning in the discussion section. 

      Thank you for your review. The genomes of all Clew phages and Ocp-2 have been uploaded [Genbank accession# PQ790658.1, PQ790659.1, PQ790660.1, PQ790661.1, and PQ790662.1]. It turns out that the Clew phage are highly related, which is highlighted by the genomic comparison in the supplementary figure S1. It therefore made sense to focus our in-depth analysis on one of the phage. We have included a supplementary figure (S1A), demonstrating that the other Clew phage also require an intact psl locus for infection, to make that logic clearer. The phage are virulent (there is apparently a bit of a debate about this with regard to Bruynogheviruses, but we have not been able to isolate lysogens). This is now mentioned in the discussion.  

      Reviewer #2 (Public review): 

      This manuscript by Walton et al. suggests that they have identified a new bacteriophage that uses the exopolysaccharide Psl from Pseudomonas aeruginosa (PA) as a receptor. As Psl is an important component in biofilms, the authors suggest that this phage (and others similarly isolated) may be able to specifically target biofilm-growing bacteria. While an interesting suggestion, the manner in which this paper is written makes it difficult to draw this conclusion. Also, some of the results do not directly follow from the data as presented and some relevant controls seem to be missing. 

      Thank you for your review. We would argue that the combination of demonstrating Psl-dependent binding of Clew-1 to P. aeruginosa, as well as demonstration of direct binding of Clew-1 to affinity-purified Psl, indicates that the phage binds directly to Psl and uses it as a receptor. In looking at the recommendations, it appears that the remark about controls refers to not using the ∆pslC mutant alone (as opposed to the ∆fliF2 ∆pslC double mutant) as a control for some of the binding experiments. However, since the ∆fliF2 mutant is more permissive for phage infection, analyzing the effect of deleting pslC in the context of the ∆fliF2 mutant background is the more stringent test. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      First off, I would like to congratulate the authors on this study and manuscript. It is very well executed and the writing and flow of the paper are excellent. The findings are intriguing and I believe the paper will be very well received by both the phage, Pseudomonas, and biofilm communities. 

      Thank you for your kind review of our work!

      I have very little to critique about the paper but I have listed a few suggestions that I believe could strengthen the paper if corrected: 

      Comments and suggestions: 

      (1) The paper initially describes 4 isolated phages but no rationale is given for why they chose to continue with CLEW-1, as opposed to CLEW-3, -6, and -10. The paper would benefit from going into more detail with phage genomics and perhaps characterize the phage receptor binding to PSL. 

      Clew-1, -3, -6, and -10 are actually quite similar to one another. The genomes are now uploaded to Genbank [accession# PQ790658.1, PQ790659.1, PQ790660.1, and PQ790661.1]. They all require an intact Psl locus for infection; we have updated Fig. S1 to show this for the remaining Clew phage. In the end, it made sense to focus on one of these related phage and characterize it in depth.

      (2) PA14 was used in some experiments but not listed in the strain table. 

      Thank you, this has been added in the resubmission.

      (3) Would have been good to see more strains/isolates used.

      We are currently characterizing the host range of Clew-1. It appears to be pretty limited, but this will likely be included in another paper that will focus on host range, not only of Clew-1, but other biofilm-tropic phage that we have isolated since then.

      (4) Could purified PSL be added to make non-PSL strain (like PA14) susceptible? 

      We have tried adding purified Psl to a psl mutant strain, but this does not result in phage sensitivity. Further characterization of the Psl receptor is something we are currently working on, but it will likely be a much bigger story than can be easily accommodated in a revised manuscript.

      (5) No data on resistance development. 

      We have not done this as yet.

      (6) Alternative biofilm models. Both in vitro and in vivo. 

      We agree that exploring the interaction of Clew-1 with biofilms in greater detail is a logical next step. The revised manuscript does have data on the viability of P. aeruginosa biofilm bacteria after Clew-1 infection using either a bead biofilm model or LIVE/DEAD staining of static biofilms. However, expanding on this further (setting up flow-cell biofilms, developing reporters to monitor phage infection, etc.) is beyond the scope of this initial report and characterization of Clew-1.

      (7) There is a mistake in at least one reference. An unknown author is listed in reference 48. DA Garsin is not part of the paper. Might be worth looking into further mistakes in the reference list as I suspect this might be an issue related to the citation software.

      Thank you. Yes, odd how that extra author got snuck in. This has been corrected.

      (8) I don't seem to be able to locate a Genbank file or accession number. If it wasn't performed how was evolutionary relatedness data generated?

      The genomes of all Clew phages and Ocp-2 have been uploaded [Genbank accession# PQ790658.1, PQ790659.1, PQ790660.1, PQ790661.1, and PQ790662.1]

      (9) No genomic information about the isolated phages. Are they temperate or virulent? This would be important information as only strictly lytic phages are currently deemed appropriate for phage therapy. 

      These phage are virulent. We have only been able to isolate resistant bacteria from plaques, but they do not harbor the phage (as detected by PCR). This matches what other researchers have found for Bruynogheviruses.

      Reviewer #2 (Recommendations for the authors): 

      Others have used different PA mutants lacking known phage receptors to pan for new phages. However, it is not totally clear how the screen here was selected for the Psl-specific phage. The authors used flagella and pili mutants and found Clew-1, -3, -6, and -10. These were all Bruynogheviruses. They also isolated a phage that uses the O antigen as a receptor. The family of this latter phage and how it is known to use this as a receptor is not described. 

      Phage Ocp-2 is a Pbunavirus. We added new supplementary figure S3, addressing the O-antigen receptor.

      The authors focused on Clew-1, but the receptor for these other Clew phages is not presented. For Clew-1 the phage could plaque on the fliF deletion mutant but not the wild-type strain. The reason for this never appears to be addressed. The authors leap to consider the involvement of c-di-GMP, but how this relates to fliF appears to be lacking. 

      We have included a supplementary figure demonstrating that all the Clew phage require Psl for infection (Fig. S1A). As noted above, we have uploaded the genomic data that underpins the comparison in our supplementary figure. The phage are all closely related. It therefore made sense to focus on one of the phage for the analysis.  

      It is particularly unclear why this phage doesn't plaque on PAO1 as this strain does make Psl. Related to this, it actually looks like something is happening to PAO1 in Figure S4 (although what units are on the x-axis is not entirely clear).

      We hypothesize that the fraction of susceptible cells in the population dictates whether the phage can make overt plaques. The supplementary figure S4 indicates that a subpopulation of the wild-type culture is susceptible and this is borne out by the fraction of wild type cells that the phage can bind to (~50%). The fliF mutation increases this frequency of susceptible cells to 80-90% (Fig. 3).

      The Tnseq screen to identify receptors is clever and identifies additional phosphodiesterase genes, the deletion of which makes PAO1 susceptible. And the screen to find resistant fliF mutants identified genes involved in Psl. However, the link between the phosphodiesterase mutants and the amount of Psl produced never appears to be established. And the statement that Psl is required for infection (line 130) is never actually tested.

      The link between c-di-GMP and Psl production is well-established in the literature. I think the requirement for Psl in infection is demonstrated in multiple ways, including lack of plaque formation on psl mutant strains, lack of phage binding to strains that do not produce Psl, and direct binding of the phage to affinity-purified Psl.

      Figure 2C describes using a ∆fliF2 strain but how this is different (or if it is different) from ∆fliF described in the text is never explained.

      The difference in the deletions is explained in table S1, in the description for the deletion constructs used in their construction, pEXG2-∆fliF and pEXG2-∆fliF2 (∆fliF2 is smaller than ∆fliF and can be complemented completely with our complementing plasmid, pP37-fliF, which is the reason why we used the ∆fliF2 mutation going forward, rather than the ∆fliF mutation on which the phage was originally isolated).

      Similarly, there is a sentence (line 138) that "Attachment of Clew-1 is Psl-dependent" but this would appear to have no context.

      The relevant figure, Fig. 3, is cited in the next sentence and is the subject of the remaining paragraphs in this section of the manuscript.

      For Figure 3B, why wasn't the single ∆pslC mutant visualized in this analysis? Similar questions relate to the data in Figure 4.

      Analyzing the effect of the pslC deletion in the context of the ∆fliF2 mutant background, which is more permissive for phage infection, is the more stringent test.  

      The efficacy of Clew-1 in the mouse keratitis model is intriguing but it is unclear why the CFU/eye are so variable. The description of how the experiment was actually carried out is not clear. Was only one eye scratched or both? Were controls included with a scratch and no bacteria (± phage)?

      One eye was infected. We did not conduct a no-bacteria control (just scratching the cornea is not sufficient to cause disease). The revised manuscript has an updated animal experiment in which we carried the infection forward to 72 h with two phage treatments. Following this regimen, there is a significant decrease in CFU, as well as in corneal opacity (disease). Variability of the data is a fairly common feature in animal experiments. A number of factors could explain this variability, such as whether the mouse blinks and removes some of the inoculum shortly after deposition of the bacteria, or of the phage after each treatment.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      The revised manuscript has gained much clarity and consistency. One previous criticism, however, has in my opinion not been properly addressed. I think the problem boils down to not clearly distinguishing between orthologs and paralogs/homologs. As this problem affects a main conclusion - the prevalence of deletions over insertions in the MTBC - it should be addressed, if not through additional analyses, then at least in the discussion.

      Insertions and deletions are now distinguished in the following way: "Accessory regions were further classified as a deletion if present in over 50% of the 192 sub-lineages or an insertion/duplication if present in less than 50% of sub-lineages." The outcome of this classification is suspicious: not a single accessory region was classified as an insertion/duplication. As a sanity check, I'd expect at least some insertions of IS6110 to show up, which has produced lineage- or sublineage-specific insertions (Roychowdhury et al. 2015, Shitikov et al. 2019). Why, for example, wouldn't IS6110 insertions in the single L8 strain show up here?

      In a fully clonal organism, any insertion/duplication will be an insertion/duplication of an existing sequence, and thus produce a paralog. If I'm correctly understanding your methods section, paralogs are systematically excluded in the pangraph analysis. Genomic blocks are summarized at the sublineage levels as follows (l. 184): "The DNA sequences from genomic blocks present in at least one sub-lineage but completely absent in others were extracted to look for long-term evolution patterns in the pangenome." I presume this is done using blastn, as in other steps of the analysis.

      So a sublineage-specific copy of IS6110 would be excluded here, because IS6110 is present somewhere in the genome in all sublineages. However, the appropriate category of comparison, at least for the discussion of genome reduction, is orthology rather than homology: is the same, orthologous copy of IS6110, at the same position in the genome, present or absent in other sublineages? The same considerations apply to potential sublineage-specific duplicates of PE, PPE, and Esx genes. These gene families play important roles in host-pathogen interactions, so I'd argue that the neglect of paralogs is not a finicky detail, but could be of broader biological relevance.
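      The homology-versus-orthology point can be made concrete with a toy example. The sketch below is a hypothetical illustration, not the authors' pipeline: it applies the quoted >50% rule to presence counts computed two ways, by sequence family alone (homology, so any IS6110 copy counts as "present") and by family plus genomic locus (a stand-in for orthology, assuming positional information that PanGraph does not readily provide). Sub-lineage names, loci, and region labels are invented.

```python
from collections import Counter


def presence_counts(sublineages, by_position):
    """Count how many sub-lineages carry each region.

    sublineages maps a sub-lineage name to a set of (family, locus) pairs.
    by_position=False keys regions by sequence family only (homology-based
    presence); by_position=True keys them by (family, locus), used here as a
    stand-in for orthology.
    """
    counts = Counter()
    for regions in sublineages.values():
        keys = regions if by_position else {family for family, _ in regions}
        counts.update(keys)  # each key counted at most once per sub-lineage
    return counts


def classify_accessory(counts, n_sublineages):
    """Quoted rule: present in >50% of sub-lineages -> deletion, else insertion/duplication."""
    labels = {}
    for region, n_present in counts.items():
        if n_present == n_sublineages:
            continue  # present in every sub-lineage: core, not accessory
        share = n_present / n_sublineages
        labels[region] = "deletion" if share > 0.5 else "insertion/duplication"
    return labels


# Hypothetical toy data: all three sub-lineages share an IS6110 copy at locus1,
# sub-lineage C carries an extra copy at locus2, and region RDx is absent from C.
toy = {
    "A": {("IS6110", "locus1"), ("RDx", "locus3")},
    "B": {("IS6110", "locus1"), ("RDx", "locus3")},
    "C": {("IS6110", "locus1"), ("IS6110", "locus2")},
}
homology = classify_accessory(presence_counts(toy, by_position=False), 3)
orthology = classify_accessory(presence_counts(toy, by_position=True), 3)
print(homology)   # only RDx is accessory; the extra IS6110 copy is invisible
print(orthology)  # the extra IS6110 copy now appears as an insertion/duplication
```

      Under homology-based presence, the C-specific IS6110 copy is absorbed into the core because some IS6110 copy exists in every sub-lineage, which is one way the rule could yield no insertions at all.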

      Within the analysis we undertook we did look at paralogous blocks in pangraph, based on copy number per genome. However, this could have been clearer in the text and we will rectify this. We also focussed on duplicated/deleted blocks that were present in two or more sub-lineages. This is noted in the Figure 4 legend, but we will make this clearer in other sections of the manuscript.

      We agree that indeed the way paralogs are handled could still be optimised, and that gene duplicates of some genes could have biological importance. The reviewer is suggesting that a synteny analysis between genomes would be best for finding specific regions that are duplicated/deleted within a genome, and if those sections are duplicated/deleted in the same regions of the genome. Since Pangraph does not give such information readily, a larger amount of analysis would be required to confirm such genome position-specific duplications. While this is indeed important, we deem this to be out of scope for the current publication, but will note this as a limitation in the discussion. However, this does not fundamentally change the main conclusions of our analysis.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Behruznia and colleagues use long-read sequencing data for 335 strains of the Mycobacterium tuberculosis complex to study genome evolution in this clonal bacterial pathogen. They use both a "classical" pangenome approach that looks at the presence and absence of genes, and a more general pangenome graph approach to investigate structural variants also in non-coding regions. The two main results of the study are that (1) the MTBC has a small pangenome with few accessory genes, and that (2) pangenome evolution is driven by deletions in sublineage-specific regions of difference. Combining the gene-based approach with a pangenome graph is innovative, and the former analysis is largely sound apart from a lack of information about the data set used. The graph part, however, requires more work and currently fails to support the second main result. Problems include the omission of important information and the confusing analysis of structural variants in terms of "regions of difference", which unnecessarily introduces reference bias. Overall, I very much like the direction taken in this article, but think that it needs more work: on the one hand by simply telling the reader what exactly was done, on the other by taking advantage of the information contained in the pangenome graph.

      Strengths:

      The authors put together a large data set of long-read assemblies representing most lineages of the Mycobacterium tuberculosis complex, covering a large geographic area. State-of-the-art methods are used to analyze gene presence-absence polymorphisms (Panaroo) and to construct a pangenome graph (PanGraph). Additional analysis steps are performed to address known problems with misannotated or misassembled genes in pangenome analysis.

      Weaknesses:

      The study does not quite live up to the expectations raised in the introduction. Firstly, while the importance of using a curated data set is emphasized, little information is given about the data set apart from the geographic origin of the samples (Figure 1). A BUSCO analysis is conducted to filter for assembly quality, but no results are reported. It is also not clear whether the authors assembled genomes themselves in the cases where, according to Supplementary Table 1, only the reads were published but not the assemblies. In the end, we simply have to trust that single-contig assemblies based on long-reads are reliable.

      We have now added a robust overview of the dataset to supplementary file 1. This is split into 3 sections: public genomes, which were assembled by others; sequenced genomes, which were created and assembled by us; the BUSCO information for all the genomes together. We did not assemble any public data ourselves but retrieved these from elsewhere. We have modified the text to be more specific on this (Line 114 onwards) and the supplementary file is updated to better outline the data.

      One issue with long-read assemblies could be that high rates of sequencing errors result in artificial indels when coverage is low, which in turn could affect gene annotation and pangenome inference (e.g. Watson & Warr 2019, https://doi.org/10.1038/s41587-018-0004-z). Some of the older long-read data used by the authors could well be problematic (PacBio RSII), as could their own Nanopore assemblies, six of which have a mean coverage below 50 (Wick et al. 2023 recommend 200x for ONT, https://doi.org/10.1371/journal.pcbi.1010905). Could the results be affected by such assembly errors? Are there lineages, for example, for which there is an increased proportion of RSII data? Given the large heterogeneity in data quality on the NCBI, I think more information about the reads and the assemblies should be provided.

      We have now included an analysis where we looked to see if the sequencing platform influenced the resulting accessory genome size and the pseudogene count. The details of this are included in lines 207-219, and the results are outlined in lines 251-258. Essentially, we found no correlation between sequencing platform and genome characteristics, although less stringent cut-offs did suggest that PacBio SMRT-only assembled genomes may have larger accessory genomes. We do not believe this is enough to influence our larger inferences from this data. It should be noted that complete genomes, in general, give a better indication of pangenome size compared to draft genomes, as has been shown previously (e.g. Marin et al., 2024). Even with some small potential bias, this makes our analysis more robust than any previously published.

      In relation to the sequencing depth of our own data, all genomes had coverage above 30x, which Sanderson et al. (2024) have shown to be sufficient for highly accurate sequence recovery. We fixed an issue with the L9 isolate from the previous submission, which resulted in a better BUSCO score and overall quality of that isolate and the overall dataset.

      The part of the paper I struggled most with is the pangenome graph analysis and the interpretation of structural variants in terms of "regions of difference". To start with, the method section states that "multiple whole genomes were aligned into a graph using PanGraph" (l.159/160), without stating which genomes were for what reason. From Figure 5 I understand that you included all genomes, and that Figure 6 summarizes the information at the sublineage level. This should be stated clearly, at present the reader has to figure out what was done. It was also not clear to me why the authors focus on the sublineage level: a minority of accessory genes (107 of 506) are "specific to certain lineages or sublineages" (l. 240), so why conclude that the pangenome is "driven by sublineage-specific regions of difference", as the title states? What does "driven by" mean? Instead of cutting the phylogeny arbitrarily at the sublineage level, polymorphisms could be described more generally by their frequencies.

      We apologise for the ambiguity in the methodology. All the isolates were inputted to Pangraph to create the pangenome using this method. This is now made clearer in lines 175-177. Standard pangenome statistics (size, genome fluidity, etc.) derived from this Pangraph output are now present in the results section as well (lines 301-320).

      We then only looked at regions of difference at the sub-lineage level, meaning we grouped genomes by sub-lineage within the resulting graph and looked for blocks common between isolates of the same sub-lineage but absent from one or more other sub-lineages. We did this from both the Panaroo output and the Pangraph output and then retained only blocks found by both. The results of this are now outlined in lines 351-383.
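      The sub-lineage grouping described here can be sketched as set operations. This is a minimal Python illustration under stated assumptions, not the authors' code: each genome's blocks are given as a set of identifiers, "common between isolates of the same sub-lineage" is read as present in every isolate, "absent" means carried by no isolate of the other sub-lineage, and Panaroo genes and PanGraph blocks are assumed to have already been mapped to shared identifiers. All function names are hypothetical.

```python
from collections import defaultdict


def sublineage_specific_blocks(genome_blocks, genome_sublineage):
    """Blocks carried by every isolate of at least one sub-lineage and by
    no isolate of at least one other sub-lineage."""
    members = defaultdict(list)
    for genome, sub in genome_sublineage.items():
        members[sub].append(genome)
    # Blocks shared by all isolates of a sub-lineage (assumed reading of "common").
    shared = {s: set.intersection(*(genome_blocks[g] for g in gs)) for s, gs in members.items()}
    # Blocks found in any isolate of a sub-lineage.
    seen = {s: set.union(*(genome_blocks[g] for g in gs)) for s, gs in members.items()}
    specific = set()
    for s, blocks in shared.items():
        for block in blocks:
            if any(block not in seen[other] for other in seen if other != s):
                specific.add(block)
    return specific


def retain_consensus(pangraph_hits, panaroo_hits):
    """Keep only regions reported by both tools (assumes a shared region identifier)."""
    return pangraph_hits & panaroo_hits
```

      Requiring a block in every isolate of a sub-lineage is what screens out single-genome changes and possible misannotations, at the cost of missing recent, strain-level events.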

      We focussed on these sub-lineage-specific regions to focus on long-term evolution patterns and not be influenced by single-genome short-term changes. We do not have enough genomes of closely related isolates to truly look at very recent evolution, although the small accessory genome indicates this is not substantial in terms of gene presence/absence. We also did not want potential mis-annotations in a single genome to heavily influence our findings due to the potential issues pointed out by the reviewer above. We state this more clearly in the introduction (lines 106-108), methods (lines 184-186) and results (lines 345-347), and we indicate the limitations in the Discussion, lines 452-457 and 471-473. We also changed the title to ‘shaped’ instead of ‘driven by’.

      I fully agree that pangenome graphs are the way to go and that the non-coding part of the genome deserves as much attention as the coding part, as stated in the introduction. Here, however, the analysis of the pangenome graph consists of extracting variants from the graph and blasting them against the reference genome H37Rv in order to identify genes and "regions of difference" (RDs) that are variable. It is not clear what the authors do with structural variants that yield no blast hit against H37Rv. Are they ignored? Are they included as new "regions of difference"? How many of them are there? etc. The key advantage of pangenome graphs is that they allow a reference-free, full representation of genetic variation in a sample. Here reference bias is reintroduced in the first analysis step.

      We apologise for the confusion here as indeed the RDs terminology is very MTBC-specific. Current RDs are always relevant to H37Rv, as that is how original discovery of these regions was done and that is how RDScan works. We clarify this in the introduction (lines 67-68). If we found a large sequence polymorphism (e.g. by Pangraph) and searched for known RDs using RDScan, we then assigned a current RD name to this LSP. This uses H37Rv as a reference. If we did not find a known RD, we then classified the LSP as a new RD if it is present in H37Rv, or left the designation as an LSP if not in H37Rv, thus expanding the analysis beyond the H37Rv-centric approaches used by others previously. This is hopefully now made clearer in the methods, lines 187-194.
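      The designation procedure in this response is a three-way decision. A minimal Python sketch, assuming the RDScan result and H37Rv presence have already been determined upstream; the class, field names, and placeholder format for new RD names are hypothetical (the response states only that new RDs receive a lineage-related name):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LargeSequencePolymorphism:
    block_id: str
    rdscan_match: Optional[str]  # known RD name reported by RDScan, or None
    in_h37rv: bool               # whether the region is present in H37Rv
    lineage: str                 # lineage/sub-lineage carrying the polymorphism


def designate(lsp: LargeSequencePolymorphism) -> str:
    """Label a large sequence polymorphism following the three-way rule."""
    if lsp.rdscan_match is not None:
        return lsp.rdscan_match              # known RD: reuse its established name
    if lsp.in_h37rv:
        return f"novel RD ({lsp.lineage})"   # hypothetical lineage-related placeholder
    return "LSP"                             # absent from H37Rv: kept as an LSP
```

      Only the middle branch depends on H37Rv for new names; regions absent from H37Rv are retained as LSPs rather than discarded, which is how the analysis extends beyond reference-only RD calls.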

      Along similar lines, I find the interpretation of structural variants in terms of "regions of difference" confusing, and probably many people outside the TB field will do so. For one thing, it is not clear where these RDs and their names come from. Did the authors use an annotation of RDs in the reference genome H37Rv from previously published work (e.g. Bespiatykh et al. 2021)? This is important basic information, its lack makes it difficult to judge the validity of the results. The Bespiatykh et al. study uses a large short-read data set (721 strains) to characterize diversity in RDs and specifically focuses on the sublineage-specific variants. While the authors cite the paper, it would be relevant to compare the results of the two studies in more detail.

      We have amended the introduction to explain this terminology better (lines 67-68). Naming of the RDs here came from using RDScan to assign current names to any accessory regions we found; if such a region was not a known RD, we gave it a lineage-related name, allowing for proper RD naming later (lines 187-194). Because the Bespiatykh et al. paper is the basis for RDScan, our work implicitly compares to it throughout, as any RDs we find that were not picked up by RDScan are thus novel compared to that paper.

      As far as I understand, "regions of difference" have been used in the tuberculosis field to describe structural variants relative to the reference genome H37Rv. Colloquially, regions present in H37Rv but absent in another strain have been called "deletions". Whether these polymorphisms have indeed originated through deletion or through insertion in H37Rv or its ancestors requires a comparison with additional strains. While the pangenome graph does contain this information, the authors do not attempt to categorize structural variants into insertions and deletions but simply seem to assume that "regions of difference" are deletions. This, as well as the neglect of paralogs in the "classical" pangenome analysis, puts a question mark behind their conclusion that deletion drives pangenome evolution in the MTBC.

      We have now amended the analysis to specifically designate a structural variant as a deletion if present in the majority of strains and absent in a minority, or an insertion/duplication if present in a minority and absent in a majority (lines 191-192). We also ran Panaroo without merging paralogs to examine duplication in this output; Pangraph implicitly includes paralogs already.
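      The majority/minority rule above can be written as a short function over a presence/absence vector. This is a sketch under stated assumptions: the function name is hypothetical, and the handling of an exact tie (returned as "ambiguous") is our assumption, since the response does not say how ties were treated.

```python
from typing import Sequence


def classify_structural_variant(presence: Sequence[bool]) -> str:
    """Label a variant from its presence/absence across genomes.

    presence: one boolean per genome, True if the region is present.
    Present in the majority -> deletion in the minority that lacks it.
    Present in a minority   -> insertion/duplication in that minority.
    An exact tie returns 'ambiguous' (assumed; not specified in the text).
    """
    if not presence:
        raise ValueError("need at least one genome")
    n_present = sum(presence)          # True counts as 1
    n_absent = len(presence) - n_present
    if n_present > n_absent:
        return "deletion"
    if n_present < n_absent:
        return "insertion/duplication"
    return "ambiguous"
```

      For example, a region present in 8 of 10 genomes would be labelled a deletion, while one present in only 2 of 10 would be labelled an insertion/duplication.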

      From all these analyses we did not find any structural variants classed as insertions/duplications and did not find paralogs to be a major feature at the sub-lineage level (lines 377-383). While these features could be important on shorter timescales, we do not have enough closed genomes to confidently state this (limitation outlined in lines 452-457). Therefore, our assertion that deletions are a primary force shaping the long-term evolution in this group still holds.

      Reviewer #2 (Public Review):

      Summary:

      The authors attempted to investigate the pangenome of MTBC by using a selection of state-of-the-art bioinformatic tools to analyse 324 complete and 11 new genomes representing all known lineages and sublineages. The aim of their work was to describe the total diversity of the MTBC and to investigate the driving evolutionary force. By using long-read and hybrid approaches for genome assembly, an important attempt was made to understand why previous reports disagreed on the size of the MTBC pangenome.

      Strengths:

      A stand-out feature of this work is the inclusion of non-coding regions as opposed to only coding regions, which were the focus of previous papers and analyses of the MTBC pangenome. A unique feature of this work is that it highlights sublineage-specific regions of difference (RDs) that were previously unknown. Another major strength is the use of long-read whole-genome sequences, in combination with short-read sequences when available. It is known that using only short reads for genome assembly has several pitfalls. The parallel approach of utilizing both Panaroo and Pangraph for pangenomic reconstruction illuminated the limitations of both tools while highlighting genomic features identified by both. This is important for any future work and perhaps alludes to the need for more MTBC-specific tools to be developed.

      Weaknesses:

      The only major weakness was the limited number of isolates from certain lineages and the over-representation of others, which was also acknowledged by the authors. However, since the case is made that the MTBC has a closed pangenome, the inclusion of additional genomes would not result in the identification of any new genes. This is a strong statement without an illustration/statistical analysis to support it.

      We have included a Heaps' law and genome fluidity calculation for each pangenome estimation to demonstrate that the pangenome is closed. This is detailed in lines 225-228, with results shown in lines 274-278 and 316-320 and Supplementary Figure 2. We agree that more closely related genomes would benefit a future version of this analysis, and we indicate the limitations in the Discussion, lines 452-457 and 471-473.
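      For readers unfamiliar with these two statistics, the sketch below shows one common way to compute them from per-genome gene sets. It is an illustration, not the authors' implementation: Heaps' law is taken in the form n = κN^(-α), where n is the number of new gene families added by the N-th genome and α > 1 is conventionally read as a closed pangenome (Tettelin et al. 2008); genome fluidity is the mean, over all genome pairs, of unshared genes divided by total genes (Kislyuk et al. 2011). Function names, the permutation count and the seed are hypothetical choices.

```python
import math
import random
from itertools import combinations
from typing import List, Sequence, Set, Tuple


def genome_fluidity(genomes: Sequence[Set[str]]) -> float:
    """Mean pairwise fraction of unshared genes; needs at least two genomes."""
    pairs = list(combinations(genomes, 2))
    total = sum(len(a ^ b) / (len(a) + len(b)) for a, b in pairs)
    return total / len(pairs)


def new_genes_curve(genomes: List[Set], permutations: int = 100,
                    seed: int = 0) -> List[Tuple[int, float]]:
    """Average number of new gene families added by the N-th genome (N >= 2),
    averaged over random genome orderings."""
    rng = random.Random(seed)
    k = len(genomes)
    sums = [0.0] * k
    for _ in range(permutations):
        order = genomes[:]              # shallow copy; the sets are not mutated
        rng.shuffle(order)
        seen = set(order[0])
        for i in range(1, k):
            sums[i] += len(order[i] - seen)
            seen |= order[i]
    return [(i + 1, sums[i] / permutations) for i in range(1, k)]


def heaps_alpha(curve: List[Tuple[int, float]]) -> float:
    """Fit n = kappa * N^-alpha by least squares on a log-log scale.

    Points with n == 0 are dropped because log(0) is undefined; this is a
    simplification that can bias the fit for very closed pangenomes.
    """
    pts = [(math.log(N), math.log(n)) for N, n in curve if n > 0]
    if len(pts) < 2:
        raise ValueError("need at least two non-zero points to fit")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return -sxy / sxx                   # slope is -alpha
```

      Low fluidity together with α > 1 points towards a closed pangenome, although with few genomes per sub-lineage both estimates should be read cautiously.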

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Abstract

      l. 24, "with distinct genomic features". I'm not sure what you are referring to here.

      We refer to the differences in the accessory genome and related functional profiles, but did not want to bloat the abstract with such additional details.

      Introduction

      l. 40, "L1 to L9". A lineage 10 has been described recently: https://doi.org/10.3201/eid3003.231466.

      We have updated the text and the reference. Unfortunately, no closed genome for this lineage exists, so we have not included it in the analyses. We note this in the results, line 232.

      l.62/3, "caused by the absence of horizontal gene transfer, plasmids, and recombination". Recombination is not absent in the MTBC, only horizontal gene transfer seems to be, which is what the cited studies show. Indeed a few sentences later homologous recombination is mentioned as a cause of deletions.

      This has now been removed from the introduction.

      l. 67, "within lineage diversity is thought to be mostly driven by SNPs". Again I'm not sure what is meant here with "driven by". Point mutations are probably the most common mutational events, but duplications, insertions, deletions, and gene conversion also occur and can affect large regions and possibly important genes, as shown in a recent preprint (https://doi.org/10.1101/2024.03.08.584093).

      We have changed the text to say ‘mostly composed of’. While other variant types may indeed contribute, the prevailing view at the lineage level is that SNPs are the primary source of diversity. The linked preprint examines variation within transmission clusters; this has not been described at the lineage level, which could be done in future work.

      l. 100/1. "that can account for variations in virulence, metabolism, and antibiotic resistance". I would phrase this conservatively since the functional inferences in this study are speculative.

      This has now been tempered to be less specific.

      Methods

      l. 108. That an assembly has a single contig does not mean that it is "closed". Many single contig assemblies on NCBI are reference-guided short-read assemblies, that is, fragments patched together rather than closed assemblies. The same could be true for long-read assemblies.

      We specifically chose genomes listed as closed on NCBI and so rely on NCBI's checks to ensure this is true. We have stated this more clearly in the paper, line 117.

      l. 111. From Supplementary Table 1 I understand that for many genomes only the reads were available (no ASM number). Did you assemble these genomes? If yes, how? The assembly method is not indicated in the supplement, contrary to what is written here.

      All public genomes were downloaded in their assembled forms from the various sources. This is now specified more clearly in the text (line 118), and Supplementary Table 1 now lists the accessions for all the assemblies.

      l. 113. How many assemblies passed this threshold? And is BUSCO actually useful to assess assembly quality in the MTBC? I assume the dynamic, repetitive gene families that cause problems for assembly and mapping in TB (PE, PPE, ESX) do not figure in the BUSCO list of single-copy orthologs.

      All assemblies passed the BUSCO thresholds for high-quality genomes as laid out in Supplementary Table 1. While indeed this does not include multi-copy genes such as PE/PPE we focussed on regions of difference at the sub-lineage level where two or more genomes represent that sub-lineage. This means any assembly issues in a single genome would need to be exactly the same in another of the same sub-lineage to be included in our results. Through this, we aimed to buffer out issues in individual assemblies.

      l. 147: Why is Panaroo used with --merge_paralogs? I understand that near-identical genes may not be too interesting from a functional perspective, but if the aim of the analysis is to make broad claims about processes driving genome evolution, paralogs should be considered.

      We chose to do so with merged paralogs to look for larger patterns of diversity beyond within-genome paralogs. Additionally, this was required to build the core phylogenetic tree. However, as the reviewer points out, this may bias our findings towards deletions and away from duplications as a primary evolutionary force.

      We repeated this without the merged paralogs option and indeed found a larger pangenome, as outlined in Table 1. However, at the sub-lineage level, this did not result in any new presence/absence patterns (lines 381-383). This means the paralogs tended to be in single genomes only. This still indicates that deletions are the primary force in the longer-term evolution of the complex but indeed on shorter spans this may be different.

      l. 153: remove the comment in brackets.

      This has been fixed and the proper URL placed in instead.

      l. 159: which genomes, and why those?

      This is now clarified to state all genomes were used for this analysis.

      l. 161, "gene blocks": since this analysis is introduced as capturing the non-coding part of the genome, maybe just call them "blocks"?

      All references to gene blocks are now changed to genomic blocks to be more specific.

      l. 162: what happens with blocks that yield no hits against RvD1, TbD1, and H37Rv?

      We named these with lineage-specific names (supplementary table 4) but did not assign RD names specifically.

      l. 164: where does the information about the regions of difference come from? How exactly were these regions determined?

      We have expanded this section to be more specific about the use of RDScan and the new naming, along with how we determine whether something is an RD or an LSP.

      Results

      l. 185ff: This paragraph gives many details about the geographic origin of the samples, but what I'd expect here is a short description of assembly qualities, for example, the results of the BUSCO analysis, a description of your own Nanopore assemblies, or a small analysis of the number of indels/pseudogenes relative to sequencing technology or coverage (see comment in the public review).

      This section (lines 231-258) has been expanded considerably to give a better overview of the dataset and any potential biases. Supplementary table 1 has also been expanded to include more information on each strain.

      l. 187, "324 genomes published previously": 322 according to the methods section.

      The number has been fixed throughout to the proper total of public genomes (329).

      l. 201: define the soft core, shell, and cloud genes.

      This is now defined on line 262.

      l. 228, "defined primarily by RD105 and RD207 deletions": this claim seems to come from the analysis of variable importance (Factoextra), which should be made clear here.

      This has been clarified on line 333.

      l. 237, "L8, serving as the ancestor of the MTBC": this is incorrect, equivalent to saying that the Chimpanzee is the ancestor of Homo sapiens.

      We have changed this to ‘basal’ to align with how it is described in the original paper.

      l. 239, "The accessory genome of the MTBC". It is a bit confusing that the same term, 'accessory genome', is used here for the graph-based analysis, which is presented as a way to look at the non-coding part of the genome.

      We have clarified the terminology on line 347 and improved consistency throughout.

      l. 240/1, "specific to certain lineages and sublineages". What exactly do you mean by "specific" to? Present only in members of a certain lineage/sublineage? In all members of a certain lineage/sublineage? Maybe an additional panel in Figure 5, showing examples of lineage- and sublineage-specific variants, would help the reader grasp this key concept.

      We have clarified this on line 349 and the legend of what is now figure 4.

      l. 241/2, "82 lineage and sublineage-specific genomic regions ranging from 270 bp to 9.8 kb". Were "gene blocks" filtered for a minimum size, or why are there no variants smaller than 270 bp? A short description of all the blocks identified in the graph could be informative (their sizes, frequencies ...).

      Yes, a minimum of 250 bp was set for the blocks so that only larger polymorphisms were considered. This is clarified on lines 177 and 304.

      A second point: It is not entirely clear to me what Figure 6 is showing. Are you showing here a single representative strain per sublineage? Or have you somehow summarized the regions of difference shown in Figure 5 at the sublineage level? What is the tree on the left? This should be made clear in the legend and maybe also in the methods/results.

      In Figure 4 (previously Figure 6), because each RD is common to all members of the same sub-lineage, we have placed a single branch for each sub-lineage. This has been clarified in the legend.

      l. 254, "this gene was classified as being in the core genome": why should a partially deleted gene not be in the core genome?

      You are correct, we have removed that statement.

      l. 258/259, "The Pangraph alignment approach identified partial gene deletion and non-coding regions of the DNA that were impacted by genomic deletion". I do not understand how you classify a structural variant identified in the pangenome graph as a deletion or an insertion.

      This has been clarified as relative to H37Rv, as this is standard practice for RDs and general evolutionary analyses in MTBC, as outlined above.

      l. 262/263 , "the accessory genome of the MTBC is small and is acquired vertically from a common ancestor within the lineage". If deletion is the main process involved here, "acquired" seems a bit strange.

      We agree and have changed the header to better reflect the discussion on mis-annotation issues.

      Figure 1: Good to know, but not directly relevant for the rest of the paper. Maybe move it to the supplement?

      This has been moved to Supplementary Figure 1.

      Figure 2: the y-axis is labeled 'Variable genome size', but from the text and the legend I figure it should be 'Number of accessory genes'?

      This has been changed to ‘accessory genes’ in Figure 1 (which was figure 2 in previous version).

      Figure 4: too small.

      We will endeavour to ensure this is as large as possible in the final version.

      Discussion

      l. 271, "MTBC accessory genome is ... acquired vertically". See above.

      Changed, as outlined above.

      l. 292, "appeared to be fragmented genes caused by misassemblies". Is there a way to distinguish "true" pseudogenes from misassemblies? This could be a relevant issue for low-coverage long-read assemblies (see public review).

      Not that we are currently aware of, but we do know other groups which are working on this issue.

      l. 300/1, "the whole-genome approach could capture higher genetic variations". Do you mean the graph approach? I'm not sure that comparing the two approaches here makes sense, as they serve different purposes. A pangenome graph is a summary of all genetic variation, while the purpose of Panaroo is to study gene absence/presence. So by definition, the graph should capture more genetic variation.

      This statement was intended to convey that much of the genetic variation in the MTBC lies outside coding genes, so traditional “pangenome” analyses do not capture the full genomic variation.

      l. 302/3, "this method identified non-coding regions of the genome that were affected by genomic deletions". See the comments above regarding deletions versus insertions. I'd say this method identifies coding and non-coding regions that were affected by genomic deletions and insertions.

      We have undertaken additional analyses to be sure these are likely deletions, as outlined above.

      l. 305: what are "lineage-independent deletions"?

      We labelled these as convergent evolution, now clarified on line 443.

      l. 329: How is RD105 "caused" by the insertion of IS6110? I did not find RD105 mentioned in the Alonso et al. paper. Similarly below, l. 331, how is RD207 "linked" to IS6110?

      The RD105 connection was misattributed as IS6110 insertion is related to RD152, not RD105. This has now been removed.

      RD207 is linked to IS6110 as its deletion is due to recombination between two such elements. This is now clarified on line 486.

      l. 345, "the growth advantage gene group": not quite sure what this is.

      We have fixed this on line 499 to state they are genes which confer growth advantages.

      l. 373ff: The role of genetic drift in the evolution of the MTBC is an open question, other studies have come to different conclusions than Hershberg et al. (this has been recently reviewed: https://doi.org/10.24072/pcjournal.322).

      We have outlined this debate better in lines 527-531.

      l. 375/6, "Gene loss, driven by genetic drift, is likely to be a key contributor to the observed genetic diversity within the MTBC." This sentence would need some elaboration to be intelligible. How does genetic drift drive gene loss?

      We have removed this.

      l. 395/6, "... predominantly driven by genome reduction. This observation underlines the importance of genomic deletions in the evolution of the MTBC." See comments above regarding deletions. I'm not convinced that your study really shows this, as it completely ignores paralogs and the processes counteracting reductive genome evolution: duplication and gene amplification.

      As outlined above, we have undertaken additional analyses to more strongly support this statement.

      l. 399, "the accessory genome of MTBC is a product of gene deletions, which can be classified into lineage-specific and independent deletions". Again, I'm not sure what is meant by lineage-independent deletions.

      We have better defined this in the text, line 443, to be related to convergent evolution.

      Reviewer #2 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data, or analyses.

      In lines 120-121, it is mentioned that TB-profiler v4.4.2 was used for lineage classification, but this version was released in February 2023. As I understand there have been some changes (inclusion/exclusion) of certain lineage markers. Would it not be appropriate to repeat lineage classification with a more recent version? This would of course require extensive re-analysis, so could the lineage marker database perhaps also be cited.

      We have rerun all the genomes through TB-Profiler v6.5 and updated the text to state this; the exact database used is also now stated.

      Could the authors perhaps include the sequencing summary or quality of the nanopore sequences? The L9 (Mtb8) sample had a relatively lower depth and resulted in two contigs. Yet one contig was the initial inclusion criterion. It is unclear whether these samples were excluded from some of the analyses. Mtb6 also has relatively low coverage. Was the sequencing quality adequate to accurately identify all the lineage markers, in particular those with a lower depth of coverage? Could a hybrid approach be an inexpensive way to polish these assemblies?

      We reanalysed the L9 sample and, with some better cleaning, got it to a single contig with better depth and overall score. This is outlined in the Supplementary table 1 sheets. While depth is average, it is still above the recommended 30x, which is needed for good sequence recovery (Sanderson et al., 2024). We did indeed recover all lineage markers from these assemblies.

      Recommendations for improving the writing and presentation.

      The introduction is well-written and recent MTBC pangenomic studies have been incorporated, but I am curious as to why this paper was not referred to: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6922483/ I believe this was the first attempt to study the pangenome, albeit with a different research question. Nearly all previous analyses largely focused on utilizing the pangenome to investigate transmission.

      Indeed this study did look at a pangenome of sorts, but specifically SNPs and not genes or regions. Since the latter is the main basis for pangenome work these days, we chose not to include this paper.

      Minor corrections to the text and figures.

      In line 129, it is explained that DNA was extracted to be suitable for PacBio sequencing, but ONT sequencing was used for the 11 new sequences. Is this a minor oversight or do the authors feel that DNA extracted for PacBio would be suitable for ONT sequencing? It is a fair assumption.

      We apologise, this is a long-read extraction approach and not specific to PacBio. We have amended the text to state this.

      In line 153, this should be removed: (Conor, could you please add the script to your GitHub page?).

      This has been fixed now.

    1. Reviewer #2 (Public review):

      Summary:

      The study introduces new tools for measuring intracellular Ca2+ concentration gradients around retinal rod bipolar cell (rbc) synaptic ribbons. This is done by comparing the Ca2+ profiles measured with mobile Ca2+ indicator dyes versus ribbon-tethered (immobile) Ca2+ indicator dyes. The Ca2+ imaging results provide a straightforward demonstration of Ca2+ gradients around the ribbon and validate their experimental strategy. This experimental work is complemented by a coherent, open-source, computational model that successfully describes changes in Ca2+ domains as a function of Ca2+ buffering. In addition, the authors try to demonstrate that there is heterogeneity among synaptic ribbons within an individual rbc terminal.

      Strengths:

      The study introduces a new set of tools for estimating Ca2+ concentration gradients at ribbon AZs, and the experimental results are accompanied by an open-source, computational model that nicely describes Ca2+ buffering at the rbc synaptic ribbon. In addition, the dissociated retinal preparation remains a valuable approach for studying ribbon synapses. Lastly, excellent EM.

      Comments on revisions:

      Specific minor comments:

      (1) Rewrite the final sentence of the Abstract. It is difficult to understand.

      (2) Add a definition in the Introduction (and revisit in the Discussion) that delineates between micro- and nano-domain. A practical approach would be to round up and round down. If you round up from 0.6 um, then it is microdomain which means ~ 1 um or higher. Likewise, round down from 0.3 um to nanodomain? If you are using confocal, or even STED, the resolution for Ca imaging will be in the 100 to 300 nm range. The point of your study is that your new immobile Ca2+-ribbon indicator may actually be operating on a tens of nm scale: nanophysiology. The Results are clearly written in a way that acknowledges this point but maybe make such a "definition" comment in the intro/discussion in order to: 1) demonstrate the power of the new Ca2+ indicator to resolve signals at the base of the ribbon (effectively nano), and 2) (Discussion) to acknowledge that some are achieving nanoscopic resolution (50 to 100nm?) with light microscopy (as you ref'd Neef et al., 2018 Nat Comm).

      (3) Suggested reference: Grabner et al. 2022 (Sci Adv, Supp video 13, and Fig S5). Here rod Cav channels are shown to be expressed on both sides of the ribbon, at its base, and they are within nanometers from other AZ proteins. This agrees with the conclusions from your imaging work.

      (4) In the Discussion, add a little more context to what is known about synaptic transmission in the outer and inner retina. First, state that the postsynaptic receptors (for example: mGluR6-OnBCs vs KARs-Off-BCs, vs. AMPAR-HCs), and possibly the synaptic cleft (ground squirrel), are known to have a significant impact on signaling in the outer retina. In the inner retina, there are many more unknowns. For example, when I think of the pioneering Palmer JPhysio study, which you cite, I think of NMDAR vs AMPAR, and uncertainty in what type of postsynaptic cell was patched (GC or AC....). Once you have informed the reader that the postsynapse is known to have a significant impact on signaling, then promote your experimental work that addresses presynaptic processes: "...the new tool and results allow us to explore release heterogeneity, ribbon by ribbon in dissociated preps, which we eventually plan to use at ribbon synapses within slices......to better understand how the presynapse shapes signaling......".

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors, Dalal, et. al., determined cryo-EM structures of open, closed, and desensitized states of the pentameric ligand-gated ion channel ELIC reconstituted in liposomes, and compared them to structures determined in varying nanodisc diameters. They argue that the liposomal reconstitution method is more representative of functional ELIC channels, as they were able to test and recapitulate channel kinetics through stopped-flow thallium flux liposomal assay. The authors and others have described channel interactions with membrane scaffold proteins (MSP), initially thought to be in a size-dependent manner. However, the authors reported that their cryo-EM ELIC structure interacts with the large nanodisc spNW25, contrary to their original hypotheses. This suggests that the channel's interactions with MSPs might alter its structure, possibly not accurately representing/reflecting functional states of the channel.

      Strengths:

      Cryo-EM structural determination from proteoliposomes is a promising methodology within the ion channel field due to their large surface area and lack of MSP or other membrane mimetics that could alter channel structure. Comparing liposomal ELIC to structures in various-sized nanodiscs gives rise to important discussions for other membrane protein structural studies when deciding the best method for individual circumstances.

      Weaknesses:

      The overarching goal of the study was to determine structural differences of ELIC in detergent nanodiscs and liposomes. Including comparisons of the results to the native bacterial lipid environment would provide a more encompassing discussion of how the determined liposome structures might or might not relate to the native receptor in its native environment. The authors stated they determined open, closed, and desensitized states of ELIC reconstituted in liposomes and suggest the desensitization gate is at the 9' region of the pore. However, no functional studies were performed to validate this statement.

      The goal of this study was to determine structures of ELIC in the same lipid environment in which its function is characterized. However, it is also worth noting that phosphatidylethanolamine and phosphatidylglycerol, two lipids used for the liposome formation, are necessary for ELIC function (PMID 36385237) and are principal lipid components of the gram-negative bacterial membranes in which ELIC is expressed.

      The desensitized structure of ELIC in liposomes shows a pore diameter at the hydrophobic L240 (9’) residue of 3.3 Å, which is anticipated to pose a large energetic barrier to the passage of ions due to the hydrophobic effect. We have included a graphical representation of pore diameters from the HOLE analysis for all liposome structures in Supplementary Figure 6B. While we have not tested the role of L240 in desensitization with functional experiments, it was shown by Gonzalez-Gutierrez and colleagues (PMID 22474383) that the L240A mutation apparently eliminates desensitization in ELIC. This finding is consistent with L240 (9’) being the desensitization gate of ELIC. We have referenced this study when discussing the desensitization gate in the Results.

      Reviewer #2 (Public review):

      Summary

      The report by Dalal and colleagues introduces a significant novelty in the field of pentameric ligand-gated ion channels (pLGICs). Within this family of receptors, numerous structures are available, but a widely recognised problem remains in assigning structures to functional states observed in biological membranes. Here, the authors obtain both structural and functional information of a pLGIC in a liposome environment. The model receptor ELIC is captured in the resting, desensitized, and open states. Structures in large nanodiscs, possibly biased by receptor-scaffold protein interactions, are also reported. Altogether, these results set the stage for the adoption of liposomes as a proxy for the biological membranes, for cryoEM studies of pLGICs and membrane proteins in general.

      Strengths

      The structural data is comprehensive, with structures in liposomes in the 3 main states (and for each, both inward-facing and outward-facing), and an agonist-bound structure in the large spNW25 nanodisc (and a retreatment of previous data obtained in a smaller disc). It adds up to a series of work from the same team that constitutes a much-needed exploration of various types of environment for the transmembrane domain of pLGICs. The structural analysis is thorough.

      The tone of the report is particularly pleasant, in the sense that the authors' claims are not inflated. For instance, a sentence such as "By performing structural and functional characterization under the same reconstitution conditions, we increase our confidence in the functional annotation of these structures." is exemplary.

      Weaknesses

      Core parts of the method are not described and/or discussed in enough detail. While I do believe that liposomes will be, in most cases, better than, say, nanodiscs, the process that leads from the protein in its membrane down to the liposome will play a big role in preserving the native structure, and should be an integral part of the report. Therefore, I strongly felt that biochemistry should be better described and discussed. The results section starts with "Optimal reconstitution of ELIC in liposomes [...] was achieved by dialysis". There is no information on why dialysis is optimal, what it was compared to, the distribution of liposome sizes using different preparation techniques, etc... Reading the title, I would have expected a couple of paragraphs and figure panels on liposome reconstitution. Similarly, potential biochemical challenges are not discussed. The methods section mentions that the sample was "dialyzed [...] over 5-7 days". In such a time window, most of the members of this protein family would aggregate, and it is therefore a protocol that can not be directly generalised. This has to be mentioned explicitly, and a discussion on why this can't be done in two days, what else the authors tested (biobeads? ... ?) would strengthen the manuscript.

      To a lesser extent, the relative lack of both technical details and of a broad discussion also pertains to the cryoEM and thallium flux results. Regarding the cryoEM part, the authors focus their analysis on reconstructions from outward-facing particles on the basis of their better resolutions, yet there was little discussion about it. Is it common for liposome-based structures? Are inward-facing reconstructions worse because of the increased background due to electrons going through two membranes? Are there often impurities inside the liposomes (we see some in the figures)? The influence of the membrane mimetics on conformation could be discussed by referring to other families of proteins where it has been explored (for instance, ABC transporters, but I'm sure there are many other examples). If there are studies in other families of channels in liposomes that were inspirational, those could be mentioned. Regarding thallium flux assays, one argument is that they give access to kinetics and set the stage for time-resolved cryoEM, but unless I missed it, neither a comparison of kinetics with other techniques, such as electrophysiology, nor references to pioneering time-resolved studies are provided.

      Altogether, in my view, an updated version would benefit from elaborating on every aspect of the methodological development. I may well be wrong, but I see this paper more as a milestone in sample preparation for cryoEM imaging than as a study of the details of the ELIC conformations.

      Additions have been made to the Results and Discussion sections elaborating on the following points: 1) reconstitution of ELIC in liposomes using dialysis, the advantage of this over other methods such as biobeads, and whether the dialysis protocol can be shortened for other less stable proteins; 2) the issue of separating outward- and inward-facing channels; 3) referencing the effect of nanodiscs on ABC transporters, structures of membrane proteins in liposomes, and pioneering time-resolved cryo-EM studies; and 4) comparison of ELIC gating kinetics with electrophysiology measurements. With regard to the first point, it should be noted that all necessary details are provided in the Methods to reproduce the experiments, including the reconstitution and the stopped-flow thallium flux assay. It is also important to note that the same proteoliposome preparation was used both for assessing function with the stopped-flow thallium flux assay and for determining the structure by cryo-EM. This is now stated in the Results.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major revisions:

      (1) The authors suggest that the desensitization gate is located at the 9' region within the pore. However, as stated by the authors, the 2' residues function as the desensitization gate in related channels. In a few of their HOLE-analyzed structures (e.g. Figure 2B and 4B), there seems to be a constriction also at 2', but this finding is not discussed in the context of desensitization. Further functional testing of mutated 9' and/or 2' gates would bolster the argument for the location of the desensitization gate.

      As stated above, we have included HOLE plots of pore radius in Supplementary Fig. 6B and referenced the study showing that the L240A mutation (9’) in ELIC (PMID 22474383) appears to eliminate desensitization. This result along with the narrow pore diameter at 9’ in the desensitized structure suggests that 9’ is likely a desensitization gate in ELIC. In contrast, mutation of Q233 (2’) to a cysteine in a previous study produced a channel that still desensitizes (PMID 25960405). Since Q233 is a hydrophilic residue in contrast to L240, Q233 probably does not pose the same energetic barrier to ion translocation as L240 based on the structure.

      (2) In discussing functional states of ELIC and ELIC5 in different reconstitution methods, the authors reference constriction sites determined by HOLE analysis software. These constriction sites were key evidence for the authors to determine functional state; however, it is difficult to discern pore sizes based on the figures. Pore diameters and a clear color designation (i.e., green vs. orange) in the figures would greatly aid their discussions.

      HOLE plots are displayed in Supplementary Fig. 6B and pore diameters are now provided in the text.

      (3) The authors had an intriguing finding that ELIC dimers are found in spNW25 scaffolds. Is there any functional evidence to suggest they could be functioning as dimers?

      There is no evidence that the function of ELIC or other pLGICs is altered by the formation of dimers of pentamers. Therefore, while this result is intriguing and likely facilitated by concentrating multiple ELIC pentamers within the nanodisc, it is not clear if these interactions have any functional importance. We have stated this in the Results.

      (4) The thallium flux assay is used to validate channel function within proteoliposomes. Because proteoliposomes are generally known to be very leaky, it would be good to have controls without ELIC added to determine baseline changes in fluorescence.

      We have established from multiple previous studies that liposomes composed of 2:1:1 POPC:POPE:POPG (PMID 36385237 and 31724949) do not show significant thallium flux as measured by the stopped-flow assay (PMID 29058195) in the absence of ELIC activity. Furthermore, in the present study, the WT ELIC data in Fig. 1A show a low thallium flux rate 60 seconds after exposure to agonist, when the ion channel has mostly desensitized. Therefore, these data also serve as a control indicating that the high thallium flux rates in response to agonist (at earlier delay times) are not due to leak, but rather to ELIC channel activity.

      Minor revisions:

      (1) Abstract and introduction. 'Liganded' should be ligand

      We removed this word and changed it to “agonist-bound” for consistency throughout the manuscript.

      (2) Inconsistent formatting of FSC graphs in Supplemental Figure 4

      The difference is a consequence of the different formatting between cryoSPARC and Relion FSC graphs.

      Reviewer #2 (Recommendations for the authors):

      Minor writing remarks:

      The present report builds on previous work from the same team, and to my eye it would be a plus if this were conveyed more explicitly. I see it as a strength to explore various developments in several papers that complement each other. For example, when citing reference 12 (Dalal 2024) in the introduction, and later when introducing ref 15 (Petroff 2022), I would have liked to be reminded of the main findings and how they fit with the new results.

      We have expanded on the Results and Discussion detailing key findings from these studies that are relevant to the current study.

      Suggestions for analysis:

      Data treatment. Maybe I missed it, but I wondered if C1 vs C5 treatment of the liposome data showed any interesting differences? When I think about the biological membrane, I picture it as a very crowded place with lots of neighbouring proteins. I would not be surprised if, similarly to what they do in discs, the receptor would tend to stick to, or bump into, anything present also in liposomes (a neighboring liposome, some undefined density inside the liposome).

      We attempted to perform C1 heterogeneous refinement jobs in cryoSPARC and C1 3D classification in Relion5. For the WT datasets, these did not produce 3D reconstructions of sufficient quality for further refinement. For ELIC5 with agonist, the C1 reconstructions did not differ from the C5 reconstructions. Furthermore, there was no evidence of dimers of pentamers from the 2D or 3D treatments, unlike what was observed in the spNW25 nanodiscs. This is likely because the density of ELIC pentamers in the liposomes was too low to capture these transient interactions. We have included this information in the Methods.

      In data treatment, we sometimes find only what we're looking for. I wondered if the authors tried to find, for instance, the open and D conformations in the resting dataset during classifications.

      This is an interesting question since some population of ELIC channels could visit a desensitized conformation in the absence of agonist and this would not be detected in our flux assay. After extensive heterogeneous refinement jobs in cryoSPARC and 3D classification jobs in Relion5, we did not detect any unexpected structures such as open/desensitized conformations in the apo dataset.

      In the analysis of the M4 motions, is there information to be gained by looking at how M4 interacts with the rest of the TMD? For instance, I wondered if the buried surface area between M4 and the rest of the TMD was changed. One could also look at M4 separately in outward-facing and inward-facing conformations (because the tension due to the bilayer will not be the same in the outer layer in both orientations; intuitively, I'd expect different levels of M4 motion).

      We have expanded our analysis of the structures as recommended. We determined the buried surface area between M4 and the rest of the channel in the liganded WT and ELIC5 structures in liposomes and nanodiscs, as well as the area between the TMD interfaces for these structures. There appears to be a pattern where liposome structures show less buried surface area between M4 and the rest of the channel, and less area at the TMD interfaces. Overall, this suggests that the liposome structures of ELIC in the open-channel or desensitized conformations are more loosely packed in the TMD compared to the nanodisc structures.

      We have also further discussed the issue of separating outward- and inward-facing conformations in the Results. The problem with classifying outward- and inward-facing orientations is that top/down or tilted views of the particles cannot be easily distinguished as coming from channels in one orientation or the other, unless there are conformational differences between outward- and inward-facing channels that would allow for their separation during 3D heterogeneous refinement or 3D classification. Furthermore, since the inward-facing reconstructions are of much lower resolution than the outward-facing reconstructions, we suspect that these particles are more heterogeneous possibly containing junk, multiple conformations, or particles that are both inward- and outward-facing. On the other hand, the outward-facing structures are of good quality, and therefore we are more confident that these come from a more homogeneous set of particles that are likely outward-facing (Note that most particles are outward facing based on side views of the 2D class averages). That said, when examining the conformation of M4 in outward- and inward-facing structures, we do not see any significant differences with the caveat that the inward-facing structures are of poor quality and that inward- and outward-facing particles may not have been well-separated.

    1. Reviewer #3 (Public review):

      A bias in how people infer the amount of control they have over their environment is widely believed to be a key component of several mental illnesses including depression, anxiety, and addiction. Accordingly, this bias has been a major focus in computational models of those disorders. However, all of these models treat control as a unidimensional property, roughly, how strongly outcomes depend on action. This paper proposes---correctly, I think---that the intuitive notion of "control" captures multiple dimensions in the relationship between action and outcome. In particular, the authors identify one key dimension: the degree to which outcome depends on how much *effort* we exert, calling this dimension the "elasticity of control". They additionally argue that this dimension (rather than the more holistic notion of controllability) may be specifically impaired in certain types of psychopathology. This idea has the potential to change how we think about several major mental disorders in a substantial way and can additionally help us better understand how healthy people navigate challenging decision-making problems. More concisely, it is a very good idea.

      Unfortunately, my view is that neither the theoretical nor empirical aspects of the paper really deliver on that promise. In particular, most (perhaps all) of the interesting claims in the paper have weak empirical support.

      Starting with theory, the authors do not provide a strong formal characterization of the proposed notion of elasticity. There are existing, highly general models of controllability (e.g., Huys & Dayan, 2009; Ligneul, 2021) and the elasticity idea could naturally be embedded within one of these frameworks. The authors gesture at this in the introduction; however, this formalization is not reflected in the implemented model, which is highly task-specific. Moreover, the authors present elasticity as if it is somehow "outside of" the more general notion of controllability. However, effort and investment are just specific dimensions of action; and resources like money, strength, and skill (the "highly trained birke") are just specific dimensions of state. Accordingly, the notion of elasticity is necessarily implicitly captured by the standard model. Personally, I am compelled by the idea that effort and resource (and therefore elasticity) are particularly important dimensions, ones that people are uniquely tuned to. However, by framing elasticity as a property that is different in kind from controllability (rather than just a dimension of controllability), the authors only make it more difficult to integrate this exciting idea into generalizable models.

      Turning to experiment, the authors make two key claims: (1) people infer the elasticity of control, and (2) individual differences in how people make this inference are importantly related to psychopathology.

      Starting with claim 1, there are three subclaims here; implicitly, the authors make all three. (1A) People's behavior is sensitive to differences in elasticity, (1B) people actually represent/track something like elasticity, and (1C) people do so naturally as they go about their daily lives. The results clearly support 1A. However, 1B and 1C are not strongly supported.

      (1B) The experiment cannot support the claim that people represent or track elasticity because effort is the only dimension over which participants can engage in any meaningful decision-making. The other dimension, selecting which destination to visit, simply amounts to selecting the location where you were just told the treasure lies. Thus, any adaptive behavior will necessarily come out in a sensitivity to how outcomes depend on effort.

      Notes on rebuttal: The argument that vehicle/destination choice is not trivial because people occasionally didn't choose the instructed location is not compelling to me; if anything, the exclusion rate is unusually low for online studies. The finding that people learn more from non-random outcomes is helpful, but this could easily be cast as standard model-based learning very much like what one measures with the Daw two-step task (nothing specific to control here). Their final argument is the strongest, that to explain behavior the model must assume "a priori that increased effort could enhance control." However, more literally, the necessary assumption is that each attempt increases the probability of success (e.g., you're more likely to get heads in two flips than in one). I suppose you can call that "elasticity inference", but I would call it basic probabilistic reasoning.

      For 1C, the claim that people infer elasticity outside of the experimental task cannot be supported because the authors explicitly tell people about the two notions of control as part of the training phase: "To reinforce participants' understanding of how elasticity and controllability were manifested in each planet, [participants] were informed of the planet type they had visited after every 15 trips." (line 384).

      Notes on rebuttal: The authors try to retreat, saying "our research question was whether people can distinguish between elastic and inelastic controllability." I struggle to reconcile this with the claim in the abstract "These findings establish the elasticity of control as a distinct cognitive construct guiding adaptive behavior". That claim is the interesting one, and the one I am evaluating the evidence in light of.

      Finally, I turn to claim 2, that individual differences in how people infer elasticity are importantly related to psychopathology. There is much to say about the decision to treat psychopathology as a unidimensional construct (the authors claim otherwise, but see Fig 6C). However, I will keep it concrete and simply note that CCA (by design) obscures the relationship between any two variables. Thus, as suggestive as Figure 6B is, we cannot conclude that there is a strong relationship between Sense of Agency (SOA) and the elasticity bias---this result is consistent with any possible relationship (even a negative one). As it turns out, Figure S3 shows that there is effectively no relationship (r=0.03).

      Notes on rebuttal: The authors argue for CCA by appeal to the need to "account for the substantial variance that is typically shared among different forms of psychopathology". I agree. A simple correlation would indeed be fairly weak evidence. Strong evidence would show a significant correlation after *controlling for* other factors (e.g. a regression predicting elasticity bias from all subscales simultaneously). CCA effectively does the opposite, asking whether, with the help of all the parameters and all the surveys, one can find any correlation between the two sets of variables. The results are certainly suggestive, but they provide very little statistical evidence that the elasticity parameter is meaningfully related to any particular dimension of psychopathology.
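      A minimal sketch of the kind of analysis suggested above (regressing the elasticity bias on all subscales simultaneously) is shown below. It is purely illustrative: the function name, the two subscales and the simulated data are hypothetical and are not taken from the study. It shows how a subscale can correlate with the outcome only through variance it shares with another subscale, and how the simultaneous regression assigns it a slope of zero once that shared variance is controlled for.

```python
import numpy as np


def simultaneous_regression(y, predictors):
    """Ordinary least squares of y on all predictor columns at once (plus intercept).

    Each returned slope is the association of that predictor with y while holding
    the other predictors fixed, i.e. after controlling for them.
    """
    X = np.asarray(predictors, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef[0], coef[1:]  # intercept, slopes


# Hypothetical data: two subscales sharing variance; the outcome depends only on subscale A.
rng = np.random.default_rng(0)
shared = rng.normal(size=500)
subscale_a = shared + 0.5 * rng.normal(size=500)
subscale_b = shared + 0.5 * rng.normal(size=500)
elasticity_bias = 1.5 * subscale_a

intercept, slopes = simultaneous_regression(
    elasticity_bias, np.column_stack([subscale_a, subscale_b])
)
simple_r_b = np.corrcoef(elasticity_bias, subscale_b)[0, 1]
print(f"simple r with subscale B: {simple_r_b:.2f}")
print(f"slopes after controlling: {slopes.round(3)}")
```

      With real data one would also need standard errors or permutation tests for each slope; this sketch only shows the point estimates.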

      There is also a feature of the task that limits our ability to draw strong conclusions about individual differences about elasticity inference. In the original submission, the authors stated that the study was designed to be "especially sensitive to overestimation of elasticity". A straightforward consequence of this is that the resulting *empirical* estimate of estimation bias (i.e., the gamma_elasticity parameter) is itself biased. This immediately undermines any claim that references the directionality of the elasticity bias (e.g. in the abstract). Concretely, an undirected deficit such as slower learning of elasticity would appear as a directed overestimation bias.

      When we further consider that elasticity inference is the only meaningful learning/decision-making problem in the task (argued above), the situation becomes much worse. Many general deficits in learning or decision-making would be captured by the elasticity bias parameter. Thus, a conservative interpretation of the results is simply that psychopathology is associated with impaired learning and decision-making.

      Notes on rebuttal: I am very concerned to see that the authors removed the discussion of this limitation in response to my first review. I quote the original explanation here:

      - In interpreting the present findings, it needs to be noted that we designed our task to be especially sensitive to overestimation of elasticity. We did so by giving participants free 3 tickets at their initial visits to each planet, which meant that upon success with 3 tickets, people who overestimate elasticity were more likely to continue purchasing extra tickets unnecessarily. Following the same logic, had we first had participants experience 1 ticket trips, this could have increased the sensitivity of our task to underestimation of elasticity in elastic environments. Such underestimation could potentially relate to a distinct psychopathological profile that more heavily loads on depressive symptoms. Thus, by altering the initial exposure, future studies could disambiguate the dissociable contributions of overestimating versus underestimating elasticity to different forms of psychopathology.

      The logic of this paragraph makes perfect sense to me. If you assume low elasticity, you will infer that you could catch the train with just one ticket. However, when elasticity is in fact high, you would find that you don't catch the train, leading you to quickly infer high elasticity and eliminating the bias. In contrast, if you assume high elasticity, you will continue purchasing three tickets and will never have the opportunity to learn that you could be purchasing only one; the bias remains.

      The authors attempt to argue that this isn't happening using parameter recovery. However, they only report the *correlation* in the parameter, whereas the critical measure is the *bias*. Furthermore, in parameter recovery, the data-generating and data-fitting models are identical; this will yield the best possible recovery results. Although finding no bias in this setting would support the claims, it cannot outweigh the logical argument for the bias that they originally laid out. Finally, parameter recovery should be performed across the full range of plausible parameter values; using fitted parameters (a detail I could only determine by reading the code) yields biased results because the fitted parameters are themselves subject to the bias (if present). That is, if true low elasticity is inferred as high elasticity, then you will not have any examples of low elasticity in the fitted parameters and will not detect the inability to recover them.
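      The difference between recovery correlation and recovery bias can be made concrete with a small sketch. The variable name `true_gamma` loosely mirrors the gamma_elasticity parameter discussed above, but the recovery mapping is entirely hypothetical: it assumes a fit that moves every true value halfway toward +1. Recovery then looks perfect by correlation while every estimate is inflated, and the inflation is largest for low true values, which a grid over the full range exposes but a set of already-fitted values might not contain.

```python
import numpy as np


def recovery_summary(true_params, recovered_params):
    """Return (correlation, mean bias) between generating and recovered parameters."""
    t = np.asarray(true_params, dtype=float)
    r = np.asarray(recovered_params, dtype=float)
    correlation = np.corrcoef(t, r)[0, 1]
    bias = np.mean(r - t)  # positive = systematic overestimation
    return correlation, bias


# Generating values on a grid covering the full plausible range, not fitted values.
true_gamma = np.linspace(-1.0, 1.0, 21)
# Hypothetical recovery that pulls every estimate halfway toward high elasticity.
recovered_gamma = 0.5 + 0.5 * true_gamma

corr, bias = recovery_summary(true_gamma, recovered_gamma)
low = true_gamma < 0
low_bias = np.mean(recovered_gamma[low] - true_gamma[low])
print(f"correlation = {corr:.3f}, mean bias = {bias:.3f}, bias for true gamma < 0 = {low_bias:.3f}")
```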

      Minor comments:

      Below are things to keep in mind.

      The statistical structure of the task is inconsistent with the framing. In the framing, participants can make either one or two second boarding attempts (jumps) by purchasing extra tickets. The additional attempt(s) will thus succeed with probability p for one ticket and 2p - p^2 for two tickets; the p^2 captures the fact that you only take the second attempt if you fail on the first. A consequence of this is buying more tickets has diminishing returns. In contrast, in the task, participants always jumped twice after purchasing two tickets, and the probability of success with two tickets was exactly double that with one ticket. Thus, if participants are applying an intuitive causal model to the task, they will appear to "underestimate" the elasticity of control. I don't think this seriously jeopardizes the key results, but any follow-up work should ensure that the task's structure is consistent with the intuitive causal model.
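      The gap between the two probability structures can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the cap at 1 in the linear version is an assumption for values of p above 0.5, which the review does not discuss. For any 0 < p < 1, two sequential attempts succeed with probability 2p - p^2, which is always smaller than the 2p implied by the task, so a participant reasoning with the sequential model would expect a smaller benefit from the second ticket (p - p^2) than the task actually delivers (p).

```python
def p_success_sequential(p, attempts):
    """Intuitive causal model: each extra attempt is taken only if the previous ones
    failed, and each succeeds independently with probability p."""
    return 1.0 - (1.0 - p) ** attempts


def p_success_linear(p, attempts):
    """Task structure as described in the review: success probability is exactly
    proportional to the number of extra tickets (capped at 1 here as an assumption)."""
    return min(1.0, attempts * p)


for p in (0.2, 0.4):
    seq = p_success_sequential(p, 2)
    lin = p_success_linear(p, 2)
    print(f"p={p}: sequential {seq:.2f} (= 2p - p^2), linear {lin:.2f}")
```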

      The model is heuristically defined and does not reflect Bayesian updating. For example, it overestimates maximum control by not using losses with fewer than 3 tickets (intuitively, the inference here depends on your beliefs about elasticity). Including forced three-ticket trials at the beginning of each round makes this less of an issue, but if you want to remove those trials, you might need to adjust the model. The need to introduce the modified model with kappa is likely another symptom of the heuristic nature of the model's updating equations.

    1. Author response:

      The following is the authors’ response to the original reviews

      Summary of our revisions

      (1) We have explained why the untrained RNN with readout (value-weight) learning alone could not learn the simple task well. Because we trained the models continuously across trials with random inter-trial intervals, rather than separately for each episodic trial, the activity of the untrained RNN upon cue presentation differs from trial to trial, and so it was not trivial for the models to recognize that cue presentations in different trials constitute the same single state (Line 177-185).

      (2) We have shown that dimensionality was higher in the value-RNNs than in the untrained RNN (Fig. 2K,6H).

      (3) We have shown that even when a distractor cue was introduced, the value-RNNs could learn the task (Fig. 10).

      (4) We have shown that extended value-RNNs incorporating excitatory and inhibitory units and conforming to the Dale's law could still learn the tasks (Fig. 9,10-right column).

      (5) In the original manuscript, the non-negatively constrained value-RNN showed loose alignment of value-weight and random feedback from the beginning but did not show further alignment over trials. We have clarified the reason for this and found that introducing a slight decay (forgetting) makes further alignment occur (Fig. 8E,F).

      (6) We have shown that the value-RNNs could learn the tasks with longer cue-reward delay (Fig. 2M,6J) or action selection (Fig. 11), and found cases where random feedback performed worse than symmetric feedback.

      (7) We compared our value-RNNs with e-prop (Bellec et al., 2020, Nat Commun). While e-prop incorporates the effects of changes in RNN weights across distant times through an "eligibility trace", our value-RNNs do not. We consider that our models can still learn the tasks with cue-reward delay because they use the TD error, and TD learning itself, even TD(0) without an eligibility trace, is a solution to temporal credit assignment. In fact, TD error-based e-prop was also examined, but results were shown only with symmetric feedback, not with random feedback (their Fig. 4,5), whereas results with random feedback were shown for a different setup, reward-based e-prop without TD error (their SuppFig. 5). We have noted these points in Line 695-711 (and also partly in Line 96-99).

      (8) In the original manuscript, we emphasized only the spatial locality (random rather than symmetric feedback) of our learning rule. But we have now also emphasized the temporal locality (online learning) as it is also crucial for bio-plausibility and critically different from the original value-RNN with BPTT. We also changed the title.

      (9) We have realized that our estimation of true state values was invalid (as detailed on page 34 of this document). Effects of this error on performance comparisons were small, but we apologize for this error.
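      Point (7) rests on the fact that TD learning can propagate value from reward back to an earlier cue without an eligibility trace. A minimal tabular sketch of TD(0) on a deterministic cue-to-reward chain is shown below; it is not the authors' value-RNN, it is trained episodically rather than continuously, and the chain length, discount factor and learning rate are arbitrary assumptions. Each update uses only the current one-step TD error, yet after repeated trials the cue state acquires the discounted reward value gamma^(n-1).

```python
import numpy as np


def td0_chain(n_states=4, reward=1.0, gamma=0.9, alpha=0.1, n_trials=2000):
    """Tabular TD(0) on a chain cue (state 0) -> ... -> last state -> reward.

    Reward arrives on leaving the last state, after which the trial ends.
    No eligibility trace: each update uses only the one-step TD error delta.
    """
    V = np.zeros(n_states)
    for _ in range(n_trials):
        for s in range(n_states):
            is_last = s == n_states - 1
            r = reward if is_last else 0.0
            v_next = 0.0 if is_last else V[s + 1]
            delta = r + gamma * v_next - V[s]  # TD error (RPE)
            V[s] += alpha * delta
    return V


V = td0_chain()
print(V.round(3))  # approaches [gamma^3, gamma^2, gamma, 1] * reward
```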

      Reviewer #1 (Public review):

      Summary:

      Can a plastic RNN serve as a basis function for learning to estimate value? In previous work this was shown to be the case, with a similar architecture to that proposed here. The learning rule in previous work was back-prop with an objective function that was the TD error function (delta) squared. Such a learning rule is non-local, as the changes in weights within the RNN, and from inputs to the RNN, depend on the weights from the RNN to the output, which estimates value. This is non-local, and in addition, these weights themselves change over learning. The main idea in this paper is to examine whether replacing the values of these non-local changing weights, used for credit assignment, with random fixed weights can still produce results similar to those obtained with complete bp. This random feedback approach is motivated by a similar approach used for deep feed-forward neural networks.
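      The substitution summarized above can be illustrated on a single hidden layer. In the sketch below (a simplification for illustration, not the authors' RNN or their exact update rule), the value is v = w · x with hidden activity x = sigmoid(A u). The semi-gradient of the squared TD error with respect to A routes the TD error delta back through the readout weights w; the random-feedback variant simply passes a fixed random vector c in place of w, so the hidden-layer update no longer depends on the changing readout weights. The function name and learning rate are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def hidden_update(A, u, delta, feedback, lr=0.1):
    """One semi-gradient TD step for input weights A of hidden layer x = sigmoid(A @ u).

    feedback = w (readout weights)     -> backprop-style credit assignment
    feedback = fixed random vector c   -> random-feedback variant
    """
    x = sigmoid(A @ u)
    credit = feedback * x * (1.0 - x)  # per-unit credit signal
    return A + lr * delta * np.outer(credit, u)


rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))  # input -> hidden weights
u = rng.normal(size=3)       # input (e.g. cue)
w = rng.normal(size=5)       # readout (value) weights, changing during learning
c = rng.normal(size=5)       # fixed random feedback, never updated
delta = 0.5                  # TD error computed from v = w @ sigmoid(A @ u)

A_backprop = hidden_update(A, u, delta, feedback=w)
A_random_fb = hidden_update(A, u, delta, feedback=c)
```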

      This work shows that this random feedback in credit assignment performs well, but not as well as the precise gradient-based approach. When more constraints due to biological plausibility are imposed, performance degrades. These results are not surprising given previous results on random feedback. This work is incomplete because the delay times used were only a few time steps, and it is not clear how well random feedback would operate with longer delays. Additionally, the examples simulated with a single cue and a single reward are overly simplistic, and the field should move beyond these exceptionally simple examples.

      Strengths:

      • The authors show that random feedback can approximate well a model trained with detailed credit assignment.

      • The authors simulate several experiments including some with probabilistic reward schedules and show results similar to those obtained with detailed credit assignments as well as in experiments.

      • The paper examines the impact of more biologically realistic learning rules and the results are still quite similar to the detailed back-prop model.

      Weaknesses:

      *Please note that we numbered your public review comments and recommendations for the authors as Pub1 and Rec1 etc. so that we can refer to them in our replies to other comments.

      Pub1. The authors also show that an untrained RNN does not perform as well as the trained RNN. However, they never explain what they mean by an untrained RNN. It should be clearly explained.

      These results are actually surprising. An untrained RNN with enough units and a sufficiently large variance of recurrent weights can have high dimensionality and generate a complete or nearly complete basis, though not an orthonormal one (e.g., Rajan & Abbott 2006). It should be possible to use such a basis to learn this simple classical conditioning paradigm. It would be useful to measure the dimensionality of network dynamics in both trained and untrained RNNs.

      We have added an explanation of untrained RNN in Line 144-147:

      “As a negative control, we also conducted simulations in which these connections were not updated from initial values, referring to as the case with "untrained (fixed) RNN". Notably, the value weights w (i.e., connection weights from the RNN to the striatal value unit) were still trained in the models with untrained RNN.”

      We have also analyzed the dimensionality of network dynamics by calculating the contribution ratios of each principal component of the trajectory of RNN activities. This revealed that the contribution ratios of later principal components were smaller in the cases with untrained RNN than in the cases with trained value RNN. We have added these results in Fig. 2K and Line 210-220 (for our original models without non-negative constraint):

      “In order to examine the dimensionality of RNN dynamics, we conducted principal component analysis (PCA) of the time series (for 1000 trials) of RNN activities and calculated the contribution ratios of PCs in the cases of oVRNNbp, oVRNNrf, and untrained RNN with 20 RNN units. Figure 2K shows a log of contribution ratios of 20 PCs in each case. Compared with the case of untrained RNN, in oVRNNbp and oVRNNrf, initial component(s) had smaller contributions (PC1 (t-test p = 0.00018 in oVRNNbp; p = 0.0058 in oVRNNrf) and PC2 (p = 0.080 in oVRNNbp; p = 0.0026 in oVRNNrf)) while later components had larger contributions (PC3~10,15~20 p < 0.041 in oVRNNbp; PC5~20 p < 0.0017 in oVRNNrf) on average, and this is considered to underlie their superior learning performance. We noticed that late components had larger contributions in oVRNNrf than in oVRNNbp, although these two models with 20 RNN units were comparable in terms of cue~reward state values (Fig. 2J-left).”

      and Fig. 6H and Line 412-416 (for our extended models with non-negative constraint):

      “Figure 6H shows contribution ratios of PCs of the time series of RNN activities in each model with 20 RNN units. Compared with the cases with naive/shuffled untrained RNN, in oVRNNbp-rev and oVRNNrf-bio, later components had relatively high contributions (PC5~20 p < 1.4×10^−6 (t-test vs naive) or < 0.014 (vs shuffled) in oVRNNbp-rev; PC6~20 p < 2.0×10^−7 (vs naive) or PC7~20 p < 5.9×10^−14 (vs shuffled) in oVRNNrf-bio), explaining their superior value-learning performance.”
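      The contribution ratios referred to in the quoted passages are the fractions of variance explained by each principal component of the activity time series. A minimal NumPy sketch of that computation is given below, assuming activity is stored as a (time steps × units) array; it is not the authors' code, and the random placeholder activity stands in for real RNN output.

```python
import numpy as np


def pc_contribution_ratios(activity):
    """Fraction of total variance carried by each principal component.

    activity: array of shape (time_steps, n_units), e.g. RNN activity over trials.
    Returns ratios in descending order; they sum to 1.
    """
    X = np.asarray(activity, dtype=float)
    X = X - X.mean(axis=0)                   # centre each unit
    s = np.linalg.svd(X, compute_uv=False)   # singular values, descending
    var = s ** 2
    return var / var.sum()


rng = np.random.default_rng(0)
activity = rng.normal(size=(1000, 20))       # placeholder for 20-unit RNN activity
log_ratios = np.log(pc_contribution_ratios(activity))  # log contribution ratio per PC
```

      A lower-dimensional trajectory concentrates variance in the first few components, so the log ratios of later components fall off more steeply.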

      Regarding the poor performance of the model with untrained RNN, we would like to add a note. It is certainly true that an untrained RNN with sufficient dimensions should be able to represent just <10 different states well, and that state values should be learnable through TD learning regardless of the representation used. However, a difficulty (nontriviality) lies in the fact that, because we modeled the tasks in a continuous rather than an episodic way, the activity of the untrained RNN upon cue presentation generally differs from trial to trial. Therefore, it was not trivial for the RNN to know that cue presentations in different trials, even after inter-trial intervals of random length, should constitute the same single state. We have added this note in Line 177-185:

      “This inferiority of untrained RNN may sound odd because there were only four states from cue to reward while random RNN with enough units is expected to be able to represent many different states (c.f., [49]) and the effectiveness of training of only the readout weights has been shown in reservoir computing studies [50-53]. However, there was a difficulty stemming from the continuous training across trials (rather than episodic training of separate trials): the activity of untrained RNN upon cue presentation generally differed from trial to trial, and so it is non-trivial that cue presentation in different trials should be regarded as the same single state, even if it could eventually be dealt with at the readout level if the number of units increases.”

      The original value RNN study (Hennig et al., 2023, PLoS Comput Biol) also modeled tasks in a continuous way (though using backprop-through-time (BPTT) for training) and their model with untrained RNN also showed considerably larger RPE error than the value RNN even when the number of RNN units was 100 (the maximum number plotted in their Fig. 6A).
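      The point about continuous training can be illustrated with a small simulation. The sketch assumes a hypothetical tanh RNN with fixed random weights. Its settings (N = 20, recurrent gain 0.9, inter-trial intervals of 4 to 9 steps) are illustrative, not the manuscript's. It compares the state evoked by the cue in two regimes: when the network is reset before every trial (episodic), and when it runs on across trials (continuous).

```python
import numpy as np

rng = np.random.default_rng(1)
N = 20                                          # number of RNN units (illustrative)
A = rng.normal(0.0, 0.9 / np.sqrt(N), (N, N))   # fixed random recurrent weights
B = rng.normal(0.0, 1.0, (N, 2))                # input weights for [cue, reward]
CUE, REWARD, NONE = np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.zeros(2)

def step(x, u):
    """One update of the untrained (never modified) RNN."""
    return np.tanh(A @ x + B @ u)

def cue_evoked_states(n_trials, continuous):
    x = np.zeros(N)
    states = []
    for _ in range(n_trials):
        if continuous:
            for _ in range(rng.integers(4, 10)):   # random inter-trial interval
                x = step(x, NONE)
        else:
            x = np.zeros(N)                         # episodic: reset every trial
        x = step(x, CUE)
        states.append(x.copy())                     # state at cue presentation
        for u in (NONE, NONE, REWARD):              # reward 3 steps after the cue
            x = step(x, u)
    return np.array(states)

def spread(states):
    """Mean across units of the across-trial standard deviation."""
    return states.std(axis=0).mean()

print("episodic:  ", spread(cue_evoked_states(200, continuous=False)))
print("continuous:", spread(cue_evoked_states(200, continuous=True)))
```

In the episodic case the spread is zero up to rounding, because every trial starts from the same state. In the continuous case it is positive, so the readout has to map a family of different activity vectors onto one cue state.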

      Pub2. The impact of the article is limited by using a network with discrete time-steps, and only a small number of time steps from stimulus to reward. What is the length of each time step? If it's on the order of the membrane time constant, then a few time steps are only tens of ms. In the classical conditioning experiments typical delays are of the order of hundreds of milliseconds to seconds. Authors should test if random feedback weights work as well for larger time spans. This can be done by simply using a much larger number of time steps.

      In the revised manuscript, we examined the cases in which the cue-reward delay (originally 3 time steps) was elongated to 4, 5, or 6 time-steps. Our online value RNN models with random feedback could still achieve better performance (smaller squared value error) than the models with untrained RNN, although the performance degraded as the cue-reward delay increased. We have added these results in Fig. 2M and Line 223-228 (for our original models without non-negative constraint):

      “We further examined the cases with longer cue-reward delays. As shown in Fig. 2M, as the delay increased, the mean squared error of state values (at 3000-th trial) increased, but the relative superiority of oVRNNbp and oVRNNrf over the model with untrained RNN remained to hold, except for cases with small number of RNN units (5) and long delay (5 or 6) (p < 0.0025 in Wilcoxon rank sum test for oVRNNbp or oVRNNrf vs untrained for each number of RNN units for each delay).”

      and Fig. 6J and Line 422-429 (for our extended models with non-negative constraint):

      “Figure 6J shows the cases with longer cue-reward delays, with default or halved learning rates. As the delay increased, the mean squared error of state values (at 3000-th trial) increased, but the relative superiority of oVRNNbp-rev and oVRNNrf-bio over the models with untrained RNN remained to hold, except for a few cases with 5 RNN units (5 delay oVRNNrf-bio vs shuffled with default learning rate, 6 delay oVRNNrf-bio vs naive or shuffled with halved learning rate) (p < 0.047 in Wilcoxon rank sum test for oVRNNbp-rev or oVRNNrf-bio vs naive or shuffled untrained for each number of RNN units for each delay).”

      We have also added, in Line 136-142, the note on our assumption about the length of each time step that we described in our provisional reply:

      “We assumed that a single RNN unit corresponds to a small population of neurons that intrinsically share inputs and outputs, for genetic or developmental reasons, and the activity of each unit represents the (relative) firing rate of the population. Cortical population activity is suggested to be sustained not only by fast synaptic transmission and spiking but also, even predominantly, by slower synaptic neurochemical dynamics [46] such as short-term facilitation, whose time constant can be around 500 milliseconds [47]. Therefore, we assumed that single time-step of our rate-based (rather than spike-based) model corresponds to 500 milliseconds.”
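      Under this assumption, the simulated delays translate into real time as follows. This is simple arithmetic from the 500 ms per step stated above.

```latex
\begin{aligned}
3~\text{steps} \times 500~\text{ms} &= 1.5~\text{s} \quad (\text{default cue-reward delay})\\
6~\text{steps} \times 500~\text{ms} &= 3.0~\text{s} \quad (\text{longest delay tested})
\end{aligned}
```

Both values fall within the hundreds-of-milliseconds-to-seconds range that the reviewer gives for classical conditioning.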

      Pub3. In the section with more biologically constrained learning rules, while the output weights are restricted to only be positive (as well as the random feedback weights), the recurrent weights and weights from input to RNN are still bi-polar and can change signs during learning. Why is the constraint imposed only on the output weights? It seems reasonable that the whole setup will fail if the recurrent weights were only positive as in such a case most neurons will have very similar dynamics, and the network dimensionality would be very low. However, it is possible that only negative weights might work. It is unclear to me how to justify that bipolar weights that change sign are appropriate for the recurrent connections and inappropriate for the output connections. On the other hand, an RNN with excitatory and inhibitory neurons in which weight signs do not change could possibly work.

      We examined extended models that incorporated inhibitory and excitatory units and followed Dale's law with certain assumptions, and found that these models could still learn the tasks. We have added these results in Fig. 9 and subsection “4.1 Models with excitatory and inhibitory units”, and described the details of the extended models in Line 844-862.

      Pub4. Like most papers in the field this work assumes a world composed of a single cue. In the real world there are many more cues than rewards, some cues are not associated with any rewards, and some are associated with other rewards or even punishments. In the simplest case, it would be useful to show that this network could actually work if there are additional distractor cues that appear at random either before the CS, or between the CS and US. There are good reasons to believe such distractor cues will be fatal for an untrained RNN, but might work with a trained RNN, either using BPTT or random feedback. Although this assumption is a common flaw in most work in the field, we should no longer ignore these slightly more realistic scenarios.

      We examined the performance of the models in a task in which a distractor cue appeared at random. Our model with random feedback, as well as the model with backprop, could still learn the state values much better than the models with untrained RNN. We have added these results in Fig. 10 and subsection “4.2 Task with distractor cue”.

      Reviewer #1 (Recommendations for the authors):

      Detailed comments to authors

      Rec1. Are the untrained RNNs discussed in methods? It seems quite good in estimating value but has a strong dopamine response at time of reward. Is nothing trained in the untrained RNN, or are the W values trained? Untrained RNNs are not bad at estimating value, but not as good as the two other options. It would seem reasonable that an untrained RNN (if I understand what it is) will be sufficient for such simple Pavlovian conditioning paradigms. This is provided that the RNN generates a complete, or nearly complete basis. Random RNNs, provided that the random weights are chosen properly, can indeed generate a nearly complete basis. Once there is a nearly complete temporal basis, it seems that a powerful enough learning rule will be able to learn the very simple Pavlovian conditioning. Since there are only 3 time-steps from cue to reward, an RNN dimensionality of 3 would be sufficient. A failure to get a good approximation can also arise from the failure of the learning algorithm for the output weights (W).

      As we mentioned in our reply to your public comment Pub1 (page 3-5), we have added an explanation of "untrained RNN" (in which the value weights were still learnt) (Line 144-147). We also analyzed the dimensionality of network dynamics by calculating the contribution ratios of principal components of the trajectory of RNN activities, showing that the contribution ratios of later principal components were smaller in the cases with untrained RNN than in the cases with trained value RNN (Fig. 2K/Line 210-220, Fig. 6H/Line 412-416). Moreover, as also mentioned in our reply to Pub1, we have added a note that even learning a small number of states was not trivially easy. This is because we considered continuous learning across trials rather than episodic learning of separate trials. Thus it was not trivial for the model to know that cue presentations in different trials, after random lengths of inter-trial interval, should still be regarded as the same single state (Line 177-185).

      Rec2. For all cases, it will be useful to estimate the dimensionality of the RNN. Is the dimensionality of the untrained RNN smaller than in the trained cases? If this is the case, this might depend on the choice of the initial random (I assume) recurrent connectivity matrix.

      As mentioned above, we have analyzed the dimensionality of the network dynamics. As you suggested, the dimensionality in the model with untrained RNN (whose recurrent connections were indeed the initial random matrix, as you assumed) was on average smaller than in the trained value RNN models (Fig. 2K/Line 210-220, Fig. 6H/Line 412-416).

      Rec3. It is surprising that the error starts increasing for more RNN units above ~15. See discussion. This might indicate a failure to adjust the learning parameters of the network rather than a true and interesting finding.

      Thank you very much for this insightful comment. In the original manuscript, we set the learning rate to a fixed value (0.1), without normalization by the squared norm of the feature vector (as we mentioned in Line 656-7 of the original manuscript), because we thought such a normalization could not be implemented locally (biologically). However, we have realized that the lack of normalization resulted in an excessively large learning rate when the number of RNN units was large, which could cause instability and the error increase, as you suggested. Therefore, in the revised manuscript, we have implemented a normalization of the learning rate (of the value weights) that does not require non-local computations, specifically, division by the number of RNN units. As a result, the error now decreased monotonically as the number of RNN units increased in the non-negatively constrained models (Fig. 6E-left). It also largely did so in the unconstrained model with random feedback, although still not in the unconstrained model with backprop or with untrained RNN (Fig. 2J-left).
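      The normalization described above can be sketched as follows. The sketch assumes a linear value readout V = w·x updated by TD(0). The discount factor and the toy activities are hypothetical, while α = 0.1 is the fixed learning rate mentioned above. For an applied step size α_eff, the quantity α_eff·‖x‖² is how much one update changes V(x) per unit of TD error. For a plain delta-rule update this factor should stay below 2 to avoid overshooting, which suggests why an unnormalized rate became unstable as the number of units grew.

```python
import numpy as np

def td0_update(w, x_prev, x_curr, reward, alpha=0.1, gamma=0.8):
    """Online TD(0) update of value weights with the step size divided by N.

    Dividing by N = len(w), a fixed constant, needs no access to the other
    units' current activities, unlike dividing by ||x_prev||^2.
    gamma = 0.8 is a placeholder value, not taken from the manuscript.
    """
    n_units = len(w)
    delta = reward + gamma * (w @ x_curr) - (w @ x_prev)   # TD error (RPE)
    return w + (alpha / n_units) * delta * x_prev, delta

# Change in V(x) per unit TD error, with and without dividing by N.
rng = np.random.default_rng(0)
factors = {}
for n in (5, 20, 100):
    x = rng.uniform(0.0, 1.0, n)          # toy non-negative activities
    factors[n] = (0.1 * (x @ x), (0.1 / n) * (x @ x))
    print(n, "units: unnormalized %.3f, divided by N %.3f" % factors[n])
```

Without normalization the factor grows roughly in proportion to N (about 3.3 for 100 units here). With division by N it stays near α/3 for activities uniform on [0, 1].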

      Rec4. Not numbering equations is a problem. For example, the explanations of feedback alignment (lines 194-206) rely on equations in the methods section which are not numbered. This makes it hard to read these explanations. Indeed, it will also be better to include a detailed derivation of the explanation in these lines in a mathematical appendix. Key equations should be numbered.

      We have added numbers to key equations in the Methods, and references to the numbers of corresponding equations in the main text. Detailed derivations are included in the Methods.

      Rec5. What is shown in Figure 3C? - an equation will help.

      We have added an explanation using equations in the main text (Line 256-259).

      Rec6. The explanation of why alignment occurs is not satisfactory, but neither is it in previous work on feedforward networks. The least that should be done though

      Regarding why alignment occurs, one point remained mysterious to us. In the non-negatively constrained model, the angle between the value weight vector (w) and the random feedback vector (c) was relatively close (loosely aligned) from the beginning. However, as mentioned in the manuscript, there appeared to be no further alignment over trials, even though the same mechanism for feedback alignment that we derived for the model without non-negative constraint was expected to operate also under the non-negative constraint. We have now clarified the reason for this and found a way, the introduction of a slight decay (forgetting) of value weights, by which feedback alignment came to occur in the non-negatively constrained model. We have added these in the revised manuscript (Line 463-477):

      “As mentioned above, while the angle between w and c was on average smaller than 90° from the beginning, there was no further alignment over trials. This seemed mysterious because the mechanism for feedback alignment that we derived for the models without non-negative constraint was expected to work also for the models with non-negative constraint. As a possible reason for the non-occurrence of feedback alignment, we guessed that one or a few element(s) of w grew prominently during learning, and so w became close to an edge or boundary of the non-negative quadrant and thereby angle between w and other vector became generally large (as illustrated in Fig. 8D). Figure 8Ea shows the mean±SEM of the elements of w ordered from the largest to smallest ones after 1500 trials. As conjectured above, a few elements indeed grew prominently.

      We considered that if a slight decay (forgetting) of value weights (c.f., [59-61]) was assumed, such a prominent growth of a few elements of w may be mitigated and alignment of w to c, beyond the initial loose alignment because of the non-negative constraint, may occur. These conjectures were indeed confirmed by simulations (Fig. 8Eb,c and Fig. 8F). The mean squared value error slightly increased when the value-weight decay was assumed (Fig. 8G), however, presumably reflecting a decrease in developed values and a deterioration of learning because of the decay.”
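      The geometric argument and the decay remedy can be sketched as follows. The argument is that a few large elements push w toward the edge of the non-negative quadrant, which enlarges its angle to c. The rectification used to keep w non-negative, the decay rate, and the toy vectors are assumptions for illustration, not the manuscript's equations.

```python
import numpy as np

def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    cos = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def update_value_weights(w, x_prev, delta, alpha=0.1, decay=0.001):
    """TD update with slight decay (forgetting), kept non-negative.

    decay=0.001 is a hypothetical rate; clipping at zero is one possible
    way of enforcing the non-negative constraint.
    """
    w_new = (1.0 - decay) * w + alpha * delta * x_prev
    return np.maximum(w_new, 0.0)

c = np.ones(20)                       # toy non-negative feedback vector
w_even = np.ones(20)                  # weights spread evenly
w_peaked = np.full(20, 0.1)
w_peaked[0] = 10.0                    # one element grown prominently

print(angle_deg(w_even, c))           # essentially 0 degrees
print(angle_deg(w_peaked, c))         # about 75 degrees, still within 90
```

The decay term removes a fixed fraction of each weight per step, so it shrinks large elements the most in absolute terms. This is one intuition for why it counteracts the runaway growth of a few elements.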

      Rec7. I don't understand the qualitative difference between 4G and 4H. The difference seems to be smaller but there is still an apparent difference. Can this be quantified?

      We have added pointers indicating which were compared and statistical significance on Fig. 4D-H, and also Fig. 7 and Fig. 9C.

      Rec8. More biologically realistic constraints.

      Are the weights allowed to become negative? - No.

      Figure 6C - untrained RNN with non-negative x_i. Again - it was not explained what untrained RNN is. However, given my previous assumption, this is probably because the units developed in an untrained RNN is much further from representing a complete basis function. This cannot be done with only positive values. It would be useful to see network dynamics of units for untrained RNN. It might also be useful in all cases to estimate the dimensionality of the RNN. For 3 time-steps, it needs to be at least 3, and for more time steps as in Figure 4, larger.

      As we mentioned in our reply to your public comment Pub3 (page 6-8), in the revised manuscript we examined models that incorporated inhibitory and excitatory units and followed Dale's law, which could still learn the tasks (Fig. 9, Line 479-520). We have also analyzed the dimensionality of network dynamics as we mentioned in our replies to your public comment Pub1 and recommendations Rec1 and Rec2.

      Rec9. A new type of untrained RNN is introduced (Fig 6D); this is the first time an explanation of the untrained RNN is given. Indeed, the dimensionality of the second type of untrained RNN should be similar to that of the bioVRNNrf. The results are still not good.

      In the model with the new type of untrained RNN, whose elements were shuffled from the trained bioVRNNrf, contribution ratios of later principal components of the trajectory of RNN activities (Fig. 6H gray dotted line) were indeed larger than those in the model with naive untrained RNN (gray solid line). However, they were still much smaller than those in the trained value RNN models with backprop (red line) or random feedback (blue line). We consider that in the value RNN, the RNN connections were trained to realize a high-dimensional trajectory, and shuffling did not generally preserve this ability.
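      The shuffled control can be sketched as a random permutation of the trained weights. It keeps exactly the same set of weight values, and hence the same weight distribution, while destroying the arrangement that training produced. Whether the authors shuffled all elements of the matrix jointly, as assumed here, or within rows or columns is not stated in this reply. The `trained` array below is a random stand-in, not an actual trained matrix.

```python
import numpy as np

def shuffled_control(trained_weights, rng):
    """Permute all elements of a trained weight matrix (hypothetical control)."""
    flat = trained_weights.ravel()
    return rng.permutation(flat).reshape(trained_weights.shape)

rng = np.random.default_rng(0)
trained = np.abs(rng.normal(size=(20, 20)))   # stand-in for trained non-negative weights
shuffled = shuffled_control(trained, rng)

# Same multiset of values, different arrangement.
print(np.array_equal(np.sort(trained, axis=None), np.sort(shuffled, axis=None)))  # True
print(np.array_equal(trained, shuffled))                                          # False
```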

      Rec10. The discussion is too long and verbose. This is not a review paper.

      We have made the original discussion much more compact (from 1686 words to 940 words). We have added new discussion in response to the review comments, but the total length remains shorter than before (1589 words).

      Reviewer #2 (Public review):

      Summary:

      Tsurumi et al. show that recurrent neural networks can learn state and value representations in simple reinforcement learning tasks when trained with random feedback weights. The traditional method of learning for recurrent network in such tasks (backpropagation through time) requires feedback weights which are a transposed copy of the feed-forward weights, a biologically implausible assumption. This manuscript builds on previous work regarding "random feedback alignment" and "value-RNNs", and extends them to a reinforcement learning context. The authors also demonstrate that certain nonnegative constraints can enforce a "loose alignment" of feedback weights. The author's results suggest that random feedback may be a powerful tool of learning in biological networks, even in reinforcement learning tasks.
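      The contrast between backprop-type and random feedback can be illustrated with a generic one-step online update. This sketch is not the authors' model. It assumes a sigmoid RNN x_t = σ(A x_{t-1} + u_t) with the input term folded into u_t, a linear value V_t = w·x_t, a TD error δ, and credit assignment truncated to one step. Backprop-type learning sends δ back through the value weights w, whereas random feedback uses a fixed random vector c in their place.

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def online_td_step(A, w, c, x_prev2, x_prev, u, reward,
                   use_random_feedback=True, alpha=0.1, beta=0.1, gamma=0.8):
    """One online update of a toy value RNN (hypothetical, not the authors' rule).

    x_t = sigmoid(A @ x_{t-1} + u_t), V_t = w @ x_t.
    Returns updated A and w, the new state x_t, and the TD error.
    """
    x = sigmoid(A @ x_prev + u)
    delta = reward + gamma * (w @ x) - (w @ x_prev)     # TD error (RPE)
    # Backprop-type credit uses w; random feedback uses the fixed vector c.
    feedback = c if use_random_feedback else w
    # dV_{t-1}/dA_ij = w_i * sigmoid'(h_{t-1,i}) * x_{t-2,j}, truncated to one step;
    # sigmoid'(h) is written as x * (1 - x) using the unit's own output.
    post = feedback * x_prev * (1.0 - x_prev)
    A_new = A + beta * delta * np.outer(post, x_prev2)
    w_new = w + alpha * delta * x_prev                  # TD(0) on value weights
    return A_new, w_new, x, delta

rng = np.random.default_rng(0)
N = 5
A = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))
w = rng.uniform(0.0, 1.0, N)
c = rng.uniform(0.0, 1.0, N)           # fixed random feedback vector, never trained
x_prev2, x_prev = rng.uniform(size=N), rng.uniform(size=N)
u = np.zeros(N)                        # no external input on this step
A_bp, *_ = online_td_step(A, w, c, x_prev2, x_prev, u, 1.0, use_random_feedback=False)
A_rf, *_ = online_td_step(A, w, c, x_prev2, x_prev, u, 1.0, use_random_feedback=True)
```

The only difference between the two variants is which vector carries δ back to the recurrent weights. When c happens to equal w, the two updates coincide.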

      Strengths:

      The authors describe well the issues regarding biologically plausible learning in recurrent networks and in reinforcement learning tasks. They take care to propose networks which might be implemented in biological systems and compare their proposed learning rules to those already existing in literature. Further, they use small networks on relatively simple tasks, which allows for easier intuition into the learning dynamics.

      Weaknesses:

      The principles discovered by the authors in these smaller networks are not applied to deeper networks or more complicated tasks, so it remains unclear to what degree these methods can scale up, or can be used more generally.

      We have examined extended models that incorporated inhibitory and excitatory units and followed Dale's law with certain assumptions, and found that these models could still learn the tasks. We have added these results in Fig. 9 and subsection “4.1 Models with excitatory and inhibitory units”.

      We have also examined the performance of the models in a task in which a distractor cue appeared at random, finding that our models could still learn the state values much better than the models with untrained RNN. We have added these results in Fig. 10 and subsection “4.2 Task with distractor cue”.

      Regarding the depth, we continue to think about it but have not yet come up with concrete ideas.

      Reviewer #2 (Recommendations for the authors):

      (1) I think the work would greatly benefit from more proofreading. There are language errors/oddities throughout the paper, I will list just a few examples from the introduction:

      Thank you for pointing this out. We have made revisions throughout the paper.

      line 63: "simultaneously learnt in the downstream of RNN". Simultaneously learnt in networks downstream of the RNN? Simultaneously learn in a downstream RNN? The meaning is not clear in the original sentence.

      We have revised it to "simultaneously learnt in connections downstream of the RNN" (Line 67-68).

      starting in line 65: " A major problem, among others.... value-encoding unit" is a run-on sentence and would more readable if split into multiple sentences.

      We have extensively revised this part, which now consists of short sentences (Line 70-75).

      line 77: "in supervised learning of feed-forward network" should be either "in supervised learning of a feed-forward network" or "in supervised learning of feed-forward networks".

      We have changed "feed-forward network" to "feed-forward networks" (Line 83).

      (2) Under what conditions can you use an online learning rule which only considers the influence of the previous timestep? It's not clear to me how your networks solve the temporal credit assignment problem when the cue-reward delay in your tasks is 3-5ish time steps. How far can you stretch this delay before your networks stop learning correctly because of this one-step assumption? Further, how much does feedback alignment constrain your ability to learn long timescales, such as in Murray, J.M. (2019)?

      We consider that our models can solve the temporal credit assignment problem, at least to a certain extent, because temporal-difference (TD) learning, which we adopted, can by itself resolve temporal credit assignment. This is exemplified by the fact that TD(0) algorithms without an eligibility trace can still learn the value of distant rewards. We have added a discussion on this in Line 702-705:

      “…our models do not have "eligibility trace" (nor memorable/gated unit, different from the original value-RNN [26]), but could still solve temporal credit assignment to a certain extent because TD learning is by itself a solution for it (notably, recent work showed that combination of TD(0) and model-based RL well explained rat's choice and DA patterns [132]).”
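      A tabular sketch shows how TD(0) moves value back to the cue without an eligibility trace. For simplicity it assumes an episodic chain of four states from cue to reward, with illustrative values γ = 0.8 and α = 0.1. It is not the continuous, RNN-based setting of the manuscript.

```python
import numpy as np

def td0_chain(n_states=4, gamma=0.8, alpha=0.1, n_trials=2000):
    """Tabular TD(0) on a cue -> ... -> reward chain, no eligibility trace."""
    V = np.zeros(n_states)
    for _ in range(n_trials):
        for s in range(n_states):
            if s < n_states - 1:
                target = gamma * V[s + 1]     # no reward yet: bootstrap from next state
            else:
                target = 1.0                  # reward delivered, trial ends (next value 0)
            V[s] += alpha * (target - V[s])   # each update looks only one step ahead
    return V

V = td0_chain()
print(np.round(V, 3))   # approaches gamma**3, gamma**2, gamma, 1
```

Each update uses only the next state's value. Over repeated trials, however, information about the reward propagates backward roughly one state per trial, until the cue state's value approaches γ³ times the reward.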

      We have also examined the cases in which the cue-reward delay (originally 3 time steps) was elongated to 4, 5, or 6 time-steps, and our models with random feedback could still achieve better performance than the models with untrained RNN although the performance degraded as the cue-reward delay increased. We have added these results in Fig. 2M and Line 223-228 (for our original models without non-negative constraint):

      “We further examined the cases with longer cue-reward delays. As shown in Fig. 2M, as the delay increased, the mean squared error of state values (at 3000-th trial) increased, but the relative superiority of oVRNNbp and oVRNNrf over the model with untrained RNN remained to hold, except for cases with small number of RNN units (5) and long delay (5 or 6) (p < 0.0025 in Wilcoxon rank sum test for oVRNNbp or oVRNNrf vs untrained for each number of RNN units for each delay).”

      and Fig. 6J and Line 422-429 (for our extended models with non-negative constraint):

      “Figure 6J shows the cases with longer cue-reward delays, with default or halved learning rates. As the delay increased, the mean squared error of state values (at 3000-th trial) increased, but the relative superiority of oVRNNbp-rev and oVRNNrf-bio over the models with untrained RNN remained to hold, except for a few cases with 5 RNN units (5 delay oVRNNrf-bio vs shuffled with default learning rate, 6 delay oVRNNrf-bio vs naive or shuffled with halved learning rate) (p < 0.047 in Wilcoxon rank sum test for oVRNNbp-rev or oVRNNrf-bio vs naive or shuffled untrained for each number of RNN units for each delay).”

      As for the difficulty due to random feedback compared to backprop, there appeared to be little difference in the models without non-negative constraint (Fig. 2M), whereas in the models with nonnegative constraint, when the cue-reward delay was elongated to 6 time-steps, the model with random feedback performed worse than the model with backprop (Fig. 6J bottom-left panel).

      (3) Line 150: Were the RNN methods trained with continuation between trials?

      Yes. We have added the following sentence in Line 147-150:

      “The oVRNN models, and the model with untrained RNN, were continuously trained across trials in each task, because we considered that it was ecologically more plausible than episodic training of separate trials.” We consider that this makes learning of even the simple cue-reward association task nontrivial, as we describe in our reply to your comment 9 below.

      (4) Figure 2I, J: indicate the statistical significance of the difference between the three methods for each of these measures.

      We have added statistical information for Fig. 2J (Line 198-203):

      “As shown in the left panel of Fig. 2J, on average across simulations, oVRNNbp and oVRNNrf exhibited largely comparable performance and always outperformed the untrained RNN (p < 0.00022 in Wilcoxon rank sum test for oVRNNbp or oVRNNrf vs untrained for each number of RNN units), although oVRNNbp somewhat outperformed or underperformed oVRNNrf when the number of RNN units was small (≤10 (p < 0.049)) or large (≥25 (p < 0.045)), respectively.”

      and also Fig. 6E (for non-negative models) (Line 385-390):

      “As shown in the left panel of Fig. 6E, oVRNNbp-rev and oVRNNrf-bio exhibited largely comparable performance and always outperformed the models with untrained RNN (p < 2.5×10<sup>−12</sup> in Wilcoxon rank sum test for oVRNNbp-rev or oVRNNrf-bio vs naive or shuffled untrained for each number of RNN units), although oVRNNbp-rev somewhat outperformed or underperformed oVRNNrf-bio when the number of RNN units was small (≤10 (p < 0.00029)) or large (≥25 (p < 3.7×10<sup>−6</sup>)), respectively…”

      Fig. 2I shows distributions, whose means are plotted in Fig. 2J, and we did not add statistics to Fig. 2I itself.
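      For reference, the Wilcoxon rank-sum test used for these comparisons can be computed as below. This self-contained sketch uses the large-sample normal approximation, average ranks for ties, and no tie correction to the variance, the same approximation as `scipy.stats.ranksums`. This reply does not say whether the authors used this approximation or an exact test, and the sample values are made up.

```python
import math
import numpy as np

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (z, p). Ties get average ranks; no tie correction is applied
    to the variance.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    combined = np.concatenate([x, y])
    ranks = np.empty(len(combined))
    ranks[combined.argsort(kind="stable")] = np.arange(1, len(combined) + 1)
    for value in np.unique(combined):           # average ranks over ties
        tied = combined == value
        ranks[tied] = ranks[tied].mean()
    w_stat = ranks[:n1].sum()                   # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w_stat - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))      # two-sided p-value
    return z, p

# Made-up squared value errors from two groups of simulations.
model_errors = [0.010, 0.012, 0.009, 0.011, 0.013, 0.010, 0.008, 0.012]
untrained_errors = [0.031, 0.027, 0.035, 0.029, 0.030, 0.033, 0.028, 0.032]
z, p = rank_sum_test(model_errors, untrained_errors)
print(z, p)
```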

      (5) Line 178: Has learning reached a steady state after 1000 trials for each of these networks? Can you show a plot of error vs. trial number?

      We have added a plot of error vs trial number for original models (Fig. 2L, Line 221-223):

      “We examined how learning proceeded across trials in the models with 20 RNN units. As shown in Fig. 2L, learning became largely converged by 1000-th trial, although slight improvement continued afterward.”

      and non-negatively constrained models (Fig. 6I, Line 417-422):

      “Figure 6I shows how learning proceeded across trials in the models with 20 RNN units. While oVRNNbp-rev and oVRNNrf-bio eventually reached a comparable level of errors, oVRNNrf-bio outperformed oVRNNbp-rev in early trials (at 200, 300, 400, or 500 trials; p < 0.049 in Wilcoxon rank sum test for each). This is presumably because the value weights did not develop well in early trials and so the backprop-type feedback, which was the same as the value weights, did not work well, while the non-negative fixed random feedback worked finely from the beginning.”

      As shown in these figures, learning had largely converged by 1000 trials but still improved slightly afterward; we have therefore added simulations with 3000 trials (Fig. 2M and Fig. 6J).

      (6) Line 191: Put these regression values in the figure caption, as well as on the plot in Figure 3B.

      We have added the regression values in Fig. 3B and its caption.

      (7) Line 199: This idea of being in the same quadrant is interesting, but I think the term "relatively close angle" is too vague. Is there a more quantitative way to describe what you mean by this?

      We have revised this (Line 252-254) to “a vector that is in a relatively close angle with c, or more specifically, is in the same quadrant as (and thus within at maximum 90° from) c (for example, [c<sub>1</sub> c<sub>2</sub> c<sub>3</sub>]<sup>T</sup> and [0.5c<sub>1</sub> 1.2c<sub>2</sub> 0.8c<sub>3</sub>]<sup>T</sup>)”
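      The statement that being in the same quadrant implies an angle of at most 90° follows from the sign of the inner product. Write u = [k<sub>1</sub>c<sub>1</sub> k<sub>2</sub>c<sub>2</sub> k<sub>3</sub>c<sub>3</sub>]<sup>T</sup> with k<sub>i</sub> ≥ 0, as in the example above (k = 0.5, 1.2, 0.8). Then:

```latex
\cos\theta
  = \frac{\mathbf{u}^{\mathsf T}\mathbf{c}}{\lVert\mathbf{u}\rVert\,\lVert\mathbf{c}\rVert}
  = \frac{\sum_i k_i c_i^{2}}{\lVert\mathbf{u}\rVert\,\lVert\mathbf{c}\rVert} \;\ge\; 0
  \quad\Longrightarrow\quad \theta \le 90^\circ

\text{e.g. } \mathbf{c} = [1\;1\;1]^{\mathsf T},\;
\mathbf{u} = [0.5\;1.2\;0.8]^{\mathsf T}:\quad
\cos\theta = \frac{2.5}{\sqrt{2.33}\,\sqrt{3}} \approx 0.946,\quad \theta \approx 19^\circ
```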

      (8) Line 275: I'd like to see this measure directly in a plot, along with the statistical significance.

      We have added pointers indicating which were compared and statistical significance on Fig. 4D-H, and also Fig. 7 and Fig. 9C.

      (9) Line 280: Surely the untrained RNN should be able to solve the task if the reservoir is big enough, no? Maybe much bigger than 50 units, but still.

      We are not sure that this is the case. A difficulty lies in the fact that, because we modeled the tasks in a continuous rather than an episodic way (as we mentioned in our reply to your comment 3), the activity of the untrained RNN upon cue presentation generally differs from trial to trial. Therefore, it was not trivial for the RNN to know that cue presentations in different trials, even after random lengths of inter-trial interval, should constitute the same single state. We have added this note in Line 177-185:

      “This inferiority of untrained RNN may sound odd because there were only four states from cue to reward while random RNN with enough units is expected to be able to represent many different states (c.f., [49]) and the effectiveness of training of only the readout weights has been shown in reservoir computing studies [50-53]. However, there was a difficulty stemming from the continuous training across trials (rather than episodic training of separate trials): the activity of untrained RNN upon cue presentation generally differed from trial to trial, and so it is non-trivial that cue presentation in different trials should be regarded as the same single state, even if it could eventually be dealt with at the readout level if the number of units increases.”

      The original value RNN study (Hennig et al., 2023, PLoS Comput Biol) also modeled tasks in a continuous way (though using BPTT for training) and their model with untrained RNN also showed considerably larger RPE error than the value RNN even when the number of RNN units was 100 (the maximum number plotted in their Fig. 6A).

      (10) It's a bit confusing to compare Figure 4C to Figure 4D-H because there are also many features of D-H which do not match those of C (response to cue, response to late reward in task 1). It would make sense to address this in some way. Is there another way to calculate the true values of the states (e.g., maybe you only start from the time of the cue) which better approximates what the networks are doing?

      As we mentioned in our replies to your comments 3 and 9, our models with RNN were trained continuously across trials rather than separately for each episodic trial, and whether the models could still learn the state representation is a key issue. Therefore, starting learning from the time of cue would not be an appropriate way to compare the models, and instead we have made statistical comparison regarding key features, specifically, TD-RPEs at early and late rewards, as indicated in Fig. 4D-H.

      (11) Line 309: Can you explain why this non-monotonic feature exists? Why do you believe it would be more biologically plausible to assume monotonic dependence? It doesn't seem so straightforward to me; I can imagine that competing LTP/LTD mechanisms may produce plasticity which would have a non-monotonic dependence on post-synaptic activity.

      Thank you for this insightful comment. As you suggested, non-monotonic dependence on the postsynaptic activity (the BCM rule) has been proposed for unsupervised learning (cortical self-organization) (Bienenstock et al., 1982 J Neurosci), and it has been suggested that triplet-based STDP could be reduced to a BCM-like rule plus additional components (Gjorgjieva et al., 2011 PNAS; Shouval, 2011 PNAS). However, the non-monotonicity that appeared in our model, derived from the backprop rule, is maximized at the middle. It is thus opposite to the BCM rule, which is minimized at the middle (i.e., initially decreases and thereafter increases). We therefore consider that such an increase-then-decrease-type non-monotonicity would be less plausible than a monotonic increase, which could approximate an extreme case (with a minimal dip) of the BCM rule. We have added a note on this point in Line 355-358:

      “…the dependence on the post-synaptic activity was non-monotonic, maximized at the middle of the range of activity. It would be more biologically plausible to assume a monotonic increase (while an opposite shape of nonmonotonicity, once decrease and thereafter increase, called the BCM (Bienenstock-Cooper-Munro) rule has actually been suggested [56-58]).”
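      The three shapes discussed here can be compared directly. The sketch assumes a logistic sigmoid unit, whose derivative written in terms of its output y is y(1 − y); the manuscript's activation function is not restated in this reply. The BCM threshold θ = 0.5 is hypothetical.

```python
import numpy as np

# Post-synaptic activity y in [0, 1], assuming a logistic sigmoid unit.
y = np.linspace(0.0, 1.0, 101)

# Backprop-derived factor: sigmoid'(h) written in terms of y = sigmoid(h).
backprop_factor = y * (1.0 - y)       # rises, peaks at y = 0.5, then falls
# BCM-type factor with a hypothetical threshold theta = 0.5.
theta = 0.5
bcm_factor = y * (y - theta)          # dips below zero, lowest at theta/2, then rises
# Monotonic alternative favoured in the text.
monotonic_factor = y

print("backprop factor peaks at y =", y[np.argmax(backprop_factor)])
print("BCM factor is lowest at y =", y[np.argmin(bcm_factor)])
```

The backprop-derived factor is increase-then-decrease, while the BCM factor is decrease-then-increase. A monotonic increase can be read as the BCM shape with its dip shrunk toward zero.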

      (12) Line 363: This is the most exciting part of the paper (for me). I want to learn way more about this! Don't hide this in a few sentences. I want to know all about loose vs. feedback alignment. Show visualizations in 3D space of the idea of loose alignment (starting in the same quadrant), and compare it to how feedback alignment develops (ending in the same quadrant). Does this "loose" alignment idea give us an idea why the random feedback seems to settle at 45 degree angle? it just needs to get the signs right (same quadrant) for each element?

      In reply to this encouraging comment, we have further analyzed the loose alignment. By the term "loose alignment", we meant that the value weight vector w and the feedback vector c are in the same (non-negative) quadrant, as you said. What remained mysterious to us, however, was that while the angle between w and c was relatively small (loosely aligned) from the beginning, there appeared to be (as mentioned in the manuscript) no further alignment over trials (the angle actually settled at somewhat larger than 45°), even though the same feedback-alignment mechanism that we derived for the model without the non-negative constraint was expected to operate also under the non-negative constraint. We have now clarified the reason for this, and found a way, the introduction of a slight decay (forgetting) of value weights, by which feedback alignment comes to occur in the non-negatively constrained model. We have added this in Line 463-477:

      “As mentioned above, while the angle between w and c was on average smaller than 90° from the beginning, there was no further alignment over trials. This seemed mysterious because the mechanism for feedback alignment that we derived for the models without non-negative constraint was expected to work also for the models with non-negative constraint. As a possible reason for the non-occurrence of feedback alignment, we conjectured that one or a few element(s) of w grew prominently during learning, so that w came close to an edge or boundary of the non-negative quadrant and thereby the angle between w and other vectors generally became large (as illustrated in Fig. 8D). Figure 8Ea shows the mean±SEM of the elements of w ordered from the largest to the smallest after 1500 trials. As conjectured above, a few elements indeed grew prominently.

      We considered that if a slight decay (forgetting) of value weights (cf. [59-61]) was assumed, such prominent growth of a few elements of w might be mitigated, and alignment of w to c beyond the initial loose alignment due to the non-negative constraint might occur. These conjectures were indeed confirmed by simulations (Fig. 8Eb,c and Fig. 8F). The mean squared value error slightly increased when value-weight decay was assumed (Fig. 8G), presumably reflecting a decrease in developed values and a deterioration of learning because of the decay.”
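The geometric argument (a non-negative w dominated by one element sits near an edge of the quadrant and so makes a large angle with other non-negative vectors) and the decay idea can be sketched as follows. This is an illustration only: the exact form of the decay is an assumption here, written as w ← max(0, (1 − decay)·w + α·δ·x), and the uniform c is just a convenient example of a vector inside the non-negative quadrant.

```python
import numpy as np

def angle_deg(a, b):
    """Angle in degrees between two vectors, e.g. value weights w and feedback c."""
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def update_value_weights(w, x, delta, alpha=0.1, decay=0.0):
    """One TD update of non-negative value weights with optional slight decay
    (hypothetical form): w <- max(0, (1 - decay) * w + alpha * delta * x)."""
    return np.maximum(0.0, (1.0 - decay) * w + alpha * delta * x)

n = 12
c = np.ones(n)          # example feedback vector inside the non-negative quadrant
w_edge = np.zeros(n)    # value weights dominated by a single element (near an edge)
w_edge[0] = 1.0

# Both vectors are non-negative, yet the angle is about 73.2 degrees (arccos(1/sqrt(12))).
print(angle_deg(w_edge, c))

# With no TD signal, decay alone shrinks the dominant element by a factor (1 - decay).
w_next = update_value_weights(w_edge, np.zeros(n), delta=0.0, decay=0.01)
```

Decay continuously pulls back elements that are not being reinforced, which is the mechanism the reply proposes for keeping w away from the edges of the quadrant.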

      As for visualization, because the model is high-dimensional (e.g., 12 dimensions), we could not come up with a better visualization than the trial-versus-angle plot (Fig. 3A, 8A,F). Nevertheless, we expect that the above-mentioned additional analyses of loose alignment (with graphs) will help in understanding what is going on.

      (13) Line 426: how does this compare to some of the reward-modulated Hebbian rules proposed in other RNNs? See Hoerzer, G. M., Legenstein, R., & Maass, W. (2014). Put another way, you arrived at this from a top-down approach (gradient descent->BP->approximated by RF->non-negativity constraint->leads to DA-dependent modulation of Hebbian plasticity). How might this compare to a bottom-up approach (i.e. starting from the principle of Hebbian learning, and adding in reward modulation)?

      The study of Hoerzer et al. 2014 used a stochastic perturbation, which we did not assume but which could potentially be integrated into our model. In addition, Hoerzer et al. trained only the readout of an untrained RNN, whereas we trained both the RNN and its readout. We have added a discussion comparing our model with Hoerzer et al. and other works that also used perturbation methods, as well as other top-down approximation methods, in Line 685-711 (reference 128 is Hoerzer et al. 2014 Cereb Cortex):

      “As an alternative to backprop in hierarchical networks, aside from feedback alignment [36], the Associative Reward-Penalty (A<sub>R-P</sub>) algorithm has been proposed [124-126]. In A<sub>R-P</sub>, the hidden units behave stochastically, allowing the gradient to be estimated via stochastic sampling. Recent work [127] has proposed Phaseless Alignment Learning (PAL), in which high-frequency noise-induced learning of feedback projections proceeds simultaneously with learning of forward projections using the feedback at a lower frequency. Noise-induced learning of the readout weights of an untrained RNN by reward-modulated Hebbian plasticity has also been demonstrated [128]. Such noise- or perturbation-based [40] mechanisms are biologically plausible because neurons and neural networks can exhibit noisy or chaotic behavior [129-131], and they might improve the performance of the value-RNN if implemented.

      Regarding learning of RNNs, "e-prop" [35] was proposed as a locally learnable online approximation of BPTT [27], which was used in the original value-RNN [26]. In e-prop, a neuron-specific learning signal is combined with a weight-specific, locally updatable "eligibility trace". Reward-based e-prop was also shown to work [35], both in a setup not introducing TD-RPE with symmetric or random feedback (their Supplementary Figure 5) and in another setup introducing TD-RPE with symmetric feedback (their Figures 4 and 5). Compared to these, our models differ in multiple ways.

      First, we have shown that alignment to random feedback occurs in the models driven by TD-RPE. Second, our models do not have an "eligibility trace" (nor memory/gated units, unlike the original value-RNN [26]), but could still solve temporal credit assignment to a certain extent because TD learning is by itself a solution to it (notably, recent work showed that a combination of TD(0) and model-based RL well explained rats' choices and DA patterns [132]). However, as mentioned before, a single time-step in our models was assumed to correspond to hundreds of milliseconds, incorporating slow synaptic dynamics, whereas e-prop is an algorithm for spiking neuron models with a much finer time scale. From this aspect, our models could be seen as a coarse-time-scale approximation of e-prop. On top of these, our results point to a potential computational benefit of the biological non-negative constraint, which could effectively limit the parameter space and promote learning.”

      Related to your latter point (and also replying to another reviewer's comment), we also examined cases where the random feedback in our model was replaced with uniform feedback, which corresponds to a simple bottom-up reward-modulated triplet plasticity rule. As a result, the model with uniform feedback showed largely comparable, but somewhat worse, performance compared with the model with random feedback. We have added the results in Fig. 2J-right and Line 206-209 (for our original models without non-negative constraint):

      “The green line in Fig. 2J-right shows the performance of a special case where the random feedback in oVRNNrf was fixed to the direction of (1, 1, ..., 1)<sup>T</sup> (i.e., uniform feedback) with a random coefficient, which was largely comparable to, but somewhat worse than, that for the general oVRNNrf (blue line).”

      and Fig. 6E-right and Line 402-407 (for our extended models with non-negative constraint):

      “The green and light blue lines in the right panels of Figure 6E and Figure 6F show the results for special cases where the random feedback in oVRNNrf-bio was fixed to the direction of (1, 1, ..., 1)<sup>T</sup> (i.e., uniform feedback) with a random non-negative magnitude (green line) or a fixed magnitude of 0.5 (light blue line). The performance of these special cases, especially the former (with random magnitude), was somewhat worse than that of oVRNNrf-bio, but still better than that of the models with untrained RNN.”

      and we have also added a biological implication of the results in Line 644-652:

      “We have shown that oVRNNrf and oVRNNrf-bio could work even when the random feedback was uniform, i.e., fixed to the direction of (1, 1, ..., 1)<sup>T</sup>, although the performance was somewhat worse. This is reasonable because uniform feedback can still encode the scalar TD-RPE that drives our models, in contrast to a previous study [45], which considered DA's encoding of vector error and thus regarded uniform feedback as a negative control. If an oVRNNrf/oVRNNrf-bio-like mechanism indeed operates in the brain and the feedback is near uniform, alignment of the value weights w to near (1, 1, ..., 1) is expected to occur. This means that states are (learned to be) represented in such a way that simple summation of cortical neuronal activity approximates value, thereby potentially explaining why value is often correlated with regional activation (fMRI BOLD signal) of cortical regions [113].”

      Reviewer #3 (Public review):

      Summary:

      The paper studies learning rules in a simple sigmoidal recurrent neural network setting. The recurrent network has a single layer of 10 to 40 units. It is first confirmed that feedback alignment (FA) can learn a value function in this setting. Then so-called bio-plausible constraints are added: (1) the value weights (readout) are non-negative, (2) the activity is non-negative (normal sigmoid rather than downscaled between -0.5 and 0.5), (3) the feedback weights are non-negative, (4) the learning rule is revised to be monotonic: the weights are not downregulated. In the simple task considered, none of the four biological features appears to totally impair learning.

      Strengths:

      (1) The learning rules are implemented in a low-level fashion of the form (pre-synaptic activity) x (post-synaptic activity) x feedback x RPE, which is therefore interpretable in terms of measurable quantities in the wet lab.

      (2) I find that non-negative FA (FA with non-negative c and w) is the most valuable theoretical insight of this paper: I understand why the alignment between w and c is automatically better at initialization.

      (3) The task choice is relevant since it connects with experimental settings of reward conditioning with possible plasticity measurements.
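The local rule of the form (pre-synaptic activity) x (post-synaptic activity) x feedback x RPE can be written in a few lines. The sketch below is hypothetical: it assumes the post-synaptic factor is the logistic derivative post·(1 − post) (or simply post for a monotonic variant), and it leaves out the RNN dynamics and time indexing, which are not specified here. Passing the value weights w as the feedback gives the backprop-type rule; passing a fixed vector c gives the random-feedback rule.

```python
import numpy as np

def td_rpe(r, v_now, v_prev, gamma=0.8):
    """Scalar TD reward-prediction error: delta = r + gamma * V(now) - V(prev)."""
    return r + gamma * v_now - v_prev

def three_factor_update(pre, post, feedback, delta, alpha=0.05, monotonic=False):
    """Recurrent-weight update dA[i, j] = alpha * delta * feedback[i] * g(post[i]) * pre[j].

    pre, post : presynaptic / postsynaptic activities in [0, 1] (length n)
    feedback  : fixed random feedback c, or the value weights w (backprop-type)
    delta     : scalar TD-RPE (the DA-like global signal)
    g         : post * (1 - post) (assumed logistic derivative, non-monotonic),
                or simply post when monotonic=True (hypothetical monotonic variant)
    """
    g = post if monotonic else post * (1.0 - post)
    return alpha * delta * np.outer(feedback * g, pre)

rng = np.random.default_rng(0)
n = 20                                   # number of RNN units
pre = rng.uniform(size=n)
post = rng.uniform(size=n)
c = rng.uniform(size=n)                  # fixed non-negative random feedback
w = np.zeros(n)                          # value weights before learning
delta = td_rpe(r=1.0, v_now=0.0, v_prev=w @ pre)

dA_random = three_factor_update(pre, post, c, delta)    # non-zero from the first trial
dA_backprop = three_factor_update(pre, post, w, delta)  # zero while w is still zero
```

With w = 0, the backprop-type feedback yields no recurrent-weight update, whereas fixed random feedback does. This is the intuition the reply gives for why oVRNNrf-bio learns faster than oVRNNbp-rev in early trials.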

      Weaknesses:

      (4) The task is rather easy, so it's not clear that it really captures the computational gap that exists between FA (gradient-like learning) and a simpler learning rule like a delta rule: RPE x (pre-synaptic) x (post-synaptic). To check that the task is not too trivial, I suggest adding a control where the vector c is constant, c_i=1.

      We have examined cases where the feedback was uniform, i.e., in the direction of (1, 1, ..., 1), in both the models without and with the non-negative constraint. In both cases, the models with uniform feedback performed somewhat worse than the original models with random feedback, but still better than the models with untrained RNN. We have added the results in Fig. 2J-right and Line 206-209 (for our original models without non-negative constraint):

      “The green line in Fig. 2J-right shows the performance of a special case where the random feedback in oVRNNrf was fixed to the direction of (1, 1, ..., 1) <sup>T</sup> (i.e., uniform feedback) with a random coefficient, which was largely comparable to, but somewhat worse than, that for the general oVRNNrf (blue line).”

      and Fig. 6E-right and Line 402-407 (for our extended models with non-negative constraint):

      “The green and light blue lines in the right panels of Figure 6E and Figure 6F show the results for special cases where the random feedback in oVRNNrf-bio was fixed to the direction of (1, 1, ..., 1)<sup>T</sup> (i.e., uniform feedback) with a random non-negative magnitude (green line) or a fixed magnitude of 0.5 (light blue line). The performance of these special cases, especially the former (with random magnitude), was somewhat worse than that of oVRNNrf-bio, but still better than that of the models with untrained RNN.”
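The feedback variants compared above can be sketched as simple constructors. The distributions are assumptions for illustration only (Gaussian for signed random feedback, uniform on [0, 1] for non-negative magnitudes), and "magnitude" is read here as the Euclidean norm of the vector. If it instead denotes the value of each element, drop the 1/√n normalization.

```python
import numpy as np

def make_feedback(n, kind, rng):
    """Hypothetical constructors for the feedback vector c (distributions are assumptions)."""
    if kind == "random":              # signed random feedback (no non-negative constraint)
        return rng.normal(size=n)
    if kind == "random_nonneg":       # non-negative random feedback
        return rng.uniform(0.0, 1.0, size=n)
    direction = np.ones(n) / np.sqrt(n)   # unit vector along (1, 1, ..., 1)^T
    if kind == "uniform_random":      # uniform direction, random non-negative magnitude
        return rng.uniform(0.0, 1.0) * direction
    if kind == "uniform_fixed":       # uniform direction, fixed magnitude (norm) 0.5
        return 0.5 * direction
    raise ValueError(f"unknown feedback kind: {kind}")

rng = np.random.default_rng(1)
c_fixed = make_feedback(12, "uniform_fixed", rng)
c_nonneg = make_feedback(12, "random_nonneg", rng)
c_uniform = make_feedback(12, "uniform_random", rng)

# If w aligns with a uniform c, the value w @ x is just a scaled sum of unit activities.
x = rng.uniform(size=12)
w_aligned = 2.0 * np.ones(12)
value = w_aligned @ x                 # equals 2 * x.sum()
```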

      We have also added a discussion on the biological implication of the model with uniform feedback mentioned in our provisional reply in Line 644-652:

      “We have shown that oVRNNrf and oVRNNrf-bio could work even when the random feedback was uniform, i.e., fixed to the direction of (1, 1, ..., 1) <sup>T</sup>, although the performance was somewhat worse. This is reasonable because uniform feedback can still encode scalar TD-RPE that drives our models, in contrast to a previous study [45], which considered DA's encoding of vector error and thus regarded uniform feedback as a negative control. If oVRNNrf/oVRNNrf-bio-like mechanism indeed operates in the brain and the feedback is near uniform, alignment of the value weights w to near (1, 1, ..., 1) is expected to occur. This means that states are (learned to be) represented in such a way that simple summation of cortical neuronal activity approximates value, thereby potentially explaining why value is often correlated with regional activation (fMRI BOLD signal) of cortical regions [113].”

      In addition, while preparing the revised manuscript, we found a recent simulation study, which showed that uniform feedback coupled with positive forward weights was effective in supervised learning of one-dimensional output in feed-forward network (Konishi et al., 2023, Front Neurosci).

      We have briefly discussed this work in Line 653-655:

      “Notably, uniform feedback coupled with positive forward weights was shown to be effective also in supervised learning of one-dimensional output in feed-forward network [114], and we guess that loose alignment may underlie it.”

      (5) Related to point 3), the main strength of this paper is to draw potential connection with experimental data. It would be good to highlight more concretely the prediction of the theory for experimental findings. (Ideally, what should be observed with non-negative FA that is not expected with FA or a delta rule (constant global feedback) ?).

      We have added a discussion on the prediction of our models, mentioned in our provisional reply, in Line 627-638:

      “oVRNNrf predicts that the feedback vector c and the value-weight vector w become gradually aligned, while oVRNNrf-bio predicts that c and w are loosely aligned from the beginning. Each element of c could be measured as the magnitude of a pyramidal cell's response to DA stimulation. The element of w corresponding to a given pyramidal cell could be measured, if the striatal neuron that receives input from that pyramidal cell can be identified (although technically demanding), as the magnitude of the response of the striatal neuron to activation of the pyramidal cell. Then, the above-mentioned predictions could be tested by (i) identifying cortical, striatal, and VTA regions that are connected, (ii) identifying pairs of cortical pyramidal cells and striatal neurons that are connected, (iii) measuring the responses of identified pyramidal cells to DA stimulation, as well as the responses of identified striatal neurons to activation of the connected pyramidal cells, and (iv) testing whether DA→pyramidal responses and pyramidal→striatal responses are associated across pyramidal cells, and whether such associations develop through learning.”
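Step (iv) amounts to comparing two vectors measured across the same identified cells. A minimal sketch follows; the function name and the example numbers are hypothetical, not data. It computes both the across-cell Pearson correlation and the angle between the measured c and w, the quantity the models track. Repeating it before and after learning would address whether the association develops.

```python
import numpy as np

def feedback_value_association(c_measured, w_measured):
    """c_measured[k]: response of pyramidal cell k to DA stimulation (estimate of c_k).
    w_measured[k]: response of its connected striatal neuron to activating cell k (estimate of w_k).
    Returns the across-cell Pearson correlation and the angle (degrees) between the two vectors."""
    c = np.asarray(c_measured, dtype=float)
    w = np.asarray(w_measured, dtype=float)
    r = np.corrcoef(c, w)[0, 1]
    cos = c @ w / (np.linalg.norm(c) * np.linalg.norm(w))
    angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return r, angle

# Hypothetical measurements from four identified cell pairs after learning.
r, angle = feedback_value_association([0.2, 0.8, 0.5, 0.1], [0.4, 1.6, 1.0, 0.2])
```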

      Moreover, we have considered another (technically more doable) prediction of our model, and described it in Line 639-643:

      “Testing this prediction, however, would be technically quite demanding, as mentioned above. An alternative way of testing our model is to manipulate the cortical DA feedback and see whether it causes (re-)alignment of value weights (i.e., cortico-striatal strengths). Specifically, our model predicts that if DA projection to a particular cortical locus is silenced, the effect of the activity of that locus on the value-encoding striatal activity will become diminished.”

      (6a) Random feedback with RNNs in RL has been studied in the past, so it may be worth giving some insight into how the results and the analyses compare to this previous line of work (for instance in this paper [1]). For instance, I am not very surprised that FA also works for value prediction with TD error. It is also expected from the literature that the RL + RNN + FA setting would scale to tasks that are more complex than the conditioning problem proposed here, so is there a more specific take-home message about non-negative FA, or benefits from this simpler toy task? [1] https://www.nature.com/articles/s41467-020-17236-y

      As for a specific feature of non-negative models, we did not describe (and, in fact, did not fully recognize) an intriguing result: the non-negative random feedback model generally performed better than the models without the non-negative constraint, with either backprop or random feedback (Fig. 2J-left versus Fig. 6E-left (please note the difference in the vertical scales)). This suggests that the non-negative constraint effectively limited the parameter space and thereby made learning efficient. We have added this result in Line 392-395:

      “Remarkably, oVRNNrf-bio generally achieved better performance than both oVRNNbp and oVRNNrf, which did not have the non-negative constraint (Wilcoxon rank sum test, vs oVRNNbp: p < 7.8×10<sup>−6</sup> for 5 or ≥25 RNN units; vs oVRNNrf: p < 0.021 for ≤10 or ≥20 RNN units).”

      Also, in the models with the non-negative constraint, the model with random feedback learned more rapidly than the model with backprop, although they eventually reached a comparable level of errors, at least in the case with 20 RNN units. This is presumably because the value weights did not develop well in early trials, and so the backprop-based feedback, which was the same as the value weights, did not work well, while the non-negative fixed random feedback worked well from the beginning. We have added this result in Fig. 6I and Line 417-422:

      “Figure 6I shows how learning proceeded across trials in the models with 20 RNN units. While oVRNNbp-rev and oVRNNrf-bio eventually reached a comparable level of errors, oVRNNrf-bio outperformed oVRNNbp-rev in early trials (at 200, 300, 400, or 500 trials; p < 0.049 in Wilcoxon rank sum test for each). This is presumably because the value weights did not develop well in early trials, and so the backprop-type feedback, which was the same as the value weights, did not work well, while the non-negative fixed random feedback worked well from the beginning.”

      We have also added a discussion of how our model can be positioned in relation to other models, including the study you mentioned (e-prop by Bellec, ..., Maass, 2020), in the subsection “Comparison to other algorithms” of the Discussion.

      Regarding the slightly better performance of the non-negative model with random feedback compared with the non-negative model with backprop when the number of RNN units was large (mentioned in our provisional reply), state values in the backprop model appeared less developed than those in the random feedback model. The slightly better performance of random feedback than backprop also held in our extended model incorporating excitatory and inhibitory units (Fig. 9B).

      (6b) Related to task complexity, it is not clear to me whether non-negative value and feedback weights would generally scale to harder tasks. If the task is so simple that a global RPE signal is sufficient to learn (see 4 and 5), then it could be good to extend the task to find a substantial gap between: global RPE, non-negative FA, FA, BP. For a well-chosen task, I expect to see a performance gap between any pair of these four learning rules. In the context of the present paper, it would be particularly interesting to study the failure modes of non-negative FA and the cases where it does perform as well as FA.

      In the cue-reward association task with a 3-time-step delay, the non-negative model with random feedback performed largely comparably to the non-negative model with backprop, and this continued to hold in a task where a distractor cue, which was not associated with reward, appeared at random timings. We have added the results in Fig. 10 and subsection “4.2 Task with distractor cue”.

      We have also examined the cases where the cue-reward delay was elongated. In the case of longer cue-reward delay (6 time-steps), in the models without non-negative constraint, the model with random feedback performed comparably to (and slightly better than when the number of RNN units was large) the model with backprop (Fig. 2M). In contrast, in the models with non-negative constraint, the model with random feedback underperformed the model with backprop (Fig. 6J, left-bottom). This indicates a difference between the effect of non-negative random feedback and the effect of positive+negative random feedback.

      We have further examined the performance of the models in terms of action selection, by extending the models to incorporate an actor-critic algorithm. In a task with inter-temporal choice (i.e., immediate small reward vs delayed large reward), the non-negative model with random feedback performed worse than the non-negative model with backprop when the number of RNN units was small. As the number of RNN units increased, the two models performed more comparably. These results are described in Fig. 11 and subsection “4.3 Incorporation of action selection”.
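The actor-critic formulation is not spelled out in this reply. As a hypothetical sketch only, a standard softmax actor can update its action preferences with the same scalar TD-RPE that trains the critic (the value-RNN). The option indices, inverse temperature, and learning rate below are illustrative.

```python
import numpy as np

def softmax(prefs, beta=1.0):
    """Action probabilities from preferences with inverse temperature beta."""
    z = beta * (prefs - prefs.max())
    e = np.exp(z)
    return e / e.sum()

def actor_update(prefs, action, delta, alpha=0.1, beta=1.0):
    """Policy-gradient actor step driven by the critic's TD-RPE:
    prefs += alpha * delta * beta * (onehot(action) - pi)."""
    pi = softmax(prefs, beta)
    grad = -pi
    grad[action] += 1.0
    return prefs + alpha * delta * beta * grad

# Two options, e.g. 0 = immediate small reward, 1 = delayed large reward (hypothetical).
prefs = np.zeros(2)
after_positive = actor_update(prefs, action=1, delta=0.5)   # better than expected
after_negative = actor_update(prefs, action=1, delta=-0.5)  # worse than expected
```

A positive TD-RPE after choosing an option makes that option more likely, and a negative one makes it less likely. In an RNN-based version, the critic's value estimate would supply the TD-RPE.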

      (7) I find that the writing could be improved, it mostly feels more technical and difficult than it should. Here are some recommendations:

      7a) for instance the technical description of the task (CSC) is not fully described and requires background knowledge from other paper which is not desirable.

      7b) Also the rationale for the added difficulty with the stochastic reward and new state is not well explained.

      7c) In the technical description of the results I find that the text dives into descriptive comments of the figures but high-level take home messages would be helpful to guide the reader. I got a bit lost, although I feel that there is probably a lot of depth in these paragraphs.

      As for 7a), 'CSC (complete serial compound)' was actually not the name of the task but the name of the 'punctate' state representation, in which each state (timing from cue) is represented in a punctate manner, i.e., by a one-hot vector such as (1, 0, ..., 0), (0, 1, ..., 0), ..., and (0, 0, ..., 1). As you pointed out, using the name 'CSC' would make the text appear more technical than it actually is, and so we have moved the reference to the name 'CSC' to the Methods (Line 903-907):

      “For the agents with punctate state representation, which is also referred to as the complete serial compound (CSC) representation [1, 48, 133], each timing from a cue in the tasks was represented by a 10-dimensional one-hot vector, starting from (1 0 0 ... 0)<sup>T</sup> for the cue state, with the next state (0 1 0 ... 0) <sup>T</sup> and so on.”

      and in the Results we have instead added a clearer explanation (Line 163-165):

      “First, for comparison, we examined traditional TD-RL agent with punctate state representation (without using the RNN), in which each state (time-step from a cue) was represented in a punctate manner, i.e., by a one-hot vector such as (1, 0, ..., 0), (0, 1, ..., 0), and so on.”
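As a minimal illustration of the punctate (CSC) representation, the sketch below runs TD(0) with linear values over one-hot features. With one-hot features, this reduces to a lookup table in which each state's value is learned separately. The cue-reward layout is a hypothetical one: 10 states, with reward received on entering the state 3 time-steps after the cue.

```python
import numpy as np

N_STATES = 10  # punctate states: the cue timing, then one state per time-step after the cue

def one_hot(i, n=N_STATES):
    """Punctate (CSC) feature vector for the i-th time-step from the cue."""
    x = np.zeros(n)
    x[i] = 1.0
    return x

def train_punctate_td(n_episodes=500, reward_step=3, gamma=0.8, alpha=0.1):
    """TD(0) with linear values V(s) = w @ x(s) over one-hot features.
    Reward is received on entering the state `reward_step` time-steps after the cue."""
    w = np.zeros(N_STATES)
    for _ in range(n_episodes):
        for t in range(N_STATES - 1):
            x, x_next = one_hot(t), one_hot(t + 1)
            r = 1.0 if t + 1 == reward_step else 0.0
            delta = r + gamma * (w @ x_next) - w @ x   # TD-RPE
            w += alpha * delta * x
    return w

w = train_punctate_td()
```

After training, w is approximately (0.64, 0.8, 1, 0, ..., 0) for γ = 0.8. Values rise toward the reward as powers of γ, and each state's value is stored in its own weight.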

      As for 7b), we have added the rationale for our examination of the tasks with probabilistic structures (Line 282-294):

      “Previous work [54] examined the responses of DA neurons in cue-reward association tasks in which reward timing was probabilistically determined (early in some trials but late in others). There were two tasks, which were largely similar except for one key difference: reward was given in all trials in one task, whereas it was omitted in some randomly determined trials in the other. Starkweather et al. [54] found that the DA response to later reward was smaller than the response to earlier reward in the former task, presumably reflecting the animal's belief that delayed reward would surely come, but the opposite was the case in the latter task, presumably because the animal suspected that reward had been omitted in that trial. Starkweather et al. [54] then showed that such response patterns could be explained if DA encoded TD-RPE under particular state representations that incorporated the probabilistic structures of the tasks (called the 'belief state'). In that study, such state representations were 'handcrafted' by the authors, but subsequent work [26] showed that the original value-RNN with backprop (BPTT) could develop similar representations and reproduce the experimentally observed DA patterns.”

      As for 7c), we have extensively revised the text of the results, adding high-level explanations while trying to reduce the lengthy low-level descriptions (e.g., Line 172-177 for Fig2E-G).

      (8) Related to the writing issue and 5), I wished that "bio-plausibility" was not the only reason to study positive feedback and value weights. Is it possible to develop a bit more specifically what and why this positivity is interesting? Is there an expected finding with non-negative FA both in the model capability? or maybe there is a simpler and crisp take-home message to communicate the experimental predictions to the community would be useful?

      There is actually an unexpected finding with the non-negative model: the non-negative random feedback model generally performed better than the models without the non-negative constraint, with either backprop or random feedback (Fig. 2J-left versus Fig. 6E-left), presumably because the non-negative constraint effectively limited the parameter space and thereby made learning efficient, as we mentioned in our reply to your point 6a above (we had not fully recognized this at the time of the original submission).

      Another potential merit of our present work is the simplicity of the model and the task. This simplicity enabled us to derive an intuitive explanation of why feedback alignment could occur. Such an intuitive explanation was lacking in previous studies, although more precise mathematical explanations did exist. Related to the mechanism of feedback alignment, one thing remained mysterious to us at the time of the original submission. Specifically, in the non-negatively constrained random feedback model, while the angle between the value weights (w) and the random feedback (c) was relatively small (loosely aligned) from the beginning, there appeared to be (as mentioned in the manuscript) no further alignment over trials (the angle actually settled at somewhat larger than 45°), even though the same feedback-alignment mechanism that we derived for the model without the non-negative constraint was expected to operate also under the non-negative constraint. We have now clarified the reason for this, and found a way, the introduction of a slight decay (forgetting) of value weights, by which feedback alignment comes to occur in the non-negatively constrained model. We have added this in Line 463-477:

      “As mentioned above, while the angle between w and c was on average smaller than 90° from the beginning, there was no further alignment over trials. This seemed mysterious because the mechanism for feedback alignment that we derived for the models without non-negative constraint was expected to work also for the models with non-negative constraint. As a possible reason for the non-occurrence of feedback alignment, we conjectured that one or a few element(s) of w grew prominently during learning, so that w came close to an edge or boundary of the non-negative quadrant and thereby the angle between w and other vectors generally became large (as illustrated in Fig. 8D). Figure 8Ea shows the mean±SEM of the elements of w ordered from the largest to the smallest after 1500 trials. As conjectured above, a few elements indeed grew prominently.

      We considered that if a slight decay (forgetting) of value weights (cf. [59-61]) was assumed, such prominent growth of a few elements of w might be mitigated, and alignment of w to c beyond the initial loose alignment due to the non-negative constraint might occur. These conjectures were indeed confirmed by simulations (Fig. 8Eb,c and Fig. 8F). The mean squared value error slightly increased when value-weight decay was assumed (Fig. 8G), presumably reflecting a decrease in developed values and a deterioration of learning because of the decay.”

      Correction of an error in the original manuscript

      In addition to revising the manuscript according to your comments, we have corrected the way of estimating the true state values. Specifically, in the original manuscript, we defined states by relative time-steps from a reward and estimated their values by calculating the sums of discounted future rewards starting from them through simulations. However, we assumed variable inter-trial intervals (ITIs) (4, 5, 6, or 7 time-steps with equal probabilities), and so until receiving cue information, the agent cannot know when the next reward will come. Therefore, states for the timings up to the cue timing cannot be defined by the upcoming reward, but we previously did so (e.g., the state of "one time-step before cue") without taking the ITI variability into account.

      We have now corrected this issue by defining the states of timings with respect to the previous (rather than upcoming) reward. For example, when the ITI was 4 time-steps and the agent was in its last time-step, the agent will in fact receive a cue at the next time-step, but it cannot know this until actually receiving the cue information, and should instead assume that it is at the last time-step of the ITI (if the ITI was 4), last − 1 (if 5), last − 2 (if 6), or last − 3 (if 7) with equal probabilities (in a similar fashion to what we considered when thinking about state definition for the probabilistic tasks). We estimated the true values of states defined in this way through simulations. As a result, the corrected true value of the cue timing has become slightly smaller than the value described in the original manuscript (reflecting the uncertainty about ITI length), and consequently a small positive TD-RPE now appears at the cue timing.
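The corrected procedure, estimating true values by simulation with pre-cue states labeled by time since the previous reward, can be sketched as follows. The trial layout here is a hypothetical one (cue, reward 3 steps later, then an ITI of 4 to 7 steps), and so are the discount factor and the return convention (G_t = r_t + γG_{t+1}, counting the reward of the current step). The sketch is only meant to show why a positive TD-RPE appears at the cue once the ITI length is uncertain.

```python
import random
from collections import defaultdict

def simulate_stream(n_trials, delay=3, itis=(4, 5, 6, 7), seed=0):
    """Continuing cue-reward task (hypothetical layout): cue, reward `delay` steps later,
    then an ITI of random length before the next cue. Each step is (state, reward).
    ITI states are labeled by steps since the previous reward, because the agent
    cannot know the ITI length before the cue arrives."""
    rng = random.Random(seed)
    stream = []
    for _ in range(n_trials):
        stream.append((("cue", 0), 0.0))
        for k in range(1, delay):
            stream.append((("delay", k), 0.0))
        stream.append((("reward", 0), 1.0))
        for k in range(1, rng.choice(itis) + 1):
            stream.append((("iti", k), 0.0))
    return stream

def estimate_true_values(stream, gamma=0.8, tail=200):
    """Monte Carlo state values: average of G_t = r_t + gamma * G_{t+1} over visits,
    dropping the truncated tail at the end of the stream."""
    returns = [0.0] * len(stream)
    g = 0.0
    for t in range(len(stream) - 1, -1, -1):
        g = stream[t][1] + gamma * g
        returns[t] = g
    sums, counts = defaultdict(float), defaultdict(int)
    for t in range(len(stream) - tail):
        state = stream[t][0]
        sums[state] += returns[t]
        counts[state] += 1
    return {s: sums[s] / counts[s] for s in sums}

gamma = 0.8
V = estimate_true_values(simulate_stream(20000), gamma)
# TD-RPE at the cue in a trial whose ITI was 4 steps (no reward on either step):
rpe_at_cue = gamma * V[("cue", 0)] - V[("iti", 4)]
```

Because ("iti", 4) is shared by trials with ITIs of 4 to 7 steps, its value averages over the remaining wait. It is therefore smaller than γ·V(cue), and the transition into the cue produces a small positive TD-RPE.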

      Because we measured the performance of the models by squared errors in state values, this correction affected the results reporting the performance. Fortunately, the effects were relatively minor and did not largely alter the results of performance comparisons. However, we sincerely apologize for this error. In the revised manuscript, we have used the corrected true values throughout the manuscript, and we have described the ways of estimating these values in Line 919-976.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      1. General Statements [optional]

      We thank the three reviewers for the time and care taken to assess our manuscript, and for their constructive feedback that will help improve the study. We herewith provide a revision plan, expecting that the additional experiments and corrections will address the key points raised by the reviewers.

      2. Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: The manuscript by Delgado et al. reports the role of the actin-remodeling Arp2/3 complex in the biology of Langerhans cells, which are specialized innate immune cells of the epidermis. The study is based on a conditional KO mouse model (CD11cCre;Arpc4fl/fl), in which the deletion of the Arp2/3 subunit ArpC4 is under the control of the myeloid-cell-specific CD11c promoter.

      In this model, the assembly of LC networks in the epidermis of ear and tail skin is preserved when examining animals immediately after birth (up to 1 week). Subsequently, however, LCs from ArpC4-deleted mice start displaying morphological aberrations (reduced elongation and number of branches at 4 weeks of age). Additionally, a profound decline in LC numbers is reported in the skin of both the ear and tail of young adult mice (8-10 weeks).

      To explore the cause of this decline, the authors then opt for the complementary in vitro study of bone-marrow-derived DCs, given the lack of a model to study LCs in vitro. They report that ArpC4 deletion is associated with aberrantly shaped nuclei, decreased expression of the nucleoskeleton proteins Lamin A/C and B1, nuclear envelope ruptures and increased DNA damage as shown by γH2Ax staining. Importantly, they provide evidence that the defects evoked by ArpC4 deletion also occur in LCs in situ (immunofluorescence of the skin in 4-week-old mice).

      Increased DNA damage is further documented by staining differentiating DCs from ArpC4-deleted mice for the 53BP1 marker. In parallel, nuclear levels of the DNA repair kinase ATR and recruitment of RPA70 (which recruits ATR to replication forks) are reduced in the ArpC4-deleted condition. In vitro treatment of DCs with the topoisomerase II inhibitor etoposide and the Arp2/3 inhibitor CK666 induces comparable DNA damage, as well as multilobulated nuclei and DNA bridges. The authors conclude that the ArpC4-KO phenotype might stem, at least in part, from a defective ability to repair DNA damage occurring during cell division.

      The study is enriched by an RNA-seq analysis that points to an increased expression of genes linked to IFN signaling, which the authors hypothetically relate to overt activation of innate nucleic acid sensing pathways.

      The study ends with an examination of myeloid cell populations in ArpC4-KO mice beyond LCs. Skin cDC2 and cDC2 subsets display skin emigration defects (like LCs), but not numerical defects in the skin (unlike LCs). Myeloid cell subsets of the colon are also present in normal numbers. In the lungs, interstitial and alveolar macrophages are reduced, but lung DC subsets are not. Collectively, these observations suggest that ArpC4 is essential for the maintenance of myeloid cell subsets that rely on cell division to colonize or to self-maintain within their tissue of residence (including LCs).

      MAJOR COMMENTS

      1. ArpC4 and Arp2/3 expression

      The authors argue that LCs from Arpc4KO mice should delete the Arpc4 gene in precursors that colonize the skin around birth. It would be important to show this, to rule out the possibility that the lack of phenotype (initial seeding, initial proliferative burst) in young animals (first week) could be related to an incomplete deletion of ArpC4 expression. It would also be important to show what is happening to the Arp2/3 complex in LCs from Arpc4KO mice.

      Response: We thank this reviewer for the careful assessment of our manuscript. Regarding this specific comment, we would like to clarify that we do not expect ArpC4 to be deleted in LC precursors, as CD11c is only expressed once the cells have entered the epidermis. Instead, we expect the deletion to take place after birth, around days 2-4 (Chorro et al., 2009). For this reason, we performed a deletion PCR on epidermal cells at postnatal day 7 (P7), the time at which the proliferative burst occurs. This analysis revealed CD11c-Cre-driven recombination at the ArpC4 locus (Fig. S2C), indicating that ArpC4 deletion does not alter LC proliferation and postnatal network formation.


      Revision plan: We will revise the manuscript text to explain more clearly when ArpC4 is deleted during development when using the CD11c-Cre transgene, and to better emphasize the rationale for the deletion PCR.

      In the in vitro studies with DCs, the level of ArpC4 and Arp2/3 deletion at the protein level is also not documented.


      Response: We have previously analyzed the expression of ArpC4 in BMDCs in a recent study, confirming its loss in CD11c-Cre;ArpC4fl/fl cells at the protein level: Rivera et al. Immunity 2022; doi: 10.1016/j.immuni.2021.11.008. PMID: 34910930 (Fig. S2D). Therefore, in the current manuscript we only refer to that paper (Results, first paragraph).

      The authors explain that surface expression of the CD11c marker, which drives Arpc4 deletion, gradually increased during differentiation of DCs: from 50% to 90% of the cells. Does that mean that loss of ArpC4 expression is only effective in a fraction of the cells examined before day 10 of differentiation (e.g. in the RNA-seq analysis)?

      Response: The reviewer is correct; there is heterogeneity in CD11c expression, which is inherent to this DC culture model, implying that Arpc4 gene deletion will be partial. Despite this, we were able to detect significant differences between the transcriptomes of control and CD11c-Cre;ArpC4fl/fl DCs at early phases of differentiation, emphasizing that the phenotype of ArpC4 loss is robust.


      Revision Plan: We will include a notion on this heterogeneity in the revised manuscript text.

      Intra-nuclear versus extra-nuclear activities of Arp2/3

      The authors favor a model whereby intra-nuclear ArpC4 helps maintain nuclear integrity during proliferation of DCs (and possibly LCs). However, multiple pools of Arp2/3 have been described and, accordingly, multiple mechanisms may account for the observed phenotype: i) a cytoplasmic pool driving the protrusions that sustain the assembly of the LC network and its connectivity with keratinocytes; ii) a perinuclear pool protecting the nucleus; iii) an intra-nuclear pool facilitating DNA repair mechanisms, e.g. by stabilizing replication forks (the scenario favored by the authors).


      Response: The referee is correct, and this is actually discussed in our manuscript (page 11, upper paragraph): we cannot exclude that several pools of branched actin are influencing the phenotype we describe here.

      Unfortunately, we have previously tested several antibodies against ArpC4, but in our hands, and despite comprehensive optimization, they did not yield specific signals that would enable us to assess changes in subcellular localization in murine cells. Following this reviewer's comment, we have reassessed the available tools and found an antibody against ArpC2 (Millipore, Anti-p34-Arc/ARPC2, 07-227-I-100UG) that may work based on published data. We have ordered this product to test it for IF staining of ArpC2, hoping to be able to delineate the subcellular localization of ArpC2 in DCs and potentially LCs.

      Revision plan: Upon receipt, we will test the ArpC2 antibody (Millipore, #07-227-I-100UG) both in cultured DCs and in epidermal whole mounts, running various optimization protocols regarding fixation, permeabilization and blocking reagents, next to antibody dilution. That way we hope to be able to detect the subcellular localization of Arp2/3 complex components as requested by this reviewer.

      It is recommended that the authors try to gather more supportive data to sustain the intra-nuclear role. Documenting ArpC4 presence in the nucleus would help support the claim. It could be combined with treatments aiming at blocking proliferation in order to reinforce the possibility that a main function of ArpC4 is to protect proliferating cells by favoring DNA repair inside the nucleus.

      Response: We thank this reviewer for this very helpful comment. As outlined in the previous response, we will aim to obtain subcellular localization data for Arp2/3 complex components, and along with that, study a potential intranuclear localization. Beyond that, however, compared with commonly cultured cell types we face two hurdles in fully addressing the nuclear role of Arp2/3: 1) Due to poor transduction rates and epigenetic silencing, we cannot sufficiently express exogenous constructs such as ArpC4-NLS in DCs to assess the subcellular localization of Arp2/3 complex components. 2) We have performed preliminary tests to block proliferation in DCs, using the cyclin-dependent kinase 1 (CDK1) inhibitor RO3306 at different concentrations and incubation times during DC differentiation. Unfortunately, most cells were found dead after treatment. Further lowering the inhibitor concentration (below 3.5 µM) will likely not block the cell cycle, rendering this approach unsuitable.

      Revision plan: We will test the suitability of additional antibodies directed against Arp2/3 complex components to assess their subcellular localization, with the aim to discriminate peripheral cytoplasmic vs. perinuclear vs. intranuclear localization. In addition, we will add a comment in the discussion, further addressing this point. In the case we remain unable to pinpoint that Arp2/3 resides in the nucleus, we will further tone down our current phrasing in the discussion, also emphasizing the possibility that cytoplasmic or perinuclear pools of the complex may indirectly help maintain the integrity of the genome in LCs.

      Nuclear envelope ruptures

      The nuclear envelope ruptures are not sufficiently documented (how many cells were imaged? quantification?). The authors employ STED microscopy to examine Lamin B1 distribution. The image shown in Figure 4A does not really highlight the nuclear envelope, but rather the entire nuclear content. Whether it is representative is questionable. We would expect Lamin B1 staining intensity to be drastically reduced, given the quantification shown in Figure 3D. In addition, although the authors stressed in the previous figure that Arpc4-KO is associated with aberrations in nuclear shape, the example shown in Figure 4A is a nucleus with a normal ovoid shape.

      It is recommended to quantify the ruptures with Lap2b antibodies (or another staining that better delineates the envelope) in order to avoid a possible bias due to the reduced staining intensity of Lamin B1.

      Response: NE ruptures were quantified by imaging NLS-GFP-expressing DCs in microchannels to visualize leakage of their nuclear content (Fig. 4B,C). The STED image mentioned by the referee (Fig. 4A,D) was only shown to further illustrate examples of NE ruptures, here using Lamin B as an immunofluorescence marker for the NE. We agree with the reviewer that it was not optimally chosen to represent the ArpC4-KO phenotype regarding nuclear shape and Lamin B1.

      Revision plan: We will provide representative nuclear images illustrating the ArpC4-KO phenotype vs. control cells. In addition, we will perform STED microscopy of Lap2B-immunostained DCs as suggested by the referee.

      A missing analysis is that of nuclear envelope ruptures as a function of nuclear deformations.

      Response: As stated in the manuscript (page 5, third paragraph), the morphology of DCs is quite heterogeneous. As mentioned above, nuclear rupture events were quantified by live imaging of NLS-GFP-expressing DCs, enabling the tracing of rupture events. Live imaging is the only robust way to measure nuclear membrane rupture events, as they are transient due to rapid membrane repair (Raab et al. Science 2016). The NLS-GFP label itself, however, is not accurate enough to also quantify nuclear deformations. These were therefore quantified after cell fixation, using DAPI and/or immunostaining for NE markers (Figures 3 and S3).

      Revision plan: We will quantify nuclear deformations using Lap2B staining of the nuclear envelope as suggested by the referee.

      Fig 4B-C: the same frequency of Arpc4-KO and WT cells display nuclear envelope ruptures in the 4-µm channels; however, the images show a rupture for the Arpc4-KO cell and no rupture for the WT cell (this is somewhat misleading). Are ruptures similar in Arpc4-KO and WT cells in this condition?

      Response: We apologize for choosing an image that does not reflect our quantification well; this was our mistake.

      Revision plan: We will choose an image that better reflects our quantification.

      Fig 4D-E: is there a direct link between nuclear envelope ruptures and γH2A.X?

      Response: At present, we can only correlate the findings of increased γH2Ax and elevated nuclear envelope rupture events in ArpC4-KO DCs. Rescue experiments are very difficult, if not impossible, in DCs (e.g. restoring Lamin A/C and B levels in the KOs and subsequently assessing the amount of DNA damage). While we are afraid that we cannot experimentally address a potential link between NE ruptures and DNA damage within the scope of this manuscript's revision, we have discussed this interesting aspect based on observations in immortalized cell culture systems (page 10). We would also like to note that such a link was indeed shown for different cell types in Nader et al. Cell 2021. This effect results from access of the cytosolic nuclease Trex1 to nuclear DNA.

      Revision plan: This point will be clarified in our revised manuscript.


      It would be interesting (but optional) to understand what is happening to DNA and histones. Is there evidence of leakage into the cytoplasm?

      Response: We have not investigated this so far. We will attempt to do so.

      Revision plan: To address this aspect, we plan to perform immunostainings for double-stranded DNA in the cytoplasm (along with an NE marker). This approach has been used in the field to mark cytoplasmic DNA.

      RNA seq analysis

      The RNA-seq analysis suffers from a lack of direct connection with the rest of the study. The extracted molecular information is neither validated nor further explored, and remains very descriptive. The PCA analysis suggests a "more pronounced transcriptomic heterogeneity in differentiating Arpc4KO DCs". However, it seems difficult to make such a claim from the comparison of 3 mice per group. In addition, such heterogeneity is not seen in the more detailed analysis (Fig 5F). The authors claim that "day 10 control and Arpc4KO DCs showed no to very little differences in gene expression, in contrast to cells at days 7-9 of differentiation". This is not obvious from the data displayed in the corresponding figure. In addition, it is not expected that cells taking a divergent differentiation path at days 7-9 would return to a similar transcriptional activity at day 10.

      A point that is not discussed is that before day 10 of DC differentiation, Arpc4 KO is expected to only occur in about 50% of the cell population. This is expected to impact the RNA-seq analysis.

      Not all clusters have been exploited (e.g. cluster 3 elevated, cluster 6 partly reduced). I suggest the authors reconsider their analysis of the RNA-seq data (or possibly invest in complementary analyses).

      Response: Despite a comprehensive analysis of the different transcriptomes of control and ArpC4 mutant cells during DC differentiation, we decided to focus the presentation and discussion of our RNAseq results on the most notable findings. Of these, the elevated innate immune responses in ArpC4-KO DCs (Fig. 5E,H) caught our particular attention, as this seemed highly meaningful in light of DC and LC functions.

      Revision plan: As suggested by the referee, in the revised manuscript, we will better connect the RNAseq data to the other cellular and molecular analyses shown, complementing these results by investigating the potential involvement of innate immune responses in the ArpC4-KO phenotype.

      What causes the profound numerical drop of LC in the epidermis?

      A major open question is what causes the massive drop of LCs. Although differentiating Arpc4KO DCs start accumulating DNA damage upon proliferation, they succeed in progressing through the cell cycle. There is even a slightly elevated expression of cell cycle genes at day 7 of differentiation in the DC model.

      Only a trend for increased apoptosis is observed in ear and tail skin. It would be important to provide complementary data documenting increased death (or aberrant emigration?) of LCs in the 4-8 week time window.

      Response: We agree with the reviewer that this is an important question. We exclude that elevated emigration causes the decline of LCs in ArpC4-KO epidermis, as ArpC4-mutant LCs are significantly reduced (and not increased) in skin-draining lymph nodes (Fig. 7E). To assess whether increased cell death contributed to LC loss, we have tried to identify LCs that are about to die. As the reviewer noted, we could only observe a trend towards apoptosis-positive LCs in mutant epidermis. We assume that this is due to the quick elimination of compromised LCs following DNA damage: only a short time passes before LCs with impaired genome integrity are cleared from the system, making it very difficult to detect γH2Ax-positive cells that are also positive for markers of cell death.

      Revision plan: Despite the abovementioned expected limitations in detecting DNA-damage-positive but viable LCs in vivo, for the manuscript revision we will collect 6-week-old mice to analyze LC numbers and apoptosis (cleaved Caspase-3), complementing our data derived from 7-day and 4-week-old mice (Figures S2A,B, S2E,F). Suitable mice were born at the end of May; we are ready to analyze them at 6 weeks of age accordingly.

      Functional consequences

      Although the study reports novel aspects of LC biology, the consequence of ArpC4 deletion for skin barrier function and immunosurveillance are not investigated. It would seem very relevant to test how this model copes with radiation, chemical and/or microorganism challenges.

      Response: We fully agree with this reviewer that this is a very interesting point. Therefore, in addition to assessing the steady-state circulation of LCs and DCs, we also addressed the consequence of ArpC4 loss for LC function in chemically challenged skin: we performed skin painting experiments using the contact sensitizer fluorescein isothiocyanate (FITC), diluted in the sensitizing agent dibutyl phthalate (DBP), to detect cutaneous-derived phagocytes within draining lymph nodes. These experiments revealed that migration of Arpc4KO LCs (as well as of Arpc4KO DCs) to skin-draining lymph nodes was impaired (Fig. 7C-E), confirming an in vivo role of ArpC4 in immune cell migration to lymphatic organs following a chemical challenge. Considering the lengthy legal approval procedures for new animal experimentation protocols, additional in vivo challenges (beyond the provided FITC challenge study) would take at least 6 additional months, which would excessively delay the revision of our manuscript.

      Revision Plan: We will better explain the FITC painting experiment to highlight its importance.

      MINOR COMMENTS:

      1- Figure 1D

      Gating strategy: the same empty plots are shown twice; the content seems to be missing. Does this need to be shown in the main figure?

      Response: We apologize for this problem; it may be due to file conversion or to the PDF reader software. In our PDF versions (including the published bioRxiv preprint) we do see the data points (see below); however, we have previously experienced incomplete FACS plots during manuscript preparation.


      Revision plan: We will take extra care and double-check the results after converting the figures into PDFs. In addition, we will provide JPG files when submitting the revised manuscript, to prevent such problems.

      2- Figure 2

      It would be best to keep the same scale to compare P1 and P7 (tail skin, figure 2A).

      Response and revision plan: We will replace the examples with micrographs of comparable scale (already solved, will be provided with manuscript revision).

      The overlay of Ki67 and MHC-II does not allow easy visualization of the double-positive cells (Fig 2C).

      Response and revision plan: We will provide single-channel images next to the merged view, and improve the visualization of double-positive cells (already solved, will be provided with manuscript revision).

      The quality of Ki67 staining differs for Arpc4-KO (less intense, less focused on the nuclei): is this a technical issue, or could it reflect something?

      Response and revision plan: We thank the reviewer for spotting this. We have re-assessed all Ki67 micrographs and noted that the originally chosen examples are indeed not fully representative. In the meantime, we have selected more representative examples of Ki67-positive cells in control and mutant tissues, reflecting no difference in the principal nature of Ki67 staining (already taken care of, will be provided with manuscript revision).

      Fig 2C: Panels mounted differently for ear and tail skin (different order to present the individual stainings, Dapi for tail skin only).

      Response and revision plan: We will harmonize the sequence of panels in figure 2 with submission of the revised manuscript.

      3- LC branch analysis (Fig 1 and 2)

      While Fig 1 indicates that ear skin LCs form on average half as many branches as tail skin LCs (3-4 versus 8-9 branches per cell), Fig 2 shows the opposite (10-12 versus 6-7 branches per cell).

      Is this due to a very distinct pattern between the 2 considered ages (4 weeks versus 8-10 weeks)? Could the author double-check that there is no methodological bias in their analysis?


      Response: We thank the reviewer for pointing out this apparent inconsistency. Indeed, our initial analysis suffered from a bias in detecting LC dendrites, as tissue cellularity and overall morphology differ significantly between 4-week-old and adult animals: in adult animals, the immunostainings show a higher baseline background signal in the skin epithelium compared to P28. We had noted this beforehand and adjusted the imaging pipeline accordingly, applying more stringent thresholding to eliminate background signals in adult tissues. While we were still able to detect the described ArpC4 phenotype, this strategy reduced our ability to detect dendrites (in both control and mutant tissues), explaining the seemingly reduced number of dendrites in adult vs. 4-week-old tissues.

      Revision plan: We have double-checked both the micrographs and the corresponding quantifications and did not identify errors. Instead, our assumption that overly stringent background reduction in adults caused the discrepancy turned out to be correct. We are currently re-doing the detailed analyses of LC morphology at 4-week and adult stages by confocal microscopy, using a 63x objective rather than the 40x objective used previously. First results confirm that with this approach the numbers of LC dendrites are largely comparable across these ages, while the phenotypes of ArpC4 loss are retained. We will provide a completely new analysis with the revised manuscript.

      4- Fig 3 E-G

      How many animals were examined (n=5)? Is this reproducible across animals? Why was it done with 4-week-old animals (phenotype not complete? event occurring before the loss in numbers?)

      Response and revision plan: As mentioned in the figure legend for Fig. 3F, we analysed N = 4 control and N = 5 KO mice (for clarity, we will add this information to Figure 3E and G in the revised document). We chose the 4-week time point as this was the stage when the loss of LCs first became apparent (even though non-significant at this age). We aimed to learn whether changes in nuclear morphology and nuclear envelope markers represented early molecular and cellular events following ArpC4 loss. Compared to later stages, this strategy reduces the risk of detecting indirect effects of ArpC4 loss. We will clarify this in the revised manuscript text.

      Lamin A/C staining is globally more intense in the Arpc4-KO epidermis (this also seems to apply to the masks corresponding to the LCs). It is surprising that the quantification indicates a major drop in Lamin A/C intensity in the LCs.

      Response and revision plan: We again thank the reviewer for this careful assessment. The originally chosen micrographs are indeed not fully representative. As with many tissue stainings, there is inter-sample variability. We have now revisited the micrographs and did not find a significant global reduction of Lamin A/C in the entire epidermis (including keratinocytes/KCs). The drop in Lamin A/C intensity is restricted to ArpC4-KO LCs, and not KCs, and is in line with the reduced Lamin A/C expression data in DCs (Fig. 3C,D). We have selected more representative examples, which will be provided with the revised manuscript.

      Legend Fig 4D replace confocal microscopy by STED microscopy

      Revision plan: We will replace "confocal microscopy" by "STED microscopy" accordingly.

      6- Figure 4F

      The intensity/background of γH2Ax staining differs markedly between the 2 micrographs shown for WT and Arpc4-KO epidermis.

      Response and revision plan: We have revisited the micrographs and now selected more representative examples, which will be provided in the revised manuscript.

      7- Figure 7C, F, H

      Gating strategies: it would be better to harmonize the style of the plots (dot plots and 2 types of contour plots have been used).

      Response and revision plan: We agree and will provide a harmonized plot illustration in the revised manuscript.

      8- Figure 7H

      The legend of the lower gating strategy seems to be wrong (KO, not WT).

      Response and revision plan: We thank the reviewer for pointing out this mistake. A corrected figure display will be provided with revision.

      Reviewer #1 (Significance (Required)):

      Strengths: the general quality of the manuscript is high. It is very clearly written and contains a very detailed methods section that would allow the reported experiments to be reproduced. This work entails clear novelty in that it represents the first investigation of the role of ArpC4 in LCs. It opens an interesting perspective on specific mechanisms sustaining the maintenance of myeloid cell subsets in peripheral tissues. This work is therefore expected to be of interest to a large audience of cellular immunologists and beyond. Challenging skin function with an external trigger would raise its relevance for an even wider audience (see main point 6).

      Response: see point 6.

      Limitations: in its current version the manuscript suffers from a lack of solidity in a few analyses (see main points on ArpC4 and Arp2/3 protein expression, nuclear envelope rupture analysis, etc.). It also tends to formulate a narrative centered on an intra-nuclear function of ArpC4 that is not definitively proven.

      The field of expertise of this reviewer is: cellular immunology and actin remodeling.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      SUMMARY This is a study in experimental mice employing both in vitro and, importantly, in vivo approaches. EPIDERMAL LANGERHANS CELLS serve as a paradigm for the maintenance of homeostasis of myeloid cells in a tissue, epidermis in this case. In addition to well known functions of the ACTIN NETWORK in cell migration, chemotaxis, cell adherence and phagocytosis the authors reveal a critical function of actin networks in the survival of cells in their home tissue.

      Actin-related proteins (Arp), specifically here the Arp2/3 complex, are necessary to form filamentous actin networks. The authors use conditional knock-out mice in which Arpc4 (an essential component of the Arp2/3 complex) is deleted under the control of CD11c, the most prominent dendritic cell marker, which is also expressed on Langerhans cells. In normal mice, epidermal Langerhans cells reside in the epidermis virtually life-long. They initially settle in the epidermis around birth and during the first few days after, establish a dense network by a burst of proliferation, and then "linger on" through low-level maintenance proliferation. In the epidermis of Arpc4 knock-out mice, Langerhans cells also start off with this proliferative burst but, strikingly, they do not stay and are massively reduced by the age of 8-12 weeks.

      The analyses of this decline revealed that

      -- the shape (number of nuclear lobes) and integrity of cell nuclei was compromised; they were fragile and ruptured to some degree when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing;

      -- DNA damage, as detected by staining for gamma-H2Ax or 53BP1 accumulated when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing;

      -- recruitment of DNA repair molecules was inhibited when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing;

      -- gene signatures of interferon signaling and response were increased when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing;

      -- in vivo migration of dendritic cells and Langerhans cells from the skin to the draining lymph nodes in an inflammatory setting (FITC painting of the skin) was impaired when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing;

      -- the persistence of the typical dense network of Langerhans cells in the epidermis, created by proliferation shortly after birth, is abrogated when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing. Importantly, this was not the case for myeloid cell populations that settle a tissue without needing that initial burst of proliferation. For instance, numbers of colonic macrophages were not affected when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing.

      Thus, the authors conclude that the Arp2/3 complex, through its formation of actin networks, is essential to maintain the integrity of nuclei and ensure DNA repair, thereby ensuring the maintenance proliferation of Langerhans cells and, as a consequence, the persistence of the dense epidermal network of Langerhans cells.

      Up-to-date methodology from the fields of cell biology and cellular immunology (cell isolation from tissues, immunofluorescence, multiparameter flow cytometry, FISH, "good old" but important transmission electron microscopy, etc.) was used at high quality (e.g., immunofluorescence pictures!). Quantitative and qualitative analytical methods were timely and appropriate (e.g., Voronoi diagrams, cell shape profiling tools, Cre-lox gene-deletion technology, etc.). Importantly, the authors used a clever method that they developed several years ago, namely the analysis of dendritic cell migration in microchannels of defined widths. Molecular biology methods such as RNAseq were also employed and analysed with appropriate bioinformatic tools.

      MAJOR COMMENTS:

      • ARE THE KEY CONCLUSIONS CONVINCING? Yes, they are.

      • SHOULD THE AUTHORS QUALIFY SOME OF THEIR CLAIMS AS PRELIMINARY OR SPECULATIVE, OR REMOVE THEM ALTOGETHER? No, I think it is ok as it stands. The authors word their claims and conclusions not apodictically but cautiously, as it should be. They point out explicitly which lines of investigation they did not follow up here.

      • WOULD ADDITIONAL EXPERIMENTS BE ESSENTIAL TO SUPPORT THE CLAIMS OF THE PAPER? REQUEST ADDITIONAL EXPERIMENTS ONLY WHERE NECESSARY FOR THE PAPER AS IT IS, AND DO NOT ASK AUTHORS TO OPEN NEW LINES OF EXPERIMENTATION. I think that the here presented experimental evidence suffices to support the conclusions drawn. No additional experiments are necessary.

      • ARE THE SUGGESTED EXPERIMENTS REALISTIC IN TERMS OF TIME AND RESOURCES? IT WOULD HELP IF YOU COULD ADD AN ESTIMATED COST AND TIME INVESTMENT FOR SUBSTANTIAL EXPERIMENTS. Not applicable.

      • ARE THE DATA AND THE METHODS PRESENTED IN SUCH A WAY THAT THEY CAN BE REPRODUCED? Yes, they are.

      • ARE THE EXPERIMENTS ADEQUATELY REPLICATED AND STATISTICAL ANALYSIS ADEQUATE? Yes.

      Response: We thank the reviewer very much for assessing our work, for providing constructive suggestions, and for acknowledging the strength of the study.

      MINOR COMMENTS:

      • SPECIFIC EXPERIMENTAL ISSUES THAT ARE EASILY ADDRESSABLE. None

      • ARE PRIOR STUDIES REFERENCED APPROPRIATELY? Essentially yes. Regarding the reduction / loss of the adult epidermal Langerhans cell network, it may be of some interest to also refer to or discuss another one of the few examples of this phenomenon. There, the initial burst of proliferation is followed by reduced proliferation and increased apoptosis when a critical member of the mTOR signaling cascade is conditionally knocked out (Blood 123:217, 2014).

      Response and revision plan: As suggested, we will include in the revised manuscript further examples with related phenotypes regarding the progressive decline of LCs.

      • ARE THE TEXT AND FIGURES CLEAR AND ACCURATE? Yes they are. Figures are well arranged for easy comprehension.

      • DO YOU HAVE SUGGESTIONS THAT WOULD HELP THE AUTHORS IMPROVE THE PRESENTATION OF THEIR DATA AND CONCLUSIONS?

      1. Materials & Methods. The authors write, regarding flow cytometry of epidermal cells: "Briefly, 1cm2 of back skin from 8-14 weeks old female wild-type and knockout littermates was dissociated in 0.25 mg/mL Liberase (Sigma, cat. #5401020001) and 0.5 mg/mL DNase (Sigma, cat.#10104159001) in 1 mL of RPMI (Sigma) and mechanically disaggregated in Eppendorf tubes, FOLLOWED BY INCUBATED for 2 h at 37 {degree sign}C." Followed by what?

      Response and revision plan: We apologize for this mistake. The text should read: "... followed by blocking and antibody labeling of cells in single cell suspension.". We will provide the correct text in the revised manuscript.

      Materials & Methods. BMDC electron microscopy. What does "IF" stand for? Please specify.

      Response and revision plan: We also regret this mistake in the method text. It should read: "... For electron microscopy analysis, after PDMS removal, cells were fixed using 2.5% glutaraldehyde ...". We will correct this in the revised manuscript.

      RESULTS in gene expression analyses. The authors observe some increase in apoptosis (as detected by cleaved-Caspase-3 staining). Is this observation from immunofluorescence also evident in the RNAseq data (where the IFN changes were seen), i.e., in Figure 5?

      Response and revision plan: We will check our RNAseq data for changes in apoptosis-related genes and, if any are found, include these in the revised manuscript.

      Figure 7 F and G. Perhaps the authors may want to swap the upper and lower panels in F or G, so that the macrophage FACS plots and bar graphs are in the same row and, obviously, the DC plots and bars likewise.

      Response and revision plan: We agree and will harmonize the panel sequence in the revised manuscript.

      Figure 7H. "Gating strategy in ArpC4WT Lung (previously gated in Live CD45+ cells)" - The lower row is knock-out, not WT. This is indicated correctly in the legend, but in the figure both rows are labeled as WT.

      Response and revision plan: Indeed, the legend information is correct, but the corresponding figure panel is incorrect. We will provide a corrected version with revision.

      The reference by Park et al. 2021 is missing in the list.

      Response and revision plan: We will add the reference to the revised bibliography.

      Figure 1D. Are the bar graphs really meant to say "CD11c"? The FACS plots show "CD11b".

      Response and revision plan: We will check the panels and correct where necessary.

      As to cDC1: in Figure 1D, the FACS plot shows an absence of CD103+ cDC1 cells. In contrast, in Figure 7A (left panel), there is no difference in cDC1 cells between WT and KO mice. Is the flow cytometry plot in Figure 1D therefore not representative regarding cDC1 cells?

      Response and revision plan: The reviewer is correct about this apparent discrepancy. We have not observed differences between the control and Arpc4-KO cDC1 populations; hence, Figure 7 represents our findings. For Figure 1, we mistakenly chose a non-representative plot to illustrate the gating strategy. We apologize for this mistake and will provide a corrected and representative FACS plot in the revised manuscript.

      Reviewer #2 (Significance (Required)):

      • DESCRIBE THE NATURE AND SIGNIFICANCE OF THE ADVANCE (E.G. CONCEPTUAL, TECHNICAL, CLINICAL) FOR THE FIELD. This is a conceptual advance. It represents a big step in our understanding of how immune cells in tissues (which all come from the bone marrow or are seeded before birth from embryonic hematopoietic organs such as yolk sac and fetal liver) can remain resident in these tissues. For cell types such as Langerhans cells, which establish their final population density within their tissues of residence, the presented findings convincingly buttress the role of proliferation and thereby the role of the actin-related protein 2/3 complex (Arp2/3).

      • PLACE THE WORK IN THE CONTEXT OF THE EXISTING LITERATURE (PROVIDE REFERENCES, WHERE APPROPRIATE). While we know much about actin-related proteins (Arp), as correctly cited by the authors, this knowledge is derived mostly from in vitro studies. The submitted study translates the findings to an in vivo setting for the first time.

      • STATE WHAT AUDIENCE MIGHT BE INTERESTED IN AND INFLUENCED BY THE REPORTED FINDINGS. Skin immunologists foremost, but these findings are of interest to the entire community of immunologists and also to cell biologists.

      • DEFINE YOUR FIELD OF EXPERTISE. My expertise is in skin immunology, in particular skin dendritic cells including Langerhans cells.

      We thank the referee for their positive assessment of our manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      The manuscript identifies a role for the Arp2/3 complex, the major regulator of actin branching in cells, in controlling the homeostasis of murine Langerhans cells (LCs), a specialized subset of dendritic cells in the skin epidermis. The findings of the study are based on the analysis of CD11c-Cre Arpc4-flox mice, a conditional knockout mouse model that interferes with Arp2/3 function in Langerhans cells and other CD11c-expressing myeloid cells, e.g. dendritic cell and macrophage subsets. Using immunofluorescence and flow cytometry analysis of epidermis and skin tissues, the authors provide a detailed analysis of LC numbers at different developmental stages (postnatal days 1, 7 and 28, and adult mice) and demonstrate that Arpc4 deficiency does not interfere with the establishment of LC networks until postnatal day 28. However, LCs in ear and tail skin are substantially reduced in Arpc4-deficient mice at 8-12 weeks of age. In parallel to their in vivo model, the authors analyze cultures of bone marrow-derived dendritic cells (BMDCs) from control and CD11c-Cre Arpc4-flox mice. Arpc4 deficiency in BMDCs, which develop over 8-10 days in culture, results in nuclear shape and lamina abnormalities, as well as signs of increased DNA damage. Aspects of this phenotype are also detected in Langerhans cells in epidermal preparations. Transcriptomic analysis of BMDCs highlights a gene signature of increased expression of the interferon response pathway and alterations in cell cycle regulation. Arpc4-deficient BMDCs show increased expression of DNA damage markers and reduced expression of certain DNA repair factors. Based on these correlative findings from the BMDC model, the authors conclude that the decline in LC numbers might result from the accumulation of DNA damage over time, which the authors term "pre-mature aging of Langerhans cells".

      Lastly, the authors present a heterogeneous picture of how Arp2/3 depletion affects distinct DC populations in CD11c-Cre Arpc4-flox mice. While some tissue-resident DC subsets appear normal in numbers, others are reduced in numbers in the tissue. This may be related to their proliferation potential in tissues.

      Major comments:

      • Are the claims and the conclusions supported by the data or do they require additional experiments or analyses to support them?

      1) The authors claim that Arpc4 deficiency selectively compromises myeloid cell populations that rely on proliferation for tissue colonization (Figure 7). The presented data might hint at such a general hypothesis, but solid experimental proof is lacking. When comparing myeloid cell subsets from four different organs, the authors refer to published data showing that some dendritic cell subsets are more proliferative in tissues than others, and that CD11cCre Arpc4-flox mice appear to have reduced cell numbers in these populations. However, the presented data are purely correlative, and no functional connection between cell proliferation and the phenotypes has been made. While some dendritic cell subsets (Langerhans cells, alveolar DCs) show reduced cell numbers in CD11cCre Arpc4-flox mice, other myeloid cell subsets are unaffected (e.g. dermal cDC1 and cDC2, colon macrophages). There could be plenty of other reasons underlying the observed discrepancies between these cell subsets; Arp2/3 knockout efficiency and myeloid cell turnover in the tissue are just two examples that have not been taken into consideration. A direct link between measured cell proliferation (e.g. by BrdU labeling) and the observed phenotype would be needed to make such claims. Alternatively, these data could be removed. Experimentally addressing these points could take 3-6 months.

      Response and planned revisions: We thank the referee for raising this point. We agree that these results support our conclusion but do not address this question directly. However, we would like to emphasize that our conclusion is based on studies from others showing that alveolar macrophages maintain themselves through proliferation (Bain et al. Mucosal Immunology 2022). In contrast, it has been reported that most colonic macrophages are derived from monocytes that are recruited to the gut throughout life (Bain et al. Mucosal Immunity 2023).

      We propose to better explain and discuss these points in our revised manuscript. In addition, we will stress that we do not exclude the possibility that different intracellular Arpc4-dependent processes contribute to the phenotypes observed (beyond maintenance of DNA integrity). These revisions will help temper our conclusions and leave open the potential involvement of alternative mechanisms, which will be discussed as suggested by the referee.

      2) The authors claim that DC subsets (e.g. dermal cDCs), which develop from pre-DCs, are not affected by Arp2/3 depletion (Figure 7, although the FACS plot in Fig. 1D would suggest a different picture for cDC1). This is surprising in light of the data with bone marrow-derived DCs (BMDCs), the major in vitro model of this study, which develop from CDPs that again develop from pre-DCs. BMDCs did show aberrant nuclei and signs of DNA damage. How would the authors then explain the discrepancies between the BMDC model and the DC subsets, where the authors feel that the pre-DC origin explains the phenotypic difference? This is a general concern regarding the data interpretation and conclusions.

      Response: We thank the referee for raising this point, which indeed requires clarification. Two non-exclusive hypotheses could explain this apparent discrepancy:

      • The ontogeny of bone-marrow-derived DCs: Depending on the protocol used, the precursors from which DCs develop might vary. We use one of the first protocols, which was pioneered by the laboratory of Paola Ricciardi-Castagnoli (Winzler et al. J. Exp. Med. 1997). It relies on a supernatant from J558 cells transfected with GM-CSF, which contains additional cytokines and mainly generates DC2-like DCs. Langerhans cells are closer to DC2s, which resemble macrophages more than DC1s do. We thus chose this protocol rather than the protocols that use Flt3-L, which produce both DC1s and DC2s developed from common dendritic-cell precursors (CDPs). It is thus possible that our BM-derived DCs develop from other precursor cells that are closer to monocyte precursors.
      • As shown in Figure 5C, the kinetics of CD11c acquisition, and thus of Arpc4 gene deletion, might differ in vivo and in vitro. In vivo, as stated in our manuscript, DCs acquire CD11c as pre-DCs and undergo only a few rounds of division afterwards. In vitro, as shown by our cycling experiments, BM-derived DCs cycle continuously, so they keep dividing after having acquired CD11c (around day 7) and deleted the Arpc4 gene.

      Revision plan: We propose to mention these hypotheses in the discussion of our manuscript to explain the apparent contradiction raised by the referee.

      3) In line with point 2, the authors never show that BMDCs show reduced proliferation, reduced cell numbers or increased cell death in Arpc4-deficient cell cultures as a consequence of the detected DNA damage and impaired DNA repair. In fact, Figure 5C even shows that cell growth rates are equal between control and KO. This is a major mismatch in the current study. The authors use the BMDC model to explain the declining numbers of Langerhans cells (which derive from fetal liver cells), but this phenotype is not mirrored by the BMDC culture, so it remains open whether the observed changes in nuclear DNA damage and repair are indeed directly linked to the observed phenotype of declining cell numbers in the tissue. The authors should explain why cell growth is unchanged in KO cells. Additional experiments addressing these points with sufficient biological replicates (cultures from different mice) could take 2-3 months, including preparation time.

      Response: We thank the referee for raising this point, which was probably not properly discussed in the first version of our manuscript. Indeed, as the referee states, Arpc4KO BM-derived DCs do not show the premature cell death phenotype observed in LCs in vivo. There are at least two putative non-exclusive explanations for this. First, unlike LCs, which are long-lived cells, BM-derived DCs can be kept in culture for only 10-12 days. As DNA damage-induced cell death takes time (LCs only start to die about 3-4 weeks after network establishment), the lifespan of BM-DCs could simply be too short to observe this phenotype. Second, in the epidermis, LCs are physically constrained and continuously exposed to diverse signals that might increase their sensitivity to DNA damage and thereby promote subsequent cell death.

      Revision Plan: We will clarify this point in our revised manuscript by providing putative explanations for why the death phenotype of Arpc4-deficient LCs is not observed in BM-derived DCs. We will further explain that this does not invalidate this cellular model, as it was used to raise hypotheses on the putative role played by Arpc4 in myeloid cells, i.e. maintenance of DNA integrity, which was then confirmed in vivo (Arpc4KO LCs do indeed display DNA damage in the epidermis). Without this "imperfect cellular model", we would probably not have been able to uncover this novel function of Arp2/3 in immune cells.

      4) The authors refer to a "pre-mature aging" phenotype of Arpc4-deficient BMDCs and LCs, based on reductions in Lamin B and Lamin A and increases in γH2AX and 53BP1. I find this term an overstatement of the current data and suggest that other markers of cell senescence, such as p53, Rb, p21 and β-galactosidase, also be examined before making such a strong claim about "aging" and cell senescence. Experimentally addressing this point with sufficient biological replicates could take 2-3 months, including preparation time.

      Revision Plan: We will assess the expression of these genes and senescence signatures in our RNAseq analysis as well as in Arpc4WT and Arpc4KO-derived DCs, as suggested by the referee.

      5) The study does not provide a mechanism for how the Arp2/3 complex would mediate the observed effects on DNA damage and repair. This has not been addressed in the cell model, and only potential scenarios from other, non-myeloid cell lines are discussed. It remains unclear whether the observed phenotypes in Arpc4-depleted myeloid cells relate to the direct nuclear function of Arp2/3 or to its cytosolic function, including its roles in cytoskeletal regulation that may have secondary effects on the nuclear alterations. This is a general concern regarding the presented data; obtaining data on mechanism might require more than 6 months.

      Revision Plan: The referee is correct: our manuscript shows that Arp2/3 deficiency in specific myeloid cells affects their survival in vivo and proposes that this could result at least in part from impaired maintenance of DNA integrity in these cells. We do not know whether this also applies to non-myeloid cells, which, although very interesting, is beyond the scope of the present study. In addition, we do not have any experimental tool to distinguish whether the DNA damage phenotype of Arpc4KO cells involves the nuclear or cortical pool of F-actin, which is why we have left this question open in the discussion of our manuscript.

      6) OPTIONAL: The authors make a strong case arguing that the increased interferon expression signature (based on the transcriptomics data) reflects the nuclear ruptures in Arpc4-deficient cells and adds to the observed phenotype. If this is so, what happens then in STING knockout cells in the presence of CK666 inhibitor?


      Revision Plan: The referee is correct that we do not show this point experimentally; we will therefore temper this conclusion.

      • Are the data and the methods presented in such a way that they can be reproduced?

      1) The analyses include quite a number of intensity calculations of immunofluorescence signals (Fig. 3D, E; Fig. 4E; Fig. 5B and 6B). The background staining is often variable or very high. In some cases it is even unclear whether the stainings really detect protein above background (Fig. 6A, Fig. 5F). How were the immunofluorescence data acquired, and how were differing background staining intensities dealt with?

      Revision Plan: We will carefully describe the microscopes used for image acquisition as well as the downstream analyses for each experiment, which indeed vary depending on the signals observed with distinct antibodies or constructs.

      2) It remained unclear to me on what basis the nuclear deformations in Fig. 3G, H were calculated.

      Revision Plan: We will carefully describe the methods used to quantify nuclear deformations.

      3) The detailed phenotype of control mice is a bit unclear. It appears as if these were Cre-negative animals. Do the authors have proof-of-principle experiments showing that CD11cCre Arpc4 +/+ animals have phenotypes comparable to those of Cre-negative animals?

      Revision Plan: We have never observed any decline in LC numbers in other mouse lines/genotypes (for example in cPLA2flox/flox;CD11c-Cre mice shown in the manuscript, Fig. S6B), which excludes a putative role for Cre in LC death.

      • Are the experiments adequately replicated and statistical analysis adequate?

      For most experiments, the number of biological replicates (mice, or BMDC cultures from different mice) and individual values (n, cells) are indicated. Statistical analysis appears adequate.

      Minor comments:

      • Prior published studies on Arp2/3 function in immune cells are referenced appropriately. A number of additional preprint manuscripts on this topic have not been cited and could be considered for referencing.


      Revision Plan: We will fix this point and cite additional, relevant preprints.

      • The text is very clearly and very well written. Figures are clear and accurate for most cases. There are some open questions:

      • Fig. 1B: The number of dots in the graph does not match the legend; there are not n=12 dots for both genotypes. Additionally, what do the symbols in the circles in the graph stand for? This is also unclear in a later figure.

      • Fig. 2C: The current IF presentation (overlay MHCII with Ki67) is not very helpful. An additional image that shows only the Ki67 signal in the MHCII mask would be very helpful.

      • Fig. 4B: BMDCs of which culture day were used for these experiments?

      • Fig. 4A and D show the same representative cells for two different findings, which is only moderately convincing regarding a "general" phenotype.

      • Fig. 5, B: Scale bars are missing.

      Revision Plan: We will fix all these points.

      Reviewer #3 (Significance (Required)):

      Strengths and Advance:

      The study provides strong data and a very detailed analysis of how the Arp2/3 complex regulates stages of Langerhans cell development and homeostasis. The role of the Arp2/3 complex as a regulator of actin branching, which is involved in many cellular functions, has previously not been reported for this cell type. Previous research in immune cells has already studied the Arp2/3 complex, but these studies focused on its role in migration, and the majority of published phenotypes relate to cell migration. While there are already a number of in vitro studies showing that the Arp2/3 complex can regulate aspects of cell cycle control or cell death in non-immune cells, most of these studies were performed with immortalized, non-immune cell lines, which can be more easily manipulated to dissect mechanistic aspects of the cellular phenotype but are limited in their physiological interpretation. Hence, it is a major strength of this study to investigate the effects of Arp2/3 in a primary immune cell type, directly in its native and physiological environment. This is important because in vitro data from other cell types cannot easily be extrapolated to any other cell type, and it is critical for our understanding to collect physiological data from tissues, where the biology really happens. The finding that the Arp2/3 complex regulates the tissue residency of Langerhans cells through processes unrelated to migration is partially unexpected, shifting the view of this protein complex's physiological role to other cell biological processes, e.g. regulation of cell proliferation.

      Limitations: The limitations of the study are detailed in the five major points listed above. The study accumulates many experiments that characterize the phenotype of Arpc4-depleted cells, showing signs of DNA damage in Langerhans cells and cultures of BMDCs. How the Arp2/3 complex mechanistically mediates the observed effects on DNA damage and repair has not been addressed. It also remains open whether this is due to the effects of the Arp2/3 complex in the nucleus or the cytosol, which would be biologically extremely important to understand. Beyond that, there are some discrepancies regarding the phenotype of the BMDC model, which matches neither the Langerhans cell phenotype in the tissue (reduced proliferation; LCs derive from different progenitors) nor other endogenous DC populations, which should derive from similar progenitors.

      Audience and reviewer background:

      In its current form, the manuscript will already be of interest for several research fields: Langerhans cell and dendritic cell homeostasis, immune cell trafficking, actin and cytoskeleton regulation in immune cells, physiological role of actin-regulating proteins. My own field of expertise is immune cell trafficking in mouse models, leukocyte migration and cytoskeletal regulation. I cannot judge the analysis and clustering of the bulk RNA sequencing data.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Please insert a point-by-point reply describing the revisions that were already carried out and included in the transferred manuscript. If no revisions have been carried out yet, please leave this section empty.


      4. Description of analyses that authors prefer not to carry out

      Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision. This can be due to time or resource limitations or in case of disagreement about the necessity of such additional data given the scope of the study. Please leave empty if not applicable.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      • This is a study in experimental mice employing both in vitro and, importantly, in vivo approaches. EPIDERMAL LANGERHANS CELLS serve as a paradigm for the maintenance of homeostasis of myeloid cells in a tissue, epidermis in this case. In addition to well known functions of the ACTIN NETWORK in cell migration, chemotaxis, cell adherence and phagocytosis the authors reveal a critical function of actin networks in the survival of cells in their home tissue.

      • Actin-related proteins (Arp), specifically here the Arp2/3 complex, are necessary to form the filamentous actin networks. The authors use conditional knock-out mice where Arpc4 (an essential component of the Arp2/3 complex) is deleted under the control of CD11c, the most prominent dendritic cell marker, which is also expressed on Langerhans cells. In normal mice, epidermal Langerhans cells reside in the epidermis virtually life-long. They initially settle in the epidermis around and a few days after birth, establish a dense network through a burst of proliferation, and then "linger on" through low-level maintenance proliferation. In the epidermis of Arpc4 knock-out mice, Langerhans cells also start off with this proliferative burst but, strikingly, they do not stay and are massively reduced by the age of 8-12 weeks.

      • The analyses of this decline revealed that

      a) the shape (number of nuclear lobes) and integrity of cell nuclei was compromised; they were fragile and ruptured to some degree when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing;

      b) DNA damage, as detected by staining for gamma-H2AX or 53BP1, accumulated when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing;

      c) recruitment of DNA repair molecules was inhibited when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing;

      d) gene signatures of interferon signaling and response were increased when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing;

      e) in vivo migration of dendritic cells and Langerhans cells from the skin to the draining lymph nodes in an inflammatory setting (FITC painting of the skin) was impaired when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing;

      f) the persistence of the typical dense network of Langerhans cells in the epidermis, created by proliferation shortly after birth, is abrogated when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing. Importantly, this was not the case for myeloid cell populations that settle a tissue without needing that initial burst of proliferation. For instance, numbers of colonic macrophages were not affected when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing.

      • Thus, the authors conclude that the Arp2/3 complex, through its formation of actin networks, is essential for maintaining the integrity of nuclei and ensuring DNA repair, thereby securing the maintenance proliferation of Langerhans cells and, as a consequence, the persistence of the dense epidermal network of Langerhans cells.

      • Up-to-date methodology from the fields of cell biology and cellular immunology (cell isolation from tissues, immunofluorescence, multiparameter flow cytometry, FISH, "good old" - but important - transmission electron microscopy, etc.) was used at high quality (e.g., the immunofluorescence pictures!). Quantitative and qualitative analytical methods were timely and appropriate (e.g., Voronoi diagrams, cell shape profiling tools, Cre-lox gene-deletion technology, etc.). Importantly, the authors used a clever method that they had developed several years ago, namely the analysis of dendritic cell migration in microchannels of defined widths. Molecular biology methods such as RNAseq were also employed and analysed with appropriate bioinformatic tools.

      Major comments:

      • ARE THE KEY CONCLUSIONS CONVINCING? Yes, they are.

      • SHOULD THE AUTHORS QUALIFY SOME OF THEIR CLAIMS AS PRELIMINARY OR SPECULATIVE, OR REMOVE THEM ALTOGETHER? No, I think it is fine as it stands. The authors word their claims and conclusions cautiously rather than apodictically, as they should. They point out explicitly which lines of investigation they did not follow up here.

      • WOULD ADDITIONAL EXPERIMENTS BE ESSENTIAL TO SUPPORT THE CLAIMS OF THE PAPER? REQUEST ADDITIONAL EXPERIMENTS ONLY WHERE NECESSARY FOR THE PAPER AS IT IS, AND DO NOT ASK AUTHORS TO OPEN NEW LINES OF EXPERIMENTATION. I think that the experimental evidence presented here suffices to support the conclusions drawn. No additional experiments are necessary.

      • ARE THE SUGGESTED EXPERIMENTS REALISTIC IN TERMS OF TIME AND RESOURCES? IT WOULD HELP IF YOU COULD ADD AN ESTIMATED COST AND TIME INVESTMENT FOR SUBSTANTIAL EXPERIMENTS. Not applicable.

      • ARE THE DATA AND THE METHODS PRESENTED IN SUCH A WAY THAT THEY CAN BE REPRODUCED? Yes, they are.

      • ARE THE EXPERIMENTS ADEQUATELY REPLICATED AND STATISTICAL ANALYSIS ADEQUATE? Yes.

      Minor comments:

      • SPECIFIC EXPERIMENTAL ISSUES THAT ARE EASILY ADDRESSABLE. None

      • ARE PRIOR STUDIES REFERENCED APPROPRIATELY? Essentially yes. Regarding the reduction / loss of the adult epidermal Langerhans cell network, it may be of interest to also refer to or discuss another of the few examples of this phenomenon. There, the initial burst of proliferation is followed by reduced proliferation and increased apoptosis when a critical member of the mTOR signaling cascade is conditionally knocked out (Blood 123:217, 2014).

      • ARE THE TEXT AND FIGURES CLEAR AND ACCURATE? Yes they are. Figures are well arranged for easy comprehension.

      • DO YOU HAVE SUGGESTIONS THAT WOULD HELP THE AUTHORS IMPROVE THE PRESENTATION OF THEIR DATA AND CONCLUSIONS?

      • Materials & Methods. The authors write, regarding flow cytometry of epidermal cells: "Briefly, 1cm2 of back skin from 8-14 weeks old female wild-type and knockout littermates was dissociated in 0.25 mg/mL Liberase (Sigma, cat. #5401020001) and 0.5 mg/mL DNase (Sigma, cat.#10104159001) in 1 mL of RPMI (Sigma) and mechanically disaggregated in Eppendorf tubes, FOLLOWED BY INCUBATED for 2 h at 37 {degree sign}C." Followed by what?

      • Materials & Methods. BMDC electron microscopy. What does "IF" stand for? Please specify.

      • RESULTS in gene expression analyses. The authors observe some increase in apoptosis (as detected by cleaved-Caspase-3 staining). Is this observation from immunofluorescence also evident in the RNAseq data (where the IFN changes were seen), i.e., in Figure 5?

      • Figure 7 F and G. Perhaps the authors may want to swap the upper and lower panels in F or G, so that the macrophage FACS plots and bar graphs are in the same row and, obviously, the DC plots and bars likewise.

      • Figure 7H. "Gating strategy in ArpC4WT Lung (previously gated in Live CD45+ cells)" - The lower row is knock-out, not WT. This is indicated correctly in the legend, but in the figure both rows are labeled as WT.

      • The reference by Park et al. 2021 is missing in the list.

      • Figure 1D. Are the bar graphs really meant to say "CD11c"? The FACS plots show "CD11b".

      • As to cDC1: in Figure 1D, the FACS plot shows an absence of CD103+ cDC1 cells. In contrast, in Figure 7A (left panel), there is no difference in cDC1 cells between WT and KO mice. Is the flow cytometry plot in Figure 1D therefore not representative regarding cDC1 cells?

      Significance

      • DESCRIBE THE NATURE AND SIGNIFICANCE OF THE ADVANCE (E.G. CONCEPTUAL, TECHNICAL, CLINICAL) FOR THE FIELD. This is a conceptual advance. It represents a big step in our understanding of how immune cells in tissues (which all come from the bone marrow or are seeded before birth from embryonic hematopoietic organs such as yolk sac and fetal liver) can remain resident in these tissues. For cell types such as Langerhans cells, which establish their final population density within their tissues of residence, the presented findings convincingly buttress the role of proliferation and thereby the role of the actin-related protein 2/3 complex (Arp2/3).

      • PLACE THE WORK IN THE CONTEXT OF THE EXISTING LITERATURE (PROVIDE REFERENCES, WHERE APPROPRIATE). While we know much about actin-related proteins (Arp), as correctly cited by the authors, this knowledge is derived mostly from in vitro studies. The submitted study translates the findings to an in vivo setting for the first time.

      • STATE WHAT AUDIENCE MIGHT BE INTERESTED IN AND INFLUENCED BY THE REPORTED FINDINGS. Skin immunologists foremost, but these findings are of interest to the entire community of immunologists and also to cell biologists.

      • DEFINE YOUR FIELD OF EXPERTISE. My expertise is in skin immunology, in particular skin dendritic cells including Langerhans cells.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Revision Plan

      June 28, 2025

      Manuscript number: RC-2025-02982

      Corresponding author(s): Babita Madan, Nathan Harmston, David Virshup

      General Statements In Wnt signaling, the relative contributions of canonical (β-catenin-dependent) and non-canonical (β-catenin-independent) signaling remain unclear. Here, we exploited a unique and highly robust in vivo system to study this. Our study is therefore the first comprehensive analysis of the β-catenin-independent arm of the Wnt signaling pathway in a cancer model and illustrates how a combination of cis-regulatory elements can determine Wnt-dependent gene regulation.

      We are very pleased with the reviews; it appears we communicated our goal and our findings clearly, and in general the reviewers felt the study provided important information, was well planned and the results were “crystal clear”.

      While more experiments could strengthen and extend the results, we feel our results are already very robust due to the use of multiple replicates in the in vivo system.

      The Virshup lab in Singapore closed July 1, 2025 and so additional wet lab studies are not feasible.

      2. Description of the planned revisions


      Below we address the points raised by the reviewers:

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The article has the merit of addressing a yet-unsolved question in the field (whether beta-catenin can also repress genes) that only a limited number of studies have tried to tackle, and provides useful datasets for the community. The system employed is elegant, and bypassing PORCN inhibition with a constitutively active beta-catenin is clean and ingenious. The manuscript is clearly written.

      We thank the reviewers for their kind comments on the importance of the data. Our orthotopic model provides the opportunity to exploit robust Wnt regulated gene expression in a more responsive microenvironment than can be achieved in cell culture and simple flank xenograft models.

      Here we propose a series of thoughts and comments that, if addressed, would in our opinion improve the study and its description.

      1) We wonder why a xenograft model is necessary to induce a robust WNT response in these cells.

      The authors describe this set-up as a strength, as it is supposed to provide physiological relevance, yet it is not clear to us why this is the case.

      We welcome the opportunity to expand on our choice of an orthotopic xenograft model. It has long been established that cancer cells behave differently in different in vivo locations (Killion et al., 1998). Building on this, we confirmed in our system that identical pancreatic cancer cells treated with the same PORCN inhibitor had very different responses in vitro, in the flank, and in their orthotopic environment (Madan et al., 2018). To quote from our prior paper, “Looking only at genes decreasing more than 1.5-fold at 56 hours, we would have missed 817/1867 (44%) genes using a subcutaneous or 939/1867 (50%) using an in vitro model. Thus, the overall response to Wnt inhibition was reduced in the subcutaneous model and further blunted in vitro.” An orthotopic model more accurately represents real biology.

      The reason for this is presumably the very different orthotopic microenvironment, including tissue appropriate stroma-tumor, vascular-tumor, lymphatic-tumor, and humoral interactions.

      Moreover, as the authors homogenize the tumour to perform bulk RNA-seq, we wonder whether they are not only sequencing mRNA from the cancer cells but also from infiltrating immune cells and/or from the surrounding connective tissue.

      In experiments generating RNA-seq data from xenograft models, the resulting sequences can originate from either human (graft) or mouse (host). In order to account for this, following standard practice, we filtered reads prior to alignment using Xenome (Conway et al., 2012). We have added additional text to the methods to highlight this step in our pipeline.

      2) If, as the established view implies, Wnt/beta-catenin only leads to gene activation, pathway inhibition would free up the transcriptional machinery, and there is evidence that some of its constituents are rate-limiting. The freed machinery could then activate other genes: the net effect observed would be their increased transcription upon Wnt inhibition, irrespective of beta-catenin's presence. Could this be considered as an alternative explanation for the genes that go up in both control and bcat4A lines upon ETC-159 administration? This, we think, is in part corroborated by the absence of enrichment of biological pathways in this group of genes. The genes that are beta-catenin-dependent and downregulated (D&R) are obviously not affected by this alternative explanation.

      This is an interesting suggestion, and we will incorporate this thought into our discussion of potential mechanisms.

      3) The authors mention that HPAF-II are Wnt addicted. Do they die upon ETC-159 administration, and is this effect rescued by exogenous WNT addition?

      We and several others have previously reported that Wnt-addicted cells differentiate and/or senesce upon Wnt withdrawal in vivo but not in vitro. This is related to the broader changes in gene expression in the orthotopic tumors. The effect of PORCN inhibition has been demonstrated by us and others and is rescued by Wnt addition, downstream activation of Wnt signaling by e.g. APC mutation, and, as we show here, stabilized β-catenin.

      4) Line 120: the authors write about Figure 1C: "This demonstrates that the growth of β-cat4A cells in vitro largely requires Wnts to activate β-catenin signaling." The opposite is true: control cells require WNT and form fewer colonies with ETC-159, while β-cat4A cells are independent of Wnt secretion.

      We appreciate the reviewer pointing out our mis-statement. This error has now been corrected in the revised manuscript.

      5) Lines 226-229: "The β-catenin independent repressed genes were notably enriched for motifs bound by homeobox factors including GSC2, POU6F2, and MSGN1. This finding aligns with the known role of non-canonical Wnt signaling in embryonic development" This statement assumes that target genes, or at least the beta-catenin independent ones, are conserved across tissues, including developing organs. This contrasts with the view that target genes in addition to the usual suspects (e.g., AXIN2, SP5 etc.) are modulated tissue-specifically - a view that the authors (and in fact, these reviewers) appear to support in their introduction.

      We agree with the reviewer that a majority of Wnt-regulated genes are tissue specific. Indeed, the β-catenin-independent Wnt-repressed genes may also be tissue specific. We speculate that in other tissues, other β-catenin-independent Wnt-repressed genes may also have homeobox factor binding sites, and so the general concept remains valid. We do not have sufficient data in other tissues to resolve this issue.

      7) The luciferase and mutagenesis work presented in Figure 5 is crystal-clear. One important aspect that remains to be clarified is whether beta-catenin and/or TCF7L2 directly bind to the NRE sites. Or do the authors hypothesize that another factor binds here? We suggest that the authors show TCF7L2 binding tracks at the NRE/WRE motifs in the main figures.

      A major question from the reviewers was whether we can provide additional evidence that the NRE is bound by LEF/TCF family members. Our initial analysis of more datasets indicates that TCF7L2 peaks are enriched on NREs in Wnt/β-catenin-responsive cell lines such as HCT116 and PANC1. These analyses appear to further support the model that the NRE binds TCF7L2, but we fully agree these analyses can neither prove nor disprove the model.

      In our revision, we will analyze additional cut and run datasets as suggested and look at the HEPG2 datasets suggested by reviewer 1. We are concerned about tissue specificity as some of the genes are not expressed in e.g. HEPG2 or HEK293 cells where datasets are available. However, our data continues to support a functional role for the NRE in the modulation of β-catenin regulated genes. The best analysis would be more ChIP-Seq or Cut and Run assays on tissues, not cells, but these studies are beyond what we can do.

      What about other TCF/LEFs and beta-catenin? Are there relevant datasets that could be explored to test whether all these bind here during Wnt activation?

      As above, we will analyze additional ChIP and Cut & Run datasets to address this question, looking at β-catenin and other LEF/TCF family members. We also note that a ChIP-Seq peak does not necessarily imply that the targeted factor (e.g., TCF7L2) is bound at the target site in all cells.

      The repression might be mediated by beta-catenin partnering with other factors that bind the NRE even by competing with TCF7L2.

      We appreciate the insightful comments and now incorporate this into our discussion.

      8) In general, while we greatly appreciate the github page to replicate the analysis, we feel that the methods' description is lacking, both concerning analytical details (e.g., the cutoff used for MACS2 peak calling) or basic experimental planning (e.g, how the luciferase assays were performed).

      We thank the reviewers for the suggestions and will add further details regarding the analysis and experimental planning in the Methods section.

      9) The paper might benefit from the addition of quality metrics on the RNA-seq. Interesting for example would be to see a PCA analysis - as a more unbiased approach - rather than the kmeans clustering.

      We have this data and will add it to the revised manuscript.

      10) It seems that in Figure 3A the clusters are mislabelled as compared to Figure 3B and Figure 1. Here the repressor clusters are labelled DR5, DR6 and DN7 whereas in the rest of the paper they are labelled DR1, DR2 and DN1.

      Thank you for pointing out this issue. This has now been corrected in Figure 3.

      11) The effect of siCTNNB1 in Figure 5E is described as significant in the text, whereas the figure shows a p value of 0.075.

      Thank you for pointing out the p value did not cross the 0.05 threshold. We have modified the text to remove the word ‘significant’.

      12) Line 396: 'Here we confirm and extend the identification of a TCF-dependent negative regulatory element (NRE), where beta-catenin interacts with TCF to repress gene expression'. We suggest caution in stating that beta-catenin and TCF directly repress gene expression by binding to NRE. In the current state the authors do not show that TCF & beta-catenin bind to these elements. See our previous point 7.

      We appreciate the suggestion of the reviewers. We will be more cautious in our interpretation.

      Further suggestions, or food for thought:

      13) A frequently asked question in the field concerns the off-target effects of CHIR treatment as opposed to exposure to WNT ligands. CHIR treatment - in parallel to bcat4A overexpression - would allow the authors to delineate WNT independent effects of CHIR treatment and settle this debate.

      We thank the reviewers for suggesting this interesting experiment to sort out the non-Wnt effects of GSK3 inhibition. Such a study would require a new set of animal experiments and a different analysis; we think this is beyond the scope of this manuscript.

      14) We think that Figure 4C could be strengthened by adding more public TCF-related datasets (e.g., from ENCODE) to confirm the observation across datasets from different laboratories. In particular, the HEPG2 analysis could be improved, as there is an excellent TCF7L2 dataset available from ENCODE.

      Many more datasets are easily searchable through: https://www.factorbook.org/.

      As above, we will analyze the HEPG2 dataset. We plan to update Fig. 4 with analyses of different datasets, such as those of Blauwkamp et al. (2008) and Zambanini et al. (2022).

      15) The authors show that there is no specific spacing between NREs and WREs. This implies that it is not likely that TCF7L2 recognizes both at the same time through the C-clamp. Do the authors think that there might be a pattern discernible when comparing the location of WRE and NRE in relation to the TCF7L2 ChIP-seq peak summit? This would allow inferring whether TCF7L2 more likely directly binds the WRE (presumably) and if the NRE is bound by a cofactor.

      This is an interesting suggestion and we will conduct this analysis as suggested on available datasets (as the result may be different in different tissue types with varying degrees of Wnt/β-catenin signaling).

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Overall, the study provides a solid framework for understanding noncanonical transcriptional outputs of Wnt signaling in a cancer context. The majority of the conclusions are well supported by the data. However, there are a few substantive points that require clarification before the manuscript is ready for publication.

      Major Comments

      The authors' central claim, that their findings represent a comprehensive analysis of the β-catenin-independent arm of Wnt signaling and uncover a "cis-regulatory grammar" governing Wnt-dependent gene activation versus repression, is overstated based on the presented data.

      We appreciate the reviewer's concern and will temper our language.

      Specifically:

      • Figure 3B identifies TF-binding motifs enriched among different Wnt-responsive gene clusters, but the authors only functionally investigate the role of NRE in β-catenin-dependent repression, particularly in the context of TCF motif interaction.

      • To support a broader claim regarding cis-regulatory grammar, additional analyses are required:

      o What is the distribution of NREs across all clusters? Are they exclusive to β-catenin-dependent repressed clusters, or more broadly present?

      The distribution of the NREs is a statistically significant enrichment; they are observed in the repressed clusters more frequently than expected by chance alone, but they are present elsewhere as well. We have tempered our language around the cis-regulatory grammar.

      o Do NREs interact with other enriched motifs beyond TCF? Is this interaction specific to repression or also involved in activation?

      This is an interesting question beyond the scope of this analysis. Our dataset uses multiple interventions; the NREs may interact with other motifs, but we would need more transcriptional data with biological interventions to assess this.

      o A more comprehensive analysis of cis-element combinations is needed to draw conclusions about their collective influence on gene regulation across clusters.

      We agree; this would be a great question to address if we had TCF binding data in our orthotopic xenograft model. We do not have such a dataset, nor do we have the resources to generate it.

      Other important clarifications:

      • The use of the term "wild-type" to describe HPAF-II cells is potentially misleading. These cells are not genetically wild-type and harbor multiple oncogenic alterations.

      Thank you for pointing this out. We will use the word “parental” in the text.

      • The manuscript does not clearly present the kinetics of Wnt target downregulation upon ETC-159 treatment of HPAF-II cells. Understanding whether repression mirrors activation dynamics (e.g., delay or persistence of Wnt effects) is essential to interpreting the system's temporal behavior.

      We previously addressed the temporal dynamics of activation and repression in our more comprehensive time course papers (Harmston et al., 2020; Madan et al., 2018); there are differences in the dynamics that are difficult to tease out in this new dataset as the density of time points is less. Having said that, we will compare the time course and annotate the sets of genes identified in this current study with the data from our original study to provide more information on the temporal dynamics of this system.

      Minor Comment

      • The statement in Figure 1C (lines 119-120) that "growth of β-cat4A cells in vitro largely requires Wnts to activate β-catenin signaling" is inconsistent with the data. As the β-cat4A allele encodes a constitutively active form of β-catenin, Wnts should not be required. Please revise this conclusion for clarity.

      We thank the reviewers for pointing out this mis-statement. We have corrected this.

      Reviewer #2 (Significance (Required)):

      This study offers a systematic classification of Wnt-responsive gene expression dynamics, differentiating between β-catenin-dependent and -independent mechanisms. The insights into temporal expression patterns and the potential role of the NRE element in transcriptional repression add depth to our understanding of Wnt signaling. These findings have relevance for developmental biology, stem cell biology, and cancer research, particularly in understanding how Wnt-mediated repression may influence tumor progression and therapeutic response.

      Nice review; thank you.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      … The work advances understanding of Wnt mediated repression via cis regulatory grammar.

      Major Concerns

      1) Statistical thresholds and clustering - The criteria for classifying β-catenin-dependent versus -independent genes rely on FDR cutoffs above or below 0.1. If the more stringent cutoff of 0.05 were used, how many genes would still be considered Wnt regulated?

      We can readily address this in a revised manuscript.

      2) Validation of selected β-catenin-dependent and -independent Wnt target genes - The authors identify β-catenin-dependent and -independent Wnt target genes (4 selected genes from different clusters in Fig. 2), but RT-qPCR-based validation has been performed only for Axin2 (Fig. S3). The authors should also validate the other 3 genes.

      We had considered performing qPCR to re-validate some of our gene-expression changes, but qPCR analysis is intrinsically more error prone than RNAseq, and we believe the literature shows that qPCR on the same samples would not add any extra utility. Previous studies that have examined this question have reported excellent correlation between RNAseq and qPCR (Asmann et al., 2009; Griffith et al., 2010; Wu et al., 2014).

      3) NRE mechanistic insight - The most important contribution of this manuscript is the extension of the importance of the NRE motif in Wnt-regulated enhancers. But the mutagenesis data provided are insufficient to conclusively establish that the NREs are responsible for the repression. The effects in the synthetic reporters in Fig. 4D are small; it is not clear that there is much activity in the MinRep to be repressed by the NREs. The data in Fig. 5 provide a better context to test the importance of the NREs, but the authors use deletion analysis, which is too imprecise, and settle for single-nucleotide mutants in individual NREs in the ABHD11-AS1 reporter. In the Axin2 reporter, they mutate sequences outside of the NRE. This is too inconsistent. They should mutate 3 or 4 positions within the NRE in BOTH motifs in the context of the ABHD11-AS1 reporter, and the same for the Axin2 reporter.

      We feel our analysis, coupled with the Kim paper (Kim et al., 2017), supports the role of the NRE. We agree that more data are always desirable, but in our current circumstances we cannot add additional wet-lab experiments.

      Regarding Figure 4D, this is a synthetic system lacking the endogenous elements in the promoter. We agree with the reviewer that the effect is small, but we would also like to point out that adding the well-established 2WRE in front of the MinRep increased transcriptional activity 1.5-fold, a change of similar magnitude to the 2NRE decreasing transcriptional activity to about 1/1.5 ≈ 0.67-fold.

      Kim et al. showed that mutating the 11th nucleotide of the NRE motif had the strongest effect, so we followed their lead and mutated only the 11th nucleotide of the ABHD11-AS1 NRE.

      As for the putative NRE sequence in the AXIN2 promoter (GTGTTTTTTTT), it is followed downstream by a poly(T) stretch (TTTTTTTTTT). Mutating only the 11th nucleotide to G/C could create a sequence similar to an NRE, so we mutated sequences outside of the NRE to fully disrupt it.

      4) Even if the mutagenesis is done more completely, the results simply replicate those of the Goentoro group. In Kim et al. 2017, they provide suggestive (not convincing) evidence that TCFs directly bind to the NRE. The authors of this manuscript should explore that in more detail, e.g., can purified TCF bind to the NRE sequence? Can the authors design experiments to directly test whether beta-catenin is acting through the NRE? Their data currently only demonstrate that the NREs provide a negative input to the reporters, and that is an important mechanistic difference.

      We point out that our minimal reporter studies with the NRE showed a repressive effect in HCT116 (colorectal cancer cells with stabilized β-catenin) but not HT1080 (sarcoma cells with low Wnt) supporting the importance of β-catenin acting through the NRE (Figs. 4D, 4E).

      We fully agree with the reviewers that additional study of TCF interaction with the NRE would be of value. While EMSA and culture-based ChIP assays would be of some value, the best study should be done in vivo where the system is most robust. We are not in a position to do these studies, but we will add in a discussion of this as a limitation of the current study.

      5) In vertebrates, some TCFs are more repressive than others, and TLEs have been implicated in repression. Exploring these factors in the context of the NRE would increase the value of this story.

      This is an interesting idea but beyond the scope of the current manuscript. It is likely this would be dependent on tissue specific expression, local expression levels, and local binding of co-factors. As we look at other TCF members in other datasets we may be able to address this. Further wetlab experiments are beyond the scope of this work.

      Referees cross-commenting

      I respectfully disagree that the luciferase assays are sufficient. Using deletion analysis to understand the function of specific binding sites is insufficient, and the more specific mutations of NREs are incomplete. Regarding this paper extending our knowledge of direct transcriptional repression by Wnt/bcat signaling, I don't agree that it adds much: there are numerous datasets where Wnt signaling activates and represses genes, and the trick is determining whether any of the repressed genes are the result of direct regulation by TCF/bcat. They don't explore that. The main finding is an extension of the work by Lea Goentoro on the importance of the NRE motif, but they don't address whether TCF directly associates with this sequence. Goentoro argued in the 2017 paper that it does, but that data is unconvincing to me. Can purified TCF bind the NRE? Without that information (done carefully) this manuscript is very limited.

      We respectfully disagree with the reviewer regarding the contribution of this manuscript. There are certainly many datasets looking at Wnt-regulated genes in tissue culture, but these cell-based studies are underpowered to really understand Wnt biology. There are only two papers, ours and Cantù’s, that address Wnt-repressed genes in any depth. No prior papers have differentiated β-catenin-dependent from β-catenin-independent genes, and certainly not in an orthotopic animal model.

      A major impact of our study is the finding that only 10% of Wnt-regulated genes are independent of β-catenin, at least in pancreatic cancer. We feel this is a major contribution. We further add to this analysis by reinforcing and extending the prior evidence on the NRE in humans (and correcting the motif sequence!) for Wnt-repressed genes. Our data support the fine-tuning of Wnt/β-catenin-regulated genes by a cis-regulatory grammar.

      Reviewer #3 (Significance (Required)):

      Overall, this study advances our understanding of the dual roles of Wnt signaling in gene activation and repression, highlighting the role of the NRE motif. But this is an extension of the original NRE paper (Kim et al 2017) with no mechanistic advance beyond that original work. The transcriptomics in the first part of the manuscript have some value, but similar data sets already exist.

      We respectfully but strongly disagree with the reviewer. First, our work examines the NRE in a large-scale in vivo transcriptome dataset, significantly extending the candidate gene approach of Kim et al. Second, we disagree with the comment that “similar data sets already exist.” Indeed, reviewer 1 (C. Cantù) specifically pointed out that we had addressed a “yet-unsolved question in the field” on whether and how β-catenin represses genes.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      To date we have only corrected several typographical errors.

      4. Description of analyses that authors prefer not to carry out

      We fully agree with the reviewers that additional study of TCF interaction with the NRE would be of value. While EMSA and cell culture-based ChIP assays would be of some modest value, they have already been done in vitro by Kim et al. (Kim et al., 2017), and the best next study should be done in vivo in Wnt-responsive cancers or tissues where the biology is most robust (Madan et al., 2018). We are not in a position to do these studies, but we will add this to the discussion as a limitation of the current study. We also acknowledge that the NRE may interact with other currently unidentified factors.

      Reviewer 1 asked about considering experiments to determine non-Wnt effects of GSK3 inhibitors like CHIR. Such a study, while interesting, would require a new set of animal experiments and a different analysis; we think this is beyond the scope of this manuscript.

      Finally, we note that the Virshup lab at Duke-NUS Medical School in Singapore, where these in vivo studies were performed, has closed as of July 1, 2025 and the various lab members have moved on to new adventures. Because of this, we are unable to undertake new wet-lab studies.

      Thank you for your consideration,

      For the authors,

      David Virshup

      References:

      Asmann YW, Klee EW, Thompson EA, Perez EA, Middha S, Oberg AL, Therneau TM, Smith DI, Poland GA, Wieben ED, Kocher J-PA. 2009. 3’ tag digital gene expression profiling of human brain and universal reference RNA using Illumina Genome Analyzer. BMC Genom 10:531. doi:10.1186/1471-2164-10-531

      Blauwkamp TA, Chang MV, Cadigan KM. 2008. Novel TCF-binding sites specify transcriptional repression by Wnt signalling. The EMBO Journal 27:1436–1446. doi:10.1038/emboj.2008.80

      Conway T, Wazny J, Bromage A, Tymms M, Sooraj D, Williams ED, Beresford-Smith B. 2012. Xenome—a tool for classifying reads from xenograft samples. Bioinformatics 28:i172–i178. doi:10.1093/bioinformatics/bts236

      Griffith M, Griffith OL, Mwenifumbo J, Goya R, Morrissy AS, Morin RD, Corbett R, Tang MJ, Hou Y-C, Pugh TJ, Robertson G, Chittaranjan S, Ally A, Asano JK, Chan SY, Li HI, McDonald H, Teague K, Zhao Y, Zeng T, Delaney A, Hirst M, Morin GB, Jones SJM, Tai IT, Marra MA. 2010. Alternative expression analysis by RNA sequencing. Nat Methods 7:843–847. doi:10.1038/nmeth.1503

      Harmston N, Lim JYS, Arqués O, Palmer HG, Petretto E, Virshup DM, Madan B. 2020. Widespread Repression of Gene Expression in Cancer by a Wnt/β-Catenin/MAPK Pathway. Cancer Res 81:464–475. doi:10.1158/0008-5472.can-20-2129

      Killion JJ, Radinsky R, Fidler IJ. 1998. Orthotopic models are necessary to predict therapy of transplantable tumors in mice. Cancer Metastasis Reviews 17:279–284.

      Kim K, Cho J, Hilzinger TS, Nunns H, Liu A, Ryba BE, Goentoro L. 2017. Two-Element Transcriptional Regulation in the Canonical Wnt Pathway. Current Biology 27:2357-2364.e5. doi:10.1016/j.cub.2017.06.037

      Madan B, Harmston N, Nallan G, Montoya A, Faull P, Petretto E, Virshup DM. 2018. Temporal dynamics of Wnt-dependent transcriptome reveals an oncogenic Wnt/MYC/ribosome axis. J Clin Invest 128:5620–5633. doi:10.1172/jci122383

      Wu AR, Neff NF, Kalisky T, Dalerba P, Treutlein B, Rothenberg ME, Mburu FM, Mantalas GL, Sim S, Clarke MF, Quake SR. 2014. Quantitative assessment of single-cell RNA-sequencing methods. Nat Methods 11:41–46. doi:10.1038/nmeth.2694

      Zambanini G, Nordin A, Jonasson M, Pagella P, Cantù C. 2022. A new cut&run low volume-urea (LoV-U) protocol optimized for transcriptional co-factors uncovers Wnt/β-catenin tissue-specific genomic targets. Development 149. doi:10.1242/dev.201124

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      MHC (Major Histocompatibility Complex) genes have long been mentioned as cases of trans-species polymorphism (TSP), where alleles might have their most recent common ancestor with alleles in a different species, rather than other alleles in the same species (e.g., a human MHC allele might coalesce with a chimp MHC allele, more recently than the two coalesce with other alleles in either species). This paper provides a more complete estimate of the extent and ages of TSP in primate MHC loci. The data clearly support deep TSP linking alleles in humans to (in some cases) old world monkeys, but the amount of TSP varies between loci.

      Strengths:

      The authors use publicly available datasets to build phylogenetic trees of MHC alleles and loci. From these trees they are able to estimate whether there is compelling support for trans-species polymorphisms (TSPs) using Bayes factor tests comparing different alternative hypotheses for tree shape. The phylogenetic methods are state-of-the-art and appropriate to the task.

      The authors supplement their analyses of TSP with estimates of selection (e.g., dN/dS ratios) on motifs within the MHC protein. They confirm what one would suspect: classical MHC genes exhibit stronger selection at amino acid residues that are part of the peptide binding region, and non-classical MHC exhibit less evidence of selection. The selected sites are associated with various diseases in GWAS studies.

      Weaknesses:

      An implication drawn from this paper (and previous literature) is that MHC has atypically high rates of TSP. However, rates of TSP are not estimated for other genes or gene families, so readers have no basis for comparison. There is no framework for judging whether the depth and frequency of TSP is unusual for MHC family genes, relative to random genes elsewhere in the genome or to immune genes in particular. I expect (from previous work on the topic) that MHC is indeed exceptional in this regard, but some direct comparison would provide greater confidence in this conclusion.

      We agree that context is important! Although we expected to get the most interesting results from studying the classical genes, we did include the non-classical genes specifically for comparison. They are located in the same genomic region, have multiple sequences catalogued in different species (although they are less diverse), and perform critical immune functions. We think this is a more appropriate set to compare with the classical MHC genes than, say, a random set of genes. Interestingly, we did not detect TSP in these non-classical genes. This likely means that the classical MHC genes are truly exceptional, but it could also mean that not enough sequences are available for the non-classical genes to detect TSP. 

      It would be very interesting to repeat this analysis for another gene family to see whether such deep TSP also occurs in other immune or non-immune gene families. We are lucky that decades of past work and a dedicated database exist for cataloging MHC sequences. When this level of sequence collection is achieved for other highly polymorphic gene families, it will be possible to do a comparable analysis.

      Given the companion paper's evidence of genic gain/loss, it seems like there is a real risk that the present study under-estimates TSP, if cases of TSP have been obscured by the loss of the TSP-carrying gene paralog from some lineages needed to detect the TSP. Are the present analyses simply calculating rates of TSP of observed alleles, or are you able to infer TSP rates conditional on rates of gene gain/loss?

      We were not able to infer TSP rates conditional on rates of gene gain/loss. We agree that some cases of TSP were likely lost due to the loss of a gene paralog from certain species. Furthermore, the dearth of MHC whole-region and allele sequences available for most primates makes it difficult to detect TSP, even if the gene paralog is still present. Long-read sequencing of more primate genomes should help with this. We agree that it would also be very interesting to study TSPs that were maintained for millions of years but were lost recently.

      Figure 5 (and 6) provide regression model fits (red lines in panel C) relating evolutionary rates (y axis not labeled) to site distance from the peptide binding groove, on the protein product. This is a nice result. I wonder, however, whether a linear model (as opposed to non-linear) is the most biologically reasonable choice, and whether non-linear functions have been evaluated. The authors might consider generalized additive models (GAMs) as an alternative that relaxes linearity assumptions.

      We agree that a linear model is likely not the most biologically reasonable choice, as protein interactions are complex. However, we made the choice to implement the simplest model because the evolutionary rates we inferred were relative, making parameters relatively meaningless. We were mainly concerned with positive or negative slopes and we leave the rest to the protein interaction experts.

      The connection between rapidly evolving sites, and disease associations (lines 382-3) is very interesting. However, this is not being presented as a statistical test of association. The authors note that fast-evolving amino acids all have at least one association: but is this really more disease-association than a random amino acid in the MHC? Or, a randomly chosen polymorphic amino acid in MHC? A statistical test confirming an excess of disease associations would strengthen this claim.

      To strengthen this claim, we added Figure 6 - Figure Supplement 7 (NOTE: this needs to be renamed as Table 1 - Figure Supplement 1, which the eLife template does not allow). Here, we plot the number of associations for each amino acid against evolutionary rate, revealing a significant positive slope in Class I. We also added explanatory text for this figure in lines 400-404.
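A minimal sketch of this kind of check is shown below, in plain Python. It assumes one evolutionary rate and one disease-association count per amino acid position; the function names and the permutation test are illustrative choices, not the authors' code, and the response does not say which test produced the reported significance. The slope is an ordinary least-squares fit, and the one-sided p-value asks how often shuffling the association counts across positions yields a slope at least as positive as the observed one.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def permutation_p_value(x, y, n_perm=10_000, seed=0):
    """One-sided p-value for a positive slope, obtained by shuffling y against x."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    observed = ols_slope(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if ols_slope(x, shuffled) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: relative rate and association count per position.
rates = [0.8, 1.0, 1.1, 1.5, 2.0, 2.6, 3.1, 3.9]
associations = [0, 1, 0, 2, 3, 3, 5, 6]
print(ols_slope(rates, associations), permutation_p_value(rates, associations))
```

Association counts are discrete, so a Poisson-type regression is a common alternative; the OLS slope is used here only because it can be shown without external libraries.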

      Reviewer #2 (Public review):

      Summary

      In this study, the authors characterized population genetic variation in the MHC locus across primates and looked for signals of long-term balancing selection (specifically trans-species polymorphism, TSP) in this highly polymorphic region. To carry out these tasks, they used Bayesian methods for phylogenetic inference (i.e. BEAST2) and applied a new Bayesian test to quantify evidence supporting monophyly vs. trans-species polymorphism for each exon across different species pairs. Their results, although mostly confirmatory, represent the most comprehensive analyses of primate MHC evolution to date, and novel findings or possible discrepancies are clearly pointed out. However, as the authors discuss, the available data are insufficient to fully capture primates' MHC evolution.

      Strengths of the paper include: using appropriate methods and statistically rigorous analyses; very clear figures and a detailed description of the results and methods that make it easy to follow despite the complexity of the region and approach; a clever test for TSP that is then complemented by positive selection tests and the protein structures for a quite comprehensive study.

      That said, weaknesses include: lack of information about how many sequences are included and whether uneven sampling across taxa might result in some comparisons lacking evidence for TSP; frequent reference to the companion paper instead of summarizing (at least some of) the critical relevant information (e.g., how was orthology inferred?); no mention of the quality of sequences in the database and whether mismapping or copy number variation might still affect the sequence comparison.

      To address these comments, we added Tables 2-4 to allow readers to more readily understand the data we included in each group. We refer to these tables in the introduction (line 95), in the “Data” section of the results (lines 128-129), and the “Data” section of the methods (lines 532-534).  We also added text (lines 216-219 and 250-252) to more explicitly point out that our method is conservative when few sequences are available.

      We also added a paragraph to the discussion which addresses data quality and mismapping issues (lines 473-499).

      We clarified the role of our companion paper (line 49-50) by changing “In our companion paper, we explored the relationships between the different classical and non-classical genes” to “In our companion paper, we built large multi-gene trees to explore the relationships between the different classical and non-classical genes.” We also changed the text in lines 97-99 from “In our companion paper, we compared genes across dozens of species and learned more about the orthologous relationships among them” to “In our companion paper, we built trees to compare genes across dozens of species. When paired with previous literature, these trees helped us infer orthology and assign sequences to genes in some cases.”

      Reviewer #3 (Public review):

      Summary

      The study uses publicly available sequences of classical and non-classical genes from a number of primate species to assess the extent and depth of TSP across the primate phylogeny. The analyses were carried out in a coherent and, in my opinion, robust inferential framework and provided evidence for ancient (even > 30 million years) TSP at several classical class I and class II genes. The authors also characterise evolutionary rates at individual codons, map these rates onto MHC protein structures, and find that the fastest evolving codons are extremely enriched for autoimmune and infectious disease associations.

      Strengths

      The study is comprehensive, relying on a large data set, state-of-the-art phylogenetic analyses and elegant tests of TSP. The results are not entirely novel, but a synthesis and re-analysis of previous findings is extremely valuable and timely.

      Weaknesses

      I've identified weaknesses in several areas (details follow in the next section):

      -  Inadequate description and presentation of the data used

      -  Large parts of the results read like extended figure captions, which breaks the flow.

      -  Older literature on the subject is duly cited, but the authors don't really discuss their findings in the context of this literature.

      -  The potential impact of mechanisms other than long-term maintenance of allelic lineages by balancing selection, such as interspecific introgression and incorrect orthology assessment, needs to be discussed.

      We address these comments in the more detailed section below.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      The abstract could benefit from being sharpened. A personal pet peeve is a common habit of saying we don't know everything about a topic (line 16 - "lack a full picture of primate MHC evolution"); we never know everything on a topic, so this is hardly a strong rationale to do more work on it. This is followed by "to start addressing this gap", which is vague because you haven't explicitly stated any gap; you simply said we are not yet omniscient on the topic. Please clearly identify a gap in our knowledge, a question that you will be able to answer with this paper.

      That makes sense! We added another sentence to the abstract to make the specific gap clearer. Inserted “In particular, we do not know to what extent genes and alleles are retained across speciation events” in lines 16-17.

      Reviewer #2 (Recommendations for the authors):

      - Some discussion of alternative explanations when certain comparisons were not found to have TSP - is this consistent with genetic drift sometimes leading to lineage loss, or does it suggest that the proposed tradeoff between autoimmunity and pathogen recognition might differ depending on primates' life history and/or exposure to similar pathogens? Could the trade-off of pathogen to self-recognition not be as costly in some species?

      This is consistent with genetic drift, as no lineages are expected to be maintained across these distantly-diverged primates under neutral evolution. These ideas are certainly possible, but our Bayes factor test only reveals evidence (or lack thereof) for deviations from the species tree and cannot provide reasons why or why not.

      - It would be interesting to put these results on very long-term balancing selection in the context of what has been reported at the region for shorter term balancing selection. The discussion compares findings of previous genes in the literature but not regarding the time scale.

      Indeed, there is some evidence for the idea of “divergent allele advantage”, in which MHC-heterozygous individuals have a greater repertoire of peptides that they can present, leading to greater resistance against pathogens and greater fitness. This heterozygote advantage thus leads to balancing selection (Pierini and Lenz, 2018; Chowell et al., 2019). Our discussion mentions other time scales of balancing selection across the primates at the MHC and other loci, but we choose to focus more on long-term than short-term balancing selection.

      - Lines 223-226 - how is the difference in BF across exons in MHC-A to be interpreted? The paragraph is about MHC-A, but then the explanation in the last sentence is for when similar BF are observed which is not the case for MHC-A. Is this interpreted as lack of evidence for TSP? Or something about recombination or gene conversion? Or that one exon may be under balancing selection but not the other?

      Thank you for pointing out the confusing logic in this paragraph. 

      Previous: “For MHC-A, Bayes factors vary considerably depending on exon and species pair. Many sequences had to be excluded from MHC-A comparisons because they were identified as gene-converted in the \textit{GENECONV} analysis or were previously identified as recombinants \citep{Hans2017,Gleimer2011,Adams2001}. Importantly, for MHC-A we do not see concordance in Bayes factors across the different exons, whereas we do for the other gene groups. Similar Bayes factors across all exons for a given comparison is thus evidence in favor of TSP being the primary driver of the observed deep coalescence structure (rather than recombination or gene conversion).” Current (lines 228-238): 

      “For MHC-A, Bayes factors vary considerably depending on exon and species pair. Past work suggests that this gene has had a long history of gene conversion affecting different exons, resulting in different evolutionary histories for different parts of the gene \citep{Hans2017,Gleimer2011,Adams2001}. Indeed, we excluded many MHC-A sequences from our Bayes factor calculations because they were identified as gene-converted in our \textit{GENECONV} analysis or were previously suggested to be recombinants. As shown in \FIG{bayes_factors_classI}, the lack of concordance in Bayes factors across the different exons for MHC-A is evidence for gene conversion, rather than balancing selection, being the most important factor in this gene's evolution. In contrast, the other gene groups generally show concordance in Bayes factors across exons. We interpret this as evidence in favor of TSP being the primary driver of the observed deep coalescence structure for MHC-B and -C (rather than recombination or gene conversion).”

      - In Figures 5C and 6C, the points sometimes show a kind of smile pattern of possibly higher rates further from the peptide. Did authors explore other fits like a polynomial? Or, whether distance only matters in close proximity to the peptide? Out of curiosity, is it possible to map substitution time/branch into the distance to the peptide binding region for each substitution? Is there any pattern with distance to interacting proteins in non-peptide binding MHC proteins like MHC-DOA? Although they don't have a PBR they do interact with other proteins.

      Thank you for these ideas! We did not explore other fits, such as a polynomial, because we wanted to implement the simplest model. Our evolutionary rates are relative, making parameters relatively meaningless. We were mainly concerned with positive or negative slopes and we leave the rest to the protein interaction experts.

      There is most likely a relationship between evolutionary rate and the distance to interacting proteins in the non-peptide-binding molecules MHC-DM and -DO. However, there are few currently available models, and it is difficult to determine which residues in these models are actually interacting. Researchers with more experience in protein interactions would be better placed to undertake such an analysis.

      - How biased is the database towards human alleles? Could this affect some of the analyses, including the coincidence of rapidly evolving sites with associations? Are there more associations than expected under some null model?

      While the database is indeed biased toward human alleles, we included only a small subset of these in order to create a more balanced data set spanning the primates. This is unlikely to affect the coincidence of rapidly-evolving sites with associations; however, we note that there are no such association studies meeting our criteria in other species, meaning the associations are only coming from studies on humans.

      - To this reader, it is unnecessary and distracting to describe the figures within the text; there are frequent sentences in the text that belong in the figure legend instead (e.g., lines 139-143, 208-211, 214-215, 328-330, etc). It would be better to focus on the results from the figures and then cite the figure, where the colors and exactly what is plotted can be in the figure legend.

      We appreciate these comments on overall flow. We removed lines 139-143 and lengthened the Figure 2 caption (and associated supplementary figure captions) to contain all necessary detail. We removed lines 208-211 and 214-215 and lengthened the captions for Figure 3, Figure 4, and associated supplementary figures. We removed a sentence from lines 303-304.  

      - I'm still concerned that the poor mappability of short-read data is contributing in some ways. Were the sequences in the database mostly from long-reads? Was nucleotide diversity calculated directly from the sequences in the database or from another human dataset? Is missing data at some sites accounted for in the denominator?

      The sequences in the database are mostly from short reads and come from a wide array of labs. We have added a paragraph to the discussion to explain the limitations of this (lines 473-499). However, the nucleotide diversity calculations shown in Figure 1 do not rely on the MHC database; rather, they are calculated from the human genomes in the 1000 Genomes Project. Nucleotide diversity would be calculable for other species, but we did not do so for exactly the reason you mention: too much missing data.

      - The Figure 2 and Figure 3 supplements took me a little bit to understand - is it really worth pointing out the top 5 Bayes-factor comparisons when there is no evidence for TSP? A lot of the colored squares are not actually supporting TSP but in the grids you can't see which are and which aren't without looking at the Bayes Factor. I wonder if it would help if only those with BF > 100 were shown? Or if these were marked some other way so that it was easy to see where TSPs are supported.

      Thank you for your perspective on these figures! We initially limited them to only show >100 Bayes factors for each gene group and region, but some gene groups have no high Bayes factors. Additionally, the “summary” tree pictured in these figures is necessarily a simplification of the full space of posterior trees. We felt that showing low Bayes factor comparisons could help readers understand this relationship. For example, allele sets that look non-monophyletic on the summary tree may still have a low Bayes factor, showing that they are generally monophyletic throughout the larger (un-visualizable) space of trees.

      Reviewer #3 (Recommendations for the authors):

      Specific comments

      Abstract

      I think the abstract would benefit from some editing. For example, one might get the impression that you equate allele sharing, which would normally be understood as sharing identical sequences, with sharing ancestral allelic lineages. This distinction is important because you can have many TSPs without sharing identical allele sequences. In l. 20 you write about "deep TSP", which requires either a definition or a reformulation. In l. 21-23 you seem to suggest that long-term retention of allelic lineages is surprising in the light of rapid sequence evolution - it may be, depending on the evolutionary scenarios one is willing to accept, but perhaps it's not necessary to float such a suggestion in the abstract where it cannot be properly explained due to space constraints? The last sentence needs a qualifier like "in some cases".

      Thank you for catching these! For clarity, we changed several words:

      ● “alleles” to “allelic lineages” in line 13

      ● “deep” to “ancient” in line 21

      ● “Despite” to “in addition to” in line 22

      ● Added “in some cases” to line 28

      Results - Overall, parts of the results read like extended figure captions. I understand that the authors want to make the complex figures accessible to the reader. However, including so much information in the text disrupts the flow and makes it difficult to follow what the main findings and conclusions are.

      We appreciate these comments on overall flow. We removed lines 139-143 and lengthened the Figure 2 caption (and associated supplementary figure captions) to contain all necessary detail. We removed lines 208-211 and 214-215 and lengthened the captions for Figure 3, Figure 4, and associated supplementary figures. We removed a sentence from lines 303-304.  

      l. 37-39 such a short sentence on non-classical MHC is necessarily an oversimplification, I suggest it be expanded or deleted.

      There is certainly a lot to say about each of these genes! While we do not have space in this paper’s introduction to get into these genes’ myriad functions, we added a reference to our companion paper in lines 40-41:

      “See the appendices of our companion paper \citep{Fortier2024a} for more detail.”

      These appendices are extensive, and readers can find details and references for literature on each specific gene there. In addition, several genes are mentioned in analyses further on in the results, and their specific functions are discussed in more detail when they arise.

      l. 47 -49 It would be helpful to briefly outline your criteria for selecting these 17 genes, even if this is repeated later.

      Thank you! For greater clarity, we changed the text (lines 50-52) from “Here, we look within 17 specific genes to characterize trans-species polymorphism, a phenomenon characteristic of long-term balancing selection.” to “Here, we look within 17 specific genes---representing classical, non-classical, Class I, and Class II ---to characterize trans-species polymorphism, a phenomenon characteristic of long-term balancing selection.“  

      l.85-87 I may be completely wrong, but couldn't problems with establishing orthology in some cases lead to false inferences of TSP, even in primates? Or do you think the data are of sufficient quality to ignore such a possibility? (you touch on this in pp. 261-264)

      Yes, problems with establishing orthology can lead to false inferences of TSP, and it has happened before. For example, older studies that used only exon 2 (binding-site-encoding) of the MHC-DRB genes inferred trees that grouped NWM sequences with ape and OWM sequences. Thus, they named these NWM genes MHC-DRB3 and -DRB5 to suggest orthology with ape/OWM MHC-DRB3 and -DRB5, and they also suggested possible TSP between the groups. However, later studies that used non-binding-site-encoding exons or introns noticed that these NWM sequences did not group with ape/OWM sequences (which now shared the same name), providing evidence against orthology. This illustrates that establishing orthology is critical before assessing TSP (as is comparing across regions). This is part of the reason we published a companion paper (https://doi.org/10.7554/eLife.103545.1), which clears up questions of orthology and supports the analyses we did in this paper. In cases where orthology was ambiguous, this also helped us to be conservative in our conclusions here. The problems with ambiguous gene assignment are also discussed in lines 488-499.

      l. 88-93 is the first place (others are pp. 109-118 and 460-484) where a fuller description of the data used would be welcome. It's clear that the amount of data from different species varies enormously, not only in the number of alleles per locus, but also in the loci for which polymorphism data are available. In such a synthesis study, one would expect at least a tabulation of the data used in the appendices and perhaps a summary table in the main article.

      l. 109-118 Again, a more quantitative summary of the data used, with reference to a table, would be useful.

      Thank you! To address these comments, we added Tables 2-4 to allow readers to more readily understand the data we included in each group. We refer to these tables in the introduction (line 95), in the “Data” section of the results (lines 128-129), and the “Data” section of the methods (lines 532-534). Supplementary Files listing the exact alleles and sequences used in each group are also included in the resubmission.

      l. 123-124 here you say that the definition of the "16 gene groups" is in the methods (probably pp. 471-484), but it would be useful to present an informative summary of your rationale in the introduction or here

      Thank you! We agree that it is helpful to outline these groups earlier. We have changed the paragraph in lines 123-135 from: 

      “We considered 16 gene groups and two or three different genic regions for each group: exon 2 alone, exon 3 alone, and/or exon 4 alone. Exons 2 and 3 encode the peptide-binding region (PBR) for the Class I proteins, and exon 2 alone encodes the PBR for the Class II proteins. For the Class I genes, we also considered exon 4 alone because it is comparable in size to exons 2 and 3 and provides a good contrast to the PBR-encoding exons. See the Methods for more detail on how gene groups were defined. Because few intron sequences were available for non-human species, we did not include them in our analyses.” To: 

      “We considered 16 gene groups spanning MHC classes and functions. These include the classical Class I genes (MHC-A-related, MHC-B-related, MHC-C-related), non-classical Class I genes (MHC-E-related, MHC-F-related, MHC-G-related), classical Class IIA genes (MHC-DRA-related, MHC-DQA-related, MHC-DPA-related), classical Class IIB genes (MHC-DRB-related, MHC-DQB-related, MHC-DPB-related), non-classical Class IIA genes (MHC-DMA-related, MHC-DOA-related), and non-classical Class IIB genes (MHC-DMB-related, MHC-DOB-related). We studied two or three different genic regions for each group: exon 2 alone, exon 3 alone, and (for Class I) exon 4 alone. Exons 2 and 3 encode the peptide-binding region (PBR) for the Class I proteins, and exon 2 alone encodes the PBR for the Class II proteins. For the Class I genes, we also considered exon 4 alone because it is comparable in size to exons 2 and 3 and provides a good contrast to the PBR-encoding exons. Because few intron sequences were available for non-human species, we did not include them in our analyses.”

      l. 100 "alleles" -> "allelic lineages"

      Thank you for catching this. We have changed this language in line 104.

      l. 227-238 it's important to discuss the possible effect of the number of sequences available on the detectability of TSP - this is particularly important as the properties of MHC genealogies may differ considerably from those expected for neutral genealogies.

      This is a good point that may not be obvious to readers. We have added several sentences to clarify this:

      Line 193-194: “In a neutral genealogy, monophyly of each species' sequences is expected.”

      Line 213-219: “Note that the number of sequences available for comparison also affects the detectability of TSP. For example, if the only sequences available are from the same allelic lineage, they will coalesce more recently in the past than they would with alleles from a different lineage and would not show evidence for TSP. This means our method is well-suited to detect TSP when a diverse set of allele sequences are available, but it is conservative when there are few alleles to test. There were few available alleles for some non-classical genes, such as MHC-F, and some species, such as gibbon.”

      Line 244-246: “However, since there are fewer alleles available for the non-classical genes, we note that our method is likely to be conservative here.”

      l. 301 and 624-41 it's been difficult for me to understand the rationale behind using rates at mostly gap positions as the baseline and I'd be grateful for a more extensive explanation

      Normalizing the rates posed a difficult problem. We couldn’t include every single sequence in the same alignment because BEAST’s computational needs scale with the number of sequences. Therefore, we had to run BEAST separately on smaller alignments focused on a single group of genes at a time. We still wanted to be able to compare evolutionary rates across genes, but because of the way SubstBMA is implemented, evolutionary rates are relative, not absolute. Recall that to help us compare the trees, we included a common set of “backbone” sequences in all of the 16 alignments. This set included some highly-diverged genes. Initially, we planned to use 4-fold degenerate sites as the baseline sites for normalization, but there simply weren’t enough of them once we included the “backbone” set on top of the already highly diverse set of sequences in each alignment. This diversity presented an opportunity.  In BEAST, gaps are treated as missing and do not contribute any probability to the relevant branch or site (https://groups.google.com/g/beast-users/c/ixrGUA1p4OM/m/P4R2fCDWMUoJ?pli=1). So, we figured that sites that were “mostly gap” (a gap in all the human backbone sequences but with an insertion in some sequence) were mostly not contributing to the inference of the phylogeny or evolutionary rates. Because the “backbone” sequences are common to all alignments, making the “mostly gap” sites somewhat comparable across sets while not affecting inferred rates, we figured they would be a reasonable choice for the normalization (for lack of a better option).

      We added text to lines 680 and 691-693 to clarify this rationale.
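To make the normalization step concrete, here is a minimal Python sketch under stated assumptions: the alignment is a dict of equal-length aligned strings with "-" as the gap character, per-site relative rates are indexed by alignment column, and the baseline is the mean rate over the "mostly gap" columns. The helper names are hypothetical and this is not the authors' pipeline; their exact definition (for example, whether it is applied per codon) is given in their Methods. A column counts as "mostly gap" when every human backbone sequence has a gap there but at least one sequence in the alignment carries an insertion.

```python
def mostly_gap_columns(alignment, backbone_human_ids):
    """Hypothetical helper: indices of columns that are gaps in every human
    backbone sequence but non-gap in at least one sequence of the alignment."""
    length = len(next(iter(alignment.values())))
    columns = []
    for j in range(length):
        backbone_all_gap = all(alignment[s][j] == "-" for s in backbone_human_ids)
        some_insertion = any(seq[j] != "-" for seq in alignment.values())
        if backbone_all_gap and some_insertion:
            columns.append(j)
    return columns


def normalize_rates(site_rates, baseline_columns):
    """Divide every relative site rate by the mean rate at the baseline columns."""
    if not baseline_columns:
        raise ValueError("no baseline columns found")
    baseline = sum(site_rates[j] for j in baseline_columns) / len(baseline_columns)
    return [rate / baseline for rate in site_rates]


# Toy alignment: h1 and h2 stand in for human backbone sequences.
aln = {"h1": "A-C-", "h2": "A-G-", "x": "AT--"}
cols = mostly_gap_columns(aln, ["h1", "h2"])
print(cols, normalize_rates([1.0, 0.5, 2.0, 1.0], cols))
```

Column 3 is excluded in the toy example because no sequence has an insertion there. Because the backbone sequences are shared across all alignments, the same rule picks out comparable baseline columns in each one.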

      l. 380-84 this overview seems rather superficial. Would it be possible to provide a more quantitative summary?

      To make this more quantitative, we plotted the number of associations for each amino acid against evolutionary rate, shown in Figure 6 - Figure Supplement 7 (NOTE: this needs to be renamed as Table 1 - Figure Supplement 1, which the template does not allow). This reveals a significant positive slope for the Class I genes, but not for Class II. We also added explanatory text for this figure in lines 400-404.

      Discussion - your approach to detecting TSP is elegant but deserves discussion of its limitations and, in particular, a clear explanation of why detecting TSP rather than quantifying its extent is more important in the context of this work. Another important point for discussion is alternative explanations for the patterns of TSP or, more broadly, gene tree - species tree discordance. Although long-term maintenance of allelic lineages due to long-term balancing selection is probably the most convincing explanation for the observed TSP, interspecific introgression and incorrect orthology assessment may also have contributed, and it would be good to see what the authors think about the potential contribution of these two factors.

      Overall, our goal was to use modern statistical methods and data to more confidently assess how ancient the TSP is at each gene. We have added several lines of text (as noted elsewhere in this document) to more clearly illustrate the limitations of our approach. We also agree that interspecific introgression and incorrect orthology assessment can cause similar patterns to arise. We attempted to minimize the effect of incorrect orthology assessment by creating multi-gene trees and exploring reference primate genomes, as described in our companion paper (https://doi.org/10.7554/eLife.103545.1), but cannot eliminate it completely. We have added a paragraph to the discussion to address this (lines 488-499). Interspecific introgression could also cause gene tree-species tree discordance, but we are not sure about how systematic this would have to be to cause the overall patterns we observe, nor about how likely it would have been for various clades of primates across the world.

      l. 421 -424 A more nuanced discussion distinguishing between positive selection, which facilitates the establishment of a mutation, and directional selection, which leads to its fixation, would be useful here.

      We added clarification to this sentence (lines 443-445), from “Indeed, within the phylogeny we find that the most rapidly-evolving codons are substituted at around 2--4-fold the baseline rate.” to “Indeed, within the phylogeny we find that the most rapidly-evolving codons are substituted at around 2--4-fold the baseline rate, generating ample mutations upon which selection may act.”

      l. 432-434 You write here about the shaping of TCR repertoires, but I couldn't find any such information in the paper, including Table 1.

      We did not include a separate column for these, so they can be hard to spot. They take the form of “TCR 𝛽 Interaction Probability >50%”, “TCR Expression (TRAV38-1)”, or “TCR 𝛼 Interaction Probability >50%” and can be found in Table 1.

      l. 436-442 Here a more detailed discussion in the context of divergent allelic advantage and even the evolution of new S-type specificities in plants would be valuable.

      We added an additional citation to a review article to this sentence (lines 438-439).  

      l. 443 The use of the word "training" here is confusing, suggesting some kind of "education" during the lifetime of the animal.

      We agree that “train” is not an entirely appropriate term, and have changed it to “evolve” (line 465).

      l. 489-491 What data were used for these calculations?

      Apologies for missing this citation! We used the 1000 Genomes Project data, and the citation has been updated (lines 541-542).

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      PAPS is required for all sulfotransferase reactions in which a sulfate group is covalently attached to amino acid residues of proteins or to side chains of proteoglycans. This sulfation is crucial for properly organizing the apical extracellular matrix (aECM) and expanding the lumen in the Drosophila salivary gland. Loss of Papss potentially leads to decreased sulfation, disorganizing the aECM, and defects in lumen formation. In addition, Papss loss destabilizes the Golgi structures.

      In Papss mutants, several changes occur in the salivary gland lumen of Drosophila. The tube lumen is very thin and shows irregular apical protrusions. There is a disorganization of the apical membrane and a compaction of the apical extracellular matrix (aECM). The Golgi structures and intracellular transport are disturbed. In addition, the ZP domain proteins Piopio (Pio) and Dumpy (Dpy) lose their normal distribution in the lumen, which leads to condensation and dissociation of the Dpy-positive aECM structure from the apical membrane. This results in a thin and irregularly dilated lumen.

      1. The authors describe various changes in the lumen in mutants, from a thin lumen to irregular expansion. I would like to see the lumen diameter and length, in addition to the total area, so that thin and irregular lumens can be distinguished.

      We have included quantification of the length and diameter of the salivary gland lumen in the stage 16 salivary glands of control, Papss mutant, and salivary gland-specific rescue embryos (Figure 1J, K). As described, Papss mutant embryos show two distinct phenotypes: one group has a thin lumen along its entire length, and the other has irregular lumen shapes. Therefore, we separated the two groups for quantification of lumen diameter. Additionally, we analyzed the variability of the lumen diameter to better capture the range of phenotypes observed (Figure 1K'). These quantifications allow a more precise assessment of lumen morphology, so that readers can distinguish between thin and irregular lumen phenotypes.
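      The response does not say which variability metric underlies Figure 1K'. As one hedged illustration only, assuming diameters are measured at evenly spaced points along a single lumen, the coefficient of variation (sample standard deviation divided by the mean) gives a scale-free index: a uniformly thin lumen has a low mean and a low CV, whereas an irregular lumen has a high CV. The helper name and the example profiles below are hypothetical and are not data from this study.

```python
import statistics


def diameter_cv(diameters_um):
    """Coefficient of variation (sample SD / mean) of lumen diameters.

    Hypothetical helper: assumes diameters (in micrometres) were sampled
    at evenly spaced positions along one salivary gland lumen.
    """
    if len(diameters_um) < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.mean(diameters_um)
    if mean <= 0:
        raise ValueError("diameters must be positive")
    return statistics.stdev(diameters_um) / mean


# Illustrative, made-up profiles (not measurements from the manuscript)
uniform_thin = [1.0, 1.1, 0.9, 1.0, 1.0]
irregular = [1.0, 4.0, 0.8, 5.0, 1.2]
print(round(diameter_cv(uniform_thin), 3), round(diameter_cv(irregular), 3))
```

      Because the CV is normalized by the mean, it separates "thin but regular" from "irregular" independently of overall lumen width, which is why the two Papss phenotype classes would be compared on both mean diameter and variability.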

      The rescue is about 30%, which is not as good as expected. Maybe the wrong isoform was used. Is it possible to find out which isoform is expressed in the salivary glands, e.g., by RNA in situ hybridization? This could then inform a more targeted rescue, possibly beyond the scope of this paper.

      Thank you for this point, but we do not agree that the rescue is about 30%. In Papss mutants, about 50% of the embryos show the thin lumen phenotype, whereas the other 50% show irregular lumen shapes. Among the rescue embryos expressing WT Papss, few showed the thin lumen phenotype. About 40% of the rescue embryos showed "normal, fully expanded" lumen shapes, and the remaining 60% showed either irregular (thin+expanded) or slightly overexpanded lumens. Partial rescue is not uncommon with the Gal4/UAS system, because it is often difficult to achieve the proper level of protein with an overexpression system.

      To address the possibility that the wrong isoform was used, we performed in situ hybridization to examine the expression of different Papss splice forms in the salivary gland. We used probes that detect subsets of splice forms (A/B/C/F/G, D/H, and E/F/H) and found that all probes showed expression in the salivary gland, with varying intensities. The original probe, which detects all splice forms, gave the strongest signals in the salivary gland compared to the new probes, which detect only a subset. However, the difference in signal intensity may be due to the greater length of the original probe (>800 bp) compared to the other probes, which were made from much smaller regions (~200 bp). The digoxigenin (DIG) label used for mRNA detection is incorporated at uridine nucleotides in the probe transcript, and the probes with weaker signals contain fewer uridines (all: 147; ABCFG: 29; D: 36; EFH: 66). We also used the Papss-PD isoform for a salivary gland-specific rescue experiment and obtained results similar to those with Papss-PE (Figure 1I-L, Figure 4D and E).
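      The uridine counts above can be turned into an expected ceiling on relative labelling. This is a minimal sketch under a simplifying assumption: maximal DIG incorporation is taken to scale linearly with the number of uridines in a probe, so each subset probe's ceiling relative to the full-length probe is its uridine count divided by 147. Real labelling also depends on the DIG-UTP:UTP ratio and hybridization conditions, which this ignores; the function name is hypothetical.

```python
# Uridine counts per probe, as reported in the response above.
URIDINES = {"all": 147, "ABCFG": 29, "D": 36, "EFH": 66}


def relative_label_capacity(counts, reference="all"):
    """Return each probe's uridine count as a fraction of the reference probe.

    Simplifying assumption: maximal DIG labelling is proportional to the
    number of uridines, i.e. one potential DIG-UTP site per U.
    """
    ref = counts[reference]
    return {name: n / ref for name, n in counts.items()}


for name, frac in relative_label_capacity(URIDINES).items():
    print(f"{name:>5}: {frac:.2f} of full-length probe")
```

      Under this assumption the subset probes carry only about 20% to 45% of the labelling sites of the full-length probe, which is consistent with their weaker signals without implying lower expression of those splice forms.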

      Furthermore, we performed additional experiments to validate our findings. We performed a rescue experiment with a mutant form of Papss carrying mutations in critical residues of the catalytic domains; this form failed to rescue any phenotype, including the thin lumen phenotype (Figure 1H, J-L), the number and intensity of WGA puncta (Figure 3I, I'), and cell death (Figure 4D, E). These results provide strong evidence that the defects observed in Papss mutants are due to the lack of sulfation.

      Crb is a transmembrane protein on the apicolateral side of the membrane. Accordingly, the apicolateral distribution can be seen in both the control and the mutant. I believe there are no apparent differences here, not even in the level of expression. However, the view of the cells (frame) suggests possible differences. To be sure, a more in-depth analysis of the images is required: confocal Z-stacks with 3D visualization and orthogonal projections, showing Crb staining together with a suitable membrane marker (e.g. SAS or Uif), are needed to analyze the membranes. This is the only way to show whether Crb is incorrectly distributed. Statistics from several Papss mutants would also be desirable, not just a single representative image. When do the observed changes in Crb distribution occur during tube development; only at stage 16? Is Papss involved only in the maintenance of the apical membrane? This is particularly important when considering the SJ and AJ, because the latter show no change in the mutants.

      We appreciate your suggestion to more thoroughly analyze Crb distribution. We adapted a method from a previous study (Olivares-Castiñeira and Llimargas, 2017) to quantify Crb signals in the subapical region and apical free region of salivary gland cells. Using E-Cad signals as a reference, we marked the apical cell boundaries of individual cells and calculated the intensity of Crb signals in the subapical region (along the cell membrane) and in the apical free region. We focused on the expanded region of the SG lumen in Papss mutants for quantification, as the thin lumen region was challenging to analyze. This quantification is included in Figure 2D. Statistical analysis shows that Crb signals were more dispersed in SG cells in Papss mutants compared to WT.

      A change in the ECM is only inferred from the WGA localization. This alone is insufficient to support a clear statement. WGA is only an indirect marker of the cell surface and glycosylated proteins; it does not indicate whether the composition or expression of the ECM is altered. Other important factors are missing here. In addition, only a single observation is shown, and statistics are missing.

      We understand your concern that WGA localization alone may not be sufficient to conclude changes in the ECM. However, we observed that luminal WGA signals colocalize with Dpy-YFP in the WT SG (Figure 5-figure supplement 2C), suggesting that WGA detects the aECM structure containing Dpy. The similar behavior of WGA and Dpy-YFP signals in multiple genotypes further supports this idea. In Papss mutants with a thin lumen phenotype, both WGA and Dpy-YFP signals are condensed (Figure 5E-H), and in pio mutants, both are absent from the lumen (Figure 6B, D). We analyzed WGA signals in over 25 samples of WT and Papss mutants, observing consistent phenotypes. We have included the number of samples in the text. While we acknowledge that WGA is an indirect marker, our data suggest that it is a reliable indicator of the aECM structure containing Dpy.

      Reduced WGA staining is seen in Papss mutants, but this could be due to other circumstances. To be sure, statistics on the number of dots must be shown, as well as an intensity plot from several independent samples. The images are single confocal sections, so the dots may appear in a different Z-plane. Therefore, a 3D visualization of the voxels must be shown to identify and, ideally, quantify the dots in the organ.

      We have quantified cytoplasmic punctate WGA signals. Using spinning disk microscopy with super-resolution technology (Olympus SpinSR10 Sora), we obtained high-resolution images of cytoplasmic punctate signals of WGA in WT, Papss mutant, and rescue SGs with the WT and mutant forms of Papss-PD. We then generated 3D reconstructed images of these signals using Imaris software (Figure 3E-H) and quantified the number and intensity of puncta. Statistical analysis of these data confirms the reduction of the number and intensity of WGA puncta in Papss mutants (Figure 3I, I'). The number of WGA puncta was restored by expressing WT Papss but not the mutant form. By using 3D visualization and quantification, we have ensured that our results are not limited to a single confocal section and account for potential variations in Z-plane localization of the dots.
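      Imaris is proprietary, so its spot-detection pipeline is not reproduced here. As a rough open-source analogue, assuming a background-subtracted 3D stack ordered (z, y, x) and a user-chosen intensity threshold, puncta can be counted as 3D connected components with `scipy.ndimage.label`, with a per-punctum mean intensity from `scipy.ndimage.mean`. Counting in 3D means a punctum spanning several optical sections is counted once, which is the Z-plane concern raised by the reviewer. The function name, threshold, and `min_voxels` cutoff are hypothetical parameters, not the settings used in the manuscript.

```python
import numpy as np
from scipy import ndimage


def count_puncta_3d(stack, threshold, min_voxels=3):
    """Count 3D puncta and their mean intensities in a (z, y, x) stack.

    Hypothetical sketch: voxels above `threshold` are grouped into
    26-connected components; components smaller than `min_voxels`
    are discarded as noise.
    """
    mask = stack > threshold
    structure = np.ones((3, 3, 3), dtype=bool)  # 26-connectivity in 3D
    labels, n = ndimage.label(mask, structure=structure)
    if n == 0:
        return 0, np.array([])
    sizes = np.bincount(labels.ravel())[1:]  # voxel count for labels 1..n
    keep = np.arange(1, n + 1)[sizes >= min_voxels]
    if keep.size == 0:
        return 0, np.array([])
    means = np.asarray(ndimage.mean(stack, labels, keep))
    return int(keep.size), means


# Synthetic demo stack (z, y, x); values are illustrative only.
demo = np.zeros((5, 10, 10))
demo[1:3, 2:4, 2:4] = 10.0  # one punctum spanning two optical sections
demo[2, 7:9, 1:3] = 5.0     # a dimmer punctum within a single section
demo[3, 7, 7] = 10.0        # isolated bright voxel, dropped as noise
n_puncta, mean_intensity = count_puncta_3d(demo, threshold=1.0)
print(n_puncta, mean_intensity)
```

      A fixed global threshold is the simplest choice; in practice local background subtraction or a size filter tuned to the optical resolution would matter more than the connectivity rule.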

      A colocalization analysis (statistics) should be shown for the overlap of WGA with ManII-GFP.

      Since WGA labels multiple structures, including the nuclear envelope and ECM structures, we focused on assessing the colocalization of the cytoplasmic WGA punctate signals and ManII-GFP signals. Standard colocalization methods, such as Pearson's correlation coefficient or Manders' overlap coefficient, would be confounded by WGA signals in these other structures. Therefore, we used fluorescence intensity line profiles to examine the spatial relationship between WGA and ManII-GFP signals in WT and Papss mutants (Figure 3L, L').

      I do not understand how the authors describe "statistics of secretory vesicles" as an axis in Figure 3p. The TEM images do not show labeled secretory vesicles but empty structures that could be vesicles.

      Previous studies have analyzed "filled" electron-dense secretory vesicles in TEM images of SG cells (Myat and Andrew, 2002, Cell; Fox et al., 2010, J Cell Biol; Chung and Andrew, 2014, Development). Consistent with these studies, our WT TEM images show these vesicles. In contrast, Papss mutants show a mix of filled and empty structures. For quantification, we specifically counted the filled electron-dense vesicles (now Figure 3W). A clear description of our analysis is provided in the figure legend.

      1. The quality of the presented TEM images is too low to judge any difference between control and mutants. Therefore, the supplement must present them in better detail (higher pixel number?).

      We disagree that the quality of the presented TEM images is too low. Our TEM images have sufficient resolution to reveal details of many subcellular structures, such as mitochondrial cisternae. The pdf file of the original submission may not have been high resolution. To address this concern, we have provided several original high-quality TEM images of both WT and Papss mutants at various magnifications in Figure 2-figure supplement 2. Additionally, we have included low-magnification TEM images of WT and Papss mutants in Figure 2H and I to provide a clearer view of the overall SG lumen morphology.

      Line 266: the conclusion that apical trafficking is "significantly impaired" does not hold. This implies that Papss is essential for apical trafficking, but the analyzed ECM proteins (Pio, Dumpy) are found apically enriched in the mutants, and Dumpy is even secreted. Moreover, they analyze only one marker, Sec15, and don't provide data about the quantification of the secretion of proteins.

      We agree and have revised our statement to "defective sulfation affects Golgi structures and multiple routes of intracellular trafficking".

      DCP-1 was used to detect apoptosis in the glands to analyze acellular regions. However, the authors compare ST16 control with ST15 mutant salivary glands, which is problematic. Further, it is not commented on how many embryos were analyzed and how often they detect the dying cells in control and mutant embryos. This part must be improved.

      Thank you for the comment. We agree and have included quantification. We used stage 16 samples from WT and Papss mutants to quantify acellular regions. Since DCP-1 signals are only present at a specific stage of apoptosis, some acellular regions do not show DCP-1 signals. Therefore, we counted acellular regions regardless of DCP-1 signals. We also quantified this in rescue embryos with WT and mutant forms of Papss, which show complete rescue with WT and no rescue with the mutant form, respectively. The graph with a statistical analysis is included (Figure 4D, E).

      WGA and Dumpy show similar condensed patterns within the tube lumen. The authors show that Dumpy is enriched from stage 14 onwards. Is the same true for WGA? Does it show the same pattern from stage 14 to 16? Papss mutants could suffer from a developmental delay in organizing the ECM or from a lack of internalization of luminal proteins during or after tube expansion, as is the case in the trachea.

      Dpy-YFP and WGA show overlapping signals in the SG lumen throughout morphogenesis. Dpy-YFP is enriched in the SG lumen from stage 11, not stage 14 (Figure 5-figure supplement 2). WGA is also detected in the lumen throughout SG morphogenesis, similar to Dpy. In the original supplemental figure, only a stage 16 SG image was shown for colocalization of Dpy-YFP and WGA signals in the SG lumen. We have now included images from stages 14 and 15 in Figure 5-figure supplement 2C.

      Given that luminal Pio signals are lost at stage 16 only and that Dpy signals appear as condensed structures in the lumen of Papss mutants, it suggests that the internalization of luminal proteins is not impaired in Papss mutants. Rather, these proteins are secreted but fail to organize properly.

      Line 366. Luminal morphology is characterized by bulging and constrictions. In the trachea, bulges indicate the deformation of the apical membrane and the detachment from the aECM. I can see constrictions and the collapsed tube lumen in Fig. 6C, but I don't find the bulges of the apical membrane in pio and Np mutants. Maybe showing it more clearly and with better quality will be helpful.

      Since the bulging phenotype appears to vary from sample to sample, we have revised the description of the phenotype to "constrictions" to more accurately reflect the consistent observations. We quantified the number of constrictions along the entire lumen in pio and Np mutants and included the graph in Figure 6F.

      The authors state that Papss controls luminal secretion of Pio and Dumpy, as they observe reduced luminal staining of both in papss mutants. However, the mCh-Pio and Dumpy-YFP are secreted towards the lumen. Does papss overexpression change Pio and Dumpy secretion towards the lumen, and could this be another explanation for the multiple phenotypes?

      Thank you for the comment. To clarify, we did not observe reduced luminal staining of Pio and Dpy in Papss mutants, nor did we state that Papss controls luminal secretion of Pio and Dpy. In Papss mutants, Pio luminal signals are absent specifically at stage 16 (Figure 5H), whereas strong luminal Pio signals are present until stage 15 (Figure 5G). For Dpy-YFP, the signals are not reduced but condensed in Papss mutants from stages 14-16 (Figure 5D, H).

      It remains unclear whether the apparent loss of Pio signals is due to a loss of Pio protein in the lumen or to epitope masking resulting from protein aggregation or condensation. As noted in our response to Comment 11, internalization of luminal proteins seems unaffected in Papss mutants; proteins like Pio and Dpy are secreted into the lumen but fail to organize properly. Therefore, we have not tested whether Papss overexpression alters the secretion of Pio or Dpy.

      In our original submission, we incorrectly stated that uniform luminal mCh-Pio signals were unchanged in Papss mutants. Upon closer examination, we found these signals are absent in the expanded luminal region in stage 16 SG (where Dpy-YFP is also absent), and weak mCh-Pio signals colocalize with the condensed Dpy-YFP signals (Figure 5C, D). We have revised the text accordingly.

      Regulation of luminal ZP protein level is essential to modulate the tube expansion; therefore, Np releases Pio and Dumpy in a controlled manner during st15/16. Thus, the analysis of Pio and Dumpy in NP overexpression embryos will be critical to this manuscript to understand more about the control of luminal ZP matrix proteins.

      Thanks for the insightful suggestion. We overexpressed both the WT and mutant form of Np using UAS-Np.WT and UAS-Np.S990A lines (Drees et al., 2019) and analyzed mCh-Pio, Pio antibody, and Dpy-YFP signals. It is important to note that these overexpression experiments were done in the presence of the endogenous WT Np.

      Overexpression of Np.WT led to increased levels of mCh-Pio, Pio, and Dpy-YFP signals in the lumen and at the apical membrane. In contrast, overexpression of Np.S990A resulted in a near-complete loss of luminal mCh-Pio signals. Pio antibody signals remained strong at the apical membrane but were weaker in the luminal filamentous structures compared to WT.

      Due to the GFP tag present in the UAS-Np.S990A line, we could not reliably analyze Dpy-YFP signals because of overlapping fluorescent signals in the same channel. However, the filamentous Pio signals in the lumen co-localized with GFP signals, suggesting that these structures might also include Dpy-YFP, although this cannot be confirmed definitively.

      These results suggest that overexpressed Np.S990A may act in a dominant-negative manner, competing with endogenous Np and impairing proper cleavage of Pio (and mCh-Pio). Nevertheless, some level of cleavage by endogenous Np still appears to occur, as indicated by the residual luminal filamentous Pio signals. These new findings have been incorporated into the revised manuscript and are shown in Figure 6H and 6I.

      Minor: Fig. 5 C': mChe-Pio and Dumpy-YFP are mixed up at the top of the images.

      Thanks for catching this error. It has been corrected.

      Sup. Fig7. A shows Pio in purple but B in green. Please indicate it correctly.

      It has been corrected.

      Reviewer #1 (Significance (Required)):

      In 2023, the functions of Pio, Dumpy, and Np in the tracheal tubes of Drosophila were published. The study here shows similar results, with the difference that the salivary glands do not possess chitin, but the two ZP proteins Pio and Dumpy take over its function. It is, therefore, a significant and exciting extension of the known function of the three proteins to another tube system. In addition, the authors identify papss as a new protein and show its essential function in forming the luminal matrix in the salivary glands. Considering the high degree of conservation of these proteins in other species, the results presented are crucial for future analyses and will have further implications for tubular development, including humans.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: There is growing appreciation for the importance of luminal (apical) ECM in tube development, but such matrices are much less well understood than basal ECMs. Here the authors provide insights into the aECM that shapes the Drosophila salivary gland (SG) tube and the importance of PAPSS-dependent sulfation in its organization and function.

      The first part of the paper focuses on careful phenotypic characterization of papss mutants, using multiple markers and TEM. This revealed reduced markers of sulfation (Alcian Blue staining) and defects in both apical and basal ECM organization, Golgi (but not ER) morphology, number and localization of other endosomal compartments, plus increased cell death. The authors focus on the fact that papss mutants have an irregular SG lumen diameter, with both narrowed regions and bulged regions. They address the pleiotropy, showing that preventing the cell death and resultant gaps in the tube did not rescue the SG luminal shape defects and discussing similarities and differences between the papss mutant phenotype and those caused by more general trafficking defects. The analysis uses a papss nonsense mutant from an EMS screen - I appreciate the rigorous approach the authors took to analyze transheterozygotes (as well as homozygotes) plus rescued animals in order to rule out effects of linked mutations.

      The 2nd part of the paper focuses on the SG aECM, showing that Dpy and Pio ZP protein fusions localize abnormally in papss mutants and that these ZP mutants (and Np protease mutants) have similar SG lumen shaping defects to the papss mutants. A key conclusion is that SG lumen defects correlate with loss of a Pio+Dpy-dependent filamentous structure in the lumen. These data suggest that ZP protein misregulation could explain this part of the papss phenotype.

      Overall, the text is very well written and clear. Figures are clearly labeled. The methods involve rigorous genetic approaches, microscopy, and quantifications/statistics and are documented appropriately. The findings are convincing, with just a few things about the fusions needing clarification.

      minor comments 1. Although the Dpy and Qsm fusions are published reagents, it would still be helpful to mention whether the tags are C-terminal as suggested by the nomenclature, and whether Westerns have been performed, since (as discussed for Pio) cleavage could also affect the appearance of these fusions.

      Thanks for the comment. Dpy-YFP is a knock-in line in which YFP is inserted into the middle of the dpy locus (Lye et al., 2014; the insertion site is available on FlyBase). mCh-Qsm is also a knock-in line, with mCh inserted near the N-terminus of the qsm gene by phi-mediated recombination in the qsmMI07716 line (Chu and Hayashi, 2021; insertion site available on FlyBase). Based on this, we have updated the nomenclature from Qsm-mCh to mCh-Qsm throughout the manuscript to accurately reflect the tag position. To our knowledge, no western blot has been performed on the Dpy-YFP or mCh-Qsm lines. We have mentioned this explicitly in the Discussion.

      The Dpy-YFP reagent is a non-functional fusion and therefore may not be a wholly reliable reporter of Dpy localization. There is no antibody confirmation. As other reagents are not available to my knowledge, this issue can be addressed with text acknowledgement of possible caveats.

      Thanks for raising this important point. We have added a caveat in the Discussion noting this limitation and the need for additional tools, such as an antibody or a functional fusion protein, to confirm the localization of Dpy.

      TEM was done by standard chemical fixation, which is fine for viewing intracellular organelles, but high pressure freezing probably would do a better job of preserving aECM structure, which looks fairly bad in Fig. 2G WT, without evidence of the filamentous structures seen by light microscopy. Nevertheless, the images are sufficient for showing the extreme disorganization of aECM in papss mutants.

      We agree that HPF is a better method and intend to use the HPF system in future studies. We acknowledge that chemical fixation contributes to the appearance of a gap between the apical membrane and the aECM, which we did not observe with the HPF/FS method (Chung and Andrew, 2014). Despite this, the TEM images still clearly reveal that Papss mutants have a much thinner and more electron-dense aECM compared to WT (Figure 2H, I), consistent with the condensed WGA, Dpy, and Pio signals in our confocal analyses. As the reviewer mentioned, we believe that the current TEM data are sufficient to support the conclusion of severe aECM disorganization and Golgi defects in Papss mutants.

      The authors may consider citing some of the work that has been done on sulfation in nematodes, e.g. as reviewed here: https://pubmed.ncbi.nlm.nih.gov/35223994/ Sulfation has been tied to multiple aspects of nematode aECM organization, though not specifically to ZP proteins.

      Thank you for the suggestion. Pioneering studies in C. elegans have highlighted the key role of sulfation in diverse developmental processes, including neuronal organization, reproductive tissue development, and phenotypic plasticity. We have now cited several works.

      Reviewer #2 (Significance (Required)):

      This study will be of interest to researchers studying developmental morphogenesis in general and specifically tube biology or the aECM. It should be particularly of interest to those studying sulfation or ZP proteins (which are broadly present in aECMs across organisms, including humans).

      This study adds to the literature demonstrating the importance of luminal matrix in shaping tubular organs and greatly advances understanding of the luminal matrix in the Drosophila salivary gland, an important model of tubular organ development and one that has key matrix differences (such as no chitin) compared to other highly studied Drosophila tubes like the trachea.

      The detailed description of the defects resulting from papss loss suggests that there are multiple different sulfated targets, with a subset specifically relevant to aECM biology. A limitation is that specific sulfated substrates are not identified here (e.g. are these the ZP proteins themselves or other matrix glycoproteins or lipids?); therefore it's not clear how direct or indirect the effects of papss are on ZP proteins. However, this is clearly a direction for future work and does not detract from the excellent beginning made here.

      My expertise: I am a developmental geneticist with interests in apical ECM

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this work Woodward et al. focus on the apical extracellular matrix (aECM) in the tubular salivary gland (SG) of Drosophila. They provide new insights into the composition of this aECM, formed by ZP proteins, in particular Pio and Dumpy. They also describe the functional requirements of PAPSS, a critical enzyme involved in sulfation, in regulating the expansion of the SG lumen. A detailed cellular analysis of Papss mutants indicates defects in the apical membrane, the aECM, and Golgi organization. They also find that Papss controls the proper organization of the Pio-Dpy matrix in the lumen. The work is well presented and the results are consistent.

      Main comments

      • This work provides a detailed description of the defects produced by the absence of Papss. In addition, it provides many interesting observations at the cellular and tissular level. However, this work lacks a clear connection between these observations and the role of sulfation. Thus, the mechanisms underlying the phenotypes observed are elusive. Efforts directed to strengthen this connection (ideally experimentally) would greatly increase the interest and relevance of this work.

      Thank you for this thoughtful comment. To directly test whether the phenotypes observed in Papss mutants are due to the loss of sulfation activity, we generated transgenic lines expressing catalytically inactive forms of Papss, UAS-PapssK193A, F593P, in which key residues in the APS kinase and ATP sulfurylase domains are mutated. Unlike WT UAS-Papss (either the Papss-PD or the Papss-PE isoform), the catalytically inactive UAS-Papssmut failed to rescue any of the phenotypes, including the thin lumen phenotype (Figure 1I-L), altered WGA signals (Figure I, I') and the cell death phenotype (Figure 4D, E). These findings strongly support the conclusion that the enzymatic sulfation activity of Papss is essential for the developmental processes described in this study.

      • A main issue that arises from this work is the role of Papss at the cellular level. The results presented convincingly indicate defects in Golgi organization in Papss mutants. Therefore, the defects observed could stem from general defects in the secretory pathway rather than from specific defects in sulfation. This could even underlie general or catastrophic cellular defects and lead to cell death (as observed). This observation has several implications. Is the effect observed in SGs also observed in other cells in the embryo? If Papss has a general role in Golgi organization, this would be expected, as Papss encodes the only PAPS synthetase in Drosophila. Can the authors test another mutant that specifically affects Golgi organization and investigate whether it produces a phenotype similar to that of Papss?

      Thank you for the comment. To address whether the defects observed in Papss mutants stem from general disruption of the secretory pathway due to Golgi disorganization, we examined mutants of two key Golgi components: Grasp65 and GM130.

      In Grasp65 mutants, we observed significant defects in SG lumen morphology, including a highly irregular SG lumen shape and multiple constrictions (100%; n=10/10). However, the lumen was not uniformly thin as in Papss mutants. In contrast, GM130 mutants (although this line was very sick and difficult to grow) showed relatively normal salivary gland morphology in the few embryos that survived to stage 16 (n=5/5). It is possible that only embryos with mild phenotypes progressed to this stage, which limits interpretation. These data have now been included in Figure 3-figure supplement 2. Overall, while Golgi disruption can affect SG morphology, the specific phenotypes seen in Papss mutants are not fully recapitulated by Grasp65 or GM130 loss.

      • A model that conveys the different observations and that proposes a function for Papss in sulfation and Golgi organization (independent or interdependent?) would help to better present the proposed conclusions. In particular, the paper would be more informative if it proposed a mechanism or hypothesis of how sulfation affects SG lumen expansion. Is sulfation regulating a factor that in turn regulates Pio-Dpy matrix? Is it regulating Pio-Dpy directly? Is it regulating a product recognized by WGA? For instance, investigating Alcian blue or sulfotyrosine staining in pio, dpy mutants could help to understand whether Pio, Dpy are targets of sulfation.

      Thank you for the comment. We're also very interested in learning whether the regulation of the Pio-Dpy matrix is a direct or indirect consequence of the loss of sulfation on these proteins. One possible scenario is that sulfation directly regulates the Pio-Dpy matrix by regulating protein stability through the formation of disulfide bonds between the conserved Cys residues responsible for ZP module polymerization. Additionally, the Dpy protein contains hundreds of EGF modules that are highly susceptible to O-glycosylation. Sulfation of the glycan groups attached to Dpy may be critical for its ability to form a filamentous structure. Without sulfation, the glycan groups on Dpy may not interact properly with the surrounding materials in the lumen, resulting in an aggregated and condensed structure. These possibilities are discussed in the Discussion.

      We have not analyzed sulfation levels in pio or dpy mutants because sulfation levels in mutants of single ZP domain proteins may not provide much information. A substantial number of proteoglycans, glycoproteins, and proteins (with up to 1% of all tyrosine residues in an organism's proteins estimated to be sulfated) are modified by sulfation, so changes in sulfation levels in a single mutant may be subtle. In particular, the existing dpy mutant line is an insertion mutant of a transposable element; therefore, the sulfation sites would still remain in this mutant.

      • Interpretation of Papss effects on Pio and Dpy would be desired. The results presented indicate loss of Pio antibody staining but normal presence of cherry-Pio. This is difficult to interpret. How are these results of Pio antibody and cherry-Pio correlating with the results in the trachea described recently (Drees et al. 2023)?

      In our original submission, we stated that the uniform luminal mCh-Pio signals were not changed in Papss mutants, but after re-analysis, we found that these signals were actually absent from the expanded luminal region in stage 16 SG (where Dpy-YFP is also absent), and weak mCh-Pio signals colocalize with the condensed Dpy-YFP signals (Figure 5C, D). We have revised the text accordingly.

      After cleavage by Np and furin, the Pio protein should yield three fragments. The N-terminal region contains the N-terminal half of the ZP domain, and mCh-Pio signals show this fragment. The very C-terminal region should localize to the membrane as it contains the transmembrane domain. We think the middle piece, the C-terminal ZP domain, is recognized by the Pio antibody. The mCh-Pio and Pio antibody signals in the WT trachea (Drees et al., 2023) are similar to those in the SG. mCh-Pio signals are detected in the tracheal lumen as uniform signals, at the apical membrane, and in cytoplasmic puncta. Pio antibody signals are exclusively in the tracheal lumen and show more heterogeneous filamentous signals.

      In Papss mutants, the middle fragment (the C-terminal ZP domain) seems to be most affected because the Pio antibody signals are absent from the lumen. The loss of Pio antibody signals could be due to protein degradation or epitope masking caused by aECM condensation and protein misfolding. This fragment seems to be key for interacting with Dpy, since Pio antibody signals always colocalize with Dpy-YFP. The N-terminal mCh-Pio fragment does not appear to play a significant role in forming a complex with Dpy in WT (although it still aggregates with Dpy in Papss mutants), and this can be tested in future studies.

      In response to Reviewer 1's comment, we performed an additional experiment to test the role of Np in cleaving Pio to help organize the SG aECM. In this experiment, we overexpressed the WT and mutant forms of Np using UAS-Np.WT and UAS-Np.S990A lines (Drees et al., 2019) and analyzed mCh-Pio, Pio antibody, and Dpy-YFP signals. Np.WT overexpression resulted in increased levels of mCh-Pio, Pio, and Dpy-YFP signals in the lumen and at the apical membrane. However, overexpression of Np.S990A resulted in the absence of luminal mCh-Pio signals. Pio antibody signals were strong at the apical membrane but rather weak in the luminal filamentous structures. Since the UAS-Np.S990A line has the GFP tag, we could not reliably analyze Dpy-YFP signals due to overlapping Np.S990A.GFP signals in the same channel. However, the luminal filamentous Pio signals co-localized with GFP signals, and we assume that these overlapping signals could be Dpy-YFP signals.

      These results suggest that overexpressed Np.S990A may act in a dominant-negative manner, competing with endogenous Np and impairing proper cleavage of Pio (and mCh-Pio). Nevertheless, some level of cleavage by endogenous Np still appears to occur, as indicated by the residual luminal filamentous Pio signals. These new findings have been incorporated into the revised manuscript and are shown in Figure 6H and 6I.

      A proposed model of the Pio-Dpy aECM in WT, Papss, pio, and Np mutants has now been included in Figure 7.

      • What does the WGA staining in the lumen reveal? This staining seems to be affected differently in pio and dpy mutants: in pio mutants it disappears from the lumen (as dpy-YFP does), but in dpy mutants it seems to be maintained. How do the authors interpret these findings? How does the WGA matrix relate to sulfated products (using Alcian blue or sulfotyrosine)?

      WGA binds to sialic acid and N-acetylglucosamine (GlcNAc) residues on glycoproteins and glycolipids. GlcNAc is a key component of the glycosaminoglycan (GAG) chains that are covalently attached to the core protein of a proteoglycan, which is abundant in the ECM. We think WGA detects GlcNAc residues in the components of the aECM, including Dpy as a core component, based on the following data: WGA and Dpy 1) colocalize in the lumen, both in WT (as thin filamentous structures) and in the Papss mutant background (as condensed rod-like structures), and 2) are absent in pio mutants. WGA signals are still present in a highly condensed form in dpy mutants. That's probably because the dpy mutant allele (dpyov1) has an insertion of a transposable element (a blood element) into intron 11, and this insertion may have caused the Dpy protein to misfold and condense. We added the information about the dpy allele to the Results section and discussed it in the Discussion.

      Minor points:

      • The morphological phenotypic analysis of Papss mutants (homozygous and transheterozygous) is a bit confusing. The general defects are higher in Papss homozygous than in transheterozygotes over a deficiency. Maybe quantifying the defects in the heterozygote embryos in the Papss mutant collection could help to figure out whether these defects relate to Papss mutation.

      We analyzed the morphology of heterozygous Papss mutant embryos. They were all normal. The data and quantifications have now been added to Figure 1-figure supplement 3.

      • The conclusion that the apical membrane is affected in Papss mutants is not strongly supported by the results presented with the pattern of Crb (Fig 2). Further evidence should be provided. Maybe the TEM analysis could help to support this conclusion.

      We quantified Crb levels in the sub-apical and medial regions of the cell and included this new quantification in Figure 2D. TEM images showed variation in the irregularity of the apical membrane, even in WT, and we could not draw a solid conclusion from these images.

      • It is difficult to understand why in Papss mutants the levels of WGA increase. Can the authors elaborate on this?

      We think that when Dpy (and many other aECM components) are condensed and aggregated into the thin, rod-like structure in Papss mutants, the sugar residues attached to them are also concentrated, which appears as increased WGA signal.

      • The explanation about why Pio antibody and mCherry-Pio show different patterns is not clear. If the antibody recognizes the C-terminal region, shouldn't it be clearly found at the membrane rather than in the lumen?

      The Pio protein is also cleaved by furin protease (Figure 5B). We think the Pio fragment recognized by the antibody should be a "C-terminal ZP domain", which is a middle piece after furin + Np cleavages.

      • The Qsm information does not seem to provide any relevant information on the aECM or sulfation.

      Since Qsm has been shown to bind to Dpy and remodel Dpy filaments in the muscle tendon (Chu and Hayashi, 2021), we believe that the different behavior of Qsm in the SG is still informative. As mentioned briefly in the Discussion, the cleaved Qsm fragment may localize differently, like Pio, and future work will need to test this. We have shortened the description of the Qsm localization in the manuscript and moved the details to the figure legend of Figure 5-figure supplement 3.

      Reviewer #3 (Significance (Required)):

      Previous reports already indicated a role for Papss in sulfation in SG (Zhu et al 2005). Now this work provides a more detailed description of the defects produced by the absence of Papss. In addition, it provides relevant data related to the nature and requirements of the aECM in the SG. Understanding the composition and requirements of aECM during organ formation is an important question. Therefore, this work may be relevant in the fields of cell biology and morphogenesis.

    1. Reviewer #2 (Public review):

      The authors aimed to develop an animal model of temporal lobe epilepsy (TLE) that will generate "on-demand" seizures and an improved platform to advance our ability to find new anti-seizure drugs (ASDs) for drug-resistant epilepsy (DRE). Unlike some of the work in this field, the authors are studying actual seizures, and hopefully events that are similar to actual epileptic seizures. To develop an optimized screening tool, however, one also needs high-throughput systems with actual seizures as quantitative, rigorous, and reproducible outcome measures. The authors aim to provide such a model; however, this approach may be overstated here and seems unlikely to address the critical issue of drug resistance, which is their most important claim.

      Strengths:

      - The authors have generated an animal model of "on demand" seizures, which could be used to screen new ASDs and potentially other therapies. The authors and their model make a good-faith effort to emulate the epileptic condition and to use seizure susceptibility or probability as a quantitative output measure.

      - The events considered to be seizures appear to be actual seizures, with some evidence that the seizures are different from seizures in the naïve brain. Their effort to determine how different ASDs raise seizure probability or threshold to an optogenetic stimulus to the CA1 area of the rodent hippocampus is focused on an important problem, as many if not most ASD screening uses surrogate measures that may not be as well linked to actual epileptic seizures.

      - Another concern is their stimulation of the dorsal hippocampus, whereas the ventral hippocampus would seem more appropriate.

      - Use of optogenetic techniques allows specific stimulation of the targeted CA1 pyramidal cells, and it appears that this approach is reproducible and reliable with quantitative rigor.

      - The authors have taken on a critically important problem, and have made a good-faith effort to address many of the technical concerns raised in the reviews, but the underlying problem of DRE remains.

      Weaknesses:

      - Although the model has potential advantages, it also has disadvantages. As stated by the authors, the pre-test workload to prepare the model may not be worth the apparent advantages. Most importantly, the paper frequently mentions DRE but does not directly address it, and yet drug resistance is the critical issue in this field.

      - Although the paper shows examples of actual seizures, there remains some concern that some of the events might not be seizures, or might not form a homogeneous population of seizures. More quantitative assessment of the electrical properties (e.g., duration) of the seizures and their probability is likely to be more useful than the proposed future quantification of behavioral seizure stages, because the former could be both more objective and automated, while behavioral analysis of the seizures will likely be more subjective, less reliable, and fraught with analytical problems. Nonetheless, the authors make a good argument that the presence of "Racine 3 or above" behavioral seizures (in addition to their electrical data) indicates that many (if not all) of the "seizures" are actual epileptic seizures.

      - Optogenetic stimulation of CA1 provides cell-specificity for the stimulation, but it is not clear that this method would actually be better than electrical stimulation of a kindled rodent with superimposed hippocampal injury. The reader is unfortunately left with the concern of whether this model would be easier and more efficacious than kindling.

      - Although the authors have taken on a critically important problem and have combined a variety of technologies, and although this approach may facilitate more rapid screening of ASDs against actual seizures (a benefit), it does not really address the fundamentally critical yet difficult problem of DRE. A critical issue for DRE that is not well addressed relates to adverse effects, which are often why many ASDs are not well tolerated by many patients (e.g., LEV). Thus, we are left with the question: how does this address DRE?

      - The focus of this paper seems to be more on seizures than on epilepsy. In the absence of seizure spontaneity, the work seems to primarily address the issues of seizure spread and duration. Although this is useful, it does not seem to address the question of what trips the system to generate a seizure.

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      - The authors seem to have developed a new and useful model; however, it is not clear how this will address the core problem of DRE, which was their stated aim.

      - A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community.

      - As stated before in the original review, the potential impact would primarily be aimed at the ETSP or a drug-testing CRO; however, much more work will be required to convince the epilepsy community that this approach will actually identify new ASDs for DRE. The approach is potentially time-consuming with a steep and potentially difficult optimization curve, and thus may not be readily adaptable to the typical epilepsy-models neuroscience laboratory.

      Any additional context you think would help readers interpret or understand the significance of the work:

      - The problem of DRE is much more complicated than described by the authors here; however, the paper could end up being more useful than is currently apparent. Although this work could be seen as technically - and maybe conceptually - elegant and a technical tour de force, will it "deliver on the promise"? Is it better than kindling for DRE? In attempting to improve the discovery process, how will the model move us to another level? Will this model really be any better than others, such as kindling?

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the Reviewers for their constructive comments and the Editor for the opportunity to address the Reviewers’ points in this rebuttal. We:

      (1) Conducted new experiments with NP6510-Gal4 and TH-Gal4 lines to address potential behavioral differences due to targeting dopaminergic vs. both dopaminergic and serotonergic neurons

      (2) Conducted novel data analyses to emphasize the strength of sampling distributions of behavioral parameters across trials and individual flies

      (3) Provided Supplementary Movies

      (4) Calculated additional statistics

      (5) Edited and added text to address all points of the Reviewers.

      Please see our point-by-point responses below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Translating discoveries from model organisms to humans is often challenging, especially in neuropsychiatric diseases, due to the vast gaps in the circuit complexities and cognitive capabilities. Kajtor et al. propose to bridge this gap in the fly models of Parkinson's disease (PD) by developing a new behavioral assay where flies respond to a moving shadow by modifying their locomotor activities. The authors believe the flies' response to the shadow approximates their escape response to an approaching predator. To validate this argument, they tested several PD-relevant transgenic fly lines and showed that some of them indeed have altered responses in their assay.

      Strengths:

      This single-fly-based assay is easy and inexpensive to set up, scalable, and provides sensitive, quantitative estimates to probe flies' optomotor acuity. The behavioral data is detailed, and the analysis parameters are well-explained.

      We thank the Reviewer for the positive assessment of our study.

      Weaknesses:

      While the abstract promises to give us an assay to accelerate fly-to-human translation, the authors need to provide evidence to show that this is indeed the case. They have used PD lines extensively characterized by other groups, often with cheaper and easier-to-set-up assays like negative geotaxis, and do not offer any new insights into them. The conceptual leap from a low-level behavioral phenotype, e.g. changes in walking speed, to recapitulating human PD progression is enormous, and the paper does not make any attempt to bridge it. It needs to be clarified how this assay provides a new understanding of the fly PD models, as the authors do not explore the cellular/circuit basis of the phenotypes. Similarly, they have assumed that the behavior they are looking at is an escape-from-predator response modulated by the central complex; is there any evidence to support these assumptions? Because of their rather superficial approach, the paper does not go beyond providing us with a collection of interesting but preliminary observations.

      We thank the Reviewer for pointing out some limitations of our study. We would like to emphasize that what we perceive as the main advantage of performing single-fly and single-trial analyses is the access to rich data distributions that provide more fine-scale information than bulk assays. We think that this brings us exactly one step closer to ‘bridging the enormous conceptual leap from a low-level behavioral phenotype, e.g. changes in walking speed, to recapitulating human PD progression’, and we showcase this in our study by comparing the distributions over the entire repertoire of behavioral responses across fly mutants. Nevertheless, we agree with the Reviewer that many more steps in this direction are needed to improve translatability. Therefore, we toned down the corresponding statements in the Abstract and in the Introduction. Moreover, to further emphasize the strength of sampling distributions of behavioral parameters across trials and individual flies, we complemented our comparisons of central tendencies with tests for potential differences in data dispersion, shown in the new Supplementary Figure S4.

      Looming stimuli have been used to characterize flies’ escape behaviors. These studies uncovered a surprisingly rich behavioral repertoire (Zacarias et al., 2018), which was modulated by both sensory and motor context, e.g. walking speed at the time of stimulus presentation (Card and Dickinson, 2008; Oram and Card, 2022; Zacarias et al., 2018). The neural basis of these behaviors was also investigated, revealing loom-sensitive neurons in the optic lobe and the giant fiber escape pathway (Ache et al., 2019; de Vries and Clandinin, 2012). Although less frequently, passing shadows were also employed as threat-inducing stimuli in flies (Gibson et al., 2015). We opted for this variant of the stimulus so that we could ensure that the shadow reached the same coordinates in all linear tracks concurrently, aiding data analysis and scalability. Similar to the cited study, we found the same behavioral repertoire as in studies with looming stimuli, with an equivalent dependence on walking speed, confirming that looming stimuli and passing shadows can both be considered threat-inducing visual stimuli. We added a discussion on this topic to the main text.

      Reviewer #2 (Public Review):

      In this study, Kajtor et al. investigated the use of a single-animal, trial-based behavioral assay for the assessment of subtle changes in the locomotor behavior of different genetic models of Parkinson's disease in Drosophila. The genotypes used in this study were Ddc-GAL4>UAS-Parkin-275W and UAS-α-Syn-A53T. The authors measured Drosophila's response to a predator-mimicking passing shadow as a threatening stimulus. Along with these, various dopamine (DA) receptor mutants, Dop1R1, Dop1R2 and DopEcR, were also tested.

      The behavior was measured in a custom-designed apparatus that allows simultaneous testing of 13 individual flies in a plexiglass arena. The inter-trial intervals were randomized for 40 trials within a 40-minute session, and fly responses were classified into freezing, slowing down, and running by hierarchical clustering. Most of the mutant flies showed decreased reactivity to threatening stimuli, but the speed-response behavior was genotype invariant.

      These data nicely show that measuring responses to the predator-mimicking passing shadows could be used to assess the subtle differences in the locomotion parameters in various genetic models of Drosophila.

      The understanding of the manifestation of various neuronal disorders is a topic of active research. Many of the neuronal disorders start by presenting subtle changes in neuronal circuits and quantification and measurement of these subtle behavior responses could help one delineate the mechanisms involved. The data from the present study nicely uses the behavioral response to predator-mimicking passing shadows to measure subtle changes in behavior. However, there are a few important points that would help establish the robustness of this study.

      We thank the Reviewer for the constructive comments and the positive assessment of our study.

      (1) The visual threat stimulus for measuring response behavior in Drosophila is previously established for both single and multiple flies in an arena. A comparative analysis of data and the pros and cons of the previously established techniques (for example, Gibson et al., 2015) with the technique presented in this study would be important to establish the current assay as an important advancement.

      We thank the Reviewer for this suggestion. We included the following discussion on measuring response behavior to visual threat stimuli in the revised manuscript.

      Many earlier studies used a looming stimulus, that is, a concentrically expanding shadow mimicking the approach of a predator from above, to study escape responses in flies (Ache et al., 2019; Card and Dickinson, 2008; de Vries and Clandinin, 2012; Oram and Card, 2022; Zacarias et al., 2018) as well as rodents (Braine and Georges, 2023; Heinemans and Moita, 2024; Lecca et al., 2017). These assays have the advantage of closely resembling naturalistic, ecologically relevant threat-inducing stimuli, and allow a relatively complete characterization of the fly escape behavior repertoire. As a flip side of their large degree of freedom, they do not easily lend themselves to a fully standardized, scalable behavioral assay. Therefore, Gibson et al. suggested a novel threat-inducing assay operating with moving overhead translational stimuli, that is, passing shadows, and demonstrated that they induce escape behaviors in flies akin to looming discs (Gibson et al., 2015). This assay, coined ReVSA (repetitive visual stimulus-induced arousal) by the authors, had the advantage of scalability, while constraining flies to a walking arena that somewhat restricted the remarkably rich escape types flies otherwise exhibit. Here we carried this idea one step further by using a screen to present the shadows instead of a physically moving paddle and by placing individual flies in linear corridors instead of the common circular fly arena. This ensured that the shadow reached the same coordinates in all linear tracks concurrently and made it easy to determine accurately when individual flies encountered the stimulus, aiding data analysis and scalability. We found the same escape behavioral repertoire as in studies with looming stimuli and ReVSA (Gibson et al., 2015; Zacarias et al., 2018), with a similar dependence on walking speed (Oram and Card, 2022; Zacarias et al., 2018), confirming that looming stimuli and passing shadows can both be considered threat-inducing visual stimuli.

      (2) Parkinson's disease mutants should be validated with other GAL4 drivers along with Ddc-GAL4, such as NP6510-Gal4 (Riemensperger et al., 2013). This would be important to delineate the behavioral differences due to dopaminergic neurons and serotonergic neurons and establish the Parkinson's disease phenotype robustly.

      We thank the Reviewer for pointing out this limitation. To address this, we repeated our key experiments in Fig. 3 with both TH-Gal4 and NP6510-Gal4 lines and their respective controls. These yielded results largely similar to those obtained with the Ddc-Gal4 lines reported in Fig. 3, reproducing the decreased speed and decreased overall reactivity of PD-model flies. Nevertheless, TH-Gal4 and NP6510-Gal4 mutants showed an increased propensity to stop. Stop duration showed a significant increase not only in α-Syn but also in Parkin fruit flies. These novel results have been added to the text and are shown in Supplementary Figure S3.

      (3) The DopEcR mutant genotype used for behavior analysis is w<sup>1118</sup>; PBac{PB}DopEcR<sup>c02142</sup>/TM6B, Tb<sup>1</sup>. Balancer chromosomes, such as TM6B, Tb, can have undesirable and uncharacterised behavioral effects. This could be addressed by removing the balancer and testing the DopEcR mutant in homozygous (if viable) or heterozygous conditions.

      We appreciate the Reviewer's comment and acknowledge the potential for the DopEcR balancer chromosome to produce unintended behavioral effects. However, given that this mutant was not essential to our main conclusions, we opted not to repeat the experiment. Nevertheless, we now discuss the possible confounds associated with using the PBac{PB}DopEcR<sup>c02142</sup> mutant allele over the balancer chromosome: “We recognize a limitation in using PBac{PB}DopEcR<sup>c02142</sup> over the TM6B, Tb<sup>1</sup> balancer chromosome, as the balancer itself may induce behavioral deficits in flies. We consider this unlikely, as the PBac{PB}DopEcR<sup>c02142</sup> mutation demonstrates behavioral effects even in heterozygotes (Ishimoto et al., 2013). Additionally, to our knowledge, no studies have reported behavioral deficits in flies carrying the TM6B, Tb<sup>1</sup> balancer chromosome over a wild-type chromosome.”

      (4) The height of the arena is restricted to 1mm. However, for the wild-type flies (Canton-S) and many other mutants, the height is usually more than 1mm. Also, a 1 mm height could restrict the fly movement. For example, it might not allow the flies to flip upside down in the arena easily. This could introduce some unwanted behavioral changes. A simple experiment with an arena of height at least 2.5mm could be used to verify the effect of 1mm height.

      We thank the Reviewer for this comment, which prompted us to reassess the dimensions of the apparatus. The height of the arena was 1.5 mm, which we corrected now in the text. We observed that the arena did not restrict the flies walking and that flies could flip in the arena. We now include two Supplementary Movies to demonstrate this.

      (5) The detailed model for Monte Carlo simulation for speed-response simulation is not described. The simulation model and its hyperparameters need to be described in more depth and with proper justification.

      We thank the Reviewer for pointing out a lack of details with respect to Monte Carlo simulations. We used a nested model built from actual data distributions, without any assumptions. Accordingly, the simulation did not have hyperparameters typical of machine learning applications, the only external parameter being the number of resamplings (3000 for each draw). We made these modeling choices clearer and expanded this part as follows.

      “The effect of movement speed on the distribution of behavioral response types was tested using a nested Monte Carlo simulation framework (Fig. S5). This simulation aimed to model how different movement speeds affect the probability distribution of response types, comparing these simulated outcomes to empirical data. This approach allowed us to determine whether observed differences in response distributions are solely due to speed variations across genotypes or whether additional behavioral factors contribute to the differences. First, we calculated the probability of each response type at different specific speed values (outer model). These probabilities were derived from the grand average of all trials across each genotype, capturing the overall tendency at various speeds. Second, we simulated the behavior of virtual flies (n = 3000 per genotype, which falls within the same order of magnitude as the number of experimentally recorded trials from different genotypes) by drawing random velocity values from the empirical velocity distribution specific to the given genotype and then randomly selecting a reaction based on the reaction probabilities associated with the drawn velocity (inner model). Finally, we calculated reaction probabilities for the virtual flies and compared them with real data from animals of the same genotype.

      Differences were statistically tested by Chi-squared test.”
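      The nested simulation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the response labels, the speed-bin width used to discretize "specific speed values", and the function names (`outer_model`, `inner_model`, `chi_square_2xk`) are all hypothetical. The outer model estimates P(response | speed bin) from trials pooled over genotypes; the inner model draws 3000 virtual flies of one genotype from that genotype's empirical speed distribution; a chi-squared test of homogeneity then compares virtual and real response counts.

```python
import math
import random
from collections import Counter

# Hypothetical response labels; the study's own categories may differ.
RESPONSES = ("stop", "slow_down", "speed_up")


def speed_bin(speed, width=2.0):
    """Map a walking speed to a discrete bin (bin width is an assumption)."""
    return int(speed // width)


def outer_model(pooled_trials, width=2.0):
    """Estimate P(response | speed bin) from (speed, response) trials pooled over genotypes."""
    counts = {}
    for speed, response in pooled_trials:
        counts.setdefault(speed_bin(speed, width), Counter())[response] += 1
    return {
        b: {r: c[r] / sum(c.values()) for r in RESPONSES}
        for b, c in counts.items()
    }


def inner_model(genotype_speeds, probs, n_virtual=3000, width=2.0, seed=0):
    """Simulate virtual flies of one genotype: draw a speed from that genotype's
    empirical speeds, then draw a response from the pooled P(response | speed)."""
    rng = random.Random(seed)
    simulated = Counter()
    for _ in range(n_virtual):
        p = probs[speed_bin(rng.choice(genotype_speeds), width)]
        weights = [p[r] for r in RESPONSES]
        simulated[rng.choices(RESPONSES, weights=weights)[0]] += 1
    return simulated


def chi_square_2xk(observed, simulated):
    """Chi-squared test of homogeneity on a 2 x k table (real vs. virtual flies).

    Assumes both rows are non-empty; empty response columns are dropped.
    """
    rows = [[observed[r] for r in RESPONSES], [simulated[r] for r in RESPONSES]]
    row_tot = [sum(row) for row in rows]
    col_tot = [a + b for a, b in zip(*rows)]
    grand = sum(row_tot)
    stat = sum(
        (rows[i][j] - row_tot[i] * col_tot[j] / grand) ** 2
        / (row_tot[i] * col_tot[j] / grand)
        for i in range(2)
        for j in range(len(RESPONSES))
        if col_tot[j] > 0
    )
    df = sum(1 for c in col_tot if c > 0) - 1
    # Closed-form chi-squared survival functions for df = 1 and df = 2.
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2))
    elif df == 2:
        p = math.exp(-stat / 2)
    else:
        p = float("nan")
    return stat, df, p


if __name__ == "__main__":
    # Toy, randomly generated trials used only to exercise the functions.
    rng = random.Random(1)
    pooled = [(rng.uniform(0, 10), rng.choice(RESPONSES)) for _ in range(600)]
    genotype_trials = pooled[:200]  # pretend the first 200 trials are one genotype
    probs = outer_model(pooled)
    virtual = inner_model([s for s, _ in genotype_trials], probs)
    real = Counter(r for _, r in genotype_trials)
    print(chi_square_2xk(real, virtual))
```

      If speed alone explained a genotype's response distribution, the virtual and real counts should not differ significantly. The closed-form p-values above only cover two or three response categories (1 or 2 degrees of freedom); with more categories, a general chi-squared survival function such as `scipy.stats.chi2.sf` would be needed.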

      (6) The statistical analysis in different experiments needs revisiting. It wasn't clear to me if the authors checked if the data is normally distributed. A simple remedy to this would be to check the normality of data using the Shapiro-Wilk test or Kolmogorov-Smirnov test. Based on the normality check, data should be further analyzed using either parametric or non-parametric statistical tests. Further, the statistical test for the age-dependent behavior response needs revisiting as well. Using two-way ANOVA is not justified given the complexity of the experimental design. Again, after checking for the normality of data, a more rigorous statistical test, such as split-plot ANOVA or a generalized linear model could be used.

      We thank the Reviewer for this comment. We performed Kolmogorov-Smirnov tests for normality on the data distributions underlying Figure 3. Normality was rejected for all distributions at p = 0.05, which justifies the use of the non-parametric Mann-Whitney U-test. Regarding ANOVA, we would like to point out that ANOVA is robust to deviations from normality (Knief and Forstmeier, 2021; Mooi et al., 2018). The Kruskal-Wallis test is considered a reasonable non-parametric alternative to one-way ANOVA, but there is no clear consensus on a non-parametric alternative to two-way ANOVA. We therefore kept the two-way ANOVA for Figure 5. However, to increase the statistical confidence in our conclusions, we also performed Kruskal-Wallis tests for the main effect of age and found significant effects in all genotypes, in agreement with the ANOVA (Stop frequency: DopEcR, p = 0.0007; Dop1R1, p = 0.004; Dop1R2, p = 9.94 × 10<sup>-5</sup>; w<sup>1118</sup>, p = 9.89 × 10<sup>-13</sup>; y<sup>1</sup> w<sup>67</sup>c<sup>23</sup>, p = 2.54 × 10<sup>-5</sup>; Slowing-down frequency: DopEcR, p = 0.0421; Dop1R1, p = 5.77 × 10<sup>-6</sup>; Dop1R2, p = 0.011; w<sup>1118</sup>, p = 2.62 × 10<sup>-5</sup>; y<sup>1</sup> w<sup>67</sup>c<sup>23</sup>, p = 0.0382; Speeding-up frequency: DopEcR, p = 0.0003; Dop1R1, p = 2.06 × 10<sup>-7</sup>; Dop1R2, p = 2.19 × 10<sup>-6</sup>; w<sup>1118</sup>, p = 0.0044; y<sup>1</sup> w<sup>67</sup>c<sup>23</sup>, p = 1.36 × 10<sup>-5</sup>). We also replaced the post hoc Tukey tests with post hoc Mann-Whitney tests in the text, for consistency with the statistical analyses for Figure 3. These yielded results very similar to those of the Tukey tests.
Of note, in this case there is no straightforward equivalent of Tukey’s ‘honest significance’ approach for correcting for multiple comparisons. We therefore report uncorrected p values and suggest evaluating them against a threshold of p = 0.01, which reduces the risk of type I errors. These notes have been added to the ‘Data analysis and statistics’ Methods section.
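      For readers less familiar with the rank-based tests named above, the following is a minimal sketch in standard-library Python of what the Mann-Whitney U and Kruskal-Wallis H statistics compute. It is not the authors' analysis code. It assumes the large-sample approximations: a normal approximation for U (tie-corrected variance, no continuity correction) and a chi-square approximation with k − 1 degrees of freedom for H. Small-sample p-values may therefore differ from those of statistics packages such as SciPy, which can use exact or continuity-corrected methods by default. All function names (`average_ranks`, `tie_sum`, `mann_whitney_u`, `chi2_sf`, `kruskal_wallis`) are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def tie_sum(values: Sequence[float]) -> int:
    """Sum of (t^3 - t) over groups of t tied values; 0 when there are no ties."""
    return sum(t ** 3 - t for t in Counter(values).values())


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """U statistic for sample x and a two-sided p-value from the normal
    approximation (tie-corrected variance, no continuity correction)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_sum(pooled) / (n * (n - 1)))
    if var_u <= 0:
        raise ValueError("all observations are tied; the test is undefined")
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1,
    using the closed forms for even and odd degrees of freedom."""
    half = x / 2
    if df % 2 == 0:
        return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    tail = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return tail


def kruskal_wallis(*groups: Sequence[float]) -> Tuple[float, float]:
    """Tie-corrected H statistic and its chi-square (df = k - 1) p-value."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    correction = 1 - tie_sum(pooled) / (n ** 3 - n)
    if correction <= 0:
        raise ValueError("all observations are tied; the test is undefined")
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)
```

      For example, `mann_whitney_u([1, 2, 3], [4, 5, 6])` gives U = 0 for the first sample, and `kruskal_wallis([1, 2, 3], [4, 5, 6], [7, 8, 9])` gives H = 7.2 with p = exp(-3.6) ≈ 0.027. The sketch covers only the test statistics. It does not cover the preceding normality check or the handling of multiple comparisons discussed above.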

      (7) The dopamine receptor mutants used in this study are well characterized for learning and memory deficits. In the Parkinson's disease model of Drosophila, there is a loss of DA neurons in specific pockets in the central brain. Hence, it would be apt to use whole animal DA receptor mutants as general DA mutants rather than the Parkinson's disease model. The authors may want to rework the title to reflect the same.

      We thank the Reviewer for this comment, which suggests that we were not sufficiently clear about the Drosophila lines carrying DA receptor mutations. We used Mi{MIC} random insertion lines for dopamine receptor mutants, namely y<sup>1</sup> w<sup>*1</sup>; Mi{MIC}Dop1R1<sup>MI04437</sup> (BDSC 43773), y<sup>1</sup> w<sup>*1</sup>; Mi{MIC}Dop1R2<sup>MI08664</sup> (BDSC 51098) (Harbison et al., 2019; Pimentel et al., 2016), and w<sup>1118</sup>; PBac{PB}DopEcR<sup>c02142</sup>/TM6B, Tb<sup>1</sup> (BDSC 10847) (Ishimoto et al., 2013; Petruccelli et al., 2020, 2016). These lines carry reported mutations in dopamine receptor genes, most likely causing partial knockdown of the respective receptors. We have made this clearer by giving the full names at the first occurrence of the lines in the Results (in addition to those in the Methods) and by adding references for each line.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Please think about focusing the manuscript either on the escape response or the PD pathology and provide additional evidence to demonstrate that you indeed have a novel system to address open questions in the field.

      As detailed above, we now place greater emphasis on the main advantage of our single-trial-based approach: the appropriate statistical comparison of rich distributions of behavioral data. Please see our response to the ‘Weaknesses’ section for more details.

      (2) Please explain the rationale for choosing the genetic lines and provide appropriate genetic controls in the experiments, e.g. trans-heterozygotes. Why use Ddc-Gal4 instead of TH or other specific Split-Gal4 lines?

      We thank the Reviewer for this suggestion. We repeated our key experiments with TH-Gal4 and NP6510-Gal4 lines. Please see our response to Point #2 of Reviewer #2 for details.

      (3) Please proofread the manuscript for omissions, e.g. there's no legend for Fig 4b.

      We respectfully point out that the legend is there, and it reads “b, Proportion of a given response type as a function of average fly speed before the shadow presentation. Top, Parkin and α-Syn flies. Bottom, Dop1R1, Dop1R2 and DopEcR mutant flies.”

      Reviewer #2 (Recommendations For The Authors):

      (1) In figure 2(c), representing the average walking speed data for different mutants would be useful to visually correlate the walking differences.

      We thank the Reviewer for this suggestion. We added the average walking speed as a scatter plot, as the Reviewer suggests in the next point.

      (2) The data could be represented more clearly using scatter plots. Also, the color scheme could be more color-blindness friendly.

      We thank the Reviewer for this suggestion. We added scatter plots to Fig. 2c, which indeed represent the distribution of behavioral responses better. We also changed the color scheme and removed red/green labeling.

      (3) The manuscript should be checked for typos, such as in lines 252, 449 and 484.

      Thank you. We fixed the typos.

      References

      Ache JM, Polsky J, Alghailani S, Parekh R, Breads P, Peek MY, Bock DD, von Reyn CR, Card GM. 2019. Neural Basis for Looming Size and Velocity Encoding in the Drosophila Giant Fiber Escape Pathway. Curr Biol 29:1073–1081.e4. doi:10.1016/j.cub.2019.01.079

      Braine A, Georges F. 2023. Emotion in action: When emotions meet motor circuits. Neurosci Biobehav Rev 155:105475. doi:10.1016/j.neubiorev.2023.105475

      Card G, Dickinson MH. 2008. Visually Mediated Motor Planning in the Escape Response of Drosophila. Curr Biol 18:1300–1307. doi:10.1016/j.cub.2008.07.094

      de Vries SEJ, Clandinin TR. 2012. Loom-Sensitive Neurons Link Computation to Action in the Drosophila Visual System. Curr Biol 22:353–362. doi:10.1016/j.cub.2012.01.007

      Gibson WT, Gonzalez CR, Fernandez C, Ramasamy L, Tabachnik T, Du RR, Felsen PD, Maire MR, Perona P, Anderson DJ. 2015. Behavioral Responses to a Repetitive Visual Threat Stimulus Express a Persistent State of Defensive Arousal in Drosophila. Curr Biol 25:1401–1415. doi:10.1016/j.cub.2015.03.058

      Harbison ST, Kumar S, Huang W, McCoy LJ, Smith KR, Mackay TFC. 2019. Genome-Wide Association Study of Circadian Behavior in Drosophila melanogaster. Behav Genet 49:60–82. doi:10.1007/s10519-018-9932-0

      Heinemans M, Moita MA. 2024. Looming stimuli reliably drive innate defensive responses in male rats, but not learned defensive responses. Sci Rep 14:21578. doi:10.1038/s41598-024-70256-2

      Ishimoto H, Wang Z, Rao Y, Wu C, Kitamoto T. 2013. A Novel Role for Ecdysone in Drosophila Conditioned Behavior: Linking GPCR-Mediated Non-canonical Steroid Action to cAMP Signaling in the Adult Brain. PLoS Genet 9:e1003843. doi:10.1371/journal.pgen.1003843

      Knief U, Forstmeier W. 2021. Violating the normality assumption may be the lesser of two evils. Behav Res Methods 53:2576–2590. doi:10.3758/s13428-021-01587-5

      Lecca S, Meye FJ, Trusel M, Tchenio A, Harris J, Schwarz MK, Burdakov D, Georges F, Mameli M. 2017. Aversive stimuli drive hypothalamus-to-habenula excitation to promote escape behavior. Elife 6:1–16. doi:10.7554/eLife.30697

      Mooi E, Sarstedt M, Mooi-Reci I. 2018. Market Research, Springer Texts in Business and Economics. Singapore: Springer Singapore. doi:10.1007/978-981-10-5218-7

      Oram TB, Card GM. 2022. Context-dependent control of behavior in Drosophila. Curr Opin Neurobiol 73:102523. doi:10.1016/j.conb.2022.02.003

      Petruccelli E, Lark A, Mrkvicka JA, Kitamoto T. 2020. Significance of DopEcR, a G-protein coupled dopamine/ecdysteroid receptor, in physiological and behavioral response to stressors. J Neurogenet 34:55–68. doi:10.1080/01677063.2019.1710144

      Petruccelli E, Li Q, Rao Y, Kitamoto T. 2016. The Unique Dopamine/Ecdysteroid Receptor Modulates Ethanol-Induced Sedation in Drosophila. J Neurosci 36:4647–4657. doi:10.1523/JNEUROSCI.3774-15.2016

      Pimentel D, Donlea JM, Talbot CB, Song SM, Thurston AJF, Miesenböck G. 2016. Operation of a homeostatic sleep switch. Nature 536:333–337. doi:10.1038/nature19055

      Zacarias R, Namiki S, Card GM, Vasconcelos ML, Moita MA. 2018. Speed dependent descending control of freezing behavior in Drosophila melanogaster. Nat Commun 9:1–11. doi:10.1038/s41467-018-05875-1

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      The manuscript by Metheringham et al. reports interesting new characterizations of phenotypes caused by genetic inactivation of subunits of the methyltransferase complex responsible for N6-adenosine methylation in (pre-)mRNA ("the m6A writer") in the plant Arabidopsis thaliana. The main claim of the paper is that mutants in these subunits exhibit autoimmunity, a claim that is supported by the following lines of evidence:

      • Transcriptome profiling by mRNA-seq shows a gene expression profile with differential expression of many stress- and defense-related genes.
      • The immunity-like gene expression profile is observed under growth at 17 °C but not at 27 °C, consistent with the well-known temperature sensitivity of some (but not all) innate immunity signaling systems in plants.
      • m6A writer mutants show increased resistance to infection by the virulent Pseudomonas syringae DC3000 strain.
      • The primary biochemical defect in m6A writing is not temperature sensitive, excluding the trivial possibility that the mutant alleles chosen for study are simply temperature-sensitive (ts) alleles.

      The observations are important and the manuscript is very well written, a pleasure to read: the problem is clearly presented, the experimental results are presented in a clear, logical succession, and the discussion treats important points.

      The study is valuable pending some manuscript revision on the autoimmunity interpretation of the results obtained, and a few suggested edits that can be included if the authors agree that they would improve the paper.

      The finding that an autoimmune-like state is activated in m6A writer mutants is significant because it provides a warning flag on how such mutants should be used for studying the role of m6A in stress response signaling, including reassessment of previously published work. Whether the stress state really is autoimmunity is subject to some debate, particularly because no genetic evidence to support it has been obtained. The results are nonetheless interesting and constitute an important contribution to the community, even if they remain descriptive and with nearly no insight into molecular mechanisms. My suggestions for improvement are summarized below.

      1. Although the authors do a lot to support the claim that autoimmunity is an element of m6A writer mutant phenotypes, the study does not include genetic evidence to support this claim. This is important, because if the stress/defense gene activation causes some of the morphological phenotypes of m6A writer mutants, one should be able to suppress such defects by mutation of known immune signaling components such as the appropriate nucleotide-binding leucine-rich repeat proteins, or more generic signaling components such as EDS1, PAD4 and SAG101, common to a subset of such intracellular immune receptors. Resistance to pathogens can be observed in mutants with constitutive stress response signaling, and defense-like gene expression can be induced as a secondary consequence of other primary defects, for instance DNA damage. Similarly, while it is true that some types of immune activation are temperature sensitive, others are not 1, and elevated temperature clearly changes so much of the physiology of the plant that sensitivity to elevated temperature cannot be used as proof of immune activation. Thus, each of the lines of evidence presented is suggestive, not conclusive. Together, they constitute a good argument, but still not a completely satisfactory proof of the main claim. I do not think that this concern means that a lot of genetic work must be undertaken to make this paper publishable, but I think that the authors should be even more careful about how they interpret their observations. I understand that they favor more or less direct activation of autoimmunity, although even if that were true, it would be unclear what the biochemical triggers of such autoimmunity would be (unmethylated RNA, absence of writer components, excess of free m6A-binding proteins, etc.).
However, given the concerns above, I think the authors should dedicate a small paragraph in the discussion to the possibility that the primary cause of stress/defense-gene expression is unclear and may not result from innate immune surveillance of unmethylated mRNA or components of the m6A pathway as favoured by the authors.
      2. It may be of relevance to search promoters of differentially expressed genes for enrichment of cis-elements. This simple approach identified the W-box in the first papers using transcriptome profiling to characterize the immune state in Arabidopsis 2,3, and could perhaps reveal whether a WRKY-driven transcriptional program drives differential expression or whether several other transcription factor classes may also contribute substantially, as may be expected if a more complex stress-related transcriptional program is activated. I do not think that this is a deal breaker, but some additional useful information from the existing data might be gathered in this way.
      3. Stress response activation has also been clearly described in ect2 ect3 ect4 mutants 4, and even if the authors find no evidence for PR1 expression in this mutant, it is still relevant to mention this result in the discussion, together with the stress response activation seen in writer mutants in earlier reports 5,6. I would not mind the authors being a bit more explicit about what their results mean for studies that draw conclusions on the biological relevance of m6A in different types of stress signaling using writer mutant phenotypes as their primary line of evidence. But this is, of course, for the authors to decide.
      4. In the introduction on preferred m6A sequence contexts, please clarify that m6A in plants occurs in both DRACH and (G)GAU contexts 7,8.
      5. When mentioning convergence on shared signaling components from immune receptors, please include a tiny bit more information for the reader. For instance, EDS1 is mentioned, but this protein is only required for signaling from (some) TIR-NBS-LRRs, not the class of CC-NBS-LRRs. Indeed, signaling by this latter class may not converge on just one to a few components, as their multimerization appears to form the ion channels required for signaling-inducing ion currents.
      6. Please clarify in the introduction and in later parts that only some forms of autoimmunity can be suppressed by elevated temperature. Sentences like "A hallmark of Arabidopsis autoimmunity is temperature sensitivity..." are a bit misleading. Temperature sensitivity has clearly been used to study some forms of EDS1-dependent immunity, to great effect in the TMV-N interaction for instance, but it is not accurate to call temperature sensitivity a "hallmark of autoimmunity".
      7. In the discussion of possible biochemical triggers of autoimmunity in m6A mutants, please consider the following:

      (A) Mention the possibility that the primary trigger may not be immune receptor-surveillance of some defect induced by lack of m6A in mRNA (as discussed above).

      (B) In connection with the consideration that lack of m6A writer components, not of m6A in mRNA, may be a signal, you could include the observation from yeast that Ime4 knockouts have a much stronger phenotype than Ime4 catalytically dead mutants or knockouts of Pho92, the sole yeast YTH-domain protein 9. Indeed, it is a bit of an embarrassment to the plant m6A community that we have not yet examined the phenotypes of MTA and MTB catalytically dead mutants, and the present report should further urge the community to finally do this important experiment.
      8. Just a tiny typo on page 15: Pst DC3000, not Pst D3000 (of no relevance to the overall assessment, just a help to eliminate annoying errors before final submission).

      REFERENCES

      1. Demont, H. et al. Downstream signaling induced by several plant Toll/interleukin-1 receptor-containing immune proteins is stable at elevated temperature. Cell Reports 44 (2025).
      2. Petersen, M. et al. Arabidopsis MAP kinase 4 negatively regulates systemic acquired resistance. Cell 103, 1111-1120 (2000).
      3. Maleck, K. et al. The transcriptome of Arabidopsis thaliana during systemic acquired resistance. Nature Genetics 26, 403-410 (2000).
      4. Arribas-Hernández, L. et al. The YTHDF proteins ECT2 and ECT3 bind largely overlapping target sets and influence target mRNA abundance, not alternative polyadenylation. eLife 10, e72377 (2021).
      5. Bodi, Z. et al. Adenosine Methylation in Arabidopsis mRNA is Associated with the 3' End and Reduced Levels Cause Developmental Defects. Front Plant Sci 3, 48 (2012).
      6. Prall, W. et al. Pathogen-induced m6A dynamics affect plant immunity. The Plant Cell 35, 4155-4172 (2023).
      7. Arribas-Hernández, L. et al. Principles of mRNA targeting via the Arabidopsis m6A-binding protein ECT2. eLife 10, e72375 (2021).
      8. Wang, G. et al. Quantitative profiling of m6A at single base resolution across the life cycle of rice and Arabidopsis. Nature Communications 15, 4881 (2024).
      9. Ensinck, I. et al. The yeast RNA methylation complex consists of conserved yet reconfigured components with m6A-dependent and independent roles. eLife 12, RP87860 (2023).

      Significance

      The finding that an autoimmune-like state is activated in m6A writer mutants is significant because it provides a warning flag on how such mutants should be used for studying the role of m6A in stress response signaling, including reassessment of previously published work. Whether the stress state really is autoimmunity is subject to some debate, particularly because no genetic evidence to support it has been obtained. The results are nonetheless interesting and constitute an important contribution to the community, even if they remain descriptive and with nearly no insight into molecular mechanisms. I wish to congratulate the authors on another valuable contribution to the plant m6A field.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This important work advances our understanding of the impact of malnutrition on hematopoiesis and subsequently infection susceptibility. Support for the overall claims is convincing in some respects and incomplete in others as highlighted by reviewers. This work will be of general interest to those in the fields of hematopoiesis, malnutrition, and dietary influence on immunity.

      We would like to thank the editors for agreeing to review our work at eLife. We greatly appreciate them assessing this study as important and of general interest to multiple fields, as well as the opportunity to respond to reviewer comments. Please find our responses to each reviewer below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors used a chronic murine dietary restriction model to study the effects of chronic malnutrition on control of bacterial infection and overall immunity, including cellularity and functions of different immune cell types. They further attempted to determine whether refeeding can revert the infection susceptibility and immunodeficiency. Although refeeding here improves anthropometric deficits, the authors of this study show that this is insufficient to recover the impairments across the immune cell compartments.

      Strengths:

      The manuscript is well-written and conceived around a valid scientific question. The data supports the idea that malnutrition contributes to infection susceptibility and causes some immunological changes. The malnourished mouse model also displayed growth and development delays. The work's significance is well justified. Immunological studies in the malnourished cohort (human and mice) are scarce, so this could add valuable information.

      Weaknesses:

      The assays on myeloid cells are limited, and the study is descriptive and overstated. The authors claim that "this work identifies a novel cellular link between prior nutritional state and immunocompetency, highlighting dysregulated myelopoiesis as a major." However, after reviewing the entire manuscript, I found no cellular mechanism defining the link between nutritional state and immunocompetency.

      We thank the reviewer for deeming our work significant and noting the importance of the study. We appreciate the referee’s point regarding the lack of specific cellular functional data for innate immune cells and have modified the conclusions stated in text to more accurately reflect the results presented.

      Reviewer #2 (Public review):

      Summary:

      Sukhina et al. use a chronic murine dietary restriction model to investigate the cellular mechanisms underlying nutritionally acquired immunodeficiency as well as the consequences of a refeeding intervention. The authors report a substantial impact of undernutrition on the myeloid compartment, which is not rescued by refeeding despite rescue of other phenotypes including lymphocyte levels, and which is associated with maintained partial susceptibility to bacterial infection.

      Strengths:

      Overall, this is a nicely executed study with appropriate numbers of mice, robust phenotypes, and interesting conclusions, and the text is very well-written. The authors' conclusions are generally well-supported by their data.

      Weaknesses:

      There is little evaluation of known critical drivers of myelopoiesis (e.g. PMID 20535209, 26072330, 29218601) over the course of the 40% diet, which would be of interest with regard to comparing this chronic model to other more short-term models of undernutrition.

      Further, the microbiota, which is well-established to be regulated by undernutrition (e.g. PMID 22674549, 27339978, etc.), and also well-established to be a critical regulator of hematopoiesis/myelopoiesis (e.g. PMID 27879260, 27799160, etc.), is completely ignored here.

      We thank the reviewer for agreeing that the data presented support the stated conclusions and for noting the experimental rigor. The referee highlights two important areas for future mechanistic investigation that we agree are of great importance and relevant to the submitted study. We have included further discussion of the potential roles that cytokines and the microbiota might play in our model.

      Reviewer #3 (Public review):

      Summary:

      Sukhina et al. are trying to understand the impacts of malnutrition on immunity. They model malnutrition with a diet switch from ad libitum to 40% caloric restriction (CR) in post-weaned mice. They test impacts on immune function with listeriosis. They then test whether re-feeding corrects these defects and find aspects of emergency myelopoiesis that remain defective after a preceding period of 40% CR. Overall, this is a very interesting observational study on the impacts of sudden prolonged exposure to lower caloric intake.

      Strengths:

      The study is rigorously done. The observation of lasting defects after a bout of 40% CR is quite interesting. Overall, I think the topic and findings are of interest.

      Weaknesses:

      While the observations are interesting, in this reviewer's opinion, there is both a lack of mechanistic understanding of the phenomena and also some lack of resolution/detail about the phenomena itself. Addressing the following major issues would be helpful towards aspects of both:

      (1) Is it calories per se, or macro/micronutrients, that drive the phenotypes observed with 40% CR? At the least, I would want to see isocaloric diets (primarily protein, fat, or carbohydrates) and then some of the same readouts after 40% CR: i.e., does low energy intake with relatively more protein, for example, prevent immunosuppression (as is commonly suggested)? Micronutrients would be harder to test experimentally and may be out of the scope of this study. However, it is worth noting that many malnutrition-associated diseases are micronutrient deficiencies.

      (2) Is immunosuppression a function of a certain weight-loss threshold, or of something else? Some idea of either the tempo of immunosuppression (does it occur at week 1, when weight loss is detected; at weeks 2-3, when body length and condition appear to diverge; or at week 5?) or the grade of CR (40% vs 60% vs 80%) would be helpful, since the mechanism of immunosuppression overall is unclear (but nailing it down may be beyond the scope of this communication).

      (3) Does an obese mouse that gets 40% CR also become immunodeficient? As it stands, this ad libitum --> 40% CR model perhaps best models problems in the industrial world (as opposed to always being 40% CR from weaning, as might be more common in the developing world), and so modeling an obese person losing a lot of weight from CR (as can now be achieved with GLP-1 drugs) would be valuable for understanding generalizability.

      (4) Generalizing this phenomenon as "bacterial" on the basis of listeriosis, which is more like a virus in many ways (intracellular phase, requirement for type I IFN, etc.) and cannot be given by the natural route of infection in mice, may not be accurate. I would want to see an experiment with E. coli, or some other bacterium, to test the generalizability of the statement (i.e., is it bacteria, or type I IFN pathway-dominant infections, like viruses?). If this is unique to listeriosis, it doesn't undermine the story as it is at all, but it would just require some word-smithing.

      (5) Previous reports (which the authors cite) implicate leptin, the levels of which scale with fat mass, as "permissive" of a larger immune compartment (the idea of the immune compartment as a "luxury function"). Is their phenotype also leptin-mediated (e.g., as tested with leptin AAV)?

      (6) The inability of re-feeding to "rescue" the myeloid compartment is really interesting. Can the authors do a bone marrow transplantation (CR-->ad libitum) to test if this effect is intrinsic to the CR-experienced bone marrow?

      (7) Is the defect in emergency myelopoiesis a defect in G-CSF? That is, if the authors injected G-CSF into CR animals, would they mobilize neutrophils equivalently? Does G-CSF supplementation (as is done in humans) rescue host defense against Listeria in the CR or re-feeding paradigms?

      We thank the reviewer for considering our work of interest and noting the rigor with which it was conducted. The referee raises several excellent mechanistic hypotheses and follow-up studies to perform. We agree that defining the specific dietary deficiency driving the phenotypes is of great interest. The relative contribution of calories versus macro- and micronutrients is an area we are interested in exploring in future studies, especially given the literature on the role of micronutrients in malnutrition-driven wasting, as the referee notes. We also agree that whether non-hematopoietic cells contribute, and what roles soluble factors such as G-CSF and leptin play in mediating the immunodeficiency, warrant further study. Likewise, it will be important to evaluate how malnutrition impacts other models of infection to determine how generalizable these phenomena are. We have added these points to the discussion section as limitations of this study.

      Regarding how the timing of the immunosuppression corresponds to weight loss, we have performed new kinetics studies to provide some insight into this area. We now find that neutropenia in peripheral blood can be detected after as little as one week of dietary restriction, with neutrophil counts continuing to decline during prolonged restriction. These findings indicate that the impact on myeloid cell production is indeed rapid and precedes maximum weight loss, though the severity of these phenotypes does increase as malnutrition persists. We wholeheartedly agree with the reviewer that it will be interesting to explore whether starting weight impacts these phenotypes and whether similar findings can be made in obese animals as they are treated for weight loss.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In this study, the authors used a chronic murine dietary restriction model to study the effects of chronic malnutrition on control of bacterial infection and overall immunity, including cellularity and functions of different immune cell types. They further attempted to determine whether refeeding can revert the infection susceptibility and immunodeficiency. Although refeeding here improves anthropometric deficits, the authors of this study show that this is insufficient to recover the impairments across the immune cell compartments. The authors claim that "this work identifies a novel cellular link between prior nutritional state and immunocompetency, highlighting dysregulated myelopoiesis as a major." However, after reviewing the entire manuscript, I could not find any cellular mechanism defining the link between nutritional state and immunocompetency. The assays on myeloid cells are limited, and the study is descriptive and overstated.

      Major concerns:

      (1) Malnutrition has entirely different effects on adults and children. In this study, 6-8-week-old C57BL/6 mice were used, which mimic adult malnutrition. I do not understand, then, why the refeeding strategy for inpatient treatment of severely malnourished children was utilized here.

      (2) Figure 1g shows BM cellularity is reduced, but the authors claim otherwise in the text.

      (3) What is the basis of the body condition score in Figure 1d? It will be good to have it in the supplement.

      (4) Listeria monocytogenes causes systemic infection, yet bacterial load was not determined in tissues beyond the liver.

      (5) Figure 3; T cell functional assays were limited to CD8 T cells and lymphocytes isolated from the spleen.

      (6) Why was peripheral cell count not considered? Except for the neutrophil and monocyte data, there are discrepancies between the absolute cell number and relative abundance data (for example, for B cells and CD4 and CD8 T cells), which makes the data difficult to interpret.

      (7) Also, if mice exhibit thymic atrophy, why does % abundance data show otherwise? Overall, the data is confusing to interpret.

      (8) No functional tests of neutrophil or monocyte function are included to explain the higher bacterial burden in the liver or to connect the cell numbers with the overall pathogen load.

      The rationale for examining both innate and adaptive immunity is not clear. It is even more unclear since the same time points (D0 and D5) were used for examining both innate and adaptive immunity.

      (9) Figure 2e doesn't make sense - why is spleen cellularity measured when bacterial load is measured in the liver?

      (10) Although it is claimed that emergency myelopoiesis is affected, no specific marker for emergency myelopoiesis other than cell numbers was studied.

      (11) I suggest including neutrophil effector functions and looking for real markers of granulopoiesis, such as Cebp-b. Since the authors attempted to examine the entirety of immune responses, it is better to measure cell abundance, types, and functions beyond the spleen. Consider the systemic spread of m while measuring bioload.

      (12) Minor grammatical errors - please re-read the entire text and correct grammatical errors to improve the flow of the text.

      (13) Sample size details are missing.

      (14) Be clear about which markers were used to identify monocytes. Using just CD11b and Ly6G is insufficient for neutrophil quantification.

      (15) Also, instead of saying "undernourished patients," say "patients with undernutrition" - change throughout the text. I would recommend numbering citations (as is done for Nature citations) to ease in following the text, as there are areas when there are more than ten citations with author names.

      (16) No line numbers are provided

      (17) Abstract

      -  What does accelerated contraction mean?

      -  "In" is repeated in a sentence

      -  Be clear that the study is done in a mouse model - saying just "animals" is not sufficient

      -  Indicate how malnutrition is induced in these mice

      (18) Introduction

      -  "restriction," "immune organs," - what is this referring to?

      -  You mention lymphoid tissue and innate and adaptive immunity, which doesn't make sense. Please correct this.

      -  You mention a lot of lymphoid tissues, i.e. lymphoid mass gain, but how about the bone marrow and spleen, which are responsible for most innate immune compartments?

      (19) Results

      a) Figure 1

      -  Why a 40% reduced diet?

      -  It would be interesting to report if the organs are smaller relative to body weight. It makes sense that the organ weight is lower in the 40RD mice, especially since they are smaller, so the novelty of this data is not apparent (Figure 1f).

      -  You say, "We observed a corresponding reduction in the cellularity of the spleen and thymus, while the cellularity of the bone marrow was unaffected (Fig. 1g)." However, your BM data show a significant difference, so this statement does not reflect the data you present; please correct it.

      b) Figure 2

      - Figure 2d: which tissue is this from? Please state it in the figure and measure cellularity there. The rationale for looking only at the spleen here is weak. The figure would also benefit from including the uninfected groups for comparison.

      c) Figure 3

      - The rationale for why you further looked at T cells is weak, mainly because of the following sentence. "Despite this overall loss in lymphocyte number, the relative frequency of each population was either unchanged or elevated, indicating that while malnutrition leads to a global reduction in immune cell numbers, lymphocytes are less impacted than other immune cell populations (Supplemental 1)." Please explain in the main text.

      d) Figure 4

      -  You refer to the peak of the adaptive immune response, but you never examined this peak. When does it occur? If you have the data, please show them. You also show only d0 and d5 post-infection data for adaptive immunity, so it is unclear where this statement comes from.

      -  How did you identify neutrophils and monocytes by flow cytometry? Indicate the markers used. Also, your text does not match your data; please correct it. For example, monocyte numbers decreased while their relative abundance increased, but the text does not say this.

      -  Show the flow cytometry plots first, followed by the quantification.

      -  The study would benefit from examining markers of emergency myelopoiesis such as Cebpb through qPCR.

      -  Although the number of neutrophils is lower in the BM and spleen, how does this relate to increased bacterial load in the liver? This is especially true since you did not quantify neutrophil numbers in the liver.

      e) Figure 6

      -  Some figures are incorrectly labelled.

      -  For the refeeding data, also include the data from the 40RD group to compare the level of recovery in the outcome measures.

      (20) Discussion

      -  You claim that monocytes are reduced to the same extent as neutrophils, but this is not true. Please correct.

      -  Indicate some limitations of your work.

      We thank the reviewer for offering these recommendations and the constructive comments. 

      Several comments raised concerns over the rationale or reasoning behind aspects of the experimental design or the data presented, which we would like to clarify:

      • Regarding the refeeding protocol, we apologize for the confusion about its rationale. We based our methodology on general guidelines for refeeding malnourished people. We elected to increase food intake by 10% daily to avoid the risk of refeeding syndrome or other complications. Our method by no means replicates the administration of specific vitamins, minerals, and electrolytes, or the precise caloric content, that would be given to a human patient. The citation provided refers to WHO guidance on the complications that can arise from refeeding syndrome; although this guidance comes from a document on pediatric care, we did not mean to imply that our method modeled refeeding interventions for children. We have modified the text to avoid this confusion.

      • The reviewer requested more clarity on why we studied both the innate and adaptive immune systems, as well as why we chose the time points studied. As referenced in the manuscript, prior work has observed that caloric restriction, fasting, and malnutrition can all affect the adaptive immune system. Given these previous findings, we felt it important to evaluate how malnutrition affected adaptive immune cell populations in our model. To this end, we provide data tracking the course of T-cell responses from the start of infection through day 14, when the response undergoes contraction. However, because we found that bacterial burden is not properly controlled at an earlier time point (day 5), when the innate immune system is understood to be more critical for mediating pathogen clearance, we elected to better characterize the effect of malnutrition on innate immune populations, which is less well described in the literature. As phenotypes in both bacterial burden and innate immune populations were observable as early as day 5, we chose to focus on that time point rather than later ones, when readouts could be further confounded by secondary or compounding effects of the lack of early infection control. We have tried to make this rationale clear in the text and have made changes to further emphasize this reasoning.

      • The reviewer also requested an explanation of why bacterial burden was measured in the liver while the immune response was measured in the spleen. While the reviewer is correct that our model is a systemic infection, it is well appreciated that bacteria rapidly disseminate to the liver and spleen and that these organs serve as major sites of infection. Given the central role the spleen plays in organizing both the innate and adaptive immune responses in this model, it is common practice in the field to phenotype immune cell populations in the spleen while using the liver to quantify bacterial burden (see PMID: 37773751 as one example of many). We acknowledge that this does not capture the full scope of bacterial infection or of the immune response in every potentially affected tissue, but we nonetheless believe that our interpretation still stands: malnourished and previously malnourished animals do not properly control infection, and their immune responses are blunted compared with controls.

      The reviewer raised several points about differences between the results for cell frequency and absolute number and why these may deviate in some circumstances. For example, the reviewer notes that we observe thymic atrophy, yet the frequency of peripheral T cells does not decline. It should be noted that absolute number can change when frequency does not, and vice versa, because of changes in other cell types within the population studied. As in the case of peripheral lymphocytes in our study, frequency can stay the same or even increase when absolute number declines (Supplemental 1). This occurs if other cell populations decrease further, which is indeed the case here, as the loss of myeloid cells is greater than that of lymphocytes. Hence, we find that the frequency of T and B cells is unchanged or elevated despite the loss in absolute number of peripheral cells, which is our stated interpretation. We believe this is consistent with our overall observations, and it is why it is important to report both frequency and absolute number, as we have done.
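      To make the frequency-versus-number arithmetic concrete, here is a minimal Python sketch using entirely hypothetical spleen counts (illustrative only, not data from this study). It assumes a simplified two-population split in which lymphocytes fall by 40% while myeloid cells fall by 75%; under those assumptions, lymphocyte frequency rises even though the absolute lymphocyte number drops.

```python
# Hypothetical cell counts (millions per spleen); illustrative only, not study data.
def frequencies(counts):
    """Return each population's share of the total, as a percentage."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}

control = {"lymphocytes": 60.0, "myeloid": 40.0}
# Assumption: lymphocytes fall by 40% (60 -> 36) and myeloid cells by 75% (40 -> 10).
malnourished = {"lymphocytes": 36.0, "myeloid": 10.0}

for label, counts in [("control", control), ("malnourished", malnourished)]:
    freq = frequencies(counts)
    print(f"{label}: lymphocytes {counts['lymphocytes']:.0f}M "
          f"({freq['lymphocytes']:.1f}% of cells)")
```

      Reporting both outputs side by side is what reveals the pattern: the absolute count alone shows a loss, while the frequency alone would suggest lymphocytes were spared.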

      We have made the requested changes to the text to address the reviewer's concerns and to improve the clarity and accuracy of the description of experiments, results, and overall conclusions drawn in the manuscript. We have also included a discussion of the limitations of our work as well as additional areas for future investigation that remain open.

      Reviewer #2 (Recommendations for the authors):

      Regarding the known drivers of myelopoiesis, can the authors quantify circulating levels of relevant immune cytokines (e.g. type I and II IFNs, GM-CSF, etc.)?

      Regarding the microbiota (point #2), how dramatically does this undernutrition modulate the microbiota both in terms of absolute load and community composition, and how effectively/quickly is this rescued by refeeding?

      We thank the reviewer for raising these recommendations. We agree that the role of circulating factors like cytokines and growth factors in contributing to the defects in myelopoiesis is of interest and is the focus of future work. Similarly, the impact of malnutrition on the microbiota is of great interest and has been evaluated by other groups in separate studies. How the known impact of malnutrition on the microbiota affects the phenotypes we observe in myelopoiesis is unclear and warrants future investigation. We have added these points to the discussion section as limitations of this study.

    1. Author response:

      Reviewer #1 (Public Review):

      Fombellida-Lopez and colleagues describe the results of an ART intensification trial in people with HIV infection (PWH) on suppressive ART to determine the effect of increasing the dose of one ART drug, dolutegravir, on viral reservoirs, immune activation, exhaustion, and circulating inflammatory markers. The authors hypothesize that ART intensification will provide clues about the degree to which low-level viral replication is occurring in circulation and in tissues despite ongoing ART, which could be identified if reservoirs decrease and/or if immune biomarkers change. The trial design is straightforward and well-described, and the intervention appears to have been well tolerated. The investigators observed an increase in dolutegravir concentrations in circulation, and to a lesser degree in tissues, in the intervention group, indicating that the intervention has functioned as expected (ART has been intensified in vivo). Several outcome measures changed during the trial period in the intervention group, leading the investigators to conclude that their results provide strong evidence of ongoing replication on standard ART. The results of this small trial are intriguing, and a few observations in particular are hypothesis-generating and potentially justify further clinical trials to explore them in depth. However, I am concerned about over-interpretation of results that do not fully justify the authors' conclusions.

      We thank Reviewer #1 for their thoughtful and constructive comments, which will help us clarify and improve the manuscript. Below, we address each of the reviewer’s points and describe the changes that we intend to implement in the revised version. We acknowledge the reviewer’s concern regarding potential over-interpretation of certain findings, and we will take particular care to ensure that all conclusions are supported by the data and framed within the exploratory nature of the study.

      (1) Trial objectives: What was the primary objective of the trial? This is not clearly stated. The authors describe changes in some reservoir parameters and no changes in others. Which of these was the primary outcome? No a priori hypothesis or primary objective is stated, nor is there explicit justification (power calculations, prior in vivo evidence) for the small n, unblinded design, and lack of placebo control. In the abstract (line 36, "significant decreases in total HIV DNA") and conclusion (lines 244-246), the authors state that total proviral DNA decreased as a result of ART intensification. However, in Figures 2A and 2E (and in line 251), the authors indicate that total proviral DNA did not change. These statements are confusing and appear to be contradictory. Regarding the decrease in total proviral DNA, I believe the authors may mean that they observed a transient decrease in total proviral DNA during the intensification period (day 28 in particular, Figure 2A); however, this level increases at day 56 and then returns to baseline at day 84, which is the source of the negative observation. Stating that total proviral DNA decreased as a result of the intervention when it ultimately did not is misleading, unless the investigators intended the day 28 time point as a primary endpoint for reservoir reduction. If so, this is never stated, and it would then be unclear why the intervention was continued until day 84. If, instead, reservoir reduction at the end of the intervention was the primary endpoint (again, unstated by the authors), then it is not appropriate to state that the total proviral reservoir decreased significantly when it did not.

      We agree with the reviewer that the primary objective of the study was not explicitly stated in the submitted manuscript. We will clarify this in the revised manuscript. As registered on ClinicalTrials.gov (NCT05351684), the primary outcome was defined as “To evaluate the impact of treatment intensification at the level of total and replication-competent reservoir (RCR) in blood and in tissues”, with a time frame of 3 months. Accordingly, our aim was to explore whether any measurable reduction in the HIV reservoir (total or replication-competent) occurred during the intensification period, including at day 28, 56, or 84. The protocol did not prespecify a single time point for this effect to occur, and the exploratory design allowed for detection of transient or sustained changes within the intensification window.

      We recognize that this scope was not clearly articulated in the original text and may have led to confusion in interpreting the transient drop in total HIV DNA observed at day 28. While total DNA ultimately returned to baseline by the end of intensification, the presence of a transient reduction during this 3-month window still fits within the framework of the study’s registered objective. Moreover, although the change in total HIV DNA was transient, it aligns with the consistent direction of changes observed across the multiple independent measures, including CA HIV RNA, RNA/DNA ratio and intact HIV DNA, collectively supporting a biological effect of intensification.

      We would also like to stress that this is the first clinical trial in which ART intensification is performed not by adding an extra drug but by increasing the dosage of an existing drug. Therefore, we were more interested in the overall cumulative effect of intensification throughout the entire trial period than in differences between groups at individual time points. We will clarify in the manuscript that this was a proof-of-concept phase 2 study, designed to generate biological signals rather than to confirm efficacy in a powered comparison. The absence of a prespecified statistical endpoint or sample size calculation reflects the exploratory nature of the trial.

      (2) Intervention safety and tolerability: The results section lacks a specific heading for participant safety and tolerability of the intervention. I was wondering about clinically detectable viremia in the study. Were there any viral blips? Was the increased DTG dose well tolerated? This drug is known to cause myositis, headache, CPK elevation, and hepatotoxicity. Were any of these observed? What is the authors' interpretation of the CD4:CD8 ratio change (line 198)? Is this a significant safety concern for a longer duration of intensification? Was there also a change in CD4%, or only in absolute counts? Was relative CD4 depletion observed in the rectal biopsy samples between days 0 and 84? Interestingly, T cells dropped at the same time points at which reservoirs declined. How do the authors rule out that the reservoir decline reflects a transient, non-specific T-cell decline rather than additional blockade of replication?

      We will improve the Methods section to clarify how safety and tolerability were assessed during the study. Safety evaluations were conducted on day 28 and day 84 and included a clinical examination and routine laboratory testing (liver function tests, kidney function, and complete blood count). Medication adherence was also monitored through pill counts performed by the study nurses.

      No virological blips above 50 copies/mL were observed and no adverse events were reported by participants during the 3-month intensification period. Although CPK levels were not included in the routine biological monitoring, no participant reported muscle pain or other symptoms suggestive of muscle toxicity.

      The CD4:CD8 ratio decrease noted during intensification was not associated with significant changes in absolute CD4 or CD8 counts, as shown in Figure 5. We interpret this ratio change as a transient redistribution rather than an immunological risk, therefore we do not consider it to represent a safety concern.

      We would like to clarify that CD4+ T-cell counts did not significantly decrease in any of the treatment groups, as shown in Figure 5. The apparent decline observed concerns the CD4/CD8 ratio, which transiently dropped, but not the absolute number of CD4+ T cells.

      (3) The investigators describe a decrease in intact proviral DNA after 84 days of ART intensification in circulating cells (Figure 2D), but no changes to total proviral DNA in blood or tissue (Figures 2A and 2E; IPDA does not appear to have been done on tissue samples). It is not clear why ART intensification would result in a selective decrease in intact proviruses and not in total proviruses if the source of these reservoir cells is due to ongoing replication. These reservoir results have multiple interpretations, including (but not limited to) the investigators' contention that this provides strong evidence of ongoing replication. However, ongoing replication results in the production of both intact and mutated/defective proviruses that both contribute to reservoir size (with defective proviruses vastly outnumbering intact proviruses). The small sample size and well-described heterogeneity of the HIV reservoir (with regard to overall size and composition) raise the possibility that the study was underpowered to detect differences over the 84-day intervention period. No power calculations or prior studies were described to justify the trial size or the duration of the intervention. Readers would benefit from a more nuanced discussion of reservoir changes observed here.

      We sincerely thank the reviewer for this insightful comment. We fully agree that the reservoir dynamics observed in our study raise several possible interpretations, and that their complexity, resulting from continuous cycles of expansion and contraction, reflects the heterogeneity of the latent reservoir.

      Total HIV DNA in PBMCs showed a transient decline during intensification (notably at day 28), ultimately returning to baseline by day 84. This biphasic pattern may reflect the combined effects of suppression of ongoing low-level replication by an increased DTG dosage, followed by the expansion of infected cell clones (mostly harboring defective proviruses). In other words, the transient decrease in total (intact + defective) DNA at day 28 may be due to an initial decrease in newly infected cells upon ART intensification; at subsequent time points, however, this effect was masked by the proliferation (clonal expansion) of infected cells with defective proviruses. This would explain why intact proviruses decreased, but total proviruses did not change, between days 0 and 84.

      Importantly, we observed a significant decrease in intact proviral DNA between day 0 and day 84 in the intensification group (Figure 2D). We will highlight this result more clearly in the revised manuscript, as it directly addresses the study’s primary objective: assessing the impact of intensification on the replication-competent reservoir. In comparison, as the reviewer rightly points out, total HIV DNA includes over 90% defective genomes, which limits its interpretability as a biomarker of biologically relevant reservoir changes.

      In addition, other reservoir markers, such as cell-associated unspliced RNA and RNA/DNA ratios, also showed consistent trends supporting a modest but biologically relevant effect of intensification. Even in the absence of sustained changes in total HIV DNA, the coherence across these independent measures suggests a signal indicative of ongoing replication in at least some individuals, and at specific timepoints.

      Regarding tissue reservoirs, the lack of substantial change in total HIV DNA between days 0 and 84 is also in line with the predominance of defective sequences in these compartments. Moreover, the limited increase in rectal tissue dolutegravir levels during intensification (from 16.7% to 20% of plasma concentrations) may have limited the efficacy of the intervention in this site.

      As for the IPDA on rectal biopsies, we attempted the assay using two independent DNA extraction methods (Promega Reliaprep and Qiagen Puregene), but both yielded high DNA Shearing Index values, and intact proviral detection was successful in only 3 of 40 samples. Given the poor DNA integrity and weak signals, these results were not interpretable.

      That said, we fully acknowledge the limitations of our study, especially the small sample size, and we agree with the reviewer that caution is needed when interpreting these findings. In the revised manuscript, we will adopt a more measured tone in the discussion, clearly stating that these observations are exploratory and hypothesis-generating and require confirmation in larger, adequately powered studies. Nonetheless, we believe that the convergence of multiple reservoir markers pointing in the same direction constitutes a potentially meaningful biological signal that deserves further investigation.

      (4) While a few statistically significant changes occurred in immune activation markers, it is not clear that these are biologically significant. Lines 175-186 and Figure 3: the proportion of TIGIT+ CD4 cells appears to have declined by only 1-2%, and at day 84 the spread widens considerably, spanning an interquartile range of 4%. The only other immune activation/exhaustion marker change that reached statistical significance appears to be in CD38+HLA-DR+ CD8 cells; however, the decline appears to be a fraction of a percent, with the control group trending in the same direction. Despite marginal statistical significance, it is not clear that these findings have any biological significance; Figure S6 supports the contention that there is no significant change in these parameters over time or between groups. With most markers showing no change and these two showing very small changes (the latter moving in the same direction as the control group), these results do not justify the statement that intensifying DTG decreases immune activation and exhaustion (lines 38-40 in the abstract and elsewhere).

      We agree with the reviewer that the observed changes in immune activation and exhaustion markers were modest. We will revise the manuscript to reflect this more accurately. We will also note that these differences, while statistically significant (e.g., in TIGIT+ CD4+ T cells and CD38+HLA-DR+ CD8+ T cells), were limited in magnitude. We will explicitly acknowledge these limitations and interpret the findings with appropriate caution.

      (5) There are several limitations of the study design that deserve consideration beyond those discussed at line 327. The study was open-label and not placebo-controlled, which may have led to changes in medication adherence that confound the results (the authors describe one observation that may be evidence of this; lines 146-148). A randomized, blinded, or cross-over design would be more robust and would help distinguish signal from noise, given the relatively small changes observed in the intervention arm. Key outcome variables do not seem to have been measured after treatment intensification ceased; evidence of an effect on replication through ART intensification would be strengthened by observing changes once intensification was stopped. Why was intensification maintained for 84 days? More information about the study duration would be helpful. Table 1 indicates that participants were 95% male. Sex is known to be a biological variable, particularly with regard to HIV reservoir size and chronic immune activation in PWH. Worldwide, 50% of PWH are women. Research into improving the management and understanding of disease should reflect this, and equal participation should be sought in trials. Table 1 shows differing baseline reservoir sizes between the control and intervention groups. This may have important implications, particularly for outcomes where reservoir size is used as the denominator.

      We will expand the limitations section to address several key aspects raised by the reviewer: the absence of blinding and placebo control, the predominantly male study population, and the lack of post-intervention follow-up. While we acknowledge that open-label designs can introduce behavioral biases, including potential changes in adherence, we will now explicitly state that placebo-controlled, blinded trials would provide a more robust assessment and are warranted in future research.

      The 84-day duration of intensification was chosen based on previous studies and provided sufficient time for observing potential changes in viral transcription and reservoir dynamics. However, we agree that including post-intervention follow-up would have strengthened the conclusions, and we will highlight this limitation and future direction in the revised manuscript.

      The sex imbalance is now clearly acknowledged as a limitation in the revised manuscript, and we fully support ongoing efforts to promote equitable recruitment in HIV research. We would like to add that, in our study, rectal biopsies were coupled with anal cancer screening through HPV testing. This screening is specifically recommended for younger men who have sex with men (MSM), as outlined in the current EACS guidelines (see: https://eacs.sanfordguide.com/eacs-part2/cancer/cancer-screening-methods). As a result, MSM participants had both a clinical incentive and medical interest to undergo this procedure, which likely contributed to the higher proportion of male participants in the study.

      Lastly, although baseline total HIV DNA was higher in the intensified group, our statistical approach is based on a within-subject (repeated-measures) design, in which the longitudinal change of a parameter within the same participant during the study was the main outcome. In other words, we are not comparing absolute values of any marker between the groups, we are looking at changes of parameters from baseline within participants, and these are not expected to be affected by baseline imbalances.

      (6) Figure 1: the increase in DTG levels is interesting - it is not uniform across participants. Several participants had lower levels of DTG at the end of the intervention. Though unlikely to be statistically significant, it would be interesting to evaluate if there is a correlation between change in DTG concentrations and virologic / reservoir / inflammatory parameters. A positive relationship between increasing DTG concentration and decreased cell-associated RNA, for example, would help support the hypothesis that ongoing replication is occurring.

      We agree with the reviewer that assessing correlations between DTG concentrations and virological, immunological, or inflammatory markers would be highly informative. In fact, we initially explored this question in a preliminary way by examining whether individuals who showed a marked increase in DTG levels after intensification also demonstrated stronger changes in the viral reservoir. While this exploratory analysis did not reveal any clear associations, we would like to emphasize that correlating biological effects with DTG concentrations measured at a single timepoint may have limited interpretability. A more comprehensive understanding of the relationship between drug exposure and reservoir dynamics would ideally require multiple pharmacokinetic measurements over time, including pre-intensification baselines. This is particularly important given that DTG concentrations vary across individuals and over time, depending on adherence, metabolism, and other individual factors. We will clarify these points in the revised manuscript.

      (7) Figure 2: IPDA in tissue: was this done? scRNA in blood (single-copy assay): would this be expected to correlate with usCaRNA? The most unambiguous result is the decrease in cell-associated RNA; accompanying results from a single-copy assay in plasma would help bolster this result.

      As mentioned in our response to point 3, we attempted IPDA on tissue samples, but technical limitations prevented reliable detection of intact proviruses. Regarding residual viremia, we did perform ultrasensitive plasma HIV RNA quantification, but a technical issue (inadvertent PBMC contamination during plasma separation) affected the reliability of the results, so we did not feel comfortable including these data in the manuscript.

      The use of the US RNA / Total DNA ratio is unhelpful and difficult to interpret, since the control and intervention arms were unmatched for total DNA reservoir size at study entry.

      We respectfully disagree with this comment. The US RNA / Total DNA ratio is commonly used to assess the relative transcriptional activity of the viral reservoir, rather than its absolute size. While we acknowledge that the total HIV-1 DNA levels differed at baseline between the two groups, the US RNA / Total DNA ratio specifically reflects the relationship between transcriptional activity and reservoir size within each individual, and is therefore not directly confounded by baseline differences in total DNA alone.

      Moreover, our analyses focus on within-subject longitudinal changes from baseline, not on direct between-group comparisons of absolute marker values. As such, the observed changes in the US RNA / Total DNA ratio over time are interpreted relative to each participant's baseline, mitigating concerns related to baseline imbalances between groups.
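      A minimal Python sketch of why a within-participant change in the US RNA / Total DNA ratio need not depend on baseline reservoir size. It assumes, for illustration only, that change is expressed as a log10 fold-change from each participant's own baseline (the exact statistical model used in the manuscript is not specified here); the participant labels and values are hypothetical, not trial data.

```python
import math

def ratio(us_rna, total_dna):
    """US RNA / Total DNA ratio for one visit."""
    return us_rna / total_dna

def log10_change(baseline, follow_up):
    """Within-participant change on a log10 scale: log10(follow_up / baseline)."""
    return math.log10(follow_up / baseline)

# Hypothetical participants: (US RNA, total DNA) per visit; illustrative only.
# The second participant has a tenfold larger baseline total DNA reservoir.
participants = {
    "low_baseline": {"day0": (50.0, 200.0), "day84": (25.0, 200.0)},
    "high_baseline": {"day0": (500.0, 2000.0), "day84": (250.0, 2000.0)},
}

for pid, visits in participants.items():
    r0 = ratio(*visits["day0"])
    r84 = ratio(*visits["day84"])
    print(pid, round(log10_change(r0, r84), 3))
```

      Both hypothetical participants show the same log10 change (a halving of the ratio) despite the tenfold difference in baseline total DNA. This illustrates only the arithmetic; it does not exclude biological differences linked to baseline reservoir size, which is the concern raised by the reviewer.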

      Reviewer #2 (Public Review):

      Summary:

      An intensification study with a double dose of a second-generation integrase inhibitor, on a background of nucleoside analog inhibitors of HIV reverse transcriptase, in 20 individuals randomized with controls, with an impact on the levels of viral reservoirs and inflammation markers. Viral reservoirs in HIV are the main impediment to an HIV cure, and inflammation is associated with co-morbidities.

      Strengths:

      The intervention that leads to a decrease of viral reservoirs and inflammation is quite straightforward, as a doubled INSTI dose is used in some individuals with INSTI resistance, with good tolerability.

      This is a very well-documented study, in both blood and tissues, which is a great achievement given the difficulty of tissue sampling in well-controlled individuals on antiretroviral therapy. The laboratory assays were performed by specialists in the field with state-of-the-art quantification assays. Both the introduction and the discussion are remarkably well presented and documented.

      The findings also have a potential impact on the management of chronic HIV infection.

      Weaknesses:

      I do not think that the size of the study can be considered a weakness, nor the fact that it is open-label either.

      We thank Reviewer #2 for their constructive and supportive comments. We appreciate their positive assessment of the study design, the translational relevance of the intervention, and the technical quality of the assays. We also take note of their perspective regarding sample size and study design, which supports our positioning of this trial as an exploratory, hypothesis-generating phase 2 study.

      Reviewer #3 (Public Review):

      The introduction does a very good job of discussing whether there is ongoing replication in people with HIV on antiretroviral therapy. Sporadic, non-sustained replication likely occurs in many PWH on ART, related to adherence, drug interactions, and possibly penetration of antivirals into sanctuary sites of replication. As the authors point out, proving that it does not occur is likely not possible, and proving that it does occur likely depends heavily on the population studied and the design of the intervention. Whether this replication, in the absence of evolution toward resistance, has clinical significance is a challenging question to address.

      It is important to note that INSTI-based therapy may have a different impact on HIV replication events, resulting in differences in virus release from specific cell types (those responsible for "second phase" decay), because it blocks integration in cells that completed reverse transcription before ART initiation but have yet to be fully activated. In a PI- or NNRTI-based regimen, those cells will release virus, whereas with an INSTI-based regimen, they will not.

      Given the very small sample size, there is a substantial risk of imbalance between the groups in important baseline measures. Unfortunately, with the small sample size, a non-significant P value is not helpful when comparing baseline measures between groups. One suggestion would be to provide the full range as opposed to the inter-quartile range (essentially only 5 or 6 values). The authors could also report the proportion of participants with baseline HIV RNA target not detected in the two groups.

      We thank Reviewer #3 for their thoughtful and balanced review. We are grateful for the recognition of the strength of the Introduction, the complexity of evaluating residual replication, and the technical execution of the assays. We also appreciate the insightful suggestions for improving the clarity and transparency of our results and discussion.

      We will revise the manuscript to address several of the reviewer’s key concerns. We agree that the small sample size increases the risk of baseline imbalances. We will acknowledge these limitations in the revised manuscript. We will provide both the full range and the IQR in Table 1 in the revised manuscript.

      One indication of a critical imbalance between groups is that the control group has significantly lower total HIV DNA in PBMC, despite the small sample size. The control group also has a numerically longer duration of continuous suppression, lower unspliced RNA, and lower intact proviral DNA. These differences may have reduced the ability to detect changes in DNA and US RNA in the control group.

      We acknowledge the significant baseline difference in total HIV DNA between groups, which we have clearly reported. However, the other variables mentioned, duration of continuous viral suppression, unspliced RNA levels, and intact proviral DNA, did not differ significantly between groups at baseline, despite differences in the median values. These numerical differences do not necessarily indicate a critical imbalance.

      Notably, there was no significant difference in the change in US RNA/DNA between groups (Figure 2C).

      The nonsignificant difference in the change in US RNA/DNA between groups is not unexpected, given the significant between-group differences for both US RNA and total DNA changes. Since the ratio combines both markers, it is likely to show attenuated between-group differences compared to the individual components. However, while the difference did not reach statistical significance (p = 0.09), we still observed a trend towards a greater reduction in the US RNA/Total DNA ratio in the intervention group.
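The attenuation argument can be made concrete on the log scale, where the log fold change of a ratio equals the log fold change of the numerator minus that of the denominator. The sketch below uses hypothetical fold changes chosen only to illustrate the arithmetic; they are not study data. When US RNA and total DNA both fall more in one group, the between-group difference in the ratio equals the difference between the two component differences, so it is smaller than the US RNA difference alone.

```python
import math


def ratio_fold_change(rna_fc: float, dna_fc: float) -> float:
    """Fold change of the US RNA/total DNA ratio from the component fold changes."""
    return rna_fc / dna_fc


def log2_group_difference(fc_group_a: float, fc_group_b: float) -> float:
    """Between-group difference in log2 fold change from baseline (A minus B)."""
    return math.log2(fc_group_a) - math.log2(fc_group_b)


# Hypothetical fold changes from baseline, chosen only to illustrate the arithmetic
intervention = {"us_rna": 0.25, "total_dna": 0.5}
control = {"us_rna": 1.0, "total_dna": 1.0}

rna_diff = log2_group_difference(intervention["us_rna"], control["us_rna"])
dna_diff = log2_group_difference(intervention["total_dna"], control["total_dna"])
ratio_diff = log2_group_difference(
    ratio_fold_change(intervention["us_rna"], intervention["total_dna"]),
    ratio_fold_change(control["us_rna"], control["total_dna"]),
)

# On the log scale the ratio difference is the RNA difference minus the DNA
# difference, so a same-direction DNA change shrinks the ratio difference.
print(f"US RNA: {rna_diff:+.1f}, total DNA: {dna_diff:+.1f}, ratio: {ratio_diff:+.1f} (log2 units)")
```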

      The fact that the median relative change appears very similar in Figure 2C, yet there is a substantial difference in P values, is also a comment on the limits of the current sample size.

      Although we agree that, in general, the limited sample size reduces statistical power, we would like to point out that in Figure 2C, while the medians may appear similar, the spread of values differs between groups. At days 56 and 84, the median fold changes from baseline are indeed close, but the full interquartile range in the DTG group stays below 1, whereas in the control group the interquartile range is wider and extends roughly equally above and below 1. This explains the difference in p values between the groups.

      The text should report the median change in US RNA and US RNA/DNA when describing Figures 2A-2C.

      These data are already reported in the Results section (lines 164–166): "By day 84, US RNA and US RNA/total DNA ratio had decreased from day 0 by medians (IQRs) of 5.1 (3.3–6.4) and 4.6 (3.1–5.3) fold, respectively (p = 0.016 for both markers)."

      The statistical comparison of changes in IPDA results between groups should be reported. The presentation of the absolute values of all the comparisons in the supplemental figures is a strength of the manuscript.

      In the assessment of ART intensification on immune activation and exhaustion, the fact that none of the comparisons between randomized groups were significant should be noted and discussed.

      We would like to point out that a statistically significant difference between the randomized groups was observed for the frequency of CD4<sup>+</sup> T cells expressing TIGIT, as shown in Figure 3A and reported in the Results section (p = 0.048).

      The changes in CD4:CD8 ratio and sCD14 levels appear counterintuitive to the hypothesis and are commented on in the discussion.

      Overall, the discussion highlights the significant changes in the intensified group, which are suggestive. There is limited discussion of the comparisons between groups where the results are less convincing.

      We will temper the language accordingly and add commentary on the limited and modest nature of these changes. Similarly, we will expand our discussion of counterintuitive findings such as the CD4:CD8 ratio and sCD14 changes.

      The limitations of the study should be discussed more clearly. The small sample size raises the possibility of imbalance at baseline. The supplemental figures (S3-S5) are helpful in showing the differences between groups at baseline, and they make the variability of the measurements more apparent. The lack of blinding is also a weakness, though the PK assessments do help (note that 3TC levels rise substantially in both groups for most of the time on study; Figure S2).

      The many assays and comparisons are listed as a strength, but they also raise the possibility of finding significance by chance. In addition, if there is an imbalance at baseline, related outcome parameters will tend to move in the same direction.

      We agree that the multiple comparisons raise the possibility of chance findings but would like to stress that in an exploratory study like this it is very important to avoid a type II error. In addition, the consistent directionality of the most relevant outcomes (US RNA and intact DNA) lends biological plausibility to the observed effects.

      The limited impact on activation and inflammation should be addressed in the discussion, as these are highlighted in the introduction as potentially important consequences of intermittent, non-sustained replication.

      The study is provocative and well executed, with the limitations listed above. Pharmacokinetic analyses help mitigate the lack of blinding. The major impact of this work is if it leads to a much larger randomized, controlled, blinded study of a longer duration, as the authors point out.

      Finally, we fully endorse the reviewer’s suggestion that the primary contribution of this study lies in its value as a proof-of-concept and foundation for future randomized, blinded trials of greater scale and duration. We will highlight this more clearly in the revised Discussion.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors present an interesting study using RL and Bayesian modelling to examine differences in learning rate adaptation under conditions of high and low volatility and noise. By "lesioning" an optimal Bayesian model, they reveal that the apparently suboptimal adaptation of learning rates results from incorrectly detecting volatility in the environment when it is not in fact present.

      Strengths:

      The experimental task is cleverly designed and does a good job of manipulating both volatility and noise. The modelling takes an interesting and creative approach to understanding the source of the apparently suboptimal adaptation of learning rates to noise, by carefully "lesioning" an optimal Bayesian model to determine which components are responsible for this behaviour.

      We thank the reviewer for this assessment.

      Weaknesses:

      The study has few substantial weaknesses: the data and modelling both appear robust and informative, and it tackles an interesting question. The model space could potentially have been expanded, particularly by including alternative strategies, such as those that estimate latent states and adapt learning accordingly.

      We thank the reviewer for this suggestion. We agree that it would be interesting to assess the ability of alternative models to reproduce the sub-optimal choices of participants in this study. The Bayesian Observer Model described in the paper is a form of Hierarchical Gaussian Filter, so we will assess the performance of a different class of models that are able to track uncertainty: RL-based models that are able to capture changes in uncertainty (the Kalman filter, and the model described by Cochran and Cisler, Plos Comp Biol 2019). We will assess the ability of these models to recapitulate the core behaviour of participants (in terms of learning rate adaptation) and, if possible, their ability to account for the pupillometry response.
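For readers less familiar with the Kalman filter, a minimal one-dimensional sketch shows how tracking uncertainty produces an adaptive learning rate. It is a textbook formulation, not the specific model the authors will fit and not the Cochran and Cisler model. Here `volatility` is assumed to be the per-trial variance of a random walk in the latent mean, and `noise` the variance of outcomes around that mean. The Kalman gain plays the role of the RL learning rate.

```python
def kalman_step(mean, var, outcome, volatility, noise):
    """One update of a 1-D Kalman filter tracking a latent mean that drifts as a random walk.

    volatility: per-trial variance of the random walk (how fast the latent mean changes)
    noise: variance of observed outcomes around the latent mean
    Returns (new_mean, new_var, gain); gain acts as a trial-wise learning rate.
    """
    prior_var = var + volatility               # uncertainty grows because the world may have changed
    gain = prior_var / (prior_var + noise)     # Kalman gain = effective learning rate
    new_mean = mean + gain * (outcome - mean)  # delta-rule update with learning rate = gain
    new_var = (1.0 - gain) * prior_var         # uncertainty shrinks after observing an outcome
    return new_mean, new_var, gain


def steady_state_gain(volatility, noise, n_trials=500):
    """Iterate the variance recursion until the gain settles (it does not depend on outcomes)."""
    mean, var, gain = 0.0, 1.0, 0.0
    for _ in range(n_trials):
        mean, var, gain = kalman_step(mean, var, 0.0, volatility, noise)
    return gain


for vol, noi in [(0.01, 1.0), (0.1, 1.0), (0.1, 4.0)]:
    print(f"volatility={vol}, noise={noi}: learning rate ~ {steady_state_gain(vol, noi):.3f}")
```

In this formulation the gain rises with volatility and falls with noise. A learner that attributed noise to volatility would therefore be expected to use a higher learning rate in noisy blocks.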

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors aimed to investigate how humans learn and adapt their behavior in dynamic environments characterized by two distinct types of uncertainty: volatility (systematic changes in outcomes) and noise (random variability in outcomes). Specifically, they sought to understand how participants adjust their learning rates in response to changes in these forms of uncertainty.

      To achieve this, the authors employed a two-step approach:

      (1) Reinforcement Learning (RL) Model: They first used an RL model to fit participants' behavior, revealing that the learning rate was context-dependent. In other words, it varied based on the levels of volatility and noise. However, the RL model showed that participants misattributed noise as volatility, leading to higher learning rates in noisy conditions, where the optimal strategy would be to be less sensitive to random fluctuations.

      (2) Bayesian Observer Model (BOM): To better account for this context dependency, they introduced a Bayesian Observer Model (BOM), which models how an ideal Bayesian learner would update their beliefs about environmental uncertainty. They found that a degraded version of the BOM, where the agent had a coarser representation of noise compared to volatility, best fit the participants' behavior. This suggested that participants were not fully distinguishing between noise and volatility, instead treating noise as volatility and adjusting their learning rates accordingly.

      The authors also aimed to use pupillometry data (measuring pupil dilation) as a physiological marker to arbitrate between models and understand how participants' internal representations of uncertainty influenced both their behavior and physiological responses. Their objective was to explore whether the BOM could explain not just behavioral choices but also these physiological responses, thereby providing stronger evidence for the model's validity.

      Overall, the study sought to reconcile approximate rationality in human learning by showing that participants still follow a Bayesian-like learning process, but with simplified internal models that lead to suboptimal decisions in noisy environments.

      Strengths:

      The generative model presented in the study is both innovative and insightful. The authors first employ a Reinforcement Learning (RL) model to fit participants' behavior, revealing that the learning rate is context-dependent; specifically, it varies based on the levels of volatility and noise in the task. They then introduce a Bayesian Observer Model (BOM) to account for this context dependency, ultimately finding that a degraded BOM, in which the agent has a coarser representation of noise compared to volatility, provides the best fit for the participants' behavior. This suggests that participants do not fully distinguish between noise and volatility, leading to the misattribution of noise as volatility. Consequently, participants adopt higher learning rates even in noisy contexts, where an optimal strategy would involve being less sensitive to new information (i.e., using lower learning rates). This finding highlights a rational but approximate learning process, as described in the paper.

      We thank the reviewer for their assessment of the paper.

      Weaknesses:

      While the RL and Bayesian models both successfully predict behavior, it remains unclear how to fully reconcile the two approaches. The RL model captures behavior in terms of a fixed or context-dependent learning rate, while the BOM provides a more nuanced account with dynamic updates based on volatility and noise. Both models can predict actions when fit appropriately, but the pupillometry data offers a promising avenue to arbitrate between the models. However, the current study does not provide a direct comparison between the RL framework and the Bayesian model in terms of how well they explain the pupillometry data. It would be valuable to see whether the RL model can also account for physiological markers of learning, such as pupil responses, or if the BOM offers a unique advantage in this regard. A comparison of the two models using pupillometry data could strengthen the argument for the BOM's superiority, as currently, the possibility that RL models could explain the physiological data remains unexplored.

      We thank the reviewer for this suggestion. In the current version of the paper, we use an extremely simple reinforcement learning model simply to measure the learning rate in each task block (as this is the key behavioural metric we are interested in). As the reviewer highlights, this simple model doesn’t estimate uncertainty or adapt to it. Given this, we don’t think we can directly compare this model to the Bayesian Observer Model. For example, in the current analysis of the pupillometry data we classify individual trials based on the BOM’s estimate of uncertainty and show that participants adapt their learning rate as expected to the reclassified trials; this analysis would not be possible with our current RL model. However, there are more complex RL-based models that do estimate uncertainty (as discussed above in response to Reviewer #1) and so may be more directly compared to the BOM. We will attempt to apply these models to our task data and describe their ability to account for participant behaviour and physiological response, as suggested by the Reviewer.
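As a rough illustration of the kind of simple block-wise model described here, the sketch below makes several assumptions: a two-option task, a Rescorla-Wagner update, and a softmax choice rule with a fixed inverse temperature `beta`. It fits the learning rate `alpha` separately to each block by grid search over the likelihood. The function names, the two-option layout, the fixed `beta` and the example data are all hypothetical and do not reproduce the authors' implementation. Because `alpha` is fitted independently per block, the model has no mechanism for estimating uncertainty. This is why it cannot be compared directly with the BOM.

```python
import math


def rw_neg_log_lik(alpha, beta, choices, rewards):
    """Negative log-likelihood of two-option choices under a Rescorla-Wagner learner.

    choices: sequence of 0/1 (which option was chosen on each trial)
    rewards: sequence of outcomes in [0, 1] for the chosen option
    """
    q = [0.5, 0.5]  # initial value estimates (assumed)
    nll = 0.0
    for c, r in zip(choices, rewards):
        # Two-option softmax written as a logistic function of the value difference
        p_choice = 1.0 / (1.0 + math.exp(-beta * (q[c] - q[1 - c])))
        nll -= math.log(max(p_choice, 1e-12))
        q[c] += alpha * (r - q[c])  # prediction-error update scaled by the learning rate
    return nll


def fit_alpha(choices, rewards, beta=5.0, grid_size=99):
    """Grid-search maximum-likelihood learning rate for one block (beta held fixed)."""
    alphas = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return min(alphas, key=lambda a: rw_neg_log_lik(a, beta, choices, rewards))


# Hypothetical data for two blocks; each block is fitted independently
blocks = {
    "block_1": ([0, 0, 1, 1, 0, 0, 0, 1], [1, 0, 0, 1, 1, 1, 0, 0]),
    "block_2": ([1, 1, 1, 0, 1, 1, 0, 0], [1, 1, 0, 0, 1, 0, 1, 1]),
}
learning_rates = {name: fit_alpha(ch, rw) for name, (ch, rw) in blocks.items()}
print(learning_rates)
```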

      The model comparison between the Bayesian Observer Model and the self-defined degraded internal model could be further enhanced. Since different assumptions about the internal model's structure lead to varying levels of model complexity, using a formal criterion such as Bayesian Information Criterion (BIC) or Akaike Information Criterion (AIC) would allow for a more rigorous comparison of model fit. Including such comparisons would ensure that the degraded BOM is not simply favored due to its flexibility or higher complexity, but rather because it genuinely captures the participants' behavioral and physiological data better than alternative models. This would also help address concerns about overfitting and provide a clearer justification for using the degraded BOM over other potential models.
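For reference, both criteria are computed from each model's maximized log-likelihood ln L, its number of free parameters k and the number of observations n: AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. Lower values are preferred for both. The sketch below is minimal, and the log-likelihoods, parameter counts and trial count in the usage lines are hypothetical rather than results from this study.

```python
import math


def aic(log_lik: float, k: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_lik


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical fits: (maximized log-likelihood, number of free parameters)
fits = {"full_bom": (-520.0, 3), "degraded_bom": (-512.0, 4)}
n_trials = 600  # hypothetical number of observations
for name, (ll, k) in fits.items():
    print(f"{name}: AIC={aic(ll, k):.1f}, BIC={bic(ll, k, n_trials):.1f}")
```

BIC penalizes each extra parameter by ln n, which exceeds the AIC penalty of 2 whenever n > e^2 ≈ 7.4. With trial counts typical of such tasks, BIC is therefore the stricter guard against favouring a more flexible model.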

      Thank you, we will add this.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      For clarity, the methods would benefit from further detail of task framing to participants. I.e. were there explicit instructions regarding volatility/task contingencies? Or were participants told nothing?

      We have added in the following explanatory text to the methods section (page 20), clarifying the limited instructions provided to participants:

      “Participants were informed that the task would be split into 6 blocks, that they had to learn which was the best option to choose, and that this option may change over time. They were not informed about the different forms of uncertainty we were investigating or of the underlying structure of the task (that uncertainty varied between blocks).”

      In the results, it would be useful to report the general task behavior of participants to get a sense of how they performed across different parts of the task. Also, were participants excluded if they didn't show evidence of learning adaptation to volatility?

      We have added the following text reporting overall performance to the results (page 6):

      “Participants were able to learn the best option to choose in the task, selecting the most highly rewarded option on an average of 71% of trials (range 65% - 74%).”

      And the following text to the methods, confirming that participants were not excluded if they didn’t respond to volatility/noise (the failure in this adaptation is the focus of the current study) (page 19):

      “No exclusion criteria related to task performance were used.”

      The results would benefit from a more intuitive explanation of what the lesioning is trying to recapitulate; this can get quite technical and the objective is not necessarily clear, especially for the less computationally-minded reader.

      We have amended the relevant section of the results to clarify this point (page 9):

      “Having shown that an optimal learner adjusts its learning rate to changes in volatility and noise as expected, we next sought to understand the relative noise insensitivity of participants. In these analyses we “lesion” the BOM, to reduce its performance in some way, and then assess whether doing so recapitulates the pattern of learning rate adaptation observed for participants (Fig 3e). In other words, we damage the model so it performs less well and then assess whether this damage makes the behaviour of the BOM (shown in Fig 3f) more closely resemble that seen in participants (Fig 3e).”

      The modelling might be improved by the inclusion of another class of model. Specifically, models that adapt learning rates in response to the estimation of latent states underlying the current task outcomes would be very interesting to see. In a sense, these are also estimating volatility through changeability of latent states, and it would be interesting to explore whether the findings could also be explained by an incorrect assumption that the latent state has changed when outcomes are noisy.

      Thank you for this suggestion. We have added additional sections to the supplementary materials in which we use a general latent state model and a simple RL model to try to recapitulate the behaviour of participants (and to compare with the BOM). These additional sections are extensive, so are not reproduced here. We have also added in a section to the discussion in the main paper covering this interesting question in which we confirm that we were unable to reproduce participant behaviour (or the normative effect of the lesioned BOMs) using these models but suggest that alternative latent state formulations would be interesting to explore in future work (page 18):

      “A related question is whether other, non-Bayesian model formulations may be able to account for participants’ learning adaptation in response to volatility and noise. Of note, the reinforcement learning model used to measure learning rates in separate blocks does not achieve this goal, as this model is fitted separately to each block rather than adapting between blocks (NB: the simple reinforcement learning model that is fitted across all blocks does not capture participant behaviour; see supplementary information). One candidate class of model that has potential here is latent-state models (Cochran & Cisler, 2019), in which the variance of, and unexpected changes in, the process being learned (which have a degree of similarity with noise and volatility, respectively) are estimated and used to alter the model’s rate of updating as well as the estimated number of states being considered. Using the model described by Cochran and Cisler, we were unable to replicate the learning rate adaptation demonstrated by participants in the current study (see supplementary information), although it remains possible that other latent state formulations may be more successful.”

      The discussion may benefit from a little more discussion of where this work leads us - what is the next step?

      As above, we have added in a suggestion about future modelling work. We have also added in a section about the outstanding interesting questions concerning the neural representation of these quantities, reproduced in response to the suggestion by reviewer #2 below.

      Reviewer #2 (Recommendations for the authors):

      The study presents an opportunity to explore potential neural coding models that could account for the cognitive processes underlying the task. In the field of neural coding, noise correlation is often measured to understand how a population of neurons responds to the same stimulus, which could be related to the noise signal in this task. Since the brain likely treats the stimulus as the same, with noise representing minor changes, this aspect could be linked to the participants' difficulty distinguishing noise from volatility. On the other hand, signal correlation is used to understand how neurons respond to different stimuli, which can be mapped to the volatility signal in the task. It would be highly beneficial if the authors could discuss how these established concepts from neural population coding might relate to the Bayesian behavior model used in the study. For instance, how might neurons encode the distinction between noise and volatility at a population level? Could noise correlation lead to the misattribution of noise as volatility at a neural level, mirroring the behavioral findings? Discussing possible neural models that could explain the observed behavior and relating it to the existing literature on neural population coding would significantly enrich the discussion. It would also open up avenues for future research, linking these behavioral findings to potential neural mechanisms.

      We thank the reviewer for this interesting suggestion. We have added the following paragraph to the discussion section, which we hope does justice to this interesting question (page 18):

      “Previous work examining the neural representation of uncertainty has tended to report correlations between brain activity and some task-based estimate of one form of uncertainty at a time (Behrens et al., 2007; Walker et al., 2020, 2023). We are not aware of work that has, for example, systematically varied volatility and noise and reported distinct correlations for each. Parallels with the neuronal decoding literature suggest an interesting possibility as to how different forms of uncertainty may be encoded. One question addressed by this literature is how the brain decodes changes in the world from the distributed, noisy neural responses to those changes, with a particular focus on the influence of different forms of between-neuron correlation (Averbeck et al., 2006; Kohn et al., 2016). Specifically, signal-correlation, the degree to which different neurons represent similar external quantities (required to track volatility), is distinguished from, and often limited by, noise-correlation, the degree to which the activity of different neurons covaries independently of these external quantities. One possibility relevant to the current study, which resembles the underlying logic of the BOM, is that a population of neurons represents the estimated mean of the generative process that produces task outcomes. In this case, volatility would be tracked as the signal-correlation across this population, whereas noise would be analogous to the noise-correlation. Crucially, misestimation of noise as volatility might arise from misestimation of these two forms of correlation. While the current study clearly cannot adjudicate on the neural representation of these processes, our finding of distinct behavioural and physiological responses to the two forms of uncertainty does suggest that separable neural representations of uncertainty are maintained.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors report a study on how stimulation of the receptive-field surround of V1 and LGN neurons affects their firing rates. Specifically, they examine stimuli in which a grey patch covers the classical RF of the cell and a stimulus appears in the surround. Using a number of different stimulus paradigms, they find a long-latency response in V1 (but not the LGN) which does not depend strongly on the characteristics of the surround grating (drifting vs static, continuous vs discontinuous, predictable grating vs unpredictable pink noise). They find that population responses to simple achromatic stimuli have a different structure that does not distinguish so clearly between the grey patch and other conditions, and that the latency of the response was similar regardless of whether the center or surround was stimulated by the achromatic surface. Taken together, they propose that the surround response is related to the representation of the grey surface itself. They relate their findings to previous studies that have put forward the concept of an ’inverse RF’ based on strong responses to small grey patches on a full-screen grating. They also discuss their results in the context of studies that suggest that surround responses are related to predictions of the RF content or to figure-ground segregation.

      Strengths:

      I find the study to be an interesting extension of the work on surround stimulation and the addition of the LGN data is useful showing that the surround-induced responses are not present in the feedforward path. The conclusions appear solid, being based on large numbers of neurons obtained through Neuropixels recordings. The use of many different stimulus combinations provides a rich view of the nature of the surround-induced responses.

      Weaknesses:

      The statistics are pooled across animals, which is less appropriate for hierarchical data. There is no histological confirmation of placement of the electrode in the LGN and there is no analysis of eye or face movements which may have contributed to the surround-induced responses. There are also some missing statistics and methods details which make interpretation more difficult.

      We thank the reviewer for their positive and constructive comments, and have addressed these specific issues in response to the minor comments. For the statistics across animals, we refer to “Reviewer 1 recommendations” point 1. For the histological analysis, we refer to “Reviewer 1 recommendations point 2”. For the eye and facial movements, we refer to “Reviewer 1 recommendations point 5”. Concerning missing statistics and methods details, we refer to various responses to “Reviewer 1 recommendations”. We thoroughly reviewed the manuscript and included all missing statistical and methodological details.

      Reviewer #2 (Public review):

      Cuevas et al. investigate the stimulus selectivity of surround-induced responses in the mouse primary visual cortex (V1). While classical experiments in non-human primates and cats have generally demonstrated that stimuli in the surround receptive field (RF) of V1 neurons only modulate activity to stimuli presented in the center RF, without eliciting responses when presented in isolation, recent studies in mouse V1 have indicated the presence of purely surround-induced responses. These have been linked to prediction error signals. In this study, the authors build on these previous findings by systematically examining the stimulus selectivity of surround-induced responses.

      Using neuropixels recordings in V1 and the dorsal lateral geniculate nucleus (dLGN) of head-fixed, awake mice, the authors presented various stimulus types (gratings, noise, surfaces) to the center and surround, as well as to the surround only, while also varying the size of the stimuli. Their results confirm the existence of surround-induced responses in mouse V1 neurons, demonstrating that these responses do not require spatial or temporal coherence across the surround, as would be expected if they were linked to prediction error signals. Instead, they suggest that surround-induced responses primarily reflect the representation of the achromatic surface itself.

      The literature on center-surround effects in V1 is extensive and sometimes confusing, likely due to the use of different species, stimulus configurations, contrast levels, and stimulus sizes across different studies. It is plausible that surround modulation serves multiple functions depending on these parameters. Within this context, the study by Cuevas et al. makes a significant contribution by exploring the relationship between surround-induced responses in mouse V1 and stimulus statistics. The research is meticulously conducted and incorporates a wide range of experimental stimulus conditions, providing valuable new insights regarding center-surround interactions.

      However, the current manuscript presents challenges in readability for both non-experts and experts. Some conclusions are difficult to follow or not clearly justified.

      I recommend the following improvements to enhance clarity and comprehension:

      (1) Clearly state the hypotheses being tested at the beginning of the manuscript.

      (2) Always specify the species used in referenced studies to avoid confusion (esp. Introduction and Discussion).

      (3) Briefly summarize the main findings at the beginning of each section to provide context.

      (4) Clearly define important terms such as “surface stimulus” and “early vs. late stimulus period” to ensure understanding.

      (5) Provide a rationale for each result section, explaining the significance of the findings.

      (6) Offer a detailed explanation of why the results do not support the prediction error signal hypothesis but instead suggest an encoding of the achromatic surface.

      These adjustments will help make the manuscript more accessible and its conclusions more compelling.

      We thank the reviewer for their constructive feedback and for highlighting the need for improved clarity regarding the hypotheses and their relation to the experimental findings.

      • We have substantially improved the Introduction and Discussion sections, explaining the different hypotheses and their relation to the performed experiments.

      • In the Introduction, we have clearly outlined each hypothesis and its predictions, providing a structured framework for understanding the rationale behind our experimental design.

      • In the Discussion, we have been more explicit in explaining how the experimental findings inform these hypotheses.

      • We explicitly mentioned the species used in the referenced studies.

      • We provided a clearer rationale for each experiment in the Results section.

      We have also always clearly stated the species that previous studies used, both in the Introduction and Discussion section.

      Reviewer #3 (Public review):

      Summary:

      This paper explores the phenomenon whereby some V1 neurons can respond to stimuli presented far outside their receptive field. It introduces three possible explanations for this phenomenon and it presents experiments that it argues favor the third explanation, based on figure/ground segregation.

      Strengths:

      I found it useful to see that there are three possible interpretations of this finding (prediction error, interpolation, and figure/ground). I also found it useful to see a comparison with LGN responses and to see that the effect there is not only absent but actually the opposite: stimuli presented far outside the receptive field suppress rather than drive the neurons. Other experiments presented here may also be of interest to the field.

      Weaknesses:

      The paper is not particularly clear. I came out of it rather confused as to which hypotheses were still standing and which hypotheses were ruled out. There are numerous ways to make it clearer.

      We thank the reviewer for their constructive feedback and for highlighting the need for improved clarity regarding the hypotheses and their relation to the experimental findings.

      • We have substantially improved the Introduction and Discussion sections, explaining the different hypotheses and how they relate to the experiments we performed.

      • In the Introduction, we have clearly outlined each hypothesis and its predictions, providing a structured framework for understanding the rationale behind our experimental design.

      • In the Discussion, we have been more explicit in explaining how the experimental findings inform these hypotheses.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the Authors):

      (1) Given the data is hierarchical with neurons clustered within 6 mice (how many recording sessions per animal?) I would recommend the use of Linear Mixed Effects models. Simply pooling all neurons increases the risk of false alarms.

      To clarify: we used the standard method for analyzing single-unit recordings, comparing the responses of a population of single neurons between two conditions. The responses of each neuron were measured in both conditions, so the statistics were based on the pairwise differences computed for each neuron separately. This is a common procedure in systems neuroscience and was also used in previous studies on this topic (Keller et al., 2020; Kirchberger et al., 2023). We were not comparing two groups of animals, the situation for which hierarchical analyses are recommended. To address the reviewer’s concern, we nevertheless examined whether the differences between baseline and the gray/drift condition, and between the gray/drift and grating conditions, were consistent across sessions, which was indeed the case. These findings are presented in Supplementary Figure 6.
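      The per-neuron paired design and the per-session consistency check described above can be illustrated with a minimal sketch. It computes each neuron's paired difference (condition A minus condition B) and then reports, per recording session, the median difference and the fraction of sessions with a positive median. The data layout (a list of (session_id, rate_a, rate_b) tuples), the function names, and the toy firing rates are hypothetical; this is not the authors' pipeline, and the signed-rank test itself is not reimplemented here.

```python
from collections import defaultdict
from statistics import median

def per_session_median_differences(units):
    """Group per-neuron paired differences by recording session.

    units: iterable of (session_id, rate_condition_a, rate_condition_b),
    one tuple per neuron (hypothetical layout).
    Returns {session_id: median(rate_a - rate_b)}.
    """
    diffs = defaultdict(list)
    for session_id, rate_a, rate_b in units:
        # The difference is taken within each neuron, so differences in
        # overall firing rate between neurons do not enter the comparison.
        diffs[session_id].append(rate_a - rate_b)
    return {s: median(d) for s, d in diffs.items()}

def fraction_consistent(session_medians):
    """Fraction of sessions whose median paired difference is positive."""
    values = list(session_medians.values())
    return sum(v > 0 for v in values) / len(values)

# Toy example with made-up firing rates (spikes/s).
units = [
    ("s1", 5.0, 3.0), ("s1", 4.0, 3.5), ("s1", 6.0, 2.0),
    ("s2", 2.5, 2.0), ("s2", 3.0, 1.0),
    ("s3", 1.0, 1.5), ("s3", 4.0, 2.0), ("s3", 3.0, 2.0),
]
medians = per_session_median_differences(units)
print(medians)
print(fraction_consistent(medians))
```

      Summarizing per session, rather than only pooling neurons, is one simple way to see whether a pooled effect is driven by a single session.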

      (2) Line 432: “The study utilized three to eight-month-old mice of both genders”. This is confusing, I assume they mean six mice in total, please restate. What about the LGN recordings, were these done in the same mice? Can the authors please clarify how many animals, how many total units, how many included units, how many recording sessions per animal, and whether the same units were recorded in all experiments?

      We have now clarified the information regarding the animals used in the Methods section.

      • We state that “We included female and male mice (C57BL/6), a total of six animals for V1 recordings between three and eight months old. In two of those animals, we recorded simultaneously from LGN and V1.”

      • We state that “For each animal, we recorded around 2-3 sessions from each hemisphere, and we recorded from both hemispheres.”

      • We noted that the number of neurons was not reported in every figure caption, and we apologize for this omission. We have now added the number of neurons for all figures and protocols in the revised manuscript. The same neurons were recorded for the different conditions within each protocol; however, because a few sessions were short, more units were recorded for the grating protocol. Note that we did not make statistical comparisons between protocols.

      (3) I see no histology for confirmation of placement of the electrode in the LGN, how can they be sure they were recording from the LGN? There is also little description of the LGN experiments in the methods.

      For better clarity, we have included a reconstruction of the electrode track from histological sections of one animal after the experiment (Figure S4). The LGN was targeted via stereotaxic surgery, and visual responses in this area are highly distinctive. In addition, we used a flash protocol, described in the Methods section, to identify the early-latency responses typical of the LGN: “A flash stimulus was employed to confirm the locations of LGN at the beginning of the recording sessions, similar to our previous work in which we recorded from LGN and V1 simultaneously (Schneider et al., 2023). This stimulus consisted of a 100 ms white screen and a 2 s gray screen as the inter-stimulus interval, designed to identify visually responsive areas. The responses of multi-unit activity (MUA) to the flash stimulus were extracted and a CSD analysis was then performed on the MUA, sampling every two channels. The resulting CSD profiles were plotted to identify channels corresponding to the LGN. During LGN recordings, simultaneous recordings were made from V1, revealing visually responsive areas interspersed with non-responsive channels.”
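      The CSD step can be illustrated with the standard second-spatial-difference estimator, CSD(z) ≈ -[x(z+h) - 2x(z) + x(z-h)] / h², where x is the depth profile and h the spacing between the sites used. The sketch below is a minimal illustration under stated assumptions: "sampling every two channels" is read as keeping every second channel, the 20 µm channel pitch is a hypothetical value, and tissue conductivity is omitted, so the output is in arbitrary units. It is not the authors' implementation.

```python
def csd_second_difference(profile, channel_pitch_um=20.0, step=2):
    """Estimate CSD at one time point from a depth-ordered signal profile.

    profile: one value per channel, ordered by depth (e.g. trial-averaged MUA).
    step=2 keeps every second channel; channel_pitch_um is a hypothetical pitch.
    Conductivity is omitted, so the output is in arbitrary units per um^2.
    The outermost retained sites lack a neighbour and are dropped.
    """
    sites = profile[::step]            # sub-sample every `step`-th channel
    h = channel_pitch_um * step        # spacing between retained sites (um)
    return [
        -(sites[i + 1] - 2 * sites[i] + sites[i - 1]) / h ** 2
        for i in range(1, len(sites) - 1)
    ]

def csd_over_time(profiles_by_time, **kwargs):
    """Apply the estimator independently at each time sample."""
    return [csd_second_difference(p, **kwargs) for p in profiles_by_time]

# Toy profile: a quadratic depth profile gives a constant second difference.
example = csd_second_difference([0, 0, 1, 1, 4, 4, 9, 9, 16, 16])
print(example)
```

      Plotting csd_over_time as a depth-by-time image is the usual way such profiles are inspected to locate sinks and sources.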

      (4) Many statements are not backed up by statistics, for example, each time the authors report that the response at 90° is higher than baseline (Line 121 amongst other places) there is no test to support this. Also Line 140 (negative correlation), Line 145, Line 180.

      For comparison purposes, we only presented statistical analyses across conditions. However, we have now added information to the figure captions stating that all conditions show values higher than the baseline.

      (5) As far as I can see there is no analysis of eye movements or facial movements. This could be an issue, for example, if the onset of the far surround stimuli induces movements this may lead to spurious activations in V1 that would be interpreted as surround-induced responses.

      To address this point, we have included a supplementary figure analyzing facial movements across different sessions and comparing them between conditions (Supplementary Figure 5). A detailed explanation of this analysis has been added to the Methods section. Overall, we observed no significant differences in face movements between trials with gratings, trials with the gray patch, and trials with the gray screen presented during baseline. Animals exhibited similar face movements across all three conditions, supporting the conclusion that the observed neural firing rate increases for the gray-patch condition are not related to face movements.

      (6) The experiments with the rectangular patch (Figure 3) seem to give a slightly different result as the responses for large sizes (75, 90) don’t appear to be above baseline. This condition is also perceptually the least consistent with a grey surface in the RF, the grey patch doesn’t appear to occlude the surface in this condition. I think this is largely consistent with their conclusions and it could merit some discussion in the results/discussion section.

      Although the effect may be somewhat weaker, the stimulated surround also covers a smaller total area because of the large rectangular gray patch. Furthermore, the early responses are clearly elevated above baseline, and the responses for sizes up to 70 degrees are still higher than baseline. Hence, we do not think that the 90-degree data point warrants a strong interpretation.

      Minor points:

      (1) Figure 1h: What is the statistical test reported in the panel (I guess a signed rank based on later figures)? Figure 4d doesn’t appear to be significantly different but is reported as so. Perhaps the median can be indicated on the distribution?

      We now state that we used a signed-rank test for Figure 1h, and we have added the median of the distributions to Figure 4d.

      (2) What was the reason for having the gratings only extend to half the x-axis of the screen, rather than being full-screen? This creates a percept (in humans at least) that is more consistent with the grey patch being a hole in the grating as the grey patch has the same luminance as the background outside the grating.

      We explained in the Methods section that “We presented only half of the x-axis due to the large size of our monitor, in order to avoid over-stimulation of the animals with very large grating stimuli.”. Perceptually speaking, the gray patch appears as something occluding the grating, not as a “hole”.

      (3) Line 103: “and, importantly, had less than 10° (absolute) distance to the grating stimulus’ RF center.” Re-phrase, a stimulus doesn’t have an RF center.

      We corrected this to “We included only single units into the analysis that met several criteria in terms of visual responses (see Methods) and, importantly, the RF center had less than 10° (absolute) distance to the grating stimulus’ center.”

      (4) Line 143: “We recorded single neurons LGN” - should be “single LGN neurons”.

      We corrected this to “we recorded single LGN neurons”.

      (5) Line 200: They could spell out here that the latency is consistent with the latency observed for the grey patch conditions in the previous experiments.

      (6) Line 465: This is very brief. What criteria did they use for single-unit assignation? Were all units well-isolated or were multi-units included?

      We clarified in the Methods section that “We isolated single units with Kilosort 2.5 (Steinmetz et al., 2021) and manually curated them with Phy2 (Rossant et al., 2021). We included only single units with a maximum contamination of 10 percent.”

      (7) Line 469: “The experiment was run on a Windows 10”. Typo.

      We corrected this to “The experiment was run on Windows 10”.

      (8) Line 481: “We averaged the response over all trials and positions of the screen”. What do they mean by ’positions of the screen’?

      We changed this to “We computed the response for each position separately, by averaging the response across all the trials where a square was presented at a given position.”

      (9) Line 483: “We fitted an ellipse in the center of the response”. How?

      We now explain how we detected the RF by fitting an ellipse: “A heatmap of the response was computed. This heatmap was then smoothed, and we calculated the location of the peak response. From the heatmap, we calculated the centroid of the response using the function regionprops.m, which identifies distinct connected regions; we selected the largest detected region and used the centroid provided as output. We then fitted an ellipse centered on this peak response location to the smoothed heatmap using the MATLAB function ellipse.m.”
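      For readers without MATLAB, the following pure-Python sketch approximates the described steps with a swapped-in, moment-based ellipse rather than regionprops.m and ellipse.m. It smooths the heatmap with a 3×3 box filter, finds the peak, keeps pixels above a fraction of the peak (the 0.5 threshold is a hypothetical choice), and derives the centre, semi-axes and orientation from weighted second moments. It assumes non-negative responses and does not isolate the largest connected region, so it is an illustration of the idea, not the authors' procedure.

```python
import math

def box_smooth(grid):
    """3x3 mean filter; at the borders only the valid neighbours are averaged."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vals = [grid[rr][cc]
                    for rr in range(max(0, r - 1), min(rows, r + 2))
                    for cc in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = sum(vals) / len(vals)
    return out

def fit_rf_ellipse(heatmap, threshold_fraction=0.5):
    """Return (peak_rc, centre_rc, semi_axes, angle_rad) of a moment-based ellipse."""
    smooth = box_smooth(heatmap)
    rows, cols = len(smooth), len(smooth[0])
    peak = max(((r, c) for r in range(rows) for c in range(cols)),
               key=lambda rc: smooth[rc[0]][rc[1]])
    cutoff = threshold_fraction * smooth[peak[0]][peak[1]]
    pixels = [(r, c, smooth[r][c]) for r in range(rows) for c in range(cols)
              if smooth[r][c] >= cutoff]
    total = sum(w for _, _, w in pixels)
    mr = sum(r * w for r, _, w in pixels) / total   # weighted centroid (row)
    mc = sum(c * w for _, c, w in pixels) / total   # weighted centroid (column)
    # Weighted covariance of pixel coordinates (second central moments).
    srr = sum(w * (r - mr) ** 2 for r, _, w in pixels) / total
    scc = sum(w * (c - mc) ** 2 for _, c, w in pixels) / total
    src = sum(w * (r - mr) * (c - mc) for r, c, w in pixels) / total
    # Eigenvalues of the 2x2 covariance matrix give the squared axis scales.
    tr, det = srr + scc, srr * scc - src ** 2
    disc = math.sqrt(max(tr ** 2 / 4 - det, 0.0))
    l1, l2 = tr / 2 + disc, tr / 2 - disc
    semi_axes = (2 * math.sqrt(l1), 2 * math.sqrt(max(l2, 0.0)))  # 2 SD, arbitrary scale
    angle = 0.5 * math.atan2(2 * src, scc - srr)  # major axis vs. column axis
    return peak, (mr, mc), semi_axes, angle

# Toy RF: a symmetric response peaked at row 2, column 2 of a 5x5 grid.
heatmap = [[max(0, 3 - abs(r - 2) - abs(c - 2)) for c in range(5)] for r in range(5)]
peak, centre, axes, angle = fit_rf_ellipse(heatmap)
print(peak, centre, axes)
```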

      (10) Line 485 “...and positioned the stimulus at the response peak previously found”. Unclear wording, do you mean the center of the ellipse fit to the MUA response averaged across channels or something else?

      (11) Line 487: “We performed a permutation test of the responses inside the RF detected vs a circle from the same area where the screen was gray for the same trials.”. The wording is a bit unclear here, can they clarify what they mean by the ’same trials’, what is being compared to what here?

      We used a permutation test to compare the neuron’s responses to black and white squares inside the RF to the condition where there was no square in the RF (i.e. the RF was covered by the gray background).
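      A permutation test of this kind can be sketched as follows: trials with a square inside the RF and trials with only gray background in the RF are pooled, their labels are shuffled many times, and the observed difference in mean response is compared with the shuffled distribution. The test statistic (difference of means), the two-sided criterion, the 10,000 permutations and the toy spike counts are all assumptions for illustration; the source does not specify these details.

```python
import random

def permutation_test(in_rf, no_square, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference of mean responses.

    in_rf: responses (e.g. spike counts) on trials with a square in the RF.
    no_square: responses on trials where the RF was covered by gray background.
    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    n_a = len(in_rf)
    observed = sum(in_rf) / n_a - sum(no_square) / len(no_square)
    pooled = list(in_rf) + list(no_square)
    n_b = len(pooled) - n_a
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling of trials
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if abs(diff) >= abs(observed):
            extreme += 1
    # The +1 correction keeps the p-value from being exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Toy spike counts for a neuron that clearly responds to squares in its RF.
in_rf = [12, 15, 11, 14, 13, 16, 12, 15]
no_square = [5, 6, 4, 7, 5, 6, 5, 4]
observed, p = permutation_test(in_rf, no_square)
print(observed, p)
```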

      (12) Was the pink noise background regenerated on each trial or as the same noise pattern shown on each trial?

      We explain that “We randomly presented one of two different pink noise images”

      (13) Line 552: “...used a time window of the Gaussian smoothing kernel from-.05 to .05”. Missing units.

      We explained that “we used a time window of the Gaussian smoothing kernel from -.05 s to .05 s, with a standard deviation of 0.0125 s.”
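      The smoothing described above can be sketched as a Gaussian kernel with a standard deviation of 0.0125 s, truncated to the window from -0.05 s to 0.05 s and convolved with binned spike counts. The 1 ms bin width, the normalization of the kernel to unit sum, the conversion to spikes/s and the function names are assumptions for illustration, not details stated in the source.

```python
import math

def gaussian_kernel(bin_s=0.001, half_width_s=0.05, sd_s=0.0125):
    """Gaussian sampled on [-half_width_s, half_width_s], normalised to sum 1."""
    n_half = round(half_width_s / bin_s)
    taps = [math.exp(-0.5 * ((k * bin_s) / sd_s) ** 2)
            for k in range(-n_half, n_half + 1)]
    total = sum(taps)
    return [t / total for t in taps]

def smoothed_rate(spike_counts, bin_s=0.001, **kernel_kwargs):
    """Convolve binned spike counts with the kernel and convert to spikes/s.

    The output has the same length as the input; bins near the edges
    only see part of the kernel.
    """
    kernel = gaussian_kernel(bin_s=bin_s, **kernel_kwargs)
    half = len(kernel) // 2
    n = len(spike_counts)
    rate = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half  # the kernel is symmetric, so this is a convolution
            if 0 <= j < n:
                acc += w * spike_counts[j]
        rate.append(acc / bin_s)
    return rate

# A single spike in the middle of a 201 ms window.
counts = [0] * 201
counts[100] = 1
rate = smoothed_rate(counts)
print(max(rate))
```

      Normalizing the kernel to unit sum means the smoothed trace preserves the total spike count, which is why integrating the rate over time recovers the single spike here.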

      (14) Line 565: “Additionally, for the occluded stimulus, we included patch sizes of 70° and larger.”. Not sure what they’re referring to here.

      We changed this to: “For the population analyses, we analyzed the conditions in which the gray patch sizes were 70 degrees and 90 degrees”.

      (15) Line 569: What is perplexity, and how does changing it affect the t-SNE embeddings?

      Note that t-SNE is used only for visualization. In the revised manuscript, we have expanded our explanation of the use of t-SNE and the choice of perplexity, a parameter that roughly sets the effective number of neighbors each point takes into account and thereby balances local against global structure in the embedding. Specifically, we used a perplexity of 20 for the grating conditions with circular and rectangular occluders and 100 for the black-and-white condition. These values were selected empirically so that the groups in the data were clearly separable while maintaining a balance between local and global relationships in the projected space. This choice allowed us to distinguish the different groups visually while preserving the meaningful structure encoded in the dissimilarity matrices. Importantly, varying the perplexity would not alter the conclusions of our study, because t-SNE is not part of the underlying analytical steps.

      (16) Line 572: “We trained a C-Support Vector Classifier based on dissimilarity matrices”. This is overly brief, please describe the construction of the dissimilarity matrices and how the training was implemented. Was this binary, multi-class? What conditions were compared exactly?

      In the revised manuscript, we have expanded our explanation regarding the construction of the dissimilarity matrices and the implementation of the C-Support Vector Classification (C-SVC) model (See Methods section).

      The dissimilarity matrices were calculated using the Euclidean distance between firing rate vectors for all pairs of trials (as shown in Figure 6a-b). These matrices were used directly as input for the classifier. It is important to note that t-SNE was not used for classification but only for visualization purposes. The classifier was binary, distinguishing between two classes (e.g., Dr vs St). We trained the model using 60% of the data for training and used 40% for testing. The C-SVC was implemented using sklearn, and the classification score corresponds to the average accuracy across 20 repetitions.
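      The construction of the dissimilarity matrix and the repeated 60/40 evaluation scheme can be sketched as below. The first function computes the Euclidean distance between population firing-rate vectors for every pair of trials; the second draws 20 random 60/40 train/test splits. The C-SVC fit itself (for example with scikit-learn's sklearn.svm.SVC) is deliberately left out: how the matrix rows were turned into classifier inputs is not specified in the source, and one common option is to use each trial's distances to the training trials as its feature vector. Function names and toy data are hypothetical.

```python
import math
import random

def dissimilarity_matrix(trials):
    """Euclidean distance between population firing-rate vectors for all trial pairs.

    trials: list of equal-length vectors, one per trial (one entry per neuron).
    """
    n = len(trials)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.dist(trials[i], trials[j])
            d[i][j] = d[j][i] = dist  # symmetric matrix with a zero diagonal
    return d

def repeated_splits(n_trials, train_fraction=0.6, n_repeats=20, seed=0):
    """Yield (train_idx, test_idx) for repeated random 60/40 splits."""
    rng = random.Random(seed)
    n_train = round(train_fraction * n_trials)
    for _ in range(n_repeats):
        idx = list(range(n_trials))
        rng.shuffle(idx)
        yield sorted(idx[:n_train]), sorted(idx[n_train:])

# Toy population of two neurons across three trials.
trials = [[0, 0], [3, 4], [6, 8]]
D = dissimilarity_matrix(trials)
splits = list(repeated_splits(10))
print(D)
```

      The reported score would then be the mean test accuracy over the 20 splits.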

      Reviewer #2 (Recommendations for the Authors):

      The relationship between the current paper and Keller et al. is challenging to understand. It seems like the study is critiquing the previous study but rather implicitly and not directly. I would suggest either directly stating the criticism or presenting the current study as a follow-up investigation that further explores the observed effect or provides an alternative function. Additionally, defining the inverse RF versus surround-induced responses earlier than in the discussion would be beneficial. Some suggestions:

      (1) The introduction is well-written, but it would be helpful to clearly define the hypotheses regarding the function of surround-induced responses and revisit these hypotheses one by one in the results section.

      We have revised the Introduction and now state the hypotheses and their relationship to the experiments more clearly.

      (2) Explicitly mention how you compare classic grating stimuli of varying sizes with gray patch stimuli. Do the patch stimuli all come with a full-field grating? For the full-field grating, you have one size parameter, while for the patch stimuli, you have two (size of the patch and size of the grating).

      We now clearly describe how we compare grating stimuli of varying sizes with gray patch stimuli.

      (3) The third paragraph in the introduction reads more like a discussion and might be better placed there.

      We have moved content from the third paragraph of the Introduction to the Discussion, where it fits more naturally.

      (4) Include 1-2 sentences explaining how you center RFs and detail the resolution of your method.

      We have added an explanation to the Methods: “To center the visual stimuli during the recording session, we averaged the multiunit activity across the responsive channels and positioned the stimulus at the center of the ellipse fit to the MUA response averaged across channels.”.

      (5) Motivate the use of achromatic stimuli. This section is generally quite hard to understand, so try to simplify it.

      We explained better in the Introduction why we performed this particular experiment.

      (6) The decoding analysis is great, but it is somewhat difficult to understand the most important results. Consider summarizing the key findings at the beginning of this section.

      We now provide a clearer motivation at the start of the Decoding section.

      Reviewer #3 (Recommendations for the Authors):

      I have a few suggestions to improve the clarity of the presentation.

      Abstract: it lists a series of observations and it ends with a conclusion (“based on these findings...”). However, it provides little explanation for how this conclusion would arise from the observations. It would be more helpful to introduce the reasoning at the top and show what is consistent with it.

      We have revised the abstract to incorporate this feedback.

      To some extent, this applies to Results too. Sometimes we are shown the results of some experiment just because others have done a similar experiment. Would it be better to tell us which hypotheses it tests and whether the results are consistent with all 3 hypotheses or might rule one or more out? I came out of the paper rather confused as to which hypotheses were still standing and which hypotheses were ruled out.

      We have substantially improved our explanation of the hypotheses and their relationship to the experiments in the Introduction.

      It would be best if the Results section focused on the results of the study, without much emphasis on what previous studies did or did not measure. Here, instead, in the middle of Results we are told multiple times what Keller et al. (2020) did or did not measure, and what they did or did not find. Please focus on the questions and on the results. Where they agree or disagree with previous papers, tell us briefly that this is the case.

      We have revised the Results section so that it places much less emphasis on what previous studies did. Differences from previous work are now discussed in the Discussion section.

      The notation is extremely awkward. For instance “Gc” stands for two words (Gray center) but “Gr” stands for a single word (Grating). The double meaning of G is one of many sources of confusion.

      This notation needs to be revised. Here is one way to make it simpler: choose one word for each type of stimulus (e.g. Gray, White, Black, Drift, Stat, Noise) and use it without abbreviations. To indicate the configuration, combine two of those words (e.g. Gray/Drift for Gray in the center and Drift in the surround).

      We have corrected the notation in the figures and text to enhance readability and improve the reader’s understanding.

      Figure 1e and many subsequent ones: it is not clear why the firing rate is shown in a logarithmic scale. Why not show it in a linear scale? Anyway, if the logarithmic scale is preferred for some reason, then please give us ticks at numbers that we can interpret, like 0.1,1,10,100... or 0.5,1,2,4... Also, please use the same y-scale across figures so we can compare.

      To clarify: firing rates must be normalized relative to baseline in order to pool across neurons. However, a divisive normalization by itself is problematic: a change from 1 to 2 (a doubling) is as large as a change from 1 to 0.5 (a halving), yet on a linear scale the two changes appear asymmetric. Furthermore, such ratios are highly sensitive to outliers. For these reasons, taking the base-10 logarithm of the ratio is an appropriate transformation. As the reviewer suggested, we changed the tick labels to 1, 2, 4.
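      The symmetry argument can be made concrete with a few lines of arithmetic. The sketch below shows that log10 of the ratio maps a doubling and a halving to values of equal magnitude and opposite sign, whereas the linear change from baseline does not, and where the tick labels 1, 2 and 4 sit on the log axis. The function name is hypothetical, and the sketch assumes a strictly positive baseline rate; how zero-baseline neurons were handled is not stated in the source.

```python
import math

def log_ratio(rate, baseline):
    """log10(rate / baseline); requires baseline > 0."""
    return math.log10(rate / baseline)

# A doubling and a halving relative to a baseline of 1 spike/s.
up = log_ratio(2.0, 1.0)     # log10(2), about +0.301
down = log_ratio(0.5, 1.0)   # log10(0.5), about -0.301

# The same two changes on a linear scale look asymmetric: +1.0 vs -0.5.
linear_up, linear_down = 2.0 / 1.0 - 1, 0.5 / 1.0 - 1

# Positions of ratio tick labels 1, 2 and 4 on the log10 axis.
tick_positions = {label: math.log10(label) for label in (1, 2, 4)}
print(up, down, linear_up, linear_down, tick_positions)
```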

      Figure 3: it is not clear what “size” refers to in the stimuli where there is no gray center. Is it the horizontal size of the overall stimulus? Some cartoons might help. Or just some words to explain.

      Figure 3: if my understanding of “size” above is correct, the results are remarkable: there is no effect whatsoever of replacing the center stimulus with a gray rectangle. Shouldn’t this be remarked upon?

      We have added a paragraph under Figure 3 and in the Methods section explaining that the sizes represent the varying horizontal dimensions of the rectangular patch. In this protocol, the classical condition (i.e., without a gray patch) was shown only as a full-field grating, which is depicted in the plot as size 0, indicating that no rectangular patch was present.

      DETAILS The word “achromatic” appears many times in the paper and is essentially uninformative (all stimuli in this study are achromatic, including the gratings). It could be removed in most places except a few, where it is actually used to mean “uniform”. In those cases, it should be replaced by “uniform”.

      Ditto for the word “luminous”, which appears twice and has no apparent meaning. Please replace it with “uniform”.

      We have replaced “achromatic” and “luminous” with “uniform” wherever we refer to purely black or white stimuli, to improve clarity.

      Page 3, line 70: “We raise some important factors to consider when describing responses to only surround stimulation.” This sentence might belong in the Discussion but not in the middle of a paragraph of Results.

      We removed this sentence.

      Neuropixel - Neuropixels (plural)

      “area LGN” - LGN

      We have corrected these errors.

      References

      Keller, A.J., Roth, M.M., Scanziani, M., 2020. Feedback generates a second receptive field in neurons of the visual cortex. Nature 582, 545–549. doi:10.1038/s41586-020-2319-4.

      Kirchberger, L., Mukherjee, S., Self, M.W., Roelfsema, P.R., 2023. Contextual drive of neuronal responses in mouse V1 in the absence of feedforward input. Science Advances 9, eadd2498. doi:10.1126/sciadv.add2498.

      Rossant, C., et al., 2021. phy: Interactive analysis of large-scale electrophysiological data. https://github.com/cortex-lab/phy.

      Schneider, M., Tzanou, A., Uran, C., Vinck, M., 2023. Cell-type-specific propagation of visual flicker. Cell Reports 42.

      Steinmetz, N.A., Aydin, C., Lebedeva, A., Okun, M., Pachitariu, M., Bauza, M., Beau, M., Bhagat, J., Böhm, C., Broux, M., Chen, S., Colonell, J., Gardner, R.J., Karsh, B., Kloosterman, F., Kostadinov, D., Mora-Lopez, C., O’Callaghan, J., Park, J., Putzeys, J., Sauerbrei, B., van Daal, R.J.J., Vollan, A.Z., Wang, S., Welkenhuysen, M., Ye, Z., Dudman, J.T., Dutta, B., Hantman, A.W., Harris, K.D., Lee, A.K., Moser, E.I., O’Keefe, J., Renart, A., Svoboda, K., Häusser, M., Haesler, S., Carandini, M., Harris, T.D., 2021. Neuropixels 2.0: A miniaturized high-density probe for stable, long-term brain recordings. Science 372, eabf4588. doi:10.1126/science.abf4588.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work aims to improve our understanding of the factors that influence female-on-female aggressive interactions in gorilla social hierarchies, using 25 years of behavioural data from five wild groups of two gorilla species. Researchers analysed aggressive interactions between 31 adult females, using behavioural observations and dominance hierarchies inferred through Elo-rating methods. Aggression intensity (mild, moderate, severe) and direction (measured as the rank difference between aggressor and recipient) were used as key variables. A linear mixed-effects model was applied to evaluate how aggression direction varied with reproductive state (cycling, trimester-specific pregnancy, or lactation) and sex composition of the group. This study highlights the direction of aggressive interactions between females, with most interactions being directed from higher- to lower-ranking adult females close in social rank. However, the results show that 42% of these interactions are directed from lower- to higher-ranking females. In particular, lactating and pregnant females targeted higher-ranking individuals, which the authors suggest might be due to higher energetic needs, which increase risk-taking in lactating and pregnant females. Sex composition within the group also influenced which individuals were targeted. The authors suggest that male presence buffers female-on-female aggression, allowing females to target higher-ranking females than themselves. In contrast, females targeted lower-ranking females than themselves in groups with a larger ratio of females, which implies a lower risk for the females since the pool of competitors is larger.

      The findings provide an important insight into aggression heuristics in primate social systems and the social and individual factors that influence these interactions, providing a deeper understanding of the evolutionary pressures that shape risk-taking, dominance maintenance, and the flexibility of social strategies in group-living species.

      The authors achieved their aim by demonstrating that aggression direction in female gorillas is influenced by factors such as reproductive condition and social context, and their results support the broader claim that aggression heuristics are flexible. However, some specific interpretations require further support. Despite this, the study makes a valuable contribution to the field of behavioural ecology by reframing how we think about intra-sexual competition and social rank maintenance in primates.

      Strengths:

      One of the study's major strengths is the use of an extensive dataset that compiles 25 years of behavioural data and 6871 aggressive interactions between 31 adult females in five social groups, which allows for a robust statistical analysis. This study uses a novel approach to the study of aggression in social groups by including factors such as the direction and intensity of aggressive interactions, which offers a comprehensive understanding of these complex social dynamics. In addition, this study incorporates ecological and physiological factors such as the reproductive state of the females and the sex composition of the group, which allows an integrative perspective on aggression within the broader context of body condition and social environment. The authors successfully integrate their results into broader evolutionary and ecological frameworks, enriching discussions around social hierarchies and risk sensitivity in primates and other animals.

      Thank you for the positive assessment of our work and the nice summary of the manuscript!

      Weaknesses:

      Although the paper has a novel approach by studying the effect of reproductive state and social environment on female-female aggression, the use of observational data without experimental manipulation limits the ability to establish causation. The authors suggest that the difference observed in female aggression direction between groups with different sex composition might be indicative of male presence buffering aggression, which seems speculative, as no direct evidence of male intervention or support was reported. Similarly, the use of reproductive state as a proxy for energetic need is an indirect measure and does not account for actual energy expenditure or caloric intake, which weakens the authors' claims that female energetic need induces risk-taking. Overall, this paper would benefit from stronger justification and empirical support to strengthen the conclusions of the study about the mechanisms driving female aggression in gorillas.

      We agree that experimental manipulation would allow us to extend our work. Unfortunately, this is not possible with wild, endangered gorillas.

      We have now added more references (Watts 1994; Watts 1997) and enriched our arguments regarding male presence buffering aggression. Previous research suggests that male gorillas may support lower-ranking females and may intervene in female-female conflicts (Sicotte 2002). Unfortunately, our dataset did not allow us to test for male protection. We conduct proximity scans every 10 minutes, and these scans are not linked to individual interactions, so we cannot reliably test whether proximity to a male influences the likelihood of receiving aggression.

      We have now clearly stated that reproductive state is an indirect proxy for energetic needs. We agree with your point about energy intake and expenditure, but unfortunately, we do not have data on energy expenditure or caloric intake to allow us to delve into more fine-grained analyses.

      Overall, we have tried to enrich the justification and empirical support to strengthen our conclusions by clarifying the text and adding more examples and references.

      Reviewer #2 (Public review):

      Summary:

      The authors' aim in this study is to assess the factors that can shift competitive incentives against higher- or lower-ranking groupmates in two gorilla species.

      Strengths:

      This is a relevant topic, where important insights could be gained. The authors brought together a substantial dataset: a long-term behavioral dataset representing two gorilla species from five social groups.

      Weaknesses:

      The authors have not fully shown the data used in the model and explored the potential of the model. Therefore, I remain cautious about the current results and conclusions.

      Some specific suggestions that require attention are

      (1) The authors described how group size can affect aggression patterns in some species (line 54), using a whole paragraph, but did not include it as an explanatory variable in their model, despite stating that the overall group size can "conflate opposing effects of females and males" (line 85). I suggest emphasizing the effects of numbers of males and/or females here and de-emphasizing the effect of group size in the Introduction.

      We did not use group size as a main predictor, as has been commonly done in other species, because of potentially conflating opposing effects of males and females. To further stress this point, we have specifically added in the introduction: “group size, the overall number of individuals in the group, might not be a good predictor of aggression heuristics, as it can conflate the effects of different kinds of individuals on aggression (see Smit & Robbins 2024 for an example of opposing effects of the number of females and number of males on female gorilla aggression).”

      We also “ran our analysis testing for group size (number of weaned individuals in the group), instead of the numbers of females and males, [and] its influence on interaction score was not significant (estimate=-0.001, p-value=0.682).”

      (2) There should be more details given about how the authors calculated individual Elo-ratings (line 98). It seems that authors pooled all avoidance/displacement behaviors throughout the study period. But how often was the Elo-rating they included in the model calculated? By the day or by the month? I guess it was by the day, as they "estimate female reproductive state daily" (line 123). If so, it should be made clear in the text.

      We rephrased accordingly: “We used all avoidance and displacement interactions throughout the study period and we used the function elo.seq from R package EloRating to infer daily individual female Elo-scores”. We also clarified that “This method takes into account the temporal sequence of interactions and updates an individual’s Elo-scores each day the individual interacted with another...”
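      The daily updating described above follows the general Elo logic, which can be sketched as follows: interactions are processed in date order, the winner gains and the loser loses k × (1 - p), where p is the expected probability that the winner wins, and each individual's score is recorded at the end of every day with data. This is a generic sketch, not the EloRating package: k = 100 and a starting score of 1000 are illustrative choices, the function names are hypothetical, and elo.seq's own defaults, normalization and handling of group entries and exits may differ.

```python
from collections import defaultdict

def expected_win(r_winner, r_loser):
    """Standard Elo expectation that the eventual winner wins."""
    return 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))

def daily_elo(interactions, k=100.0, start=1000.0):
    """Sequentially update Elo scores from (day, winner, loser) records.

    interactions: iterable of (day, winner_id, loser_id); the winner is the
    individual that displaced, or was avoided by, the other (assumed coding).
    Returns {day: {individual: score at the end of that day}} for days with data.
    """
    scores = defaultdict(lambda: start)
    by_day = {}
    # sorted() is stable, so interactions keep their order within a day.
    for day, winner, loser in sorted(interactions, key=lambda x: x[0]):
        p = expected_win(scores[winner], scores[loser])
        delta = k * (1.0 - p)  # expected outcomes move scores only a little
        scores[winner] += delta
        scores[loser] -= delta
        by_day[day] = dict(scores)  # snapshot after the latest interaction
    return by_day

# Toy sequence: A beats B and C on day 1, B beats C on day 2.
by_day = daily_elo([(1, "A", "B"), (1, "A", "C"), (2, "B", "C")])
print(by_day)
```

      Because every update transfers points from loser to winner, the total score is conserved, and an unexpected win (low p) shifts ranks more than an expected one.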

      In addition, all groups were studied long term, and the group composition seems to fluctuate, based on Table 1 in Reference 11. When an individual enters or leaves a group with a stable hierarchy, it takes time before the hierarchy becomes stable again. If the avoidance/displacement behaviors used to infer rank relationships were uncommon, this could take a few days or longer. Also, were aggressive behaviors more common during rank fluctuations? In other words, if avoidance/displacement behaviors and aggressive behaviors occur simultaneously during rank fluctuations, how did the authors deal with this and take it into account in the analysis?

      We have shown in Reference 25 (Smit & Robbins 2025), which followed Reference 11 (Smit & Robbins 2024), that females form highly stable hierarchies, and that dyadic dominance relationships are not influenced by the dispersal or death of third individuals. Notably, newly immigrated females usually start at and remain low ranking, without large fluctuations in rank. Therefore, any periods of rank fluctuation have limited influence on the aggressive interactions in our study system.

      The authors emphasized several times in the text that gorillas "form highly stable hierarchical relationships". Also, in Reference 25, they found very high stability in each group's hierarchy. However, the number of females involved in that analysis was different from that used here. They need to provide more basic information on each group's dominance hierarchy to support their statement. I strongly suggest that the authors display Elo-rating trajectories and the relevant statistics for each group throughout the study period as part of the supplementary materials.

      In fact, the females involved in the present analysis and the analysis of Smit & Robbins 2025 are the same. Our present analysis is based on the hierarchies of Smit & Robbins 2025. Note that female gorillas disperse and occasionally immigrate to another study group. This is why some females may appear in the hierarchies of more than one group, giving the impression that there are more females involved in the analysis of Smit & Robbins 2025 (e.g. by counting the lines in the Elo-rating plots). We now specifically state that “We present these interactions and hierarchies in detail in Smit & Robbins 2025”, to clarify that the hierarchies are the same.

      (3) The authors stated why they distinguished stages of female reproductive status. They also referred to the differences in energetic needs between stages of pregnancy and lactation (lines 127-128). However, in the mixed model, they only compared the interaction score between the cycling stage and the other stages. The model was not well explained, and the results could be expanded. I suggest conducting more pairwise comparisons in the model and presenting the statistics in the text, if there are significant results. If all three pregnancy stages differed significantly from the cycling and lactating stages but not from each other, they may be merged into one pregnancy stage. More in-depth analysis would help provide better answers to the research questions.

      Thank you for pointing this out. First, when we considered a single pregnancy stage, pregnant females indeed showed a significantly greater interaction score than females in other reproductive stages. We have now included this in the manuscript. However, we still find it relevant to test the different stages of pregnancy separately, given the differences in energetic needs between these stages. We have now included the pairwise comparisons in a new table (Table 2).

      Reviewer #3 (Public review):

      Smit and Robbins' manuscript investigates the dynamics of aggression among female groupmates across five gorilla groups. The authors utilize longitudinal data to examine how reproductive state, group size, presence of males, and resource availability influence patterns of aggression and overall dominance rankings as measured by Elo scores. The findings underscore the important role of group composition and reproductive status, particularly pregnancy, in shaping dominance relationships in wild gorillas. While the study addresses a compelling and understudied topic, I have several comments and suggestions that may enhance clarity and improve the reader's experience.

      (1) Clarification of longitudinal data - The manuscript states that 25 years of behavioral data were used, but this number appears unclear. Based on my calculations, the maximum duration of behavioral observation for any one group appears to be 18 years. Specifically:

      • ATA: 6 years

      • BIT: 8 years

      • KYA: 18 years

      • MUK: 6 years

      • ORU: 8 years

      I recommend that the authors clarify how the 25-year duration was derived.

      Indeed none of the five study “groups” has been studied for 25 years in a row. However, MUK emerged from a fission of group KYA in early 2016. So, from the start of group KYA in October 1998 to the end of group MUK in December 2023, there are 25 years and 2 months. We have now rephrased to “...starting in 1998 in one of the mountain gorilla groups” in the introduction, and to “We use a long-term behavioural dataset on five wild groups of the two gorilla species, starting in 1998” in the abstract.

      (2) Consideration of group size - The authors mention that group size was excluded from analyses to avoid conflating the opposing effects of female and male group members. While this is understandable, it may still be beneficial to explore group size effects in supplementary analyses. I suggest reporting statistics related to group size and potentially including a supplementary figure. Additionally, given that the study includes both mountain and western gorillas, it would be helpful to examine whether any interspecies differences are apparent.

      We have now added the suggested extra test: “When we ran our analysis testing for group size (number of weaned individuals in the group), instead of the numbers of females and males, its influence on interaction score was not significant (estimate=-0.001, p-value=0.682).”

      Regarding species differences: In our analysis, we test for species (mountain vs western) and we find no significant differences between the two. This is stated in the results.

      (3) Behavioral measures clarification - Lines 112-116 describe the types of aggressive behaviors observed. It would be helpful to clarify how these behaviors differ from those used to calculate Elo scores, or whether they overlap. A brief explanation would improve transparency regarding the methodology.

      We have now added short explanations in brackets for behaviours that are not self-explanatory. We also added a sentence to the text to clarify the difference from the behaviours used to calculate Elo scores: “These two behaviours [avoidance and displacement] are ritualized, occurring in absence of aggression, they are considered a more reliable proxy of power relationships over aggression, and they are typically used to infer gorilla hierarchical relationships”.

      (4) Aggression rates versus Elo scores - The manuscript uses aggression rates rather than dominance rank (as measured by Elo scores) as the main outcome variable, but there is no explanation of why. How would the results differ if aggression rates were replaced or supplemented with Elo scores? The current justification for prioritizing aggression rates over dominance rank needs to be more clearly supported.

      The sentence we added above (“These two behaviours [avoidance and displacement] are ritualized, occurring in absence of aggression, they are considered a more reliable proxy of power relationships over aggression, and they are typically used to infer gorilla hierarchical relationships”) and the first paragraph of the results hopefully clarify that ritualized agonistic interactions are generally directionally consistent and more reliably capture the highly stable dominance relationships of female gorillas. This approach has been used to calculate dominance rank in gorillas in all studies that have considered it, dating back to the 1970s (namely in studies by Harcourt and Watts). On the other hand, aggression can be context dependent (we now clearly note this at the beginning of the Methods paragraph on aggressive interactions). Therefore, we use Elo-scores inferred from ritualized interactions as the basis and as a reliable proxy of power relationships; we then test whether the direction of aggression within these relationships is also driven by energetic needs or the social environment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank all reviewers for the highly detailed review and the time and effort which has been invested in this review. It is clear from the reviews that we’ve had the privilege to have our work extensively and thoroughly checked by knowledgeable experts, for which we are very grateful. We have read their perspectives, questions and suggested improvements with great interest. We have reflected on the public review in detail and have included detailed responses below. First, we would like to respond to four main issues pointed out by the editor and reviewers:

      (1) Lack of yield data in the manuscript: Yield data have been collected at most of the sites and in most of the years of our study, and these data have already been published and cited in our manuscript. In the appendix of our manuscript, we included a table with yield data for the sites and years in which beetle diversity was studied. These data show that strip cropping does not cause a systematic yield reduction.

      (2) Sampling design clarification: Our paper combines data from trials conducted at different locations and years. On the one hand this allows an analysis of a comprehensive dataset, but on the other hand in some cases this resulted in variations in how data were collected or processed (e.g. taxonomic level of species identification). We have added more details to the sections on sampling design and data analysis to increase clarity and transparency.

      (3) Additional data analysis: In the revised manuscript we present an analysis of the responses of the abundances of the 12 most common ground beetle genera to strip cropping. This gives better insight into the variation in responses among ground beetle taxa.

      (4) Restrict findings to our system: We qualified our findings further and focused more on the implications of our data for ground beetle communities, rather than for agrobiodiversity in a broader sense.

      Below we also respond to the editor and reviewers in more detail.

      Reviewing Editor Comments:

      (1) You only have analyzed ground beetle diversity, it would be important to add data on crop yields, which certainly must be available (note that in normal intercropping these would likely be enhanced as well).

      Most yield data have been published in three previous papers, which we either already cited or now cite (one was not yet published at the time of submission). Our argumentation is based on these studies. We had also already included a table in the appendix that showed the yield data relating specifically to our locations and years of measurement. The finding that strip cropping does not majorly affect yield is based on these findings. We revised the title of our manuscript to remove the explicit focus on yield.

      (2) Considering the heterogeneous data involving different experiments it is particularly important to describe the sampling design in detail and explain how various hierarchical levels were accounted for in the analysis.

      We agree that some important details of our analysis were not described sufficiently. Reviewer 2 in particular pointed out several relevant points that we did account for in our analyses, but which were not clear from the text in the methods section. We are convinced that our data analyses are robust and that our conclusions are supported by the data. We revised the methods section to make our approach clearer and more transparent.

      (3) In addition to relative changes in richness and density of ground beetles you should also present the data from which these have been derived. Furthermore, you could also analyze and interpret the response of the different individual taxa to strip cropping.

      With our heterogeneous dataset it was quite complicated to show overall patterns of absolute changes in ground beetle abundance and richness, especially for the field-level analyses. As the sampling design was not always the same and samples were occasionally missing, the number of year series that made up a data point differed among locations and years. However, for each comparison of a paired monoculture and strip cropping field, we always made the number of year series equal through rarefaction. That is, ground beetle abundance and species richness are always expressed per 2 to 6 samples. Therefore, we prefer to stick to relative changes, as we are convinced that these give a fairer representation of our complex dataset.
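      The pairing and rarefaction step described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' actual pipeline: each sample is assumed to be a dictionary of taxon counts (for example one year series), both fields of a pair are repeatedly subsampled without replacement to the smaller number of samples, and the relative change is assumed to be (strip cropping - monoculture) / monoculture. All function names are hypothetical.

```python
import random


def pooled_stats(samples):
    """Pool a list of {taxon: count} samples; return (richness, abundance)."""
    pooled = {}
    for sample in samples:
        for taxon, count in sample.items():
            pooled[taxon] = pooled.get(taxon, 0) + count
    richness = sum(1 for count in pooled.values() if count > 0)
    abundance = sum(pooled.values())
    return richness, abundance


def rarefied_means(samples, n, iterations=1000, seed=0):
    """Mean richness and abundance over random subsets of n samples."""
    rng = random.Random(seed)
    rich_total = abund_total = 0.0
    for _ in range(iterations):
        richness, abundance = pooled_stats(rng.sample(samples, n))
        rich_total += richness
        abund_total += abundance
    return rich_total / iterations, abund_total / iterations


def relative_change(strip_samples, mono_samples, iterations=1000, seed=0):
    """Assumed definition: (strip - mono) / mono, after rarefying both fields
    to the same number of samples. Assumes the monoculture caught beetles."""
    n = min(len(strip_samples), len(mono_samples))
    s_rich, s_abund = rarefied_means(strip_samples, n, iterations, seed)
    m_rich, m_abund = rarefied_means(mono_samples, n, iterations, seed)
    return (s_rich - m_rich) / m_rich, (s_abund - m_abund) / m_abund


strip = [{"Amara": 2, "Harpalus": 1}, {"Pterostichus": 3}]
mono = [{"Amara": 1}, {"Harpalus": 1}, {"Amara": 2}]
print(relative_change(strip, mono))
```

Rarefying to the smaller sample count keeps the comparison within each monoculture and strip cropping pair on an equal sampling effort, which is the property the response relies on; averaging over many random subsets reduces the dependence on which samples happen to be dropped.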

      We agree with the second point that both the editor and several reviewers raised. The indicator species analyses that we used were biased by rare species, and we now omit this analysis. Instead, we included a GLM analysis of the responses of the abundances of the 12 most common ground beetle genera to strip cropping. We chose genera here (and not species) because we could then include all locations and years in the analyses, and in most cases a genus was dominated by a single species (notable exceptions were Amara and Harpalus, which often comprised several species). We still illustrate these analyses in a similar fashion to the indicator species analysis.

      (4) Keep to your findings and don't overstate them but try to better connect them to basic ecological hypotheses potentially explaining them.

      After careful consideration of the important points raised by the reviewers, we decided to qualify our reasoning about biodiversity conservation along two key lines: (1) the extent to which ground beetles can be indicators of wider biodiversity changes; and (2) the fact that our findings are not as straightforwardly positive as our narrative suggested. We still believe that strip cropping contributes positively to carabid communities, and have carefully checked the text to avoid overstatements.

      Reviewer #1 (Public review):

      Summary:

      This study demonstrates that strip cropping enhances the taxonomic diversity of ground beetles across organically managed crop systems in the Netherlands. In particular, strip cropping supported 15% more ground beetle species and 30% more individuals compared to monocultures.

      Strengths:

      A well-written study with well-analyzed data of a complex design. The data could have been analyzed differently e.g. by not pooling samples, but there are pros and cons for each type of analysis and I am convinced this will not affect the main findings. A strong point is that data were collected for 4 years. This is especially strong as most data on biodiversity in cropping systems are only collected for one or two seasons. Another strong point is that several crops were included.

      We thank reviewer 1 for their kind words and agree with this strength of the paper. The paper combines data from trials conducted at different locations and years. On the one hand this allows an analysis of a comprehensive dataset, but on the other hand in some cases there were slight variations in how data were collected or processed (e.g. taxonomic level of species identification).

      Weaknesses:

      This study focused on the biodiversity of ground beetles and did not examine crop productivity. Therefore, I disagree with the claim that this study demonstrates biodiversity enhancement without compromising yield. The authors should present results on yield or, at the very least, provide a stronger justification for this statement.

      We acknowledge that we indeed did not formally analyze yield in our study, but we have good reason for this. The claim that strip cropping does not compromise yield comes from several extensive studies (Juventia & van Apeldoorn, 2024; Ditzler et al., 2023; Carillo-Reche et al., 2023) that were conducted in nearly all the sites and years that we included in our study. We chose not to include formal analyses of productivity for two key reasons: (1) a yield analysis would duplicate already published analyses, and (2) we prefer to focus more on the ecology of ground beetles and the effect of strip cropping on biodiversity, rather than diverging our focus also towards crop productivity. Nevertheless, we have shown the results on yield in Table S6 and refer extensively to the studies that have previously analyzed this data (line 203-207, 217-221).

      Reviewer #1 (Recommendations for the authors):

      This is a well-written study on the effects of strip cropping on ground beetle diversity. As stated above, the study is well analyzed, presented, and written, but you should not suggest that you analyzed yield, e.g. lines 25-27: "We show that strip cropping...enhance ground beetle biodiversity without incurring major yield loss."

      We understand the confusion caused by this sentence, and it was never our intention to give the impression that we analyzed yield losses. These findings were based on previous research by ourselves and colleagues, and we have now changed the sentence to reflect this (line 25-27).

      I think you assume that yield does not differ between strip cropping and monoculture. I am not sure this is correct, as one crop might attract pests or predators that spill over to the other crop. I am also not sure whether sowing and harvesting the crops come with the same costs. So if you make this assumption, you should state and justify it in the main manuscript rather than in the abstract.

      With three peer-reviewed papers on the same fields as we studied, we can convincingly state that strip cropping in organic agriculture generally does not result in major yield loss, although exceptions exist, which we refer to in the discussion.

      In the introduction (lines 28-43), you refer to insect biomass decline. You might consider adding the study by Loboda et al. 2017 in Ecography. It may not seem to fit, as it is from the Arctic, but the other studies you cite are also not only from agricultural landscapes, and this study is from the same time as the Hallmann et al. 2017 study and shows an 80% decline in flies.

      We have removed the sentence that this comment refers to, to streamline the introduction more.

      Lines 50-51. You only have one citation for biodiversity strategies in agricultural systems. I suggest citing Mupepele et al. 2021 in TREE. This study refers to management but also the policies and societal pressures behind it.

      We have added this citation and a recent paper by Cozim-Melges et al. (2024) here (line 49-52).

      In the methods, I am missing a section on species identifications. This would help to understand why you used "taxonomic richness".

      Thanks for pointing this out. We have now included a new section on ground beetle identification (line 304-309 in methods).

      Figure 1 is great and I like that you separated the field- and crop-level data, although there is no statistical power for the crop-specific data. I personally would move panel k to the supplement. It is very detailed and small, and therefore hard to read.

      We chose to keep figure 1k, as in our view it gives a good impression of the scale of the experiment, the number of crops included and the absolute numbers of caught species.

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to investigate the effects of organic strip cropping on carabid richness and density as well as on crop yields. They find on average higher carabid richness and density in strip cropping and organic farming, but not in all cases.

      We did not intend to investigate the effect of strip cropping on crop yields, but rather place our work in the framework of earlier studies that already studied yield. All the monocultures and strip cropping fields were organic farms. Our findings thus compare crop diversity effects within the context of organic farming.

      Strengths:

      Based on highly resolved species-level carabid data, the authors present estimates for many different crop types, some of them rarely studied, at the same time. The authors did a great job investigating different aspects of the assemblages (although some questions remain concerning the analyses) and they present their results in a visually pleasing and intuitive way.

      We appreciate the kind words of reviewer 2 and their acknowledgement of the extensiveness of our dataset. In our opinion, the inclusion of many different crops is indeed a strength, rarely seen in similar studies; and we are happy that the figures are appreciated.

      Weaknesses:

      The authors used data from four different strip cropping experiments and there is no real replication in space as all of these differed in many aspects (different crops, different areas between years, different combinations, design of the strip cropping (orientation and width), sampling effort and sample sizes of beetles (differing more than 35 fold between sites; L 100f); for more differences see L 237ff). The reader gets the impression that the authors stitched data from various places together that were not made to fit together. This may not be a problem per se but it surely limits the strength of the data as results for various crops may only be based on small samples from one or two sites (it is generally unclear how many samples were used for each crop/crop combination).

      The paper indeed combines data from trials conducted at different locations and years. On the one hand this allows an analysis of a comprehensive dataset, but on the other hand in some cases there were slight differences in the experimental design. At the time that we did our research, there were only a handful of farmers employing strip cropping in the Netherlands, which greatly reduced the number of fields available for our study. Therefore, we worked at the sites that were available and studied as many crops as possible at these sites. Since the crops grown varied between sites, replication is limited for some crops. In the revision we have explained this more clearly (line 297-300).

      One of my major concerns is that it is completely unclear where carabids were collected. As some strips were 3m wide, some others were 6m and the monoculture plots large, it can be expected that carabids were collected at different distances from the plot edge. This alone, however, was conclusively shown to affect carabid assemblages dramatically and could easily outweigh the differences shown here if not accounted for in the models (see e.g. Boetzl et al. (2024) or Knapp et al. (2019) among many other studies on within field-distributions of carabids).

      Point well taken. Samples were always taken at least 10 meters into the field, and always in the middle of the strip. This does mean that there is a small difference between the 3-m and 6-m wide strips in the distance to the adjacent strip, but this difference was only 1.5 versus 3 meters from the edge. Based on our own extensive experience with ground beetle communities, we do not expect a difference of this size to have a large impact on ground beetle catches. The distance from field/plot edges was similar between monocultures and strip cropped fields. We present a more detailed description of the sampling design in the methods of the revised manuscript (line 294-297).

      The authors hint at a related but somewhat different problem in L 137ff - carabid assemblages sampled in strips were sampled in closer proximity to each other than assemblages in monoculture fields which is very likely a problem. The authors did not check whether their results are spatially autocorrelated and this shortcoming is hard to account for as it would have required a much bigger, spatially replicated design in which distances are maintained from the beginning. This limitation needs to be stated more clearly in the manuscript.

      To be clear, this limitation relates to the comparison of the ground beetle community composition in two crops grown either in strip cropping or in monocultures. In this case, it was impossible to avoid potential autocorrelation due to our field design. We also acknowledge this limitation in the results section (line 130-133). However, for our other analyses we corrected for spatial autocorrelation by including variables for location, year and crop, which grouped samples that were spatially autocorrelated. Therefore, we do not consider this a limitation of our other analyses.

      Similarly, we know that carabid richness and density depend strongly on crop type (see e.g. Toivonen et al. (2022)) which could have biased results if the design is not balanced (this information is missing but it seems to be the case, see e.g. Celeriac in Almere in 2022).

      We agree and acknowledge that crop type can influence carabid richness and density, which is why we have included variables to account for differences caused by crops. However, we did not observe consistent differences between crops in how strip cropping affected ground beetle richness and density. Therefore, we don’t think that crop types would have influenced our conclusions on the overall effect of strip cropping.

      A more basic problem is that the reader learns neither where traps were located, how missing traps were treated in the analyses, how many samples there were per crop or crop combination (in a simple way, not through Table S7; there must have been a logic in each of these field trials), nor why there are differences in the number of samples from the same location and year (see Table S7). This information needs to be added to the methods section.

      Point well taken. We have clarified this further in the revised manuscript (line 294-301, 318-322). As we combined data from several experimental designs that originally addressed slightly different research questions, this partly explains the differences in the numbers of sampling rounds or samples per crop, location or year.

      As carabid assemblages undergo rapid phenological changes across the year, assemblages that are collected at different phenological points within and across years cannot easily be compared. The authors would need to standardize for this and make sure that the assemblages they analyze are comparable prior to analyses. Otherwise, I see the possibility that the reported differences might simply be biased by phenology.

      We agree and we dealt with this issue by using year series instead of using individual samples of different rounds. This approach allowed us to get a good impression of the entire ground beetle community across seasons. For our analyses we had the choice to only include data from sampling rounds that were conducted at the same time, or to include all available data. We chose to analyze all data, and made sure that the number of samples between strip cropping and monoculture fields per location, year and crop was always the same by pooling and rarefaction.

      Surrounding landscape structure is known to affect carabid richness and density and could thus also bias observed differences between treatments at the same locations (lower overall richness => lower differences between treatments). Landscape structure has not been taken into account in any way.

      We did not include landscape structure as there are only 4 sites, which does not allow a meaningful analysis of the potential effects of landscape structure. Studying how landscape interacts with strip cropping to influence insect biodiversity would require at least 15 to 20 sites, which was not feasible for this study. However, such an analysis may be possible in an ongoing project (CropMix) that includes many farms working with strip cropping.

      In the statistical analyses, it is unclear whether the authors used estimated marginal means (as they should) - this needs to be clarified.

      In the revised manuscript we further clarified this point (line 365-366, 373-374).

      In addition, and as mentioned by Dr. Rasmann in the previous round (comment 1), the manuscript, in its current form, still suffers from simplified generalizations that 'oversell' the impact of the study and should be avoided. The authors restricted their analyses to ground beetles and based their conclusions on a design with many 'heterogeneities' - they should not draw conclusions for farmland biodiversity but stick to their system and report what they found. Although I understand the authors have previously stated that this is 'not practically feasible', the reason for this comment is simply to say that the authors should not oversell their findings.

      In the revised manuscript, we qualified our findings by explaining that strip cropping is a potentially useful tool to support ground beetle biodiversity in agricultural fields (line 33-35).

      Reviewer #2 (Recommendations for the authors):

      In addition to the points stated under 'Weaknesses' above, I provide smaller comments and recommendations:

      Overall comments:

      (i) The carabid images used in the figures were created by Ortwin Bleich and are copyrighted. I could not find him accredited in the acknowledgements; the figure legends simply state that the images were taken from his webpage. Was his permission obtained? This should be stated.

      We have received written permission from Ortwin Bleich for using his pictures in our figures, and have accredited him for this in the acknowledgements (line 455-456).

      (ii) There is a great confusion in the field concerning terminology. The authors here use intercropping and strip cropping, a specific form of intercropping, interchangeably. I advise the authors to stick to strip cropping as it is more precise and avoids confusion with other forms of intercropping.

      We agree with the definitions given by reviewer 2 and had already used them as such in the text. We defined strip cropping in the first paragraph of the introduction and do not use the term “intercropping” after this definition to avoid confusion.

      Comments to specific lines:

      Line 19: While this is likely true, there is so far not enough compelling evidence for such a strong statement blaming agriculture. Please rephrase.

      We changed the sentence to indicate more clearly that agriculture is one of the major drivers, but that the “blame” does not lie solely with agriculture (line 18-19).

      Line 22: Is this the case? I am aware of strip cropping being used in other countries, many of them in Europe. Why the focus on 'Dutch'?

      Indeed, strip cropping is now being pioneered by farmers throughout Europe. However, some farmers in the Netherlands have been pioneering strip cropping since 2014. We have added this information to indicate that our setting is in the Netherlands, and because in our opinion it gives a bit more context to our manuscript.

      Line 24: I would argue that carabids are actually not good indicators for overall biodiversity in crop fields as they respond in a very specific way, contrasting with other taxa. It is commonly observed that carabids prefer more disturbed habitats and richness often increases with management intensity and in more agriculturally dominated landscapes - in stark contrast to other taxa like wild bees or butterflies.

      We have reworded this sentence to reflect that they are not necessarily indicators of wide agricultural biodiversity, but that they do hold keystone positions within food webs in agricultural systems (line 23-25).

      Line 31: This statement here is also too strong - carabids are not overall biodiversity and patterns found for carabids likely differ strongly from patterns that would be observed in other taxa. This study is on carabids and the conclusion should thus also refer to these in order to avoid such over-simplified generalizations.

      We agree and have nuanced this sentence to indicate that our findings are only on ground beetles (line 33-35). However, we would like to point out that the statement that “patterns found for carabids likely differ strongly from patterns that would be observed in other taxa” assumes a dissociation between carabids and other taxa.

      Line 41: I am sure the authors are aware of the various methodological shortcomings of the dataset used in Hallmann et al. (2017) which likely led to an overestimation of the actual decline. Analysing the same data, Müller et al. (2023) found that weather can explain fluctuations in biomass just as well as time. I thus advise not putting too much focus on these results here as they seem questionable.

      We have removed this sentence to streamline the introduction, thus no longer mentioning the percentages given by Hallmann et al. (2017).

      Line 46: Surely likely but to my knowledge this is actually remarkably hard to prove. Instead of using the IPBES report here that simply states this as a fact, it would be better to see some actual evidence referenced.

      We removed IPBES as a source and changed this for Dirzo et al. (2014), a review that shows the consequences of biodiversity decline on a range of different ecosystem services and ecological functions (line 45-47).

      Line 52ff: I am not sure whether this old land-sparing vs. land-sharing debate is necessary here. The authors could simply skip it and directly refer to the need of agricultural areas, the dominating land-use in many regions, to become more biodiversity-friendly. It can be linked directly to Line 61 in my opinion which would result in a more concise and arguably stronger introduction.

      After reconsidering, we agree with reviewer 2 that this section was redundant and we have removed the lines on land-sparing vs land-sharing.

      Line 59: Just a note here: this argument is not meaningful when talking about strip cropping in the Netherlands as there is virtually no land left that could be converted (if anything, agricultural land is lost to construction). The debate on land-use change towards agriculture is nowadays mostly focused on the tropics and the Global South.

      We argue that, beyond the Dutch context, strip cropping could play an important role as a measure that does not necessarily involve a trade-off between biodiversity and agriculture (line 52-58).

      Line 69: Does this statement really need 8 references?

      Line 71: ... and this one 5 additional ones?

      We have removed excess references in these two lines (line 62-66).

      Line 74: But also likely provides the necessary crop continuity for many crop pests - the authors should keep in mind that when practitioners read agricultural biodiversity, they predominantly think of weeds and insect pests.

      We agree with reviewer 2 that agricultural biodiversity is still a controversial topic. However, as the focus in this manuscript is more on biodiversity conservation, rather than pest management, we prefer to keep this sentence as is. In other published papers and future work we focus more on the role of strip cropping for pest management.

      Line 83: Consider replacing 'moments' maybe - phenological stages or development stages?

      Although we understand the point of reviewer 2, we prefer to keep “moments”, as we did not focus on phenological stages; we only wanted to say that we set pitfall traps at several moments throughout the year. Nonetheless, by placing the pitfall traps at several moments throughout the year, we did capture several phenological stages.

      Line 86: Not only farming practices - there are also massive fluctuations between years in the same crop with the same management due to effects of the weather in the previous reproductive season. Interpreting carabid assemblage changes is therefore not straightforward.

      We absolutely agree that interpreting carabid assemblage is not straightforward, but as we did not study year or crop legacy effects we chose to keep this sentence to maintain focus on our research goals.

      Line 88: 'ecolocal'?

      Typo, should have been ecological. Changed (line 81).

      Line 90: 'As such, they are often used as indicator group for wider insect diversity in agroecosystems' - this is the third repetition of this statement and the second one in this paragraph - please remove. Having worked on carabids extensively myself, I also think that this is not the true reason - they are simply easy to collect passively.

      We agree with the reviewer and have removed this sentence.

      Line 141: I have doubts about the value of the ISA looking at the results. Anchomenus dorsalis is a species extremely common in cereal monoculture fields in large parts of Europe, especially in warmer and drier conditions (H. griseus was likely only returned as it is generally rare and likely only occurred in few plots that, by chance, were strip-cropped). It can hardly be considered an indicator for diverse cropping systems but it was returned as one here (which I do not doubt). This often happens with ISA in my experience as they are very sensitive to the specific context of the data they are run on. The returned species are, however, often not really useable as indicators in other contexts. I thus believe they actually have very limited value. Apart from this, we see here that both monocultures and strip cropping have their indicators, as would likely all crop types. I wonder what message we would draw from this ...

      On close reconsideration, we agree with the reviewer that the ISAs might have been too sensitive to rare species that by chance occur in one of two crop configurations. To still get an idea on what happens with specific ground beetle groups, we chose to replace the ISAs with analyses on the 12 most common ground beetle genera. For this purpose we have added new sections to the methods (line 368-374) and results (line 135-143), replaced figure 2 and table S5, and updated the discussion (line 182-200).

      Line 165: Carabid activity is high when carabids are more active. Carabids can be more active either when (i) there are simply more carabid individuals or /and (ii) when they are starved and need to search more for prey. More carabid activity does thus not necessarily indicate more individuals, it can indicate that there is less prey. This aspect is missing here and should be discussed. It is also not true that crop diversification always increases prey biomass - especially strip cropping has previously been shown to decrease pest densities (Alarcón-Segura et al., 2022). Of course, this is a chicken-egg problem (less pests => less carabids or more carabids => less pests ?) ... this should at least be discussed.

      We have rewritten this paragraph to further discuss activity density in relation to food availability (line 175-185).

      Line 178: These species are not exclusively granivorous - this speculation may be too strong here.

      Line 185: true for all but C. melanocephalus - this species is usually more associated with hedgerows, forests etc.

      After removing the ISAs, we also chose to remove this paragraph and replace it with a paragraph that is linked to the analyses on the 12 most common genera (line 182-200).

      Line 202: These statements are too strong for my taste - the authors should add an 'on average' here. The data show that they likely do not always enhance richness by 15 % and as the authors state, some monocultures still had higher richness and densities.

      “on average” added (line 211)

      Line 203: 'can lead' - the authors cannot tell based on their results if this is always true for all taxa.

      Changed to “can lead” (line 213)

      Line 205: What is 'diversification' here?

      This concerns measures like hedgerows or flower strips. We altered the sentence to make this clearer (line 215-216).

      Line 208: Does this statement need 5 references? (as in the introduction, the reader gets the impression the authors aimed to increase the citation count of other articles here).

      We have removed excess references (line 219-221).

      Line 222: How many are 'a few'? Maybe state a proportion.

      We only found two species, we’ve changed the sentence accordingly (line 232-233).

      Line 224: As stated above, I would not overstress the results of the ISAs - the authors stated themselves that the result for A. dorsalis is likely only based on one site ...

      We removed this sentence after removing the ISAs.

      Line 305: I think there is an additional nested random level missing - the transect or individual plot the traps were located in (or was there only one replicate for each crop/strip in each experiment)? Hard to tell as the authors provide no information on the actual sample sizes.

      Indeed, there was one field or plot per cropping system per crop per location per year from which all the samples were taken. Therefore the analysis does not miss a nested random level. We provided information on sample sizes in Table S7.

      Line 314ff: The authors describe that they basically followed a (slightly extended) Chao-Hill approach (species richness, Shannon entropy & inverse Simpson) without the sampling effort / sample completeness standardization implemented in this approach and as a reader I wonder why they did not simply just use the customary Chao-Hill approach.

      We were not aware of the Chao-Hill approach, and we see it as a compliment that we independently came up with an approach similar to a now accepted approach.
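      For readers unfamiliar with the approach the reviewer refers to, species richness, Shannon diversity and inverse Simpson can all be expressed as Hill numbers of order q = 0, 1 and 2. The sketch below is a minimal illustration only, assuming a single vector of species counts; it does not implement the sample-completeness standardization of the full Chao framework, and the function name `hill_number` and the example catch are hypothetical. Note that the order-1 Hill number is the exponential of the Shannon entropy rather than the entropy itself.

```python
import math
from typing import Sequence


def hill_number(counts: Sequence[int], q: float) -> float:
    """Hill number (effective number of species) of order q for one sample.

    q = 0 -> species richness
    q = 1 -> exp(Shannon entropy), the limit of the general formula as q -> 1
    q = 2 -> inverse Simpson index
    """
    total = sum(counts)
    props = [c / total for c in counts if c > 0]  # absent species are ignored
    if q == 0:
        return float(len(props))
    if q == 1:
        shannon = -sum(p * math.log(p) for p in props)
        return math.exp(shannon)
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))


# Hypothetical pitfall catch dominated by one very active species
catch = [40, 5, 3, 1, 1]
for q in (0, 1, 2):
    print(q, round(hill_number(catch, q), 2))
```

      Because higher orders of q weight abundant species more strongly, the three values decrease from richness to inverse Simpson whenever a catch is dominated by a few species.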

      Line 329: Unclear what was nested in what here - location / year / crop or year / location / crop ?

      For the crop-level analyses, the nested structure was location > year > crop. This nested structure was chosen as every location was sampled across different years and (for some locations) the crops differed among years. However, as we pooled the samples from the same field in the field-level analyses, using the same random structure would have resulted in each individual sampling unit being distinguished as a group. Therefore, the random structure here was only location > year. We explain this now more clearly in lines 329 and 355-357.

      Line 334: I can see why the authors used these distributions but it is presented here without any justification. As a side note: Gamma (with log link) would likely be better for the Shannon model as well (I guess it cannot be 0 or negative ...).

      We explain this now better in lines 360-364.

      Line 341: Why Hellinger and not simply proportions?

      We used Hellinger transformation to give more weight to rarer species. Our pitfall traps were often dominated by large numbers of a few very abundant / active species. If we had used proportions, these species would have dominated the community analyses. We clarified this in the text (line 379-381).
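      The transformation divides each count by the total catch of its sample and takes the square root. A minimal sketch, assuming a plain sites × species table of counts (the function name `hellinger` and the example sample are hypothetical); in R, `vegan::decostand(x, method = "hellinger")` performs the same operation. Relative to proportions, the square root compresses the gap between a dominant species and rare ones: in a sample of 96, 2 and 2 individuals, the dominant-to-rare ratio drops from 48 to about 6.9.

```python
import math
from typing import List


def hellinger(table: List[List[float]]) -> List[List[float]]:
    """Hellinger-transform a sites x species table of counts.

    Each count is divided by the total of its row (one pitfall sample) and
    square-rooted; every non-empty row then has unit Euclidean length.
    """
    transformed = []
    for row in table:
        total = sum(row)
        if total == 0:
            # An empty trap has no composition; keep it as zeros.
            transformed.append([0.0] * len(row))
        else:
            transformed.append([math.sqrt(x / total) for x in row])
    return transformed


# Hypothetical sample dominated by one very active species
sample = [[96, 2, 2]]
props = [x / sum(sample[0]) for x in sample[0]]
hell = hellinger(sample)[0]
print("proportions:", props)
print("hellinger:  ", [round(v, 3) for v in hell])
```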

      Line 348: An RDA is constrained by the assumptions / model the authors proposed and "forces" the data into a spatial ordination that resembles this model best. As the authors previously used an unconstrained PERMANOVA, it would be better to also use an NMDS that goes along with the PERMANOVA.

      The initial goal of the RDA was not to directly visualize the results of the PERMANOVA, but to show whether an overall crop configuration effect occurred, both for the whole dataset and per location. We have now added NMDS figures to link them to the PERMANOVA and added these to the supplementary figures (fig S6-S8). We also mention this approach in the methods section (line 387-390).

      Line 355f: This is also a clear indication of the strong annual fluctuations in carabid assemblages as mentioned above.

      Indeed.

      Line 361: 'pairwise'.

      Typo, we changed this.

      Line 362: reference missing.

      Reference added (line 405)

      References

      Alarcón-Segura, V., Grass, I., Breustedt, G., Rohlfs, M., Tscharntke, T., 2022. Strip intercropping of wheat and oilseed rape enhances biodiversity and biological pest control in a conventionally managed farm scenario. J. Appl. Ecol. 59, 1513-1523.

      Boetzl, F.A., Sponsler, D., Albrecht, M., Batáry, P., Birkhofer, K., Knapp, M., Krauss, J., Maas, B., Martin, E.A., Sirami, C., Sutter, L., Bertrand, C., Baillod, A.B., Bota, G., Bretagnolle, V., Brotons, L., Frank, T., Fusser, M., Giralt, D., González, E., Hof, A.R., Luka, H., Marrec, R., Nash, M.A., Ng, K., Plantegenest, M., Poulin, B., Siriwardena, G.M., Tscharntke, T., Tschumi, M., Vialatte, A., Van Vooren, L., Zubair-Anjum, M., Entling, M.H., Steffan-Dewenter, I., Schirmel, J., 2024. Distance functions of carabids in crop fields depend on functional traits, crop type and adjacent habitat: a synthesis. Proceedings of the Royal Society B: Biological Sciences 291, 20232383.

      Hallmann, C.A., Sorg, M., Jongejans, E., Siepel, H., Hofland, N., Schwan, H., Stenmans, W., Müller, A., Sumser, H., Hörren, T., Goulson, D., de Kroon, H., 2017. More than 75 percent decline over 27 years in total flying insect biomass in protected areas. PLoS One 12, e0185809.

      Knapp, M., Seidl, M., Knappová, J., Macek, M., Saska, P., 2019. Temporal changes in the spatial distribution of carabid beetles around arable field-woodlot boundaries. Scientific Reports 9, 8967.

      Müller, J., Hothorn, T., Yuan, Y., Seibold, S., Mitesser, O., Rothacher, J., Freund, J., Wild, C., Wolz, M., Menzel, A., 2023. Weather explains the decline and rise of insect biomass over 34 years. Nature.

      Toivonen, M., Huusela, E., Hyvönen, T., Marjamäki, P., Järvinen, A., Kuussaari, M., 2022. Effects of crop type and production method on arable biodiversity in boreal farmland. Agriculture, Ecosystems & Environment 337, 108061.

      Reviewer #3 (Public review):

      Summary:

      In this paper, the authors made a sincere effort to show the effects of strip cropping, a technique of alternating crops in small strips of several meters wide, on ground beetle diversity. They state that strip cropping can be a useful tool for bending the curve of biodiversity loss in agricultural systems as strip cropping shows a relative increase in species diversity (i.e. abundance and species richness) of the ground beetle communities compared to monocultures. Moreover, strip cropping has the added advantage of not having to compromise on agricultural yields.

      Strengths:

      The article is well written; it has an easily readable tone of voice without too much jargon or overly complicated sentence structure. Moreover, as far as reviewing the models in depth without raw data and R scripts allows, the statistical work done by the authors looks good. They have well thought out how to handle heterogeneous, yet spatially and temporally correlated field data. The models applied and the model checks performed are appropriate for the data at hand. Combining RDA and PCA axes together is a nice touch.

      We thank reviewer 3 for their kind words and appreciation for the simple language and analysis that we used.

      Weaknesses:

      The evidence for strip cropping bringing added value for biodiversity is mixed at best. Yes, there is an increase in relative abundance and species richness at the field level, but it is not convincingly shown this difference is robust or can be linked to clear structural and hypothesised advantages of the strip cropping system. The same results could have been used to conclude that there are only very limited signs of real added value of strip cropping compared to monocultures.

      Point well taken. We agree that the effects of strip cropping on carabid beetle communities are subtle, and we have nuanced the text in the revised version to reflect this. See below for more details on how we revised the manuscript to reflect this point.

      There are a number of reasons for this:

      (1) Significant differences disappear at crop level, as the authors themselves clearly acknowledge, meaning that there are no differences between pairs of similar crops in the strip cropping fields and their respective monoculture. This would mean the strips effectively function as "mini-monocultures".

      This is indeed in line with our conclusions. Based on our data and results, the advantages of strip cropping seem mostly to occur because crops with different communities are now on the same field, rather than that within the strips you get mixtures of communities related to different crops. We discussed this in the first paragraph of the discussion in the original submission (line 161-164).

      The significant relative differences at the field level could be an artifact of aggregation instead of structural differences between strip cropping and monocultures; with enough data points things tend to get significant despite large variance. This should have been elaborated further upon by the authors with additional analyses, designed to find out where differences originate and what it tells about the functioning of the system. Or it should have provided ample reason for cautioning in drawing conclusions about the supposed effectiveness of strip cropping based on these findings.

      We believe that this is a misunderstanding of our approach. In the field-level analyses we pooled samples from the same field (i.e. pseudo-replicates were pooled), resulting in a relatively small sample size of 50 samples. We revised the methods section to better explain this (line 318-322). Therefore, the statement “with enough data points things tend to get significant” is not applicable here.

      (2) The authors report percentages calculated as relative change of species richness and abundance in strip cropping compared to monocultures after rarefaction. This is in itself correct, however, it can be rather tricky to interpret because the perspective on actual species richness and abundance in the fields and treatments is completely lost; the reported percentages are dimensionless. The authors could have provided the average cumulative number of species and abundance after rarefaction. Also, range and/or standard error would have been useful to provide information as to the scale of differences between treatments. This could provide a new perspective on the magnitude of differences between the two treatments which a dimensionless percentage cannot.

      We agree that this would be the preferred approach if we had had a perfectly balanced dataset. However, this approach is not feasible with our unbalanced design and differences in sampling effort. While we acknowledge that percentages are harder to interpret, they do allow us to report relative changes for each combination of location, year and crop. The number of samples on which the percentages were based was always kept equal (through rarefaction) between the cropping systems within each combination of location, year and crop, but not among crops, years and locations. This approach allowed us to make a better estimate whenever more samples were available, as we did not always have an equal number of samples for both cropping systems. For example, sometimes we had 2 samples from a strip-cropped field and 6 from the monoculture; here we would rarefy to 2 samples (in which case the estimate for the monoculture benefits from the additional samples). In other cases, we had 4 samples in both strip-cropped and monoculture fields, and we rarefied to 4 samples to get a better estimate altogether. Adding a value for actual richness or abundance to the figures would have distorted these findings, as the variation would be huge (the value would represent the number of ground beetle species per 2 to 6 pitfall samples). Furthermore, the dimension that reviewer 3 describes would thus be “the number of ground beetle species / individuals per 2 to 6 samples”, which is not a very informative unit either.
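      The following is a minimal illustrative sketch, not the exact code used for the manuscript. It assumes exact sample-based rarefaction, averaging richness or total catch over every subset of k pitfall samples, with k set to the smaller of the two sample counts for a given location, year and crop; all function names, species labels and counts are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List

# One pitfall sample: species label -> number of individuals caught
Sample = Dict[str, int]


def rarefied_richness(samples: List[Sample], k: int) -> float:
    """Mean species count over all subsets of k samples (exact rarefaction)."""
    return float(mean(
        len({sp for s in subset for sp, n in s.items() if n > 0})
        for subset in combinations(samples, k)
    ))


def rarefied_abundance(samples: List[Sample], k: int) -> float:
    """Mean total catch (activity density) over all subsets of k samples."""
    return float(mean(
        sum(sum(s.values()) for s in subset)
        for subset in combinations(samples, k)
    ))


def relative_change(strip: float, mono: float) -> float:
    """Percentage change of strip cropping relative to the monoculture."""
    return 100.0 * (strip - mono) / mono


# Hypothetical location x year x crop: 2 strip samples vs 3 monoculture samples
strip = [{"species_a": 4, "species_b": 1}, {"species_c": 3}]
mono = [{"species_a": 2}, {"species_c": 1}, {"species_a": 1, "species_c": 1}]
k = min(len(strip), len(mono))  # rarefy both systems to the same effort
print(relative_change(rarefied_richness(strip, k), rarefied_richness(mono, k)))
print(relative_change(rarefied_abundance(strip, k), rarefied_abundance(mono, k)))
```

      Enumerating all subsets is only practical for the small numbers of samples per field described here (2 to 6); with many samples, random subsampling would be the usual substitute.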

      (3) The authors appear to not have modelled the abundance of any of the dominant ground beetle species themselves. Therefore it becomes impossible to assess which important species are responsible (if any) for the differences found in activity density between strip cropping and monocultures and the possible life history traits related reasons for the differences, or lack thereof, that are found. A big advantage of using ground beetles is that many life history traits are well studied and these should be used whenever there is reason, as there clearly is in this case. Moreover, it is unclear which species are responsible for the difference in species richness found at the field level. Are these dominant species or singletons? Do the strip cropping fields contain species that are absent in the monoculture fields and are not the cause of random variation or sampling? Unfortunately, the authors do not report on any of these details of the communities that were found, which makes the results much less robust.

      Thank you for raising this point. We have reconsidered our indicator species analysis and found that it is rather sensitive to rare species and insensitive to changes in common species. Therefore, we have replaced the indicator species analyses with a GLM analysis of the 12 most common genera of ground beetles in the revised manuscript. This allows us to go more in depth on specific traits of the genera whose abundances change depending on the cropping system. In the revised manuscript, we also discuss these common genera more in depth, rather than focusing on rarer species (line 135-143; line 182-200 in the discussion). Furthermore, we have added information on rarity and habitat preference to the table that shows species abundances per location (Table S2), and mention these aspects briefly in the results (line 145-153).

      (4) In the discussion they conclude that there is only a limited amount of interstrip movement by ground beetles. Otherwise, the results of the crop-level statistical tests would have shown significant deviation from corresponding monocultures. This is a clear indication that the strips function more like mini-monocultures instead of being more than the sum of its parts.

      This is in line with our point in the first paragraph of the discussion and an important message of our manuscript.

      (5) The RDA results show a modelled variable of differences in community composition between strip cropping and monoculture. Percentages of explained variation of the first RDA axis are extremely low, and even then, the effect of location and/or year appear to peak through (Figure S3), even though these are not part of the modelling. Moreover, there is no indication of clustering of strip cropping on the RDA axis, or in fact on the first principal component axis in the larger RDA models. This means the explanatory power of different treatments is also extremely low. The crop level RDA's show some clustering, but hardly any consistent pattern in either communities of crops or species correlations, indicating that differences between strip cropping and monocultures are very small.

      We agree and we make a similar point in the first paragraph of the discussion (line 160-162).

      Furthermore, there are a number of additional weaknesses in the paper that should be addressed:

      The introduction lacks focus on the issues at hand. Too much space is taken up by facts on insect decline and land sharing vs. land sparing and not enough attention is spent on the scientific discussion underlying the statements made about crop diversification as a restoration strategy. They are simply stated as facts or as hypotheses with many references that are not mentioned or linked to in the text. An explicit link to the results found in the large number of references should be provided.

      We revised the introduction by omitting the land sharing vs. land sparing topic and better linking references to our research findings.

      The mechanistic understanding of strip cropping is what is at stake here. Does strip cropping behave similarly to intercropping, a technique that has been proven to be beneficial to biodiversity because of added effects due to increased resource efficiency and greater plant species richness? This should be the main testing point and agenda of strip cropping. Do the biodiversity benefits that have been shown for intercropping also work in strip cropping fields? The ground beetles are one way to test this. Hypotheses should originate from this and should be stated clearly and mechanistically.

      We agree with the reviewer and have clarified this research direction in the introduction of the revised manuscript (line 66-72).

      One could question how useful indicator species analysis (ISA) is for a study in which predominantly highly eurytopic species are found. These are by definition uncritical of their habitat. Is there any mechanistic hypothesis underlying a suspected difference to be found in preferences for either strip cropping or monocultures of the species that were expected to be caught? In other words, did the authors have any a priori reasons to suspect differences, or has this been an exploratory exercise from which unexplained significant results should be used with great caution?

      Point well taken. We agree that the indicator species analysis has limitations and therefore now replaced this with GLM analysis for the 12 most common ground beetle genera.

      However, setting these objections aside there are in fact significant results with strong species associations both with monocultures and strip cropping. Unfortunately, the authors do not dig deeper into the patterns found a posteriori either. Why would some species associate so strongly with strip cropping? Do these species show a pattern of pitfall catches that deviate from other species, in that they are found in a wide range of strips with different crops in one strip cropping field and therefore may benefit from an increased abundance of food or shelter? Also, why would so many species associate with monocultures? Is this in any way logical? Could it be an artifact of the data instead of a meaningful pattern? Unfortunately, the authors do not progress along these lines in the methods and discussion at all.

      We thank reviewer 3 for these valuable perspectives. In the revised manuscript, we further explore the species and genera that respond to cropping systems and discuss these findings in more detail (line 182-200 in the discussion).

      A second question raised in the introduction is whether the arable fields that form part of this study contain rare species. Unfortunately, the authors do not elaborate further on this. Do they expect rare species to be more prevalent in the strip cropping fields? Why? Has it been shown elsewhere that intercropping provides room for additional rare species?

      The answer is simply no: we did not find more rare species in strip cropping. In the revised manuscript, we added a column for rarity (according to waarneming.nl) to the table showing abundances of species per location (table S2). We found only two rare species: of one we found only a single individual, and the other was more related to the open habitat created by a failed wheat field. We discuss this in more depth in the revised results (line 145-153).

      Considering the implications the results of this research can have on the wider discussion of bending the curve and the effects of agroecological measures, bold claims should be made with extreme restraint and be based on extensive proof and robust findings. I am not convinced by the evidence provided in this article that the claim made by the authors that strip cropping is a useful tool for bending the curve of biodiversity loss is warranted.

      We believe that strip cropping can be a useful tool because farmers readily adopt it and it can result in modest biodiversity gains without yield loss. However, strip cropping is indeed not a silver bullet (which we also don’t claim). We nuanced the implications of our study in the revised manuscript (line 30-35, 232-237).

      Reviewer #3 (Recommendations for the authors):

      General comments:

      (1) I am missing the R script and data files in the manuscript. This is a serious drawback in assessing the quality of the work.

      Datasets and R scripts will be made available upon completion of the manuscript.

      (2) I have doubts about the clarity of the title. It more or less states that strip cropping is designed in order to maintain productivity. However, the main objective of strip cropping is to achieve ecological goals without losing productivity. I suggest a rethink of the title and what it is the authors want to convey.

      As the title led to false expectations for multiple reviewers regarding analyses on yield, we chose to alter the title and removed any mention of yield from it.

      (3) Line 22: I would add something along the lines of: "As an alternative to intercropping, strip cropping is pioneered by Dutch farmers... " This makes the distinction and the connection between the two clearer.

      In our opinion, strip cropping is a form of intercropping. We have changed this sentence to reflect this point better. (line 21-22)

      (4) Line 24: "these" should read "they"

      After changing this sentence, this typo is no longer there (line 24).

      (5) Line 34-48. I think this introduction is too long. The paper is not directly about insect decline, so the authors could consider starting with line 43 and summarising 34-42 in one or two sentences.

      Removed a sentence on insect declines here to make the introduction more streamlined.

      (6) Line 51-59. I am not convinced the land sparing - land sharing idea adds anything to the paper. It is not used in the discussion and solicits much discussion in and of itself unnecessary in this paper. The point the authors want to make is not arable fields compared to natural biodiversity, but with increases in biodiversity in an already heavily degraded ecosystem; intensive agriculture. I think the introduction should focus on that narrative, instead of the land sparing-sharing dichotomy, especially because too little attention is spent on this narrative.

      We removed the section on land-sparing vs land-sharing as it was indeed off-topic.

      (7) Line 85. Dynamics is not correctly used here. It should read Ground beetle communities are sensitive.

      Changed accordingly (line 78-79).

      (8) Line 90-91. Here, it should be added that ground beetles are used as indicators for ground-dwelling insect diversity, not wider insect diversity in agricultural systems. In fact, Gerlach et al., the reference included, clearly warn against using indicator groups in a context that is too wide for a single indicator group to cover and Van Klink (2022) has recently shown in a meta-analysis that the correlation between trends in insect groups is often rather poor.

      We removed the sentence that claimed ground beetles to be indicators of general biodiversity, and have focused the text in general more on ground beetle biodiversity, rather than general biodiversity.

      (9) Line 178: was there a high weed abundance measured in the strip cropping fields? Or have there been reports of higher weed abundance in general? The references provided do not appear to support this claim.

      To our knowledge, there is only one paper on the effect of strip cropping on weeds (Ditzler et al., 2023). This paper shows strip cropping (and more diverse cropping systems) reduce weed cover, but increase weed richness and diversity. We mistakenly mentioned that crop diversification increases weed seed biomass, but have changed this accordingly to weed seed richness. The paper from Carbonne et al. (2022) indeed doesn’t show an effect of crop diversification on weeds. However, it does show a positive relation between weed seed richness and ground beetle activity density. We have moved this citation to the right place in the sentence (line 172-175).

      (10) Lines 279-288. The description of sampling with pitfalls is inadequate. Please follow the guidelines for incorporating sufficient detail on pitfall sampling protocols described in Brown & Matthews (2016).

      We were sadly not aware of this paper prior to the experiments, but have at least added information on all characteristics of the pitfall traps as mentioned in the paper (line 290-294).

      (11) Lines 307-310. What reasoning lies behind the choice to focus on the most beetle-rich monocultures? Do the authors have references for this way of comparing treatments? Is there much variation in the monocultures that warrants this approach? It would be preferable if the authors could elaborate on why this method is used, provide references showing that it is a generally accepted statistical technique, and provide additional assessments of the variation in the data so it can be properly related to more familiar exploratory data analysis techniques.

      We ran two analyses for the field-level richness and abundance. First, we used all combinations of monocultures and strip cropping. However, because each strip-cropped field is made up of (at least) two crops, it has two constituent monocultures. Including both monocultures would count a comparison with the same strip-cropped field twice, so we also ran the analyses again with only those monocultures that had the highest richness and abundance. This choice was made to obtain a conservative estimate of the increase in ground beetle richness through strip cropping. We have explained this methodology further in the statistical analysis section (lines 329-335).

      In Figure S6 the order of crop combinations is altered between 2021 on the left and 2022 on the right. This is not helpful to discover any possible patterns.

      We originally chose this order as it represented also the crop rotations, but it is indeed not helpful without that context. Therefore, we chose to change the order to have the same crop combinations within the rows.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Why was V1 separated from the rest of the visual cortex, and why the rest of the areas were simply lumped into an EVC ROI? It would be helpful to understand the separation into ROIs.

      We thank the reviewer for raising concerns regarding the ROI definitions. Our decision to analyze V1 separately was based on two key considerations. First, previous studies consistently identify V1 as the main locus of sensory-like templates during feature-specific preparatory attention (Kok et al., 2014; Aitken et al., 2020). Second, V1 shows the strongest orientation selectivity within the visual hierarchy (Priebe, 2016). In contrast, the extrastriate visual cortex (EVC; comprising V2, V3, V3AB and V4) demonstrates broader selectivity, including for complex features such as contour and texture (Grill-Spector & Malach, 2004). We therefore think it is particularly informative to analyze V1 data separately, as our experiment examines orientation-based attention. We should also note that we conducted MVPA separately for each visual ROI (V2, V3, V3AB and V4). After observing similar patterns of results across these regions, we averaged the decoding accuracies into a single value and labeled it as EVC. This approach allowed us to simplify data presentation while preserving the overall pattern of decoding performance. We have now added explanations of the ROI definitions to the revised text (Page 26; Lines 576-581).

      (2) It would have been helpful to have a behavioral measure of the "attended" orientation to show that participants in fact attended to a particular orientation and were faster in the cued condition. The cue here was 100% valid, so no such behavioral measure of attention is available here.

      We thank the reviewer for the comments. We agree that including valid and neutral cue trials would have provided valuable behavioral measures of attention; yet our current design aimed to maximize the number of trials for decoding analysis given fMRI time constraints. Thus, we could not fit in additional conditions to measure the behavioral effects of attention. However, we note that in our previous studies using a similar feature cueing paradigm, we observed benefits of attentional cueing on behavioral performance when comparing valid and neutral conditions (Liu et al., 2007; Jigo et al., 2018). Furthermore, our neural data demonstrated attention-related modulation (as indicated by the MVPA results, Fig. 2 in the main text), so we are confident that on average participants followed the instructions and deployed their attention accordingly. We have now added explanations on this point to the revised text (Page 23; Lines 492-498).

      (3) As I was reading the manuscript I kept thinking that the word attention in this manuscript can be easily replaced with visual working memory. Have the authors considered what it is about their task or cognitive demand that makes this investigation about attention or working memory?

      We thank the reviewer for this comment. We added the following extensive discussion on this point in the revised texts (Page 18; Line 363-381).

      “It could be argued that preparatory attention relies on the same mechanisms as working memory maintenance. While these functions are intuitively similar and likely overlap, there is also evidence indicating that they can be dissociated (Battistoni et al., 2017). In particular, we note that in our task, attention is guided by symbolic cues (color-orientation associations), while working memory tasks typically present the actual visual stimulus as the memorandum. A central finding in working memory studies is that neural signals during WM maintenance are sensory in nature, as demonstrated by generalizable neural activity patterns from stimulus encoding to maintenance in visual cortex (Harrison & Tong, 2009; Serences et al., 2009; Rademaker et al., 2019). However, in our task, neural signals during preparation were nonsensory, as demonstrated by a lack of such generalization in the No-Ping session (see also Gong et al., 2022). We believe that the differences in cue format and task demand in these studies may account for such differences. In addition to the difference in the sensory nature of the preparatory versus delay-period activity, our ping-related results also exhibited divergence from working memory studies (Wolff et al., 2017; 2020). While these studies used the visual impulse to differentiate active and latent representations of different items (e.g., attended vs. unattended memory item), our study demonstrated the active and latent representations of a single item in different formats (i.e., non-sensory vs. sensory-like). Moreover, unlike our study, the impulse did not evoke sensory-like neural patterns during memory retention (Wolff et al., 2017). These observations suggest that the cognitive and neural processes underlying preparatory attention and working memory maintenance could very well diverge. Future studies are necessary to delineate the relationship between these functions both at the behavioral and neural level.”

      (4) If I understand correctly, the only ROI that showed a significant difference for the cross-task generalization is V1. Was it predicted that only V1 would have two functional states? It should also be made clear that V1 is the only region where the two states differ.

      We thank the reviewer for this comment. We would like to clarify that our analyses revealed similar patterns of preparatory attentional representations in V1 and EVC. During the Ping session, the cross-task generalization analyses revealed decodable information in both V1 and EVC (ps < 0.001), significantly higher than that in the No-Ping session for V1 (independent t-test: t(38) = 3.145, p = 0.003; Cohen’s d = 0.995) and EVC (independent t-test: t(38) = 2.153, p = 0.038, Cohen’s d = 0.681) (Page 10; Lines 194-196). While both areas maintained similar representations, additional measures (Mahalanobis distance, neural-behavior relationship and connectivity changes) showed more robust ping-evoked changes in V1 than in EVC. This differential pattern likely reflects the primary role of V1 in orientation processing, with EVC showing a similar but weaker response profile. We have revised the text to clarify this point (Page 16; Lines 327-329).

      (5) My primary concern about the interpretation of the finding is that the result (differences in cross-task decoding within V1 between the ping and no-ping conditions) might simply be explained by the ping refocusing attention during the long delay, thus "resharpening" the template. In the no-ping condition, during the 5.5 to 7.5 second delay, attention to orientation might start to become less "crisp." In the ping condition, however, the ping itself might simply serve to refocus attention. So, the result is not showing the difference between the latent and non-latent stages; rather, it is the difference between a decaying template representation and a representation during a refocused attentional state. It is important to address this point. Would a simple tone during the delay do the same? If so, the interpretation of the results will be different.

      We thank the reviewer for this comment. The reviewer proposed an alternative account suggesting that visual pings may function to refocus attention, rather than reactivate latent information during the preparatory period. If this account holds (i.e., attention became weaker in the no-ping condition and was strengthened by the ping through refocusing), we would expect to observe a general enhancement of attentional decoding during the preparatory period. However, our data reveal no significant differences in overall attention decoding between the two conditions during this period (ps > 0.519; BF<sub>excl</sub> > 3.247), arguing against this possibility.

      The reviewer also raised an interesting question about whether an auditory tone during preparation could produce effects similar to those observed with visual pings. Although our study did not directly test this possibility, the existing literature provides some relevant evidence. In particular, prior studies have shown that latent visual working memory contents are selectively reactivated by visual impulses, but not by auditory stimuli (Wolff et al., 2020). This finding supports modality specificity for visually encoded contents, suggesting that sensory impulses must match the representational domain to effectively access latent visual information, which also argues against the refocusing hypothesis above. However, we do think that this is an important question that merits direct investigation in future studies. We have now added a discussion of this point to the revised text (Page 10, Lines 202-203; Page 19, Lines 392-395).

      (6) The neural pattern distances measured using Mahalanobis values are really great! Have the authors tried to use all of the data, rather than the high AMI and low AMI to possibly show a linear relationship between response times and AMI?

      We thank the reviewer for this comment. We took the reviewer’s suggestion to explore the relationship between the attentional modulation index (AMI) and RTs across participants for each session (see Figure 3). In the No-Ping session, we observed no significant correlation between AMI and RT (r = -0.366, p = 0.113). By contrast, the same analysis in the Ping condition revealed a significant negative correlation (r = -0.518, p = 0.019). These results indicate that the attentional modulations evoked by the visual impulse were associated with faster RTs, supporting the functional relevance of activating sensory-like representations during preparation. We have now included these inter-subject correlations in the main text (Page 13, Lines 258-264; Fig 3D and 3E), along with within-subject correlations in the Supplementary Information (Page 6, Lines 85-98; S3 Fig).

      (7) After reading the whole manuscript I still don't understand what the authors think the ping is actually doing, mechanistically. I would have liked a more thorough discussion, rather than referencing previous papers (all by the co-author).

      We thank the reviewer for this comment regarding the mechanistic basis of visual pings. We agree that this warrants deeper discussion. One possibility, informed by theoretical studies of working memory, is that the sensory-like template could be maintained via an “activity-silent” mechanism through short-term changes in synaptic weights (Mongillo et al., 2008). In this framework, a visual impulse may function as a nonspecific input that momentarily converts latent traces into detectable activity patterns (Rademaker & Serences, 2017). With respect to our findings, it is unlikely that the orientation-specific templates observed during the Ping session emerged from purely non-sensory representations and were entirely induced by an exogenous ping, which was devoid of any orientation signal. Instead, the more parsimonious explanation is that the visual impulse reactivated pre-existing latent sensory signals. To our knowledge, the detailed circuit-level mechanism of such reactivation is still unclear; existing evidence only suggests a relationship between ping-evoked inputs and the neural output (Wolff et al., 2017; Fan et al., 2021; Duncan et al., 2023). We have now included a discussion of this point in the main text (Page 19, Lines 383-401).

      Reviewer #2 (Public review):

      (1) The origin of the latent sensory-like representation. By 'pinging' the neural activity with a high-contrast, task-irrelevant visual stimulus during the preparation period, the authors identified a representation of the attentional feature target that contains the same information as perceptual representations. The authors interpreted this finding as indicating that a 'sensory-like' template is inherently hosted in a latent form in the visual system and is revealed by the pinging impulse. However, I am not sure whether such a sensory-like template is essentially created, rather than revealed, by the pinging impulses. First, in the classical use of the pinging technique in working memory studies, the (latent) representation of the memoranda during the maintenance period is undisputed, because participants could not otherwise have performed well in the subsequent memory test. However, this appears not to be the case in the present study. As shown in Figure 1C, there was no significant difference in behavioral performance between the ping and the no-ping sessions (see also lines 110-125, pg. 5-6). In other words, it seems to me that subsequent attentional task performance does not necessarily rely on the generation of such sensory-like representations in the preparatory period, and that the emergence of such sensory-like representations does not facilitate subsequent attentional performance either. In such a case, one might wonder whether such sensory-like templates are really created, hosted, and eventually utilized during the attentional process. Second, because the reference orientations (i.e., 45 degrees and 135 degrees) remained unchanged throughout the experiment, it is highly possible that participants implicitly memorized these two orientations as they completed more and more trials. In such a case, one might wonder whether the 'sensory-like' templates are essentially latent working memory representations activated by the pinging, as reported in Wolff et al. (2017), rather than a functional signature of the attentional process.

      We thank the reviewer for this comment. We agree that the question of whether the sensory-like template is created or merely revealed by visual pinging is crucial for understanding our findings. First, we acknowledge that our task may not be optimized for detecting changes in accuracy, as task difficulty was controlled using individually adjusted thresholds (i.e., angular difference). Nevertheless, we observed some evidence supporting neural-behavioral relationships. In particular, the impulse-driven sensory-like template in V1 contributed to faster RTs during stimulus selection (Page 12, Fig. 3D and 3E in the main text; see also our response to R1, Point 6).

      Second, the reviewer raised an important concern about whether the attended feature might be stored in the memory system due to the trial-by-trial repetition of attention conditions (attend 45º or attend 135º). Although this is plausible, we do not think it is likely. We note that neuroimaging evidence shows that attended working memory contents maintain sensory-like representations in visual cortex (Harrison & Tong, 2009; Serences et al., 2009; Rademaker et al., 2019), with neural activity patterns that generalize from perception to the working memory delay period, whereas unattended items in multi-item working memory tasks are stored in a latent state for prospective use (Wolff et al., 2017). Importantly, our task only required maintaining a single attentional template at a time. Thus, if participants simply used a working memory mechanism for preparatory attention, there would have been no need to store the template as a latent representation. Had they done so, we would have expected to find evidence for a sensory template, i.e., a neural pattern that generalizes between perception and preparation in the No-Ping condition, which is not what we found. We have mentioned this point in the main text (Page 18, Lines 367-372).

      (2) The coexistence of the two types of attentional templates. The authors interpreted their findings as the outcome of a dual-format mechanism in which 'a non-sensory template' and a latent 'sensory-like' template coexist (e.g. lines 103-106, pg. 5). While I find this interpretation interesting and conceptually elegant, I am not sure whether it is appropriate to term it 'coexistence'. First, it is theoretically possible that there is only one representation in either session (i.e. a non-sensory template in the no-ping session and a sensory-like template in the ping session) in any of the brain regions considered. Second, it seems that there is no direct evidence concerning the temporal relationship between these two types of templates, provided that they commonly emerge in both sessions. Besides, due to the sluggish nature of fMRI data, it is difficult to tell whether the two types of templates temporally overlap.

      We thank the reviewer for the comment regarding our interpretation of the ‘coexistence’ of non-sensory and sensory-like attentional templates. While we acknowledge the limitations of fMRI in resolving temporal relationships between these two types of templates, several aspects of our data support a dual-format interpretation.

      First, our key findings remained consistent for the subset of participants (N=14) who completed both No-Ping and Ping sessions in counterbalanced order. It thus seems improbable that participants systematically switched cognitive strategies (e.g., using non-sensory templates in the No-Ping session versus sensory-like templates in the Ping session) in response to the task-irrelevant, uninformative visual impulse. Second, while we agree with the reviewer that the temporal dynamics between these two templates remain unclear, it is difficult to imagine that the orientation-specific templates observed during the Ping session emerged de novo from a purely non-sensory template and an exogenous ping. In other words, if there is no orientation information at all to begin with, how could it come into being from an orientation-less external ping? It seems to us that the more parsimonious explanation is that some orientation signal already existed in a latent format and was activated by the ping, in line with models of “activity-silent” working memory. To address these concerns, we have added a discussion of these alternative interpretations to the main text (Page 19, Lines 387-391).

      (3) The representational distance. The authors used Mahalanobis distance to quantify the similarity of neural representation between different conditions. According to the authors' hypothesis, one would expect greater pattern similarity between 'attend leftward' and 'perceived leftward' in the ping session in comparison to the no-ping session. However, this appears not to be the case. As shown in Figures 3B and C, there was no major difference in Mahalanobis distance between the two sessions in either ROI and the authors did not report a significant main effect of the session in any of the ANOVAs. Besides, in all the ANOVAs, the authors reported only the statistic term corresponding to the interaction effect without showing the descriptive statistics related to the interaction effect. It is strongly advised that these descriptive statistics related to the interaction effect should be included to facilitate a more effective and intuitive understanding of their data.

      We thank the reviewer for this comment. We expected greater pattern similarity between 'attend leftward' and 'perceived leftward' in the Ping session than in the No-Ping session. This prediction was supported by a significant three-way interaction between session × attended orientation × perceived orientation (F(1,38) = 5.00, p = 0.031, η<sub>p</sub><sup>2</sup> = 0.116). In particular, there was a significant interaction between attended orientation × perceived orientation (F(1,19) = 9.335, p = 0.007, η<sub>p</sub><sup>2</sup> = 0.329) in the Ping session, but not in the No-Ping session (F(1,19) = 0.017, p = 0.898, η<sub>p</sub><sup>2</sup> = 0.001). These statistical results were reported in the original text. In addition, this three-way mixed ANOVA (session × attended orientation × perceived orientation) on Mahalanobis distance in V1 revealed no significant main effects (session: F(1,38) = 0.009, p = 0.923, η<sub>p</sub><sup>2</sup> < 0.001; attended orientation: F(1,38) = 0.116, p = 0.735, η<sub>p</sub><sup>2</sup> = 0.003; perceived orientation: F(1,38) = 1.106, p = 0.300, η<sub>p</sub><sup>2</sup> = 0.028). We agree with the reviewer that complete reporting of analyses enhances understanding of the data. Therefore, we have now included the main effects in the main text (Page 11, Line 233).

      We thank the reviewer for the suggestion regarding the inclusion of descriptive statistics for interaction effects. However, since the data were already visualized in Fig. 3B and 3C in the main texts, to maintain conciseness and consistency with the reporting style of other analyses in the texts, we have opted to include these statistics in the Supplementary Information (Page 5, Table 1).

      Reviewer #3 (Public review):

      (1) The title is "Dual-format Attentional Template," yet the supporting evidence for the nonsensory format and its guiding function is quite weak. The author could consider conducting further generalization analysis from stimulus selection to preparation stages to explore whether additional information emerges.

      We thank the reviewer for this comment. Our approach to investigating whether preparatory attention is encoded in a sensory or non-sensory format (training classifiers on separate runs of a perception task) closely followed methods from previous studies (Stokes et al., 2009; Peelen et al., 2011; Kok et al., 2017). Following the reviewer’s suggestion, we performed generalization analyses by training classifiers on activity during the stimulus selection period and testing them on preparatory activity. However, we observed no significant generalization effects in either the No-Ping or the Ping session (ps > 0.780). This null result may stem from a key difference in the neural representations: classifiers trained on neural activity from the stimulus selection period necessarily encode both target and distractor information, and thus rely on somewhat different information than classifiers trained exclusively on isolated target information in the perception task.

      (2) In Figure 2, the authors did not find any decodable sensory-like coding in IPS and PFC, even during the impulse-driven session, indicating that these regions do not represent sensory-like information. However, in the final section, the authors claimed that the impulse-driven sensory-like template strengthens informational connectivity between sensory and frontoparietal areas. This raises a question: how can we reconcile the lack of decodable coding in these frontoparietal regions with the reported enhancement in network communication? It would be helpful if the authors provided a clearer explanation or additional evidence to bridge this gap.

      We thank the reviewer for this comment. We would like to clarify that although we did not observe sensory-like coding during preparation in frontoparietal areas, we did observe attentional signals in these regions, as evidenced by above-chance within-task attention decoding performance (Fig. 2 in the main text). This could reflect different neural codes in different areas, and suggests that inter-regional communication does not necessarily require identical representational formats. It seems plausible that a non-sensory attentional template in frontoparietal areas supports top-down attentional control, consistent with theories proposing increasing abstraction as one ascends the cortical hierarchy (Badre, 2008; Brincat et al., 2018), and that the visual impulse enhances the interaction of this template with the sensory representation in visual areas.

      (3) Given that the impulse-driven sensory-like template facilitated behavior, the author proposed that it might also enhance network communication. Indeed, they observed changes in informational connectivity. However, it remains unclear whether these changes in network communication have a direct and robust relationship with behavioral improvements.

      We thank the reviewer for the suggestion. To examine how network communication relates to behavior, we performed a correlation analysis between informational connectivity (IC) and RTs across participants (see Figure S5). We observed a trend toward a correlation between V1-PFC connectivity and RTs in the Ping session (r = -0.394, p = 0.086), but not in the No-Ping session (r = -0.046, p = 0.846). No significant correlations were found between V1-IPS connectivity and RTs (ps > 0.400) or between ICs and accuracy (ps > 0.399). These results suggest that ping-enhanced connectivity might contribute to faster responses. Although we may not have sufficient statistical power to warrant a strong conclusion, we think this result is still highly suggestive, so we have now added the relevant text to the Supplementary Information (Page 8, Lines 116-121; S5 Fig) and mentioned this result in the main text (Page 14, Lines 292-293).

      (4) I'm uncertain about the definition of the sensory-like template in this paper. Is it referring to the Ping impulse-driven condition or the decodable performance in the early visual cortex? If it is the former, even in working memory, whether pinging identifies an activity-silent mechanism is currently debated. If it's the latter, the authors should consider whether a causal relationship - such as "activating the sensory-like template strengthens the informational connectivity between sensory and frontoparietal areas" - is reasonable.

      We apologize for the confusion. The sensory-like template does not by itself refer to the representations in the Ping session or to attentional decoding in early visual cortex. Instead, it refers to the representational format of attentional signals during preparation. Specifically, its existence is inferred from cross-task generalization, where neural patterns from a perception task (perceive 45º or perceive 135º) generalize to an attention task (attend 45º or attend 135º). We think this is a reasonable and accepted operational definition of representational format. Our findings suggest that the sensory-like template likely existed in a latent state and was reactivated by visual pings, aligning more closely with the first account raised by the reviewer.

      We agree with the reviewer that whether ping identifies an activity-silent mechanism is currently debated (Schneegans & Bays, 2017; Barbosa et al., 2021). It is possible that visual impulse amplified a subtle but active representation of the sensory template during attentional preparation and resulted in decodable performance in visual cortex. Distinguishing between these two accounts likely requires neurophysiological measurements, which are beyond the scope of the current study. We have explicitly addressed this limitation in our Discussion (Page 19, Line 395-399).

      Nevertheless, the latent sensory-like template account remains plausible for three reasons. First, our interpretation aligns with theoretical frameworks proposing that the brain maintains more veridical, detailed target templates than those typically used to guide attention (Wolfe, 2021; Yu et al., 2023). Second, this explanation is consistent with the proposed utility of latent working memory for prospective use, as maintaining a latent sensory-like template during preparation would be useful for subsequent stimulus selection. The latter point is further supported by analyses prompted by the reviewer’s question of whether “activating the sensory-like template strengthens the informational connectivity between sensory and frontoparietal areas is reasonable”. Our additional analyses (see also our response to Reviewer 3, Point 3) suggested that impulse-enhanced V1-PFC connectivity was associated with a trend toward faster behavioral responses (r = -0.394, p = 0.086; see Supplementary Information, Page 8, Lines 116-121; S5 Fig). Considering these findings together, we think it is reasonable to suggest that the visual impulse may strengthen information flow among areas to enhance attentional control.

      Recommendation for the Authors:

      Reviewer #1 (Recommendation for the authors):

      I hate to suggest another fMRI experiment, but in order to make strong claims about two states, I would want to see the methodological and interpretation confounds addressed. Ping condition - would a tone lead to the same result of sharpening the template? If so, then why? Can a ping be manipulated in its effectiveness? That would be an excellent manipulation condition.

      We thank the reviewer for the comments. Please refer to our reply to Reviewer 1, Point 5 for detailed explanation.

      Reviewer #2 (Recommendation for the authors):

      It is strongly advised that these descriptive statistics related to the interaction effect should be included to facilitate a more effective understanding of their data.

      We thank the reviewer for the comments. We now included the relevant descriptive statistics in the Supplementary Information, Table 1.

      Reviewer #3 (Recommendation for the authors):

      In addition to p-values, I see many instances of 'ps'. Does this indicate the plural form of p?

      We used ‘ps’ to denote the minimal p-value across multiple statistical analyses, such as when applying identical tests to different region groups.

      References

      Aitken, F., Menelaou, G., Warrington, O., Koolschijn, R. S., Corbin, N., Callaghan, M. F., & Kok, P. (2020). Prior expectations evoke stimulus-specific activity in the deep layers of the primary visual cortex. PLoS Biology, 18(12), e3001023.

      Badre, D. (2008). Cognitive control, hierarchy, and the rostro–caudal organization of the frontal lobes. Trends in Cognitive Sciences, 12(5), 193-200.

      Barbosa, J., Lozano-Soldevilla, D., & Compte, A. (2021). Pinging the brain with visual impulses reveals electrically active, not activity-silent, working memories. PLoS Biology, 19(10), e3001436.

      Battistoni, E., Stein, T., & Peelen, M. V. (2017). Preparatory attention in visual cortex. Annals of the New York Academy of Sciences, 1396(1), 92-107.

      Brincat, S. L., Siegel, M., von Nicolai, C., & Miller, E. K. (2018). Gradual progression from sensory to task-related processing in cerebral cortex. Proceedings of the National Academy of Sciences, 115(30), E7202-E7211.

      Duncan, D. H., van Moorselaar, D., & Theeuwes, J. (2023). Pinging the brain to reveal the hidden attentional priority map using encephalography. Nature Communications, 14(1), 4749.

      Fan, Y., Han, Q., Guo, S., & Luo, H. (2021). Distinct neural representations of content and ordinal structure in auditory sequence memory. Journal of Neuroscience, 41(29), 6290-6303.

      Gong, M., Chen, Y., & Liu, T. (2022). Preparatory attention to visual features primarily relies on nonsensory representation. Scientific Reports, 12(1), 21726.

      Grill-Spector, K., & Malach, R. (2004). The human visual cortex. Annual Review of Neuroscience, 27(1), 649-677.

      Harrison, S. A., & Tong, F. (2009). Decoding reveals the contents of visual working memory in early visual areas. Nature, 458(7238), 632-635.

      Jigo, M., Gong, M., & Liu, T. (2018). Neural determinants of task performance during feature-based attention in human cortex. eNeuro, 5(1).

      Kok, P., Failing, M. F., & de Lange, F. P. (2014). Prior expectations evoke stimulus templates in the primary visual cortex. Journal of Cognitive Neuroscience, 26(7), 1546-1554.

      Kok, P., Mostert, P., & De Lange, F. P. (2017). Prior expectations induce prestimulus sensory templates. Proceedings of the National Academy of Sciences, 114(39), 10473-10478.

      Liu, T., Stevens, S. T., & Carrasco, M. (2007). Comparing the time course and efficacy of spatial and feature-based attention. Vision Research, 47(1), 108-113.

      Mongillo, G., Barak, O., & Tsodyks, M. (2008). Synaptic theory of working memory. Science, 319(5869), 1543-1546.

      Peelen, M. V., & Kastner, S. (2011). A neural basis for real-world visual search in human occipitotemporal cortex. Proceedings of the National Academy of Sciences, 108(29), 12125-12130.

      Priebe, N. J. (2016). Mechanisms of orientation selectivity in the primary visual cortex. Annual Review of Vision Science, 2(1), 85-107.

      Rademaker, R. L., & Serences, J. T. (2017). Pinging the brain to reveal hidden memories. Nature Neuroscience, 20(6), 767-769.

      Rademaker, R. L., Chunharas, C., & Serences, J. T. (2019). Coexisting representations of sensory and mnemonic information in human visual cortex. Nature Neuroscience, 22(8), 1336-1344.

      Schneegans, S., & Bays, P. M. (2017). Restoration of fMRI decodability does not imply latent working memory states. Journal of Cognitive Neuroscience, 29(12), 1977-1994.

      Serences, J. T., Ester, E. F., Vogel, E. K., & Awh, E. (2009). Stimulus-specific delay activity in human primary visual cortex. Psychological Science, 20(2), 207-214.

      Stokes, M., Thompson, R., Nobre, A. C., & Duncan, J. (2009). Shape-specific preparatory activity mediates attention to targets in human visual cortex. Proceedings of the National Academy of Sciences, 106(46), 19569-19574.

      Wolfe, J. M. (2021). Guided Search 6.0: An updated model of visual search. Psychonomic Bulletin & Review, 28(4), 1060-1092.

      Wolff, M. J., Jochim, J., Akyürek, E. G., & Stokes, M. G. (2017). Dynamic hidden states underlying working-memory-guided behavior. Nature Neuroscience, 20(6), 864-871.

      Wolff, M. J., Kandemir, G., Stokes, M. G., & Akyürek, E. G. (2020). Unimodal and bimodal access to sensory working memories by auditory and visual impulses. Journal of Neuroscience, 40(3), 671-681.

      Yu, X., Zhou, Z., Becker, S. I., Boettcher, S. E., & Geng, J. J. (2023). Good-enough attentional guidance. Trends in Cognitive Sciences, 27(4), 391-403.

    1. Reviewer #1 (Public review):

      This study is part of an ongoing effort to clarify the effects of cochlear neural degeneration (CND) on auditory processing in listeners with normal audiograms. This effort is important because ~10% of people who seek help for hearing difficulties have normal audiograms and current hearing healthcare has nothing to offer them.

      The authors identify two shortcomings in previous work that they intend to fix. The first is a lack of cross-species studies that make direct comparisons between animal models in which CND can be confirmed and humans for which CND must be inferred indirectly. The second is the low sensitivity of purely perceptual measures to subtle changes in auditory processing. To fix these shortcomings, the authors measure envelope following responses (EFRs) in gerbils and humans using the same sounds, while also performing histological analysis of the gerbil cochleae, and testing speech perception while measuring pupil size in the humans.

      The study begins with a comprehensive assessment of the hearing status of the human listeners. The only differences found between the young adult (YA) and middle aged (MA) groups are in thresholds at frequencies > 10 kHz and DPOAE amplitudes at frequencies > 5 kHz. The authors then present the EFR results, first for the humans and then for the gerbils, showing that amplitudes decrease more rapidly with increasing envelope frequency for MA than for YA in both species. The histological analysis of the gerbil cochleae shows that there were, on average, 20% fewer IHC-AN synapses at the 3 kHz place in MA relative to YA, and the number of synapses per IHC was correlated with the EFR amplitude at 1024 Hz.

      The study then returns to the humans to report the results of the speech perception tests and pupillometry. Correct understanding of keywords decreased more rapidly with decreasing SNR in MA than in YA, with a noticeable difference at 0 dB, while pupillary slope (a proxy for listening effort) increased more rapidly with decreasing SNR for MA than for YA, with the largest differences at SNRs between 5 and 15 dB. Finally, the authors report that a linear combination of audiometric threshold, EFR amplitude at 1024 Hz, and a few measures of pupillary slope is predictive of speech perception at 0 dB SNR.

      I only have two questions/concerns about the specific methodologies used:

      (1) Synapse counts were made only at the 3 kHz place on the cochlea. But the EFR sounds were presented at 85 dB SPL, which means that a rather large section of the cochlea will actually be excited. Do we know how much of the EFR actually reflects AN fibers coming from the 3 kHz place? And are we sure that this is the same for gerbils and humans given the differences in cochlear geometry, head size, etc.?

      [Note added after revision: the authors have added new data, references, and discussion that have answered my initial questions].

      (2) Unless I misunderstood, the predictive power of the final model was not tested on held-out data. The standard way to fit and test such a model would be to split the data into two segments, one for training and hyperparameter optimization, and one for testing. But it seems that the only split was for training and hyperparameter optimization.

      [Note added after revision: the authors now make it clear in their response that the modeling tells us how much of the current data can be explained but not necessarily anything about generalization to other datasets.]

      While I find the study to be generally well executed, I am left wondering what to make of it all. The purpose of the study with respect to fixing previous methodological shortcomings was clear, but exactly how fixing these shortcomings has allowed us to advance is not. I think we can be more confident than before that EFR amplitude is sensitive to CND, and we now know that measures of listening effort may also be sensitive to CND. But where is this leading us?

      I think what this line of work is eventually aiming for is to develop a clinical tool that can be used to infer someone's CND profile. That seems like a worthwhile goal but getting there will require going beyond exploratory association studies. I think we're ready to start being explicit about what properties a CND inference tool would need to be practically useful. I have no idea whether the associations reported in this study are encouraging or not because I have no idea what level of inferential power is ultimately required.

      [Note added after revision: the authors have added to the Discussion to put their work into a broader perspective.]

      That brings me to my final comment: there is an inappropriate emphasis on statistical significance. The sample size was chosen arbitrarily. What if the sample had been half the size? Then few, if any, of the observed effects would have been significant. What if the sample had been twice the size? Then many more of the observed effects would have been significant (particularly for the pupillometry). I hope that future studies will follow a more principled approach in which relevant effect sizes are pre-specified (ideally as the strength of association that would be practically useful) and sample sizes are determined accordingly.

      [Note added after revision: my intention with this comment was not to make a philosophical or nitty-gritty point about statistics. It was more of a follow on to the previous point. Because I don't know what sort of effect size is big enough to matter (for whatever purpose), I don't find the statistical significance (or lack thereof) of the effect size observed to be informative. But I don't think there is anything more that the authors can or should do in this regard.]

      So, in summary, I think this study is a valuable but limited advance. The results increase my confidence that non-invasive measures can be used to infer underlying CND, but I am unsure how much closer we are to anything that is practically useful.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Wang et al. investigated how sexual failure influences sweet taste perception in male Drosophila. The study revealed that courtship failure leads to decreased sweet sensitivity and feeding behavior via dopaminergic signaling. Specifically, the authors identified a group of dopaminergic neurons projecting to the suboesophageal zone that interacts with sweet-sensing Gr5a+ neurons. These dopaminergic neurons positively regulate the sweet sensitivity of Gr5a+ neurons via DopR1 and Dop2R receptors. Sexual failure diminishes the activity of these dopaminergic neurons, leading to reduced sweet-taste sensitivity and sugar-feeding behavior in male flies. These findings highlight the role of dopaminergic neurons in integrating reproductive experiences to modulate appetitive sensory responses.

      Previous studies have explored the dopaminergic-to-Gr5a+ neuronal pathways in regulating sugar feeding under hunger conditions. Starvation has been shown to increase dopamine release from a subset of TH-GAL4 labeled neurons, known as TH-VUM, in the suboesophageal zone. This enhanced dopamine release activates dopamine receptors in Gr5a+ neurons, heightening their sensitivity to sugar and promoting sucrose acceptance in flies. Since the function of the dopaminergic-to-Gr5a+ circuit motif has been well established, the primary contribution of Wang et al. is to show that mating failure in male flies can also engage this circuit to modulate sugar-feeding behavior. This contribution is valuable because it highlights the role of dopaminergic neurons in integrating diverse internal state signals to inform behavioral decisions.

      An intriguing discrepancy between Wang et al. and earlier studies lies in the involvement of dopamine receptors in Gr5a+ neurons. Prior research has shown that Dop2R and DopEcR, but not DopR1, mediate starvation-induced enhancement of sugar sensitivity in Gr5a+ neurons. In contrast, Wang et al. found that DopR1 and Dop2R, but not DopEcR, are involved in the sexual failure-induced decrease in sugar sensitivity in these neurons. I wish the authors had further explored or discussed this discrepancy, as it is unclear how dopamine release selectively engages different receptors to modulate neuronal sensitivity in a context-dependent manner.

      Our immunostaining experiments showed that three dopamine receptors, Dop1R1, Dop2R, and DopEcR, were expressed in Gr5a+ neurons in the proboscis, consistent with previous RT-PCR findings (Inagaki et al., 2012). As the reviewer pointed out, we found that Dop1R1 and Dop2R were required for courtship failure-induced suppression of sugar sensitivity, whereas Marella et al. (2012) and Inagaki et al. (2012) found that Dop2R and DopEcR were required for starvation-induced enhancement of sugar sensitivity. These results may suggest that different internal states (courtship failure vs. starvation) modulate the peripheral sensory system via different signaling pathways (e.g., different subsets of dopaminergic neurons, different dopamine release mechanisms, and different dopamine receptors). We have discussed these possibilities in the revised manuscript.

      The data presented by Wang et al. are solid and effectively support their conclusions. However, certain aspects of their experimental design, data analysis, and interpretation warrant further review, as outlined below.

      (1) The authors did not explicitly indicate the feeding status of the flies, but it appears they were not starved. However, the naive and satisfied flies in this study displayed high feeding and PER baselines, similar to those observed in starved flies in other studies. This raises the concern that sexually failed flies may have consumed additional food during the 4.5-hour conditioning period, potentially lowering their baseline hunger levels and subsequently reducing PER responses. This alternative explanation is worth considering, as an earlier study demonstrated that sexually deprived males consumed more alcohol, and both alcohol and food are known rewards for flies. To address this concern, the authors could remove food during the conditioning phase to rule out its influence on the results.

      This is an important consideration. To rule out a potential confound from food intake during courtship conditioning, we have now also conducted courtship conditioning in vials without food. In the absence of any feeding opportunity over the 4.5-hour courtship conditioning period, sexually rejected males still exhibited a robust decrease in sweet taste sensitivity compared with Naïve and Satisfied controls (Figure 1-supplement 1C). These data confirm that the suppression of PER is driven by courtship failure per se, rather than by differences in feeding during the conditioning phase.

      (2) Figure 1B reveals that approximately half of the males in the Failed group did not consume sucrose yet Figure 1-S1A suggests that the total volume consumed remained unchanged. Were the flies that did not consume sucrose omitted from the dataset presented in Figure 1-S1A? If so, does this imply that only half of the male flies experience sexual failure, or that sexual failure affects only half of males while the others remain unaffected? The authors should clarify this point.

      Our initial description of the experimental setup may have been confusing. Below we briefly clarify the experimental design; we have also clarified the details in the revised manuscript, which should resolve the reviewer's concerns:

      After the behavioral conditioning, male flies were divided for two assays. On the one hand, we quantified PER responses of individual flies. As shown in Figure 1C, Failed males exhibited decreased sweet sensitivity (as demonstrated by the right shift of the dose-response curve). On the other hand, we sought to quantify food consumption of individual flies by using the MAFE assay (Qi et al 2005).

      In the initial submission, we used 400 mM sucrose for the MAFE assay. When presented with 400 mM sucrose, approximately 100% of the flies in the Naïve and Satisfied groups, and 50% of the flies in the Failed group, extended their proboscis and started feeding, as a natural consequence of decreased sugar sensitivity (Figure 1B). We quantified the volume of food consumed by the flies that showed PER responses to 400 mM sucrose and observed no change (Figure 1-supplement 1A, left). To avoid potential confusion, we have now repeated the MAFE assay with 800 mM sucrose, which elicited feeding in ~100% of flies in all three groups, as shown in Figure 1C. Again, we observed no change in food intake (Figure 1-supplement 1A, right).

      Together, these experiments suggest that sexual failure suppresses the sweet sensitivity of Failed males. Meanwhile, as long as they still responded to a given food stimulus and initiated feeding, the volume of food consumed remained unchanged. These results led us to focus on the modulatory effect of sexual failure on the sensory system, the main topic of the present study.

      (3) The evidence linking TH-GAL4 labeled dopaminergic neurons to reduced sugar sensitivity in Gr5a+ neurons in sexually failed males could be further strengthened. Ideally, the authors would have activated TH-GAL4 neurons and observed whether this restored GCaMP responses in Gr5a+ neurons in sexually failed males. Instead, the authors performed a less direct experiment, shown in Figures 3-S1C and D. The manuscript does not describe the condition of the flies used in this experiment, but it appears that they were not sexually conditioned. I have two concerns with this experiment. First, no statistical analysis was provided to support the enhancement of sucrose responses following activation of TH-GAL4 neurons. Second, without performing this experiment in sexually failed males, the authors lack direct evidence to confirm that the dampened response of Gr5a+ neurons to sucrose results from decreased activity in TH-GAL4 neurons.

      We have now quantified the effect of TH+ neuron activation on Gr5a+ neuron calcium responses. In Naïve males, dTRPA1-mediated activation of TH+ cells significantly enhanced sucrose-induced calcium responses (Figure 3-supplement 1C). In Failed males, whose baseline Gr5a+ neuron activity was lower (Figure 3C), the same activation also produced a significant (even slightly larger) effect on the calcium responses of Gr5a+ neurons (Figure 3-supplement 1D).

      Taken together, we would argue that these experiments using both Naïve and Failed males are adequate to show a functional link between TH+ neurons and Gr5a+ neurons. Combined with the findings that these neurons form active synapses (Figure 3-supplement 1B) and that the activity of TH+ neurons was dampened in sexually failed males (Figure 3G-I), our data support the notion that sexual failure suppresses sweet sensitivity via the TH-Gr5a circuitry.

      (4) The statistical methods used in this study are poorly described, making it unclear which method was used for each experiment. I suggest that the authors include a clear description of the statistical methods used for each experiment in the figure legends. Furthermore, as I have pointed out, there is a lack of statistical comparisons in Figures 3-S1C and D, a similar problem exists for Figures 6E and F.

      We have added detailed information on the statistical analyses to each figure legend.

      (5) The experiments in Figure 5 lack specificity. The target neurons in this study are Gr5a+ neurons, which are directly involved in sugar sensing. However, the authors used the less specific Dop1R1- and Dop2R-GAL4 lines for their manipulations. Using Gr5a-GAL4 to specifically target Gr5a+ neurons would provide greater precision and ensure that the observed effects are directly attributable to the modulation of Gr5a+ neurons, rather than being influenced by potential off-target effects from other neuronal populations expressing these dopamine receptors.

      We agree with the reviewer that manipulating the Dop1R1 and Dop2R genes (Figure 4) and the neurons expressing them (Figure 5) might have broader impacts. For specificity, we have also tested the role of Dop1R1 and Dop2R in Gr5a+ neurons by RNAi experiments (Figure 6). In both behavioral and calcium imaging experiments, knocking down either Dop1R1 or Dop2R in Gr5a+ neurons eliminated the effect of sexual failure on sweet sensitivity, further confirming the role of these two receptors in Gr5a+ neurons.

      (6) I found the results presented in Fig. 6F puzzling. The knockdown of Dop2R in Gr5a+ neurons would be expected to decrease sucrose responses in naive and satisfied flies, given the role of Dop2R in enhancing sweet sensitivity. However, the figure shows an apparent increase in responses across all three groups, which contradicts this expectation. The authors may want to provide an explanation for this unexpected result.

      We agree that there might be some potential discrepancies. We have now addressed these issues by repeating the calcium imaging experiments with head-to-head comparisons against the controls (Gr5a-GCaMP, +/- Dop1R1 and Dop2R RNAi).

      In these new experiments, Dop1R1 or Dop2R knockdown completely prevented the suppression of Gr5a+ neuron responsiveness by courtship failure (Figure 6E), whereas the activity of Gr5a+ neurons in the Naïve and Satisfied groups was not altered. These results demonstrate that Dop1R1 and Dop2R are specifically required to mediate the decrease in sweet sensitivity following courtship failure.

      (7) In several instances in the manuscript, the authors described the effects of silencing dopamine signaling pathways or knocking down dopamine receptors in Gr5a neurons with phrases such as 'no longer exhibited reduced sweet sensitivity' (e.g., L269 and L288), 'prevent the reduction of sweet sensitivity' (e.g., L292), or 'this suppression was reversed' (e.g. L299). I found these descriptions misleading, as they suggest that sweet sensitivity in naive and satisfied groups remains normal while the reduction in failed flies is specifically prevented or reversed. However, this is not the case. The data indicate that these manipulations result in an overall decrease in sweet sensitivity across all groups, such that a further reduction in failed flies is not observed. I recommend revising these descriptions to accurately reflect the observed phenotypes and avoid any confusion regarding the effects of these manipulations.

      We have changed the wording in the revised manuscript. In brief, we think that these manipulations have two consequences: suppressing the overall sweet sensitivity, and eliminating the effect of sexual failure on sweet sensitivity.

      Reviewer #2 (Public review):

      Summary:

      The authors exposed naïve male flies to different groups of females, either mated or virgin. Male flies can successfully copulate with virgin females; however, they are rejected by mated females. This rejection reduces sugar preference and sensitivity in males. Investigating the underlying neural circuits, the authors show that dopamine signaling onto GR5a sensory neurons is required for reduced sugar preference. GR5a sensory neurons respond less to sugar exposure when they lack dopamine receptors.

      Strengths:

      The findings add another strong phenotype to the existing dataset about brain-wide neuromodulatory effects of mating. The authors use several state-of-the-art methods, such as activity-dependent GRASP, to decipher the underlying neural circuitry. They further perform rigorous behavioral tests and provide convincing evidence for the local labellar circuit.

      Weaknesses:

      The authors focus on the circuit connection between dopamine and gustatory sensory neurons in the male SEZ. Therefore, it is still unknown how mating modulates dopamine signaling and what possible implications on other behaviors might result from a reduced sugar preference.

      We agree with the reviewer that in the current study, we did not examine the exact mechanism of how mating experience suppressed the activity of dopaminergic neurons in the SEZ. The current study mainly focused on the behavioral characterization (sexual failure suppresses sweet sensitivity) and the downstream mechanism (TH-Gr5a pathway). We think that examining the upstream modulatory mechanism may be more suitable for a separate future study.

      We believe that a sustained reduction in sweet sensitivity upon courtship failure (not limited to sucrose but extending to other sweet compounds; Figure 1-supplement 1D-E) suggests a generalized and sustained consequence for reward-related behaviors. Sexual failure may thus resemble a state of “primitive emotion” in fruit flies. We have further discussed this possibility in the revised manuscript.

      Reviewer #3 (Public review):

      Summary

      In this work, the authors asked how mating experience impacts reward perception and processing. For this, they employ fruit flies as a model, with a combination of behavioral, immunostaining, and live calcium imaging approaches.

      Their study allowed them to demonstrate that courtship failure decreases the fraction of flies motivated to eat sweet compounds, revealing a link between reproductive stress and reward-related behaviors. This effect is mediated by a small group of dopaminergic neurons projecting to the SEZ. After courtship failure, these dopaminergic neurons exhibit reduced activity, leading to decreased Gr5a+ neuron activity via Dop1R1 and Dop2R signaling, and leading to reduced sweet sensitivity. The authors therefore showed how mating failure influences broader behavioral outputs through suppression of the dopamine-mediated reward system and underscores the interactions between reproductive and reward pathways.

      Concern

      My main concern regarding this study lies in the way the authors chose to present their results. If I understood correctly, they provided evidence that mating failure induces a decrease in the fraction of flies exhibiting PER. However, they also showed that food consumption was not affected (Fig. 1, supplement), suggesting that individuals who did eat consumed more. This raises questions about the analysis and interpretation of the results. Should we consider the group as a whole, with a reduced sensitivity to sweetness, or should we focus on individuals, with each one eating more? I am also concerned about how this could influence the results obtained using live imaging approaches, as the flies being imaged might or might not have been motivated to eat during the feeding assays. I would like the authors to clarify their choice of analysis and discuss this critical point, as the interpretation of the results could potentially be the opposite of what is presented in the manuscript.

      Please refer to our responses to the Public Review (Reviewer 1, Point 2) for details.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The label for the y-axis in Figure 1B should be "fraction", not "percentage".

      We have revised the figure as suggested.

      (2) I suggest that the authors indicate the ROIs they used to quantify the signal intensity in Figure 3E and G.

      We have revised the figures as suggested.

      (3) There is a typo in Figure 4A: it should be "Wild type", not "Wide type".

      We have revised the figure as suggested.

      (4) The elav-GAL4/+ data in Figure 4-S1B, C, and D appears to be reused across these panels. However, the number of asterisks indicating significance in the MAT plots differs between them (three in panels B and C, and four in panel D). Is this a typo?

      It is indeed a typo, and we have revised the figure accordingly.

      Reviewer #2 (Recommendations for the authors):

      Additional comments:

      The authors should add this missing literature about dopamine and neuromodulation in courtship:

      Boehm et al., 2022 (eLife) - this study shows that mating affects olfactory behavior in females.

      Cazalé-Debat et al., 2024 (Nature) - Mating proximity blinds threat perception.

      Gautham et al., 2024 (Nature) - A dopamine-gated learning circuit underpins reproductive state-dependent odor preference in Drosophila females.

      We have added these references in the introduction section.

      Has the mating behavior been quantified? How often did males copulate with mated and virgin females?

      We examined copulation behavior in our video recordings. In the “Failed” group (males paired with mated females), we observed virtually no successful copulation events, confirming that nearly 100% of those males experienced sexual failure. In contrast, males in the “Satisfied” group (paired with virgin females) mated on average 2-3 times during the 4.5-hour conditioning period. We have added this information to the manuscript.

      Do the rejected males live shorter? Is the effect also visible when they are fed with normal fly food, or is it only working with sugar?

      We did not directly measure the lifespan of these males. But we conducted a relevant assay (starvation resistance), in which “Failed” males died significantly faster than both Naïve and Satisfied controls, indicating a clear reduction in their ability to endure food deprivation (Figure 1-supplement 1B). Since sweet taste is a primary cue for food detection in Drosophila, and sugar makes up a large portion of their standard diet, the drop in sugar sensitivity we observed in Failed males could likewise impair their perception and consumption of regular fly food, hence their resistance to starvation.

      Also, the authors mention that the reward pathway is affected, this is probably the case as sugar sensation is impaired. One interesting experiment would be (and maybe has been done?) to test rejected males in normal odor-fructose conditioning. The data would suggest that they would do worse.

      We have already measured how courtship failure affected fructose sensitivity (Figure 1-supplement 1D), and we found that the reduction in fructose perception was even more profound than that for sucrose. We have not yet tested whether Failed males show deficits in odor-fructose associative conditioning. This is indeed a very interesting direction to explore. However, olfactory reward learning relies on molecular and circuit mechanisms distinct from those governing taste, so we believe such experiments would be more suitable for a separate follow-up study.

      The authors could have added another group where males are exposed to other males. It would be interesting if this is also a "stressful" context and if it would also reduce sugar preference - probably beyond the scope of this paper.

      In our experiments, all flies, including those in the Naïve, Failed, and Satisfied groups, were housed in groups of 25 males per vial before the conditioning period (and the Naïve group remained in the same group housing until PER testing). This means every cohort experienced the same level of “social stress” from male-male interactions. While it would indeed be interesting to compare that to solitary housing or other male-only exposures, isolation itself imposes a different kind of stress, and disentangling these effects on sugar preference would require a separate, dedicated study beyond the scope of the present work.

      Would the behavior effect also show up with experienced males? Maybe this has been tested before. Does mating rejection in formerly successful males have the same impact?

      As suggested by the reviewer, we performed an additional experiment in which males that had previously mated successfully were subsequently subjected to courtship rejection. As shown in Figure 1 supplement 1F, prior successful mating did not prevent the decline in sweet sensitivity induced by subsequent mating failure, indicating that even experienced males exhibit the reduction in sugar sensitivity after rejection.

      Is the same circuit present and functioning in females? Does manipulating dopamine receptors in GR5a neurons in females lead to the same phenotype? This would suggest that different internal states in males and females could lead to the same phenotype and circuit modulations.

      This is indeed a very interesting suggestion. In male flies, Gr5a-specific knockdown of dopamine receptors did not alter baseline sweet sensitivity, but it selectively prevented the reduction in sugar perception that followed mating failure (Figure 6C-D), indicating that this dopaminergic pathway is engaged only in the context of courtship rejection. By extension, knocking down the same receptors in female Gr5a neurons would be expected to leave their basal sugar sensitivity unchanged. Moreover, because there is currently no established paradigm for inducing mating failure in female flies, we cannot yet test whether sexual rejection similarly modulates sweet taste in females, or whether it operates via the same circuit.

      Reviewer #3 (Recommendations for the authors):

      Suggestions to the authors:

      Introduction, line 61. I suggest the authors add references in fruit flies concerning the rewarding nature of mating. For example, the paper from Zhang et al, 2016 "Dopaminergic Circuitry Underlying Mating Drive" demonstrates the role of the dopamine rewarding system in mating drive. There is a large body of literature showing the link between dopamine and mating.

      We have added this literature in the introduction section.

      Figure 1B and Figure Supplement 1: If I understood correctly, Figure Supplement 1A shows that the total food consumption across all tested flies remains unchanged. However, fewer flies that failed to mate consumed sucrose. I would be curious to see the results for sucrose consumption per individual fly that did eat. According to their results, individual flies that failed to mate should consume more sucrose. This would change the conclusion. The authors currently show that a group of flies that failed to mate consumed less sucrose overall, but since fewer males actually ate, those that failed to mate and did eat consumed more sucrose. The authors should distinguish between failed and satisfied flies in two groups: those that ate and those that did not.

      Please see our responses to the Public Review for details (Reviewer 1, Point 2).

      Figure 1C, right: For a better understanding of all the "MAT" figures, I suggest the authors start the Y axis with the unit 25 and increase it to 400. This would match better the text (line 114) saying that it was significantly elevated in the failed group. As it is, we have the impression of a decrease in the graph.

      We have revised the figures accordingly.

      Line 103: When suggesting a reduced likelihood of meal initiation of these males, do these males take longer to eat when they did it? In other words, is the latency to eat increased in failed males? That would be a good measure of motivational state.

      We tried to analyze feeding latency in the MAFE assay by measuring the time from sucrose presentation to the first proboscis extension, but the latency was too short to be measured accurately. Nevertheless, while conducting the experiments, we did not observe any obvious difference in feeding latency between Failed males and Naïve or Satisfied controls.

      Line 117. I don't understand which results the authors refer to when writing "an overall elevation in the threshold to initiate feeding upon appetitive cues". Please specify.

      This phrase refers to the fact that for every sweet tastant we tested, including sucrose (Figure 1C), fructose and glucose (Figure 1 supplement 1D-E), the concentration-response curve in Failed males shifted to the right, and the Mean Acceptance Threshold (MAT) was significantly higher. In other words, for these different appetitive cues, mating failure raised the concentration of sugar required to trigger a proboscis extension, indicating a general elevation in the threshold to initiate feeding upon an appetitive cue.

      Figure 1D. Please specify the time for the satisfied group.

      For clarity, the Naïve and Satisfied groups in Figure 1D each represent pooled data from 0 to 72 hours post-treatment, as their sweet sensitivity remained stable throughout this period. Only the Failed group was shown with time-resolved data, since it was the only group exhibiting a dynamic change in sugar sensitivity over time. We have now specified this in the figure legend.

      Figure 1F. The phenotype was not totally reversed in failed-re-copulated males. Could it be due to the timing between failure and re-copulation? I suggest the authors mention in the figure or in the text, the time interval between failure and re-copulation.

      We would like to clarify that the opportunity for re-copulation was provided within 30 minutes of the initial treatment (“Failed”). The incomplete reversal in the Failed-re-copulated group indeed raises interesting questions. One possible explanation is that mating failure weakens synaptic transmission between the SEZ dopaminergic neurons and Gr5a<sup>+</sup> sweet sensory neurons (Figure 3), and that recovery of this transmission takes longer. We have added this information to the figure legend and the Methods section.

      Line 227-228 and Figure 3E. The authors showed that the synaptic connections between dopaminergic neurons and Gr5a+ GRNs were significantly weakened. I am wondering about the delay between mating failure and the GFP observation. It would be informative to know this timing to interpret this decrease in synaptic connections. If the timing is relatively long, it is possible that we can observe a neuronal plasticity. However, if this timing is very short, I would not expect such synaptic plasticity.

      The interval between the behavioral treatment and the GRASP-GFP experiment was approximately 20 hours. We chose this time window because it was sufficient for both GFP expression and accumulation. Therefore, the observed reduction in synaptic connections between dopaminergic neurons and Gr5a<sup>+</sup> GRNs likely reflects a genuine, experience-induced structural and functional change rather than an immediate, transient effect. We have added this information to the Methods section of the revised manuscript.

      Line 240-243: The authors demonstrated that there is a reduction of CaLexA-mediated GFP signals in dopaminergic neurons in the SEZ after mating failure, but not a reduction in Gr5a+ GRNs. I suggest replacing "indicate" with "suggest" in line 240.

      We have made the change accordingly. We would also like to clarify that, while we observed a reduction of NFAT signal in SEZ dopaminergic neurons (Figure 3G), we did not directly test NFAT signal in Gr5a<sup>+</sup> neurons. Notably, the weakened synaptic transmission from SEZ dopaminergic neurons to Gr5a<sup>+</sup> neurons (Figure 3E-F) and the reduced NFAT signal in SEZ dopaminergic neurons (Figure 3G-I) are both in line with the reduced sweet sensitivity of Gr5a<sup>+</sup> neurons upon courtship failure (Figure 3B-D).

      Line 243: replace "consecutive" with "constitutive".

      We have revised it accordingly.

      Figure 5: I have trouble understanding the results obtained in Figure 5. Both constitutive activation and inhibition of Dop1R1 and Dop2R neurons lead to the same results, knowing that males who failed mating no longer exhibit decreased sweet sensitivity. I would have expected contrary results for both experimental conditions. I suggest the author to discuss their results.

      Both activation and inhibition of Dop1R1 and Dop2R neurons eliminated the effect of courtship failure on sweet sensitivity (Figure 5). These results are in line with our hypothesis that courtship failure alters dopamine signaling and thereby sweet sensitivity. If dopamine signaling via Dop1R1 and Dop2R is locked in either a silenced or a constitutively activated state, courtship failure can no longer modulate it, and its effect on sweet sensitivity is eliminated.

      Nevertheless, as the reviewer pointed out, constitutive activation and inhibition should in principle have opposite effects in Naïve flies. Indeed, when Dop1R1<sup>+</sup>/Dop2R<sup>+</sup> neurons were silenced in Naïve flies, PER to sucrose was significantly reduced (Figure 5C-D), confirming that these neurons normally facilitate sweet sensation. Neuronal activation by NaChBac did show a trend towards enhanced PER compared with the GAL4/+ controls, but it did not differ from the +>UAS-NaChBac controls, which already showed a high PER level, likely owing to a ceiling effect. We have added this discussion to the manuscript.

      Figure 7: I suggest the authors modify their figure a bit. It is not clear why in failed mating, the red arrow in "behavioral modulation" goes to the fly. The authors should find another way to show that mating failure decreased the percentage of flies that are motivated to eat sugar.

      We have modified the figure as suggested.

      Overall, I would suggest the authors be precautious with their conclusion. For example, line 337= "sexual failure suppressed feeding behavior". This is not what is shown by this study. Here, the study shows that mating failure decreases the fraction of flies to eat sucrose. Unless the authors demonstrate that this decrease is generalizable to other metabolites, I suggest the authors modify their conclusion.

      While we primarily used sucrose as the stimulant in our experiments, we also tested responses to two other sugars: fructose and glucose (Figure 1 supplement 1D-E). In all three cases, mating failure led to a significant reduction in sweet perception, suggesting that the effect of courtship failure is not limited to a single metabolite but rather reflects a general decrease in sweet sensitivity. Meanwhile, reduced sweet sensitivity indeed led to a reduction of feeding initiation (Figure 1).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      In the future, could you please include the exact changes made to the manuscript in the relevant section of the rebuttal, so it's clear which changes addressed the comment? That would make it easier to see what you refer to exactly - currently I have to guess which manuscript changes implement e.g. "We have tried to make these points more evident".

      Yes, we apologize for the inconvenience.

      On possible navigation solutions:

      I'm not sure if I follow this argument. If the network uses a shifted allocentric representation centred on its initial state, it couldn't consistently decode the position from different starting positions within the same environment (I don't think egocentric is the right term here - egocentric generally refers to representations relative to the animal's own direction like "to the left" rather than "to the west" but these would not work in the allocentric decoding scheme here). In other words: If I path integrate my location relative to my starting location s1 in environment 1 and learn how to decode that representation to an environment location, I cannot use the same representation when I start from s2 in environment 1, because everything will have shifted. I still believe using boundaries is the only solution to infer the absolute location for the agent here (because that's the only information that it gets), and that's the reason for finding boundary representations (and not grid cells). Imagine doing this task on a perfect torus where there are no boundaries: it would be impossible to ever find out at what 'absolute' location you are in the environment. I have therefore not updated this part of my review, but do let me know if I misunderstood.

      Thank you for engaging with this point, which concerns a somewhat unusual feature of our network. We believe the point you raise would apply if the decoding were fixed. However, in our case, the decoding is dynamic and depends on the firing pattern, as place unit centers are decoded on a per-trajectory basis. Thus, a new place-like basis may be formed for each trajectory (and in each environment). Hence, the model is not constrained to reuse its representation across trajectories or environments, as place centers are inferred from unit firing. Nevertheless, we do observe that the network learns a fixed place field arrangement in each geometry, which likely reflects an optimal solution to the decoding problem. This might also help to explain the hexagonal arrangement of learned field centers. Finally, we agree that "egocentric" may not be entirely accurate, but we found it to be the best term to distinguish this solution from the allocentric-type navigation adopted by the network.

      Regarding noise injection:

      Beyond that noise level, the network might return to high correlations, but that must be due to the boundary interactions - very much like what happens at the very beginning of entering an environment: the network has learned to use the boundary to figure out where it is from an uninformative initial hidden state. But I don't think this is currently reflected well in the main text. That still reads "Thus, even though the network was trained without noise, it appears robust even to large perturbations. This suggests that the learned solutions form an approximate attractor." I think your new (very useful!) velocity ablations show that only small noise is compensated for by attractor dynamics, and larger noise injections are error corrected through boundary interactions. I've added this to the new review.

      Thank you for your kind feedback. We have changed the phrasing in the text to say “robust even to moderate perturbations.” We maintain that, although numerically small, the injected noise is rather large compared with the magnitude of activities in the network (see Fig. A5d): the largest maximal rate is around 0.1, which is similar to the noise level at which output representations fail to re-converge. Nevertheless, we agree that some moderation is appropriate.

      On contexts being attractive:

      In the new bit of text, I'm not sure why "each environment appears to correspond to distinct attractive states (as evidenced by the global-type remapping behavior)", i.e. why global-type remapping is evidence for attractive states. Again, to me global-type remapping is evidence that contexts occupy different parts of activity space, but not that they are attractive. I like the new analysis in Appendix F, as it demonstrates that the context signal determines which region of activity space is selected (as opposed to the boundary information!). If I'm not mistaken, we know three things: 1. Different contexts exist in different parts of representation space, 2. Representations are attractive for small amounts of noise, 3. The context signal determines which point in representation space is selected (thanks to the new analysis in Appendix F). That seems to be in line with what the paper claims (I think "contexts are attractive" has been removed?) so I've updated the review.

      We believe we are in agreement on this point and simply use slightly different language: our aim was to point out that a particular context signal appears to correspond to a particular (discrete) attractor state (i.e., occupying a distinct part of representation space, as you state). To avoid confusion, we have changed the text to say that “representations are attractive”.

      Thanks again for engaging with us, this discussion has been very helpful in improving the paper.

      Reviewer #2:

      However, I still struggle to understand the entire picture of the boundary-to-place-to-grid model. After all, what is the role of grid cells in the proposed view? Are they just redundant representations of the space? I encourage the authors to clarify these points in the last two paragraphs on pages 17-18 of the discussion.

      Thank you for your feedback. While we have discussed the possible role of a grid code to some extent, we agree that this point requires clarification. We have therefore added to the discussion on the role of grid cells, which now reads “While the lack of grid cells in this model is interesting, it does not disqualify grid cells from serving as a neural substrate for path integration. Rather, it suggests that path integration may also be performed by other, non-grid spatial cells, and/or that grid cells may serve additional computational purposes. If grid cells are involved during path integration, our findings indicate that additional tasks and constraints are necessary for learning such representations. This possibility has been explored in recent normative models, in which several constraints have been proposed for learning grid-like solutions. Examples include constraints concerning population vector magnitude, conformal isometry \cite{xu_conformal_2022, schaeffer_self-supervised_2023, schoyen_hexagons_2024}, capacity, spatial separation and path invariance \cite{schaeffer_self-supervised_2023}. Another possibility is that grid cells are geared more towards other cognitive tasks, such as providing a neural metric for space \cite{ginosar_are_2023, pettersen_self-supervised_2024}, or supporting memory and inference-making \cite{whittington_tolman-eichenbaum_2020}. That our model performs path integration without grid cells, and that a myriad of independent constraints are sufficient for grid-like units to emerge in other models, presents strong computational evidence that grid cells are not solely defined by path integration, and that path integration is not only reserved for grid cells.”

      Thank you again for your time and input.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their comprehensive analysis Diallo et al. deorphanise the first olfactory receptor of a nonhymenopteran eusocial insect - a termite and identified the well-established trail pheromone neocembrene as the receptor's best ligand. By using a large set of odorants the authors convincingly show that, as expected for a pheromone receptor, PsimOR14 is very narrowly tuned. While the authors first make use of an ectopic expression system, the empty neuron of Drosophila melanogaster, to characterise the receptor's responses, they next perform single sensillum recordings with different sensilla types on the termite antenna. By that, they are able to identify a sensillum that houses three neurons, of which the B neuron exhibits the narrow responses described for PsimOR14. Hence the authors do not only identify the first pheromone receptor in a termite but can even localize its expression on the antenna. The authors in addition perform a structural analysis to explain the binding properties of the receptor and its major and minor ligands (as this is beyond my expertise, I cannot judge this part of the manuscript). Finally, they compare expression patterns of ORs in different castes and find that PsimOR14 is more strongly expressed in workers than in soldier termites, which corresponds well with stronger antennal responses in the worker caste.

      Strengths:

      The manuscript is well-written and a pleasure to read. The figures are beautiful and clear. I actually had a hard time coming up with suggestions.

      We thank the reviewer for the positive comments.

      Weaknesses:

      Whenever it comes to the deorphanization of a receptor and its potential role in behaviour (in the case of the manuscript it would be trail-following of the termite) one thinks immediately of knocking out the receptor to check whether it is necessary for the behaviour. However, I definitely do not want to ask for this (especially as the establishment of CRISPR Cas-9 in eusocial insects usually turns out to be a nightmare). I also do not know either, whether knockdowns via RNAi have been established in termites, but maybe the authors could consider some speculation on this in the discussion.

      We agree that functional validation of PsimOR14 using reverse genetics would be a valuable addition to the study, as it would firmly establish the receptor's role in trail pheromone sensing. Nevertheless, such evidence is difficult to obtain. Because of the very slow ontogenetic development inherent to termites (several months from egg to the worker stage), CRISPR/Cas9 is not a practical technique for this taxon. By contrast, termites are quite responsive to RNAi-mediated silencing, and RNAi has previously been used to silence the ORCo co-receptor in termites, resulting in impaired trail-following behavior (DOI: 10.1093/jee/toaa248). Likewise, in our previous experiments, administering dsPsimORCo to P. simplex workers decreased ORCo transcript abundance, lowered sensitivity to neocembrene and reduced neocembrene trail following, whereas dsPsimOR14 injection did not reduce PsimOR14 transcript abundance. We do not report these negative results in the present manuscript so as not to dilute its main message. In parallel, we are currently developing an alternative dsRNA delivery method based on nanoparticle coating, which may improve RNAi experiments with ORs in termites.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors performed the functional analysis of odorant receptors (ORs) of the termite Prorhinotermes simplex to identify the receptor of trail-following pheromone. The authors performed single-sensillum recording (SSR) using the transgenic Drosophila flies expressing a candidate of the pheromone receptor and revealed that PsimOR14 strongly responds to neocembrene, the major component of the pheromone. Also, the authors found that one sensillum type (S I) detects neocembrene and also performed SSR for S I in wild termite workers. Furthermore, the authors revealed the gene, transcript, and protein structures of PsimOR14, predicted the 3D model and ligand docking of PsimOR14, and demonstrated that PsimOR14 is higher expressed in workers than soldiers using RNA-seq for heads of workers and soldiers of P. simplex and that EAG response to neocembrene is higher in workers than soldiers. I consider that this study will contribute to further understanding of the molecular and evolutionary mechanisms of the chemoreception system in termites.

      Strength:

      The manuscript is well written. As far as I know, this study is the first study that identified a pheromone receptor in termites. The authors not only present a methodology for analyzing the function of termite pheromone receptors but also provide important insights in terms of the evolution of ligand selectivity of termite pheromone receptors.

      We thank the reviewer for the overall positive evaluation of the manuscript.

      Weakness:

      As you can see in the "Recommendations to the Authors" section below, there are several things in this paper that are not fully explained about experimental methods. Except for this point, this paper appears to me to have no major weaknesses.

      We address point by point the specific comments listed in the Recommendations for the authors section below.

      Reviewer #3 (Public review):

      Summary:

      Chemical communication is essential for the organization of eusocial insect societies. It is used in various important contexts, such as foraging and recruiting colony members to food sources. While such pheromones have been chemically identified and their function demonstrated in bioassays, little is known about their perception. Excellent candidates are the odorant receptors that have been shown to be involved in pheromone perception in other insects including ants and bees but not termites. The authors investigated the function of the odorant receptor PsimOR14, which was one of four target odorant receptors based on gene sequences and phylogenetic analyses. They used the Drosophila empty neuron system to demonstrate that the receptor was narrowly tuned to the trail pheromone neocembrene. Similar responses to the odor panel and neocembrene in antennal recordings suggested that one specific antennal sensillum expresses PsimOR14. Additional protein modeling approaches characterized the properties of the ligand binding pocket in the receptor. Finally, PsimOR14 transcripts were found to be significantly higher in worker antennae compared to soldier antennae, which corresponds to the worker's higher sensitivity to neocembrene.

      Strengths:

      The study presents an excellent characterization of a trail pheromone receptor in a termite species. The integration of receptor phylogeny, receptor functional characterization, antennal sensilla responses, receptor structure modeling, and transcriptomic analysis is especially powerful. All parts build on each other and are well supported with a good sample size.

      We thank the reviewer for these positive comments.

      Weaknesses:

      The manuscript would benefit from a more detailed explanation of the research advances this work provides. Stating that this is the first deorphanization of an odorant receptor in a clade is insufficient. The introduction primarily reviews termite chemical communication and deorphanization of olfactory receptors previously performed. Although this is essential background, it lacks a good integration into explaining what problem the current study solves.

      We understand the reviewer's point that the motivation and importance of the present study were not clearly highlighted. The introduction has been reworked in the current version of the manuscript. As suggested by Reviewer 3 in the Recommendations section below, it now integrates some parts of the original discussion, especially the part on OR evolution and the emergence of eusociality in hymenopteran social insects and in termites, while underscoring the need for termite data to compare the commonalities and idiosyncrasies in neurophysiological (pre)adaptations potentially linked with the independent evolution of eusociality in the two main social insect clades.

      Selecting target ORs for deorphanization is an essential step in the approach. Unfortunately, the process of choosing these ORs has not been described. Were the authors just lucky that they found the correct OR out of the 50, or was there a specific selection process that increased the probability of success?

      Indeed, we were extremely lucky. Our strategy was to first select a modest set of ORs to confirm the feasibility of the Empty Neuron Drosophila system and our newly established SSR setup, while taking advantage of having a set of termite pheromones, including those previously identified in the P. simplex model, some of them synthesized de novo for this project. The selection criteria for the first set of four receptors were (i) a full-length ORF with at least 6 unambiguously predicted transmembrane regions, and (ii) representation on different branches (sub-branches) of the phylogenetic tree. It was then a matter of good luck to hit on PsimOR14, which responds selectively to the main component of the genuine P. simplex trail-following pheromone. In the revised version, we state these selection criteria in the Results section (Phylogenetic reconstruction and candidate OR selection).

      Deorphanization of additional P. simplex ORs is currently under way.

      The authors assigned antennal sensilla into five categories. Unfortunately, they did not support their categories well. It is not clear how they were able to differentiate SI and SII in their antennal recordings.

      We agree that the classification of multiporous sensilla into five categories lacks robust discriminating features. The neocembrene-responding sensillum was initially identified by SSR measurements on individual olfactory sensilla of P. simplex workers, one by one, and the position of each tested sensillum was recorded on optical microscope photographs taken during the SSR experiment. We subsequently performed SEM and HR-SEM, in which we localized the neocembrene-responding sensillum and tried to find distinguishing characters. We admit that these characters are not robust. Therefore, in the revised version of the manuscript we have abandoned the attempt at sensillum classification and only report observations on the specific sensillum in which we consistently recorded responses to neocembrene (and geranylgeraniol). The modifications affect Fig. 4, its legend and the corresponding part of the Results section (Identification of P. simplex olfactory sensillum responding to neocembrene).

      The authors used a large odorant panel to determine receptor tuning. The panel included volatile polar compounds and non-volatile non-polar hydrocarbons. Usually, some heat is applied to such non-volatile odorants to increase volatility for receptor testing. It is unclear how it is possible that these non-volatile compounds can reach the tested sensilla without heat application.

      The reviewer points out an important methodological error we made while designing the experiments. Indeed, including long-chain hydrocarbons in Panel 1 without applying additional heat to the odor cartridges was inappropriate, even though the experiments were performed at 25–26 °C. We carefully considered the best way to correct the mistake and decided to remove all tested ligands beyond C22 from Panel 1, i.e., five compounds altogether. These changes did not affect the remaining Panels 2-4 (containing compounds with sufficient volatility), nor did they affect the message of the manuscript on the highly selective response of PsimOR14 to neocembrene (and geranylgeraniol). Consequently, Figures 2, 3 and 5 were updated, along with the supplementary tables containing the raw SSR data. In addition, the tuning curve for PsimOR14 was rebuilt and the receptor lifetime sparseness value recalculated (without any important change). We also exchanged squalene for limonene in the docking and molecular dynamics analysis and performed new calculations.
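      For readers less familiar with the tuning metric mentioned above, lifetime sparseness is commonly computed with the Vinje and Gallant formula, S = [1 - (Σ r_j / N)^2 / (Σ r_j^2 / N)] / (1 - 1/N), where r_j is the receptor's response to odorant j in an N-odorant panel. The Python sketch below assumes that formula and, as is often done in olfactory receptor studies, clips inhibitory (negative) responses to zero; the exact variant used for PsimOR14 is not stated here, and the function name is illustrative only.

```python
from typing import Sequence


def lifetime_sparseness(responses: Sequence[float]) -> float:
    """Lifetime sparseness of one receptor across an odorant panel (sketch).

    Assumes the Vinje and Gallant form. Returns 0 when every odorant elicits
    the same response and 1 when only a single odorant elicits a response.
    """
    n = len(responses)
    if n < 2:
        raise ValueError("need responses to at least two odorants")
    # Assumption: inhibitory (negative) responses are clipped to zero.
    r = [max(float(x), 0.0) for x in responses]
    mean = sum(r) / n
    mean_sq = sum(x * x for x in r) / n
    if mean_sq == 0.0:
        raise ValueError("no excitatory responses; sparseness is undefined")
    return (1.0 - mean * mean / mean_sq) / (1.0 - 1.0 / n)
```

      With this definition, a receptor that responds to only one odorant of the panel scores 1, whereas equal responses to every odorant score 0, so a narrowly tuned receptor such as PsimOR14 is expected to lie close to 1.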

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) L 208: "than" instead of "that"

      Corrected.

      (2) L 527+527 strange squares (•) before dimensions

      Apparently an error upon file conversion, corrected.

      (3) L553 "reconstructing" instead of "reconstruct"

      Corrected.

      (4) Two references (Chahda et al. and Chang et al.) appear too late in the alphabet.

      Corrected. Thank you for spotting this mistake. The reference list had been ordered according to the Czech alphabet, in which CH follows H.

      Reviewer #2 (Recommendations for the authors):

      (1) L148: Why did the authors select only four ORs (PsimOR9, 14, 30, and 31) though there are 50 ORs in P. simplex? I would like you to explain why you chose them.

      Our strategy was to first select a modest set of ORs to confirm the feasibility of the Empty Neuron Drosophila system and our newly established SSR setup, while taking advantage of having a set of termite pheromones, including those previously identified in the P. simplex model, some of them synthesized de novo for this project. It was then a matter of good luck to hit on PsimOR14, which responds selectively to the main component of the genuine P. simplex trail-following pheromone; deorphanization of additional P. simplex ORs is currently under way. In the revised version of the manuscript, we state the selection criteria for the four ORs studied in the Results section (Phylogenetic reconstruction and candidate OR selection).

      (2) L149: Where is Figure 1A? Does this mean Figure 1?

      Thank you for spotting this mistake. Fig. 1 is now properly labelled as Fig. 1A and 1B in the figure itself and in the legend. The text now refers to either 1A or 1B.

      (3) Figure 1: The authors also showed the transcript abundance of all 50 ORs of P. simplex at the bottom right of Figure 1, but there is no explanation of it in the main text.

      The heatmap reporting the transcript abundances is now labelled as Fig. 1B and is referred to in the Discussion section (in the original manuscript it was referred to in the same place as Fig. 1).

      (4) L260-265: The authors confirmed higher expression of PsimOR14 in workers than soldiers by using RNA-seq data and stronger EAG responses of PsimOR14 to neocembrene in workers than soldiers, but I think that confirming the expression levels of PsimOR14 in workers and soldiers by RT-qPCR would strengthen the authors' argument (it is optional).

      qPCR validation is a suitable complement to read count comparison of RNA-Seq data, especially when the data come from one-sample transcriptomes and/or low-coverage sequencing. However, our RNA-Seq analysis is based on sequencing of three independent biological replicates per phenotype (worker heads vs. soldier heads) with ~20 million reads per sample. Thus, the resulting differential gene expression analysis is sufficiently powerful in terms of detection limit and dynamic range.

      We admit that the replicate numbers and origin of the RNA-Seq data should be better specified, since the Methods section of the original manuscript only referred to the GenBank accession numbers. Therefore, we added more information to the Methods section (Bioinformatics) and made clear there that these data come from our previous research and the related BioProject.

      (5) L491: I think that "The synthetic processes of these fatty alcohols are ..." is better.

      We replaced the sentence with “The de novo organic synthesis of these fatty alcohols is described …”

      (6) L525 and 527: There are white squares between the number and the unit. Perhaps some characters have been garbled.

      This was apparently an error introduced during file conversion; corrected.

      (7) L795: ORCo?

      Corrected.

      (8) L829-830 & Figure 4: Where is Figure 4D?

      Thank you for spotting this mistake, which originated from an older version of Figure 4. The SSR traces referred to in the legend are in fact part of Figure 5. Moreover, Figure 4 has now been reworked based on the comments by Reviewer 3.

      (9) L860-864: Why did the authors select the result of edgeR for the volcano plot in Figure 7 although the authors use both DESeq2 and edgeR? An explanation would be needed.

      Both algorithms, DESeq2 and edgeR, are routinely used for differential gene expression analysis. Since they differ in read count normalization method and statistical testing, we used both of them independently in order to reduce false positives. Because the resulting fold changes were practically identical for both algorithms (results of both analyses are listed in Supplementary Table S15), we only reported the edgeR outputs in Fig. 7 to avoid redundancy. We added to the Results section the information that both techniques listed PsimOR14 among the most upregulated ORs in workers.

      Reviewer #3 (Recommendations for the authors):

      The discussion contains many descriptions that would fit better into the introduction, where they could be used to hint at the study's importance (e.g., 292-311, 381-412). The remaining parts often lack a detailed discussion of the results that integrates details from other insect studies. Although references were provided, no details were usually outlined. It would be helpful to see a stronger emphasis on what we learn from this study.

      Along with rewriting the introduction, we also modified the discussion. As suggested, lines 292–311 were rewritten and moved to the introduction. By contrast, we preferred to keep the two paragraphs at lines 381–412 in the discussion, since both of them outline potentially interesting future targets of research on termite ORs.

      As suggested, the discussion has been enriched and now includes comparative examples and relevant references on the broad/narrow selectivity of insect ORs, on the expected breadth of tuning of pheromone receptors vs. ORs detecting environmental cues, and on the potential role of the additional neurons housed in the neocembrene-detecting sensillum of P. simplex workers. Redundant details on the chemistry of termite communication have been removed from both the introduction and the discussion.

      This includes explanations of the advantages of the specific methodologies the authors used and how they helped solve the manuscript's problem. What does the phylogeny solve? Was it used to select the ORs tested? It would be helpful to discuss what the phylogeny shows in comparison to other well-studied OR phylogenies, like those from the social Hymenoptera.

      We understand the comment. In fact, our motivation to include the phylogenetic tree of termite ORs was essentially (i) to demonstrate the orthologous nature of OR diversity, with few expansions at low taxonomic levels, and (ii) to show graphically the relationships among the four selected sequences. We do not attempt a comprehensive phylogenetic analysis here, because it would be redundant: we recently published a large OR phylogeny that includes all sequences used in the present manuscript and analyses them in the proper context of related (cockroaches) and unrelated insect taxa (Johny et al., 2023). That paper also compares the termite phylogenetic pattern with those observed in other Insecta. It is cited repeatedly at appropriate places in the present manuscript, and its main observations are summarized in the Introduction. Therefore, we feel that a thorough discussion of termite OR phylogeny would be redundant in the present paper.

      The authors categorized the sensilla types. Potential problems in the categorization aside, it would be helpful to know if it is expected that you have sensilla specialized in perceiving one specific pheromone. What is known about sensilla in other insects?

      We understand. In the discussion of the revised version, we elaborate on the features typical of, or expected for, a pheromone receptor and the sensillum housing it together with two other olfactory sensory neurons, including examples from other insects.

      As the manuscript currently stands, specialist readers with their respective background knowledge would find this study very interesting. In contrast, the general reader would probably fail to appreciate the importance of the results.

      We hope that the re-organized and simplified introduction may now be more intelligible even for non-specialist readers.

      (1) L35: Should "workers" be replaced with "worker antennae"?

      Corrected.

      (2) L62: Should "conservativeness" be replaced by "conservation"?

      Replaced with “parsimony”.

      (3) L129: How and why did the authors choose four candidate ORs? I could not find any information about this in the manuscript. I wondered why they did not pick the more highly expressed PsimOr20 and 26 (Figure 7).

      As already explained above in our reply to the Weaknesses section, we selected only a modest set of four ORs for the first deorphanization attempts, while an additional set is currently being tested. We also explained above the inclusion criteria, i.e. (i) a full-length ORF with at least 6 unambiguously predicted transmembrane regions, and (ii) presence on different branches (subbranches) of the OR phylogeny. For these reasons, we did not primarily consider the expression patterns of different ORs. Fig. 7 shows differential expression between soldiers and workers, which was not a primary guideline either, and these data were obtained only after the ORs had been tested by SSR. Moreover, even though we had data on P. simplex OR expression (Fig. 1B), we did not presume that pheromone receptors should be among the most highly expressed ORs, given the richness of chemical cues detected by worker termites; this differs from, e.g., male moths, in which ORs for sex pheromones are intuitively expected to be highly expressed.

      The strategy of OR selection is specified in the results section of the revised manuscript under “Phylogenetic reconstruction and candidate OR selection”.

      (4) 198 to 200: SI, II, and III look very similar. Additional measurements rather than qualitative descriptions are required to consider them distinct sensilla. The bending of SIII could be an artifact of preparation. I do not see how the authors could distinguish between SI and SII under the optical microscope for recordings. A detailed explanation is required.

      As we responded above in the “Weaknesses” section, we admit that the sensilla classification is unclear. Therefore, in the revised version we decided to abandon the classification of sensilla types and focus only on the observations made on the neocembrene-responding sensillum. To recognize this specific sensillum, we used its position on the last antennal segment. Because termite antennae are not densely populated with sensilla, it is relatively easy to distinguish individual sensilla by their position on the antenna, both under the optical microscope and in SEM photographs. The modifications affect Fig. 4, its legend and the corresponding part of the Results section (Identification of P. simplex olfactory sensillum responding to neocembrene).

      (5) 208: "Than" instead of "that"

      Corrected.

      (6) 280: I suggest replacing "demand" with "capabilities"

      Corrected.

      (7) 312: Why "nevertheless"? It sounds as if the authors suggest that there is evidence that ORs are not important for communication. This should be reworded.

      We removed “Nevertheless” from the beginning of the sentence.

      (8) 321 to 323: This sentence sounds as if something is missing. I suggest rewriting it.

      This sentence simply says that the empty neuron Drosophila system is a good tool for termite OR deorphanization and that termite ORs work well with Drosophila ORCo. We reworded the sentence.

      (9) 323: I suggest starting a new paragraph.

      Corrected.

      (10) 421: How many colonies were used for each of the analyses?

      The data for this manuscript were collected from three different colonies collected in Cuba. We now describe in the Materials and Methods section which analyses were conducted with each of the colonies.

      (11) 430: Did the termites originate from one or multiple colonies and did the authors sample from the Florida and Cuba population?

      The data for this manuscript were collected from three different colonies collected in Cuba. We now describe in the Materials and Methods section which analyses were conducted with each of the colonies.

      (12) 501: How was the termite antenna fixated? The authors refer to the Drosophila methods, but given the large antennal differences between these species, more specific information would be helpful.

      Understood. We added the following information into the Methods section under “Electrophysiology”: “The grounding electrode was carefully inserted into the clypeus and the antenna was fixed on a microscope slide using a glass electrode. To avoid the antennal movement, the microscope slide was covered with double-sided tape and the three distal antennal segments were attached to the slide.”

      (13) 509: I want to confirm that the authors indicate that the outlet of the glass tube with the airstream and odorant is 4 cm away from the Drosophila or termite antenna. The distance seems to be very large.

      Thank you for spotting this obvious mistake. The 4 cm distance refers to the position of the opening for Pasteur pipette insertion into the delivery tube; the outlet itself is situated approx. 1 cm from the antenna. This information has now been corrected.

      (14) 510/527: It looks like all odor panels were equally applied onto the filter paper despite the difference in solvent (hexane and paraffin oil). How was the solvent difference addressed?

      In our study we combine two types of odorant panels. First, we test on all four studied receptors a panel containing several compounds relevant to termite chemical communication, including the C12 unsaturated alcohols, the diterpene neocembrene, the sesquiterpene (3R,6E)-nerolidol and other compounds. These compounds are stored in the laboratory as hexane solutions to prevent oxidation/polymerization, and it is not advisable to transfer them to another solvent. In the second step, we used three additional panels of frequently occurring insect semiochemicals, which are stored as paraffin oil solutions, to address the breadth of PsimOR14 tuning. We are aware that the evaporation dynamics differ between the two solvents, but we did not find a suitable way to solve this problem. We believe that the use of two solvents does not compromise the general message on receptor specificity. For each panel, the corresponding solvent is used as a control. The use of two different solvents for SSR can also be encountered in other studies, e.g. 10.1016/j.celrep.2015.07.031.

      (15) 518: delta spikes/sec works for all tables except for the wild type in Table S5. I could not figure out how the authors get to delta spikes/sec in that table.

      Thank you for your sharp eye. By mistake, the Δ spikes per second values reported in Table S5 for W1118 were calculated using the formula for 0.5 s stimulation instead of 1 s. We corrected this mistake, which does not affect the interpretation of the results in Table S5 and Fig. 2.
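      To illustrate why the assumed window length matters, here is a minimal sketch. It assumes that Δ spikes/s is the spike count during stimulation minus the spontaneous count in an equally long pre-stimulus window, divided by the window length; the function name and spike counts are hypothetical and are not the authors' analysis code.

```python
def delta_spikes_per_s(n_stim: int, n_base: int, window_s: float) -> float:
    """Hypothetical helper: stimulus-evoked minus spontaneous firing, in spikes/s.

    Assumes the stimulus and baseline windows have the same length (window_s).
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return (n_stim - n_base) / window_s


# Hypothetical counts recorded over a 1 s stimulation window.
correct = delta_spikes_per_s(n_stim=30, n_base=10, window_s=1.0)
# Applying the 0.5 s formula to the same 1 s counts doubles the value.
wrong = delta_spikes_per_s(n_stim=30, n_base=10, window_s=0.5)
print(correct, wrong)
```

      Under these assumptions, using the 0.5 s denominator for 1 s counts inflates the response twofold; because the error scales all affected values by the same factor, their relative ranking is unchanged.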

      522: Did the workers and soldiers originate from different colonies or different populations?

      We now clearly describe in the Materials and Methods section the origin of termites for the different experiments. EAG measurements were made using individuals (workers, soldiers) from one Cuban colony.

      (16) Figure 6C/D: I suggest matching colors between the two figures. For example, instead of using an orange circle in C and a green coloration of the intracellular flap in D, I recommend using blue, which is not used for something else. In addition, the binding pocket could be separated better from anything else in a different color.

      We agree that the color match for the intracellular flap was missing. This figure has now been reworked: the colors match better, and the binding region is better delineated.

      (17) Figure 7/Table S15: It is unclear where the transcriptome data originate and what they are based on. Are these antennal transcriptomes or head transcriptomes? Do these data come from previous data sets or data generated in this study? Figure 7 refers to heads, Table S15 to workers and soldiers, and the methods only refer to antennal extractions. This should be clarified in the text, the figure, and the table.

      We admit that the replicate numbers and origin of the RNA-Seq data should be better specified, and that the information that the RNA-Seq data originated from samples of heads+antennae of workers and soldiers should be provided at appropriate places. Therefore, we added more information on replicates and data origin to the Methods section (Bioinformatics), made clear that these data come from our previous research, and referred to the corresponding BioProject. Likewise, the Figure 7 legend and the Table S15 heading have been updated.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors set out to illuminate how legumes promote symbiosis with beneficial nitrogen-fixing bacteria while maintaining a general defensive posture towards the plethora of potentially pathogenic bacteria in their environment. Intriguingly, a protein involved in plant defence signalling, RIN4, is implicated as a type of 'gatekeeper' for symbiosis, connecting symbiosis signalling with defence signalling. Although questions remain about how exactly RIN4 enables symbiosis, the work opens an important door to new discoveries in this area.

      Strengths:

      The study uses a multidisciplinary, state-of-the-art approach to implicate RIN4 in soybean nodulation and symbiosis development. The results support the authors' conclusions.

      Weaknesses:

      No serious weaknesses, although the manuscript could be improved slightly from technical and communication standpoints.

      Reviewer #2 (Public Review):

      Summary:

      The study by Toth et al. investigates the role of RIN4, a key immune regulator, in the symbiotic nitrogen fixation process between soybean and rhizobium. The authors found that SymRK can interact with and phosphorylate GmRIN4. This phosphorylation occurs within a 15 amino acid motif that is highly conserved in N-fixation clades. Genetic studies indicate that GmRIN4a/b play a role in root nodule symbiosis. Based on their data, the authors suggest that RIN4 may function as a key regulator connecting symbiotic and immune signaling pathways.

      Overall, the conclusions of this paper are well supported by the data, although there are a few areas that need clarification.

      Strengths:

      This study provides important insights by demonstrating that RIN4, a key immune regulator, is also required for symbiotic nitrogen fixation.

      The findings suggest that GmRIN4a/b could mediate appropriate responses during infection, whether it is by friendly or hostile organisms.

      Weaknesses:

      The study did not explore the immune response in the rin4 mutant. Therefore, it remains unknown how GmRIN4a/b distinguishes between friend and foe.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript by Toth et al reveals a conserved phosphorylation site within the RIN4 (RPM1-interacting protein 4) R protein that is exclusive to two of the four nodulating clades, Fabales and Rosales. The authors present persuasive genetic and biochemical evidence that phosphorylation at the serine residue 143 of GmRIN4b, located within a 15-aa conserved motif with a core five amino acids 'GRDSP' region, by SymRK, is essential for optimal nodulation in soybean. While the experimental design and results are robust, the manuscript's discussion fails to clearly articulate the significance of these findings. Results described here are important to understand how the symbiosis signaling pathway prioritizes associations with beneficial rhizobia, while repressing immunity-related signals.

      Strengths:

      The manuscript asks an important question in plant-microbe interaction studies with interesting findings.

      Overall, the experiments are detailed, thorough, and very well-designed. The findings appear to be robust.

      The authors provide results that are not overinterpreted and are instead measured and logical.

      Weaknesses:

      No major weaknesses. However, a well-thought-out discussion integrating all the findings and interpreting them is lacking; in its current form, the discussion lacks 'boldness'. The primary question of the study - how plants differentiate between pathogens and symbionts - is not discussed in light of the findings. The concluding remark, "Taken together, our results indicate that successful development of the root nodule symbiosis requires cross-talk between NF-triggered symbiotic signaling and plant immune signaling mediated by RIN4," though accurate, fails to capture the novelty or significance of the findings, and left me wondering how this adds to what is already known. A clear conclusion is needed, e.g., that the phosphorylation of RIN4 isoforms by SYMRK at S143 modulates immune responses during symbiotic interactions with rhizobia, or similar.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have no major criticism of the work, although it could be improved by addressing the following minor points:

      (1) Page 8, Figure 2 legend. Consider changing "proper symbiosis formation" to "normal nodulation" or something that better reflects control of nodule development/number.

      We thank you for the suggestion, the legend was changed to “...required for normal nodule formation” (see Page 10, revised manuscript)

      (2) Page 9. Cut "newly" from the first sentence of paragraph 2, as S143 phosphorylation was identified previously.

      Thank you for the suggestion, we removed “newly” from the sentence.

      (3) Page 10, Figure 3. Panels B showing green-fluorescent nodules are unnecessary given the quantitative data presented in the accompanying panel A. This goes for similar supplemental figures later.

      We appreciate the comment. Regarding Figure 3 (complementation of the rin4b mutant; we updated the figures according to the other reviewer’s comment) and Suppl. Figure 6 (OE phenotype of phospho-mimic/negative mutants), we removed the panels showing the micrographs. At the same time, we did not modify Figure 2 (which shows micrographs of transgenic roots carrying the silencing constructs) for the sake of figure completeness. (See page 10, revised manuscript)

      (4) Consider swapping Figure 3 for Supplemental Figure S7, which I think shows more clearly the importance of RIN4 phosphorylation in nodulation.

      We appreciate the comment and have swapped the figures according to the reviewer’s suggestion. Legend, figure description, and manuscript text have been updated accordingly. (See page 12 and 38, revised manuscript)

      (5) Page 10. Replace "it will be referred to S143..." with "we refer to S143 instead of ....".

      We replaced it according to the comment.

      (6) Page 11, delete "While" from "While no interactions could be observed...".

      We deleted it according to the suggestion.

      (7) Page 33, Fig S5. How many biological replicates were performed to produce the data presented in panel C and what do the error bar and asterisk indicate? Check that this information is provided in all figures that show errors and statistical significance.

      Thank you for the remark. The experiment was repeated three times, and this note was added to the figure description. All other figure legends with error bars were checked to ensure that the number of replicates is indicated.

      (8) Page 37, Fig S11, panel B. Are averages of data from the 2 biological and 3 technical replicates shown? Add error bars and tests of significant difference.

      Averages of a total of 6 replicates (from 2 biological replicates, each run in triplicates) are shown. We thank the reviewer for pointing out the missing error bars and statistical test, we have updated the figure accordingly.

      (9) Fig S12. Why are panels A, C, E, and G presented? The other panels seem to show the same data more clearly- showing the linear relationship between peak area ratio and protein concentration.

      We have taken the reviewer’s comment into consideration and revised the figure, removing the calibration curves and showing only four panels. The figure legend has been corrected accordingly. (Please see page 43, revised manuscript.) The original figure (unlike other revised figures) had to be deleted from the revised manuscript, as it caused technical issues when converting the document into PDF.

      Reviewer #2 (Recommendations For The Authors):

      Some small suggestions:

      (1) It's good to include a protein schematic for RIN4 in Figure 1.

      We appreciate the reviewer’s suggestion and we have drawn a protein schematic and added it to Figure 1. The figure legend was updated accordingly.

      (2) There appears to be incorrect labeling in Figure 2c; please double-check and make the necessary corrections.

      With respect, we do not understand the comment about incorrect labeling. Could the reviewer please give a more detailed explanation? In Figure 2C, RIN4a and RIN4b expression was checked in transgenic roots expressing either EV (empty vector) or different silencing constructs targeting RIN4a/b.

      Reviewer #3 (Recommendations For The Authors):

      I enjoyed the level of detail and precision in experimental design.

      A discussion point could be - What does it mean that nodule number but not fixation is affected? Is RIN4 only involved in the entry stage of infection but not in nodules during N-fixation?

      Our data suggest that RIN4 does indeed appear to be involved in infection. This hypothesis is supported by the finding that RIN4a/b was phosphorylated in root hairs but not in roots (or the phosphorylation was not detected in roots). The interaction with the early signaling RLKs also suggests that RIN4 is likely involved in the early stage of symbiosis formation.

      How would the authors explain their observation "However, the motif is retained in non-nodulating Fabales (such as C. canadensis, N. schottii; SI Appendix, Figure S2) and Rosales species as well." What does this imply about the role in symbiosis that the authors propose?

      We appreciate the reviewer’s question. The motif does seem to be retained; however, in nodulating plants it might be not only the motif but also the protein structure that differs. We have not investigated the structure of RIN4, or how it might change depending on certain features, upon interaction with another protein and/or upon post-translational modification(s). Griesmann et al. (2018) showed that certain genes are absent in non-nodulating species within the Fabales; we can speculate that, being absent, these genes cannot interact with RIN4 in those species, so downstream signaling could be lacking despite the retained motif. At this point, there is not enough data or knowledge to speculate further.

      qPCR analysis of symbiotic pathway genes showed that both NIN-dependent and NIN-independent branches of the symbiosis signaling pathway were negatively affected in the rin4b mutant. Please derive a conclusion from this.

      We appreciate the comment; it also prompted us to correct the following sentence (original: “Since NIN is responsible for induction of NF-YA and ERN1 transcription factors, their reduced expression in rin4b plants was not unexpected (Fig. 5).”), as ERN1 expression is independent of NIN (Kawaharada et al., 2017). The following sentences were also deleted, as they repeated a statement made earlier: “Soybean NF-YA1 homolog responded significantly to rhizobial treatment in rin4b plants, whereas NF-YA3 induction did not show significant induction (Fig. 5).”

      We added the following conclusion/hypothesis: “Based on the results of the expression data presented above, it seems that both NIN-dependent and NIN-independent branches of the symbiotic signaling pathways are affected in the rin4b mutant background. This indicates that the role of RIN4 protein in the symbiotic pathway can be placed upstream of CYCLOPS, as the CYCLOPS transcription activating complex is responsible (directly or indirectly) for the activation of all TFs tested in our expression analysis (Singh et al, 2014/47, 48).” (Please see Page 16, revised manuscript)

      The authors are highly encouraged to write a thoughtful discussion that would accompany the detailed experimental work performed in this manuscript.

      We appreciate the comment, and we have revised the Discussion accordingly. (Please see Pages 17-19, revised manuscript)

      Some minor suggestions for overall readability are below.

      What about immune signaling genes? Given that authors hypothesize that "Absence of AtRIN4 leads to increased PTI responses and, therefore, it might be that GmRIN4b absence also causes enhanced PTI which might have contributed to significantly fewer nodules." Could check marker immune signaling gene expression FLS2 and others.

      We appreciate the reviewer’s comment. While we believe these are very interesting questions, answering them is beyond the scope of the current manuscript. This is partly because several defense-responsive genes described in leaf immune responses could not be confirmed to respond in a similar manner in roots (Chuberre et al., 2018). It was also shown that plant immune responses in roots are compartmentalized and specialized (Chuberre et al., 2018), so if we examined immune-responsive genes, the signal might be diluted. Another reason these questions cannot be answered in the current manuscript is that finding a suitable immune-responsive gene would require rigorous experiments, not only in roots but also in root hairs over a time course, which would be the groundwork for a separate study (root hair isolation is not a trivial experiment; it requires at least 250–300 seedlings per treatment and time point).

      Regarding FLS2, it is known in Arabidopsis that its expression is tissue-specific within the root and appears to be restricted to the root vasculature (Wyrsch et al., 2015). In our manuscript, we showed that RIN4a/b is highly expressed in root hairs and that RIN4 phosphorylation was detectable in root hairs but not in roots; therefore, we see no reason to investigate FLS2 expression.

      "in our hands only ERN1a could be amplified. One possible explanation for this observation is that primers were designed based on Williams 82 reference genome, while our rin4b mutant was generated in the Bert cultivar background." Is the sequence between the two cultivars and the primers that bind to ERN1b in both cultivars so different? If not, this explanation is not very convincing.

      At the time of performing the experiment the genomic sequence of the Bert cultivar (used for generating rin4b edited lines) was not publicly available. In accordance with the reviewer’s comment, we removed the explanation, as it does not seem to be relevant. (See page 16, revised manuscript)

      The figures are clear and there is a logical flow. The images of fluorescing nodules in Figure 2,3 panels with nodules are not informative or unbiased.

      We appreciate the comment. As for Figure 3 (complementation of the rin4b mutant), we updated the figures according to the other reviewer’s comment, and for Suppl. Figure 6 (OE phenotype of phospho-mimic/negative mutants) we removed the panels showing the micrographs. At the same time, we did not modify Figure 2 (which shows micrographs of transgenic roots carrying the silencing constructs) for the sake of figure completeness. (See pages 10, 12 and 38, revised manuscript)

      What does the exercise in isolation of rin4 mutants in lotus tell us? Is it worth including?

      The outcome of the Ljrin4 mutant isolation suggests that RIN4 is so important that its mutation is lethal for the plant (as in Arabidopsis, where most of the evidence regarding the role of RIN4 has been described). It is therefore an additional piece of evidence that RIN4 is similarly crucial across most land plant species.

      Sentence ambiguous. "Co-expression of RIN4a and b with SymRKßΔMLD and NFR1α resulted in YFP fluorescence detected by Confocal Laser Scanning Microscopy (SI Appendix, Figure S8) suggesting that RIN4a and b proteins closely associate with both RLKs." Were all 4 expressed together?

      Thank you for the remark. Not all 4 proteins were co-expressed together. We adjusted the sentence as follows: “Co-expression of RIN4a and b with SymRKßΔMLD as well as with NFR1α resulted in YFP fluorescence…” We hope it is now phrased more clearly. (See page 13, revised manuscript)

      Minor spelling errors throughout, e.g. "Costume-made" (custom-made?).

      Thank you for noticing. According to the Cambridge online dictionary, the term is written with a hyphen; we therefore corrected it to “custom-made” throughout the manuscript.

      CRISPR-cas9 or CRISPR/Cas9? Keep it consistent throughout. CRISPR-cas9 is the latest consensus.

      We corrected it to “CRISPR-Cas9” throughout the manuscript.

      References are missing for several 'obvious statements'; please include them to reach a broader audience. For example, the first five sentences of the introduction. Also, statements such as 'Root hairs are the primary entry point for rhizobial infection in most legumes.'

      Thank you for the comment. To address this, we added reference #1 after the third sentence of the introduction, as well as an additional review as a reference. This additional review is also cited as the source for the sentence “Root hairs are the primary…” (Please see page 2, revised manuscript)

      Can you provide a percent value? "Silencing of RIN4a and RIN4b resulted in significantly reduced nodule numbers on soybean transgenic roots in comparison to transgenic roots carrying the empty vector control." Also, this wording suggests it was a double knockdown, but from the images, it appears they were individually silenced.

      We appreciate the reviewer's comment. We observed a 50-70% reduction in the number of nodules. We adjusted the text according to the reviewer's remark. (See page 9, revised manuscript)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary

      This manuscript reports preliminary evidence of successful optogenetic activation of single retinal ganglion cells (RGCs) through the eye of a living monkey using adaptive optics (AO).

      Strengths

      The eventual goals of this line of research have enormous potential impact in that they will probe the perceptual consequences of activating single RGCs. While I think more data should be included, the four examples shown look quite convincing.

      Weaknesses

      While this is undoubtedly a technical achievement and an important step along this group's stated goal to measure the perceptual consequences of single-RGC activations, the presentation lacks the rigor that I would expect from what is really a methods paper. In my view, it is perfectly reasonable to publish the details of a method before it has yielded any new biological insights, but in those publications, there is a higher burden to report the methodological details, full data sets, calibrations, and limitations of the method. There is considerable room for improvement in reporting those aspects. Specifically, more raw data should be shown for activations of neighboring RGCs to pinpoint the actual resolution of the technique, and more than two cells (one from each field of view) should be tested.

      We have expanded the sections discussing both the methodology and the limitations of this technique by rewriting the results and discussion. The data used in the paper are available online via the link provided in the manuscript. We agree that a more detailed investigation of the strengths and limitations of the approach would have been a laudable goal. However, before returning to more detailed studies, we have shifted our effort to developing the monkey psychophysical performance we need to combine with the single-cell stimulation approach described here. In addition, the opsin ChrimsonR used in this study is not the best choice for this experiment because of its poor sensitivity. We are currently exploring the use of ChRmine (as described in lines 93-97), which is roughly 2 orders of magnitude more sensitive. We have also been working on methods to improve probe stabilization to reduce tracking errors during eye movements. Once these improvements have been implemented, we will undertake the more detailed studies suggested here. Nonetheless, as a pragmatic matter, we submit that it is valuable to document proof-of-concept with this manuscript.

      Some information about the density of labeled RGCs in these animals would also be helpful to provide context for how many well-isolated target cells exist per animal.

      We agree. Getting reliable information about labeled cell density would be difficult without detailed histology of the retina, which we are reluctant to do because it would require sacrificing these precious and expensive monkeys from which we continue to get valuable information. We are actively exploring methods to reduce the cell density to make isolation easier, including the use of the CaMKII promoter as well as intracranial injections via AAV.retro, which would allow calcium indicator expression in the peripheral retina where RGCs form a monolayer. It may be that the rarity of isolated RGCs will not be a fundamental limitation of the approach in the future.

      Reviewer #2 (Public Review):

      This proof-of-principle study lays important groundwork for future studies. Murphy et al. expressed ChrimsonR and GCaMP6s in retinal ganglion cells of a living macaque. They recorded calcium responses and stimulated individual cells, optically. Neurons targeted for stimulation were activated strongly whereas neighboring neurons were not.

      The ability to record from neuronal populations while simultaneously stimulating a subset in a controlled way is a high priority for systems neuroscience, and this has been particularly challenging in primates. This study marks an important milestone in the journey towards this goal.

      The ability to detect stimulation of single RGCs was presumably due to the smallness of the light spot and the sparsity of transduction. Can the authors comment on the importance of the latter factor for their results? Is it possible that the stimulation protocol activated neurons near the targeted neuron that did not express GCaMP? Is it possible that off-target neurons near the targeted neuron expressed GCaMP, and were activated, but too weakly to produce a detectable GCaMP signal? In general, simply knowing that off-target signals were undetectable is not enough; knowing something about the threshold for the detection of off-target signals under the conditions of this experiment is critical.

      We agree with these points. We cannot rule out the possibility that some nearby cells were activated but that we could not detect this because they did not express GCaMP. We also do not know whether cells responded but our recording methods were not sufficiently sensitive to detect them. A related limitation is that we do not, of course, know the relationship between the threshold for detection with calcium imaging and what the psychophysical detection threshold would have been in an awake behaving monkey. Nonetheless, the data show that we can produce a much larger response in the target cell than in nearby cells whose response we can measure, and we suggest that this is a valuable contribution even if we can’t argue that the isolation is absolute. We’ve acknowledged these important limitations in the revised manuscript in lines 66-77.

      Minor comments:

      Did the lights used to stimulate and record from the retina excite RGCs via the normal light-sensing pathway? Were any such responses recorded? What was their magnitude?

      The recording light does activate the normal light-sensing pathway to some extent, although it does not fall upon the RGC receptive fields directly. There was a 30 second adaptation period at the beginning of each trial to minimize the impact of this on the recording of optogenetically mediated responses, as described in lines 222-224. The optogenetic probe does not appear to significantly excite the cone pathway, and we do not see the expected off-target excitations that would result from this.

      The data presented attest to a lack of crosstalk between targeted and neighboring cells. It is therefore surprising that lines 69-72 are dedicated to methods for "reducing the crosstalk problem". More information should be provided regarding the magnitude of this problem under the current protocol/instrumentation and the techniques that were used to circumvent it to obtain the data presented.

      The “crosstalk problem” in this passage refers to crosstalk caused by targeting the more densely packed cells at higher eccentricities, which are not represented in the data. The data presented are limited to the more isolated central RGCs.

      Optical crosstalk could be spatial or spectral. Laying out this distinction plainly could help the reader understand the issues quickly. The Methods indicate that cells were chosen on the basis that they were > 20 µm from their nearest (well-labeled) neighbor to mitigate optical crosstalk, but the following sentence is about spectral overlap.

      We have added a clearer explanation of what precisely we mean by crosstalk in lines 213-221.

      Figure 2 legend: "...even the nearby cell somas do not show significantly elevated response (p >> 0.05, unpaired t-test) than other cells at more distant locations." This sentence does not indicate how some cells were classified as "nearby" whereas others were classified as being "at more distant locations". Perhaps a linear regression would be more appropriate than an unpaired t-test here.

      The distinction here between “nearby” and “more distant” is 50 µm. We have clarified this in the figure caption. Performing a linear regression on cell response over distance shows a slight downward trend in two of the four cells shown here, but this trend does not reach the threshold of significance.

      Line 56: "These recordings were... acquired earlier in the session where no stimulus was present." More information should be provided regarding the conditions under which this baseline was obtained. I assume that the ChrimsonR-activating light was off and the 488 nm GCaMP excitation light was on, but this was not stated explicitly. Were any other lights on (e.g. room lights or cone-imaging lights)? If there was no spatial component to the baseline measurement, "where" should be "when".

      Your assumptions are correct. There was no spatial component to the baseline measurement, and these measurements are explained in more detail in lines 240-243.

      Please add a scalebar to Figure 1a to facilitate comparison with Figure 2.

      This has been done.

      Lines 165-173: Was the 488 nm light static or 10 Hz-modulated? The text indicates that GCaMP was excited with a 488 nm light and data were acquired using a scanning light ophthalmoscope, but line 198 says that "the 488 nm imaging light provides a static stimulus".

      The 488 nm light is effectively modulated at 25 Hz by the scanning action of the system. We believe the 10 Hz modulation you refer to is the closed-loop correction rate of the adaptive optics. The text has been updated in lines 217-219 to clarify this.

      A potential application of this technology is for the study of visually guided behavior in awake macaques. This is an exciting prospect. With that in mind, a useful contribution of this report would be a frank discussion of the hurdles that remain for such application (in addition to eye movements, which are already discussed).

      Lines 109-130 now offer an expanded discussion of this topic.

      Reviewer #3 (Public Review):

      This paper reports a considerable technical achievement: the optogenetic activation of single retinal ganglion cells in vivo in monkeys. As clearly specified in the paper, this is an important step towards causal tests of the role of specific ganglion cell types in visual perception. Yet this methodological advance is not described currently in sufficient detail to replicate or evaluate. The paper could be improved substantially by including additional methodological details. Some specific suggestions follow.

      The start of the results needs a paragraph or more to outline how you got to Figure 1. Figure 1 itself lacks scale bars, and it is unclear, for example, that the ganglion cells targeted are in the foveal slope.

      The results have been rewritten with additional explanation of methodology and the location of the RGCs has been clarified.

      The text mentions the potential difficulties targeting ganglion cells at larger eccentricities where the soma density increases. If this is something that you have tried it would be nice to include some of that data (whether or not selective activation was possible). Related to this point, it would be helpful to include a summary of the ganglion cell density in monkey retina.

      This is not something we tried, as we knew that the axial resolution allowed by the monkey’s eye would result in an axial PSF too large to hit only a single cell. The overall ganglion cell density is less relevant than the density of cells expressing ChrimsonR/GCaMP, about which we have only limited information without detailed histology.

      Related to the point in the previous paragraph - do you have any experiments in which you systematically moved the stimulation spot away from the target ganglion cell to directly test the dependence of stimulation on distance? This would be a valuable addition to the paper.

      We agree that this would have been a valuable addition to the paper, but we are reluctant to do these experiments now. We are implementing an improved method to track the eye and a better optogenetic agent in an entirely new instrument, and we think that future experiments along these lines would be best done when those changes are completed.

      The activity in Figure 1 recovers from activation very slowly - much more slowly than the light response of these cells, and much more slowly than the activity elicited in most optogenetic studies. Can you quantify this time course and comment on why it might be so slow?

      We attribute the slow recovery to the calcium dynamics of the cell, and this slow recovery time is consistent with calcium responses seen in our lab elicited via the cone pathway. Similar time courses can be seen in Yin (2013) for RGCs excited via their cone inputs.

      Traces from non-targeted cells should be shown in Figure 1 along with those of targeted cells.

      We have added this as part of Figure 2.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Although we have no further revisions to the manuscript, we would like to respond to the remaining comments from the reviewers as follows.

      Reviewer 1:

      The authors have addressed some concerns raised in the initial review but some remain. In particular, it is still unclear what conclusions can be drawn about task-related activity from scans that are performed 30 minutes after the behavioral task. I continue to think that a reorganization/analysis of the data according to event type would be useful and easier to interpret across the two brain areas, but the authors did not choose to do this. Finally, switching the cue-response association, I am convinced, would help to strengthen this study.

      As for the task-related activity, the strategy for the PET scan was explained in our response to comment 2 from Reviewer 2. Briefly, rats receive an intravenous administration of 18F-FDG solution before the start of the behavioral session. 18F-FDG uptake into the cells starts immediately, reaches its maximum level within 30 min, and is maintained for at least 1 h. A 30-min PET scan is executed 25 min after the session. Therefore, the measured brain activity reflects the metabolic state during task performance in rats.

      Regarding data presentation of the electrophysiological experiments, we described the subpopulations of event-related neurons showing notable neuronal activity patterns in the order of aDLS and pVLS, following the same order used to explain the behavioral study.

      Regarding switching the cue-response association, we noted the difference in firing activity between HR and LL trials, suggesting that different combinations of stimulus and response may affect the level of firing activity. As suggested by the reviewer, an examination of switching the cue-response association would be useful to confirm our interpretation. We will address this issue in our future studies.

      Reviewer 2:

      The authors have made important revisions to the manuscript and it has improved in clarity. They also added several figures in the rebuttal letter to answer questions by the reviewers. I would ask that these figures are also made public as part of the authors' response or if not, included in the manuscript.

      We will make the figures publicly available as part of our response.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Syed et al. investigate the circuit underpinnings for leg grooming in the fruit fly. They identify two populations of local interneurons in the right front leg neuromere of the ventral nerve cord, i.e. 62 13A neurons and 64 13B neurons. Hierarchical clustering analysis identifies 10 morphological classes for both populations. Connectome analysis reveals their circuit interactions: these GABAergic interneurons provide synaptic inhibition either between the two subpopulations, i.e., 13B onto 13A, or among each other, i.e., 13As onto other 13As, and/or onto leg motoneurons, i.e., 13As and 13Bs onto leg motoneurons. Interestingly, 13A interneurons fall into two categories, with one providing inhibition onto a broad group of motoneurons, being called "generalists", while others project to a few motoneurons only, being called "specialists". Optogenetic activation and silencing of both subsets strongly affect leg grooming. Likewise, activating or silencing subpopulations, i.e., 3 to 6 elements of the 13A and 13B groups, has marked effects on leg grooming, including frequency and joint positions, and even interrupts leg grooming. The authors present a computational model with the four circuit motifs found, i.e., feed-forward inhibition, disinhibition, reciprocal inhibition, and redundant inhibition. This model can reproduce relevant aspects of the grooming behavior.

      Strengths:

      The authors succeeded in providing evidence for neural circuits interacting by means of synaptic inhibition to play an important role in the generation of a fast rhythmic insect motor behavior, i.e., grooming. Two populations of local interneurons in the fruit fly VNC comprise four inhibitory circuit motifs of neural action and interaction: feed-forward inhibition, disinhibition, reciprocal inhibition, and redundant inhibition. Connectome analysis identifies the similarities and differences between individual members of the two interneuron populations. Modulating the activity of small subsets of these interneuron populations markedly affects the generation of the motor behavior, thereby exemplifying their important role in generating grooming.

      We thank the reviewer for their thoughtful and constructive evaluation of our work. We are encouraged by their recognition of the major contributions of our study, including the identification of multiple inhibitory circuit motifs and their contribution to organizing rhythmic leg grooming behavior. We also appreciate the reviewer’s comments highlighting our use of connectomics, targeted manipulations, and modeling to reveal how distinct subsets of inhibitory interneurons contribute to motor behavior.

      Weaknesses:

      Effects of modulating activity in the interneuron populations by means of optogenetics were examined in the so-called closed-loop condition. This does not allow for differentiation between direct and secondary effects of the experimental modification in neural activity, as feedforward and feedback effects cannot be disentangled. To do so, open-loop experiments, e.g., in deafferented conditions, would be important. Given that many members of the two populations of interneurons participate not in one, but in two or more circuit motifs, it remains to be disentangled which role each individual circuit motif plays in the generation of the motor behavior in intact animals.

      We appreciate the reviewer’s point regarding the role of sensory feedback in our experimental design. We agree that reafferent (sensory) input from ongoing movements could contribute to the behavioral outcomes of our optogenetic manipulations. However, our aim was not to isolate central versus peripheral contributions, but rather to assess the role of 13A/B neurons within the intact, operational sensorimotor system during natural grooming behavior.

      These inhibitory neurons form recurrent loops, synapse onto motor neurons, and receive proprioceptive input—placing them in a position to both shape central motor output and process sensory feedback. As such, manipulating their activity engages both central control and sensory consequences.

      The finding that silencing 13A neurons in dusted flies disrupts rhythmic leg coordination highlights their role in organizing grooming movements. Prior studies (e.g., Ravbar et al., 2021) show that grooming rhythms persist when sensory input is reduced, indicating a central origin, while sensory feedback refines timing, coordination, and long-timescale stability. We concluded that rhythmicity arises centrally but is shaped and stabilized by mechanosensory or proprioceptive feedback. Our current results are consistent with this view and support a model in which inhibitory premotor neurons participate in a closed-loop control architecture that generates and tunes rhythmic output.

      While we agree that fully removing sensory feedback and parsing distinct roles for neurons that participate in multiple circuit motifs would be desirable, we do not see a plausible experimental path to accomplish this - we would welcome suggestions!

      We considered the method used by Mendes and Mann (eLife 2023) to assess sensory feedback to walking, 5-40-GAL4, DacRE-flp, UAS>stop>TNT + 13A/B-spGAL4 X UAS-csChrimson. This would require converting one targeting system to LexA and presents significant technical challenges. More importantly, we believe the core interpretation issue would remain: broadly silencing proprioceptors would produce pleiotropic effects and impair baseline coordination, making it difficult to distinguish whether observed changes reflect disrupted rhythm generation or secondary consequences of impaired sensory input.

      We will clarify in the revised manuscript that our behavioral experiments were performed in freely moving flies under closed-loop conditions. We thank the reviewer for highlighting these important considerations and will revise the manuscript to better communicate the scope and interpretation of our findings.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Syed et al. presents a detailed investigation of inhibitory interneurons, specifically from the 13A and 13B hemilineages, which contribute to the generation of rhythmic leg movements underlying grooming behavior in Drosophila. After performing a detailed connectomic analysis, which offers novel insights into the organization of premotor inhibitory circuits, the authors build on this anatomical framework by performing optogenetic perturbation experiments to functionally test predictions derived from the connectome. Finally, they integrate these findings into a computational model that links anatomical connectivity with behavior, offering a systems-level view of how inhibitory circuits may contribute to grooming pattern generation.

      Strengths:

      (1) Performing an extensive and detailed connectomic analysis, which offers novel insights into the organization of premotor inhibitory circuits.

      (2) Making sense of the largely uncharacterized 13A/13B nerve cord circuitry by combining connectomics and optogenetics is very impressive and will lay the foundation for future experiments in this field.

      (3) Testing the predictions from experiments using a simplified and elegant model.

      We thank the reviewer for their thoughtful and encouraging evaluation of our work. We are especially grateful for their recognition of our detailed connectome analysis and its contribution to understanding the organization of premotor inhibitory circuits. We appreciate the reviewer’s comments highlighting the integration of connectomics with optogenetic perturbations to functionally interrogate the 13A and 13B circuits, as well as their recognition of our modeling approach as a valuable framework for linking circuit architecture to behavior.

      Weaknesses:

      (1) In Figure 4, while the authors report statistically significant shifts in both proximal inter-leg distance and movement frequency across conditions, the distributions largely overlap, and only in Panel K (13B silencing) is there a noticeable deviation from the expected 7-8 Hz grooming frequency. Could the authors clarify whether these changes truly reflect disruption of the grooming rhythm?

      We are re-analyzing the whole dataset in light of the reviews (specifically, we are now applying linear mixed models (LMMs) to these statistics). For the panels in question (H-J), there is indeed a large overlap between the frequency distributions, but the box plots show medians and quartiles, which only partially overlap. (In the current analysis, as it stands, differences in means were small yet significant.) However, there is a noticeable (not yet quantified) difference in variability between the frequencies, with the experimental group being the more variable one. If activation or inactivation of 13A/B circuits disrupts the rhythm, we would indeed expect the frequencies to become more variable. So, in the revised version we will quantify the differences in both the means and the variabilities, and establish whether either shows significance after applying the LMM.

      More importantly, all this data would make the most sense if it were performed in undusted flies (with controls) as is done in the next figure.

      In our assay conditions, undusted flies groom infrequently. We used undusted flies for some optogenetic activation experiments, where the neuron activation triggers behavior initiation, but we chose to analyze the effect of silencing inhibitory neurons in dusted flies because dust reliably activates mechanosensory neurons and elicits robust grooming behavior, enabling us to assess how manipulation of 13A/B neurons alters grooming rhythmicity and leg coordination.

      (2) In Figure 4-Figure Supplement 1, the inclusion of walking assays in dusted flies is problematic, as these flies are already strongly biased toward grooming behavior and rarely walk. To assess how 13A neuron activation influences walking, such experiments should be conducted in undusted flies under baseline locomotor conditions.

      We agree that there are better ways to assay potential contributions of 13A/13B neurons to walking. We intended to focus on how normal activity in these inhibitory neurons affects coordination during grooming, and we included walking because we observed it in our optogenetic experiments and because it also involves rhythmic leg movements. The walking data is reported in a supplementary figure because we think this merits further study with assays designed to quantify walking specifically. We will make these goals clearer in the revised manuscript and we are happy to share our reagents with other research groups more equipped to analyze walking differences.

      (3) For broader lines targeting six or more 13A neurons, the authors provide specific predictions about expected behavioral effects, e.g., that activation should bias the limb toward flexion and silencing should bias toward extension based on connectivity to motor neurons. Yet, when using the more restricted line labeling only two 13A neurons (Figure 4 - Figure Supplement 2), no such prediction is made. The authors report disrupted grooming but do not specify whether the disruption is expected to bias the movement toward flexion or extension, nor do they discuss the muscle target. This is a missed opportunity to apply the same level of mechanistic reasoning that was used for broader manipulations.

      While we know which two neurons are labeled based on confocal expression, assigning their exact identity in the EM datasets has been challenging. One of these neurons appears absent from our 13A reconstructions of the right T1 neuropil in FANC, although we did locate it in MANC. However, its annotation in MANC has undergone multiple revisions, making confident assignment difficult at this time. Since we can’t be sure which motor neurons and muscles are most directly connected, we did not want to predict this line’s effect on leg movements.

      (4) Regarding Figure 5: The 70ms on/off stimulation with a slow opsin seems problematic. CsChrimson off kinetics are slow and unlikely to cause actual activity changes in the desired neurons with the temporal precision the authors are suggesting they get. Regardless, it is amazing that the authors get the behavior! It would still be important for the authors to mention the optogenetics caveat, and potentially supplement the data with stimulation at different frequencies, or using faster opsins like ChrimsonR.

      We were also surprised - and intrigued - by the behavioral consequences of activating these inhibitory neurons with CsChrimson. We tried several different activation paradigms: pulsed from 8 Hz to 500 Hz and with various on/off intervals. Because several of these different stimulation protocols resulted in grooming, and with different rhythmic frequencies, we think the phenotypes are a specific property of the neural circuits we have activated, rather than the kinetics of CsChrimson itself.

      We will include the data from other frequencies in a new Supplementary Figure, we will discuss the caveats CsChrimson’s slow off-kinetics present to precise temporal control of neural activity, and we will try ChrimsonR in future experiments.

      Overall, I think the strengths outweigh the weaknesses, and I consider this a timely and comprehensive addition to the field.

      Thank you!

      Reviewer #3 (Public review):

      Summary:

      The authors set out to determine how GABAergic inhibitory premotor circuits contribute to the rhythmic alternation of leg flexion and extension during Drosophila grooming. To do this, they first mapped the ~120 13A and 13B hemilineage inhibitory neurons in the prothoracic segment of the VNC and clustered them by morphology and synaptic partners. They then tested the contribution of these cells to flexion and extension using optogenetic activation and inhibition and kinematic analyses of limb joints. Finally, they produced a computational model representing an abstract version of the circuit to determine how the connectivity identified in EM might relate to functional output. The study, in its current form, makes an important but overclaimed contribution to the literature due to a mismatch between the claims in the paper and the data presented.

      Strengths:

      The authors have identified an interesting question and use a strong set of complementary tools to address it:

      (1) They analysed serial‐section TEM data to obtain reconstructions of every 13A and 13B neuron in the prothoracic segment. They manually proofread over 60 13A neurons and 64 13B neurons, then used automated synapse detection to build detailed connectivity maps and cluster neurons into functional motifs.

      (2) They used optogenetic tools with a range of genetic driver lines in freely behaving flies to test the contribution of subsets of 13A and 13B neurons.

      (3) They used a connectome-constrained computational model to determine how the mapped connectivity relates to the rhythmic output of the behavior.

      We appreciate the reviewer’s thorough and constructive feedback on our work. We are encouraged by their recognition of the complementary approaches used in our study.

      Weaknesses:

      The manuscript aims to reveal an instructive, rhythm-generating role for premotor inhibition in coordinating the multi-joint leg synergies underlying grooming. It makes a valuable contribution, but currently, the main claims in the paper are not well-supported by the presented evidence.

      Major points

      (1) Starting with the title of this manuscript, "Inhibitory circuits generate rhythms for leg movements during Drosophila grooming", the authors raise the expectation that they will show that the 13A and 13B hemilineages produce rhythmic output that underlies grooming. This manuscript does not show that. For instance, showing that these neurons drive the rhythmic leg movements underlying grooming would require testing whether they produce rhythmic output in the absence of rhythmic input. Because the optogenetic pulses used for stimulation were rhythmic, the authors cannot make this point, and the model uses a "black box" excitatory network whose output might itself be rhythmic (this is not shown). Therefore, the evidence (behavioral entrainment; perturbation effects; computational model) is all indirect, meaning that the paper's claim that "inhibitory circuits generate rhythms" rests on inferred sufficiency. A direct recording (e.g., calcium imaging or patch-clamp) from 13A/13B during grooming, outside the scope of the study, would be needed to show intrinsic rhythmogenesis. The conclusions drawn from the data should therefore be tempered. Moreover, the "black box" needs to be opened. What output does it produce? How exactly is it connected to the 13A-13B circuit?

      We will modify the title to better reflect our strongest conclusions: “Inhibitory circuits coordinate rhythmic leg movements during Drosophila grooming”

      Our optogenetic activation was delivered in a patterned (70 ms on/off) fashion that entrains rhythmic movements but does not rule out the possibility that the rhythm is imposed externally. In the manuscript, we state that we used pulsed light to mimic a flexion-extension cycle and note that this approach tests whether inhibition is sufficient to drive rhythmic leg movements when temporally patterned. While this does not prove that 13A/13B neurons are intrinsic rhythm generators, it does demonstrate that activating subsets of inhibitory neurons is sufficient to elicit alternating leg movements resembling natural grooming and walking.
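      For reference, the 70 ms on / 70 ms off protocol gives a 140 ms cycle, or about 7.1 Hz. The minimal sketch below generates such a light schedule. The function names and the 1 s example duration are illustrative assumptions, not the authors' stimulation code.

```python
# Minimal sketch of a square-wave light schedule with fixed on/off phases.
# The 70 ms values come from the text; names and duration are hypothetical.

def pulse_schedule(on_s, off_s, duration_s):
    """Return (start, stop) times in seconds for each complete light-on pulse."""
    period = on_s + off_s
    pulses = []
    t = 0.0
    while t + on_s <= duration_s + 1e-12:  # keep only pulses that fit entirely
        pulses.append((round(t, 6), round(t + on_s, 6)))
        t += period
    return pulses


def pulse_frequency(on_s, off_s):
    """Stimulation frequency in Hz for one on/off cycle."""
    return 1.0 / (on_s + off_s)


pulses = pulse_schedule(0.070, 0.070, 1.0)
freq = pulse_frequency(0.070, 0.070)
print(len(pulses), round(freq, 2))
```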

      Our goal with the model was to demonstrate that it is possible to produce rhythmic outputs with this 13A/B circuit, based on the connectome. The “black box” is a small recurrent neural network (RNN) consisting of 40 neurons in its hidden layer. The inputs are the “dust” levels from the environment (the green pixels in Figure 6I), the “proprioceptive” inputs (“efference copy” from motor neurons), and the amount of dust accumulated on both legs. The outputs (all positive) connect to the 13A neurons, the 13B neurons, and to the motor neurons. We refer to it as the “black box” because we make no claims about the actual excitatory inputs to these circuits. Its function is to provide input, needed to run the network, that reflects the distribution of “dust” in the environment as well as the information about the position of the legs.
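      To make the described interface concrete, here is a toy recurrent network with 40 hidden units and non-negative outputs. The inputs are "dust" pixels, efference copies, and dust on the two legs. This is a sketch under stated assumptions. The input and output sizes are hypothetical, as are the tanh hidden units and the rectified outputs. The weights are random, whereas the response describes refining the model with evolutionary training, which is not shown.

```python
import math
import random

N_HIDDEN = 40      # hidden-layer size stated in the response
N_DUST = 8         # hypothetical number of sensed "dust" pixels
N_PROPRIO = 6      # hypothetical number of motor-neuron efference copies
N_LEG_DUST = 2     # dust accumulated on each of the two legs
N_IN = N_DUST + N_PROPRIO + N_LEG_DUST
N_OUT = 12         # hypothetical count of 13A, 13B and motor-neuron targets


def rand_matrix(rows, cols, rng, scale):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class BlackBoxRNN:
    """Toy recurrent network: tanh hidden units, rectified (non-negative) outputs."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w_in = rand_matrix(N_HIDDEN, N_IN, rng, 1.0 / math.sqrt(N_IN))
        self.w_rec = rand_matrix(N_HIDDEN, N_HIDDEN, rng, 1.0 / math.sqrt(N_HIDDEN))
        self.w_out = rand_matrix(N_OUT, N_HIDDEN, rng, 1.0 / math.sqrt(N_HIDDEN))
        self.h = [0.0] * N_HIDDEN

    def step(self, dust, proprio, leg_dust):
        x = list(dust) + list(proprio) + list(leg_dust)
        if len(x) != N_IN:
            raise ValueError("input vector has the wrong length")
        drive = [a + b for a, b in zip(matvec(self.w_in, x), matvec(self.w_rec, self.h))]
        self.h = [math.tanh(d) for d in drive]
        # Rectification keeps every output non-negative (excitatory drive only).
        return [max(0.0, y) for y in matvec(self.w_out, self.h)]


net = BlackBoxRNN(seed=1)
out = net.step([0.2] * N_DUST, [0.0] * N_PROPRIO, [0.1, 0.3])
```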

      The output of the “black box” component of the model might be rhythmic. In fact, in most instances of the model implementation this is indeed the case. However, as mentioned in the current version of the manuscript: “But the 13A circuitry can still produce rhythmic behavior even without those external sensory inputs (or when set to a constant value), although the legs become less coordinated.” Indeed, when we refine the model (with the evolutionary training) without the “black box” (using a constant input of 0.1) the behavior is still rhythmic and sustained. Therefore, the rhythmic activity and behavior can emerge from the premotor circuitry itself without a rhythmic input.
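      The point that rhythm can arise from inhibitory circuitry under constant drive has a classic minimal illustration: a half-center oscillator. The sketch below swaps in the Matsuoka (1985) two-unit model, in which two units with mutual inhibition and slow self-adaptation receive a constant, non-rhythmic input s. It is not the authors' connectome-constrained model, and all parameter values are assumptions chosen to satisfy the model's oscillation condition (1 + tau/T < w < 1 + beta).

```python
def matsuoka(steps=10000, dt=0.001, tau=0.1, T=0.2, beta=2.5, w=2.5, s=1.0):
    """Euler-integrate two mutually inhibiting units with self-adaptation.

    s is a constant (non-rhythmic) drive shared by both units.
    Returns a list of (y1, y2) rectified outputs, one pair per time step.
    """
    x = [0.1, 0.0]  # fast states; the small asymmetry breaks the tie
    v = [0.0, 0.0]  # slow adaptation (fatigue) variables
    trace = []
    for _ in range(steps):
        y = [max(0.0, xi) for xi in x]  # rectified outputs
        dx = [(-x[i] - beta * v[i] - w * y[1 - i] + s) / tau for i in range(2)]
        dv = [(-v[i] + y[i]) / T for i in range(2)]
        x = [x[i] + dt * dx[i] for i in range(2)]
        v = [v[i] + dt * dv[i] for i in range(2)]
        trace.append((y[0], y[1]))
    return trace


def count_switches(trace, eps=1e-6):
    """Count how often the more active unit changes, ignoring near-ties."""
    switches, leader = 0, None
    for a, b in trace:
        if abs(a - b) < eps:
            continue
        current = 0 if a > b else 1
        if leader is not None and current != leader:
            switches += 1
        leader = current
    return switches


trace = matsuoka()  # 10 s of simulated time at dt = 1 ms
print(count_switches(trace))
```

      Here the input is constant. The alternation comes from mutual inhibition plus adaptation: the active unit fatigues until the suppressed unit escapes.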

      The context in which the 13A and 13B hemilineages sit also needs to be explained. What do we know about the other inputs to the motor neurons studied? What excitatory circuits are there?

      We agree that there are many more excitatory and inhibitory, direct and indirect, connections to motor neurons that will also affect leg movements for grooming and walking. Our goal was to demonstrate what is possible from a constrained circuit of inhibitory neurons that we mapped in detail, and we hope to add additional components to better replicate the biological circuit as behavioral and biomechanical data is obtained by us and others. We will add this clarification of the limits of the scope to the Discussion.

      Furthermore, the introduction ignores many decades of work in other species on the role of inhibitory cell types in motor systems. There is some mention of this in the discussion, but even previous work in Drosophila larvae is not mentioned, nor crustacean STG, nor any other cell types previously studied. This manuscript makes a valuable contribution, but it is not the first to study inhibition in motor systems, and this should be made clear to the reader.

      We thank the reviewer for this important reminder and we will expand our discussion of the relevant history and context in our revision. Previous work on the contribution of inhibitory neurons to invertebrate motor control certainly influenced our research and we should acknowledge this better.

      (2) The experimental evidence is not always presented convincingly, at times lacking data, quantification, explanation, appropriate rationales, or sufficient interpretation.

      We are committed to improving the clarity, rationale, and completeness of our experimental descriptions. We will revisit the statistical tests applied throughout the manuscript and expand the Methods.

      (3) The statistics used are unlike any I remember having seen, essentially one big t-test followed by correction for multiple comparisons. I wonder whether this approach is optimal for these nested, high‐dimensional behavioral data. For instance, the authors do not report any formal test of normality. This might be an issue given the often skewed distributions of kinematic variables that are reported. Moreover, each fly contributes many video segments, and each segment results in multiple measurements. By treating every segment as an independent observation, the non‐independence of measurements within the same animal is ignored. I think a linear mixed‐effects model (LMM) or generalized linear mixed model (GLMM) might be more appropriate.

      We thank the reviewer for raising this important point regarding the statistical treatment of our segmented behavioral data. Our initial analysis used independent t-tests with Bonferroni correction across behavioral classes and features, which allowed us to identify broad effects. However, we acknowledge that this approach does not account for the nested structure of the data. To address this, we will re-analyze key comparisons using linear mixed-effects models (LMMs) as suggested by the reviewer. This approach will allow us to more appropriately model within-fly variability and test the robustness of our conclusions. We will update the manuscript based on the outcomes of these analyses.
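      A full LMM with a random intercept per fly needs a statistics library, such as `mixedlm` in Python's statsmodels or `lme4` in R. The stdlib sketch below is not an LMM. It illustrates the reviewer's non-independence point with a simpler, conservative alternative: collapsing segments to one mean per fly before comparing groups. The data are hypothetical. With the same group difference, treating segments as independent inflates the t statistic relative to the per-fly comparison.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    return (mean(a) - mean(b)) / (variance(a) / len(a) + variance(b) / len(b)) ** 0.5


def pooled_segments(flies):
    """Treat every segment as an independent observation (the approach questioned)."""
    return [value for segments in flies.values() for value in segments]


def per_fly_means(flies):
    """Collapse each fly's segments to one mean, so n equals the number of flies."""
    return [mean(segments) for segments in flies.values()]


# Hypothetical kinematic measurements: fly ID -> per-segment values.
control = {"c1": [1.0, 1.1, 0.9, 1.0], "c2": [1.4, 1.5, 1.3, 1.4], "c3": [0.8, 0.7, 0.9, 0.8]}
stimulated = {"s1": [1.3, 1.2, 1.4, 1.3], "s2": [1.7, 1.6, 1.8, 1.7], "s3": [1.1, 1.0, 1.2, 1.1]}

t_segments = welch_t(pooled_segments(stimulated), pooled_segments(control))
t_flies = welch_t(per_fly_means(stimulated), per_fly_means(control))
print(round(t_segments, 2), round(t_flies, 2))
```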

      (4) The manuscript mentions that legs are used for walking as well as grooming. While this is welcome, the authors then do not discuss the implications of this in sufficient detail. For instance, how should we interpret that pulsed stimulation of a subset of 13A neurons produces grooming and walking behaviours? How does neural control of grooming interact with that of walking?

      We do not know how the inhibitory neurons we investigated will affect walking or how circuits for control of grooming and walking might compete. We speculate that overlapping premotor circuits may participate in walking and grooming because both behaviors have flexion-extension cycles at similar frequencies, but we do not have hard experimental data to support this. This would be an interesting area for future research. Here, we focused on the consequences of activating specific 13A/B neurons during grooming because they were identified through a behavioral screen for grooming disruptions, and we had developed high-resolution assays and familiarity with the normal movements in this behavior. We will clarify this rationale in the revised discussion.

      (5) The manuscript needs to be proofread and edited as there are inconsistencies in labelling in figures, phrasing errors, missing citations of figures in the text, or citations that are not in the correct order, and referencing errors (examples: 81 and 83 are identical; 94 is missing in text).

      We will carefully proofread the manuscript to fix all figure labeling, citation order, and referencing errors.

    1. Brandt he made a statement that yesterday's breakthrough is today's graduate seminar is tomorrow's off-the-shelf home entertainment would that that were true but in fact yesterday's breakthrough is the thing that is most often forgotten it's not today's graduate seminar and it's definitely not tomorrow's home entertainment because what usually happens in the the grand tradition of Hollywood producers sitting around a table looking at a script and saying hey we've got ideas too too often people it's not a question of not invented here it's a question if I want to invent this myself and the if you look at the micro computers that are being sold today you'll see hardly anything that approximates the kind of total system design that the link had I think we all can realize that they are not sold as complete packages they're not sold as Honda's most of the things that you need can't even be plugged into them there are millions of wires to worry about and so forth but what is that may not be too surprising because after all that was a garage culture perhaps less forgivable

      We're not doing any better: a computer out of the box doesn't do anything today without a whole set of paid software and subscriptions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Work by Brosseau et al. combines NMR, biochemical assays, and MD simulations to characterize the influence of the C-terminal tail of EmrE, a model multi-drug efflux pump, on proton leak. The authors compare the WT pump to a C-terminal tail deletion, delta_107, finding that the mutant has increased proton leak in proteoliposome assays, a shifted pH dependence with a new titratable residue, faster alternating access at high pH, and reduced growth, consistent with proton leak dissipating the PMF.

      Strengths:

      The work combines thorough experimental analysis of structural, dynamic, and electrochemical properties of the mutant relative to WT proteins. The computational work is well aligned in vision and analysis. Although not all questions are answered, the authors lay out a logical exploration of the possible explanations.

      Weaknesses:

      A few analyses are missing, and some important data are left out. For example, the relative rate of drug efflux of the mutant should be reported to justify the focus on proton leak. Additionally, the correlation between structural interactions should be directly analyzed, and the mutant PMF should also be analyzed to justify the claims based on hydration alone. Some aspects of the increased dynamics at high pH due to a potential salt bridge are not clear.

      Reviewer #2 (Public review):

      Summary:

      This manuscript explores the role of the C-terminal tail of EmrE in controlling uncoupled proton flux. Leakage occurs in the wild-type transporter under certain conditions but is amplified in the C-terminal truncation mutant D107. The authors use an impressive combination of growth assays, transport assays, NMR on WT and mutants with and without key substrates, classical MD, and reactive MD to address this problem. Overall, I think that the claims are well supported by the data, but I am most concerned about the reproducibility of the MD data, initial structures used for simulations, and the stochasticity of the water wire formation. These can all be addressed in a revision with more simulations as I point out below. I want to point out that the discussion was very nicely written, and I enjoyed reading the summary of the data and the connection to other studies very much.

      Strengths:

      The Henzler-Wildman lab is at the forefront of using quantitative experiments to probe the peculiarities in transporter biophysics, and the MD work from the Voth lab complements the experiments quite well. The sheer number of different types of experimental and computational approaches performed here is impressive.

      Weaknesses:

      The primary weaknesses are related to the reproducibility of the MD results with regard to the formation of water wires in the WT and truncation mutant. This could be resolved with simulations starting from structures built using very different loops and C-terminal tails.

      The water wire gates identified in the MD should be tested experimentally with site-directed mutagenesis to determine if those residues do impact leak.

      We appreciate the reviewers’ thoughtful consideration of our manuscript, and their recognition of the variety of experimental and computational approaches we have brought to bear in probing the very challenging question of uncoupled proton leak through EmrE.

      We did record SSME measurements with MeTPP+, a small-molecule substrate, at two different protein:lipid ratios. These experiments report the rate of net flux when both proton-coupled substrate antiport and substrate-gated proton leak are possible. We will add this data to the revision, including data acquired with different lipid:protein ratios that confirm we are detecting transport rather than binding. In brief, this data shows that the net flux is highly dependent on both proton concentration (pH) and drug-substrate concentration, as predicted by our mechanistic model. This demonstrates that both types of transport contribute to net flux when small-molecule substrates are present.

      In the absence of drug-substrate, proton leak is the only possible transport pathway. The pyranine assay directly assesses proton leak under these conditions and unambiguously shows faster proton entry into proteoliposomes through the ∆107-EmrE mutant than through WT EmrE, with the rate of proton entry into ∆107-EmrE proteoliposomes matching the rate of proton entry achieved by the protonophore CCCP. We have revised the text to more clearly emphasize how this directly measures proton leak independently of any other type of transport activity. The SSME experiments with a proton gradient only (no small molecule substrate present) provide additional data on shorter timescales that is consistent with the pyranine data. The consistency of the data across multiple LPRs and comparison of transport to proton leak in the SSME assays further strengthens the importance of the C-terminal tail in determining the rate of flux.

      None of the current structural models have good resolution (crystallography, EM) or sufficient restraints (NMR) to define the loop and tail conformations sufficiently for comparison with this work. We are in the process of refining an experimental structure of EmrE with better resolution of the loop and tail regions implicated in proton-entry and leak. Direct assessment of structural interactions via mutagenesis is complicated because of the antiparallel homodimer structure of EmrE. Any point mutation necessarily affects both subunits of the dimer, and mutations designed to probe the hydrophobic gate on the more open face of the transporter also have the potential to disrupt closure on the opposite face, particularly in the absence of sufficient resolution in the available structures. Thus, mutagenesis to test specific predicted structural features is deferred until our structure is complete so that we can appropriately interpret the results.

      In our simulation setup, the MD results can be considered representative and meaningful for two reasons. First, the C-terminal tail, not present in the prior structure and thus modeled by us, is only 4 residues long. We will show in the revision and detailed response that the system loses memory of its previous conformation very quickly, such that velocity initialization alone is enough for a diverse starting point. Second, our simulation is more like simulated annealing, starting from a high-free-energy state to show that, given such random initialization, the tail conformation we obtain in the end is consistent with what we reported. It is also difficult to sample back-and-forth tail motion within a realistic MD timescale. Therefore, causal inference about allosteric motions from unbiased MD of the wild type alone can be inconclusive. The most viable approach is to compare the equilibrium statistics of the most stable states of WT- and ∆107-EmrE.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The work is well done and well presented. In my opinion, the authors must address the following questions.

      (1) It is unclear to a non-SSME expert why the net charge translocated in delta_107 is larger than in WT. For such small pH gradients (0.5-1 pH unit), it seems that only a few protons would leave the liposome before the internal pH is adjusted to be the same as the external. This number can be estimated given the size of the liposomes. What is it? Once the pH gradient is dissipated, no more net proton transport should be observed. So, why would more protons flow out of the mutant relative to WT?

      We appreciate the complexity of both the system and assay and have made revisions to both the main text and SI to address these points more clearly. While we can estimate liposome size, we cannot easily quantify the number of liposomes on the sensor surface and so cannot calculate the amount of charge movement as suggested by the reviewer. We have revised Fig. 3.2 and added additional data at low and high pH with different lipid-to-protein ratios to distinguish pre-steady-state processes (proton release from the protein) from steady-state processes (transport). An extended Fig. 3.2 caption and revised discussion in the main text clarify these points.
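      The reviewer's per-liposome question can be sketched with simple arithmetic, under loudly stated assumptions. The 100 nm diameter, the 6.5 to 7.0 pH step, and the 10 mM per pH unit lumenal buffer capacity are all hypothetical values, not taken from the manuscript. Under these assumptions, fewer than one free proton separates the two pH values inside a single liposome, so the count is set almost entirely by the buffer term.

```python
import math

AVOGADRO = 6.02214076e23  # 1/mol


def liposome_volume_litres(diameter_nm):
    """Internal volume of a spherical liposome (membrane thickness ignored)."""
    r_m = diameter_nm * 1e-9 / 2
    return (4.0 / 3.0) * math.pi * r_m ** 3 * 1e3  # m^3 -> L


def free_protons_to_equilibrate(ph_in, ph_out, diameter_nm):
    """Free H+ that must cross to shift the lumen from ph_in to ph_out (no buffer)."""
    delta_conc = abs(10 ** -ph_in - 10 ** -ph_out)  # mol/L
    return delta_conc * liposome_volume_litres(diameter_nm) * AVOGADRO


def buffered_protons_to_equilibrate(ph_in, ph_out, diameter_nm, beta_molar_per_ph):
    """Protons exchanged when a lumenal buffer capacity beta (mol/L per pH) dominates."""
    return beta_molar_per_ph * abs(ph_out - ph_in) * liposome_volume_litres(diameter_nm) * AVOGADRO


free = free_protons_to_equilibrate(6.5, 7.0, 100)                # hypothetical 100 nm liposome
buffered = buffered_protons_to_equilibrate(6.5, 7.0, 100, 0.010)  # hypothetical 10 mM/pH buffer
print(round(free, 3), round(buffered))
```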

      We have also revised SI figure 3.2 to include an example of transport driven by an infinite drug gradient. Drug-proton antiport results in net charge build-up in the liposome, since two protons are driven out for every +1 drug transported in. This also creates a pH gradient (higher proton concentration outside). The negative-inside potential inhibits further antiport of drug. However, both the negative-inside potential and the proton gradient will drive protons back into the liposome if a leak pathway is available. This is clearly visible as a reversal of current from negative (antiport) to positive (proton backflow), and the magnitude of this backflow is larger for ∆107-EmrE, which lacks the regulatory elements provided by the C-terminal tail. We have amended the main text and SI to include this discussion.

      (2) Given the estimated rate of transport, size of liposomes, and pH gradient, how quickly would the SSME liposomes reach pH balance?

      Since SSME measurements are due to capacitive coupling and will represent the net charge movement, including pre-steady state contributions, the current values will be incredibly sensitive to individual rates of alternating access, proton and drug on- and off-rates. Time to pH balance would, therefore, differ based on the construct, LPR, absolute pH or drug concentrations as well as the magnitude of the given gradients. For this reason, we necessarily use integrated currents (transported charge over time) when comparing mutants as it reflects kinetic differences inherent to the mutant without over-processing the data, for example, by normalizing to peak currents which would over emphasize certain properties that will differ across mutants. This process allows for qualitative comparisons by subjecting mutants to the same pH and substrate gradients when the same density of transporter construct is present, and care is given to not overstate the importance of the actual quantities of charges that are moving as they will be highly context dependent. This is clearly seen in Fig 3.2 where the current is not zero and the net transported charge is still changing at the end of 1 second. We have amended SI figure 3.2 and the main text to include this discussion.
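      The "transported charge over time" metric is the running integral of the measured current, Q(t) = ∫ I dt. The sketch below computes it from a sampled current trace with the trapezoidal rule. The trace is hypothetical, and this is not claimed to be the authors' processing pipeline.

```python
def cumulative_charge(times_s, currents_a):
    """Trapezoidal running integral of current (A) over time (s); returns charge in C."""
    if len(times_s) != len(currents_a):
        raise ValueError("times and currents must have the same length")
    charge = [0.0]
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        charge.append(charge[-1] + 0.5 * (currents_a[i] + currents_a[i - 1]) * dt)
    return charge


# Hypothetical trace: a brief negative transient, then a smaller positive current.
t = [i * 0.001 for i in range(1001)]  # 0 to 1 s sampled at 1 kHz
i_trace = [-2e-9 if ti < 0.1 else 0.5e-9 for ti in t]
q = cumulative_charge(t, i_trace)
print(q[-1])
```

      Integrating rather than normalizing to the peak keeps both the fast transient and the slower sustained current in the comparison, which matches the rationale given above.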

      (3) Given that H110 and E14 would deprotonate when the external pH is elevated above 7 and that these protons would be released to external bulk, the external bulk pH would decrease twice as much for WT compared to delta107. This would decrease the pH gradient for WT relative to the mutant. Can these effects be quantified and accounted for? Would this ostensibly decrease the amount of charge that transfers into the liposomes for WT? How would this impact the current interpretation that the two systems are driven by the same gradient?

      The reviewer is correct that there will be differences in deprotonation of WT and ∆107 and that the amount of proton release will also change with pH. We have amended Figure 3.2 to clarify this difference and its significance. For the proton-gradient-only conditions in Figure 3, each set of liposomes was equilibrated to the starting pH by repeated washing and incubation before measurement. For example, for the pH 6.5 inside, pH 7 outside condition, both the inside and outside pH were equilibrated at 6.5, and both E14 residues will be predominantly protonated in WT and ∆107, and H110 will be predominantly protonated in WT-EmrE. Upon application of the external pH 7 solution, protons will be released from E14 of either construct, with additional protons being released from H110 for WT-EmrE, causing a large pre-steady-state negative contribution to the signal (Fig. 3.2A). Under this pH condition, we see that the peak current correlates with the LPR, as this release of protons depends on the density of the transporter. However, we also see that the longer-time decay of the signal correlates with the construct (WT or ∆107) and is relatively independent of LPR, consistent with a transport process rather than a rapid pre-steady-state release of protons. Therefore, when we look at the actual transported charge over time, despite the higher contribution of proton release to the WT-EmrE signal, the significant increase in uncoupled proton transport for the C-terminal deletion mutant dominates the signal.

      By contrast, we apply this same analysis to the pH 8 inside, pH 8.5 outside condition, where both sets of transporters will be deprotonated from the start (Fig. 3.2B). Now the peak currents, decay rates, and transported charge over time are all consistent for a given construct (WT or ∆107). The two LPRs for an individual construct match within error, as the differences in overall charge movement and transported charge over time are independent of pre-steady-state proton release from the transporter at high pH.

      (4) A related question, how does the protonation of H110 influence the potential rate of proton transport between the two systems? Does the proton on H110 transfer to E14?

      The protonation of H110 will only influence the rate of transport of WT-EmrE as its protonation is required for formation of the hydrogen bonding network that coordinates gating. However, protonation of both E14s will influence the rate of proton transport of both systems as protonation state affects the rate of alternating access which is necessary for proton turnover. This is another reason we use the transported charge over time metric to compare mutants as it allows for a common metric for mutants with altered rates which are present in the same density and under the same gradient conditions. We do not have any evidence to support transfer of proton from H110 to E14, but there is also no evidence to exclude this possibility. We do not discuss this in the manuscript because it would be entirely speculative.

      (5) Is the pKa in the simulations (Figure 6B) consistent with the experiment?

      We calculated the pKa from this WT PMF and obtained a value of 7.1, which is close to the experimental value of 6.8.
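      To put the agreement in energy units: a pKa shift maps to a free-energy difference through ΔG = ln(10) RT ΔpKa. The sketch below applies only this thermodynamic relation. It does not reproduce the PMF integration and standard-state corrections used in PMF-based pKa calculations, and the 300 K temperature is an assumption. A 0.3-unit discrepancy corresponds to roughly 0.4 kcal/mol.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def pka_shift_to_kcal(delta_pka, temperature_k=300.0):
    """Free-energy difference (kcal/mol) equivalent to a pKa shift at temperature T."""
    return math.log(10) * R_KCAL * temperature_k * delta_pka


gap = pka_shift_to_kcal(7.1 - 6.8)  # simulated vs experimental pKa quoted in the text
print(round(gap, 2))
```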

      (6) Why isn't the PMF for delta_107 compared to WT to corroborate the prediction that hydration sufficiently alters both the rate and pKa of E14?

      We appreciate the reviewer’s suggestion and agree that a direct comparison would be valuable. However, several factors limit the interpretability of such an analysis in this context:

      (a) Our data indicate that the primary difference in free-energy barriers between WT and Δ107 lies in the hydration step rather than in proton transport itself. Fully resolving this would require a 2D PMF calculation via 2D umbrella sampling, which is very expensive. Looking solely at the proton-transport dimension of this PMF would not reveal much difference.

      (b) Given this, our aim in calculating this PMF was to support our conjecture that the bottleneck for such transport is the hydrophobic gate.

      (7) The authors suggest that A61 rotation 'controls the water wire formation' by measuring the distribution of water connectivity (water-water distances via log S) and average distances between A61 and I68/I67. Delta_107 has a larger inter-residue distance (Figure 6A) and more probable small log S values, i.e., closer waters connecting E14 and two residues near the top of the protein (Figure 5A). However, it strikes me that looking at average distances and the distribution of log S is not the best way to do this. Why not quantify the correlation between log S and A61 orientation and/or A61-I68/I71 distances, as well as their correlation with the proposed tail interactions (D84-R106 interactions), to directly verify the correlation (and suggest causation) of these interactions with the hydration in this region? Additionally, plotting the RMSD or probability of waters below I68 and I71 as a function of A61-I68 distances and/or numbers over time would support the log S analysis.

      The reviewer requested that we provide direct correlation analyses between A61 orientation, residue distances (A61-I68/I71), and water connectivity (logS) to better support the claim about water wire formation, rather than relying solely on average distances and distributions.

      We appreciate the reviewer’s suggestion to strengthen our analysis with direct correlations. However, due to the slow kinetics of hydration/dehydration events, unbiased simulation timescales do not permit sufficient sampling of multiple transitions to perform statistically robust dynamic correlation analyses. Instead, our approach focuses on equilibrium statistics, which reveal the dominant conformational states of WT- and Δ107-EmrE and provide meaningful insights into shifts in hydration patterns.
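      Both the reviewer's request and the authors' sampling caveat can be made concrete. A direct correlation is a Pearson r between two per-frame series, for example log S and the A61-I68 distance. Its reliability depends on how many effectively independent samples the trajectory contains. The sketch below uses a common AR(1) approximation, N_eff = N (1 - ρ1) / (1 + ρ1), where ρ1 is the lag-1 autocorrelation. The series and function names are hypothetical. A trajectory with a single slow transition yields a tiny N_eff even with many frames.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def lag1_autocorr(x):
    """Lag-1 autocorrelation of a (non-constant) time series."""
    mx = mean(x)
    num = sum((x[i] - mx) * (x[i + 1] - mx) for i in range(len(x) - 1))
    den = sum((a - mx) ** 2 for a in x)
    return num / den


def effective_samples(x):
    """AR(1) approximation of the number of independent samples in a series."""
    rho = lag1_autocorr(x)
    return len(x) * (1 - rho) / (1 + rho)


# Hypothetical 100-frame series containing one slow hydration transition.
hydrated = [0.0] * 50 + [1.0] * 50
print(round(effective_samples(hydrated), 2))
```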

      (8) It looks like the D84-R106 salt bridge controls this A61-I68 opening. Could this also be quantifiably correlated?

      As discussed in response to the previous question, the unbiased simulation timescales do not permit sufficient sampling of multiple transitions to perform statistically robust dynamic correlation analyses.

      (9) The NMR results show that alternating access increases in frequency from ~4/s for WT at low and high pH to ~17/s for delta_107 only at high pH. They then go on to analyze potential titration changes in the delta_107 mutant, finding two residues with approximate pKa values of 5.6 and 7.1. The former is assigned to E14, consistent with WT. But the latter is suggested to be either D84, which salt bridges to R106, or the C-terminal carboxylate. If it is D84, why would deprotonation, which would be essential to form the salt bridge, increase the rate of alternating access relative to WT?

      We note that the faster alternating-access rate was observed for TPP+-bound ∆107-EmrE, not for the transporter in the absence of substrate. In the absence of substrate, the relatively broad lines preclude quantitative determination of the alternating-access rate by NMR, making it difficult to judge the validity of the reviewer’s reasoning. Identification of which residue (D84 or the C-terminal carboxylate) corresponds to the shifted pKa is ultimately of little consequence, as this mutant does not reflect the native conditions of the transporter. It is far more important to acknowledge that both R106 and D84 are sensitive to this deprotonation, as it indicates these residues are close in space and provides experimental support for the existence of the salt bridge identified in the MD simulations, as discussed in the manuscript.
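      For intuition about the two apparent pKa values reported for ∆107 (5.6 and 7.1), the Henderson-Hasselbalch relation gives the protonated fraction of a site at a given pH. This sketch treats each site as a single independent titratable group, which is a simplification. Coupled sites in a homodimer need not behave this way.

```python
def fraction_protonated(ph, pka):
    """Henderson-Hasselbalch: protonated fraction of a single independent site."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


for pka in (5.6, 7.1):  # apparent pKa values quoted for the delta_107 mutant
    print(pka, [round(fraction_protonated(ph, pka), 2) for ph in (6.5, 7.0, 8.0)])
```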

      (10) In a more general sense, can the authors speculate why an efflux pump would evolve this type of secondary gate that can be thrown off by tight binding in the allosteric site such as that demonstrated by Harmane? What potential advantage is there to having a tail-regulated gate?

      This was likely a necessity to allow for better coupling as these transporters evolved to be more promiscuous. The C-terminal tail is absent in tightly coupled family members such as Gdx, which is specific for a single substrate and has a better-defined transport stoichiometry. We have included this discussion in the main text and are currently investigating this phenomenon further. Those experiments are beyond the scope of the current manuscript.

      (11) It is hard to visualize the PT reaction coordinate. Is the e_PT unit vector defined for each window separately based on the initial steered MD pathway? If so, how reliant is the PT pathway on this initial approximate path? Also, how does this position for each window change if/when E14 rotates? This could be checked by plotting the x,y,z distributions for each window and quantifying the overlap between windows in cartesian space. These clouds of distributions could also be plotted in the protein following alignment so the reader can visualize the reaction coordinate. Does the CEC localization ever stray to different, disconnected regions of cartesian phase space that are hidden by the reaction coordinate definition?

      The unit vector e_PT is the same across all windows and is based on unbiased MD. Therefore, the reaction coordinate (a scalar) is the vector from the starting point to the CEC, projected onto this unit vector. E14 rotation does not significantly change the window definition unless the CEC is very close to E14, where we found this to be a better CV. For a detailed discussion of this CV, including a comparison with a curvilinear CV, please see J. Am. Chem. Soc. 2018, 140, 48, 16535–16543, “Simulations of the Proton Transport”, and its SI Figure S1. In the Supplementary Information, we added Figure 6.1 to show the average X, Y, Z coordinates of each umbrella window.
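      The collective variable described above can be written as ξ = (r_CEC - r_start) · e_PT, with one fixed unit vector for every window. The sketch below computes it for plain 3D coordinates and lays out evenly spaced window centers. The coordinates, the window range, and the helper names are hypothetical. Because e_PT is fixed, any displacement of the CEC perpendicular to e_PT leaves ξ unchanged. This is why the reviewer's suggested Cartesian overlap plots are informative.

```python
import math


def unit(v):
    """Normalize a 3D vector."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def pt_coordinate(cec, start, e_pt):
    """Scalar CV: displacement of the CEC from the start point projected onto unit e_PT."""
    e = unit(e_pt)
    return sum((c - s) * ei for c, s, ei in zip(cec, start, e))


def window_centers(xi_min, xi_max, n_windows):
    """Evenly spaced umbrella-window centers along the same fixed CV."""
    step = (xi_max - xi_min) / (n_windows - 1)
    return [xi_min + i * step for i in range(n_windows)]


print(pt_coordinate((1.0, 2.0, 3.0), (0.0, 0.0, 0.0), (0.0, 0.0, 2.0)))
```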

      (12) Lastly, perhaps I missed it, but it's unclear if the rate of substrate efflux is also increased in the delta_107 mutant. If this is also increased, then the overall rate of exchange is faster, including proton leak. This would be important to distinguish since the focus now is entirely on proton leaks. I.e., is it only leak or is it overall efflux and leak?

      We have amended SI figure 3.2 to include a gradient condition where an infinite drug gradient is created across the liposome. The infinite gradient allows rapid transport of drug into the liposomes until charge build-up opposes further transport. This peak occurs at the same time for both LPRs of WT- and ∆107-EmrE, suggesting that the rate of substrate transport is similar. Differences in the peak heights across LPRs can be attributed to competition between drug and proton for the primary binding site, such that more protons will be released for the higher-density constructs, as described above. This process also creates a proton gradient, because each drug moving in is coupled to two protons moving out. As charge build-up inhibits further drug movement, the growing proton gradient begins to drive protons back in, which is another example of uncoupled leak. Here again, this backflow of protons, or leak, is of greater magnitude for ∆107-EmrE proteoliposomes than for those with WT-EmrE. We have included this discussion in the SI and main text.

      Minor

      (1) Introduction - the authors describe EmrE as a model system for studying the molecular mechanism of proton-coupled transport. This is a rather broad categorization that could include a wide range of phenomena distal from drug transport across membranes or through efflux pumps. I suggest specifying further to avoid overgeneralizing.

      We revised to note the context of multidrug efflux.

      Reviewer #2 (Recommendations for the authors):

      Simulations. The initial water wire analysis is based on 4 different 1 ms simulations presented in Figure 5. The 3 WT replicates show similar results for the tail-blocking water wire formation, but the details of the system build and loop/C-terminal tail placement are not clear. It does appear that a single C-terminal tail model was created for all WT replicates. Was there also modeling for any parts of the truncation mutant? Regardless, since these initial placements and uncertainties in the structures may impact the results and subsequent water wire formation, I would like a discussion of how these starting structures impacted the formation or not of wires. I think that another WT replicate should be run starting from a completely new build that places the tail in a different (but hopefully reasonable location). This could be built with any number of tools to generate reasonable starting structures. It's critical to ensure that multiple independent simulations across different initial builds show the same water wire behavior so that we know the results are robust and insensitive to the starting structure and stochastic variation.

      We thank Reviewer 2 for their suggestion regarding the discussion of the initial structure. In our simulations, the C-terminal tail was initially modeled in an extended conformation (solvent-exposed) to mimic its disordered state prior to folding. This approach resembles an annealing process, where the system evolves from a higher free-energy state toward equilibrium. Notably, across all three replicas, we observed consistent folding of the tail onto the protein surface, supporting the robustness of this conformational preference.

      For the Δ107 truncation mutant, minimal modeling was required, as most experimental structures resolve residues up to S105 or R106. To rigorously assess the influence of the starting configuration, we analyzed the tail’s dynamics using backbone dihedral angle auto- and cross-correlation functions (new Supplementary Figures 10.1 and 10.2). These analyses reveal rapid decay of correlations—consistent with the tail’s short length (5 residues) and high flexibility—indicating that the system "forgets" its initial configuration well within the simulation timescale. Thus, we conclude that our sampling is sufficient to capture equilibrium behavior, independent of the starting structure.
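      To make the decorrelation argument above concrete, here is a minimal sketch of a normalized autocorrelation for a single backbone dihedral time series. It is an assumption that the analysis resembles this: each angle is mapped to the unit vector (cos φ, sin φ) so that angles near ±180° are treated as neighbours, and the mean vector is subtracted so that an uncorrelated series decays toward zero. The function name dihedral_acf is hypothetical, and the series is assumed to be non-constant.

```python
import math

def dihedral_acf(phi, max_lag):
    """Normalized autocorrelation C(lag) of a dihedral series in radians.

    Angles are mapped to (cos phi, sin phi) to respect periodicity,
    the mean vector is removed, and C(0) is normalized to 1.
    Assumes the series is not constant (nonzero variance).
    """
    n = len(phi)
    xs = [math.cos(p) for p in phi]
    ys = [math.sin(p) for p in phi]
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    var = sum(a * a + b * b for a, b in zip(dx, dy)) / n
    acf = []
    for lag in range(max_lag + 1):
        pairs = n - lag
        c = sum(dx[t] * dx[t + lag] + dy[t] * dy[t + lag]
                for t in range(pairs)) / pairs
        acf.append(c / var)
    return acf

# Hypothetical series alternating between 0 and pi: fully anticorrelated at lag 1.
print(dihedral_acf([0.0, math.pi] * 50, 2))
```

      A cross-correlation between two dihedrals follows the same pattern, pairing dx, dy of one series with those of the other; a rapid fall of C(lag) toward zero is what indicates loss of memory of the starting configuration.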

      What does the size of the barrier in the PMF (Figure 6B) imply about the rate of proton transfer/leak and can the pKa shift of the acidic residue be estimated with this energy value compared to bulk?

      We noticed this point aligns with a related concern raised by Reviewer 1. For a detailed discussion please refer to Point 5 in our response to Reviewer 1.

      Experimental validation. The hypotheses generated by this work would be better buttressed if there were some mutation work at the hydrophobic gate (61, 68, 71) to support it. I realize that this may be hard, but it would significantly improve the quality.

      Due to the small size of the transporter, any mutagenesis of EmrE should necessarily be accompanied by functional characterization to fully assess the effects of the mutation on rate-limiting steps. We have revised the manuscript to add a discussion of the challenges of analyzing simple point mutants and to cite what is known from prior scanning mutagenesis studies of EmrE.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors present a novel CRISPR/Cas9-based genetic tool for the dopamine receptor dop1R2. Based on the known function of the receptor in learning and memory, they tested the efficacy of the genetic tool by knocking out the receptor specifically in mushroom body neurons. The data suggest that dop1R2 is necessary for longer-lasting memories through its action on α/β and α'/β' neurons but is dispensable for short-term memory and thus in γ neurons. The experiments impressively demonstrate the value of such a genetic tool and illustrate the specific function of the receptor in subpopulations of KCs for longer-term memories. The data presented in this manuscript are significant.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript examines the role of the dopamine receptor, Dop1R2, in memory formation. This receptor has complex roles in supporting different stages of memory, and the neural mechanisms for these functions are poorly understood. The authors are able to localize Dop1R2 function to the vertical lobes of the mushroom body, revealing a role in later (presumably middle-term) aversive and appetitive memory. In general, the experimental design is rigorous, and statistics are appropriately applied. While the manuscript provides a useful tool, it would be strengthened further by additional mechanistic studies that build on the rich literature examining the roles of dopamine signaling in memory formation. The claim that Dop1R2 is involved in memory formation is strongly supported by the data presented, and this manuscript adds to a growing literature revealing that dopamine is a critical regulator of olfactory memory. However, the manuscript does not necessarily extend much beyond our understanding of Dop1R2 in memory formation, and future work will be needed to fully characterize this reagent and define the role of Dop1R2 in memory.

      Strengths:

      (1) The FRT lines generated provide a novel tool for temporal and spatially precise manipulation of Dop1R2 function. This tool will be valuable to study the role of Dop1R2 in memory and other behaviors potentially regulated by this gene.

      (2) Given the highly conserved role of Dop1R2 in memory and other processes, these findings have a high potential to translate to vertebrate species.

      Weaknesses:

      (1) The authors state Dop1R2 associates with two different G-proteins. It would be useful to know which one is mediating the loss of aversive and appetitive memory in Dop1R2 knockout flies.

      We thank you for the insightful comment. We agree that it would be very useful to know which G-proteins are transmitting Dop1R2 signaling. To that end, we examined single-cell transcriptomics data to check the level of co-expression of Dop1R2 with the G-proteins of interest to us (Figure 1 S1).

      Lines 312-325

      “Some RNA-binding proteins and immediate early genes help maintain the identities of mushroom body cells and are regulators of local transcription and translation (de Queiroz et al., 2025; Raun et al., 2025). So, the availability of different G-proteins may change in different lobes and during different phases of memory. The G-protein via which a GPCR signals may depend on the pool of available G-proteins in the cell or sub-cellular region (Hermans, 2003). Therefore, Dop1R2 may signal via different G-proteins in different compartments of the mushroom body and also in different compartments of the neuron. We looked at Gαo and Gαq as they are known to have roles in learning and forgetting (Ferris et al., 2006; Himmelreich et al., 2017). We found that Dop1R2 co-expresses more frequently with Gαo than with Gαq (Figure 1 S1). While there is evidence for Dop1R2 acting via Gαq (Himmelreich et al., 2017), it is difficult to determine whether this interaction is exclusive, or if Dop1R2 can also be coupled to other G-proteins. It will be interesting to determine the breadth of G-proteins that are involved in Dop1R2 signaling.”

      (2) It would be interesting to examine 24hr aversive memory, in addition to 24hr appetitive memory.

      This is indeed an important point, and we agree that it will complete the assessment of temporally distinct memory traces. We therefore performed aversive LTM experiments and include them in the results.

      Lines 208-228

      “24h memory is impaired by loss of Dop1R2

      Next, we wanted to see if later memory forms are also affected. One cycle of reward training is sufficient to create LTM (Krashes & Waddell, 2008), while for aversive memory, 5-6 cycles of electroshock training are required to obtain robust long-term memory scores (Tully et al., 1994). So, we looked at both 24h aversive and appetitive memory. For aversive LTM, the flies were tested on the Y-Maze apparatus as described in Mohandasan et al. (2022).

      Flipping out Dop1R2 in the whole MB causes a reduced 24h memory performance (Figure 4A, E). No phenotype was observed when Dop1R2 was flipped out in the γ-lobe (Figure 4B, F). However, similar to 2h memory, loss of Dop1R2 in the α/β-lobes (Figure 4C, G) or the α’/β’-lobes (Figure 4D, H) causes a reduction in memory performance. Thus, Dop1R2 seems to be involved in aversive and appetitive LTM in the α/β-lobes and the α’/β’-lobes.

      Previous studies have shown that mutation of the Dop1R2 receptor leads to improved LTM when a single-shock training paradigm is used (Berry et al., 2012). As we found that loss of Dop1R2 disrupts LTM, we wanted to verify whether the absence of Dop1R2 outside the MB is what leads to an improvement in memory. To that end, we used the elav-Gal4 driver to flip out Dop1R2 panneuronally and tested 6h and 24h memory after single-shock training. Panneuronal flip-out did not improve memory at either time point (Figure 4 S1). This confirms that flipping out Dop1R2 panneuronally does not improve LTM (Figure 4 S1C) and highlights its irrelevance for memory outside the MB.”

      (3) The manuscript would be strengthened by added functional analysis. What are the DANs that signal through Dop1R? How do these knockouts impact MBONs?

      We thank you for this question. We indeed agree that how distinct DANs signal via distinct dopamine receptors is a highly relevant and open question. Our work here focuses exclusively on Dop1R2 within the MB. We aim to investigate other DopRs and their connections to DANs in the future using similar approaches.

      (4) Also in Figure 2, the lobe-specific knockouts might be moved to supplemental since there is no effect. Instead, consider moving the control sensory tests into the main figure.

      We thank you for this suggestion and understand that in Figure 2 no significant difference is seen. However, we have emphasized in the text that the results from the supplementary figures are just to confirm that the modifications made at the Dop1R2 locus did not alter its normal function.

      Lines 156-162

      “We wanted to see if flipping out Dop1R2 in the MB affects memory acquisition and STM by using classical olfactory conditioning. In short, a group of flies is presented with an odor coupled to an electric shock (aversive) or sugar (appetitive) followed by a second odor without stimulus. For assessing their memory, flies can freely choose between the odors either directly after training (STM) or at a later timepoint.

      To ensure that the introduced genetic changes to the Dop1R2 locus do not interfere with behavior, we first checked the sensory responses of that line.”

      (5) Can the single-cell atlas data be used to narrow down the cell types in the vertical lobes that express Dop1R2? Is it all or just a subset?

      This is indeed an interesting question, and we thank you for mentioning it. To address this as best we could, we analyzed the single-cell transcriptomic data from Davie et al. (2018) and present it in Figure 1 S1.

      Reviewer #3 (Public Review):

      Summary:

      Kaldun et al. investigated the role of Dopamine Receptor Dop1R2 in different types and stages of olfactory associative memory in Drosophila melanogaster. Dop1R2 is a type 1 Dopamine receptor that can act both through Gs-cAMP and Gq-ERCa2+ pathways. The authors first developed a very useful tool, where tissue-specific knock-out mutants can be generated, using Crispr/Cas9 technology in combination with the powerful Gal4/UAS gene-expression toolkit, very common in fruit flies.

      They direct the K.O. mutation to intrinsic neurons of the main associative memory centre of the fly brain, the mushroom body (MB). There are three main types of MB neurons, or Kenyon cells, according to their axonal projections: α/β, α'/β', and γ neurons.

      Kaldun et al. found that flies lacking dop1R2 all over the MB displayed impaired appetitive middle-term (2h) and long-term (24h) memory, whereas appetitive short-term memory remained intact. Knocking out dop1R2 in the three MB neuron subtypes also impaired middle-term, but not short-term, aversive memory.

      These memory defects were recapitulated when the loss of the dop1R2 gene was restricted to either α/β or α'/β', but not when the loss of the gene was restricted to γ neurons, showcasing a compartmentalized role of Dop1R2 in specific neuronal subtypes of the main memory centre of the fly brain for the expression of middle and long-term memories.

      Strengths:

      (1) The conclusions of this paper are very well supported by the data, and the authors systematically addressed the requirement of a very interesting type of dopamine receptor in both appetitive and aversive memories. These findings are important for the fields of learning and memory and dopaminergic neuromodulation among others. The evidence in the literature so far was generated in different labs, each using different tools (mutants, RNAi knockdowns driven in different developmental stages...), different time points (short, middle, and long-term memory), different types of memories (Anesthesia resistant, which is a type of protein synthesis independent consolidated memory; anesthesia sensitive, which is a type of protein synthesis-dependent consolidated memory; aversive memory; appetitive memory...) and different behavioral paradigms. A study like this one allows for direct comparison of the results, and generalized observations.

      (2) Additionally, Kaldun and collaborators addressed the requirement of different types of Kenyon cells that have been classically involved in different memory stages: γ KCs for memory acquisition and α/β or α'/β' for later memory phases. This systematic approach has not been performed before.

      (3) Importantly, the authors of this paper produced a tool to generate tissue-specific knock-out mutants of dop1R2. Although this is not the first time that the requirement of this gene in different memory phases has been studied, the tools used here represent the most sophisticated genetic approach to induce loss-of-function phenotypes exclusively in MB neurons.

      Weaknesses:

      (1) Although the paper does have important strengths, the main weakness of this work is that the advancement in the field could be considered incremental: the main findings of the manuscript had been reported before by several groups, using tissue-specific conditional knockdowns through RNA interference (RNAi). The requirement of Dop1R2 in MB for middle-term and long-term memories has been shown both for appetitive (Musso et al 2015, Sun et al 2020) and aversive associations (Plaçais et al 2017).

      Thank you for this comment. We believe that the main takeaway from the paper is the elegant tool we developed to study the role of Dop1R2 in fruit flies by effectively flipping it out spatio-temporally. Additionally, we studied its role in all types of olfactory associative memory to establish it as a robust tool that can be used for further research in place of RNAi knockouts, which have been shown to be less efficient in insects, as mentioned in lines 394-398.

      “The genetic tool we generated here to study the role of the Dop1R2 dopamine receptor in cells of interest, is not only a good substitute for RNAi knockouts, which are known to be less efficient in insects (Joga et al., 2016), but also provides versatile possibilities as it can be used in combination with the powerful genetic tools of Drosophila.”

      (2) The approach used here to genetically modify memory neurons is not temporally restricted. Considering the role of dopamine in the correct development of the nervous system, one must consider the possible effects that this manipulation can have on the establishment of memory circuits. However, previous studies addressing this question restricted the manipulation of Dop1R2 expression to adulthood, leading to the same findings as the ones reported in this paper for both aversive and appetitive memories, which solidifies the findings of this paper.

      We thank you for this comment and we agree that it would be important to show a temporally restricted effect of Dop1R2 knockout. To assess this and rule out potential developmental defects we decided to restrict the knockout to the post-eclosion stage and to include these results.

      Lines 230-250

      “Developmental defects are ruled out in a temporally restricted Dop1R2 conditional knockout.

      To exclude developmental defects in the MB caused by flip-out of Dop1R2, we stained fly brains with a FasII antibody. Compared to genetic controls, flies lacking Dop1R2 in the mushroom body had unaltered lobes (Figure 4 S2C).

      Regardless, we wanted to control for developmental defects leading to memory loss in flip-out flies. So, we generated a Gal80ts-containing line, enabling the temporal control of Dop1R2 knockout in the entire mushroom body (MB). Given that the half-life of the receptor remains unknown, we assessed both aversive short-term memory (STM) and long-term memory (LTM) to determine whether post-eclosion ablation of Dop1R2 in the MB produced differences compared to our previously tested line, in which Dop1R2 was constitutively knocked out from fertilization. To achieve this, flies were maintained at 18°C until eclosion and subsequently shifted to 30°C for five to seven days. On the fifth day, training was conducted, followed by memory testing. Our results indicate that aversive STM was not significantly impaired in Dop1R2-deficient MBs compared to control flies (Figure 4 S3), consistent with our previous findings (Figure 2). However, aversive LTM was significantly impaired relative to control lines (Figure 4 S3), which also aligned with prior observations. These findings strongly indicate that memory loss caused by Dop1R2 flip-out is not due to developmental defects.”

      (3) The authors state that they aim to resolve disparities of findings in the field regarding the specific role of Dop1R2 in memory, offering a potent tool to generate mutants and systematically addressing their effects on different types of memory. Their results support the role of this receptor in the expression of long-term memories; however, the experiments performed here do not address the temporal resolution of the genetic manipulations that could shed light on the mechanisms of action of Dop1R2 in memory. Several hypotheses have been proposed, from stabilization of memory, effects on forgetting, or integration of sequences of events (sensory experiences and dopamine release).

      We thank you for this comment. We agree that it would be interesting to dissect the memory stages by knocking out the receptor selectively in some of them (encoding, consolidation, retrieval). However, our tool irreversibly flips out Dop1R2, preventing us from investigating the receptor’s role in retrieval. Our results show that the receptor is dispensable for STM formation (Figure 2, Figure 4 Supplement 3), suggesting that it is not involved in encoding new information. Instead, it is involved in consolidation and/or retrieval of long-term and middle-term memories (Figure 3, Figure 4, Figure 5B).

      Overall, the authors generated a very useful tool to study dopamine neuromodulation in any given circuit when used in combination with the powerful genetic toolkit available in Drosophila. The reports in this paper confirmed a previously described role of Dop1R2 in the expression of aversive and appetitive LTM and mapped these effects to two specific types of memory neurons in the fly brain, previously implicated in the expression and consolidation of long-term associative memories.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) At first glance, the results shown here differ from some studies published earlier, while being in line with others (e.g. Sun et al, for appetitive 24h memories). For example, Berry et al showed that the loss of dop1R2 impairs immediate memory, while memory scores are enhanced 3h, 6h, and 24h after training. Further, they showed data that shock avoidance, at least for higher shock intensities, is reduced in mutant (damb) flies. All in all, this highlights how important it is to improve the genetic tools for tissue-specific manipulation. Although the authors nicely discuss their data with respect to the previous studies, I wondered whether it would be suitable to use the new tool and knock out dop1R2 panneuronally to see whether the obtained data match the results published by Berry et al. Further, as stated in line 105ff: "As these studies used different learning assays - aversive and appetitive respectively as well as different methods, it is unclear if Dop1R2 has different functions for the different reinforcement stimulus" I wondered why the authors tested aversive and appetitive learning for STM and 2h memory, but only appetitive memory for 24h.

      Thank you for this comment. To that end, as mentioned above in response to Reviewer #2, we included the aversive LTM experiment in the results (Figure 4). Moreover, we performed experiments along the lines of Berry et al. using our tool, as shown in Figure 4 S1. Our results support that Dop1R2 is required for LTM, rather than promoting forgetting.

      (2) Line 165ff: I can´t find any of the supplementary data mentioned here. Please add the corresponding figures.

      Thank you for pointing this out. In that line we do not refer to any supplementary data, but to Figure 1F, showing the absence of the HA-tag in our MB knock-out line. We have clarified this in the text (lines 151-153).

      (3) I can't imagine that the scale bar in Figure 1D-F is correct. I would also like to suggest showing a more detailed analysis of the expression pattern. For example, both anterior and posterior views would be appropriate, perhaps including the VNC. This would allow the expression pattern obtained with this novel tool to be better compared with previously published results. Also, in relation to my comment above (1), it may help to understand the functional differences with previous studies, especially as the authors themselves state that the receptor is "mainly" expressed in the mushroom body (line 99). It would be interesting to see where else it is expressed (if anywhere). This would also be interesting for the panneuronal knockdown experiment suggested under (1). If the receptor is indeed expressed outside the mushroom body, this may explain the differences from Berry et al.

      Thank you for noting this; there was indeed a mistake in the scale bar, which we have now fixed. Since our HA-tag immunostaining did not detect any noticeable signal outside the MB, we analyzed existing single-cell transcriptomics data, which showed expression of the receptor in 7.99% of cells in the VNC and in 13.8% of cells outside the MB (lines 98-100), confirming its sparse expression in the nervous system. Our failure to detect these cells is likely due to the sparse and low expression of the protein. The HA-tag reports the endogenous expression level of the locus (a Gal4/UAS amplification of the signal might allow these cells to be detected).

      Regarding the panneuronal knockout, we decided to try to replicate the experiment shown in Berry et al. in Figure 4 S1 and found that Dop1R2 is required for LTM.

      (4) Related to the learning data shown in Figures 2-4, the authors should show statistical differences between all groups obtained in the ANOVA + post hoc tests. Currently, only an asterisk is placed above the experimental group, which does not adequately reflect the statistical differences between the groups. In addition, I would like to suggest adding statistical tests against chance level, as it may be interesting to know whether, for example, scores of knockout flies in 3C and 3D are different from chance level.

      Many thanks for this correction. We agree that the way significance scores were shown was not informative enough. We have fixed this by now showing significance between all the control groups and the experimental ones. We also inserted the chance-level results in the figure legends.

      (5) Unfortunately, the manuscript has some typing errors, so I would like to ask the authors to check the manuscript again carefully.

      Some Examples:

      Line 31: the the

      Line 56: G-Protein

      Line 64: c-AMP

      Line 68: Dopamine

      Line 70: G-Protein (It alternates between G-protein and G-Protein)

      Line 76: References are formatted incorrectly

      Line 126: Ha-Tag (It alternates between Ha and HA)

      Line 248: missing space before the bracket...is often found

      Thank you for noticing these errors, we have now corrected the spelling throughout the manuscript.

      (6) In the figures the axes are labelled Preference Index (Pref"I"). In the methods, however, the calculation formula is defined as "PREF".

      We thank you for drawing attention to this. To avoid confusion, we changed the definition in the methods section so that it could be clear and coherent (“Memory tests” paragraph in the methods section).

      “PREF = ((N<sub>arm1</sub> - N<sub>arm2</sub>) × 100) / N<sub>total</sub>. Two preference indices were calculated from the two reciprocal experiments. The average of these two PREFs gives a learning index (LI): LI = (PREF<sub>1</sub> + PREF<sub>2</sub>) / 2.

      For all long-term aversive memory experiments, the Y-Maze protocol was adapted to test flies 24 hours after training. Testing with the Y-Maze followed the protocol described in Mohandasan et al. (2022): flies were loaded at the bottom of 3D-printed Y-Mazes that had been odorized for 20 minutes, from where they climbed up to a choice point and chose between the two odors. The learning index was then calculated after counting the flies in each odorized vial as follows: LI = ((N<sub>CS-</sub> - N<sub>CS+</sub>) × 100) / N<sub>total</sub>, where N<sub>CS-</sub> and N<sub>CS+</sub> are the numbers of flies found trapped in the untrained and trained odor tubes, respectively.”
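      The index formulas above can be checked with a small calculation. This is a minimal sketch under stated assumptions, not the authors' analysis script: the function names are hypothetical; N_total is assumed to be the sum of the two counted arms unless supplied explicitly (the methods do not say whether non-choosing flies are included); and arm 1 is assumed to be the arm holding the untrained (CS-) odor in each reciprocal experiment, so that avoidance of the trained odor gives a positive PREF in both.

```python
def pref(n_arm1, n_arm2, n_total=None):
    """Preference index in percent: ((N_arm1 - N_arm2) * 100) / N_total.

    Assumption: N_total defaults to N_arm1 + N_arm2 if not given.
    """
    if n_total is None:
        n_total = n_arm1 + n_arm2
    return (n_arm1 - n_arm2) * 100 / n_total

def learning_index(pref1, pref2):
    """T-maze LI: mean of the PREFs from the two reciprocal experiments."""
    return (pref1 + pref2) / 2

def y_maze_li(n_cs_minus, n_cs_plus, n_total=None):
    """Y-Maze LI: ((N_CS- - N_CS+) * 100) / N_total (same arithmetic as pref)."""
    return pref(n_cs_minus, n_cs_plus, n_total)

# Hypothetical counts, for illustration only.
p1 = pref(60, 40)              # 20.0
p2 = pref(70, 30)              # 40.0
print(learning_index(p1, p2))  # 30.0
print(y_maze_li(45, 15))       # 50.0
```

      Averaging the two reciprocal PREFs cancels any innate bias toward one of the two odors, which is why the LI rather than a single PREF is reported.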

      Reviewer #2 (Recommendations For The Authors):

      (1) In Figures 2 and 3, the legends running across two different subfigures are confusing. It would be helpful to find a different way to present them.

      Thank you for your suggestion. We modified how we present legends, placing them vertically so that it is clearer.

      (2) Use additional drivers to verify middle and long-term memory phenotypes.

      We agree that it would be interesting to see the role of Dop1R2 in other neurons. To that end, we looked at long-term aversive memory in flies in which the receptor was panneuronally flipped out, and did not find evidence that suggested involvement of Dop1R2 in memory processes outside the MB (Figure 4 S1).

      (3) Additional discussion of genetic background for fly lines would be helpful.

      Thank you for your advice. We have listed the genetic background of the flies in the key resources table of the methods section. Additionally, we included further explanation of how the lines were created and their genetic background (see “Fly Husbandry” paragraph in the methods section).

      “UAS-flp;;Dop1R2<sup>cko</sup> flies and Gal4;Dop1R2<sup>cko</sup> flies were crossed back with ;;Dop<sup>cko</sup> flies to obtain appropriate genetic controls, which were heterozygous for UAS and Gal4 but not Dop1R2<sup>cko</sup>.”

      Reviewer #3 (Recommendations For The Authors):

      Line 109 states that, to resolve the problem, a tool is developed to knock down Dop1R2 in a spatially and temporally specific manner. While I agree that this is within the potential of the tool, there is no temporal control of the flippase action in this study; at least I cannot find references to the use of target/gene switch to control stages of development or different memory phases. However, the version available for download is missing supplementary information, so I did not have access to supplementary figures and tables.

      Thank you for the comment. As mentioned before, it would be great to be able to dissect the memory phases. We show in lines 232 – 250 and Figure 4 S3 that restricting the flip-out to the post-eclosion life stage gave results consistent with our previous findings, ruling out potential developmental defects.

      In relation to my comment on the possible developmental effects of the loss of the gene, Figure 1F could showcase an underdeveloped γ lobe when looking at the lobe profiles. I understand this is not within the scope of the figure, but maybe a different z projection can be provided to confirm there are no obvious anatomical alterations due to the loss of the receptor.

      We understand the doubt about the correct development of the MB, and we thank you for your insightful comment. To that end, we performed a FasII immunostaining to visualize the MB in the different lines (Figure 4 S2), and there appear to be no notable differences in lobe development in our knockout line.

      It seems that the obvious missing piece of the puzzle would be to address the effects of knocking out Dop1R2 in aversive LTM. The idea of systematically addressing different types of memory at different time points and in different KCs is the most attractive aspect of this study beyond the technical sophistication, and it feels that the aim of the study is not delivered without that component.

      We agree and we thank you for the clarification. As mentioned above in response to Reviewer #2, we decided to test aversive LTM, as described in lines 208-228, Figure 4, and Figure 4 S1.

      Some statements of the discussion seem too vague, and I think could benefit from editing:

      Line 284 "however other receptors could use Gq and mediate forgetting"- does this refer to other dopamine receptors? Other neuromodulators? Examples?

      Thank you for pointing this out. We agree and therefore decided to omit this line.

      Line 289 "using a space training protocol and a Dop1R2 line" - this refers to RNAi lines, but it should be stated clearly.

      That is correct, we thank you for bringing attention to this and clarified it in the manuscript.

      Lines 329-330

      “Interestingly, using a spaced training protocol and a Dop1R2 RNAi knockout line another study showed impaired LTM (Placais et al., 2017).”

      The paragraph starting in line 305 could be re-written to improve clarity and flow. Some statements seem disconnected and require specific citations. For example "In aversive memory formation, loss of Dop1R2 could lead to enhanced or impaired memory, depending on the activated signaling pathways and the internal state of the animal...". This is not accurate. Berry et al 2012 report enhanced LTM performance in dop1R2 mutants whereas Plaçais et al 2017 report LTM defects in Dop1R2 knock-downs, but these different findings do not seem to rely on different internal states or signaling pathways. Maybe further elaboration can help the reader understand this speculation.

      We agree and thank you for this advice. We have added further details and citations to support our speculation.

      Lines 350-353

      “In aversive memory formation, loss of Dop1R2 could lead to enhanced or impaired memory, depending on the activated signaling pathways. The signaling pathway that is activated further depends on the available pool of second messengers in the cell (Hermans, 2003), which may be regulated by the internal state of the animal.”

      "...for reward memory formation, loss of Dop1R2 seems to impair memory", this seems redundant at this point, as it has been discussed in detail, however, citations should be provided in any case (Musso 2015, Sun 2020)

      Thank you for noting this. We recognize the redundancy and decided to exclude the line.

      Finally, it would be useful to additionally refer to the anatomical terminology when introducing neuron names; for example MBON MVP2 (MBON-γ1pedc>α/β), etc.

      Thank you for this suggestion. We understand the importance of anatomical terminology for these neurons and now include it when introducing each neuron in the paper.

      We thank you for your observations. We recognize their value and have revised the discussion to make it less vague and more comprehensive.

    1. photography

      Living more than 80 years in the future, we find photography so normal that we don't even think about it, but as a forward-looking person, he could see its implications.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The resubmitted version of the manuscript adequately addressed several initial comments made by reviewing editors, including a more detailed analysis of the results (such as those of bilayer thickness). This version was seen by 2 reviewers. Both reviewers recognize this work as being an important contribution to the field of BK and voltage-dependent ion channels in general. The long trajectories and the rigorous/novel analyses have revealed important insights into the mechanisms of voltage-sensing and electromechanical coupling in the context of a truncated variant of the BK channel. Many of these observations are consistent with structural and functional measurements of the channel, available thus far. The authors also identify a novel partially expanded state of the channel pore that is accessed after gating-charge displacement, which informs the sequence of structural events accompanying voltage-dependent opening of BK.

      However, there are key concerns regarding the use of the truncated channel in the simulations. While many gating features of BK are preserved in the truncated variant, studies have suggested that the coupling of pore opening to voltage-sensing domain rearrangement is impaired upon gating-ring deletion. The inferences made here might therefore represent only a partial view of the mechanism of electromechanical coupling.

      It is also not entirely clear whether the partially expanded pore represents a functionally open, subconductance, or another closed state. Although the authors provide evidence that the inner pore is hydrated in this partially open state, in the absence of additional structural or functional restraints it is difficult to confidently assign a functional state to this structural state. Functional measurements of the truncated channel suggest not only that its single-channel conductance is lower than that of full-length channels, but also that it has a voltage-independent step that causes the gates to open. It is unclear whether this voltage-independent step remains to be captured in these MD trajectories. A clean-cut resolution of this conundrum might not be feasible at this time, but it could help to present the various possibilities to the readers.

      We appreciate the positive comments and agree that there will likely be important differences between the mechanistic details of voltage activation between the Core-MT and full-length constructs of BK channels. We also agree that the dilated pore observed in the simulation may not be the fully open state of Core-MT.

      Nonetheless, the notion that the simulation may not have captured the full pore opening transition or the contribution of the CTD should not render the current work “incomplete”, because a complete understanding of BK activation would be an unrealistic goal beyond the scope of this work. We respectfully emphasize that the main insights of the current simulations are the mechanisms of voltage sensing (e.g., the nature of VSD movements, contributions of various charged residues, how small charge movements allow voltage sensing, etc.) as well as the role of the S4-S5-S6 interface in VSD-pore coupling. As noted by the Editor and reviewers, these insights represent important steps towards establishing a more complete understanding of BK activation.

      Below are the specific comments of the two experts who have assessed the work and made specific suggestions to improve the manuscript.

      Reviewer #1 (Recommendations for the authors):

      (1) Although the successful simulation of V-dependent K+ conduction through the BK channel pore and analysis of associated state dependent VSD/pore interactions and coupling analysis is significant, there are two related questions that are relevant to the conclusions and of interest to the BK channel community which I think should be addressed or discussed.

      One key feature of BK channels is their extraordinarily large conductance compared to other K+ selective channels. Do the simulations of K+ conductance provide any insight into this difference? Is the predicted conductance of BK larger than that of other K+ channels studied by similar methods? Is there any difference in the conductance mechanism (e.g., the hard and soft knock-on effects mentioned for BK)?

      The molecular basis of the large conductance of BK channels is indeed an interesting and fundamental question. Unfortunately, this is beyond the scope of this work, and the current simulations do not appear to provide any insight into the basis of the large conductance. It is interesting to note, though, that the conductance is apparently related to the level of pore dilation and pore hydration: increasing the hydration level from ~30 to ~40 waters in the pore increases the simulated conductance from ~1.5 to 6 pS (page 8). This is consistent with previous atomistic simulations (Gu and de Groot, Nature Communications 2023; ref. 33) showing that the pore hydration level is strongly correlated with the observed conductance. As noted in the manuscript, the conductance mechanism through the filter appears highly similar to that in previous simulations of other K+ channels (page 8). Given the limited number of conduction events observed in the current simulations, we will refrain from discussing the possible basis of the large conductance of BK channels, except for commenting on the role of pore hydration (page 8; also see below in response to #5).

      The pore in the MD simulations does not open as wide as the Ca-bound open structure, which (as the authors note) may mean that full opening requires longer than 10 µs. I think that is highly likely given that the two 750 mV simulations yielded different degrees of opening and that in BK channels opening is generally much slower than charge movement. Therefore, a question is - do any of the conclusions illustrated in Figures 6, S5, S6 differ if the Ca-bound structure is used as the open state? For example, I expect the interactions between S5 and S6 might at least change to some extent as S6 moves to its final position. In this case, would conclusions about which residues interact, and get stronger or weaker, be the same as in Figures S6 b,c? Providing a comparison may help indicate to what extent the conclusions are dependent on achieving a fully open conformation.

      We appreciate the reviewer’s suggestion and have further analyzed the information flow and coupling pathways using the simulation trajectory initiated from the Ca2+-bound cryo-EM structure (sim 7, Table S1). The new results are shown in two new SI Figures S7 and S8, and new discussion has been added to pages 14-15. Comparing Figures 5 and S7, we find that dynamic community, coupling pathways, and information flow are highly similar between simulation of the open and closed states, even though there are significant differences in S5 contacts in the simulated open state vs Ca2+-bound open state (Figure S8). Interestingly, there are significant differences in S4-S5 packing in the simulated and Ca2+-bound open states (Figure S8 top panel), which likely reflect important difference in VSD/pore interactions during voltage vs Ca2+ activation.

      (2) P4 Significance -"first, successful direct simulation of voltage-activation"

      This statement may need rewording. As noted above, Carrasquel-Ursulaez et al., 2022 (reference 39) simulated voltage sensor activation under conditions comparable to those of the current manuscript (3.9 µs simulation at +400 mV), and made some similar conclusions regarding R210, R213 movement, and electric field focusing within the VSD. However, they did not report what happens to the pore or simulate K+ movement. So do the authors here mean something like "first, successful direct simulation of voltage-dependent channel opening"?

      We agree with the reviewer and have revised the statement to “ … the first successful direct simulation of voltage-dependent activation of the big potassium (BK) channel, ..”

      (3) P5 "We compare the membrane thickness at 300 and 750 mV and the results reveal no significant difference in the membrane thickness (Figure S2)" The figure also shows membrane thickness at 0 mV and indicates it is 1.4 Angstroms less than that at 300 or 750 mV. Whether or not this difference is significant should be stated, as the question being addressed is whether the structure is perturbed owing to the use of non-physiological voltages (which would include both 300 and 750 mV).

      We have revised the Figure S2 caption to clarify that a one-way ANOVA suggests the difference is not significant.

      (4) P7 "It should be noted that the full-length BK channel in the Ca2+ bound state has an even larger intracellular opening (Figure 2f, green trace), suggesting that additional dilation of the pore may occur at longer timescales."

      As noted above, I agree it is likely that additional pore dilation may occur at longer timescales. However, for completeness, I suppose an alternative hypothesis should be noted, e.g. "...suggesting that additional dilation of the pore may occur at longer timescales, or in response to Ca-binding to the full length channel."

      This is a great suggestion. Revised as suggested.

      (5) Since the authors raise the possibility that they are simulating a subconductance state, some more discussion on this point would be helpful, especially in relation to the hydrophobic gate concept. Although the Magleby group concluded that the cytoplasmic mouth of the (fully open) pore has little impact on single channel conductance, that doesn't rule out that it becomes limiting in a partially open conformation. The simulation in Figure 3A shows an initial hydration of the pore with ~15 waters with little conductance events, suggesting that hydration per se may not suffice to define a fully open state. Indeed, the authors indicate that the simulated open state (w/ ~30-40 waters) has 1/4th the simulated conductance of the open structure (w/ ~60 waters). So is it the degree of hydration that limits conductance? Or is there a threshold of hydration that permits conductance and then other factors that limit conductance until the pore widens further? Addressing these issues might also be relevant to understanding the extraordinarily large conductance of fully open BK compared to other K channels.

      We agree with the reviewer’s proposal that pore hydration seems to be a major factor that can affect conductance. This is also well in line with the previous computational study by Gu and de Groot (2023). We have now added a brief discussion on page 8, stating “Besides the limitation of the current fixed-charge force fields in quantitatively predicting channel conductance, we note that the molecular basis for the large conductance of BK channels is actually poorly understood (78). It is noteworthy that the pore hydration level appears to be an important factor in determining the apparent conductance in the simulation, which has also been proposed in a previous atomistic simulation study of the Aplysia BK channel (33).”

      Minor points

      (1) P5 "the fully relaxed pore profile (red trace in Figure S1d, top row) shows substantial differences compared to that of the Ca2+-free Cryo-EM structure of the full-length channel." For clarity, I suggest indicating which is the Ca-free profile - "... Ca2+-free Cryo-EM structure of the full-length channel (black trace)."

      We greatly appreciate the thoughtful suggestion. Revised as suggested.

      (2) P8 "Consistent with previous simulations (78-80), the conductance follows a multi-ion mechanism, where there are at least two K+ ions inside the filter" For clarity, I suggest indicating these are not previous simulations of BK channels (e.g., "previous simulations of other K+ channels ...").

      Revised as suggested. Thank you.

      (3) Figure 2, S1 - grey traces representing individual subunits are very difficult to see (especially if printed). I wonder if they should be made slightly darker. Similar traces in Figure 3 are easier to see.

      The traces in Figure S1 are actually the same thickness as those in Figure 3; they appear lighter because of the smaller figure size. Figure 2 panels a-c have been updated to improve the resolution.

      (4) Figure 2 - suggest labeling S6 as "S6 313-324" (similar to S4 notation) to indicate it is not the entire segment.

      Figure 2 panel d) has been updated as suggested.

      (5) Figure 2 legend - "Voltage activation of Core-MT BK channels. a-d)..."

      It would be easier to find details corresponding to individual panels if they were referenced individually. For example:

      "a-d) results from a 10-μs simulation under 750 mV (sim2b in Table S1). Each data point represents the average of four subunits for a given snapshot (thin grey lines), and the colored thick lines plot the running average. a) z-displacement of key side chain charged groups from initial positions. The locations of charged groups were taken as those of guanidinium CZ atoms (for Arg) and sidechain carboxyl carbons (for Asp/Glu). b) z-displacement of centers-of-mass of VSD helices from initial positions, c) backbone RMSD of the pore-lining S6 (F307-L325) to the open state, and d) tilt angles of all TM helices. Only residues 313-324 of S6 were included in the tilt angle calculation, and the values in the open and closed Cryo-EM structures are marked using purple dashed lines. "

      We appreciate the thoughtful suggestion and have revised the caption as suggested.

      (6) Figure S1 - column labels a,b,c, and d should be referenced in the legend.

      The references to column labels have been added to Figure S1 caption.

      (7) References need to be double-checked for duplicates and formatting.

      a) I noticed several duplicate references, but did not do a complete search: Budelli et al 2013 (#68, 100), Horrigan Aldrich 2002 (#22,97), Sun Horrigan 2022 (#40, 86), Jensen et al 2012 (#56,81).

      b) Reference #38 is incorrectly cited with the first name spelled out and the last name abbreviated.

      We appreciate the careful proofreading of the reviewer. The duplicated references were introduced by mistake due to the use of multiple reference libraries. We have gone through the manuscript and removed a total of 5 duplicated references.

      Reviewer #2 (Recommendations for the authors):

      This manuscript has been through a previous level of review. The authors have provided their responses to the previous reviewers, which appear to be satisfactory, and I have no additional comments, beyond the caveats concerning interpretations based on the truncated channel, which are noted above.

      We greatly appreciate the constructive comments and insightful advice. Please see above response to the Reviewing Editor’s comments for response and changes regarding the caveats concerning interpretations of the current simulations.

      Indeed, it turns out the number of available job opportunities for translators and interpreters has actually been increasing. This is not to say that the technology isn’t good; I think it’s pretty close to as good as it can be at what it does. Nor is it to say that machine translation hasn’t changed the profession of translation: in the article linked above, Bridget Hylak, a representative from the American Translators Association, is quoted as saying “Since the advent of neural machine translation (NMT) around 2016, which marked a significant improvement over traditional machine translation like Google Translate, we [translators and interpreters] have been integrating AI into our workflows.” To explain this apparent contradiction, we need to understand what translators actually do, because, like us programmers, they suffer from having the nature of their work consistently misunderstood by non-translators. The laity’s image of a translator is a walking dictionary and grammar reference who substitutes words and grammatical structures from one language to another with ease; the reality is that translators’ and interpreters’ work is mostly about ensuring context, navigating ambiguity, and handling cultural sensitivity. This is what Google Translate cannot currently do.

      Shitty text being available in more languages may make people want more good text in their languages, too.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study by Wu et al presents interesting data on bacterial cell organization, a field that is progressing now, mainly due to the advances in microscopy. Based mainly on fluorescence microscopy images, the authors aim to demonstrate that the two structures that account for bacterial motility, the chemotaxis complex and the flagella, colocalize to the same pole in Pseudomonas aeruginosa cells and to expose the regulation underlying their spatial organization and functioning.

      Strengths:

      The subject is of importance.

      Weaknesses:

      The conclusions are too strong for the presented data. The lack of statistical analysis makes this paper incomplete. The novelty of the findings is not clear.

      We have strengthened the data analysis by including appropriate statistical tests to support our conclusions more convincingly. Additionally, we have refined the description of the research background to better emphasize the novelty and significance of our findings. Please see the detailed responses below for further information.

      Major issues:

      (1) The novelty is in question since in the Abstract the authors highlight their main finding, which is that both the chemotaxis complex and the flagella localize to the same pole, as surprising. However, in the Introduction they state that "pathway-related receptors that mediate chemotaxis, as well as the flagellum are localized at the same cell pole17,18". I am not a pseudomonas researcher and from my short glance at these references, I could not tell whether they report colocalization of the two structures to the same pole. However, I trust the authors that they know the literature on the localization of the chemotaxis complex and flagella in their organism. See also major issue number 5 on the novelty regarding the involvement of c-di-GMP.

      We thank the reviewer for this valuable comment and appreciate the opportunity to clarify our statements.

      Kazunobu et al. (ref. 18) used scanning electron microscopy to preliminarily characterize the flagellation pattern of Pseudomonas aeruginosa during cell division, showing that existing flagella are located at the old pole. Zehra et al. (ref. 17), through fluorescence microscopy, observed that CheA and CheY proteins in dividing cells are typically also present at the old pole. Based on these observations, we inferred in the Introduction that the chemotaxis complex and flagellum may localize to the same cell pole.

      However, this inference is indirect and lacks direct live-cell evidence of colocalization, leaving its validity to be confirmed. This uncertainty was indeed the starting point and motivation for our study.

      In our work, we simultaneously visualized flagellar filaments and core chemoreceptor proteins at the single-cell level in P. aeruginosa. We characterized the assembly and spatial coordination of the chemotaxis network and flagellar motor throughout the cell cycle, providing direct evidence of their colocalization and coordinated assembly. This represents a significant advance beyond prior indirect observations and supports the novelty of our study.

      Accordingly, we have revised the relevant statements in lines 71-75 of the manuscript to better reflect the current state of the literature and emphasize the novelty of our direct observations.

      (2) Statistics for the microscopy images, on which most conclusions in this manuscript are based, are completely missing. Given that most micrographs present one or very few cells, together with the fact that almost all conclusions depend on whether certain macromolecules are at one or two poles and whether different complexes are in the same pole, proper statistics, based on hundreds of cells in several fields, are absolutely required. Without this information, the results are anecdotal and do not support the conclusions. Due to the importance of statistics for this manuscript, strict statistical tests should be used and reported. Moreover, representative large fields with many cells should be added as supportive information.

      We thank the reviewer for this important comment, which significantly improves the rigor and persuasiveness of our manuscript.

      For the colocalization analyses presented in Fig. 1D and Fig. 2B, we quantified 145 and 101 cells with fluorescently labeled flagella, respectively, and observed consistent colocalization of the chemoreceptor complexes and flagella in all examined cells (now added in the figure legends). Regarding the distribution patterns of chemoreceptors shown in Fig. 3A, we have now included comprehensive statistical analyses for both wild-type and mutant strains. For each strain, more than 300 cells were analyzed across at least three independent microscopic fields, providing robust statistical power (detailed data are presented in Fig. 3C).

      To further strengthen the evidence, statistical tests were applied to confirm the significance and reproducibility of our findings (Fig. 3C). In addition, representative large-field fluorescence images containing numerous cells have been added to the supplementary materials (Fig. S1 and Fig. S3).

      The problem is more pronounced when the authors make strong statements, as in lines 157-158: "The results revealed that the chemoreceptor arrays no longer grow robustly at the cell pole (Figure 2A)". Looking at the seven cells shown in Figure 2A, five of them show polar localization of the chemoreceptors. The question is then: what is the percentage of cells that show precise polar, near-polar, or mid cell localization (the three patterns shown here) in the mutant and in the wild type? Since I know that these three patterns can also be observed in WT cells, what counts is the difference, and whether it is statistically significant.

      We thank the reviewer for raising this important point. Following the reviewer's suggestion, we have now analyzed and categorized the distribution of the chemotaxis complex in both wild-type and flhF mutant strains into three patterns: precise-polar, near-polar, and mid-cell localization. For each strain, more than 200 cells across three independent fields of view were quantified.

      Our statistical analysis shows that in the wild-type strain, approximately 98% of cells exhibit precise polar localization of the chemotaxis complex. In contrast, the ΔflhF mutant displays a clear shift in distribution, with about 5% of cells showing mid-cell localization and 9.5% showing near-polar localization. These differences demonstrate a significant alteration in the spatial pattern upon flhF deletion.

      We have revised the relevant text in lines 166-170 accordingly and included the detailed statistical data in the newly added Fig. S4.

      Even for the graphs shown in Figures 3C and 3D, where the proportion of cells with obvious chemoreceptor arrays and absolute fluorescence brightness of the chemosensory array are shown, respectively, the questions that arise are: for how many individual cells these values hold and what is the significance of the difference between each two strains?

      The number of cells analyzed for each strain is indicated in the original manuscript: 372 wild-type cells (line 123), 221 ΔflhF cells (line 172), 234 ΔfliG cells (line 197), 323 ΔfliF cells (line 200), 672 ΔflhFΔfliF cells (line 202), and 242 ΔmotAΔmotCD cells (line 207). For each strain, data were collected from three independent fields of view. We have now also provided the number of cells in Fig. 3 legend.

      We have now performed statistical comparisons using t-tests between strains. Notably, the measured values in Fig. 3C exhibit a clear, monotonic decrease with successive gene knockouts, supporting the robustness of the observed trend.

      Regarding the absolute fluorescence intensity shown in the original Fig. 3D, the mutants did not display consistent directional changes compared to the wild type. Reliable comparison of absolute fluorescence intensity requires consistent fluorescent protein maturation levels across strains. Given the likely variability in maturation levels between strains, we concluded that this data may not accurately reflect true differences in protein concentrations. Therefore, we have removed the fluorescence intensity graph from the revised manuscript to avoid potential misinterpretation.

      (3) The authors conclude that "Motor structural integrity is a prerequisite for chemoreceptor self-assembly" based on the reduction in cells with chemoreceptor clusters in mutants deleted for flagellar genes, despite the proper polar localization of the chemotaxis protein CheY. They show that the level of CheY in the WT and the mutant strains is similar, based on Western blot, which in my opinion is over-exposed. "To ascertain whether it is motor integrity rather than functionality that influences the efficiency of chemosensory array assembly", they constructed a mutant deleted for the flagella stator and found that the motor is stalled while CheY behaves like in WT cells. The authors further "quantified the proportion of cells with receptor clusters and the absolute fluorescence intensity of individual clusters (Figures 3C-D)". While Figure 3C suggests that, indeed, the flagella mutants show fewer cells with a chemotaxis complex, Figure 3D suggests that the differences in fluorescence intensity are not statistically significant. Since it is obvious that the regulation of both structures' production and localization is codependent, I think that it takes more than a Western blot to make such a decision.

      We thank the reviewer for the suggestions. To further clarify that the assembly of flagellar motors and chemoreceptor clusters occurs in an orderly manner rather than being merely codependent, we performed additional experiments. Specifically, we constructed a ΔcheA mutant strain, in which chemoreceptor clusters fail to assemble. Using in vivo fluorescent labeling of flagellar filaments, we observed that the proportion of cells with flagellar filaments in the ΔcheA strain was comparable to that of the wild type (Fig. S5).

      In contrast, mutants lacking complete motor structures, such as ΔfliF and ΔfliG, showed a significant reduction in the proportion of cells with obvious receptor clusters (Fig. 3C). Based on these results, we conclude that the structural integrity of the flagellar motor is, to a certain extent, a prerequisite for the self-assembly of chemoreceptor clusters.

      Accordingly, we have revised the relevant statement in lines 213-217 of the manuscript to reflect this clarification.

      (4) I wonder why the authors chose to label CheY, which is the only component of the chemotaxis complex that shuttles back and forth to the base of the flagella. In any case, I think that they should strengthen their results by repeating some key experiments with labeled CheW or CheA.

      We thank the reviewer for this valuable suggestion. In our study, we initially focused on the positional relationship between chemoreceptor clusters and flagella, then investigated factors influencing cluster distribution and assembly efficiency. The physiological significance of motor and cluster co-localization was ultimately proposed with CheY as the starting point.

      Previous work by Harwood's group demonstrated that both CheY-YFP and CheA-GFP localize to the old poles of dividing Pseudomonas aeruginosa cells. Since our physiological hypothesis centers on CheY, we chose to label CheY-EYFP in our experiments.

      To further strengthen our conclusions, we constructed a plasmid expressing CheA-CFP and introduced it into the cheY-eyfp strain via electroporation. Fluorescence imaging revealed a high degree of spatial overlap between CheA-CFP and CheY-EYFP (Fig. S2), confirming that CheY-EYFP accurately marks the location of the chemoreceptor complex.

      We have revised the manuscript accordingly (lines 119-123) and added these data as Fig. S2.

      (5) The last section of the results is very problematic, regarding the rationale, the conclusions, and the novelty. As far as the rationale is concerned, I do not understand why the authors assume that "a spatial separation between the chemoreceptors and flagellar motors should not significantly impact the temporal comparison in bacterial chemotaxis". Is there any proof for that?

      We apologize for the lack of clarity in our original explanation. The rationale behind the statement was initially supported by comparing the timescales of CheY-P diffusion and temporal comparison in chemotaxis. Specifically, the diffusion time for CheY-P to traverse the entire length of a bacterial cell is approximately 100 ms (refs. 39 and 40), whereas the timescale for bacterial chemotaxis temporal comparison is on the order of seconds (ref. 41).

      To clarify and strengthen this argument, we have expanded the discussion as follows:

      The diffusion coefficient of CheY in bacterial cells is about 10 µm²/s, which corresponds to an estimated end-to-end diffusion time on the order of 100 ms (refs. 40 and 41). If the chemotaxis complexes were randomly distributed rather than localized, diffusion times would be even shorter. In contrast, the timescale for the chemotaxis temporal comparison is on the order of seconds (ref. 42). Additionally, a study by Fukuoka and colleagues reported that intracellular chemotaxis signal transduction requires approximately 240 ms beyond the CheY or CheY-P diffusion time (ref. 41). Moreover, the intervals of counterclockwise (CCW) and clockwise (CW) rotation of the P. aeruginosa flagellar motor under normal conditions are 1-2 seconds, as determined by tethered cell or bead assays (refs. 30 and 43).

      Taken together, these indicate that for P. aeruginosa, which moves via a run-reverse mode, the potential 100 ms reduction in response time due to co-localization of the chemotaxis complex and motor has a limited effect on overall chemotaxis timing.
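      The timescale argument above can be checked with a rough order-of-magnitude sketch. It assumes the one-dimensional diffusion relation t ≈ L²/(2D) with D = 10 µm²/s as quoted above; the cell lengths (1-3 µm) are illustrative assumptions, not values from the manuscript, and the function name `diffusion_time_s` is hypothetical. The estimate scales with the square of the assumed length, so it is a sanity check rather than a measurement.

```python
# Order-of-magnitude estimate of CheY diffusion time across a cell,
# using the 1D relation <x^2> = 2 * D * t, i.e. t ≈ L^2 / (2 * D).

D_CHEY_UM2_PER_S = 10.0  # CheY diffusion coefficient quoted in the response


def diffusion_time_s(length_um: float, d_um2_per_s: float = D_CHEY_UM2_PER_S) -> float:
    """Approximate time (in seconds) for a protein to diffuse length_um."""
    return length_um ** 2 / (2.0 * d_um2_per_s)


if __name__ == "__main__":
    # Assumed, illustrative cell lengths in µm (not taken from the manuscript).
    for length in (1.0, 2.0, 3.0):
        t = diffusion_time_s(length)
        print(f"L = {length:.1f} um -> t ≈ {t * 1000:.0f} ms")
```

      Under these assumptions the estimate gives 50, 200, and 450 ms. All three values fall below the 1-2 s CCW/CW motor intervals cited above, in line with the order-of-magnitude argument.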

      We have revised the corresponding text accordingly (lines 238-245) to better explain this rationale.
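      The ~100 ms figure in the revised text can be checked with the standard one-dimensional diffusion estimate t ≈ L²/(2D). The sketch below is illustrative only: D = 10 µm²/s and the 1-2 s switching intervals are the values quoted above, whereas the end-to-end distances of 1.5 and 2 µm are assumed, illustrative cell lengths rather than measurements from the manuscript.

```python
# Order-of-magnitude check: 1D diffusion time t ≈ L^2 / (2 D).
# D and the switching interval are the values quoted in the text;
# the cell lengths below are ASSUMED, illustrative values.

D_CHEY_UM2_PER_S = 10.0      # CheY diffusion coefficient quoted above (µm^2/s)
SWITCH_INTERVAL_MIN_S = 1.0  # lower bound of the 1-2 s CCW/CW intervals quoted above


def diffusion_time_s(length_um: float, d_um2_per_s: float = D_CHEY_UM2_PER_S) -> float:
    """Characteristic time (s) to diffuse a distance length_um along one axis."""
    return length_um ** 2 / (2.0 * d_um2_per_s)


if __name__ == "__main__":
    for length_um in (1.5, 2.0):  # assumed end-to-end distances (µm)
        t_s = diffusion_time_s(length_um)
        ratio = SWITCH_INTERVAL_MIN_S / t_s
        print(f"L = {length_um} µm: t ≈ {t_s * 1e3:.0f} ms, "
              f"about {ratio:.0f}x shorter than a 1 s switching interval")
```

      Under these assumptions the diffusion time stays several-fold below even the shortest quoted switching interval, which is the comparison the argument relies on; a three-dimensional estimate, L²/(6D), would shorten the times further.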

      More surprising for me was to read that "The signal transduction pathways in E. coli are relatively simple, and the chemotaxis response regulator CheY-P affects only the regulation of motor switching". There are degrees of complexity among signal transduction pathways in E. coli, but the chemotaxis pathway seems to rank at the top. CheY is part of the adaptation. Perfect adaptation, as many other issues related to the chemotaxis pathway, which include the wide dynamic range, the robustness, the sensitivity, and the signal amplification (gain), are still largely unexplained. Hence, such assumptions are not justified.

      We apologize for the confusion and imprecision in our original statements. Our intention was to convey that the chemotaxis pathway in E. coli is relatively simple compared to the more complex chemosensory systems in P. aeruginosa. We did not mean to generalize this simplicity to all signal transduction pathways in E. coli.

      We acknowledge that E. coli chemotaxis is a highly sophisticated system, involving processes such as perfect adaptation, wide dynamic range, robustness, sensitivity, and signal amplification, many aspects of which remain incompletely understood. CheY indeed plays a crucial role in adaptation and motor switching regulation.

      Accordingly, we have revised the original text (lines 249-255) to avoid any misunderstanding.

      More perplexing is the novelty of the authors' documentation of the effect of the chemotaxis proteins on the c-di-GMP level. In 2013, Kulasekara et al. published a paper in eLife entitled "c-di-GMP heterogeneity is generated by the chemotaxis machinery to regulate flagellar motility". In the same year, Kulasekara published a paper entitled "Insight into a Mechanism Generating Cyclic di-GMP Heterogeneity in Pseudomonas aeruginosa". The authors did not cite these works and I wonder why.

      We apologize for having been unaware of these important references and thank the reviewer for bringing them to our attention. We have now cited the eLife paper and the PhD thesis titled "Insight into a Mechanism Generating Cyclic di-GMP Heterogeneity in Pseudomonas aeruginosa" by Kulasekara et al.

      Regarding novelty, there are key differences between our findings and those reported by Kulasekara et al. While they proposed that CheA influences c-di-GMP heterogeneity through interaction with a specific phosphodiesterase (PDE), our results demonstrate that overexpression of CheY leads to an increase in intracellular c-di-GMP levels.

      We have revised the original text accordingly (lines 358-362) to clarify these distinctions.

      (6) Throughout the manuscript, the authors refer to foci of fluorescent CheY as "chemoreceptor arrays". If anything, these foci signify the chemotaxis complex, not the membrane-traversing chemoreceptors.

      We thank the reviewer for this clarification. We have revised the manuscript accordingly to refer to the fluorescent CheY foci as representing the chemotaxis complex rather than the chemoreceptor arrays.

      Conclusions:

      The manuscript addresses an interesting subject and contains interesting, but incomplete, data.

      Reviewer #2 (Public Review):

      Summary:

      Here, the authors studied the molecular mechanisms by which the chemoreceptor cluster and flagellar motor of Pseudomonas aeruginosa (PA) are spatially organized in the cell. They argue that FlhF is involved in localizing the receptors and motor to the cell pole, and that even without FlhF, the two are colocalized. FlhF is known to cause the motor to localize to the pole in a different bacterial species, Vibrio cholerae, but it is not involved in receptor localization in that bacterium. Finally, the authors argue that the functional reason for this colocalization is to insulate chemotactic signaling from other signaling pathways, such as cyclic-di-GMP signaling.

      Strengths:

      The experiments and data look to be high-quality.

      Weaknesses:

      However, the interpretations and conclusions drawn from the experimental observations are not fully justified in my opinion.

      I see two main issues with the evidence provided for the authors' claims.

      (1) Assumptions about receptor localization:

      The authors rely on YFP-tagged CheY to identify the location of the receptor cluster, but CheY is a diffusible cytoplasmic protein. In E. coli, CheY has been shown to localize at the receptor cluster, but the evidence for this in PA is less strong. The authors refer to a paper by Guvener et al. 2006, which showed that CheY localizes to a cell pole, and that CheA (a receptor cluster protein) also localizes to a pole, but my understanding is that colocalization of CheY and CheA was not shown. My concern is that CheY could instead localize to the motor in PA, say by binding FliM. This "null model" would explain the authors' observations, without colocalization of the receptors and motor. Verifying that CheY and CheA are colocalized in PA would be a very helpful experiment to address this weakness.

      We thank the reviewer for this valuable suggestion. We agree that verifying the colocalization of CheY and CheA would strengthen our conclusions. To address this, we constructed a plasmid expressing CheA-CFP and introduced it into the CheY-EYFP strain by electroporation. Fluorescence imaging revealed a high degree of spatial overlap between CheA-CFP and CheY-EYFP signals, indicating that CheY-EYFP indeed marks the location of the chemoreceptor complex rather than the flagellar motor.

      We have revised the manuscript accordingly (lines 118-123) and included these results in the new Fig. S2.

      (2) Argument for the functional importance of receptor-motor colocalization at the pole:

      The authors argue that colocalization of the receptors and motors at the pole is important because it could keep phosphorylated CheY, CheY-p, restricted to a small region of the cell, preventing crosstalk with other signaling pathways. Their evidence for this is that overexpressing CheY leads to higher intracellular cdG levels and cell aggregation. Say that the receptors and motors are colocalized at the pole. In E. coli, CheY-p rapidly diffuses through the cell. What would prevent this from occurring in PA, even with colocalization?

      We appreciate the reviewer's insightful question. The colocalization of both the signaling source (the kinase) and sink (the phosphatase) at the chemoreceptor complex at the cell pole results in a rapid decay of CheY-P concentration within approximately 0.2 µm from the cell pole, leading to a nearly uniform distribution elsewhere in the cell, as demonstrated by Vaknin and Berg (ref. 46). This spatial arrangement effectively confines high CheY-P levels to the pole region. When the motor is also localized at the cell pole, this reduces the need for elevated CheY-P concentrations throughout the cytoplasm, thereby minimizing potential crosstalk with other signaling pathways.

      We have revised the manuscript accordingly (lines 280-286) to clarify this point.

      Elevating CheY concentration may increase the concentration of CheY-p in the cell, but might also stress the cells in other unexpected ways. It is not so clear from this experiment that elevated CheY-p throughout the cell is the reason that they aggregate, or that this outcome is avoided by colocalizing the receptors and motor at the same pole. If localization of the receptor array and motor at one pole were important for keeping CheY-p levels low at the opposite pole, then we should expect cells in which the receptors and motor are not at the pole to have higher CheY-p at the opposite pole. According to the authors' argument, it seems like this should cause elevated cdG levels and aggregation in the delta flhF mutants with wild-type levels of CheY. But it does not look like this happened. Instead of varying CheY expression, the authors could test their hypothesis that receptor-motor colocalization at the pole is important for preventing crosstalk by measuring cdG levels in the flhF mutant, in which the motor (and maybe the receptor cluster) are no longer localized in the cell pole.

      We thank the reviewer for raising the important point regarding potential cellular stress caused by elevated CheY concentrations, as well as for the suggestion to test the hypothesis using ΔflhF mutants.

      First, as noted above, CheY-P concentration rapidly decreases away from the receptor complex. While deletion of flhF alters the position of the receptor complex, thereby shifting the region of high CheY-P concentration, it does not increase CheY-P levels elsewhere in the cell. Importantly, in the ΔflhF strain, the receptor complex and the motor still colocalize, so this mutant may not effectively test the role of receptor-motor colocalization in preventing crosstalk as suggested.

      Regarding the possibility that elevated CheY levels stress the cells independently of CheY-P signaling, prior work in E. coli by Cluzel et al. (ref. 11) showed that overexpressing CheY several-fold did not cause phenotypic changes, indicating that simple CheY overexpression alone may not be generally stressful. Furthermore, our data indicate that the increase in c-di-GMP levels and subsequent cell aggregation upon CheY overexpression is not an all-or-none switch but occurs progressively as CheY concentration rises.

      To further confirm that CheY overexpression promotes aggregation through increased c-di-GMP levels, we performed additional experiments co-overexpressing CheY and a phosphodiesterase (PDE) from E. coli to reduce intracellular c-di-GMP. These experiments showed that PDE expression mitigates cell aggregation caused by CheY overexpression (Fig. S8).

      We have revised the manuscript accordingly (lines 290-294) and added these new results in Fig. S8.

      Reviewer #3 (Public Review):

      Summary:

      The authors investigated the assembly and polar localization of the chemosensory cluster in P. aeruginosa. They discovered that a certain protein (FlhF) is required for the polar localization of the chemosensory cluster while a fully-assembled motor is necessary for the assembly of the cluster. They found that flagella and chemosensory clusters always co-localize in the cell; either at the cell pole in wild-type cells or randomly-located in the cell in FlhF mutant cells. They hypothesize that this co-localization is required to keep the level of another protein (CheY-P), which controls motor switching, at low levels as the presence of high levels of this protein (if the flagella and chemosensory clusters were not co-localized) is associated with high-levels of c-di-GMP and cell aggregations.

      Strengths:

      The manuscript is clearly written and straightforward. The authors applied multiple techniques to study the bacterial motility system including fluorescence light microscopy and gene editing. In general, the work enhances our understanding of the subtlety of interaction between the chemosensory cluster and the flagellar motor to regulate cell motility.

      Weaknesses:

      The major weakness in this paper is that the authors never discussed how the flagellar gene expression is controlled in P. aeruginosa. For example, in E. coli there is a transcriptional hierarchy for the flagellar genes (early, middle, and late genes, see Chilcott and Hughes, 2000). Similarly, Campylobacter and Helicobacter have a different regulatory cascade for their flagellar genes (See Lertsethtakarn, Ottemann, and Hendrixson, 2011). How does the expression of flagellar genes in P. aeruginosa compare to other species? How many classes are there for these genes? Is there a hierarchy in their expression and how does this affect the results of the FliF and FliG mutants? In other words, if FliF and FliG are in class I (as in E. coli) then their absence might affect the expression of other later flagellar genes in subsequent classes (i.e., chemosensory genes). Also, in both FliF and FliG mutants no assembly intermediates of the flagellar motor are present in the cell as FliG is required for the assembly of FliF (see Hiroyuki Terashima et al. 2020, Kaplan et al. 2019, Kaplan et al. 2022). It could be argued that when the motor is not assembled then this will affect the expression of the other genes (e.g., those of the chemosensory cluster) which might play a role in the decreased level of chemosensory clusters the authors find in these mutants.

      We thank the reviewer for the insightful comments. P. aeruginosa possesses a four-tiered transcriptional regulatory hierarchy controlling flagellar biogenesis. Within this system, fliF and fliG belong to class II genes and are regulated by the master regulator FleQ. In contrast, chemotaxis-related genes such as cheA and cheW are regulated by intracellular free FliA, and currently, there is no evidence that FliA activity is influenced by proteins like FliG.

      To verify that the expression of core chemotaxis proteins was not affected by deletion of fliF or fliG, we performed Western blot analyses to compare CheY levels in wild-type, ΔfliF, and ΔfliG strains. We observed no significant differences, indicating that the reduced presence of receptor clusters in these mutants is unlikely to be due to altered expression of chemotaxis proteins.

      Accordingly, we have revised the manuscript (lines 341-348) and updated Fig. 3B to reflect these findings.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      The reviewers comment on several important aspects that should be addressed, namely: the lack of statistical analysis; the need to clarify the assumptions made about receptor localization; the functional importance of receptor-motor colocalization; and the need for an elaborate discussion of flagellar gene expression. Also, two reviewers pointed out the need to prove the co-localization of CheY and CheA. This is important since CheY is dynamic, shuttling back and forth from the chemotaxis complex to the base of the flagella, whereas CheA (or CheW or, even better, the receptors) is considered less dynamic and an integral part of the chemotaxis complex.

      Reviewer #1 (Recommendations For The Authors):

      Minor points:

      Line 43: "ubiquitous" - I would choose another word.

      We changed "ubiquitous" to "widespread".

      Line 49: "order" - change to organize.

      We changed "order" to "organize".

      Line 52: "To grow and colonize within the host, bacteria have evolved a mechanism for migrating...". Motility "towards more favorable environments" is an important survival strategy of bacteria in various ecological niches, not only within the host.

      We revised it to "grow and colonize in various ecological niches".

      Line 72: Define F6 in "F6 pathway-related receptors".

      The proteins encoded by chemotaxis-related genes collectively constitute the F6 pathway, which we have now explained in the manuscript text.

      Line 72-73: Do references 17 &18 really report colocalization of the chemotaxis receptor and flagella to the same pole? If these or other reports document such colocalization, then the sentence in the Abstract "Surprisingly, we found that both are located at the same cell pole..." is not correct.

      Kazunobu et al. (ref. 18) used scanning electron microscopy to preliminarily characterize the flagellation pattern of Pseudomonas aeruginosa during cell division, showing that existing flagella are located at the old pole. Zehra et al. (ref. 17), through fluorescence microscopy, observed that CheA and CheY proteins in dividing cells are typically also present at the old pole. Based on these observations, we inferred in the Introduction that the chemotaxis complex and flagellum may localize to the same cell pole.

      However, this inference is indirect and lacks direct live-cell evidence of colocalization, leaving its validity to be confirmed. This uncertainty was indeed the starting point and motivation for our study.

      In our work, we simultaneously visualized flagellar filaments and core chemoreceptor proteins at the single-cell level in P. aeruginosa. We characterized the assembly and spatial coordination of the chemotaxis network and flagellar motor throughout the cell cycle, providing direct evidence of their colocalization and coordinated assembly. This represents a significant advance beyond prior indirect observations and supports the novelty of our study.

      Accordingly, we have revised the relevant statements in lines 71-75 of the manuscript to better reflect the current state of the literature and emphasize the novelty of our direct observations.

      Line 108: "CheY has been shown to colocalize with chemoreceptors". The authors rely here (reference 29) and in other places on findings in E. coli. However, in the Introduction, they describe the many differences between the motility systems of P. aeruginosa and E. coli, e.g., the number of chemosensory systems and their spatial distribution (E. coli is a peritrichous bacterium, as opposed to the monotrichous bacterium P. aeruginosa). There seem to be proofs for colocalization of the Che and MCP proteins in P. aeruginosa, which should be cited here.

      Thank you for pointing this out. Harwood's group reported that a cheY-YFP fusion strain exhibited bright fluorescent spots at the cell pole, which disappeared upon knockout of cheA or cheW (genes encoding structural proteins of the chemotaxis complex). This strongly suggests colocalization of CheY with MCP proteins in P. aeruginosa. We have now cited this study as reference 17 in the manuscript.

      Figure 1B: Please replace the order of the schematic presentations, so that the cheY-egfp fusion, which is described first in the text, is at the top.

      We have modified the order of related images in Fig. 1B.

      Line 127: "by introducing cysteine mutations". Replace either by "by introducing cysteines" or by "by substituting several residues with cysteines".

      We changed the relevant statement to "by introducing cysteines".

      Line 144-145: "Given that the physiological and physical environments of both cell poles are nearly identical.". I think that also the physical, but certainly the physiological environment of the two poles is not identical. First, one is an old pole, and the other a new pole. Second, many proteins and RNAs were detected mainly or only in one of the poles of rod-shaped Gram-negative bacteria that are regarded as symmetrically dividing. Although my intuition is that the authors are correct in assuming that "it is unlikely that the unipolar distribution of the chemoreceptor array can be attributed to passive regulatory factors", relating it to the (false) identity between the poles is incorrect.

      We thank the reviewer for this important correction. We agree that the physiological environments of the two poles are not identical, given that one is the old pole and the other the new pole, and that many proteins and RNAs show polar localization in rod-shaped Gram-negative bacteria. Accordingly, we have revised the original text (lines 150-152) to read:

      “Despite potential differences in the physical and especially physiological environments at the two cell poles, it is unlikely that the unipolar distribution of the chemotaxis complex can be attributed to passive regulatory factors.”

      Lines 151-154: "Considering the consistent colocalization pattern between chemosensory arrays and flagellar motors in P. aeruginosa". Does the word consistent relate to different reports on such colocalization or to the results in Figure 1D? In case it is the latter, then what is the word consistent based on? All together only 7 cells are presented in the 5 micrographs that compose Figure 1D (back to statistics...).

      We thank the reviewer for raising this point. To clarify, the word "consistent" refers to the observation of colocalization shown in Figure 1D & Figure S3. As noted in the revised figure legend for Figure 1D, a total of 145 cells with labeled flagella were analyzed, all exhibiting consistent colocalization between flagella and chemosensory arrays. Additionally, we have included a new image showing a large field of co-localization in the wild-type strain as Figure S3 to better illustrate this consistency.

      Figure 2A: Omit "Subcellular localization of" from the beginning of the caption.

      We removed the relevant expression from the caption.

      Reviewer #2 (Recommendations For The Authors):

      I strongly recommend checking that CheY localizes to the receptor cluster in PA. This could be done by tagging cheA with a different fluorophore and demonstrating their colocalization. It would also be helpful to check that they are colocalized in the delta flhF mutant.

      We thank the reviewer for this valuable suggestion. We constructed a plasmid expressing CheA-CFP and introduced it into the CheY-EYFP strain by electroporation. Fluorescence imaging revealed a high degree of spatial overlap between CheA-CFP and CheY-EYFP signals, indicating that CheY-EYFP indeed marks the location of the chemoreceptor complex.

      We have revised the manuscript accordingly (lines 118-123) and included these results in the new Fig. S2.

      The experiments under- and over-expressing CheY part seemed too unrelated to receptor-motor colocalization. I think the authors should think about a more direct way of testing whether colocalization of the motor and receptors is important for preventing signaling crosstalk. One way would be to measure cdG levels in WT and in delta flhF mutants and see if there is a significant difference.

      We thank the reviewer for raising the important point regarding potential cellular stress caused by elevated CheY concentrations, as well as for the suggestion to test the hypothesis using flhF mutants.

      First, as noted in the response to your 2nd comment in Public Review, CheY-P concentration rapidly decreases away from the receptor complex. While deletion of flhF alters the position of the receptor complex, thereby shifting the region of high CheY-P concentration, it does not increase CheY-P levels elsewhere in the cell. Importantly, in the ΔflhF strain, the receptor complex and the motor still colocalize, so this mutant may not effectively test the role of receptor-motor colocalization in preventing crosstalk as suggested.

      Regarding the possibility that elevated CheY levels stress the cells independently of CheY-P signaling, prior work in E. coli by Cluzel et al. (ref. 11) showed that overexpressing CheY several-fold did not cause phenotypic changes, indicating that simple CheY overexpression alone may not be generally stressful. Furthermore, our data indicate that the increase in c-di-GMP levels and subsequent cell aggregation upon CheY overexpression is not an all-or-none switch but occurs progressively as CheY concentration rises.

      To further confirm that CheY overexpression promotes aggregation through increased c-di-GMP levels, we performed additional experiments co-overexpressing CheY and a phosphodiesterase (PDE) from E. coli to reduce intracellular c-di-GMP. These experiments showed that PDE expression mitigates cell aggregation caused by CheY overexpression (Fig. S8).

      We have revised the manuscript accordingly (lines 290-294) and added these new results in Fig. S8.

      Reviewer #3 (Recommendations For The Authors):

      (1) Can the authors elaborate more on the hierarchy of flagellar gene expression in P. aeruginosa and how this relates to their work?

      We thank the reviewer for the suggestion. We have now described the hierarchy of flagellar gene expression in P. aeruginosa in lines 341-348.

      (2) I would suggest that the authors check other flagellar mutants (than FliF and FliG) where the motor is partially assembled (e.g., any of the rod proteins or the P-ring protein), together with FlhF mutant, to see how a partially assembled motor affects the assembly of the chemosensory cluster.

      We thank the reviewer for this valuable suggestion. The P ring, primarily composed of FlgI, acts as a bushing for the peptidoglycan layer, and its absence leads to partial motor assembly. We constructed a ΔflgI mutant and observed that the proportion of cells exhibiting distinct chemotactic complexes was similar to that of the wild-type strain, suggesting that the assembly of the receptor complex is likely influenced mainly by the C-ring and MS-ring structures rather than by the P ring. We have revised the original text accordingly (lines 217-220) and added the corresponding data as Figure S6.

      (3) I would suggest that the authors check the levels of CheY in cells induced with different concentrations of arabinose (i.e., using western blotting just like they did in Figure 3B).

      We have assessed the levels of CheY in cells induced with different concentrations of arabinose using western blotting, as suggested. The results have been incorporated into the manuscript (lines 274-275) and are presented in Figure S7.

      (4) To my eyes, most of the foci in FliF-FlhF mutant in Figure 3A are located at the pole (which is unlike the FlhF mutant in Figure 2). Is this correct? I would suggest that the authors also investigate this to see where the chemosensory cluster is located.

      We thank the reviewer for pointing this out. The distribution of the chemotaxis complex in the ΔflhFΔfliF strain was investigated and is shown in Fig. S4. Indeed, most of the chemoreceptor foci in this mutant are located at the pole. This suggests that, in the absence of both FlhF and an assembled motor, the position of the receptor complex may be largely influenced by passive factors such as membrane curvature. This interesting possibility warrants further investigation in future studies.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      In this work, the authors recorded the dynamics of 5-HT with fiber photometry from CA1 in one hemisphere and LFP from CA1 in the other hemisphere. They observed an ultra-slow oscillation in the 5-HT signal during both wakefulness and NREM sleep. The authors have studied different phases of the ultra-slow oscillation to examine potential differences in the occurrence of some behavioral state-related physiological phenomena (hippocampal ripples, EMG, and inter-area coherence).

      Strengths

      The relation between the falling/rising phase of the ultra-slow oscillation and the ripples is sufficiently shown. There are some minor concerns about the observed relations that should be addressed with some further analysis.

      Systematic observations have started to establish a strong relation between the dynamics of neural activity across the brain and measures of behavioral arousal. Such relations span a wide range of temporal scales that are heavily inter-related. Ultra-slow time-scales are specifically under-studied due to technical limitations and neuromodulatory systems are the strongest mechanistic candidates for controlling/modulating the neural dynamics at these time-scales. The hypothesis of the relation between a specific time-scale and one certain neuromodulator (5-HT in this manuscript) could have a significant impact on the understanding of the hierarchy in the temporal scales of neural activity.

      Weaknesses:

      One major caveat of the study is that different neuromodulators are strongly correlated across all time scales. Related to this, the authors need to discuss this point further and provide more evidence from the literature (if any) suggesting that similar ultra-slow oscillations are weaker in, or absent from, similar signals recorded for other neuromodulators such as ACh and NA.

      The reviewer is correct to point out that the levels of different neuromodulators are often correlated. For example, most monoaminergic neurons, including serotonergic neurons of the raphe nuclei, show similar state-dependent firing patterns, firing most during wake behavior, less during NREM, and ceasing firing during ‘paradoxical sleep’ or REM (Eban-Rothschild et al. 2018). Notably, other neuromodulators, such as acetylcholine (ACh), show the opposite pattern across states, with the highest levels observed during REM, an intermediate level during wake behavior, and the lowest level during NREM (Vazquez et al. 2001). Despite these differences, ultraslow oscillations of both monoaminergic and non-monoaminergic neuromodulators have been described, albeit only during NREM sleep (Zhang et al. 2021, Zhang et al. 2024, Osorio-Forero et al. 2021, Kjaerby et al. 2022). How ultraslow oscillations of different neuromodulators are related has only recently been explored (Zhang et al. 2024). In that study, dual recording of oxytocin (Oxt) and ACh with GRAB sensors showed that the levels of the two neuromodulators were indeed correlated at ultraslow frequencies with a 2 s temporal shift. Furthermore, this shift could be explained by a hippocampal-to-lateral septum intermediate pathway, in which the level of ACh causally impacts hippocampal activity, which then in turn controls Oxt levels. Given the known temporal relationship between ripples, ACh and Oxt, and now with our work, between ripples and 5-HT, one could infer the relative timing of ultraslow oscillations of ACh, Oxt and 5-HT. While dual recordings of norepinephrine (NE) and 5-HT have not been performed, a similar correlation with a temporal shift could be hypothesized given the parallel relationships between NE and spindles (Osorio-Forero et al. 2021) and between 5-HT and ripples, together with the known temporal delay between ripples and spindles (Staresina et al. 2023). The fact that the locus coeruleus receives particularly dense projections from the dorsal raphe nucleus (Kim et al. 2004) further suggests that 5-HT ultraslow oscillations could drive NE oscillations. How exactly ultraslow oscillations of serotonin are related to ultraslow oscillations of different neuromodulators in different brain regions remains to be studied.

      We have further addressed this question and how it relates to the issue of causality in the Discussion section of the manuscript (p. 13):

      “In addition to the difficulties involved with typical causal interventions already mentioned, the fact that the levels of different neuromodulators are interrelated and affected by ongoing brain activity makes it very hard to pinpoint ultraslow oscillations of one specific neuromodulator as controlling specific activity patterns, such as ripple timing. While a recent paper purported to show a causative effect of norepinephrine levels on ultraslow oscillations of sigma band power, the fact that optogenetic inhibition of locus coeruleus (LC) cells, but also excitation, only caused a minor reduction of the ultraslow sigma power oscillation suggests that other factors also contribute (Osorio-Forero et al., 2021). Generally, it is thought that many neuromodulators together determine brain states in a combinatorial manner, and it is probable that the 5-HT oscillations we measure, like the similar oscillations in NE, are one factor among many.

      Nevertheless, given the known effects of 5-HT on neurons, it is not unlikely that the 5-HT fluctuations we describe have some impact on the timing of ripples, MAs, hippocampal-cortical coherence, or EMG signals that correlate with either the rising or descending phase. In fact, causal effects of 5-HT on ripple incidence (Wang et al. 2015, ul Haq et al. 2016 and Shiozaki et al. 2023), MA frequency (Thomas et al. 2022), sensory gating (Lee et al. 2020), which is subserved by inter-areal coherence (Fisher et al. 2020), and movement (Takahashi et al. 2000, Alvarez et al. 2022, Jacobs et al. 1991 and Luchetti et al. 2020) have all been shown. Our added findings that serotonin affects ripple incidence in hippocampal slices in a dose-dependent manner (Figure S1) further suggests that the relationship between ultraslow 5-HT oscillations and ripples we report may indeed result, at least in part, from a direct effect of serotonin on the hippocampal network.

      Whether these ‘causal’ relationships between 5-HT and the different activity measures we describe can be used to support a causal link between ultraslow 5-HT oscillations and the correlated activity we report remains an open question. To that point, some studies have described changes in ultraslow oscillations due to manipulation of serotonin signaling. Specifically, reduction of 5-HT1a receptors in the dentate gyrus was recently shown to reduce the power of ultraslow oscillations of calcium activity in the same region (Turi et al. 2024). Furthermore, psilocin, which largely acts on the 5-HT2a receptor, decreased NREM episode length from around 100 s to around 60 s, and increased the frequency of brief awakenings (Thomas et al. 2022). While ultraslow oscillations were not explicitly measured in this study, the change in the rhythmic pattern of NREM sleep episodes and brief awakenings, or microarousals, suggests an effect of psilocin on ultraslow oscillations during NREM. Although these studies do not necessarily point to an exclusive role for 5-HT in controlling ultraslow oscillations of different brain activity patterns, they show that changes in 5-HT can contribute to changes in brain activity at ultraslow frequencies.”

      A major question that has been left out of the study and discussion is how the same level of serotonin before and after the peak could be differentially related to opposite phenomena. What are the possible parallel mechanisms for distinguishing between the rising and falling phases? Is there any neurophysiological evidence for sensing the direction of change in serotonin concentration (or any other neuromodulator), and is there any physiological functionality for such mechanisms?

      We have added a paragraph in the discussion to address how this differentiation of the 5-HT signal may be carried out (Discussion, paragraph #3, p. 10):

      “In order for the ultraslow oscillation phase to segregate brain activity, as we have observed, the hippocampal network must somehow be able to sense the direction of change of serotonin levels. While single-cell mechanisms related to membrane potential dynamics are typically too fast to explain this calculation, theoretical work has suggested that feedback circuits can enable such temporal differentiation, also on the slower timescales we observe (Tripp and Eliasmith, 2010). Beyond the direction of change in serotonin levels, temporal differentiation could also enable the hippocampal network to discern the steeper rising slope versus the flatter descending slope that we observe in the ultraslow 5-HT oscillations (Figure S2), which may also be functionally relevant (Cole and Voytek, 2017). The distinction between the rising and falling phase of ultraslow oscillations is furthermore clearly discernible at the level of unit responses, with many units showing preferences for either half of the ultraslow period (Figure S6). Another factor that could help distinguish the rising from the falling phase is the level of other neuromodulators, as it is likely the combination of many neuromodulators at any given time that defines a behavioral substate. Given the finding that ACh and Oxt exhibit ultraslow oscillations with a temporal shift (Zhang et al. 2024), one could posit that distinct combinations of different levels of neuromodulators could segregate the rising from the falling phase via differential effects of the combination of neuromodulators on the hippocampal network.”
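
      As a rough illustration of how a circuit could report the direction of change of a slowly varying signal, the sketch below subtracts a slowly filtered copy of the input from a faster one. This feedforward dual-time-constant scheme is a simplified stand-in assumed for illustration, not the feedback circuit analyzed by Tripp and Eliasmith (2010); the time constants and the sinusoidal test input are hypothetical.

```python
import math


def direction_signal(x, dt, tau_fast=0.5, tau_slow=2.0):
    """Difference of two leaky integrators as a crude derivative estimate.

    The output tends to be positive while the input is rising and negative
    while it is falling, with some phase lag. Time constants (s) are hypothetical.
    """
    y_fast = y_slow = x[0]
    out = []
    for v in x:
        y_fast += dt * (v - y_fast) / tau_fast
        y_slow += dt * (v - y_slow) / tau_slow
        out.append(y_fast - y_slow)
    return out


# Hypothetical ultraslow input: 0.03 Hz sinusoid sampled at 100 Hz for 300 s.
dt, freq = 0.01, 0.03
t = [i * dt for i in range(int(300 / dt))]
signal = [math.sin(2 * math.pi * freq * ti) for ti in t]
rising = [math.cos(2 * math.pi * freq * ti) > 0 for ti in t]  # true direction of change

d = direction_signal(signal, dt)
start = int(60 / dt)  # skip the initial filter transient
agreement = sum((d[i] > 0) == rising[i] for i in range(start, len(t))) / (len(t) - start)
print(f"sign agreement with the true direction: {agreement:.2f}")
```

      Because both filters lag the input, the sign of the difference flips somewhat after each true peak or trough, so with these hypothetical values it matches the true direction of change for most, but not all, of each cycle.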

      Functionally, the ability to distinguish between the rising and falling phases of an oscillatory cycle is a form of phase coding. A well-known example of this can be seen in hippocampal place cells, which fire relative to the ongoing theta oscillations. The key advantage of phase coding is that it introduces an additional dimension, i.e. phase of firing, beyond the simple rate of neural firing. This allows for the multiplexing of information (Panzeri et al., 2010), enabling the brain to encode more complex patterns of activity. Moreover, phase coding is metabolically more efficient than traditional spike-rate coding (Fries et al., 2007).

      Reviewer #2 (Public review):

      Summary:

      In their study, Cooper et al. investigated the spontaneous fluctuations in extracellular 5-HT release in the CA1 region of the hippocampus using GRAB5-HT3.0. Their findings revealed the presence of ultralow frequency (less than 0.05 Hz) oscillations in 5-HT levels during both NREM sleep and wakefulness. The phase of these 5-HT oscillations was found to be related to the timing of hippocampal ripples, microarousals, electromyogram (EMG) activity, and hippocampal-cortical coherence. In particular, ripples were observed to occur with greater frequency during the descending phase of 5-HT oscillations, and stronger ripples were noted to occur in proximity to the 5-HT peak during NREM. Microarousal and EMG peaks occurred with greater frequency during the ascending phase of 5-HT oscillations. Additionally, the strongest coherence between the hippocampus and cortex was observed during the ascending phase of 5-HT oscillations. These patterns were observed in both NREM sleep and the awake state, with a greater prevalence in NREM. The authors posit that 5-HT oscillations may temporally segregate internal processing (e.g., memory consolidation) and responsiveness to external stimuli in the brain.

      Strengths:

      The findings of this research are novel and intriguing. Slow brain oscillations lasting tens of seconds have been suggested to exist, but to my knowledge they have never been analyzed in such a clear way. Furthermore, although it is likely that ultra-slow neuromodulator oscillations exist, this is the first report of such oscillations, and the greatest strength of this study is that it has clarified this phenomenon both statistically and phenomenologically.

      Weaknesses:

      As with any paper, this one has some limitations. While there is no particular need to pursue them, I will describe ten of them below, including future directions:

      (1) Contralateral recordings: 5-HT levels and electrophysiological recordings were obtained from opposite hemispheres due to technical limitations. Ipsilateral simultaneous recordings may show more direct relationships.

      Although we argue that bilateral symmetry defines both the serotonin system and many hippocampal activity patterns (Methods: Dual fiber photometry and silicon probe recordings), we agree that ipsilateral recordings would be superior to describe the link between serotonin and electrophysiology in the hippocampus. In addition to noting that a recent study has adopted the same contralateral design (Zhang et al. 2024), we add a reference further supporting bilateral hippocampal synchrony, specifically of dentate spikes (Farrell et al. 2024). However, as functional lateralization has been recently proposed to underlie certain hippocampal functions in the rodent (Jordan 2020), future studies should ideally include both imaging and electrophysiology in a single hemisphere to guarantee local correlations rather than assuming inter-hemispheric synchrony. This could be accomplished using an integrated probe with attached optical fibers, as described in Markowitz et al. 2018, which is however technically more challenging and has, to our knowledge, not yet been implemented with fiber photometry recordings with GRAB sensors. Given the required separation of a few hundred micrometers between the probe shanks and the optical fiber cannula, it is important to consider whether the recordings are capturing the same neuronal populations. For example, there is a risk of recording electrical activity from dorsal hippocampal neurons while simultaneously measuring light signals from neurons in the intermediate hippocampus, which are functionally distinct populations (Fanselow and Dong 2009).

      (2) Sample size: The number of mice used in the experiments is relatively small (n=6). Validation with a larger sample size would be desirable.

      While larger sample sizes generally reduce the influence of random variability and minimize the impact of outliers on conclusions, our use of mixed-effects models mitigates these concerns by accounting for both inter-session and inter-mouse variability. With this approach, we explicitly model random effects, such as the variability between individual mice and sessions, alongside fixed effects (such as treatment), which helps ensure that our results are not driven by random fluctuations in a few individual mice or sessions. Furthermore, the inclusion of random intercepts and slopes in the models allows for the possibility that different animals and/or sessions have different baseline characteristics and respond to the treatment with different magnitudes. In summary, while validating these findings with a larger sample size would certainly help detect more subtle effects, we are confident in the robustness of the conclusions presented.

      (3) Lack of causality: The observed associations show correlations, not direct causal relationships, between 5-HT oscillations and neural activity patterns.

      We agree that the data we present in this study is largely correlational and generally avoid claims of causality in the manuscript. In the Discussion section, we discuss barriers to interpreting typical causal interventions in vivo, such as optogenetic activation of raphe nuclei: “The two previously mentioned in vivo studies showing reduced ripple incidence…”(paragraph #10, pg. 12), as well as an added section on further causality considerations in the Discussion section of the manuscript (paragraph #12, pg. 13): “In addition to the difficulties involved with…”

      Due to these barriers, as a first step, we wanted to describe how physiological changes in serotonin levels are correlated to changes in the hippocampal activity. Equipped with a deeper understanding of physiological serotonin dynamics, future studies could explore interventions that modulate serotonin in keeping with the natural range of serotonin fluctuations for a given state. On that point, another challenge which we have not mentioned in the manuscript is that modulating serotonin, or any neuromodulator’s levels, has the potential, depending on the degree of modulation, to transition the brain to an entirely different behavioral state. This then complicates interpretation, as one is not sure whether effects observed are due to the changes in the neuromodulator itself, or secondary to changes in state. At the same time, 5-HT activity drives networks which in turn can change the release of other neurotransmitters, leading to indirect effects.

      The results of our in vitro experiments suggest that a causal relationship between serotonin and ripples is possible (Figure S1). Though the hippocampal slice preparation is clearly an artificial model, it provides a controlled environment to isolate the effects of serotonin manipulation on the hippocampal formation, without the confounding influence of systemic 5-HT fluctuations in other brain regions. Notably, the dose-dependent effects of serotonin (5-HT) wash-in on ripple incidence observed in vitro closely mirror the inverted-U dose-response curve seen in our in vivo experiments across states, where small increases in serotonin lead to the highest ripple incidence, and both lower and higher levels correspond to reduced ripple activity. This parallel suggests that the gradual wash-in of serotonin in our in vitro system may mimic the tonic firing changes in serotonergic neurons that occur during state transitions in vivo. These findings underscore the importance of studying how different dynamics of serotonin modulation can differentially affect hippocampal network activity.

      (4) Limited behavioral states: The study focuses primarily on sleep and quiet wakefulness. Investigation of 5-HT oscillations during a wider range of behavioral states (e.g., exploratory behavior, learning tasks) may provide a more complete understanding.

      We agree that future studies should investigate a broader range of behavioral states. For this study, as we were focused on general sleep and wake patterns, our recordings were done in the home cage, and we limited ourselves to the basic behavioral states described in the paper. Future studies should be designed to investigate ultraslow 5-HT oscillations during different behaviors, such as continuous treadmill running. Specifically, a finer segregation of extended wake behaviors by level of arousal could greatly add to our understanding of the role of ultraslow serotonin oscillations.

      (5) Generalizability to other brain regions: The study focuses on the CA1 region of the hippocampus. It's unclear whether similar 5-HT oscillation patterns exist in other brain regions.

      Given the reported ultraslow oscillations of population activity in serotonergic neurons of the dorsal raphe nucleus (Kato et al. 2022) as well as the widespread projections of the serotonergic nuclei, we would expect a broad expression of ultraslow 5-HT oscillations throughout the brain. So far, ultraslow 5-HT oscillations have been described in the basal forebrain, as well as in the dentate gyrus, in addition to what we have shown in CA1 (Deng et al. 2024 and Turi et al. 2024). Furthermore, our results showing that hippocampal-cortical coherence changes according to the phase of hippocampal ultraslow 5-HT oscillations suggest that 5-HT can affect oscillatory activity either indirectly by modulating hippocampal cells projecting to the cortical network or directly by modulating the cortical postsynaptic targets. Given the heterogeneity in projection strength, as well as in pre- and postsynaptic serotonin receptor densities across brain regions (de Filippo & Schmitz, 2024), it would be interesting to see whether local ultraslow 5-HT oscillations are differentially modulated, e.g. in terms of oscillation power. Future studies investigating different brain regions via implantation of multiple optic fibers in different brain areas or using the mesoscopic imaging approach adopted in Deng et al. 2024, will be needed to examine the extent of spatial heterogeneity in this ultraslow oscillation.

      (6) Long-term effects not assessed: Long-term effects of ultra-low 5-HT oscillations (e.g., on memory consolidation or learning) were not assessed.

      While beyond the scope of our current study, we agree that an important next step would involve modulating the ultraslow serotonin oscillation after learning, and then examining potential effects on memory consolidation, presumably via changes in ripple dynamics, though many possibilities could explain potential effects. Here, our results suggest it would be important to isolate effects due to the change in ultraslow oscillation features, rather than simply overall levels of 5-HT. To that end, it would be important to test different modulation dynamics, specifically modulating the oscillation strength, around a constant mean 5-HT level by carefully timed optogenetic stimulation/inhibition. Afterwards, showing a clear correlation between the strength of the 5-HT modulation and memory performance would be important to establishing the relationship, as done in Lecci et al. 2017, where more prominent ultraslow oscillations of sigma power in the cortex during sleep, alongside a higher density of spindles, were correlated with better memory consolidation. Given the tight coupling of spindles and ripples during sleep, it is possible that a similar effect on memory consolidation would be observed following changes in ultraslow 5-HT oscillation power.

      (7) Possible species differences: It's uncertain whether the findings in mice apply to other mammals, including humans.

      We agree that the experiments should ultimately be replicated in humans. In the 2017 study by Lecci et al., the authors highlighted the shared functional requirements for sleep across species, despite apparent differences, such as variations in sleep volume. To explore these commonalities, the researchers conducted parallel experiments in both mice and humans, aiming to identify a universal organizing structure. They discovered that the ultraslow oscillation of sigma power serves this role, enabling both species to balance the competing demands of arousability and sleep imperviousness. Based on this finding, it is plausible that ultraslow oscillations of serotonin, which similarly modulate activity according to arousal levels, would serve a comparable function in humans.

      (8) Technical limitations: The temporal resolution and sensitivity of the GRAB5-HT3.0 sensor may not capture faster 5-HT dynamics.

      The kinetics of the GRAB5-HT3.0 sensor used in this study limit the range of serotonin dynamics we can observe. However, the ultraslow oscillations we measure reflect temporal changes on the scale of 20 s and longer, whereas the GRAB sensor we use has sub-second activation (on) kinetics and deactivation (off) kinetics below 2 s (Deng et al. 2024). The sensor is therefore capable of reporting much faster activity than the ultraslow oscillations we observe, indicating that the ultraslow 5-HT signal accurately reflects the dynamics on this time scale. Furthermore, ultraslow oscillations in spiking activity, which is not subject to the same temporal smoothing, have been observed in the hippocampal formation (Gonzalo Cogno et al., 2024; Aghajan et al., 2023; Penttonen et al., 1999) and in the dorsal raphe (Mlinar et al., 2016), suggesting that the oscillations we record are unlikely to be due to signal aliasing and instead reflect genuine oscillatory activity. Of course, this does not preclude that other, faster serotonin dynamics are also present in our signal, some of which may be too fast to be observed. For instance, rapid serotonin signaling via the ionotropic 5-HT3a receptors could be missed in our recordings. Additionally, with the fiber photometry approach we adopted, we are limited to capturing spatially broad trends in serotonin levels, potentially overlooking more localized dynamics.
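
      To put the kinetics argument in rough numbers, the sketch below treats the sensor as a linear first-order low-pass filter with the slower (off) time constant of 1.39 s reported for GRAB5-HT3.0 (Deng et al. 2024). This is a simplifying worst-case assumption (the real sensor has different on and off kinetics and need not be linear); the sketch only estimates how strongly such a filter would attenuate and delay an oscillation at a given frequency.

```python
import math


def first_order_response(freq_hz, tau_s):
    """Amplitude gain and time lag (s) of a first-order low-pass filter at one frequency."""
    omega = 2 * math.pi * freq_hz
    gain = 1.0 / math.sqrt(1.0 + (omega * tau_s) ** 2)
    lag_s = math.atan(omega * tau_s) / omega
    return gain, lag_s


TAU_OFF = 1.39  # s; off time constant reported for GRAB5-HT3.0 (Deng et al. 2024)

# Edges of the ultraslow band used here (0.01-0.06 Hz), a mid-band value, and a faster 0.5 Hz component.
for f in (0.01, 0.03, 0.06, 0.5):
    g, lag = first_order_response(f, TAU_OFF)
    print(f"{f:5.2f} Hz: gain {g:.2f}, lag {lag:.2f} s, period {1 / f:.0f} s")
```

      Under this assumption, the gain stays above about 0.88 across the 0.01-0.06 Hz band and the delay stays under 1.4 s, which is small relative to oscillation periods of roughly 17-100 s, whereas a 0.5 Hz component would be strongly attenuated.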

      (9) Interactions with other neuromodulators: The study does not explore interactions with other neuromodulators (e.g., norepinephrine, acetylcholine) or their potential ultraslow oscillations.

      We agree that the interaction between neuromodulators in the context of ultraslow oscillations is an important issue, which we have addressed in our response to reviewer #1 under ‘Weaknesses.’

      (10) Limited exploration of functional significance: While the study suggests a potential role for 5-HT oscillations in memory consolidation and arousal, direct tests of these functional implications are not included.

      We agree and reference our answer to (6) regarding memory consolidation. Regarding arousal, direct tests of arousability to different sensory stimuli during different phases of the ultraslow 5-HT oscillation during sleep would be beneficial, in addition to the indirect measures of arousal we examine in the current study, e.g. degree of movement (icEMG) and long range coherence. In line with what we have shown, Cazettes et al. (2021) have demonstrated a direct relationship between 5-HT levels and pupil size, an indicator of arousal level, which, like our findings, is consistent across behavioral states.

      Reviewer #3 (Public review):

      Summary:

      The activity of serotonin (5-HT) releasing neurons as well as 5-HT levels in brain structures targeted by serotonergic axons are known to fluctuate substantially across the animal's sleep/wake cycle, with high 5-HT levels during wakefulness (WAKE), intermediate levels during non-REM sleep (NREM) and very low levels during REM sleep. Recent studies have shown that during NREM, the activity of 5HT neurons in raphe nuclei oscillates at very low frequencies (0.01 - 0.05 Hz) and this ultraslow oscillation is negatively coupled to broadband EEG power. However, how exactly this 5-HT oscillation affects neural activity in downstream structures is unclear.

      The present study addresses this gap by replicating the observation of the ultraslow oscillation in the 5-HT system, and further observing that hippocampal sharp wave-ripples (SWRs), biomarkers of offline memory processing, occur preferentially in barrages on the falling phase of the 5-HT oscillation during both wakefulness and NREM sleep. In contrast, the rising phase of the 5-HT oscillation is associated with microarousals during NREM and increased muscular activity during WAKE. Finally, the rising 5-HT phase was also found to be associated with increased synchrony between the hippocampus and neocortex. Overall, the study constitutes a valuable contribution to the field by reporting a close association between rising 5-HT and arousal, as well as between falling 5-HT and offline memory processes.

      Strengths:

      The study makes compelling use of the state-of-the-art methodology to address its aims: the genetically encoded 5-HT sensor used in the study is ideal for capturing the ultraslow 5-HT dynamics and the novel detection method for SWRs outperforms current state-of-the-art algorithms and will be useful to many scientists in the field. Explicit validation of both of these methods is a particular strength of this study.

      The analytical methods used in the article are appropriate and are convincingly applied; the use of a general linear mixed model for statistical analysis is a particularly welcome choice as it guards against pseudoreplication while preserving statistical power.

      Overall, the manuscript makes a strong case for distinct sub-states across WAKE and NREM, associated with different phases of the 5-HT oscillation.

      Weaknesses:

      All of the evidence presented in the study is correlational. While the study mostly avoids claims of causality, it would still benefit from establishing whether the 5-HT oscillation has a direct role in the modulation of SWR rate via e.g. optogenetic activation/inactivation of 5-HT axons. As it stands, the possibility that 5-HT levels and SWRs are modulated by the same upstream mechanism cannot be excluded.

      We agree that causality claims cannot be made with our data, and acknowledge the interest in exploring causal interactions between ultraslow serotonin oscillations and the correlated activity we measure. We address this point in depth in our answer to Reviewer #2, Weaknesses #3.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      One major question in the presented data is the nature of the asymmetrical shape of the targeted slow events. How much does it reflect the 5-HT concentration and how much is this shape affected by the dynamics of the designed 5-HT sensor? This needs to be addressed in more detail referencing the original paper for the used sensor.

      We have added a paragraph in the Results section of the manuscript to address the asymmetric waveform of the ultraslow 5-HT oscillations and whether it could be affected by the asymmetric kinetics of the GRAB sensor we use: “The waveform of these ultraslow 5-HT oscillations…” (Results, paragraph #4, pg. 5). We include an extended answer to the question here:

      Indeed, the GRAB5-HT3.0 sensor we use in the study shows activation kinetics that are faster than its deactivation kinetics, with time constants of 0.25 s and 1.39 s, respectively (Deng et al. 2024). Likewise, the rising phase of the ultraslow serotonin oscillation we measure is steeper than the falling phase, and the ratio of time spent in the rising phase versus the falling phase is less than 1, indicating longer falling phases (Figure S2). Although we cannot completely rule out that the asymmetric shape of the ultraslow serotonin oscillations we record is affected by this asymmetry in the 5-HT sensor kinetics, we believe this is unlikely, as the 5-HT signal clearly contains reductions in 5-HT levels that are much faster than the descending phase of the ultraslow oscillation. Although it is difficult to directly compare signals of different sizes, the reported off kinetics, on the order of a few seconds (Deng et al. 2024), are far shorter than the tens-of-seconds timescale of the ultraslow oscillation. Furthermore, the finding that some dorsal raphe neurons modulate their firing rate at ultraslow frequencies, and moreover that all examples of such ultraslow oscillations shown display clear asymmetry between rise and decay times, suggests that the asymmetry we observe in our data could be due to neural activity rather than temporal smoothing by the sensor (Mlinar et al. 2016). In the same direction, another study found similar asymmetry in extracellular 5-HT levels measured with fast-scan cyclic voltammetry (FSCV), a technique with greater temporal resolution (sampling rate of 10 Hz) than GRAB sensors, after single-pulse stimulation (Bunin and Wightman 1998). In that study, 5-HT was shown to be released extrasynaptically, which makes a clearance time longer than the release time intuitive. Finally, the observation that the onsets and offsets of ripple clusters, recorded with a sampling rate of 20 kHz, are precisely aligned with the peaks and troughs of ultraslow serotonin oscillations (Figure 1, H1-2, columns 2-3) suggests that the duration of the falling phase is not artificially distorted by the temporal smoothing of the sensor dynamics.
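
      One way to bound how much rise/fall asymmetry the sensor kinetics alone could produce is to pass a perfectly symmetric input through a model sensor with the reported on and off time constants (0.25 s and 1.39 s; Deng et al. 2024) and measure the resulting rise-to-fall duration ratio. The sketch below assumes a first-order sensor whose time constant switches between these two values depending on whether the input is above or below the current sensor output; this model, the 0.03 Hz sinusoidal input, and the peak/trough detection are illustrative assumptions, not the analysis behind Figure S2.

```python
import math


def simulate_sensor(freq_hz, tau_on, tau_off, duration_s=300.0, dt=0.01):
    """Euler-integrate a toy sensor driven by a symmetric sine wave.

    The sensor relaxes toward the input with tau_on while the input is above
    the sensor output (binding) and with tau_off otherwise (unbinding).
    """
    y = 0.0
    trace = []
    for i in range(int(duration_s / dt)):
        x = math.sin(2 * math.pi * freq_hz * i * dt)
        tau = tau_on if x > y else tau_off
        y += dt * (x - y) / tau
        trace.append(y)
    return trace


def rise_fall_ratio(trace, dt, skip_s=60.0):
    """Mean trough-to-peak duration divided by mean peak-to-trough duration."""
    peaks, troughs = [], []
    for i in range(max(1, int(skip_s / dt)), len(trace) - 1):
        if trace[i - 1] < trace[i] >= trace[i + 1]:
            peaks.append(i * dt)
        elif trace[i - 1] > trace[i] <= trace[i + 1]:
            troughs.append(i * dt)
    rises = [min(p for p in peaks if p > tr) - tr for tr in troughs if any(p > tr for p in peaks)]
    falls = [min(tr for tr in troughs if tr > p) - p for p in peaks if any(tr > p for tr in troughs)]
    return (sum(rises) / len(rises)) / (sum(falls) / len(falls))


dt = 0.01
asym = rise_fall_ratio(simulate_sensor(0.03, tau_on=0.25, tau_off=1.39, dt=dt), dt)
sym = rise_fall_ratio(simulate_sensor(0.03, tau_on=1.39, tau_off=1.39, dt=dt), dt)
print(f"rise/fall ratio, asymmetric kinetics: {asym:.2f}; symmetric kinetics: {sym:.2f}")
```

      In this toy model, a symmetric 0.03 Hz input acquires a modest asymmetry (a ratio somewhat below 1, with rising phases shortened by roughly the difference between the off and on time constants), while equal time constants leave the ratio at 1. Comparing such a sensor-only ratio with the measured ratio in Figure S2 would indicate how much of the observed asymmetry could plausibly be attributed to sensor kinetics.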

      Regardless of the dynamics of the serotonin concentration, it should be noted that the elicited neuronal effect might have different dynamics compared to the 5-HT concentration that need to be more studied: to address this one can either examine the average of the broadband LFP (not high-pass filtered by the amplifier) or the distribution of simultaneously recorded spiking activity around the peak of ultra-slow oscillations.

      We have added Figure S6, showing unit activity relative to the phase of ultraslow serotonin oscillations.

      From this analysis, we uncover three groups of units which are largely preserved across states (Figure S6, E vs. F), albeit with a slight temporal shift rightward from NREM to WAKE (Figure S6, C vs. D). Namely, some units spike preferentially during the rising phase, some during the falling phase, and a third group have no clear phase preference. Unit activity during the falling phase is unsurprising, as it is where ripples largely occur, which themselves are associated with spike bursts. During the rising phase, the unit activity we observe could correspond to firing of the hippocampal subpopulation known to be active during NREM interruption states (Jarosiewicz et al. 2002, Miyawaki et al. 2017). While the units’ phase preference was tested based on the category of rising vs. falling phase, as this division described most variation in the data, a few units in the ‘No preference’ group showed heightened activity near the oscillation peak. However, given the very small number of units with this preference, more unit data is needed to describe this group, ideally with high-density recordings. Overall, most units showed a falling vs. rising phase preference, indicating a phase coding of hippocampal activity by 5-HT ultraslow oscillations.
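
      Because the falling phase is longer than the rising phase (Figure S2), comparing raw spike counts between the two halves of the cycle would bias units toward an apparent falling-phase preference. A minimal sketch of an occupancy-normalized comparison is shown below; the phase convention (peak at 0 rad, falling phase in (0, π)), the preference index, and the 0.2 threshold are assumptions for illustration and stand in for whatever statistical test was used for Figure S6.

```python
import math


def phase_preference(spike_phases, time_rising_s, time_falling_s, threshold=0.2):
    """Classify one unit as rising-, falling- or no-preference.

    spike_phases: ultraslow 5-HT phase (radians, wrapped to (-pi, pi]) at each
    spike, with 0 at the peak. Phases in (0, pi) count as falling; all others
    (including the trough at pi) count as rising.
    time_*_s: total time the oscillation spent in each half-cycle.
    threshold: hypothetical cut-off on the preference index.
    """
    n_fall = sum(1 for p in spike_phases if 0 < p < math.pi)
    n_rise = len(spike_phases) - n_fall
    rate_fall = n_fall / time_falling_s  # spikes per second of falling phase
    rate_rise = n_rise / time_rising_s   # spikes per second of rising phase
    if rate_fall + rate_rise == 0:
        return 0.0, "no preference"
    index = (rate_fall - rate_rise) / (rate_fall + rate_rise)
    if index > threshold:
        return index, "falling"
    if index < -threshold:
        return index, "rising"
    return index, "no preference"


# Hypothetical unit: 60 falling-phase and 40 rising-phase spikes, but the
# falling phase lasted 600 s and the rising phase only 400 s.
print(phase_preference([1.0] * 60 + [-1.0] * 40, time_rising_s=400, time_falling_s=600))
# -> (0.0, 'no preference'): equal rates despite more falling-phase spikes
```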

      Related to the previous point, it would be helpful to show the average cycle shape of these oscillations (relative to the phase 0 extracted in Figure 3) and do the shape comparison across sessions and also wake/NREM.

      We agree, and to this end we have added Figure S2. From this waveform analysis, we show that the ultraslow serotonin oscillation is asymmetric, with the rising phase having a greater slope, but shorter length, than the falling phase. While this asymmetry is observed both in NREM and WAKE, the slope difference and length ratio difference in rising vs. falling phase is greater in NREM (Figure S2. B).

      In Figure 3D, there seem to be oscillatory rhythms with faster cycles on top of the targeted oscillations. That would make the phase estimation less accurate, e.g. in the left panel, in the second cycle, it is not clear if there are two faster cycles or one slow cycle as targeted, and notably, there are no ripples in the rising phase of the second fast cycle. This might suggest that regardless of specific oscillation frequency whenever 5-HT is started to get released, the ripples are suppressed and once the 5-HT is not synaptically effective anymore the ripples start to get generated while the photometry signal starts to wane with the serotonin being cleared. Still, if there is any rhythmicity between bouts of no ripple, it would suggest an ultra-slow regularity in the 5-HT release.

      The reviewer is correct to point out that some faster increases in serotonin, which occur on top of the ultraslow oscillations we measure, seem to be associated with decreased ripple incidence, as in the example referenced. The dominance of ultraslow frequencies in the power spectrum of the 5-HT signal suggests, however, that oscillations faster than the ultraslow oscillations we describe are far less prevalent in the data. While there may be some coupling of ripples and other measures to serotonin oscillations of different frequencies, this may be hard or impossible to detect with phase analysis given their infrequent occurrence and nonstationary nature. In fact, we show in Figure S3 that the strongest phase modulation of ripples by ultraslow serotonin oscillations is observed in the frequencies we use (0.01-0.06 Hz). Methodologically, phase analysis indeed assumes stationary signals, which are rare if not absent in physiological data (Lo et al. 2009); however, generally the narrower the frequency band, the better the phase estimation. The narrow frequency band we use provides phase estimates that are largely robust and unaffected by the presence of faster oscillations, as can be seen in the example phase traces shown in Figure 4.
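
      The phase-estimation step can be sketched as follows: band-limit the 5-HT trace to the ultraslow band and take the angle of the analytic signal. The sketch uses an ideal (brick-wall) frequency-domain filter computed with a direct DFT over the in-band bins only; this is an assumption for illustration and not necessarily the filter used in the study. With a synthetic trace containing a 0.025 Hz component plus a faster 0.2 Hz component, the recovered phase follows the slow component alone, with 0 at its peaks.

```python
import cmath
import math


def band_limited_phase(x, fs, f_lo, f_hi):
    """Instantaneous phase (radians) of x within [f_lo, f_hi] Hz.

    Ideal band-pass analytic signal: keep the positive-frequency DFT bins in
    the band, double them, drop all other bins, and invert. Direct sums over
    the in-band bins keep it dependency-free; suitable for short demos only.
    """
    n = len(x)
    bins = [k for k in range(1, n // 2) if f_lo <= k * fs / n <= f_hi]
    coeffs = {
        k: sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in bins
    }
    phases = []
    for t in range(n):
        z = sum(2 * c * cmath.exp(2j * math.pi * k * t / n) for k, c in coeffs.items()) / n
        phases.append(cmath.phase(z))
    return phases


# Synthetic 10-min trace at 1 Hz: a 0.025 Hz "ultraslow" cycle plus a 0.2 Hz component.
fs, n = 1.0, 600
trace = [math.cos(2 * math.pi * 0.025 * t) + 0.5 * math.cos(2 * math.pi * 0.2 * t) for t in range(n)]
phase = band_limited_phase(trace, fs, 0.01, 0.06)
print([round(phase[t], 2) for t in (0, 10, 20, 30)])
```

      In practice a finite filter with tapered edges would usually replace the brick-wall filter for nonstationary recordings; the narrow band is what keeps the faster component out of the phase estimate here.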

      The hypothesis that the rising phase burst of synaptic serotonin is what silences ripples, and that with the clearing of serotonin from the synapses, ripples recover, is a possible explanation of our findings. However, if this were the case, one could expect the ripple rate to increase over the course of the falling phase of ultraslow 5-HT oscillations, as 5-HT decreases, and peak at the trough. This is at odds with what we observe, namely a fairly uniform distribution of ripples along the falling phase (Figure 3F2,F4). Furthermore, the Mlinar et al. 2016 study describes a subpopulation of raphe neurons whose firing rates themselves oscillate at ultraslow frequencies, rather than on-off bursting at ultraslow frequencies, which would argue against this hypothesis. However, as this study looks at a small number of neurons in slices, further in vivo experiments examining firing rates of median raphe neurons are required to understand how the ultraslow oscillation of extracellular serotonin that we measure is generated as well as how it is related to ripple rates.

      In Figure 3B, it is not clear why IRI is z-scored. It would be informative to have the actual value of IRI. What is the z relative to? Is it the mean value of IRI in each recording session? Is this to reduce the variability across sessions?

      We have now included in Figure 3D a box plot displaying the IRI distributions across different states and sessions. To minimize inter-session variability, data were z-scored within each session for visualization purposes. However, all general linear models were based on raw data, and as a result, the raw differences in IRI are shown in Figure 3C.
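
      To make the normalization explicit, the sketch below z-scores IRIs within each session, i.e., relative to that session's own mean and standard deviation. Pooling NREM and WAKE IRIs when computing the session mean and SD, and using the population SD, are assumptions of this sketch, and the data layout is hypothetical.

```python
from statistics import mean, pstdev


def zscore_within_session(sessions):
    """Z-score inter-ripple intervals (IRIs) within each recording session.

    sessions: {session_id: [(state, iri_s), ...]}. The mean and SD are taken
    over all IRIs of a session with states pooled, so session-wide offsets are
    removed while NREM/WAKE differences within a session are preserved.
    """
    result = {}
    for sid, rows in sessions.items():
        values = [iri for _, iri in rows]
        mu, sd = mean(values), pstdev(values)
        result[sid] = [(state, (iri - mu) / sd) for state, iri in rows]
    return result


# Two hypothetical sessions with the same pattern but a 2 s offset in IRI.
demo = {
    "s1": [("NREM", 1.0), ("NREM", 1.2), ("WAKE", 2.0), ("WAKE", 2.2)],
    "s2": [("NREM", 3.0), ("NREM", 3.2), ("WAKE", 4.0), ("WAKE", 4.2)],
}
z = zscore_within_session(demo)
print(z["s1"])
print(z["s2"])
```

      Both sessions map to the same z-scores, which is what makes the visualization comparable across sessions; the models themselves can still be fit on raw IRIs, as described above.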

      Figure 3E, panel labels don't match with the caption

      We are grateful to the reviewer for pointing out this mistake, which we have corrected in the updated version of the manuscript.

      In the text related to Figure 3E, the related analysis can be more clearly described. "phase preference of individual ripples" does not immediately suggest that the occurring phase of each ripple relative to the targeted oscillation is extracted. I suggest performing this analysis individually for each session and summarizing the results across the sessions.

      We have reworded the sentence in Results: 5-HT and ripples to better reflect the analysis performed: “Next, we calculated the ultraslow 5-HT phases at which individual ripples occurred during both NREM and WAKE (3E-F) ...”. Regarding session-level data, we have added Figure S3, which shows session-level mean phase vectors, as well as the grand mean across sessions for both NREM and WAKE. This figure also includes session-level means for frequency bands outside the ultraslow band used in our study, to show that ripples are most strongly timed by the ultraslow band (0.01-0.06 Hz), as reflected by the greater amplitude of the mean phase vector for this band.
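Summarizing ripple phases as a mean phase vector can be sketched as follows. This is an illustrative Python sketch, not the authors' pipeline. It assumes each ripple's phase (in degrees) has already been extracted from the ultraslow-band (0.01-0.06 Hz) 5-HT signal, for example via a Hilbert transform, and it assumes the grand mean is an equally weighted average of the session-level vectors. The resultant length separates a strong phase preference (close to 1) from no preference (close to 0).

```python
import cmath
import math

def mean_phase_vector(phases_deg):
    """Mean resultant vector of ripple phases given in degrees.

    Returns (mean_phase_deg, resultant_length). The mean phase lies in
    (-180, 180]; the length is 1 if every ripple shares one phase and
    approaches 0 when phases are spread uniformly around the cycle.
    """
    if not phases_deg:
        raise ValueError("need at least one phase")
    z = sum(cmath.exp(1j * math.radians(p)) for p in phases_deg) / len(phases_deg)
    return math.degrees(cmath.phase(z)), abs(z)

def grand_mean_vector(session_phases):
    """Average per-session mean vectors, weighting each session equally (assumption)."""
    vecs = []
    for phases in session_phases.values():
        ang, r = mean_phase_vector(phases)
        vecs.append(cmath.rect(r, math.radians(ang)))
    z = sum(vecs) / len(vecs)
    return math.degrees(cmath.phase(z)), abs(z)
```

Averaging unit vectors rather than raw angles avoids the wrap-around problem: phases of 170° and -170° average to about 180°, not 0°.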

      In Figure 3E2, based on the ripple-triggered 5-HT shown in the left panels of Figure 2H1-2, one would expect to see a preferred phase closer to 180° (toward the end of the falling phase). It would be helpful to compare and discuss the results of these two analyses.

      The reviewer is correct to point out the apparent discrepancy in where the mean ripple falls with respect to the ongoing serotonin oscillation between the two figures mentioned. We have addressed this point in Results: 5-HT and ripples, paragraph #4: “This result appears to be at odds with…”.

      Regarding the analysis in Figure 3F, please also compare the power distribution of ripples between NREM and wake. This will help clarify what underlies the observed difference, i.e., how comparable the strong ripples are between wake and NREM. It is also necessary to report the ripple detection failure rate across ripples of different strengths.

      We have added a figure showing analysis done on a subset of the data in which ripples were manually curated in order to evaluate the performance of the ripple detection model (Figure S7) and explanatory text in Methods: Model performance: ‘To ensure that our model …’. In summary, while missed ripples did tend to have lower power than correctly detected ripples, including them did not change the distribution of ripples by the phase of the ultraslow serotonin oscillation (Figure S7C). We would also note that while the phase preference is noisier than what is presented in Figure 3F because this analysis was done with a small subset of all recorded ripples, the fact that ripples occur more clearly on the falling phase is visible for both detected ripples and detected + false negative ripples.

      The mixed-effects model examining the influence of 5-HT ultraslow oscillation phase on ripple power revealed no significant effect of state (p = 0.088). This indicates that whether the data were collected during NREM or wake periods did not significantly impact ripple power and that the lack of a significant effect (in Figure 3G,H) in WAKE is probably not due to a difference in the distribution of ripple power between states.

      4D, y label is z?

      We are grateful to the reviewer for pointing that out. Yes, the y label should be ‘z-score’, as the two traces represent z-scored 5-HT (blue) and z-scored shuffled data (orange). Figure 4D2 and Figure 2H1-2, which show similar data, have been corrected to address this oversight.

      Relating to Figure 4, the EMG comparison across phases of the oscillations is insightful. Two related and complementary analyses would be to compare the theta and gamma power between the falling and rising phases.

      We have addressed this suggestion in Figure S5 A-C. While low gamma, high gamma and theta power are modulated identically in NREM, with higher power observed during the falling phase than the rising phase, during WAKE, different patterns can be seen. Specifically, low gamma power shows no phase preference, while high gamma shows a peak near the center of the ultraslow 5-HT oscillation. Theta power, as in NREM, is higher during the falling phase of ultraslow 5-HT oscillations. Increased power across many frequency bands was shown to coincide with decreases in DRN population activity during NREM, which matches with what we report here (Kato et al. 2022). In summary, while NREM patterns are consistent in all frequency bands tested, aligning with the pattern of ripple incidence, in WAKE low and high gamma power show different relationships to ultraslow 5-HT phase.

      In the manuscript, we have used the data in both Figure S5 and Figure S6 (unit activity relative to ultraslow 5-HT oscillations) to argue against the idea that our coherence findings result from a lack of activity in the rising phase (see next question), which would have the effect of ‘artificially’ reducing coherence in the falling phase relative to the rising phase. The text can be found in Results: 5-HT and hippocampal cortical coherence, paragraph #2.

      The results presented in Figure 5 could be puzzling and need to be discussed further: if ripple-band activity is weak during the rising phase, under what circumstances is the coherence between cortex and CA1 specifically very strong in this band?

      As mentioned in the previous answer, we have addressed this concern in Results: 5-HT and hippocampal-cortical coherence, paragraph #2. In summary, it is true that the higher coherence in the rising phase than in the falling phase for the highest frequency band (termed ‘high frequency oscillation’ (HFO), 100-150 Hz) could be unexpected, given that ripples occur largely during the falling phase. A few points could help explain this finding. Firstly, it should be noted that power in the 100-150 Hz band can arise from physiological activity outside of ripples, such as filtered non-rhythmic spike bursts (Liu et al. 2022), whose coherent occurrence in the rising phase could explain the coherence findings. Secondly, coherence is a compound measure that is affected by both phase consistency and amplitude covariation (Srinath and Ray 2014); thus, coherence cannot be predicted from amplitude alone. Furthermore, HFO power in the cortex is highest near the peak of ultraslow 5-HT oscillations (Figure S5D), as opposed to the falling-phase peak in the hippocampus. This shows a lack of covariation in amplitude by phase between the hippocampus and cortex in this frequency band. An alternative explanation of our coherence findings could be that in the rising phase there is simply little to no activity, which is easier to ‘synchronize’ than bouts of high activity. Hippocampal unit activity in the rising phase (Figure S6) suggests, however, that the higher coherence in the rising phase across frequencies is not likely to be supported by an absence of activity. Additional experiments using high-density recordings should be conducted to examine 5-HT ultraslow oscillations and their role in gating activity across brain regions, though these results strongly suggest that such a role exists.
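The point that coherence reflects amplitude covariation as well as phase consistency can be illustrated with a toy Python calculation of magnitude-squared coherence at a single frequency, computed across segments. The amplitudes and phase differences below are made-up numbers chosen for illustration, not data from the study, and the segment-wise formulation is an assumed simplification of a full spectral estimate.

```python
import cmath
import math

def coherence_from_segments(amp_x, amp_y, phase_diff_rad):
    """Magnitude-squared coherence at one frequency across K segments.

    amp_x, amp_y: per-segment amplitudes of the two signals at that frequency.
    phase_diff_rad: per-segment phase difference (x minus y), in radians.
    """
    cross = sum(ax * ay * cmath.exp(1j * d)
                for ax, ay, d in zip(amp_x, amp_y, phase_diff_rad))
    power_x = sum(a * a for a in amp_x)
    power_y = sum(a * a for a in amp_y)
    return abs(cross) ** 2 / (power_x * power_y)

# Identical phase differences (0 and 180 degrees) in both cases:
equal_amps = coherence_from_segments([1.0, 1.0], [1.0, 1.0], [0.0, math.pi])  # near 0
covarying = coherence_from_segments([1.0, 0.1], [1.0, 0.1], [0.0, math.pi])   # about 0.96
```

With the same set of phase differences, coherence swings from near 0 to about 0.96 purely because the amplitudes of the two signals covary across segments, so amplitude in one region alone does not determine coherence.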

      Reviewer #2 (Recommendations for the authors):

      I would like to offer two comments. I believe that these are not unusual requests, and thus I would like the authors to respond.

      (1) It would be prudent to investigate the possibility that the observed correlation between ultraslow oscillations and hippocampal ripples/microarousals is merely superficial and that there are unidentified confounding factors at play. For example, it would be beneficial to provide evidence that administering a serotonin receptor inhibitor results in the disappearance of the slow oscillation of ripples and microarousals, or that the correlation with the ultraslow oscillation is no longer present. Please note that the former experiments do not require GRAB5-HT3.0 imaging.

      We agree that causality claims cannot be made with our data and acknowledge the interest in exploring causal interactions between ultraslow serotonin oscillations and the correlated activity we measure. We address this point in depth in our answer to Reviewer #2, Weaknesses #3. We would further like to note that, given the large number of serotonin receptors and the lack of selectivity of many serotonin receptor antagonists, a pharmacological approach would be difficult, though the results would certainly be useful. Finally, we highlight the psilocin study, which reported changes in the rhythmic occurrence of microarousals, and therefore likely in ultraslow oscillations, after administration of a 5-HT2a receptor agonist, suggesting a potential causal effect of 5-HT (via the 5-HT2a receptor) on MA occurrence (Thomas et al. 2022).

      (2) The slow frequency appears to be associated with the default mode network as observed in fMRI signals. The neural basis of the default mode network remains unclear; therefore, a more detailed examination of this possibility would be beneficial.

      We agree that it would be interesting to investigate the role of 5-HT in the neural basis of the DMN.

      The DMN as described in humans (Raichle et al. 2001) and rodents (Lu et al. 2012) may indeed include some parts of the hippocampus, and perhaps some of our neocortical recordings could also be considered part of the DMN. The fact that activity across the interconnected brain structures of the DMN is correlated at ultraslow time scales (Gutierrez-Barragan et al. 2019, Mantini et al. 2007), together with serotonin’s ability to modulate the DMN (Helmbold et al. 2016), is intriguing. Further studies simultaneously recording DMN activity via fMRI and electrical activity via silicon probes, as done in Logothetis et al. 2001, could further elucidate a potential link between ultraslow oscillations and the DMN, with serotonergic modulation serving as a means to assess any potential contribution of serotonin.

      Reviewer #3 (Recommendations for the authors):

      (1) The impact of the study would benefit from an experiment causally testing the effect of hippocampal 5-HT levels on hippocampal physiology, e.g. using optogenetic manipulations.

      We agree that causality claims cannot be made with our data and acknowledge the interest in exploring causal interactions between ultraslow serotonin oscillations and the correlated activity we measure. We address this point in depth in our answer to Reviewer #2, Weaknesses #3.

      (2) Data presentation: the figures are of poor resolution, making some diagram details and, more importantly, some example traces (e.g. Figure 1A, right) impossible to see. This should be corrected by either increasing figure resolution or making important figure elements large enough to be readable.

      We apologize for the poor resolution and have corrected it in the updated version of the manuscript.

      (3) Differences in some figure panels are not statistically assessed: Figure 1H (differences in spectrum peak power), Figure 3E1 & Figure 3E3 (directional bias of the circular distributions), Figure 4C (difference from 0 mean).

      We acknowledge this oversight and have added statistical tests for all three figures, as well as further information regarding the models used in Methods: Statistics.

      (4) Lines 279-280: the claim that the study shows "organization of activity by ultraslow oscillations of 5-HT" implies a causal role of 5-HT in organizing hippocampal activity. I suggest that this statement be toned down to reflect the correlational nature of the presented evidence.

      We have rephrased the sentence in question to the following: “In our study, including both NREM and WAKE periods allowed us to additionally show that the temporal organization of activity relative to ultraslow 5-HT oscillations operates according to the same principles in both states...”, which we believe better reflects the temporal correlation we describe.

      (5) While the study claims to use the EMG (i.e. electromyograph) signal, it does not describe any electrodes placed inside the muscle in the methods section. The SleepScoreMaster toolbox used in the study estimates the EMG using high-frequency activity correlated across recording channels, so I assume this is how this signal was obtained. While such activity may well reflect muscular noise to some degree, it is an indirect measure as the electrodes are not in the muscle. Since the EMG signal is central to the message of the manuscript, the method for calculating it should be described in the methods section and it should be explicitly labelled as an indirect measure in the main text, e.g. by referring to this signal as pseudo-EMG.

      We agree and have added explanatory text to the State Scoring subsection in Methods. Given that the EMG we refer to is derived from intracranial data, and not from traditional EMG probes, we now refer to the EMG as intracranial EMG, or icEMG for short, throughout the main text.

      (6) Is ripple frequency or ripple duration different across the rising and falling phases of the ultraslow oscillation?

      We have now investigated this suggestion in Figure S4, where we show that ripple frequency is higher in the falling phase than rising phase, while ripple duration appears to show no phase preference.

      (7) Lines 315-317: I am not sure why the manuscript refers to the coupling between EMG and 5-HT levels as 'puzzling' given that, as stated, the locomotion-inducing effects of 5-HT are well documented. While the fact that even non-locomotory motor activity may be associated with 5-HT rise is certainly interesting (although not sure if 'puzzling'), the manuscript does not directly compare the association of 5-HT levels with locomotory and non-locomotory EMG spikes. Thus, I think this discussion point is not fully warranted.

      We agree and have rephrased the discussion point in question to reflect that the EMG link to serotonin oscillations is not necessarily surprising, given both the literature linking 5-HT and spontaneous movement in the hippocampus, as well as the involvement of 5-HT in repetitive movements, where the role for a regularly-occurring oscillation is perhaps more intuitive.

      (8) Line 441: Reference #67 does not describe the use of fiber photometry.

      The reviewer is correct to point out this typo, which has now been corrected. The reference in question should be 64, where fiber photometry experiments are described. For further clarity, we have changed our referencing scheme to include authors and years in in-text references.

      (9) In Figures 3E1-3, the phase has different bounds than in the other Figures in the manuscript (0:360 vs -180:180), this should be corrected for consistency.

      We agree and have made changes so that all figures have a phase range of -180 to 180°.
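Moving from a 0 to 360° axis to a -180 to 180° axis only relabels the same angles. A minimal Python sketch of the conversion follows; it is an illustration, not the authors' plotting code. With this particular formula 180° maps to -180° (a half-open interval), which is one convention among several.

```python
def wrap_phase_deg(phase_deg):
    """Map a phase in degrees onto the half-open interval [-180, 180).

    Angles that differ by a multiple of 360 map to the same value, so the
    data are unchanged; only the axis labelling moves.
    """
    return (phase_deg + 180.0) % 360.0 - 180.0

phases_0_360 = [0.0, 90.0, 180.0, 270.0, 359.0]
phases_pm180 = [wrap_phase_deg(p) for p in phases_0_360]
```

Python's `%` returns a non-negative result for a positive modulus, so negative inputs such as -190° also land in range (at 170°).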

      References

      (1) Z. M. Aghajan, G. Kreiman, I. Fried, Minute-scale periodicity of neuronal firing in the human entorhinal cortex. Cell Rep 42, 113271 (2023).

      (2) M. A. Bunin, R. M. Wightman, Quantitative Evaluation of 5-Hydroxytryptamine (Serotonin) Neuronal Release and Uptake: An Investigation of Extrasynaptic Transmission. J. Neurosci. 18, 4854–4860 (1998).

      (3) F. Cazettes, D. Reato, J. P. Morais, A. Renart, Z. F. Mainen, Phasic Activation of Dorsal Raphe Serotonergic Neurons Increases Pupil Size. Curr Biol 31, 192-197.e4 (2021).

      (4) S. R. Cole, B. Voytek, Brain Oscillations and the Importance of Waveform Shape. Trends Cogn Sci 21, 137–149 (2017).

      (5) F. Deng, et al., Improved green and red GRAB sensors for monitoring spatiotemporal serotonin release in vivo. Nat Methods 21, 692–702 (2024).

      (6) C. Dong, et al., Psychedelic-inspired drug discovery using an engineered biosensor. Cell 184, 2779-2792.e18 (2021).

      (7) A. Eban-Rothschild, L. Appelbaum, L. de Lecea, Neuronal Mechanisms for Sleep/Wake Regulation and Modulatory Drive. Neuropsychopharmacol. 43, 937–952 (2018).

      (8) M. S. Fanselow, H.-W. Dong, Are the dorsal and ventral hippocampus functionally distinct structures? Neuron 65, 7–19 (2010).

      (9) J. S. Farrell, E. Hwaun, B. Dudok, I. Soltesz, Neural and behavioural state switching during hippocampal dentate spikes. Nature 1–6 (2024). https://doi.org/10.1038/s41586-024-07192-8.

      (10) R. De Filippo, D. Schmitz, Transcriptomic mapping of the 5-HT receptor landscape. Patterns 5, 101048 (2024).

      (11) M. J. Fisher, et al., Neural mechanisms of sensory gating: Insights from human and animal studies. NeuroImage 207, 116374 (2020).

      (12) P. Fries, D. Nikolić, W. Singer, The gamma cycle. Trends in Neurosciences 30, 309–316 (2007).

      (13) S. Gonzalo Cogno, et al., Minute-scale oscillatory sequences in medial entorhinal cortex. Nature 625, 338–344 (2024).

      (14) D. Gutierrez-Barragan, M. A. Basson, S. Panzeri, A. Gozzi, Infraslow State Fluctuations Govern Spontaneous fMRI Network Dynamics. Current Biology 29, 2295-2306.e5 (2019).

      (15) K. Helmbold, et al., Serotonergic modulation of resting state default mode network connectivity in healthy women. Amino Acids 48, 1109–1120 (2016).

      (16) B. Jarosiewicz, B. L. McNaughton, W. E. Skaggs, Hippocampal Population Activity during the Small-Amplitude Irregular Activity State in the Rat. J. Neurosci. 22, 1373–1384 (2002).

      (17) J. T. Jordan, The rodent hippocampus as a bilateral structure: A review of hemispheric lateralization. Hippocampus 30, 278–292 (2020).

      (18) T. Kato, et al., Oscillatory Population-Level Activity of Dorsal Raphe Serotonergic Neurons Is Inscribed in Sleep Structure. J. Neurosci. 42, 7244–7255 (2022).

      (19) M.A. Kim, H. S. Lee, B. Y. Lee, B. D. Waterhouse, Reciprocal connections between subdivisions of the dorsal raphe and the nuclear core of the locus coeruleus in the rat. Brain Research 1026, 56–67 (2004).

      (20) C. Kjaerby, et al., Memory-enhancing properties of sleep depend on the oscillatory amplitude of norepinephrine. Nat Neurosci 25, 1059–1070 (2022).

      (21) S. Lecci, et al., Coordinated infraslow neural and cardiac oscillations mark fragility and offline periods in mammalian sleep. Sci Adv 3, e1602026 (2017).

      (22) A. A. Liu, et al., A consensus statement on detection of hippocampal sharp wave ripples and differentiation from other fast oscillations. Nat Commun 13, 6000 (2022).

      (23) M.-T. Lo, P.-H. Tsai, P.-F. Lin, C. Lin, Y. L. Hsin, The nonlinear and nonstationary properties in eeg signals: probing the complex fluctuations by hilbert–huang transform. Adv. Adapt. Data Anal. 01, 461–482 (2009).

      (24) N. K. Logothetis, J. Pauls, M. Augath, T. Trinath, A. Oeltermann, Neurophysiological investigation of the basis of the fMRI signal. Nature 412, 150–157 (2001).

      (25) H. Lu, et al., Rat brains also have a default mode network. Proc Natl Acad Sci U S A 109, 3979–3984 (2012).

      (26) D. Mantini, M. G. Perrucci, C. Del Gratta, G. L. Romani, M. Corbetta, Electrophysiological signatures of resting state networks in the human brain. Proc Natl Acad Sci U S A 104, 13170–13175 (2007).

      (27) J. E. Markowitz, et al., The striatum organizes 3D behavior via moment-to-moment action selection. Cell 174, 44-58.e17 (2018).

      (28) H. Miyawaki, Y. N. Billeh, K. Diba, Low Activity Microstates During Sleep. Sleep 40, zsx066 (2017).

      (29) B. Mlinar, A. Montalbano, L. Piszczek, C. Gross, R. Corradetti, Firing Properties of Genetically Identified Dorsal Raphe Serotonergic Neurons in Brain Slices. Front Cell Neurosci 10, 195 (2016).

      (30) A. Osorio-Forero, et al., Noradrenergic circuit control of non-REM sleep substates. Current Biology 31, 5009-5023.e7 (2021).

      (31) S. Panzeri, N. Brunel, N. K. Logothetis, C. Kayser, Sensory neural codes using multiplexed temporal scales. Trends in Neurosciences 33, 111–120 (2010).

      (32) M. E. Raichle, et al., A default mode of brain function. Proc Natl Acad Sci U S A 98, 676–682 (2001).

      (33) R. Srinath, S. Ray, Effect of amplitude correlations on coherence in the local field potential. J Neurophysiol 112, 741–751 (2014).

      (34) B. P. Staresina, J. Niediek, V. Borger, R. Surges, F. Mormann, How coupled slow oscillations, spindles and ripples coordinate neuronal processing and communication during human sleep. Nat Neurosci 26, 1429–1437 (2023).

      (35) C. W. Thomas, et al., Psilocin acutely alters sleep-wake architecture and cortical brain activity in laboratory mice. Transl Psychiatry 12, 77 (2022).

      (36) G. F. Turi, et al., Serotonin modulates infraslow oscillation in the dentate gyrus during Non-REM sleep. eLife 13 (2025).

      (37) J. Vazquez, H. A. Baghdoyan, Basal forebrain acetylcholine release during REM sleep is significantly greater than during waking. American Journal of Physiology-Regulatory, Integrative and Comparative Physiology 280, R598–R601 (2001).

      (38) J. Wan, et al., A genetically encoded sensor for measuring serotonin dynamics. Nat Neurosci 24, 746–752 (2021).

      (39) Y. Zhang, et al., Cholinergic suppression of hippocampal sharp-wave ripples impairs working memory. Proc. Natl. Acad. Sci. U.S.A. 118, e2016432118 (2021).

      (40) Y. Zhang, et al., Interaction of acetylcholine and oxytocin neuromodulation in the hippocampus. Neuron (2024).

    1. One play would probably seldom occupy more than an hour and a half; but often three plays were connected together in one grand whole called a trilogy, somewhat as the several parts of Shakespeare's historical plays are connected; and these were followed by a comic piece by the same poet, which might relieve the seriousness of so much tragedy. Each competitor, therefore, produced in these cases not one play, but a series of four, and several competitors followed one another throughout the day. Wearisome, dry, unimpassioned, all this may seem to us; but we must remember that to the Greek it meant religious service, literary culture, and the celebration of the national greatness. As he sat in the theatre, the gods of his country looked down approvingly from the Acropolis above, and his fellow-citizens, whom he loved with intense patriotism, were all about him. He might say of the assembly, what an old poet had said of the Ionians gathered for festival at Delos, that you would think them blessed with endless youth, so glorious they were and so blooming; and as the rocks under which he sat re-echoed to the applause of that great assembly, he must indeed have felt the thrill of sympathetic enthusiasm which Plato describes as produced by such occasions.

      The trilogy described here is alive even today in all forms of entertainment, and I like how we can compare it with practices from ages ago. Shakespeare is a good point of comparison. Instead of one play, there were four, with competitors following one another throughout the day. For the Greeks it was a religious service, literary culture, and a celebration of national greatness.

    2. All these facts—that the theatre was national, and religious, and rarely open—combined to make the audience on each occasion very numerous. It was a point of national pride, of religious duty, and of common prudence on the part of every citizen, not to miss the two great dramatic festivals of the year when their season came. Accordingly, we hear that thirty thousand people used to be present together; and we may infer from this, as well as from other indisputable evidence, the vast size of the theatre itself. The performance took place in the day-time, and lasted nearly all day, for several plays were presented in succession; and the theatre was open to the sky and to the fields, so that when a man looked away from the solemn half-mysterious representation of the legendary glories of his country, his eye would fall on the city itself, with its temples and its harbours, or on the rocky cliffs of Salamis and the sunny islands of the Ægean. Finally, the performance was musical, and so more like an opera than an ordinary play, though we shall see that even this resemblance is little more than superficial.

      I am impressed by how the theatre, being national, religious, and rarely open, became such a success. I like how attending was a matter of pride and of religious duty, and yet also of common prudence, which shows that, no matter how big or small a role it played in a person's life at that time, it was a part of practically everyone's life. Visualizing an audience of thirty thousand lets me imagine the size of the theatre and the influence it had at that time. The fact that performances lasted all day long also adds to my sense of the commitment people of that era made for this to be a big deal and a successful part of their own lives and those of others.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1:

      We would like to thank Reviewer 1 for recognising the importance of our findings on the heterogeneity in bacterial responses to tachyplesin.

      (1) A double deletion of acrA and tolC (two out of the three components of the major constitutive RND efflux pump) reduces the appearance of the low accumulator phenotype, but interestingly, the single deletions have no effect, and a well-characterised inhibitor of RND efflux pumps also has no effect. The authors identify a two-component system, qseCB, that appears necessary for the appearance of low accumulators, but this system has pleiotropic effects on many cellular systems, with only tenuous connections to efflux. The selected pharmacological agents that could prevent the appearance of low accumulators do not offer clear insight into the mechanism by which low accumulators arise, because they have diverse modes of action.

      We have added that “QseBC was previously inferred to mediate resistance to a tachyplesin analogue by upregulating efflux genes based on transcriptomic analysis and hyper susceptibility of ΔqseBΔqseC mutants[113]”. However, we have also acknowledged that “it is conceivable that the deletion of QseBC has pleiotropic effects on other cellular mechanisms involved in tachyplesin accumulation” and that “it is also conceivable that sertraline prevented the formation of the low accumulator phenotype via efflux independent mechanisms”.

      These amendments are reported on lines 525-527, 532-534 and 539-541 of our revised manuscript.

      (2) The transcriptomics data collected for low and high accumulator sub-populations are interesting, but in my opinion, the conclusions that can be drawn from these data remain overstated. It is not possible to make any claims about the total amount of "protein synthesis, energy production, and gene expression" on the basis of RNA-Seq data. The reads from each sample are normalised, so there is no information about the total amount of transcript. Many elements of total cellular activity are post-transcriptionally regulated, so it is impossible to assess from transcriptomics alone. Finally, the transcriptomic data are analysed in aggregated clusters of genes that are enriched for biological processes, for example: "Cluster 2 included processes involved in protein synthesis, energy production, and gene expression that were downregulated to a greater extent in low accumulators than high accumulators". However, this obscures the fact that these clusters include genes that are generally inhibitory of the process named, as well as genes that facilitate the process.

      We have now acknowledged “that our data do not take into account post-transcriptional modifications that represent a second control point to survive external stressors.”

      These amendments are reported on lines 534-535 of our revised manuscript.

      The raw transcript counts can be found in Figure 3 – Source Data, we had added these data in our previous manuscript as requested by this reviewer.

      We would also like to clarify that we have analysed our transcriptomic data via both clustering (i.e. Figure 3) and direct comparison of genes of interest (Table S1) and transcription factors (i.e. genes that are generally inhibitory of the process named, as well as genes that facilitate the process, Figure S12).

      Finally, we would like to point out that in our revised manuscript (both this and its previous version) we state that “Cluster 2 included processes involved in protein synthesis, energy production, and gene expression that were downregulated to a greater extent in low accumulators than high accumulators”. We do not think this is an overstatement; we do not use these data to draw conclusions about the total amount of "protein synthesis, energy production, and gene expression".

      (3) The authors have added an experiment to attempt to assess overall metabolic activity in the low accumulator and high accumulator populations, which is a welcome addition. They apply the redox dye resazurin and observe lower resorufin (reduced form) fluorescence in the low accumulator population, which they take to indicate a lower respiration rate. This seems possible, however, an important caveat is that they have shown the low accumulator population to retain substantially lower amounts of multiple different fluorescent molecules (tachyplesin-NBD, propidium iodide, ethidium bromide) intracellularly compared to the high accumulator population. It seems possible that the low accumulator population is also capable of removing resazurin or resorufin from the intracellular space, regardless of metabolic rate. Indeed, it has previously been shown that efflux by RND efflux pumps influences resazurin reduction to resorufin in both P. aeruginosa and E. coli. By measuring only the retained redox dye using flow cytometry, the results may be confounded by the demonstrated ability of the low accumulator population to remove various fluorescent dyes. More work is needed to strongly support broad conclusions about the physiological states of the low and high accumulator populations. The phenomenon of the emergence of low accumulators, which are phenotypically tolerant to the antimicrobial peptide tachyplesin, is interesting and important even if there is still work to be done to understand the mechanism by which it occurs.

      We have now clarified that these assays were performed in the presence of 50 μM CCCP and that “CCCP was included to minimise differences in efflux activity and preserve resorufin retention between low and high accumulators, though some variability in efflux may still persist.” We have now added this information on lines 401-406. This information was only present in the caption of Figure S16 of our previous version of this manuscript.

      We agree with the reviewers that more work needs to be done to fully understand this new phenomenon and we had already acknowledged in our previous version of this manuscript that other mechanisms could play a role in this new phenomenon, see lines 489-517 of the current manuscript.

      Reviewer 2:

      We would like to thank the reviewer for recognising that all their previous comments have now been satisfactorily addressed.

      (1) Some mechanistic questions regarding tachyplesin accumulation and survival remain. One general shortcoming of the setup of the transcriptomics experiment is that the tachyplesin-NBD probe itself has antibiotic efficacy and induces phenotypes (and eventually cell death) in the ‘high accumulator’ cells. As the authors state themselves, this makes it challenging to interpret whether any differences seen between the two groups cause the observed accumulation pattern or are a consequence of differential accumulation and downstream phenotypic effects.

      We agree with the reviewer and we had explicitly acknowledged this possibility on lines 281-285 (of the previous and current version of this manuscript).

      (2) The statement ‘Moreover, we found that the fluorescence of low accumulators decreased over time when bacteria were treated with 20 μg mL’ is, in my opinion, not supported by the data shown in Figure S4C. That figure shows that the abundance of ‘low accumulator’ cells decreases over time. Following the rationale that proteinase K treatment may cleave surface-associated/extracellular tachyplesin-NBD, this should lead to a shift of the ‘low accumulator’ population to the left, indicating reduced fluorescence intensity per cell. This is not the case; instead, the population simply disappears. However, after 120 min of treatment more cells appear in the ‘high accumulator’ state. This result is somewhat puzzling.

      We agree with the reviewer that our previous discussion of this data could have been misleading. We have now reworded this part of the text as follows: “We found that the fluorescence of high accumulators did not decrease over time when tachyplesin-NBD was removed from the extracellular environment and bacteria were treated with 20 μg mL<sup>-1</sup> (0.7 μM) proteinase K, a widely-occurring serine protease that can cleave the peptide bonds of AMPs [43–45] (Figure S4B and C). These data suggest that tachyplesin-NBD primarily accumulates intracellularly in high accumulators.”
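      The molar value in parentheses can be checked from the mass concentration. This worked conversion assumes a molecular mass of about 28.9 kDa for proteinase K, a standard literature value that is not stated in the manuscript itself:

$$
c = \frac{20\ \mu\mathrm{g\,mL^{-1}}}{28.9\ \mathrm{kDa}}
  = \frac{20\times10^{-3}\ \mathrm{g\,L^{-1}}}{2.89\times10^{4}\ \mathrm{g\,mol^{-1}}}
  \approx 6.9\times10^{-7}\ \mathrm{M} \approx 0.7\ \mu\mathrm{M}
$$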

      It is conceivable that extended exposure to proteinase K increased the permeability of low accumulators to tachyplesin-NBD, allowing the probe to move from the extracellular space or the membrane into the cell interior; this would be consistent with the decrease in the abundance of low accumulators that we observed after 90 min of proteinase K treatment. However, we do not have data to prove this point.

      Therefore, we have now removed our claim that the data obtained using proteinase K suggest that tachyplesin-NBD accumulates primarily in the membranes of low accumulators. We believe that our two separate microscopy analyses provide more direct, stronger and less ambiguous evidence that tachyplesin-NBD accumulates primarily in the membranes of low accumulators.

      (3) The authors used the metabolic dye resazurin to measure the metabolic activity of low vs. high accumulators. I am not entirely convinced that the lower resorufin fluorescence in tachyplesin-NBD accumulators really indicates lower metabolic activity, since a cell's fluorescence levels would also be affected by cellular uptake and efflux. It appears plausible that the lower resorufin fluorescence may result from reduced accumulation/increased efflux in the ‘low tachyplesin-NBD’ population.

      We have now clarified that these assays were performed in the presence of 50 μM CCCP and that “CCCP was included to minimise differences in efflux activity and preserve resorufin retention between low and high accumulators, though some variability in efflux may still persist.” We have now added this information on lines 401-406. This information was only present in the caption of Figure S16 of our previous version of this manuscript.

      (4) P8 line 343. The text should refer to Figure 13B, instead of 14B.

      We have now changed the text accordingly on line 337.

      Reviewer 3:

      We would like to thank the reviewer for recognising that we have done a very impressive job in taking care of their comments.

      (1) Despite these advances, the contribution of efflux may require more direct evidence to further dissect whether efflux is necessary, sufficient, or contributory. The facts that the key low efflux mutant still retains a small fraction of survivors and that the inhibitors used may cause other physiological changes leading to higher efflux are still unaccounted for. The lipidomic and vesicle findings, while intriguing, remain descriptive, and direct tests of their functional relevance would further solidify the mechanistic models.

      We agree with the reviewers that more work is needed to fully understand this new phenomenon. As already acknowledged in the previous version of this manuscript, other mechanisms could contribute to it (see lines 489-517 of the current manuscript).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this interesting and original paper, the authors examine the effect that heat stress can have on the ability of bacterial cells to evade infection by lytic bacteriophages. Briefly, the authors show that heat stress increases the tolerance of Klebsiella pneumoniae to infection by the lytic phage Kp11. They also argue that this increased tolerance facilitates the evolution of genetically encoded resistance to the phage. In addition, they show that heat can reduce the efficacy of phage therapy. Moreover, they define a likely mechanistic reason for both tolerance and genetically encoded resistance. Both lead to a reorganization of the bacterial cell envelope, which reduces the likelihood that phage can successfully inject their DNA.

      Strengths:

      I found large parts of this paper well-written and clearly presented. I also found many of the experiments simple yet compelling. For example, the experiments described in Figure 3 clearly show that prior heat exposure can affect the efficacy of phage therapy. In addition, the experiments shown in Figures 4 and 6 clearly demonstrate the likely mechanistic cause of this effect. The conceptual Figure 7 is clear and illustrates the main ideas well. I think this paper would work even without its central claim, namely that tolerance facilitates the evolution of resistance. The reason is that the effect of environmental stressors on stress tolerance has to my knowledge so far only been shown for drug tolerance, not for tolerance to an antagonistic species.

      Weaknesses:

      I did not detect any weaknesses that would require a major reorganization of the paper, or that may require crucial new experiments. However, the paper needs some work in clarifying specific and central conclusions that the authors draw. More specifically, it needs to improve the connection between what is shown in some figures, how these figures are described in the caption, and how they are discussed in the main text. This is especially glaring with respect to the central claim of the paper from the title, namely that tolerance facilitates the evolution of resistance. I am sympathetic to that claim, especially because this has been shown elsewhere, not for phage resistance but for antibiotic resistance. However, in the description of the results, this is perhaps the weakest aspect of the paper, so I'm a bit mystified as to why the authors focus on this claim. As I mentioned above, the paper could stand on its own even without this claim.

      Thank you for your feedback. We understand your concern regarding the central claim that tolerance facilitates the evolution of resistance. Although the paper can stand on its own without this claim, we think it adds an important layer to the interpretation of our findings. In light of your comments, we plan to revise the title to “Heat Stress Induces Phage Tolerance in Bacteria”.

      More specific examples where clarification is needed:

      (1) A key figure of the paper seems to be Figure 2D, yet it was one of the most confusing figures. This results from a mismatch between the accompanying text starting on line 92 and the figure itself. The first thing that the reader notices in the figure itself is the huge discrepancy between the number of viable colonies in the absence of phage infection at the two-hour time point. Yet this observation is not even mentioned in the main text. The exclusive focus of the main text seems to be on the right-hand side of the figure, labeled "+Phage". It is from this right-hand panel that the authors seem to conclude that heat stress facilitates the evolution of resistance. I find this confusing, because there is no difference between the heat-treated and non-treated cells in survivorship, and it is not clear from this data that survivorship is caused by resistance, not by tolerance/persistence. (The difference between tolerance and resistance has only been shown in the independent experiments of Figure 1B.)

      Thank you for your helpful comment. Figure 2d presents colony counts from a plating assay following the phage killing experiment in Figure 2c. Bacteria collected after 0 and 2 hours of phage exposure were plated on both phage-free (−phage) and phage-containing (+phage) plates. The “−phage” condition reflects total survivors, while the “+phage” condition indicates the resistant subset.

      As seen in Figure 2d (left), heat-treated bacteria showed markedly higher survival on phage-free plates than untreated cells, which were largely eliminated by phage. However, resistant colony counts on phage-containing plates were similar between the two groups (Figure 2d, right), suggesting that heat stress increased survival but did not promote resistance.

      To clarify, we have revised the labels in Figure 2d as follows: “Total” will replace “-phage” to indicate the total survivors from the phage killing assay, and “Resisters” will replace “+phage” to indicate the resistant survivors, which are detected on phage-containing plates. This adjustment should eliminate any confusion and better reflect the experimental design.

      Figure 2F supports the resistance claim, but it is not one of the strongest experiments of the paper, because the authors used only "turbidity" as an indicator of resistance. In addition, the authors performed the experiments described therein at small population sizes to avoid the presence of resistance mutations. But how do we know that the turbidity they describe does not result from persisters?

      I see three possibilities to address these issues. First, perhaps this is all a matter of explaining and motivating this particular experiment better. Second, the central claim of the paper may require additional experiments. For example, is it possible to block heat induced tolerance through specific mutations, and show that phage resistance does not evolve as rapidly if tolerance is blocked? A third possibility is to tone down the claim of the paper and make it about heat tolerance rather than the evolution of heat resistance.

      Thank you for your thoughtful comment. We appreciate the opportunity to clarify the interpretation of Figure 2f and the rationale behind the experimental design. We agree that turbidity alone cannot fully distinguish resistance from persistence. However, our earlier experiments (Figures 2d and 2e) demonstrated that heat-treated survivors remained largely susceptible to phage, indicating that heat stress does not directly induce resistance. This led us to hypothesize that heat enhances phage tolerance, which in turn increases the likelihood of resistance emergence during subsequent infection.

      To test this, we used a low initial bacterial population (~10³ CFU per well) to minimize the chance of pre-existing resistance. Bacteria were exposed to phages at MOIs of 1, 10, and 100 and incubated for 24 hours in 100 µL volumes. This setup ensured:

      (1) The low initial population minimizes the presence of pre-existing resistant mutants, ensuring that any phage-resistant bacteria observed arise during the infection process.

      (2) The high MOI (≥ 1) ensures that each bacterial cell has a high probability of infection by at least one phage.

      (3) The small volume (100 µL per well) maximizes the interaction between bacteria and phages, ensuring rapid infection of susceptible bacteria, which leads to clear wells. If resistant mutants arise, they will grow and cause turbidity.
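      The rationale in points (1) and (2) can be quantified with two standard approximations. These are stated here as assumptions, not as the authors' own analysis: phage adsorption is treated as a Poisson process whose mean equals the input MOI $m$ (i.e., every added phage adsorbs), and pre-existing resistant mutants occur at some frequency $f$ that was not measured in this experiment.

$$
P(\text{cell receives} \geq 1 \text{ phage}) = 1 - e^{-m}
\;\approx\;
\begin{cases}
0.63, & m = 1,\\
1 - 4.5\times10^{-5}, & m = 10,\\
1 - 3.7\times10^{-44}, & m = 100,
\end{cases}
\qquad
\mathbb{E}[\text{pre-existing mutants per well}] = N f,\quad N \approx 10^{3}.
$$

      Under this model, roughly 37% of cells escape the first round of adsorption at MOI 1, which is why point (3), rapid successive infection cycles in a small volume, matters for clearing wells; and pre-existing mutants are expected in far fewer than one well in a thousand whenever $f \ll 10^{-6}$.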

      Thus, the turbidity observed in heat-treated samples reflects de novo emergence and outgrowth of resistant mutants from a tolerant population. This assay supports the idea that heat-induced tolerance increases the probability of resistance evolution, rather than directly causing resistance.

      We have revised the text to better explain this experimental logic and adjust the framing of our conclusions accordingly.

      A minor but general point here is that in Figure 2D and in other figures, the labels "-phage" and "+phage" do not facilitate understanding, because they suggest that cells in the "-phage" treatment have not been exposed to phage at all, but that is not the case. They have survived previous phage treatment and are then replated on media lacking phage.

      Thank you for your valuable comment. To clarify, we have revised the labels in Figure 2d as follows: “Total” will replace “-phage” to indicate the total survivors from the phage killing assay, and “Resisters” will replace “+phage” to indicate the resistant survivors, which are detected on phage-containing plates.

      (2) Another figure with a mismatch between text and visual materials is Figure 5, specifically Figures 5B-F. The figure is about two different mutants, and it is not even mentioned in the text how these mutants were identified, for example in different or the same replicate populations. What is more, the two mutants are not discussed at all in the main text. That is, the text, starting on line 221 discusses these experiments as if there was only one mutant. This is especially striking as the two mutants behave very differently, as, for example, in Figure 5C. Implicitly, the text talks about the mutant ending in "...C2", and not the one ending in "...C1". To add to the confusion, the text states that the (C2) mutant shows a change in the pspA gene, but in Figure 5f, it is the other (undiscussed) mutant that has a mutation in this gene. Only pspA is discussed further, so what about the other mutants? More generally, it is hard to believe that these were the only mutants that occurred in the genome during experimental evolution. It would be useful to give the reader a 2-3 sentence summary of the genetic diversity that experimental evolution generated.

      Thank you for your thoughtful comment. In our heat treatment evolutionary experiment, we isolated six distinct bacterial clones, of which two are highlighted in the manuscript as representative examples. One clone, BC2G11C1, acquired both heat tolerance and phage resistance, while another clone, BC3G11C2, became heat-tolerant but did not develop resistance to phage infection. This variation highlights the inherent diversity in evolutionary responses when exposed to selective pressures. It demonstrates that not all evolutionary pathways lead to the same outcome, even under similar stress conditions. This variability is a key observation in our study, illustrating that different genetic adaptations may arise depending on the specific mutations or genetic context, and not every strain will evolve phage resistance in parallel with heat tolerance. We have updated the manuscript to better reflect this diversity in the evolutionary trajectories observed.

      Reviewer #2 (Public review):

      Summary:

      An initial screen of K. pneumoniae pretreated with different stresses identified heat stress as a protective factor against infection by the lytic phage Kp11. Subsequent experiments showed that this protection is mediated not by an increase in phage-resistant bacteria but by an increase in a transiently phage-tolerant population, which the authors identified as bacteriophage persistence, in analogy to antibiotic persistence. The authors then showed that heat-shock-mediated phage persistence enhanced the evolution of bacterial resistance against the phage. The same trait was observed with other lytic phages, their combinations, and two clinical strains, as well as with E. coli and two T phages; hence, the phenomenon may be widespread in enterobacteria.

      Next, the authors investigated the mechanism of heat-induced phage persistence and determined that phage adsorption was not affected, but phage DNA internalization was impaired by the heat pretreatment, likely due to alterations in the bacterial envelope, including the downregulation of envelope proteins and of LPS; furthermore, heat-treated bacteria were less sensitive to polymyxins due to the decrease in LPS.

      Finally, cyclic exposure to heat stress allowed the isolation of a mutant that was resistant to heat treatment, polymyxins, and the lytic phage. This mutant carried alterations in the PspA protein that conferred a gain of function and promoted reduced capsule production and loss of capsule structure. Nevertheless, the mutant was severely impaired in immune evasion, as it was easily cleared from mouse blood, evidencing the trade-offs between phage/heat and antibiotic resistance and the ability to counteract the immune response.

      Strengths:

      The experimental design and the order in which the experiments are presented are ideal for understanding the study, and the conclusions are supported by the findings. The discussion also points out the relevance of this work, particularly for the effectiveness of phage therapy, and supports the design of strategies to improve that effectiveness.

      Weaknesses:

      In its present form, it lacks the incorporation of some relevant previous work that explored the role of heat stress in phage susceptibility, antibiotic susceptibility, tradeoffs between phage resistance and resistance against other kinds of stress, virulence, etc., and the fact that exposure to lytic phages induces antibiotic persistence.

      Thank you for your insightful comments. We appreciate your suggestion regarding the inclusion of relevant previous work. We have now incorporated additional citations to discuss these points, including studies on the relationship between heat stress and antibiotic resistance, as well as the trade-offs between phage resistance and other stress factors.

      Reviewer #3 (Public review):

      PspA, a key regulator in the phage shock protein system, functions as part of the envelope stress response system in bacteria, preventing membrane depolarization and ensuring envelope stability. This protein has been associated with the quorum-sensing network and biofilm formation. (Moscoso M., Garcia E., Lopez R. 2006. Biofilm formation by Streptococcus pneumoniae: role of choline, extracellular DNA, and capsular polysaccharide in microbial accretion. J. Bacteriol. 188:7785-7795; Vidal JE, Ludewick HP, Kunkel RM, Zähner D, Klugman KP. The LuxS-dependent quorum-sensing system regulates early biofilm formation by Streptococcus pneumoniae strain D39. Infect Immun. 2011 Oct;79(10):4050-60.)

      It is interesting and very well-developed.

      (1) Could the authors develop experiments on the relationship between quorum sensing and this protein?

      (2) It would be interesting to analyze the link between phage infection, heat stress, and quorum sensing. The authors could study QS regulators or AI-2 molecules.

      Thank you for your insightful comments and for bringing up the role of PspA in quorum sensing and biofilm formation. However, we would like to clarify a potential misunderstanding: the PspA discussed in our manuscript refers to phage-shock protein A, a key regulator in the bacterial envelope stress response system. This is distinct from the pneumococcal surface protein A, which has been associated with quorum sensing and biofilm formation in Streptococcus pneumoniae (as referenced in your comment).

      To avoid any confusion for readers, we will ensure that our manuscript explicitly states “phage-shock protein A (PspA)” at its first mention. We appreciate your feedback and hope this clarification addresses your concern.

      (3) Include the proteins or genes in a table or figure from lytic phage Kp11 (GenBank: ON148528.1).

      Thank you for your helpful suggestion. We have now included a figure summarizing the proteins of the lytic phage Kp11 (GenBank: ON148528.1) as supplementary Figure S1.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Issues unrelated to those discussed in the public review

      (1) Figure 4a and its caption describe an evolution experiment, but they do not mention how many cycles of high-temperature treatment and growth this experiment lasted. I assume it lasted for more than one cycle, because the methods section mentions "cycles", but the number is not provided.

      Thank you for pointing this out. The evolutionary experiment shown in Figure 5a involved 11 cycles of high-temperature treatment and growth. We have now stated this explicitly in the figure legend: BC, batch culture; G, evolution cycle number; C, colony. For example, BC2G11C1 refers to the first colony from batch culture 2 after 11 rounds of heat treatment.

      (2) It is not clear what Figure 5F is supposed to show. What are the gray boxes? The caption claims that the figure shows non-synonymous mutations, but the only information it contains is about genes that seem to be affected by mutation. Judging from the mismatch between the main text and the figure, the mutants with these mutations may actually be mislabeled.

      Thank you for your careful review. Figure 5f highlights the non-synonymous mutations identified in the evolved strains. The gray boxes represent the ancestral strain’s whole genome without mutations, serving as a control. The corresponding labels indicate the specific mutations found in each evolved strain. We have clarified this in the figure caption to improve clarity. Additionally, we have carefully reviewed the labeling to ensure accuracy and consistency between the figure, main text, and sequencing data.

      (3) I think that the acronym NC, which is used in just about every figure, is explained nowhere in the paper. Spell out all acronyms at first use.

      Thank you for pointing this out. We have ensured that NC is clearly defined at its first mention in the text and figure legends. Additionally, we have reviewed the manuscript to ensure that all acronyms are properly introduced when first used.

      (4) The same holds for the acronym N.D. This is an especially important oversight because N.D. could mean "not determined" or "not detectable", which would lead to very different interpretations of the same figure.

      Thank you for your careful review. We have clarified the meaning of N.D., which stands for non-detectable, at its first use to avoid ambiguity and ensure accurate interpretation in the figure legend. Additionally, we have reviewed the manuscript to ensure that all acronyms are clearly defined.

      (5) The panel labels (a,b, etc.) in all figure captions are very difficult to distinguish from the rest of the text, and should be better highlighted, for example by using a bold font. However, this is a matter of journal style and will probably be fixed during typesetting.

      Thank you for your suggestion. We have adjusted the figure captions to better distinguish panel labels (e.g., by using bold font) to improve readability; final formatting will follow the journal’s style during typesetting.

      (6) Line 224: enhanced insusceptibility -> reduced susceptibility.

      Thank you for your suggestion. We have revised “enhanced insusceptibility” to “reduced susceptibility” for clarity and precision.

      (7) Line 259: mice -> mouse.

      Thank you for catching this. We have corrected “mice” to “mouse”.

      Reviewer #2 (Recommendations for the authors):

      I have no concerns about the experimental design and conclusions of your work; however, I strongly recommend incorporating several relevant pieces of the literature related to your work, in the discussion of your manuscript, specifically:

      (1) Previous studies about the role of heat stress in phage infections, see:

      Greenrod STE, Cazares D, Johnson S, Hector TE, Stevens EJ, MacLean RC, King KC. Warming alters life-history traits and competition in a phage community. Appl Environ Microbiol. 2024 May 21;90(5):e0028624. doi: 10.1128/aem.00286-24. Epub 2024 Apr 16. PMID: 38624196; PMCID: PMC11107170.

      Thank you for your thoughtful comment. We have incorporated the study by Greenrod et al. (2024) into the discussion to enrich the context of our findings. As this article points out, a temperature of 42°C can indeed limit phage infection in bacteria, acting as a barrier from the phage’s perspective. Our study builds on this by demonstrating that bacteria pre-treated with high temperatures exhibit tolerance to phage infection. These findings, together with the work you referenced, underscore the importance of heat stress or elevated temperature in host-phage interactions, with 42°C being particularly relevant in the context of fever. We will make sure to clarify this connection in our revised manuscript.

      (2) The effect of heat stress and the tolerance/resistance against other antibiotics besides polymyxins, see:

      Lv B, Huang X, Lijia C, Ma Y, Bian M, Li Z, Duan J, Zhou F, Yang B, Qie X, Song Y, Wood TK, Fu X. Heat shock potentiates aminoglycosides against gram-negative bacteria by enhancing antibiotic uptake, protein aggregation, and ROS. Proc Natl Acad Sci U S A. 2023 Mar 21;120(12):e2217254120. doi: 10.1073/pnas.2217254120. Epub 2023 Mar 14. PMID: 36917671; PMCID: PMC10041086.

      Thank you for bringing this study to our attention. We have incorporated the findings from Lv et al. (2023) into the discussion of our manuscript, highlighting how sublethal temperatures may facilitate the killing of bacteria by antibiotics like kanamycin. This is consistent with our data showing enhanced susceptibility of heat-shocked bacteria to kanamycin. The study also provides insights into the potential role of PMF, which is relevant to our work on PspA, and strengthens the broader context of heat stress influencing both antibiotic resistance and tolerance.

      (3) Perhaps the most relevant overlooked fact was that recently it was demonstrated for E. coli, Klebsiella and Pseudomonas that pretreatment with lytic phages induced antibiotic persistence! Please discuss this finding and its implications for your work, see:

      Fernández-García L, Kirigo J, Huelgas-Méndez D, Benedik MJ, Tomás M, García-Contreras R, Wood TK. Phages produce persisters. Microb Biotechnol. 2024 Aug;17(8):e14543. doi: 10.1111/1751-7915.14543. PMID: 39096350; PMCID: PMC11297538.

      Sanchez-Torres V, Kirigo J, Wood TK. Implications of lytic phage infections inducing persistence. Curr Opin Microbiol. 2024 Jun;79:102482. doi: 10.1016/j.mib.2024.102482. Epub 2024 May 6. PMID: 38714140.

      Thank you for suggesting this important reference. We agree that the phenomenon of phage-induced bacterial persistence is highly relevant to our study. While our manuscript focuses on the role of heat stress in bacterial tolerance and resistance, we acknowledge that bacterial persistence against phages is an established concept. We have incorporated this finding into our discussion, emphasizing how persistence and tolerance can overlap in their effects on bacterial survival, especially under stress conditions like heat treatment. This will provide a more comprehensive understanding of how phage interactions with bacteria can lead to both persistence and resistance.

      (4) Finally, you observed a trade-off in the pspA* mutant between increased phage/heat/polymyxin resistance and decreased immune evasion (perhaps because it is unable to counteract phagocytosis). Such trade-offs, in which gaining phage resistance comes at the cost of resistance to the immune system, impaired virulence, and loss of resistance to some antibiotics, have been extensively documented, see:

      Majkowska-Skrobek G, Markwitz P, Sosnowska E, Lood C, Lavigne R, Drulis-Kawa Z. The evolutionary trade-offs in phage-resistant Klebsiella pneumoniae entail cross-phage sensitization and loss of multidrug resistance. Environ Microbiol. 2021 Dec;23(12):7723-7740. doi: 10.1111/1462-2920.15476. Epub 2021 Mar 27. PMID: 33754440.

      Gordillo Altamirano F, Forsyth JH, Patwa R, Kostoulias X, Trim M, Subedi D, Archer SK, Morris FC, Oliveira C, Kielty L, Korneev D, O'Bryan MK, Lithgow TJ, Peleg AY, Barr JJ. Bacteriophage-resistant Acinetobacter baumannii are resensitized to antimicrobials. Nat Microbiol. 2021 Feb;6(2):157-161. doi: 10.1038/s41564-020-00830-7. Epub 2021 Jan 11. PMID: 33432151.

      García-Cruz JC, Rebollar-Juarez X, Limones-Martinez A, Santos-Lopez CS, Toya S, Maeda T, Ceapă CD, Blasco L, Tomás M, Díaz-Velásquez CE, Vaca-Paniagua F, Díaz-Guerrero M, Cazares D, Cazares A, Hernández-Durán M, López-Jácome LE, Franco-Cendejas R, Husain FM, Khan A, Arshad M, Morales-Espinosa R, Fernández-Presas AM, Cadet F, Wood TK, García-Contreras R. Resistance against two lytic phage variants attenuates virulence and antibiotic resistance in Pseudomonas aeruginosa. Front Cell Infect Microbiol. 2024 Jan 17;13:1280265. doi: 10.3389/fcimb.2023.1280265. Erratum in: Front Cell Infect Microbiol. 2024 Mar 06;14:1391783. doi: 10.3389/fcimb.2024.1391783. PMID: 38298921; PMCID: PMC10828002.

      Thank you for highlighting these important studies. We have incorporated the work by Majkowska-Skrobek et al. (2021), Gordillo Altamirano et al. (2021), and García-Cruz et al. (2024) into the discussion to provide further context to the evolutionary trade-offs observed in our study. The findings in these studies, which describe the cross-sensitization to antimicrobials and the loss of multidrug resistance in phage-resistant bacteria, align with our observations of trade-offs in the pspA mutant. Specifically, our results show that while the pspA mutant exhibits increased resistance to phage, heat, and polymyxins, it also experiences a decrease in immune evasion and potential virulence. These trade-offs are significant in understanding the broader consequences of developing resistance to phages and other stressors.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Overall, the data presented in this manuscript is of good quality. Understanding how cells control RPA loading on ssDNA is crucial to understanding DNA damage responses and genome maintenance mechanisms. The authors used genetic approaches to show that disrupting PCNA binding and SUMOylation of Srs2 can rescue the CPT sensitivity of rfa1 mutants with reduced affinity for ssDNA. In addition, the authors find that SUMOylation of Srs2 depends on binding to PCNA and the presence of Mec1.

      Comments on revisions:

      I am satisfied with the revisions made by the authors, which helped clarify some points that were confusing in the initial submission.

      Thank you.

      Reviewer #2 (Public Review):

      This revised manuscript mostly addresses previous concerns by doubling down on the model without providing additional direct evidence of interactions between Srs2 and PCNA, and that "precise sites of Srs2 actions in the genome remain to be determined." One additional Srs2 allele has been examined, showing some effect in combination with rfa1-zm2. Many of the conclusions are based on reasonable assumptions about the consequences of various mutations, but direct evidence of changes in Srs2 association with PCNA or other interactors is still missing. There is an assumption that a deletion of a Rad51-interacting domain or a PCNA-interacting domain has no pleiotropic effects, which may not be the case. How SLX4 might interact with Srs2 is unclear to me, again assuming that the SLX4 defect is "surgical", removing only one of its many interactions.

      Previous studies have already provided direct evidence for the interaction between Srs2 and PCNA through Srs2’s PIM region (Armstrong et al, 2012; Papouli et al, 2005); we have added these citations to the text. Similarly, Srs2 associations with SUMO and Rad51 have been demonstrated (Colavito et al, 2009; Kolesar et al, 2016; Kolesar et al., 2012), and these studies were cited in the text.

      We did not state that a deletion of a Rad51-interacting domain or a PCNA-interacting domain have no pleiotropic effects. We only assessed whether these previously characterized mutant alleles could mimic srs2∆ in rescuing rfa1-zm2 defects.

      We assessed the genetic interaction between slx4-RIM and srs2-∆PIM mutants, and not the physical interaction between the two proteins. As described in the text, our rationale for this genetic test is based on reports that both slx4 and srs2 mutants impair recovery from the Mec1-induced checkpoint; thus, they may affect parallel pathways of checkpoint dampening.

      One point of concern is the use of t-tests without some sort of correction for multiple comparisons - in several figures. I'm quite sceptical about some of the p < 0.05 calls surviving a Bonferroni correction. Also in 4B, which comparison is **? Also, admittedly by eye, the changes in "active" Rad53 seem much greater than 5x. (also in Fig. 3, normalizing to a non-WT sample seems odd).

      Claims made in this work were based only on pairwise comparisons, not multiple comparisons. We have now made this point clearer in the graphs and in the Methods. As the values were compared between a wild-type strain and a specific mutant strain, or between two mutants, we believe that a t-test is suitable for statistical analysis.

      Figure 4B: ** indicates that the WT value is significantly different from that of the slx4-RIM srs2-∆PIM double mutant and from that of the srs2-∆PIM single mutant. We have modified the graph to indicate the pairwise comparisons. The 5-fold change in active Rad53 levels was derived by comparing the values between the srs2-∆PIM slx4-RIM-TAP double mutant and wild-type Slx4-TAP. In Figure 3, normalization to the lowest value affords better visualization. This is largely a stylistic issue; we would like to maintain it, as the other reviewers raised no concerns.

      What is the WT doubling time for this strain? From the FACS it seems as if in 2 h the cells have completed more than 1 complete cell cycle. Also in 5D. Seems fast...

      The wild-type W303 strain has a doubling time of less than 90 min, as shown by many labs, and our data are consistent with this. The FACS profiles for wild-type cells shown in Figures 3C, 4C, and 5C are consistent with each other, showing that after release from G1 arrest, cells were in G2 phase at the 1-hour time point, and a percentage of the cells had exited the first cell cycle by two hours.

      I have one over-arching confusion. Srs2 was shown initially to remove Rad51 from ssDNA, and the suppression of some of srs2's defects by deleting rad51 made a nice, compact story, though exactly how srs2's "suppression of rad6" fits in isn't so clear (since Rad6 ties into Rad18 and into PCNA ubiquitylation and into PCNA SUMOylation). Now Srs2 is invoked to remove RPA. It seems to me that any model needs to explain how Srs2 can be doing both. I assume that if RPA and Rad51 are both removed from the same ssDNA, the ssDNA will be "trashed" as suggested by Symington's RPA depletion experiments. So building a model that accounts for selective Srs2 action at only some ssDNA regions might be enhanced by also explaining how Rad51 fits into this scheme.

      While the anti-recombinase function of Srs2 was better studied, its “anti-RPA” role in checkpoint dampening was recently described by us (Dhingra et al, 2021) following the initial report by the Haber group some time ago (Vaze et al, 2002). A better understanding of this new role is required before we can generate a comprehensive picture of how Srs2 integrates the two functions (and possibly other functions). Our current work addresses this issue by providing a more detailed understanding of this new role of Srs2.

      Single-molecule data showed that Srs2 strips both RPA and Rad51 from ssDNA, but this effect is highly dynamic (i.e. RPA and Rad51 can rebind ssDNA after being displaced) (De Tullio et al, 2017). As such, the generation of "deserted" ssDNA regions lacking both RPA and Rad51 in cells is likely a rare event. Rather, Srs2 may foster RPA and Rad51 dynamics on ssDNA. Additional studies will be needed to generate a model that integrates the anti-recombinase and the anti-RPA roles of Srs2.

      As a previous reviewer has pointed out, CPT creates multiple forms of damage. Foiani showed that 4NQO would activate the Mec1/Rad53 checkpoint in G1-arrested cells, presumably because there would be single-strand gaps but no DSBs. Whether this would be a way to look specifically at one type of damage is worth considering; but UV might be a simpler way to look. As also noted, the effects on the checkpoint and on viability are quite modest. Because it isn't clear (at least to me) why rfa1 mutants are so sensitive to CPT, it's hard for me to understand how srs2-zm2 has a modest suppressive effect: is it by changing the checkpoint response or facilitating repair or both? Or how srs2-3KR or srs2-dPIM differ from rfa1-zm2 in this respect. The authors seem to lump all these small suppressions under the rubric of "proper levels of RPA-ssDNA" but there are no assays that directly get at this. This is the biggest limitation.

      CPT treatment is an ideal condition to examine how cells dampen the DNA damage checkpoint, because while most genotoxic conditions (e.g. 4NQO, MMS) induce both the DNA replication checkpoint and the DNA damage checkpoint, CPT was shown to induce only the latter (Menin et al, 2018; Minca & Kowalski, 2011; Redon et al, 2003; Tercero et al, 2003). Future studies examining 4NQO and UV conditions can further expand our understanding of checkpoint dampening under different conditions.

      We have previously provided evidence to support the conclusion that srs2 suppression of rfa1-zm is partly mediated by changing checkpoint levels (Dhingra et al., 2021). We cannot exclude the possibility that the suppression may also be related to changes of DNA repair; we have now added this note in the text.

      Regarding direct testing of RPA levels on DNA, we have previously shown that srs2∆ increased the levels of chromatin-associated Rfa1 and that this is suppressed by rfa1-zm2 (Dhingra et al., 2021). We have now included chromatin fractionation data showing that srs2-∆PIM also led to an increase of Rfa1 on chromatin, and that this was suppressed by rfa1-zm2 (new Fig. S2).

      Srs2 has also been implicated as a helicase in dissolving "toxic joint molecules" (Elango et al. 2017). Whether this activity is changed by any of the mutants (or by mutations in Rfa1) is unclear. In their paper, Elango writes: "Rare survivors in the absence of Srs2 rely on structure-specific endonucleases, Mus81 and Yen1, that resolve toxic joint-molecules" Given the involvement of SLX4, perhaps the authors should examine the roles of structure-specific nucleases in CPT survival?

      Srs2 has several roles, and its role in RPA antagonism can be genetically separated from its role in Rad51 regulation, as we have shown in our previous work (Dhingra et al., 2021); this notion is further supported by evidence presented in the current work. Srs2's role in dissolving "toxic joint molecules" was mainly observed during BIR (Elango et al, 2017). Whether it is related to checkpoint dampening will be interesting to address in the future, but it is beyond the scope of the current work, which seeks to answer the question of how Srs2 regulates RPA during checkpoint dampening. Similarly, determining the roles of Mus81, Yen1 and other structure-specific nucleases in CPT survival is a worthwhile task, but it is a research topic well separated from the focus of this work.

      Experiments that might clarify some of these ambiguities are proposed to be done in the future. For now, we have a number of very interesting interactions that may be understood in terms of a model that supposes discriminating among gaps and ssDNA extensions by the presence of PCNA, perhaps modified by SUMO. As noted above, it would be useful to think about the relation to Rad6.

      Several studies have shown that Srs2's functional interaction with Rad6 is based on Srs2-mediated recombination regulation (reviewed by Niu & Klein, 2017). Given that recombinational regulation by Srs2 is genetically separable from the Srs2-RPA antagonism (Dhingra et al., 2021), we do not see a strong rationale to examine Rad6 in this work, which addresses how Srs2 regulates RPA. With this said, this study has provided a basis for future studies of possible cross-talk among different Srs2-mediated pathways.

      Reviewer #3 (Public Review):

      The superfamily I 3'-5' DNA helicase Srs2 is well known for its role as an anti-recombinase, stripping Rad51 from ssDNA, as well as an anti-crossover factor, dissociating extended D-loops and favoring non-crossover outcomes during recombination. In addition, Srs2 plays a key role in ribonucleotide excision repair. Besides DNA repair defects, srs2 mutants also show reduced recovery after DNA damage that is related to its role in downregulating the DNA damage signaling or checkpoint response. Recent work from the Zhao laboratory (PMID: 33602817) identified a role of Srs2 in downregulating the DNA damage signaling response by removing RPA from ssDNA. This manuscript reports further mechanistic insights into the signaling downregulation function of Srs2.

      Using the genetic interaction with mutations in RPA1, mainly rfa1-zm2, the authors test a panel of mutations in Srs2 that affect CDK sites (srs2-7AV), potential Mec1 sites (srs2-2SA), known sumoylation sites (srs2-3KR), Rad51 binding (delta 875-902), PCNA interaction (delta 1159-1163), and SUMO interaction (srs2SIMmut). All mutants were generated by genomic replacement, and the expression level of the mutant proteins was found to be unchanged. This alleviates some concern about the use of deletion mutants compared to point mutations. Double mutant analysis showed that PCNA interaction and SUMO sites were required for the Srs2 checkpoint dampening function, at least in the context of the rfa1-zm2 mutant. These mutants had no effect in an RFA1 wild-type background. This latter result is likely explained by the activity of the parallel pathway of checkpoint dampening mediated by Slx4, and genetic data with an Slx4 point mutation affecting Rtt107 interaction and checkpoint downregulation support this notion. Further analysis of Srs2 sumoylation showed that Srs2 sumoylation depended on PCNA interaction, suggesting sequential events of Srs2 recruitment by PCNA and subsequent sumoylation. Kinetic analysis showed that sumoylation peaks after maximal Mec1 induction by DNA damage (using the Top1 poison camptothecin, CPT) and depended on Mec1. These data are consistent with a model in which Mec1 hyperactivation ultimately leads to signaling downregulation by Srs2 through Srs2 sumoylation. Mec1-S1964 phosphorylation, a marker for Mec1 hyperactivation and a site found to be needed for checkpoint downregulation after DSB induction, did not appear to be involved in checkpoint downregulation after CPT damage. The data support the model that Mec1 hyperactivation, when Mec1 is targeted to RPA-covered ssDNA by its Ddc2 (human ATRIP) targeting factor, favors Srs2 sumoylation after Srs2 recruitment to PCNA to disrupt the RPA-Ddc2-Mec1 signaling complex. Presumably, this allows gap filling and disappearance of long-lived ssDNA as the initiator of checkpoint signaling, although the study does not extend to this step.

      Strengths:

      (1) The manuscript focuses on the novel function of Srs2 to downregulate the DNA damage signaling response and provide new mechanistic insights.

      (2) The conclusions that PCNA interaction and ensuing Srs2-sumoylation are involved in checkpoint downregulation are well supported by the data.

      Weaknesses:

      (1) Additional mutants of interest could have been tested, such as the recently reported Pin mutant, srs2-Y775A (PMID: 38065943), and the Rad51 interaction point mutant, srs2-F891A (PMID: 31142613).

      (2) The use of deletion mutants for PCNA and RAD51 interaction is inferior to using specific point mutants, as done for the SUMO interaction and the sites for post-translational modifications.

      (3) Figure 4D and Figure 5A report data with standard deviations, which is unusual for n=2. Maybe the individual data points could be plotted with a color for each independent experiment to allow the reader to evaluate the reproducibility of the results.

      Comments on revisions:

      In this revision, the authors adequately addressed my concerns. The only issue I see remaining is the site of Srs2 action. The authors argue in favor of gaps and against R-loops and ssDNA resulting from excessive supercoiling. The authors do not discuss ssDNA resulting from processing of one-sided DSBs, which are expected to result from replication run-off after CPT damage but are not expected to provide the 3'-junction for preferred PCNA loading. Can the authors exclude PCNA at the 5'-junction at a resected DSB?

      We have now added a sentence stating that we cannot exclude the possibility that PCNA may be positioned at a 5'-junction, as this can be observed in vitro, albeit PCNA loading was seen exclusively at a 3'-junction in the presence of RPA (Ellison & Stillman, 2003; Majka et al, 2006).

      Recommendations For the authors:

      Reviewer #2 (Recommendations For the authors):

      A Bonferroni correction should be made for the multiple comparisons in several figures.

      Specific comments:

      l. 41. This is a too long and confusing sentence.

      Sentence shortened: “These data suggest that Srs2 recruitment to PCNA proximal ssDNA-RPA filaments followed by its sumoylation can promote checkpoint recovery, whereas Srs2 action is minimized at regions with no proximal PCNA to permit RPA-mediated ssDNA protection”.

      l. 60. Identify Ddc2 and Mec1 as ATRIP and ATR.

      Done.

      l. 125 "fails to downregulate RPA levels on chromatin and Mec1-mediated DDC..." fails to downregulate RPA and fails to reduce Mec1-mediated DDC?

      Sentence modified: “fails to downregulate both the RPA levels on chromatin and the Mec1-mediated DDC”

      l. 204 "consistent with the notion that Srs2 has roles beyond RPA regulation"... What other roles? Its stripping of Rad51? Removing toxic joint molecules? Something else?

      Sentence modified: “consistent with the notion that Srs2 has roles beyond RPA regulation, such as in Rad51 regulation and removing DNA joint molecules”.

      l. 249 "Significantly, srs2-ΔPIM and -3KR increased the percentage of rfa1-zm2 cells transitioning into the G1 phase" No. Just back to normal. As stated in l. 258: "We found that srs2-ΔPIM and srs2-3KR mutants on their own behaved normally in the two DDC assays described above." All of these effects are quite small.

      Sentence modified: “Compared with rfa1-zm2 cells, srs2-∆PIM rfa1-zm2 and srs2-3KR rfa1-zm2 cells showed increased percentages of cells transitioning into the G1 phase”.

      l. 468 "Our previous work has provided several lines of evidence to support that Rad51 removal by Srs2 is separable from the Srs2-RPA antagonism (Dhingra et al., 2021)." What evidence? See my comment above about not having both proteins removed at the same time.

      We have addressed this point in our initial rebuttal, and some key points are summarized below. In our previous report (Dhingra et al., 2021), we provided several lines of evidence to support the conclusion that Rad51 is not relevant to the Srs2-RPA antagonism. For example, while rad51∆ rescues the hyper-recombination phenotype of srs2∆ cells, rad51∆ did not affect the hyper-checkpoint phenotype of srs2∆. In contrast, rfa1-zm1/zm2 have the opposite effects; that is, rfa1-zm1/zm2 suppressed the hyper-checkpoint, but not the hyper-recombination, phenotype of srs2∆ cells. The differential effects of rad51∆ and rfa1-zm1/zm2 were also seen for the ATPase-dead allele of Srs2 (srs2-K41A). For example, rfa1-zm2 rescued the hyper-checkpoint phenotype and CPT sensitivity of srs2-K41A cells, while rad51∆ had neither effect. These and other data described by Dhingra et al (2021) suggest that Srs2's effects on checkpoint vs. recombination can be separated genetically. Consistent with this conclusion, deleting the Rad51 binding domain in Srs2 (srs2-∆Rad51BD) has no effect on the rfa1-zm2 phenotype in CPT (Fig. 2D). These data provide yet another line of evidence that Srs2 regulation of Rad51 is separable from the Srs2-RPA antagonism.

      l. 525 "possibility, we tested the separation pin of Srs2 (Y775), which was shown to enables its in vitro helicase activity during the revision of our work..." ?? there was helicase activity during the revision of your work? Please fix the sentence.

      Sentence modified: "we tested the separation pin of Srs2 (Y775). This residue was shown to be key for Srs2's helicase activity in vitro in a report that was published during the revision of our work (Meir et al, 2023)."

      Fig. 3. "srs2-ΔPIM and -3KR allow better G1 entry of rfa1-zm2 cells." is it better entry or less arrest at G2/M? One implies better turning off of a checkpoint, the other suggests less activation of the checkpoint.

      This is a correct statement. For all strains examined in Figure 3, cells were seen in G2/M phase after 1-hour CPT treatment, suggesting proper arrest.

      References:

      Armstrong AA, Mohideen F, Lima CD (2012) Recognition of SUMO-modified PCNA requires tandem receptor motifs in Srs2. Nature 483: 59-63

      Colavito S, Macris-Kiss M, Seong C, Gleeson O, Greene EC, Klein HL, Krejci L, Sung P (2009) Functional significance of the Rad51-Srs2 complex in Rad51 presynaptic filament disruption. Nucleic Acids Res 37: 6754-6764

      De Tullio L, Kaniecki K, Kwon Y, Crickard JB, Sung P, Greene EC (2017) Yeast Srs2 helicase promotes redistribution of single-stranded DNA-bound RPA and Rad52 in homologous recombination regulation. Cell Rep 21: 570-577

      Dhingra N, Kuppa S, Wei L, Pokhrel N, Baburyan S, Meng X, Antony E, Zhao X (2021) The Srs2 helicase dampens DNA damage checkpoint by recycling RPA from chromatin. Proc Natl Acad Sci U S A 118: e2020185118

      Elango R, Sheng Z, Jackson J, DeCata J, Ibrahim Y, Pham NT, Liang DH, Sakofsky CJ, Vindigni A, Lobachev KS et al (2017) Break-induced replication promotes formation of lethal joint molecules dissolved by Srs2. Nat Commun 8: 1790

      Ellison V, Stillman B (2003) Biochemical characterization of DNA damage checkpoint complexes: clamp loader and clamp complexes with specificity for 5' recessed DNA. PLoS Biol 1: E33

      Kolesar P, Altmannova V, Silva S, Lisby M, Krejci L (2016) Pro-recombination Role of Srs2 Protein Requires SUMO (Small Ubiquitin-like Modifier) but Is Independent of PCNA (Proliferating Cell Nuclear Antigen) Interaction. J Biol Chem 291: 7594-7607

      Kolesar P, Sarangi P, Altmannova V, Zhao X, Krejci L (2012) Dual roles of the SUMO-interacting motif in the regulation of Srs2 sumoylation. Nucleic Acids Res 40: 7831-7843

      Majka J, Binz SK, Wold MS, Burgers PM (2006) Replication protein A directs loading of the DNA damage checkpoint clamp to 5'-DNA junctions. J Biol Chem 281: 27855-27861

      Meir A, Raina VB, Rivera CE, Marie L, Symington LS, Greene EC (2023) The separation pin distinguishes the pro- and anti-recombinogenic functions of Saccharomyces cerevisiae Srs2. Nat Commun 14: 8144

      Menin L, Ursich S, Trovesi C, Zellweger R, Lopes M, Longhese MP, Clerici M (2018) Tel1/ATM prevents degradation of replication forks that reverse after Topoisomerase poisoning. EMBO Rep 19: e45535

      Minca EC, Kowalski D (2011) Replication fork stalling by bulky DNA damage: localization at active origins and checkpoint modulation. Nucleic Acids Res 39: 2610-2623

      Niu H, Klein HL (2017) Multifunctional roles of Saccharomyces cerevisiae Srs2 protein in replication, recombination and repair. FEMS Yeast Res 17: fow111

      Papouli E, Chen S, Davies AA, Huttner D, Krejci L, Sung P, Ulrich HD (2005) Crosstalk between SUMO and ubiquitin on PCNA is mediated by recruitment of the helicase Srs2p. Mol Cell 19: 123-133

      Redon C, Pilch DR, Rogakou EP, Orr AH, Lowndes NF, Bonner WM (2003) Yeast histone 2A serine 129 is essential for the efficient repair of checkpoint-blind DNA damage. EMBO Rep 4: 678-684

      Tercero JA, Longhese MP, Diffley JFX (2003) A central role for DNA replication forks in checkpoint activation and response. Mol Cell 11: 1323-1336

      Vaze MB, Pellicioli A, Lee SE, Ira G, Liberi G, Arbel-Eden A, Foiani M, Haber JE (2002) Recovery from checkpoint-mediated arrest after repair of a double-strand break requires Srs2 helicase. Mol Cell 10: 373-385

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Jiao D et al reported the induction of synthetic lethality by combined inhibition of anti-apoptotic BCL-2 family proteins and WSB2, a substrate receptor in the CRL5 ubiquitin ligase complex. Mechanistically, WSB2 interacts with NOXA to promote its ubiquitylation and degradation. Cancer cells deficient in WSB2, as well as heart and liver tissues from Wsb2-/- mice, exhibit high susceptibility to apoptosis induced by inhibitors of BCL-2 family proteins. The anti-apoptotic activity of WSB2 is partially dependent on NOXA.

      Overall, the finding that WSB2 disruption triggers synthetic lethality with BCL-2 family protein inhibitors by destabilizing NOXA is rather novel. The manuscript is largely hypothesis-driven, with experiments that are adequately designed and executed. However, there are quite a few issues for the authors to address, including those listed below.

      Specific comments:

      (1) At the beginning of the Results section, a clear statement is needed as to why the authors are interested in WSB2 and what brought them to analyze "the genetic co-dependency between WSB2 and other proteins".

      We thank the reviewer for raising this important point. We agree that a clear rationale should be provided at the beginning of the Results section. As reported in previous studies [Ref: 1, 2, 3], strong synthetic interactions have been observed between WSB2 and several mitochondrial apoptosis-related factors, including MCL-1, BCL-xL, and MARCH5. We have referenced these findings in the Discussion section. Motivated by these studies, we became interested in the role of WSB2 and aimed to investigate the specific mechanisms underlying its synthetic lethality with anti-apoptotic BCL-2 family members. We will revise the beginning of the Results section to clearly state this rationale.

      (1) McDonald, E.R., 3rd et al. Project DRIVE: A Compendium of Cancer Dependencies and Synthetic Lethal Relationships Uncovered by Large-Scale, Deep RNAi Screening. Cell 170, 577-592.e10 (2017).

      (2) DeWeirdt, P.C. et al. Genetic screens in isogenic mammalian cell lines without single cell cloning. Nat Commun 11, 752 (2020).

      (3) DeWeirdt, P.C. et al. Optimization of AsCas12a for combinatorial genetic screens in human cells. Nat Biotechnol 39, 94-104 (2021).

      (2) In general, the biochemical evidence supporting the role of WSB2 as a SOCS box-containing substrate-binding receptor of CRL5 E3 in promoting NOXA ubiquitylation and degradation is relatively weak. First, since NOXA binds to WSB2 on its SOCS box, which consists of a BC box for Elongin B/C binding and a CUL5 box for CUL5 binding, it is crucial to determine whether the binding of NOXA to the SOCS box affects the formation of the CRL5^WSB2 complex. The authors should demonstrate the endogenous binding between NOXA and the CRL5^WSB2 complex. Additionally, the authors may also consider manipulating CUL5, SAG, or Elongin B/C to assess whether this affects NOXA protein turnover in two independent cell lines.

      We thank the reviewer for raising this important point. To determine whether endogenous NOXA binds to the intact CRL5^WSB2 complex, we performed co-immunoprecipitation assays using an antibody against NOXA. Indeed, NOXA co-immunoprecipitated with all subunits of the CRL5^WSB2 complex (Figure 2—figure supplement 1D), suggesting that NOXA binding to WSB2 does not disrupt interactions between WSB2 and the other CRL5 subunits. Moreover, depletion of CRL5 complex components (RBX2/SAG, CUL5, ELOB, or ELOC) with siRNAs in C4-2B or Huh-7 cells also resulted in a marked increase in NOXA protein levels.

      Second, in all the experiments designed to detect NOXA ubiquitylation in cells, the authors utilized immunoprecipitation (IP) with FLAG-NOXA/NOXA, followed by immunoblotting (IB) with HA-Ub. However, it is possible that the observed poly-Ub bands could be partly attributed to the ubiquitylation of other NOXA binding proteins. Therefore, the authors need to consider performing IP with HA-Ub and subsequently IB with NOXA. Alternatively, they could use Ni-beads to pull down all His-Ub-tagged proteins under denaturing conditions, followed by the detection of FLAG-tagged NOXA using anti-FLAG Ab. The authors are encouraged to perform one of these suggested experiments to exclude this possibility. Furthermore, an in vitro ubiquitylation assay is crucial to conclusively demonstrate that the polyubiquitylation of NOXA is indeed mediated by the CRL5^WSB2 complex.

      We appreciate the reviewer for raising these important considerations regarding our ubiquitylation assays. We fully acknowledge the reviewer's concern that classical ubiquitination assays could potentially detect ubiquitination of proteins interacting with NOXA. However, we would like to clarify that our experimental conditions effectively mitigate this issue. Specifically, cells were lysed using buffer containing 1% SDS followed by boiling at 105°C for 5 minutes. These rigorous denaturing conditions ensure disruption of non-covalent protein interactions, thereby effectively eliminating the possibility of detecting ubiquitination signals from NOXA-associated proteins.

      Regarding the suggestion to perform an in vitro ubiquitination assay, we agree this experiment would indeed provide additional evidence. However, due to the significant technical complexity of reconstituting CRL5-based E3 ubiquitin ligase activity in vitro, which would require the expression and purification of at least six recombinant proteins, such experiments are rarely performed in this context. Furthermore, NOXA is uniquely localized as a membrane protein on the mitochondrial outer membrane, posing additional significant challenges for protein expression and purification. Given the robustness of our current in vivo ubiquitylation assay under stringent denaturing conditions, we believe our existing data sufficiently demonstrate NOXA ubiquitination mediated by the CRL5^WSB2 complex.

      (3) In their attempt to map the binding regions between NOXA and WSB2, the authors utilized exogenous proteins of both WSB2 and NOXA. To strengthen their findings, it would be more convincing to perform IP with exogenous wt/mutant WSB2 or NOXA and subsequently perform IB to detect endogenous NOXA or WSB2, respectively. Additionally, an in vitro binding assay using purified proteins would provide further evidence of a direct binding between NOXA and WSB2.

      We thank the reviewer for raising these important issues. In response to the reviewer’s suggestion to map the binding regions between NOXA and WSB2 more convincingly, we have indeed performed semi-endogenous Co-IP assays, which yielded results consistent with our exogenous protein experiments (Figure 3—figure supplement 1A, B). Concerning the recommendation to further validate direct interaction using purified recombinant proteins, we encountered substantial technical difficulties in obtaining pure and soluble recombinant WSB2 protein. Additionally, given that NOXA is an outer mitochondrial membrane protein and the interaction occurs on mitochondria, we believe that an in vitro binding assay may have limited physiological relevance. We hope the reviewer can appreciate these practical challenges and our current evidence supporting the strong interaction between NOXA and WSB2.

      Reviewer #2 (Public Review):

      Summary:

      Exploring the DepMap database and two drug-screen databases, the authors identify WSB2 as an interactor of several BCL2 proteins. In follow-up experiments, they show that CRL5/WSB2 controls NOXA protein levels via K48 ubiquitination following direct protein-protein interaction, and controls cell death sensitivity in the context of BH3 mimetic treatment, where WSB2 depletion synergizes with drug treatment.

      Strengths:

      The authors use a set of orthogonal methods across different model cell lines and a new WSB2 KO mouse model to confirm their findings. They also manage to correlate WSB2 expression with poor prognosis in prostate and liver cancer, supporting the idea that targeting WSB2 may sensitize cancers to treatment with BH3 mimetics.

      Weaknesses:

      The conclusions drawn based on the findings in cancer patients are very speculative, as regulation of NOXA cannot be the sole function of CRL5/WSB2 and it is hence unclear what causes correlation with patient survival. Moreover, the authors do not provide a clear mechanistic explanation of how exactly higher levels of NOXA promote apoptosis in the absence of WSB2. This would be important knowledge, as usually high NOXA levels correlate with high MCL1, as they are turned over together, but in situations like this, or loss of other E3 ligases, such as MARCH, the buffering capacity of MCL1 is outrun, allowing excess NOXA to kill (likely by neutralizing other BCL2 proteins it usually does not bind to, such as BCLX). Moreover, a necroptosis-inducing role of NOXA has been postulated. Neither of these options is interrogated here.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 2J. The authors showed that "the mRNA levels of NOXA were even reduced in WSB2-KO cells compared to parental cells". What is the possible mechanism? This point should at least be discussed.

      We thank the reviewer for raising these important issues. The mechanisms underlying the significantly lower mRNA levels of NOXA following WSB2 KO are not fully understood at present. However, we propose that this could represent a form of negative feedback regulation at the level of gene expression. Specifically, when NOXA protein levels rise sharply, this may activate mechanisms that suppress NOXA mRNA synthesis or stability, serving as a buffering system to prevent further protein accumulation. Such negative feedback loops may be critical for maintaining cellular homeostasis and avoiding excessive protein production. Moreover, this phenomenon is frequently observed in other studies investigating substrates targeted by E3 ubiquitin ligases for degradation. We have elaborated on this point in the Discussion section.

      (2) Figure 2M. A previous study has clearly demonstrated that NOXA is subjected to ubiquitylation and degradation by CRL5 E3 ligase (PMID: 27591266). This paper should be cited. Also, in that publication, NOXA ubiquitylation is via the K11 linkage, not the K48 linkage. The authors should include K11R mutant in their assay.

      We thank the reviewer for raising this important issue and for suggesting the relevant reference (PMID: 27591266), which we have now cited accordingly. Additionally, we would like to clarify that our new in vivo ubiquitination assays included the K11R and K11-only ubiquitin mutants, and our data demonstrate that WSB2-mediated NOXA ubiquitination indeed involves K11-linked ubiquitination (Figure 2—figure supplement 1E).

      (3) Figure 3H, J. The authors stated, "By mutating these lysine residues to arginine, we found that WSB2-mediated NOXA ubiquitination was completely abolished". Which one of the three lysine residues is playing the dominant role?

      We thank the reviewer for raising this important issue. To address this, we generated FLAG-NOXA mutants individually substituting lysine residues K35, K41, and K48 with arginine. In vivo ubiquitination assays demonstrated that lysine 48 (K48) is the predominant residue responsible for WSB2-mediated NOXA ubiquitination (Figure 3—figure supplement 1C).

      (4) Figure 3N. The authors need to show that the fusion peptide containing C-terminal NOXA peptide competitively inhibits the interaction between endogenous WSB2 and NOXA and extends the protein half-life of NOXA, leading to NOXA accumulation.

      We sincerely thank the reviewer for raising these important issues. As suggested, we investigated whether the fusion peptide containing the C-terminal NOXA sequence competitively disrupts the interaction between endogenous WSB2 and NOXA, subsequently influencing NOXA stability. Our results demonstrated that treatment with this fusion peptide indeed significantly reduced the endogenous interaction between WSB2 and NOXA (Figure 3—figure supplement 1D). Furthermore, we observed that the peptide dose-dependently increased endogenous NOXA protein levels and prolonged its protein half-life, thereby resulting in the accumulation of NOXA (Figure 3N; Figure 3—figure supplement 1E, F). These findings collectively indicate that the fusion peptide competitively inhibits the WSB2-NOXA interaction, stabilizes NOXA protein, and enhances its accumulation.

      (5) Figure 4. a) It would be better to investigate whether WSB2 knockdown can sensitize cancer cells to the treatment with ABT-737 or AZD5991, evidenced by a decrease in both IC50 values and clonogenic survival rates and whether such sensitization is dependent on NOXA. b) The authors need to show the levels of cleaved caspase-3/7/9 and the percentages of apoptotic cells in shNC cells upon silencing of WSB2 in Figure 4A-F. c) It will be more convincing to repeat the experiment to show synthetic lethality by WSB2 disruption and MCL-1 inhibitor AZD5991 treatment using another cell line, such as WSB2-deficient Huh-7 cells in Figure 4 I&J.

      We sincerely thank the reviewer for these valuable and constructive suggestions. Regarding point (a): We believe that our current Western blot and flow cytometry data (Figure 4G–L) have already provided strong evidence that WSB2 depletion enhances apoptosis in response to ABT-737 and AZD5991. Therefore, we consider that additional IC50 and clonogenic survival assays, while informative, may not be essential for supporting our conclusion. Furthermore, as shown in Figure 5A–F, we found that silencing NOXA largely, though not completely, reversed the enhanced apoptosis triggered by these inhibitors in WSB2-deficient cells, suggesting that the sensitization effect is at least partially dependent on NOXA.

      Regarding point (b): We have shown that WSB2 knockout alone had no impact on the levels of cleaved caspase-3/7/9 or the percentages of apoptotic cells in Huh-7 and C4-2B cells (Figure 4G-L and Figure 4—figure supplement 1A-D), indicating that WSB2 loss does not induce apoptosis on its own under basal conditions.

      Regarding point (c): We appreciate the reviewer’s suggestion and have now repeated the experiment in WSB2 knockout Huh-7 cells. The new results further support the synthetic lethality between WSB2 loss and AZD5991 treatment (Figure 4—figure supplement 1C, D).

      (6) Figure 5A/C/E. The effect of siNOXA is minor, if any, for cleavage of caspases. The same thing for Figure 6F/H.

      We appreciate the reviewer’s insightful observation regarding the relatively modest effect of shNOXA on caspase cleavage in Figures 5A/C/E and Figures 6F/H. Indeed, we acknowledge that the reduction in caspase cleavage following NOXA knockdown is moderate. However, consistent with our discussions in the manuscript, NOXA knockdown significantly—but not completely—rescued the increased apoptosis observed in WSB2-deficient cells treated with BCL-2 family inhibitors. This suggests that while NOXA plays a notable role, additional mechanisms or unidentified targets may also be involved in WSB2-mediated regulation of apoptosis.

      (7) Figure 5 I&J. The authors may consider performing IHC staining, immunofluorescence, or WB analysis to show the levels of NOXA and cleaved caspases or PARP in xenograft tumors. This would provide in vivo evidence of significant apoptosis induction resulting from the co-administration of ABT-737 and R8-C-terminal NOXA peptide.

      We appreciate the reviewer's thoughtful suggestion regarding additional immunohistochemical or immunofluorescence analyses in xenograft tumors. However, because antibodies suitable for reliable detection of NOXA by IHC and IF are currently lacking, we are unable to perform these experiments. We greatly appreciate the reviewer's understanding of this technical constraint. Nevertheless, our existing data collectively support the conclusion that the combination of ABT-737 and R8-C-terminal NOXA peptide significantly enhances apoptosis in vivo.

      (8) Figure 7. Does an inverse correlation exist between the protein levels of WSB2 and NOXA in RPAD or LIHC tissue microarrays? On page 12, in the first paragraph, Figure 7M-P was cited incorrectly.

      We sincerely thank the reviewer for raising this important issue. As mentioned above, due to current limitations regarding the availability of suitable antibodies that can reliably detect NOXA by IHC, we regret that it is not feasible to experimentally address this question at this time.

      Additionally, we have carefully corrected the citation error involving Figure 7M-P on page 12, as pointed out by the reviewer.

      (9) Figure S1D. BCL-W levels were reduced upon WSB2 overexpression, which should be acknowledged.

      We sincerely thank the reviewer for raising this important issue. We acknowledge that BCL-W protein levels were slightly reduced upon WSB2 overexpression in Figure S1D. However, this effect is distinct from the pronounced reduction observed in NOXA protein levels. We have revised the manuscript to clarify this point. Additionally, we recognize that transient overexpression systems may occasionally lead to non-specific or artifactual changes. Our exogenous expression and co-immunoprecipitation experiments did not support an interaction between BCL-W and WSB2. Therefore, the observed reduction of BCL-W under these conditions may not reflect a physiologically relevant regulation.

      (10) Figure S4. Given that WSB2 KO mice are viable, the authors may consider determining whether these mice are more sensitive to radiation-induced tissue damage but more resistant to radiation-induced tumorigenesis.

      We sincerely thank the reviewer for this insightful and biologically meaningful suggestion. We agree that investigating the potential role of WSB2 in radiation-induced tissue damage and tumorigenesis would be of great interest. However, conducting such experiments requires access to specialized irradiation facilities, which are currently unavailable to us. Nevertheless, we recognize the value of this line of investigation and plan to explore it in our future studies.

      (11) All data were displayed as mean ± SD. However, for data from three independent experiments, it is more appropriate to present the results as mean ± SEM, not mean ± SD.

      We sincerely thank the reviewer for highlighting this important issue. In line with the reviewer's suggestion, we have revised the manuscript accordingly and now present data from three independent experiments as mean ± SEM.
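      For reference, the two summaries are related by SEM = SD / √n: the SD describes the spread of the individual replicates, whereas the SEM describes the precision of their mean, so with n = 3 independent experiments the SEM is about 0.58 times the SD. The minimal Python sketch below shows the calculation; the replicate values are hypothetical, and `mean_sd_sem` is an illustrative helper rather than part of the authors' analysis.

```python
import math
import statistics


def mean_sd_sem(values):
    """Return (mean, sample SD, SEM) for a list of replicate measurements."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, uses the n - 1 denominator
    sem = sd / math.sqrt(n)        # standard error of the mean = SD / sqrt(n)
    return mean, sd, sem


# Hypothetical values from three independent experiments (illustration only).
replicates = [1.0, 2.0, 3.0]
mean, sd, sem = mean_sd_sem(replicates)
print(f"mean = {mean:.2f}, SD = {sd:.2f}, SEM = {sem:.2f}")
# prints: mean = 2.00, SD = 1.00, SEM = 0.58
```

      Because the SEM shrinks with n, legends that report mean ± SEM should also state n so that readers can recover the SD.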

      (12) The figure legends require careful review: i) The low dose of ABT-199 (Figure 6H) and the dose of ABT-199 used in Figure 6I are missing. ii) The legends for Figure S1D-E are incorrect. iii) The name of the antibody in the legend of Figure S3C is incorrect.

      We sincerely thank the reviewer for raising these important issues. We have carefully corrected all the errors mentioned. In addition, we have thoroughly reviewed the manuscript to prevent similar errors.

      Reviewer #2 (Recommendations For The Authors):

      The authors focus on NOXA after initially identifying WSB2 as interacting with several BCL2 proteins. The rationale is that WSB2 depletion or overexpression affects NOXA levels but none of the other BCL2 proteins tested, as stated in the text. Yet BCLW is also depleted upon overexpression of WSB2 (Supplementary Figure 1). How does this phenomenon relate to the sensitization noted? Is BCL-W higher in WSB2 KO cells? It does not seem so, though. This warrants discussion.

      We thank the reviewer for raising this important issue. Our results showed that overexpression of WSB2 markedly reduced NOXA levels, while the levels of other BCL-2 family proteins, such as BCL-W, remained unaffected or were only minimally affected (Figure 2—figure supplement 1A). Furthermore, depletion of WSB2 through shRNA-mediated KD or CRISPR/Cas9-mediated KO in C4-2B cells or Huh-7 cells led to a marked increase in the steady-state levels of endogenous NOXA, without affecting the other BCL-2 family proteins examined, including BCL-W (Figure 2A-C, Figure 2—figure supplement 2A, B).

      If WSB2 depletion does not affect MCL1 levels, how does excess NOXA actually kill? Does it bind to any (other) prosurvival proteins under conditions of WSB2 depletion? Is the MCL1 half-life changed?

      We thank the reviewer for raising this important point. NOXA is a BH3-only protein known to promote apoptosis primarily by binding to and neutralizing anti-apoptotic BCL-2 family members, especially MCL-1, via its BH3 domain. It can inhibit MCL-1 either through competitive binding or by facilitating its ubiquitination and subsequent proteasomal degradation. In our system, the total protein levels of MCL-1 remained unchanged in WSB2 knockout cells, suggesting that NOXA may not be promoting apoptosis through enhanced MCL-1 degradation. Instead, we speculate that the accumulation of NOXA in WSB2-deficient cells enhances apoptosis by sequestering MCL-1 through direct binding, thereby freeing pro-apoptotic effectors such as BAK and BAX. In line with our observations, Nakao et al. reported that deletion of the mitochondrial E3 ligase MARCH5 led to a pronounced increase in NOXA expression, while leaving MCL-1 protein levels unchanged in leukemia cell lines (Leukemia. 2023;37:1028-1038; PMID: 36973350).

      Additionally, NOXA has been reported to interact with other anti-apoptotic proteins, including BCL-XL. It is therefore possible that under conditions of WSB2 depletion, excess NOXA may also bind to BCL-XL and relieve its inhibition of BAX/BAK, further contributing to apoptosis. Future experiments assessing NOXA binding partners in WSB2-deficient cells would help clarify this mechanism.

      I think some initial insights into the mechanism underlying the sensitization would add a lot to this study. Is there a role of BFL1/A1 in any of these cell lines, as it can also rather selectively bind to NOXA and is sometimes deregulated in cancer?

      We thank the reviewer for raising this important issue. While BFL1/A1 is indeed another anti-apoptotic BCL-2 family member that can selectively bind to NOXA and has been implicated in cancer, our study primarily focuses on the WSB2-NOXA axis. However, given its potential involvement in apoptosis regulation, it would be an interesting direction for future studies to explore whether BFL1/A1 contributes to NOXA-mediated sensitization in specific cellular contexts.

      Otherwise, this is a very nice and convincing study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript focuses on the olfactory system of Pieris brassicae larvae and the importance of olfactory information in their interactions with the host plant Brassica oleracea and the major parasitic wasp Cotesia glomerata. The authors used CRISPR/Cas9 to knock out the odorant receptor co-receptor (Orco) and conducted a comparative study on the behavior and olfactory system of the mutant and wild-type larvae. The study found that Orco-expressing olfactory sensory neurons in antennae and maxillary palps of Orco knockout (KO) larvae disappeared, and the number of glomeruli in the brain decreased, which impairs olfactory detection and primary processing in the brain. Orco KO caterpillars show weight loss and loss of preference for optimal food plants; KO larvae also lost weight when attacked by parasitoids with the ovipositor removed, and mortality increased when attacked by untreated parasitoids. On this basis, the authors further studied the responses of caterpillars to volatiles from plants attacked by larvae of the same species and volatiles from plants on which the caterpillars were themselves attacked by parasitic wasps. Lack of OR-mediated olfactory inputs prevents caterpillars from finding suitable food sources and from choosing spaces free of enemies.

      Strengths:

      The findings help to understand the important role of olfaction in caterpillar feeding and predator avoidance, highlighting the importance of odorant receptor genes in shaping ecological interactions.

      Weaknesses:

      There are the following major concerns:

      (1) Possible non-targeted effects of Orco knockout using CRISPR/Cas9 should be analyzed and evaluated in Materials and Methods and Results.

      Thank you for your suggestion. In the Materials and Methods, we describe how we selected the target region and evaluated potential off-target sites using Exonerate and CHOPCHOP. Neither method found potential off-target sites with an alignment identity of more than 17 nt. Therefore, we assumed no off-target effects in our Orco knockout. Furthermore, we did not find any developmental differences between wildtype and knockout caterpillars when these were reared on leaf discs in Petri dishes (Fig S4). We will further highlight this off-target evaluation in the Results section.

      (2) Figure 1E: Only one olfactory receptor neuron was marked in WT. There are at least three olfactory sensilla at the top of the maxillary palp. Therefore, to explain the loss of Orco-expressing neurons in the mutant (Figure 1F), a more rigorous explanation of the photo is required.

      Thank you for pointing this out. The figure shows only a qualitative comparison between WT and KO and we did not aim to determine the total number of Orco positive neurons in the maxillary palps or antennae of WT and KO caterpillars, but please see our previous work for the neuron numbers in the caterpillar antennae (Wang et al., 2024). We did indeed find more than one neuron in the maxillary palps, but as these were in very different image planes it was not possible to visualize them together. However, we will add a few sentences in the Results and Discussion section to explain the results of the maxillary palp Orco staining.

      (3) In Figure 1G and H, four glomeruli are circled by dotted lines; the correspondence between the circled glomeruli in the two panels needs to be clarified.

      Thank you for pointing this out. The four glomeruli in Figure 1G and 1H do not strictly correspond to each other. We circled these glomeruli to highlight them, as they are the most clearly visualized in this view. In this study, we only counted the number of glomeruli in WT and KO; we did not determine which glomeruli are missing in the KO caterpillar brain. We will clarify this in the figure legend.

      (4) Line 130: Since the main topic of this study is the larval olfactory system, and the experimental results in this part all concern the antennal electrophysiological responses, mating frequency, and egg production of wild-type and Orco KO adult females and males, the authors may consider moving this part to the supplementary files. It would be better to include some data on the olfactory responses of larvae.

      Thank you for your suggestion. We do agree with your suggestion, and we will consider moving this part to the supplementary information. Regarding larval olfactory response, we unfortunately failed to record any spikes using single sensillum recordings due to the difficult nature of the preparation; however we do believe that this would be an interesting avenue for further research.

      (5) Line 166: The text describes the choice test as "healthy plant vs. infested plant", whereas Fig 3C shows "infested plant vs. no plant". The text does not match the figure.

      Thank you for pointing this out. The sentence is “We compared the behaviors of both WT and Orco KO caterpillars in response to clean air, a healthy plant and a caterpillar-infested plant”. We tested these three stimuli in two comparisons: healthy plant vs no plant, infested plant vs no plant. The two comparisons are shown in Figure 3C separately. We will aim to describe this more clearly in the revised version of this manuscript.

      (6) Lines 174-178: Figure 3A shows that the body weight of Orco KO larvae in the absence of parasitic wasps also decreased compared with that of WT. Therefore, in the experiments of Figure 3A and E, the body weight of Orco KO larvae in the presence and absence of parasitic wasps without ovipositors should also be compared. The current data cannot determine whether the reduced weight of the KO mutant is due to the Orco knockout or to the presence of parasitic wasps.

      Thank you for pointing this out. We did not compare the data of Figures 3A and 3E because the two experiments were not conducted at the same time, owing to the limited space in our BioSafety III greenhouse. We agree that the weight decrease in Figure 3E is partly due to the reduced caterpillar growth shown in Figure 3A. However, we are confident that the additional decrease in caterpillar weight shown in Figure 3E is mainly driven by the presence of disarmed parasitoids. Specifically, in Figure 3A the average weight is 0.4544 g for WT and 0.4230 g for KO, so KO weight is 93.1% of WT weight, whereas in Figure 3E the average weight is 0.4273 g for WT and 0.3637 g for KO, so KO weight is 85.1% of WT weight. We will discuss this interaction between caterpillar growth and the effect of parasitoid attacks more extensively in the revised version of the manuscript.
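      The percentages above follow directly from the reported group means (KO mean / WT mean × 100). The short Python check below reproduces them from the four values stated in this response; `ko_percent_of_wt` is an illustrative helper, and because the two experiments were run at different times, the comparison is descriptive rather than a formal statistical test.

```python
def ko_percent_of_wt(wt_mean_g, ko_mean_g):
    """Express the mean KO caterpillar weight as a percentage of the mean WT weight."""
    return 100 * ko_mean_g / wt_mean_g


# Mean weights (g) as reported above.
fig3a = ko_percent_of_wt(wt_mean_g=0.4544, ko_mean_g=0.4230)  # no parasitoids
fig3e = ko_percent_of_wt(wt_mean_g=0.4273, ko_mean_g=0.3637)  # disarmed parasitoids

print(f"Fig. 3A: KO = {fig3a:.1f}% of WT")  # prints: Fig. 3A: KO = 93.1% of WT
print(f"Fig. 3E: KO = {fig3e:.1f}% of WT")  # prints: Fig. 3E: KO = 85.1% of WT
```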

      (7) Lines 179-181: Figure 3F shows that the survival rate of larvae of Orco KO mutant decreased in the presence of parasitic wasps, and the difference in survival rate of larvae of WT and Orco KO mutant in the absence of parasitic wasps should also be compared. The current data cannot determine whether the reduced survival of the KO mutant is due to the Orco knockout or the presence of parasitic wasps.

      We are glad that you highlight this point. When conducting these experiments, we selected groups of caterpillars and carefully placed them on a leaf with minimal disturbance, which minimized injury and mortality. We did record the survival of caterpillars in the absence of parasitoid wasps in the experiment presented in Figure 3A, although these data were missing from the manuscript. There is no significant difference in the survival rate of caterpillars between the two genotypes in the absence of wasps (average mortality WT = 8.8 %, average mortality KO = 2.9 %; P = 0.088, Wilcoxon test), so the decreased survival rate is most likely due to the attack of the wasps. We will add this information to the revised version of the manuscript.

      (8) In Figure 4B, why do the tested compounds not include plant-derived volatiles? Cruciferous plants have the well-known mustard oil bomb. The absence of ITC compounds from the larval behavioral experiments should be explained in the Discussion section.

      Thank you for the suggestion. We assume you mean Figure 4D/4E instead of Figure 4B. In Figure 4B, many of the identified chemical compounds are in fact plant volatiles, especially those from caterpillar frass and caterpillar spit. In Figure 4D/4E, most of the tested chemicals are derived from plants. However, we indeed did not include ITCs, based on the EAG results in Figures 2A & 2B: butterfly antennae did not respond strongly to ITCs. Instead, the chemicals tested in Figure 4D/4E either elicit high EAG responses in butterflies or were identified as “important” by VIP scores in the chemical analyses. In EAG recordings of Plutella xylostella (Liu et al., 2020), moths responded well to a few ITCs, and the ITCs tested in our study were adopted from that study, except for those that were not available to us. Because butterflies did not show a strong response to these ITCs, we expected that Pieris brassicae caterpillars were unlikely to respond well to them and therefore did not include ITCs in the larval behavioural tests. We will add this explanation to the revised version of our manuscript.

      (9) The custom-made setup and the relevant behavioral experiments in Figure 4C need to be described in detail (Line 545).

      We will add more detailed descriptions for the setup and method in the Materials and Methods.

      (10) Materials and Methods Line 448: 10 μL paraffin oil should be used for negative control.

      Thank you for pointing this out. We used both clean filter paper and clean filter paper with 10 μL paraffin oil as negative controls, but we did not find a significant difference between the two controls. Therefore, in the EAG results of Figure 2A/2B, we presented paraffin oil as one of the tested chemicals. We will re-run our statistical tests with paraffin oil as negative control, although we do not expect any major differences to the previous tests.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigated the effect of olfactory cues on caterpillar performance and parasitoid avoidance in Pieris brassicae. The authors knocked out Orco to produce caterpillars with significantly reduced olfactory perception. These caterpillars showed reduced performance and increased susceptibility to a parasitoid wasp.

      Strengths:

      This is an impressive piece of work and a well-written manuscript. The authors have used multiple techniques to investigate not only the effect of the loss of olfactory cues on host-parasitoid interactions, but also the mechanisms underlying this.

      Weaknesses:

      (1) I do have one major query regarding this manuscript - I agree that the results of the caterpillar choice tests in a y-maze give weight to the idea that olfactory cues may help them avoid areas with higher numbers of parasitoids. However, the experiments with parasitoids were carried out on a single plant. Given that caterpillars in these experiments were very limited in their potential movement and source of food - how likely is it that avoidance played a role in the results seen from these experiments, as opposed to simply the slower growth of the KO caterpillars extending their period of susceptibility? While the two mechanisms may well both take place in nature - only one suggests a direct role of olfaction in enemy avoidance at this life stage, while the other is an indirect effect, hence the distinction is important.

      We do agree with your comment that both mechanisms may be at work in nature and we do address this in the Discussion section. In our study, we did find that wildtype caterpillars were more efficient in locating their food source and did grow faster on full plants than knockout caterpillars. This faster growth will enable wildtype caterpillars to more quickly outgrow the life-stages most vulnerable to the parasitoids (L1 and L2). The olfactory system therefore supports the escape from parasitoids indirectly by enhancing feeding efficiency directly.

      Figure 3D shows that WT caterpillars prefer infested plants without parasitoids to infested plants with parasitoids. In addition, we observed that caterpillars move frequently between different leaves. Therefore, we speculate that WT caterpillars make use of volatiles from the plant or from (parasitoid-exposed) conspecifics via their spit or faeces to avoid parts of the plant potentially attracting natural enemies. Knockout caterpillars are unable to use these volatile danger cues and therefore do not avoid plant parts that are most attractive to their natural enemies, making KO caterpillars more susceptible and leading to more natural enemy harassment. Through this, olfaction also directly impacts the ability of a caterpillar to find an enemy-free feeding site.

      We think that olfaction supports the enemy avoidance of caterpillars via both these mechanisms, although at different time scales. Unfortunately, our analysis was not detailed enough to discern the relative importance of the two mechanisms we found. However, we feel that this would be an interesting avenue for further research. Moreover, we will sharpen our discussion on the potential importance of the two different mechanisms in the revised version of this manuscript.

      (2) My other issue was determining sample sizes used from the text was sometimes a bit confusing. (This was much clearer from the figures).

      We will revise how sample sizes are reported in the text to make them clearer.

      (3) I also couldn't find the test statistics for any of the statistical methods in the main text, or in the supplementary materials.

      Thank you for pointing this out. We will provide more detailed test statistics in the main text and in the supplementary materials of the revised version of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Abstract

      Line 24: "optimal food plant" should be changed to "optimal food plants"

      Thank you for the suggestion. We will revise it.

      (2) Introduction

      Lines 44-46: The sentence should be rephrased.

      Thank you for the suggestion. We will revise it.

      Line 50: "are" should be changed to "is".

      Thank you for the suggestion. We will revise it.

      Lines 57 and 58: Please provide the Latin names of "brown planthoppers" and "striped stem borer".

      Thank you for the suggestion. We will revise it.

      Line 85: "investigate the influence of odor-guided behavior by this primary herbivore on the next trophic levels"; similarly, Line 160: "investigate if caterpillars could locate the optimal host-plant when supplied with differently treated plants". These sentences are not very accurate in describing the relevant experiments.

      Thank you for the suggestion. We will revise them.

      Reviewer #2 (Recommendations for the authors):

      (1) L53 Remove the "the" from "Under the strong selection pressure"

      Thank you for the suggestion. We will revise it.

      (2) L80 I suggest adding a reference for the spitting behaviour, e.g. Muller et al 2003.

      Thank you for the suggestion. We will add it.

      (3) L89 establishing a homozygous KO insect colony.

      Thank you for the suggestion. We will revise it.

      (4) L107 perhaps this goes against the journal style but I always like to see acronyms explained the first time they are used.

      Thank you for the suggestion. We will try to make it more understandable.

      (5) L146-148 sentence difficult to read - consider rephrasing.

      Thank you for the suggestion. We will revise it.

      (6) L230 do you mean still produce? Rather than still reproduce?

      Thank you for the suggestion. We will revise it.

      (7) L233 missing an and before "a greater vulnerability to the parasitoid wasp".

      Thank you for pointing this out. We will revise it.

      (8) L238 malfunctional is a strange word choice.

      Thank you for pointing this out. We will revise it.

      (9) L181 - can the authors confirm that this lower survival was due to parasitism by the wasps?

      This question is similar to Q(7) of Reviewer 1, so we quote our answer for Q(7) here:

      When conducting these experiments, we selected groups of caterpillars and carefully placed them on a leaf with minimal disturbance, which minimized injury and mortality. We did record the survival of caterpillars in the absence of parasitoid wasps in the experiment presented in Figure 3A, although these data were missing from the manuscript. There is no significant difference in the survival rate of caterpillars between the two genotypes in the absence of wasps (average mortality WT = 8.8 %, average mortality KO = 2.9 %; P = 0.088, Wilcoxon test), so the decreased survival rate is most likely due to the attack of the wasps. We will add this information to the revised version of the manuscript.

      (10) L474 - has it been tested if wasps still behave similarly after their ovipositor has been removed?

      Thank you for pointing out this issue. We did not strictly compare whether disarmed and untreated wasps behave similarly. However, before releasing disarmed wasps into a cage, we checked that they could actively move or fly after recovering from anesthesia; otherwise, we replaced them with another active wasp.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study aims to identify the proteins that compose the electrical synapse, which are much less understood than those of the chemical synapse. Identifying these proteins is important to understand how synaptogenesis and conductance are regulated in these synapses. The authors identified more than 50 new proteins and used immunoprecipitation and immunostaining to validate their interaction or localization. One new protein, a scaffolding protein, shows particularly strong evidence of being an integral component of the electrical synapse. However, many key experimental details are missing (e.g. mass spectrometry), making it difficult to assess the strength of the evidence.

      Strengths:

      One newly identified protein, SIPA1L3, has been validated both by immunoprecipitation and immunohistochemistry. The localization at the electrical synapse is very striking.

      A large number of candidate interacting proteins were validated with immunostaining in vivo or in vitro.

      Weaknesses:

      There is no systematic comparison between the zebrafish and mouse proteome. The claim that there is "a high degree of evolutionary conservation" was not substantiated.

      We have added a table as supplementary figure 3 that shows a comparison of all candidates. While there are differences in both proteomes, components such as ZO proteins and the endocytosis machinery are clearly conserved.

      No description of how mass spectrometry was done and what type of validation was done.

      We have contacted the mass spectrometry facility we worked with and added a paragraph describing the mass spectrometry procedure to the Materials and Methods section.

      The threshold for enrichment seems arbitrary.

      Yes, the thresholds are somewhat arbitrary. This is due to the fact that experiments that captured larger total amounts of protein (mouse retina samples) had higher signal-to-noise ratio than those that captured smaller total amounts of protein (zebrafish retina). This allowed us to use a more stringent threshold in the mouse dataset to focus on high probability captured proteins.

      Inconsistent nomenclature and punctuation usage.

      We have scanned through the manuscript and updated terms that were used inconsistently in the interim revision of the manuscript.

      The description of figures is very sparse and error-prone (e.g. Figure 6).

      In Figure 1B, there is very broad non-specific labeling by avidin in zebrafish (In contrast to the more specific avidin binding in mice, Figure 2B). How are the authors certain that the enrichment is specific at the electrical synapse?

      The enrichment of the proteins we identified is specific for electrical synapses because we compared the abundance of all candidates between Cx35b-V5-TurboID and wildtype retinas. Proteins that are components of electrical synapses will only show up in the Cx35b-V5-TurboID condition. The western blot (Strep-HRP) in Figure 1C shows the differences in streptavidin labeling and hence the enrichment of proteins that are part of electrical synapses. Moreover, while the background appears quite abundant in sections, biotinylation is a rare posttranslational modification that mainly occurs in carboxylases, which account for the two intense bands above 50 and 75 kDa; the background mainly originates from these two proteins. Therefore, it is easy to distinguish specific hits from non-specific background.
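      The comparison against wildtype retinas described above can be illustrated with a simple ratio filter. In this minimal sketch, the abundances, the pseudocount, and the log2 threshold are all hypothetical, and `enriched` is an illustrative helper rather than the authors' actual pipeline. An endogenously biotinylated carboxylase such as pyruvate carboxylase (PC) appears at similar levels in both conditions and is therefore filtered out, whereas proteins captured only with the TurboID bait pass; raising `min_log2_ratio` would mimic the more stringent threshold applied to the higher-signal mouse dataset.

```python
import math


def enriched(bait, control, min_log2_ratio, pseudocount=1.0):
    """Return {protein: log2 ratio} for proteins enriched in the bait sample.

    bait, control: dicts mapping protein name -> abundance (hypothetical units).
    A protein missing from the control is treated as having abundance 0 there;
    the pseudocount avoids division by zero for such bait-only proteins.
    """
    hits = {}
    for protein, b in bait.items():
        c = control.get(protein, 0.0)
        ratio = math.log2((b + pseudocount) / (c + pseudocount))
        if ratio >= min_log2_ratio:
            hits[protein] = ratio
    return hits


# Hypothetical abundances, for illustration only.
bait = {"SIPA1L3": 250.0, "ZO-1": 400.0, "PC": 5000.0}
control = {"PC": 4800.0}  # endogenous biotinylated carboxylase, present in both

print(sorted(enriched(bait, control, min_log2_ratio=3.0)))
# prints: ['SIPA1L3', 'ZO-1']
```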

      In Figure 1E, there is very little colocalization between Cx35 and Cx34.7. More quantification is needed to show that it is indeed "frequently associated."

      We agree that “frequently associated” is too strong as a statement. We corrected this and instead wrote “that Cx34.7 was only expressed in the outer plexiform layer (OPL) where it was associated with Cx35b at some gap junctions” in line 151. There are many gap junctions at which Cx35b is not colocalized with Cx34.7.

      Expression of GFP in HCs would potentially be an issue, since GFP is fused to Cx36 (regardless of whether HC expresses Cx36 endogenously) and V5-TurboID-dGBP can bind to GFP and biotinylate any adjacent protein.

      Thank you for this suggestion! There should be no Cx36-GFP expression in horizontal cells, which means that the nanobody cannot bind to anything in these cells. Moreover, to distinguish specific signals from non-specific background, we included wild-type retinas in all experiments. This condition controls for non-specific biotinylation.

      Figure 7: the description does not match up with the figure regarding ZO-1 and ZO-2.

      It appears that a portion of the figure legend was left out of the submitted version of the manuscript. We have put the legend for panels A through C back into the manuscript in the interim revision.

      Reviewer #2 (Public review):

      Summary:

      This study aimed to uncover the protein composition and evolutionary conservation of electrical synapses in retinal neurons. The authors employed two complementary BioID approaches: expressing a Cx35b-TurboID fusion protein in zebrafish photoreceptors and using GFP-directed TurboID in Cx36-EGFP-labeled mouse AII amacrine cells. They identified conserved ZO proteins and endocytosis components in both species, along with over 50 novel proteins related to adhesion, cytoskeleton remodeling, membrane trafficking, and chemical synapses. Through a series of validation studies (including immunohistochemistry, in vitro interaction assays, and immunoprecipitation), they demonstrate that the novel scaffold protein SIPA1L3 interacts with both Cx36 and ZO proteins at electrical synapses. Furthermore, they identify and localize the proteins ZO-1, ZO-2, CGN, SIPA1L3, Syt4, SJ2BP, and BAI1 at AII/cone bipolar cell gap junctions.

      Strengths:

      The study demonstrates several significant strengths in both experimental design and validation approaches. First, the dual-species approach provides valuable insights into the evolutionary conservation of electrical synapse components across vertebrates. Second, the authors compare two different TurboID strategies in mice and demonstrate that the HKamac promoter and GFP-directed approach can successfully target the electrical synapse proteome of mouse AII amacrine cells. Third, they employed multiple complementary validation approaches (including retinal section immunohistochemistry, in vitro interaction assays, and immunoprecipitation), providing evidence supporting the presence and interaction of these proteins at electrical synapses.

      Weaknesses:

      The conclusions of this paper are supported by data; however, some aspects of the quantitative proteomics analysis require clarification and more detailed documentation. The differential threshold criteria (>3 log2 fold change for mouse vs. >1 log2 fold change for zebrafish) would benefit from biological justification, particularly given the cross-species comparison. Additionally, providing details on the number of biological or technical replicates used in this study, along with analyses of how these replicates compare to each other, would strengthen confidence in the identification of candidate proteins. Furthermore, including negative controls for the histological validation of proteins interacting with Cx36 could increase the reliability of the staining results.

      While the study successfully characterized the presence of candidate proteins at the electrical synapses between AII amacrine cells and cone bipolar cells, it did not compare protein compositions between the different types of electrical synapses within the circuit. Given that AII amacrine cells form both homologous (AII-AII) and heterologous (AII-cone bipolar cell) electrical synapses, connections that serve distinct functional roles in retinal signal processing, a comparative analysis of their molecular compositions could have provided important insights into synapse specificity.

      Reviewer #3 (Public review):

      Summary:

      This study by Tetenborg S et al. identifies proteins that are physically closely associated with gap junctions in retinal neurons of mice and zebrafish using BioID, a technique that labels and isolates proteins proximal to a protein of interest. These proteins include scaffold proteins, adhesion molecules, chemical synapse proteins, components of the endocytic machinery, and cytoskeleton-associated proteins. Using a combination of genetic tools and meticulously executed immunostaining, the authors further verified the colocalizations of some of the identified proteins with connexin-positive gap junctions. The findings in this study highlight the complexity of gap junctions. Electrical synapses are abundant in the nervous system, yet their regulatory mechanisms are far less understood than those of chemical synapses. This work will provide valuable information for future studies aiming to elucidate the regulatory mechanisms essential for the function of neural circuits.

      Strengths:

      A key strength of this work is the identification of novel gap junction-associated proteins in AII amacrine cells and photoreceptors using BioID in combination with various genetic tools. The well-studied functions of gap junctions in these neurons will facilitate future research into the functions of the identified proteins in regulating electrical synapses.

      Thank you for these comments.

      Weaknesses:

      I do not see major weaknesses in this paper. A minor point is that, although the immunostaining in this study is beautifully executed, the quantification to verify the colocalization of the identified proteins with gap junctions is missing. In particular, endocytosis component proteins are abundant in the IPL, making it unclear whether their colocalization with gap junction is above chance level (e.g. EPS15l1, HIP1R, SNAP91, ITSN in Figure 3B).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) It would be helpful to include a comprehensive summary of the results from the quantitative proteomics analyses, such as the number of proteins detected in each species and the number of proteins associated with each GO term. Additionally, a clear figure or table highlighting the specific proteins conserved between zebrafish and mice would strengthen the evidence for evolutionary conservation of proteins at electrical synapses.

      We have added the raw data we received from our mass spectrometry facility, including a comparison of all candidates across species (Supplementary Figure 3).

      (2) A more detailed description of the number of experimental and/or technical replicates would improve the technical rigor of the study. For example, what was the rationale for using different log2 fold-change cutoffs in mice versus zebrafish? Are the replicates consistent in terms of protein enrichment?

      We have added raw data from individual experiments as a supplement (Excel spreadsheet). We have two replicates from zebrafish and two from mice. The first experiment in mice was conducted with fewer retinas and a different promoter (human synapsin promoter) and did not yield nearly as many candidates. We are currently running a third experiment with 35 mouse retinas, which will most likely detect more candidates than we have currently identified. We can update the proteome in this paper once the analysis is complete. It is not feasible to conduct these experiments with multiple replicates at the same time, since the number of animals required is simply too high, especially because very specific genotypes are needed that are difficult to obtain.

      (3) It would be interesting to determine whether there are differences in the presence of candidate proteins between AII-AII gap junctions and AII-cone bipolar cell gap junctions. Given that the subcellular localization of AII-AII gap junctions differs from that of AII-cone bipolar cell gap junctions (with most AII-AII gap junctions located below AII-cone ones), histological validations of the proteins shown in Figure 6 can be repeated for AII-AII gap junctions. This would help reveal similarities or differences in the protein compositions of these two types of gap junctions.

      Thank you for this suggestion. We had similar plans. However, we realized that homologous gap junctions are difficult to recognize with GFP. The dense GFP labeling in the proximal IPL, where AII-AII gap junctions are formed, does not allow us to clearly trace individual dendrites from different cells. Detecting AII-AII gap junctions would require intracellular dye injections into neighboring AII cells. Unfortunately, we do not have a setup that would allow this. Bipolar cell terminals, by contrast, are much easier to detect with markers such as SCGN, which is why we decided to focus on AII/ONCB gap junctions.

      (4) In Figures 1 and 2, it would be helpful to clarify in the figure legends whether the proteins in the interaction networks represent all detected proteins or only those selected based on log2 fold-change or other criteria.

      Thank you for this suggestion! We have added a description in lines 643 and 662.

      (5) In Figure 1A (bottom panel), please include a negative control for the Neutravidin staining result from the non-labeling group.

      We only tested the biotinylation for wild type retinas in cell lysates and western blots as shown in figure 1C, which shows an entirely different biotinylation pattern.

      (6) In Figure 2B, please include the results of Neutravidin staining for both the labeling and non-labeling groups.

      Same comment: We see the differences in the biotinylation pattern on western blots, which is distinct for Cx36-EGFP and wild type retinas, although both genotypes were injected with the same AAV construct and the same dose of biotin. We hope that this provides sufficient evidence for the specificity of our approach.

      (7) In Figure 5B, the sizes of multiple proteins detected by Western blotting are inconsistent and confusing. For example, the size of Cx36 in the "FLAG-SJ2BP" panel differs from that in the other three panels. Additionally, in the "Myc-SIPA1L3+" panel, the size of SIPA1l3 appears different between the input and IP conditions.

      Thank you for pointing this out! The differences in molecular weight can be explained by dimerization. We have indicated the positions of the dimer and monomer bands with arrows. Especially when larger amounts of Cx36 are coprecipitated, Cx36 preferentially occurs as a dimer. This can also be seen in our previous publication:

      S. Tetenborg et al., Regulation of Cx36 trafficking through the early secretory pathway by COPII cargo receptors and Grasp55. Cellular and Molecular Life Sciences 81, 1-17 (2024). Figure 1D

      The band above 150 kDa in the SIPA1L3 input is most likely a non-specific product. The specific band for SIPA1L3 can be seen in the IP sample, which has the appropriate molecular weight. We often see much better immunoreactivity for the protein of interest in IP samples, because the protein is concentrated in these experiments, which facilitates its detection.

      (8) How specific are the antibodies used for validating the proteins in this study? Given that many proteins, such as EPS15l1, HIP1R, SNAP91, GPrin1, SJ2BP, Syt4, show broad distribution in the IPL (Figure 3B, 4A, 6D), it is important to validate the specificity of these antibodies. Additionally, including negative controls in the histological validation would strengthen the reliability of the results.

      We carefully selected the antibodies based on western blot data that confirmed that each antibody detected an antigen of appropriate size. Moreover, the distribution of the proteins mentioned is consistent with the function of each protein described in the literature. EPS15L1 and GPrin1, for instance, are both membrane-associated, which is evident in HEK cells (Figure 5C).

      A true negative control would require KO tissue and we don’t think that this is feasible at this point.

      (9) In Figure 7F, the model could be improved by highlighting which components may be conserved between zebrafish and mice, as well as which components are conserved between the AII-AII junction and AII-cone bipolar cell junction?

      Thank you for this suggestion. However, we don’t think that this is necessary as our study primarily focuses on the AII amacrine cell.

      Currently we are unable to distinguish differences in the composition of AII-AII and AII-ONCB junctions as described above.

      (10) Are there any functional measurements that could support the conclusion that "loss of Cx36 resulted in a quantitative defect in the formation of electrical synapse density complex"?

      The loss of electrical synapse density proteins is shown by these immunostaining comparisons. Functional measurements necessarily depend on the function of the electrical synapse itself, which is gone in the case of the Cx36 KO. It is not clear that a different functional measurement can be devised.

      Reviewer #3 (Recommendations for the authors):

      (1) It would be very helpful if there were page and line numbers on the manuscript.

      Line and page numbers have been added.

      (2) Typos in the 3rd paragraph, the sentence 'which is triggered by the influx of Calcium though non-synaptic NMDA...'

      Should it read '... Calcium THROUGH non-synaptic NMDA'?

      We have corrected this typo.

      (3) Figure 1B: please add a description of the top panels, 'Cx36 S293'.

      A description of the top panels has been added to the figure legend (line 639).

      (4) Figure 1C: what do the arrows indicate?

      We apologize for the confusion. The arrows in the western blot indicate the position of the Cx35-V5-TurboID construct, which can be detected with streptavidin-HRP and the V5 antibody. We have added a description for these arrows to the figure legend. See line 641.

      (5) Related to the point in the 'Weakness', there are some descriptions of how well some of the gap junction-associated proteins colocalize with Cx36 in immunostaining. For example, 'In comparison to the scaffold proteins, however, the colocalization of Cx36 with each of these endocytic components, was clearly less frequent and more heterogenous, which appears to reflect different stages in the life cycle of Cx36' and 'All of these proteins showed considerable colocalization with Cx36 in AII amacrine cell dendrites'. It would be nice to see quantification data to support these claims.

      Thank you for this suggestion. We have added a colocalization analysis to figure 3 (C & D). We quantified the colocalization for the endocytosis proteins Eps15l1 and Hip1r. This quantification included a flipped control to rule out random overlap. For both proteins we confirmed true colocalization (Figure 3D).
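      One common way to implement a flipped control is to compute a correlation coefficient between the two channels and then recompute it after mirroring one channel, which destroys true spatial overlap while preserving each channel's intensity distribution. The sketch below assumes Pearson's correlation and a left-right flip; the response does not specify which colocalization metric or flip axis was used, so both are assumptions, and the images are synthetic.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation between the pixel intensities of two same-sized images."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


def colocalization_with_flip_control(ch1, ch2):
    """Return (observed, flipped) correlations.

    Hypothetical helper. The control mirrors channel 2 left-right (an assumed
    flip axis), which breaks true spatial overlap but keeps its intensities.
    """
    observed = pearson(ch1, ch2)
    flipped = pearson(ch1, np.fliplr(ch2))
    return observed, flipped


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ch1 = (rng.random((64, 64)) > 0.97).astype(float)  # sparse synthetic puncta
    ch2 = ch1 + 0.2 * rng.random((64, 64))             # overlapping second channel
    obs, flip = colocalization_with_flip_control(ch1, ch2)
    print(f"observed r = {obs:.2f}, flipped r = {flip:.2f}")
```

      True colocalization shows up as an observed coefficient well above the flipped one; chance overlap of abundant puncta would give similar values for both.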

      (6) In Figure 5B, it would be helpful if there were arrows or some kind in western blottings to indicate which bands are supposed to be the targeted proteins.

      We have added arrows in IP samples to indicate bands representing the corresponding protein.

      (7) In the sentence including 'for the PBM of Cx36, as it is the case for ZO-1', what is PBM?

      The PBM means PDZ binding motif. We have added an explanation for this abbreviation in line 244.

      (8) Please add a description of the Cx35b promoter construct in the Method section.

      The Cx35b promoter is a 6.5 kb fragment. We will make the clone available via Addgene to ensure that all details of the clone can be accessed via SnapGene or alternative software.

    1. Author response:

      Reviewer #1:

      As this code was developed for use with a 4096 electrode array, it is important to be aware of double-counting neurons across the many electrodes. I understand that there are ways within the code to ensure that this does not happen, but care must be taken in two key areas. Firstly, action potentials traveling down axons will exhibit a triphasic waveform that is different from the biphasic waveform that appears near the cell body, but these two signals will still be from the same neuron (for example, see Litke et al., 2004 "What does the eye tell the brain: Development of a System for the Large-Scale Recording of Retinal Output Activity"; figure 14). I did not see anything that would directly address this situation, so it might be something for you to consider in updated versions of the code.

      We thank the reviewer for this insightful comment. We agree that signals from the same neuron may be collected by adjacent channels. To address this concern in our software, we plan to add a routine to SpikeMAP that allows users to discard nearby channels where spike count correlations exceed a pre-determined threshold. Because there is no ground truth to map individual cells to specific channels on the hd-MEA, a statistical approach is warranted.
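      The planned routine could look roughly like the sketch below: spike counts are binned per channel, pairwise Pearson correlations are computed, and for any pair of nearby channels whose correlation exceeds a threshold only the more active channel is kept. The function name, neighbourhood radius, correlation threshold, and the rule of keeping the channel with more spikes are hypothetical choices for illustration, not SpikeMAP parameters.

```python
import numpy as np


def drop_duplicate_channels(counts, positions, radius_um=60.0, corr_threshold=0.8):
    """Return sorted indices of channels kept after removing likely duplicates.

    Hypothetical helper. counts: (n_channels, n_bins) spike counts per time bin;
    positions: (n_channels, 2) electrode coordinates in micrometres. For each
    pair closer than radius_um whose spike-count correlation exceeds
    corr_threshold, the channel with fewer total spikes is discarded.
    """
    corr = np.corrcoef(counts)
    totals = counts.sum(axis=1)
    keep = set(range(len(counts)))
    # Visit channels from most to least active so the strongest copy survives.
    order = [int(k) for k in np.argsort(-totals, kind="stable")]
    for pos, i in enumerate(order):
        if i not in keep:
            continue
        for j in order[pos + 1:]:
            close = np.linalg.norm(positions[i] - positions[j]) < radius_um
            if j in keep and close and corr[i, j] > corr_threshold:
                keep.discard(j)
    return sorted(keep)
```

      Because correlated firing can also arise from genuinely distinct but synchronized neurons, restricting the test to nearby channels (and exposing the threshold to the user) limits how many real units are discarded.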

      Secondly, spike shapes are known to change when firing rates are high, like in bursting neurons (Harris, K.D., Hirase, H., Leinekugel, X., Henze, D.A. & Buzsáki, G. Temporal interaction between single spikes and complex spike bursts in hippocampal pyramidal cells. Neuron 32, 141-149 (2001)). I did not see this addressed in the present version of the manuscript.

      This is a valid concern. To ensure that firing rates are relatively constant over the duration of a recording, we will plot average spike rates using rolling windows of a fixed duration. We expect that population firing rates will remain relatively stable across the duration of recordings.
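      A rolling-window rate of this kind can be computed from sorted spike times with two binary searches per window. The sketch below is illustrative; the function name and the window length and step are hypothetical defaults, not values taken from the manuscript.

```python
import numpy as np


def rolling_rate(spike_times_s, duration_s, window_s=10.0, step_s=5.0):
    """Mean firing rate (Hz) in fixed-length windows sliding across a recording.

    Hypothetical helper; window_s and step_s are illustrative defaults.
    Returns (window_starts, rates).
    """
    spikes = np.sort(np.asarray(spike_times_s, dtype=float))
    starts = np.arange(0.0, duration_s - window_s + 1e-9, step_s)
    # Spikes in [start, start + window) found with two binary searches per window.
    lo = np.searchsorted(spikes, starts, side="left")
    hi = np.searchsorted(spikes, starts + window_s, side="left")
    return starts, (hi - lo) / window_s


if __name__ == "__main__":
    # Toy 60 s recording with one spike every 100 ms (a steady 10 Hz).
    spikes = np.arange(0, 60000, 100) / 1000.0
    starts, rates = rolling_rate(spikes, duration_s=60.0)
    print(rates)  # eleven windows, each at 10 Hz
```

      Applied per unit, or to spike times pooled across units for a population rate, a flat profile across windows would support the assumption that waveform changes due to high-rate bursting are not dominating a recording.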

      Another area for possible improvement would be to build on the excellent validation experiments you have already conducted with parvalbumin interneurons. Although it would take more work, similar experiments could be conducted for somatostatin and vasoactive intestinal peptide neurons against a background of excitatory neurons. These may have different spike profiles, but your success in distinguishing them can only be known if you validate against ground truth, like you did for the PV interneurons.

      We agree that further cycles of experiments could be performed with SOM, VIP, and other neuronal subtypes, and we hope that researchers will take advantage of SpikeMAP too. We will clarify this possibility in the Discussion section of the manuscript.

      Reviewer #2:

      Summary:

      While I find that the paper is nicely written and easy to follow, I find that the algorithmic part of the paper is not really new and should have been more carefully compared to existing solutions. While the GT recordings to assess the possibilities of a spike sorting tool to distinguish properly between excitatory and inhibitory neurons are interesting, spikeMAP does not seem to bring anything new to state-of-the-art solutions, and/or, at least, it would deserve to be properly benchmarked. I would suggest that the authors perform a more intensive comparison with existing spike sorters.

      We thank the reviewer for this comment. As detailed in Table 1, SpikeMAP is the only method that performs E/I sorting on large-scale multielectrode arrays; hence, a comparison to competing methods is not currently possible. That being said, many of the pre-processing steps of SpikeMAP (Figure 1) involve methods that are already well established in the literature and available in different packages. To highlight the contribution of our work more clearly and facilitate the adoption of SpikeMAP, we plan to provide a “modular” portion of SpikeMAP that is specialized in E/I sorting and can be added to the pipelines of other packages such as Kilosort. This modularized version of the code will be shared freely along with the more complete version already available.

      Weaknesses:

      (1) The global workflow of spikeMAP, described in Figure 1, seems to be very similar to that of Hilgen et al. 2017 (10.1016/j.celrep.2017.02.038). Therefore, the first question is what the rationale is for reinventing the wheel rather than using tools that do something very similar (as mentioned by the authors themselves). In general, I have a hard time believing that spikeMAP has something particularly special, given its Methods, compared to state-of-the-art spike sorters.

      We agree with the reviewer that there are indeed similarities between our work and the Hilgen et al. paper. However, while the latter employs optogenetics to stimulate neurons on a large-scale array, their technique does not specifically target inhibitory (e.g., PV) neurons as described in our work. We will clarify our paper accordingly.

      This is why, at the very least, the title of the paper is misleading, because it lets the reader think that the core of the paper will be about a new spike sorting pipeline. If this is the main message the authors want to convey, then I think that numerous validations/benchmarks are missing to assess first how good spikeMAP is, with reference to spike sorting in general, before deciding if this is indeed the right tool to discriminate excitatory vs inhibitory cells. The GT validation, while interesting, is not enough to entirely validate the paper. The details are a bit too scarce for me, or would deserve to be better explained (see other comments after).

      The title of our work will be edited to make it clear that while elements of the pipeline are well-established and available from other packages, we are the first to extend this pipeline to E/I sorting on large-scale arrays.

      (2) Regarding the putative location of the spikes, it has been shown that the center of mass, while easy to compute, is not the most accurate solution [Scopin et al, 2024, 10.1016/j.jneumeth.2024.110297]. For example, it has an intrinsic bias for finding positions within the boundaries of the electrodes, while some other methods, such as monopolar triangulation or grid-based convolution, might have better performances. Can the authors comment on the choice of the Center of Mass as a unique way to triangulate the sources?

      We agree with the reviewer and will point out the limitations of the center-of-mass algorithm based on the article by Scopin et al. (2024). Further, we will augment the existing code library to include monopolar triangulation or grid-based convolution as options available to end users.
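      The bias the reviewer describes follows directly from the definition: the center of mass is a weighted average of electrode positions, so it can never fall outside the patch of electrodes used. The sketch below illustrates this with a hypothetical 3 × 3 patch of electrodes 42 µm apart and a toy 1/r amplitude decay; both the geometry and the decay model are assumptions for illustration only.

```python
import numpy as np


def center_of_mass(amplitudes, positions):
    """Amplitude-weighted mean of electrode positions (absolute amplitudes as weights)."""
    w = np.abs(np.asarray(amplitudes, dtype=float))
    pos = np.asarray(positions, dtype=float)
    return (w[:, None] * pos).sum(axis=0) / w.sum()


if __name__ == "__main__":
    # Hypothetical 3 x 3 patch of electrodes spaced 42 um apart.
    xs, ys = np.meshgrid([0.0, 42.0, 84.0], [0.0, 42.0, 84.0])
    grid = np.column_stack([xs.ravel(), ys.ravel()])
    source = np.array([200.0, 42.0])  # toy neuron lying outside the patch
    amps = 1.0 / np.linalg.norm(grid - source, axis=1)  # toy 1/r amplitude decay
    print(center_of_mass(amps, grid))  # x stays inside [0, 84], far from 200
```

      Model-based alternatives such as monopolar triangulation fit a source position to the amplitude pattern instead and are therefore not confined to the electrode patch.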

      (3) Still in Figure 1, I am not sure I really see the point of Spline Interpolation. I see the point of such a smoothing, but the authors should demonstrate that it has a key impact on the distinction of Excitatory vs. Inhibitory cells. What is special about the value of 90kHz for a signal recorded at 18kHz? What is the gain with spline enhancement compared to without? Does such a value depend on the sampling rate, or is it a global optimum found by the authors?

      We will clarify these points. Specifically, the value of 90kHz was chosen because it provided a reasonable temporal characterization of spikes; this value, however, can be adjusted within the software based on user preference.
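      For reference, going from 18 kHz to 90 kHz is a five-fold upsampling. The sketch below performs it with a natural cubic spline written directly in NumPy so that the arithmetic is visible. The natural boundary condition and the function name are assumptions (SpikeMAP may use a different spline variant); `scipy.interpolate.CubicSpline` with `bc_type="natural"` provides the same interpolant.

```python
import numpy as np


def natural_cubic_upsample(y, factor):
    """Upsample a uniformly sampled trace by an integer factor (natural cubic spline).

    Hypothetical helper. Works in units of the original sample index (spacing 1);
    requires at least two samples.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Second derivatives m at the knots; natural boundary: m[0] = m[-1] = 0.
    m = np.zeros(n)
    if n > 2:
        a = (np.diag(np.full(n - 2, 4.0))
             + np.diag(np.ones(n - 3), 1)
             + np.diag(np.ones(n - 3), -1))
        rhs = 6.0 * (y[2:] - 2.0 * y[1:-1] + y[:-2])
        m[1:-1] = np.linalg.solve(a, rhs)
    t = np.arange((n - 1) * factor + 1) / factor
    i = np.minimum(np.floor(t).astype(int), n - 2)  # left knot of each new point
    u = t - i        # distance from the left knot, in [0, 1]
    v = 1.0 - u      # distance to the right knot
    return ((m[i] * v**3 + m[i + 1] * u**3) / 6.0
            + (y[i] - m[i] / 6.0) * v + (y[i + 1] - m[i + 1] / 6.0) * u)


if __name__ == "__main__":
    fs_in, factor = 18_000, 5  # 18 kHz recording -> 90 kHz after upsampling
    t = np.arange(36) / fs_in
    trace = np.sin(2 * np.pi * 1_000 * t)  # toy 1 kHz waveform, 2 ms long
    upsampled = natural_cubic_upsample(trace, factor)
    print(len(trace), len(upsampled))  # 36 176
```

      The spline passes through every original sample, so it only refines the waveform shape between recorded points (for example, the timing of the trough); it cannot recover content above the original Nyquist frequency.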

      (4) Figure 2 is not really clear, especially panel B. The choice of the time scale for the B panel might not be the most appropriate, and the legend filtered/unfiltered with a dot is not clear to me in Bii.

      We will re-check Figure 2B, which appears to have a rendering error, likely introduced during conversion from its original format.

      In panel E, the authors are making two clusters with PCA projections on single waveforms. Does this mean that the PCA is only applied to the main waveforms, i.e. the ones obtained where the amplitudes are peaking the most? This is not really clear from the methods, but if this is the case, then this approach is a bit simplistic and does not really match state-of-the-art solutions. Spike waveforms are quite often, especially with such high-density arrays, covering multiple channels at once, and thus the extracellular patterns triggered by the single units on the MEA are spatio-temporal motifs occurring on several channels. This is why, in modern spike sorters, the information in a local neighbourhood is often kept to be projected, via PCA, on the lower-dimensional space before clustering. Information on a single channel only might not be informative enough to disambiguate sources. Can the authors comment on that, and what is the exact spatial resolution of the 3Brain device? The way the authors are performing the SVD should be clarified in the methods section. Is it on a single channel, and/or on multiple channels in a local neighbourhood?

      Here, the reviewer is suggesting that it may be better to perform PCA on several channels at once, since spikes can appear on several channels at the same time. To address this concern, we will write a small routine allowing users to choose how many nearby channels to include in the PCA.
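      A multi-channel version of the feature extraction could look like the sketch below: waveforms from all channels within a chosen radius of the detection channel are concatenated into one vector per spike, centred, and projected onto the leading right singular vectors. The function name, `radius_um`, and `n_components` are hypothetical parameters for illustration; the actual SpikeMAP routine has not yet been written.

```python
import numpy as np


def neighborhood_pca(snippets, center, positions, radius_um, n_components=3):
    """Project multi-channel spike snippets onto their leading principal components.

    Hypothetical helper. snippets: (n_spikes, n_channels, n_samples) waveforms cut
    around each spike; center: index of the detection channel; positions:
    (n_channels, 2) electrode coordinates in micrometres.
    Returns (features, neighbor_indices).
    """
    dist = np.linalg.norm(positions - positions[center], axis=1)
    neighbors = np.flatnonzero(dist <= radius_um)
    # Concatenate the neighbouring channels into one feature vector per spike.
    x = snippets[:, neighbors, :].reshape(len(snippets), -1)
    x = x - x.mean(axis=0)
    # Rows of vt are the principal axes of the centred data.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:n_components].T, neighbors
```

      Two units with identical waveforms on the detection channel but different amplitudes on its neighbours are indistinguishable with single-channel features, yet separate cleanly once the neighbours are included.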

      (5) About the isolation of the single units, here again, I think the manuscript lacks some technical details. The authors are saying that they are using a k-means cluster analysis with k=2. This means that the authors are explicitly looking for 2 clusters per electrode? If so, this is a really strong assumption that should not be held in the context of spike sorting, because, since it is a blind source separation technique, one cannot pre-determine in advance how many sources are present in the vicinity of a given electrode. While the illustration in Figure 2E is ok, there is no guarantee that one cannot find more clusters, so why this choice of k=2? Again, this is why most modern spike sorting pipelines do not rely on k-means, to avoid any hard-coded number of clusters. Can the authors comment on that?

      It is true that k=2 is a pre-determined choice in our software. In practice, we found that k>2 leads to poorly defined clusters. However, we will ensure that this parameter can be adjusted in the software. Furthermore, if the user chooses not to pre-define this value, we will provide the option to use a Calinski-Harabasz criterion to select k.
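      Selecting k with the Calinski-Harabasz criterion amounts to clustering the PCA features for several candidate values of k and keeping the value that maximizes the ratio of between-cluster to within-cluster dispersion. The NumPy sketch below is illustrative: the deterministic farthest-point initialization, the candidate range, and the function names are assumptions. Libraries such as scikit-learn provide `KMeans` and `calinski_harabasz_score` for production use.

```python
import numpy as np


def kmeans(x, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialisation (hypothetical helper)."""
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([np.sum((x - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = np.argmin(((x[:, None, :] - centers[None]) ** 2).sum(axis=2), axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def calinski_harabasz(x, labels):
    """Between-cluster dispersion / (k - 1) divided by within-cluster dispersion / (n - k)."""
    clusters = np.unique(labels)
    n, k = len(x), len(clusters)
    overall = x.mean(axis=0)
    between = within = 0.0
    for j in clusters:
        members = x[labels == j]
        c = members.mean(axis=0)
        between += len(members) * np.sum((c - overall) ** 2)
        within += np.sum((members - c) ** 2)
    return (between / (k - 1)) / (within / (n - k))


def choose_k(x, k_values=(2, 3, 4, 5)):
    """Return the k with the highest Calinski-Harabasz score, plus all scores."""
    scores = {k: calinski_harabasz(x, kmeans(x, k)) for k in k_values}
    return max(scores, key=scores.get), scores
```

      Because the score is undefined for k = 1, this procedure can only choose among two or more clusters; deciding whether a channel carries a single unit requires a separate test.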

      (6) I'm surprised by the linear decay of the maximal amplitude as a function of the distance from the soma, as shown in Figure 2H. Is it really what should be expected? Based on the properties of the extracellular media, shouldn't we expect a power law for the decay of the amplitude? This is strange that up to 100um away from the soma, the max amplitude only dropped from 260 to 240 uV. Can the authors comment on that? It would be interesting to plot that for all neurons recorded, in a normed manner V/max(V) as function of distances, to see what the curve looks like.

      We share the reviewer’s concern and will add results that include a population of neurons to assess the robustness of this phenomenon.
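      The population analysis the reviewer requests can be summarized by normalizing each neuron's channel amplitudes to its own maximum and averaging V/max(V) in distance bins. The sketch assumes per-neuron arrays of channel distances (from the estimated soma position) and peak amplitudes; the function name, bin edges, and input layout are hypothetical.

```python
import numpy as np


def normalized_decay(distances_um, amplitudes_uv, bin_edges_um):
    """Average V/max(V) across neurons in distance bins (hypothetical helper).

    distances_um, amplitudes_uv: lists with one 1-D array per neuron giving each
    channel's distance from the estimated soma and its peak amplitude.
    Bins with no channels are returned as NaN.
    """
    d = np.concatenate(distances_um)
    v = np.concatenate([np.abs(a) / np.abs(a).max() for a in amplitudes_uv])
    idx = np.digitize(d, bin_edges_um) - 1  # bin b covers [edges[b], edges[b + 1])
    means = np.full(len(bin_edges_um) - 1, np.nan)
    for b in range(len(means)):
        sel = idx == b
        if sel.any():
            means[b] = v[sel].mean()
    return means
```

      Plotting the binned means against distance, alongside linear and power-law fits, would show whether the near-linear decay in Figure 2H holds across the population.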

      (7) In Figure 3A, it seems that the total number of cells is rather low for such a large number of electrodes. What are the quality criteria that are used to keep these cells? Did the authors exclude some cells from the analysis, and if yes, what are the quality criteria that are used to keep cells? If no criteria are used (because none are mentioned in the Methods), then how come so few cells are detected, and can the authors convince us that these neurons are indeed "clean" units (RPVs, SNRs, ...)?

      We applied stringent criteria to exclude cells, and we will revise the main text to be clear about these criteria, which include a minimum spike rate and the use of LDA to separate out PCA clusters. For the cells that were retained, we will include SNR estimates.
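      SNR can be defined in several ways. One common definition divides the peak-to-peak amplitude of a unit's mean waveform by a robust estimate of the channel noise, the median absolute value of the trace scaled by 1/0.6745. The sketch below assumes that definition and a zero-centred, band-pass-filtered trace; the function name is hypothetical, and the response does not state which SNR definition will be reported.

```python
import numpy as np


def unit_snr(waveforms, trace):
    """Peak-to-peak amplitude of a unit's mean waveform over the channel noise SD.

    Hypothetical helper. waveforms: (n_spikes, n_samples) snippets from the
    unit's main channel; trace: band-pass-filtered, zero-centred voltage trace
    of the same channel. Noise SD is estimated as median(|trace|) / 0.6745,
    which is robust to the spikes themselves.
    """
    template = np.mean(np.asarray(waveforms, dtype=float), axis=0)
    noise_sd = np.median(np.abs(np.asarray(trace, dtype=float))) / 0.6745
    return float((template.max() - template.min()) / noise_sd)
```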

      (8) Still in Figure 3A, it looks like there is a bias to find inhibitory cells at the borders, since they do not appear to be uniformly distributed over the MEA. Can the authors comment on that? What would be the explanation for such a behaviour? It would be interesting to see some macroscopic quantities on Excitatory/Inhibitory cells, such as mean firing rates, averaged SNRs... Because again, in Figure 3C, it is not clear to me that the firing rates of inhibitory cells are higher than Excitatory ones, whilst they should be in theory.       

      We will include a comparison of firing rates for E and I neurons. It is possible that I cells are located at the border of the MEA due to the site of injections of the viral vector, and not because of an anatomical clustering of I cells per se. We will clarify the text accordingly.

      (9) For Figure 3 in general, I would have performed an exhaustive comparison of putative cells found by spikeMAP and other sorters. More precisely, I think that to prove the point that spikeMAP is indeed bringing something new to the field of spike sorting, the authors should have compared the performances of various spike sorters to discriminate Exc vs Inh cells based on their ground truth recordings. For example, either using Kilosort [Pachitariu et al, 2024, 10.1038/s41592-024-02232-7], or some other sorters that might be working with such large high-density data [Yger et al, 2018, 10.7554/eLife.34518].

      As mentioned previously, Kilosort and related approaches do not address the problem of E/I identification (see Table 1). However, they do have pre-processing steps in common with SpikeMAP. We will add some specific comparison points – for instance, the use of k-means and PCA (which is more common across packages) and the use of cubic spline interpolation (which is less common). Further, we will provide a stand-alone E/I sorting module that can be added to the pipeline of other packages, so that users can use this functionality without having to migrate their entire analysis.

      (10) Figure 4 has a big issue, and I guess the panels A and B should be redrawn. I don't understand what the red rectangle is displaying.

      We apologize for this issue. It seems there was a rendering problem when converting the figure from its original format. We will address this issue in the revised version of the manuscript.

      (11) I understand that Figure 4 is only one example, but I have a hard time understanding from the manuscript how many slices/mice were used to obtain the GT data? I guess the manuscript could be enhanced by turning the data into an open-access dataset, but then some clarification is needed. How many flashes/animals/slices are we talking about? Maybe this should be illustrated in Figure 4, if this figure is devoted to the introduction of the GT data.

      We will mention how many flashes/animals/slices were employed in the GT data and provide open access to these data.

      (12) While there is no doubt that GT data as the ones recorded here by the authors are the most interesting data from a validation point of view, the pretty low yield of such experiments should not discourage the use of artificially generated recordings such as the ones made in [Buccino et al, 2020, 10.1007/s12021-020-09467-7] or even recently in [Laquitaine et al, 2024, 10.1101/2024.12.04.626805v1]. In these papers, the authors have putative waveforms/firing rate patterns for excitatory and inhibitory cells, and thus, the authors could test how good they are in discriminating the two subtypes.

      We thank the reviewer for the suggestion that SpikeMAP could be tested on artificially generated spike trains and will add the citation of the two papers mentioned. We hope future efforts will employ SpikeMAP on both synthetic and experimental data to explore the neural dynamics of E and I neurons in healthy and pathological circuits of the brain.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Concerning the grounding in experimental phenomenology, it would be beneficial to identify specific experiments to strengthen the model. In particular, what evidence supports reversible beta cell inactivation? This could potentially be tested in mice, for instance, by using an inducible beta cell reporter, treating the animals with high glucose levels, and then measuring the phenotype of the marked cells. Such experiments, if they exist, would make the motivation for the model more compelling.

      There is some direct evidence of reversible beta cell inactivation in rodent / in vitro models. We had already mentioned this in the discussion, but we have added some text emphasizing / clarifying the role of this evidence (lines 359–362).

      Others have also argued that some analyses of insulin treatment in conventional T2D, which has a stronger effect in patients with higher glucose before treatment, provide indirect evidence of reversal of glucotoxicity. We have also mentioned this in the revised paper (lines 284–285).

      For quantitative experiments, the authors should be more specific about the features of beta cell dysfunction in KPD. Does the dysfunction manifest in fasting glucose, glycemic responses, or both? Is there a ”pre-KPD” condition? What is known about the disease’s timescale?

      The answers to some of these questions are not entirely clear: patients present with very high glucose, and thus must be treated immediately. Due to a lack of antecedent data, it is not entirely clear what the pre-KPD condition is, but there is some evidence that KPD is at least not preceded by diabetes symptoms. This point is already noted in the introduction of the paper and Table 1. However, we have added a short note clarifying that this does not rule out mild hyperglycemia, as in prediabetes (and indeed, as our model might predict) (lines 76–77). Similarly, due to the necessity of immediate insulin treatment, it is not clear from existing data whether the disorder manifests more strongly in fasting glucose or in glucose response, although it likely manifests in both. (We might infer this since continuous insulin treatment does not produce fasting hypoglycemia, and the complete lack of insulin response to glucose shortly after presentation should produce a strong effect on glycemic response.) We believe our existing description of KPD lists all of the relevant timescales; however, we have also slightly clarified this description in response to the first referee’s comments (lines 66–73, 83).

      The authors should also consider whether their model could apply to other conditions besides KPD. For example, the phenomenology seems similar to the ”honeymoon” phase of T1D. Making a strong case for the model in this scenario would be fascinating.

      This is an excellent idea, which had not occurred to us. We have briefly discussed this possibility in the discussion of remission (lines 281–291), but plan to analyze it in more detail in a future manuscript.

      Reviewer #1 (Recommendations for the author):

      Whenever simulation results are presented, parameter values should be specified right there in the figure captions.

      We have added the values of glucotoxicity parameters to the caption of Figure 2. In other figures, we have explicitly mentioned which panel of Figure 2 the parameters are taken from. Describing the non-glucotoxicity parameters is a bit cumbersome (there are many of them, and our model of fast dynamics differs slightly from that of Topp et al., so it does not suffice to simply say we took their parameters), so we have referred the reader to the Materials and Methods for those.

      I was confused by the language in Figure 4. Could the authors clarify whether they argue that: (1) the observed KPD behaviour is the result of the system switching from one stable state to another when perturbed with high glucose intake? (2) the observed KPD behaviour is the result of one of the steady states disappearing with high glucose intake?

      What we mean is that during a period of high sugar intake or exogenous insulin treatment, one of the fixed points is temporarily removed: it is still a fixed point of the “normal” dynamics, but not a fixed point of the dynamics with the external condition added. When glucose (insulin) intake is high enough, only the low (high)-β fixed point is present, so under either condition the dynamics flow toward that fixed point. When the external influx of glucose or insulin is turned off, both fixed points are present again; but if the dynamics have moved sufficiently far during the external forcing, the system will have switched from one fixed point to the other. We have edited the text to make this clearer (lines 153–185). Note, however, that in response to both referees’ comments (see below), Figures 3 and 4 have been replaced with more illuminating ones. This specific point is now addressed by the new Figure 3.
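The fixed-point switching described in this response can be illustrated with a generic bistable toy model, dx/dt = x - x^3 + u, where u is a constant external forcing. This is not the authors' β–G model. The variable x, the forcing u, the pulse strength and the durations are all hypothetical. With u = 0 there are two stable fixed points (x = -1 and x = 1) separated by an unstable one at x = 0. Once u exceeds 2/(3√3) ≈ 0.385, the lower fixed point disappears. A long enough pulse therefore carries the state to the upper branch, where it stays after the pulse ends, while a short pulse does not.

```python
import numpy as np


def rhs(x, u):
    """Generic bistable toy dynamics: dx/dt = x - x**3 + u (hypothetical)."""
    return x - x**3 + u


def fixed_points(u):
    """Real roots of x - x**3 + u = 0, sorted ascending."""
    roots = np.roots([-1.0, 0.0, 1.0, u])
    return np.sort(roots[np.abs(roots.imag) < 1e-9].real)


def simulate(x0, pulse_u, pulse_len, t_end, dt=1e-2):
    """Forward-Euler run with forcing u = pulse_u for t < pulse_len, else u = 0."""
    x = x0
    for i in range(int(round(t_end / dt))):
        u = pulse_u if i * dt < pulse_len else 0.0
        x += dt * rhs(x, u)
    return x


# Without forcing: three fixed points, -1 (stable), 0 (unstable), 1 (stable).
print(fixed_points(0.0))
# With strong forcing only the upper fixed point survives.
print(fixed_points(1.0))

# Start in the low state; a long pulse switches it, a short one does not.
print(simulate(-1.0, pulse_u=1.0, pulse_len=10.0, t_end=30.0))  # ends near +1
print(simulate(-1.0, pulse_u=1.0, pulse_len=0.5, t_end=30.0))   # returns near -1
```

Evaluating `fixed_points` over a grid of u values gives the corresponding bifurcation diagram, which is the kind of plot the revised figures use in place of dβ/dt curves.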

      The adaptation of the prefactor ’c’ was confusing to me. I think I understood it in the end, but it sounded like, ”here’s a complication, but we don’t explain it because it doesn’t really matter”. I think the authors can explain this better (or potentially leave out the complication with ’c’ altogether?).

      Indeed, the existence of an adaptation mechanism is important for our overall picture of diabetes pathogenesis, but not for many of our analyses, which assume prediabetes. Nonetheless, we agree that the current explanation of its role is confusing because of its vagueness. We have elaborated on the type of dynamics we assume for c, adding an equation for its dynamics to the “Model” section of the Materials and methods, explained in lines 456–465. We have also amended Figure 1 to note this compensation.

      I expect the main impact of this work will be to get clinical practitioners and biomedical researchers interested in the intermediate timescale dynamics of β-cells and take seriously the possibility that reversible inactive states might exist. But this impact will only be achieved when the results are clearly and easily understandable by an audience that is not familiar with mathematical modelling. I personally found it difficult to understand what I was supposed to see in the figures at first glance. Yes, the subtle points are indeed explained in the figure captions, but it might be advantageous to make the points visually so clear that a caption is barely needed. For example, when claiming that a change in parameters leads to bistability, why not plot the steady state values as a function of that parameter instead of showing curves from which one has to infer a steady state?

      I would advise the authors to reconsider their visual presentation by, e.g., presenting the figures to clinical practitioners or biomedical researchers with just a caption title to test whether such an audience can decipher the point of the figure! This is of course merely a personal suggestion that the authors may decide to ignore. I am making this suggestion only because I believe in the quality of this work and that improving the clarity of the figures and the ease with which one can understand the main points would potentially lead to a much larger impact of the presented results.

      This is a very good point. We have made several changes. Firstly, we have added smaller panels showing the dynamics of β to Figure 2; previously, the reader had to infer what was happening to β from G(t). Secondly, we have completely replaced the two figures showing dβ/dt, and requiring the reader to infer the fixed points of β, with bifurcation diagrams that simply show the fixed points of G and β. The new figures show through bifurcation diagrams how there are multiple fixed points in KPD, how glucose or insulin treatment force the switching of fixed points, and how the presence of bistability depends on the rate of glucotoxicity. (These new figures are Fig. 3–5 in the revised manuscript.)

      Could the authors explicitly point out what could be learned from their work for the clinic? At the moment treatment consists of giving insulin to patients. If I understand correctly, nothing about the current treatment would change if the model is correct. Is there maybe something more subtle that could be relevant to devising an optimal treatment for KPD patients?

      This is another very good point. We have added a new figure (Fig. 7) in our results section showing how this model, or one like it, can be analyzed to suggest an insulin treatment schedule (once parameters for an individual patient can be measured), and added some discussion of this point (lines 224–240) as well as lifestyle changes our model might suggest for KPD patients to the discussion (lines 413–425).

      Similarly, could the authors explicitly point out how their model could be experimentally tested? For example, are the functions f(G) and g(G) experimentally accessible? Related to that, presumably the shape of those functions matters to reproduce the observed behaviour. Could the authors comment on that / analyze how reproducing the observed behaviour puts constraints on the shape of the used functions and chosen parameter values?

      g(G) has not been carefully measured in cellular data; however, it could be measured in more quantitative versions of existing experiments. Further, our model does require some general features of the forms of f(G) and g(G) to produce KPD-like phenomena. We have added a comment on this to the discussion section of the revised manuscript (lines 367–372).

      Could the authors explicitly spell out which parameters they think differ between individual KPD patients, and which parameters differ between KPD patients and ’regular’ type 2 diabetics?

      In general, we expect all parameters to vary both among KPD patients and between KPD and “conventional” T2D. The primary parameter determining whether KPD or conventional T2D is seen, however, is the ratio kIN/kRE. We have elaborated on both points in the revised manuscript (lines 186–192, 250–257).

      I was confused about the timescale of remission. At one point the authors write “KPD patients can often achieve partial remission: after a few weeks or months of treatment with insulin” but later the authors state that “the duration of the remission varies from 6 months to 10 years”.

      The former is the typical time needed to achieve remission. Once remission is reached, however, it may or may not last: patients may experience a relapse, in which their condition worsens and they again require insulin. We have edited the text to clarify this distinction (lines 66–73).

      When the authors talk about intermediate timescales in the main text could they specify an actual unit of time, such as days, weeks, or months as it would relate to the rate constants in their model for those transitions?

      We have done so (lines 86–87, figure 1 caption, figure 2 caption). Getting KPD-like behavior requires (at high glucose) the deactivation process to be somewhat faster than the reactivation process, so the relevant scales are between weeks (reactivation) and days (deactivation at high G).

      The authors state “Our simple model of β-cell adaptation also neglects the known hyperglycemia-induced leftward shift in the insulin secretion curve f(G) in Eq. (2)”. This seems an important consideration. Could the authors comment on why they did not model this shift, and/or explicitly discuss how including it is expected to change the model dynamics?

      We agree that this process seems potentially relevant, as it seems to happen on a relatively fast timescale compared to glucose-induced β-cell death. It is, however, not so well characterized quantitatively that including it is a simple matter of putting in known values—we would be making assumptions that would complicate the interpretation of our results.

      It is clear that this effect will need to be considered when quantitatively modelling real patient data. However, it is also straightforward to argue that this effect by itself cannot produce KPD-like symptoms, and will only tend to reduce the rate of glucotoxicity necessary to produce bistability. We have added a discussion of this in the revision (lines 307–315). More generally, we have also expanded the discussion of the effect that each neglected detail we mention is expected to have (lines 292–315).

      The authors end with a statement that their results may “contribute to explanation of other observations that involve rapid onset or remission of diabetes-like phenomena, such as during pregnancy or for patients on very low calorie diets.” Could the authors spell out exactly how their model potentially relates to these phenomena?

      Our thinking is that, even when another direct cause, such as loss of insulin resistance, is implicated in reversal of diabetes, some portion of the effect may be explained by reversal of glucotoxicity. This is indeed at this point just a hypothesis, but we have expanded on it briefly in the revision. (Lines 281–291.)

      Minor typos:

      In Figure 2.D the last zero of 200 on the axis was cut off.

      Line 359 - there is a missing word ”in the analysis”.

      We have fixed these typos, thanks.

      Reviewer #2 (Recommendations for the author):

      The manuscript could be significantly improved in two key areas: the presentation of the analysis, and the relation with experimental phenomenology.

      Regarding the analysis presentation, the figures could be substantially enhanced with minimal effort from the authors. At present, they are sparse, lack legends, and offer only basic analysis. The authors should consider presenting, for example, a bifurcation diagram for beta cell mass and fasting glucose levels as a function of kIN, and how insulin sensitivity and average meal intake modulate this relationship. The goal should be to present clear, testable predictions in an intuitive manner. Currently, the specific testable predictions of the model are unclear.

      The response to this question is copied from the responses to related questions from the first referee.

      This is a very good point. We have made several changes. Firstly, we have added smaller panels showing the dynamics of β to Figure 2; previously, the reader had to infer what was happening to β from G(t). Secondly, we have completely replaced the two figures showing dβ/dt, and requiring the reader to infer the fixed points of β, with bifurcation diagrams that simply show the fixed points of G and β. The new figures show through bifurcation diagrams how there are multiple fixed points in KPD, how glucose or insulin treatment force the switching of fixed points, and how the presence of bistability depends on the rate of glucotoxicity. We have also supplemented our phase diagram that shows the effects of SI and the total beta cell population with bifurcation diagrams showing β as SI and βTOT are varied. (These new figures are Fig. 3–5 in the present manuscript.) Finally, we have added another figure analyzing the model’s predictions for the optimal insulin treatment and the resulting time needed to achieve remission (Fig. 7).

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Firstly, we would like to thank the reviewers for their time and effort in critiquing this paper. The reviewers considered our study significant, and also offered valuable suggestions to improve our manuscript, mainly the comparison of mRNA and eRNA for predicting subtype specificity and prognosis, and the integration of independent validation datasets. Our preliminary analyses showed that our classified mRNAs predict subtypes better, which is not surprising, as these subtypes were originally defined using mRNA differences. Hence, we employed a novel approach of associating the classified mRNAs and eRNAs by genomic distance and found that 71% of classified eRNAs are associated with classified mRNAs. We also propose to integrate the datasets with PEGS (Briggs et al 2021) to achieve better mRNA-eRNA association, and with Perturb-seq validated regions to achieve functional validation of the eRNA loci. We believe these improved integrative analyses will strengthen the novelty and power of our findings, as this is the first time such an approach has been applied to a high-resolution eRNA atlas derived from patient samples. We have addressed most of the other major and minor comments of the reviewers and have provided the preliminary revised manuscript.

      Reviewer #1

      Evidence, reproducibility and clarity

      Summary

      This study assesses eRNA activity as a classifier of different subtypes of breast cancer and as a prognosis tool. The authors take advantage of previously published RNA-seq data from human breast cancer samples and assess it more deeply, considering the cancer subtype of the patient. They then apply two machine learning approaches to find which eRNAs can classify the different breast cancer subtypes. While they do not find any eRNA that helps distinguish ductal vs. lobular breast cancers, their approach helps identify eRNAs that distinguish luminal A, B, basal and Her2+ cancers. They also use motif enrichment analysis and ChIP-seq datasets to characterize the eRNA regions further. Through this analysis, they observe that those eRNAs where ER binds strongest are associated with a poor patient prognosis.

      Major comments:

      Part of the rationale for this study is the previous observation that eRNAs are less associated with the prognosis of breast cancer patients in comparison to mRNAs and they claim that the high heterogeneity between breast cancer subtypes would mask the importance of eRNAs. In this study, the authors solely focus on eRNAs as a classification of breast cancer subtypes and prognostic tool and do not answer whether eRNAs or mRNAs are a better predictor of cancer subtypes and of prognosis. Since the answer and the tools are already in their hands, it would be important to also see a comparative analysis where they assess which of the two (mRNAs or eRNAs) is a better predictor.

      Response: We thank the reviewer for this valid point about comparing prognostic eRNAs and mRNAs. Our study does not imply that eRNA markers are better than mRNAs at predicting subtype specificity and/or prognosis; rather, our motivation for working with eRNAs is that, when analysed by subtype, they can be used to define relevant transcriptional regulators and prognosis. As the molecular subtypes of breast cancer were established using gene expression datasets, mRNAs would be expected to perform better as predictors of subtype and/or prognosis. However, identifying regulatory networks with an emphasis on transcription factor binding motif analyses is not achievable using mRNA datasets. Analysing active enhancer regions with eRNA transcription provides a high-resolution landscape of TF and epigenetic networks. Such analyses usually require ATAC-seq or H3K27ac datasets, but these assays need fresh frozen tissue and more laborious experimental designs than RNA-seq. Furthermore, eRNA-transcribing enhancers represent highly active enhancers, while ATAC and H3K27ac datasets identify all enhancers, including inactive or poised ones that are captured because of the dynamic nature of enhancers. We demonstrate that conventional RNA-seq datasets mapped onto active enhancer regions showing eRNA transcription are sufficient to identify the highly active TF network and gene-enhancer regulatory frameworks in a subtype-specific manner, highlighting the potential of eRNA studies.

      Hence, the scope of our study is not to establish which RNA type better predicts subtype and survival, but to demonstrate the potential of studying eRNAs in patient samples using conventional RNA-seq assays. This study should help epigenetics researchers understand how enhancer transcription can be associated with gene regulation through deregulated transcription factor networks in patients. The above text has been included in the discussion of the revised manuscript.

      As the comparative analyses suggested by the reviewer will substantiate the potential of eRNAs as cancer prognostic markers, we applied identical machine learning approaches to the published TCGA mRNA-seq datasets, identified subtype-specific and prognostic mRNAs, and compared them with the eRNAs. As expected, mRNAs perform better than eRNAs at capturing subtype specificity: we identified more subtype-specific mRNAs, with better statistical metrics. The results show clear separation across subtypes (Basal, Her2, LumA/B) as well as Ductal vs Lobular.

      We believe that eRNAs and mRNAs are complementary rather than competing predictors of subtype-specific survival. To address this in the revised manuscript, and following the suggestion of reviewer 3, we performed an initial selection of eRNAs located within 50 kb of their corresponding subtype-specific mRNAs, which can be integrated with the above analyses. In our preliminary analysis, around 71% of eRNAs are associated with subtype-specific mRNAs, and we also observed a separation of ductal and lobular subtypes using this method.
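The 50 kb association step can be sketched as a windowed nearest-gene search. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes that distance is measured from the eRNA midpoint to the gene TSS and that every gene within the window is kept. The function names and toy coordinates are hypothetical. Per-chromosome TSS lists are sorted once so that each eRNA is resolved with two binary searches instead of a scan over all genes.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def associate_within(erna_loci, gene_tss, max_dist=50_000):
    """Pair each eRNA locus with every gene whose TSS lies within max_dist bp.

    erna_loci: iterable of (erna_id, chrom, start, end)
    gene_tss:  iterable of (gene_id, chrom, tss)
    Distance is measured from the eRNA midpoint to the TSS (an assumption).
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, tss in gene_tss:
        by_chrom[chrom].append((tss, gene_id))
    for genes in by_chrom.values():
        genes.sort()
    positions = {chrom: [tss for tss, _ in genes] for chrom, genes in by_chrom.items()}

    pairs = []
    for erna_id, chrom, start, end in erna_loci:
        mid = (start + end) // 2
        pos = positions.get(chrom, [])
        lo = bisect_left(pos, mid - max_dist)   # first TSS inside the window
        hi = bisect_right(pos, mid + max_dist)  # one past the last TSS inside it
        for tss, gene_id in by_chrom.get(chrom, [])[lo:hi]:
            pairs.append((erna_id, gene_id, abs(tss - mid)))
    return pairs


def fraction_associated(erna_loci, pairs):
    """Share of eRNA loci linked to at least one gene."""
    linked = {erna_id for erna_id, _, _ in pairs}
    return len(linked) / len(erna_loci)


# Hypothetical toy coordinates.
ernas = [("e1", "chr1", 10_000, 10_500),
         ("e2", "chr1", 200_000, 200_400),
         ("e3", "chr2", 5_000, 5_200)]
genes = [("A", "chr1", 55_000), ("B", "chr1", 300_000), ("C", "chr2", 40_000)]
pairs = associate_within(ernas, genes)
print(pairs)  # [('e1', 'A', 44750), ('e3', 'C', 34900)]
print(fraction_associated(ernas, pairs))  # 2 of 3 loci linked
```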

      Furthermore, we integrated our enhancer RNAs with key enhancer regions that have a significant impact on gene transcription in single-cell CRISPRi screen (Perturb-seq) datasets, derived from ATAC-matched H3K27ac datasets and verified in one ER+ and one ER- breast cancer cell line (Wang et al., Genome Biology 2025, https://genomebiology.biomedcentral.com/articles/10.1186/s13059-025-03474-0). Our initial analyses identified at least 29 regions from the Perturb-seq datasets overlapping 72 and 5 eRNAs from the subtype-classification and Her2-survival sets, respectively.

      For the revised manuscript, we will perform the mRNA-eRNA association in detail and include the data. We will also employ our well-established tool for associating mRNAs and noncoding elements, Peak set Enrichment in Gene Sets (PEGS, Briggs et al., F1000 research, 2021 https://f1000research.com/articles/10-570/v2). We hypothesise that this will improve the power of the classification models used in the study and will also provide a gene-enhancer RNA interaction landscape in patient samples for the first time. Furthermore, we will integrate the activity of these eRNA-mRNA pairs with chromatin accessibility and enhancer activity from ATAC-seq and H3K27ac ChIP-seq datasets to establish more robust active regulatory networks in patient samples. We will also perform motif analyses on the published ATAC-seq peaks (performed on TCGA-BRCA patient samples, Corces et al., 2018) close to the eRNA loci to identify TF networks with better precision, which may reveal novel and relevant subtype-specific TFs more efficiently than our original analysis. As an experimental functional validation of our classified eRNAs, we will investigate the regulatory effect of the 29 Perturb-seq-overlapping regions. Hence, our revised manuscript will potentially provide a comprehensive, validated list of highly active, actively transcribed enhancer RNA regions and their subtype- and survival-specific regulatory networks in breast cancer patients.

      The authors run the umaps of Fig. 1C only taking the predictor eRNAs. It is then somewhat expected to observe a separation. Coming from a single-cell omics field, what I would suggest is to take the eRNA loci and compute a umap with the highly variable regions, perform clustering on it and assess how the cancer subtypes are structured within the data. This would give a first overview of how much segregation and structure one can have with this data. Having a first step of data exploration would also strengthen the paper. If the authors have tried it, could the authors comment on it?

      Response: We thank the reviewer for sharing their experience with single-cell omics analysis. In our case, following an scRNA-seq-like pipeline is not appropriate, given that our study focuses on identifying markers for already annotated subtypes. We aim to assess the quality of the identified markers (quantified by the random forest classification statistics), and the data are well separated in PCA using only PC1 and PC2. We showed the UMAP (computed on PC1 and PC2) for better visualization in the original manuscript, and we have included the PCA plots in the revised manuscript.

      'neither measures could classify any distinct eRNAs for invasive ductal vs lobular cancer samples' S1B. Just by eye, I can see a potential enrichment of ductal on the left and on the right while lobular stays in the center. This suggests to me that, while perhaps each eRNA alone does not have the power to classify the lobular vs ductal subtype, perhaps there is a difference - which could result from a cooperative model of eRNA influence - that would need further exploration. Would a PCA also show enrichments of ductal vs. lobular in specific parts of the plot? It may be worth exploring the PC loadings to see which eRNAs could play an influence. In this regard, a more unbiased visual examination, as suggested in my previous point, could help clarify whether there could be an association of certain eRNAs that cannot be captured by ML.

      Response: The cancer subtypes (Basal, Her2, LumA/B) show clear differences in mRNA expression in breast cancer studies. Given the fixed subtype annotations in the patient datasets, we applied our methodologies to mRNA datasets, and the results showed clear separation across subtypes (Basal, Her2, LumA/B) as well as Ductal vs Lobular. In addition, 70% of subtype-specific eRNAs are located next to mRNAs, which supports that we detected genuine eRNA markers. Furthermore, Random Forest is a standard and powerful non-linear classifier for this type of classification problem. We therefore hypothesised that information distinguishing Ductal from Lobular is not present in the eRNA dataset used. Using information gain with the standard cutoff of 0.05, we detected only 38 mRNAs with classifying power for ductal vs lobular, and with this cutoff only one eRNA-associated gene was detected. To explore further, we used a lower information gain cutoff (0.01) and kept only the eRNAs located near (up to 50 kb from) classified mRNAs. In this way, we detected 96 eRNA candidates linked to 8 classified mRNAs. These 96 eRNAs could, to some extent, classify ductal vs lobular (PCA plots attached above). This observation suggests that with a more comprehensive eRNA dataset, we could detect better eRNA markers and cover more (probably all) mRNA markers. Hence, the cooperative eRNA model suggested by the reviewer could not be detected in the current data; Random Forest, as a non-linear classifier, is one of the efficient tools to capture such cooperation if it exists. Moreover, as we demonstrate in this paper, eRNA data are complementary to mRNA data and can assist in the identification of regulatory networks.

      For the revision, we will provide more detailed eRNA-mRNA associations using integration with PEGS and Perturb-seq validated regions, for both subtype classification and survival, and will motivate similar studies of ductal vs lobular in the discussion.
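For readers unfamiliar with information-gain filtering, the following minimal sketch shows what a cutoff such as 0.05 or 0.01 is applied to. IG = H(Y) - H(Y | X) measures how much knowing a feature reduces uncertainty about the class label. The response does not state how continuous expression values were discretized. This sketch assumes a simple median split, and the function names and toy data are hypothetical. Tools such as Weka's information-gain evaluator use their own discretization, so their values will generally differ.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (bits) of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(feature, labels):
    """IG of one feature after a median split (an assumed discretization)."""
    feature = np.asarray(feature, dtype=float)
    labels = np.asarray(labels)
    high = feature > np.median(feature)
    conditional = 0.0
    for mask in (high, ~high):
        if mask.any():
            # Weight each branch's entropy by the fraction of samples in it.
            conditional += mask.mean() * entropy(labels[mask])
    return entropy(labels) - conditional


def select_features(X, labels, names, cutoff=0.05):
    """Keep features (columns of X) whose information gain reaches the cutoff."""
    return [name for name, column in zip(names, np.asarray(X).T)
            if information_gain(column, labels) >= cutoff]


labels = ["ductal"] * 4 + ["lobular"] * 4
X = np.column_stack([
    [1, 2, 3, 4, 10, 11, 12, 13],   # separates the two classes
    [1, 10, 2, 11, 3, 12, 4, 13],   # carries no class information
])
print(select_features(X, labels, ["informative", "uninformative"]))  # ['informative']
```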

      "we employed machine learning approaches on 302,951 eRNA loci identified from RNA-seq datasets from 1,095 breast cancer patient samples from previous studies" - the previous studies from which the authors take the data [11,12] highlight the presence of ~60K enhancers in the human genome and they use less than that in their analysis. Could the authors please clarify the differences in numbers with previous studies and give a reasoning?

      Response: The ~300K enhancers are derived from ENCODE H3K27ac datasets, which represent all active enhancer regions marked by H3K27ac (Hnisz et al., 2013). This is the highest-resolution map of eRNA loci presented to date. In Chen et al 2020, 1,531 superenhancers representing 30K eRNA loci were used for exploratory analysis, and the findings were generalised back to the 300K set. The 65K enhancer loci cover tissue-specific enhancers initially identified from FANTOM CAGE datasets, and this subset covers only limited regions of eRNA expression. Hence, our analyses of ~300K eRNA loci provide unbiased information on subtype specificity and gene-TF regulatory networks. These differences have been highlighted in the methods and results of the revised manuscript.

      Also, from the methods section, they discard many patient samples due to low QC, so, from what I understand, the number of samples analyzed in the end is 975 and not 1,095.

      Response: We thank the reviewer for pointing this out and we have updated the numbers in the revised manuscript.

      Minor comments:

      Can the authors please state the parameters of the umap in methods? Although it could be intrinsic to the dataset, data points are grouped in a way that makes me think that the granularity is too forced. Could the authors please show how the umap would behave with more lenient parameters? Or even with PCA?

      Response: We used the ‘umap’ function from the umap package in R (with default parameters) on PC1 and PC2 only, hence the granularity is not forced. As suggested by the reviewer, we have now added PCA plots to the main figures (Fig. 1E) and moved all the UMAP plots to the Supplementary figures (Fig. S1B) in the revised manuscript.

      'Majority of the basal' -> The majority of the basal.

      Response: We thank the reviewer for noticing the typo; we have corrected it in the revised manuscript.

      Significance

      This is a paper relevant in the cancer field, particularly for breast cancer research. The significance of the paper lies in digging into the breast cancer samples, taking the different existing subtypes into account to assess the contribution of eRNAs as a classifier and as a prognostic tool. The data is already available but it has not been studied to this degree of detail. It highlights the importance of characterizing cancer samples in more depth, considering its intrinsic heterogeneity, as averaging across different subtypes would mask biology. My expertise lies in gene regulation and single-cell omics. My contribution will therefore be more focused on the analysis and extraction of biological information. The extent of its specific relevance in cancer research falls beyond my expertise.

      Response: We thank the reviewer for recognising our efforts to highlight the importance of subtyping and to explore the role of eRNAs in breast cancer transcriptional gene regulatory networks.

      Reviewer #2

      Evidence, reproducibility and clarity

      Summary

      Enhancer RNAs (eRNAs) are early indicators of transcription factor (TF) activity and can identify distinct molecular subtypes and pathological outcomes in breast cancer. In this study, Patel et al. analysed 302,951 polyadenylated eRNA loci from 1,095 breast cancer patients using RNA-seq data, applying machine learning (ML) to classify eRNAs associated with specific molecular subtypes and survival. They discovered subtype-specific eRNAs that implicate both established and novel regulatory pathways and TFs, as well as prognostic eRNAs (specifically, for LumA and HER2 survival) that distinguish favorable from poor survival outcomes. Overall, this ML-based approach illustrates how eRNAs reveal the molecular grammar and pathological implications underlying breast cancer heterogeneity.

      Major comments

      1. The authors define 302,951 eRNA loci based on RNA-seq data, yet it is widely known that many enhancers reside in proximity to promoters or within intronic regions (examples presented in Fig. 3B and S3). Consequently, it seems likely that reads mapped to these regions might not truly represent eRNA signals but include mRNA contamination. Could the authors clarify how they ensured that the identified eRNAs were not confounded by mRNA reads? What fraction of these enhancer loci is promoter proximal or intronic? How does H3K4me3, a well-established and standardized active promoter histone mark, behave on these loci? The reviewer considers it important to confirm that the identified eRNAs are indeed of enhancer origin rather than promoter transcripts.

      Response: For this study, we used the published Pan-Cancer Atlas eRNA resource (Chen et al., 2018 and 2020), in which RNA signals in intronic and intergenic regions are included and promoter-associated signals are excluded. These studies exploit a large sample size to identify eRNAs. It is therefore very unlikely that mRNA reads in introns would recurrently mark the same loci across thousands of patient samples. The Introduction of these studies addresses this concern as follows: “because eRNA reads associated with real enhancer activity recurrently accumulate, whereas background transcription noise tends to occur stochastically. The large number of RNA-seq reads obtained would compensate for the statistical power compromised by the low eRNA expression level typically observed in a single sample.” We have added a clarification of this point to the Discussion. Furthermore, as suggested by the reviewer, we examined the distribution of the eRNA loci across the genome. The majority of eRNA regions are located in introns and intergenic regions. This figure has been included as Supplementary Fig. S6A.
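      The reviewer's question about what fraction of eRNA loci are promoter-proximal, intronic or intergenic comes down to an interval-annotation step. The sketch below is a simplified illustration under stated assumptions and is not the pipeline behind Fig. S6A. Genes are assumed to be (chromosome, start, end, strand) tuples, and the ±1 kb promoter window around the TSS is a hypothetical choice. Because exon coordinates are not modelled, any locus inside a gene body is labelled "genic" as a stand-in for intronic.

```python
from collections import Counter


def annotate_locus(chrom, center, genes, promoter_window=1000):
    """Assign an eRNA locus centre to a simplified genomic category.

    genes: list of (chrom, start, end, strand) tuples (hypothetical input format).
    promoter_window: assumed distance in bp from a TSS counted as promoter-proximal.
    """
    in_gene_body = False
    for g_chrom, start, end, strand in genes:
        if g_chrom != chrom:
            continue
        tss = start if strand == "+" else end  # TSS position depends on strand
        if abs(center - tss) <= promoter_window:
            return "promoter-proximal"  # promoter overlap takes priority
        if start <= center <= end:
            in_gene_body = True
    # Exons are not modelled, so "genic" stands in for intronic here
    return "genic" if in_gene_body else "intergenic"


def category_fractions(loci, genes):
    """loci: list of (chrom, center) tuples; returns the fraction of loci per category."""
    counts = Counter(annotate_locus(c, pos, genes) for c, pos in loci)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


genes = [("chr1", 10_000, 20_000, "+"), ("chr1", 30_000, 40_000, "-")]
loci = [("chr1", 10_500), ("chr1", 15_000), ("chr1", 39_500), ("chr2", 5_000)]
print(category_fractions(loci, genes))
```

      A real analysis would use exon-level annotation and an interval tree or a tool such as bedtools rather than a linear scan. The linear scan keeps the classification rule easy to read.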

      2. In Fig. 1B, the F measure (0.540) of the Basal subtype using the Logmc method contradicts its extremely high precision (1.000) and sensitivity (0.890). The authors need to clarify the exact formula or method used to compute F1 and the discrepancy in the reported metrics for this subtype and perhaps other subtypes as well.

      Response: We apologise for the mistake in this section and thank the reviewer for pointing it out. The F-measure was reported incorrectly, which caused the confusion. We have added the formula for each statistical metric to the Methods section, and the figure has been corrected to show an F-measure of 0.94 in the revised manuscript.
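      The corrected value can be checked directly. The sketch below assumes the F-measure is the standard F1 score, the harmonic mean of precision and sensitivity (recall). A harmonic mean always lies between its two inputs. Precision 1.000 and sensitivity 0.890 therefore cannot yield 0.540; they give roughly 0.94, which matches the corrected figure.

```python
def f_measure(precision, sensitivity):
    """F1 score: harmonic mean of precision and sensitivity (recall)."""
    if precision + sensitivity == 0:
        return 0.0  # convention when both metrics are zero
    return 2 * precision * sensitivity / (precision + sensitivity)


# Basal subtype, Logmc measure (precision and sensitivity as reported in Fig. 1B)
print(round(f_measure(1.000, 0.890), 2))  # 0.94
```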

      3. As shown in Fig. 4C, S4B, and most, if not all, tracks of Fig. S3, ER binding regions are not annotated as eRNA loci. It seems, in this reviewer's opinion, very unlikely that this is because they generally lack eRNA expression, but rather they do not express polyadenylated eRNA (typically 1D eRNA), which is captured in this dataset. The reviewer posits that these enhancers produce more transient, non-polyadenylated 2D eRNA. It has been widely documented in prior studies that ER-bound enhancers exhibit bimodal eRNA expression patterns [e.g., Li, W. et al. Functional roles of enhancer RNAs for oestrogen-dependent transcriptional activation. Nature 498, 516-520 (2013)]. Could the authors address this opinion and elaborate on how the restriction to polyadenylated transcripts might underrepresent enhancers regulated by ER and other TFs and whether this bias impacts the overall findings?

      Response: We thank the reviewer for raising the caveats of using polyadenylated eRNAs to identify ER binding patterns. The TCGA eRNA atlas is based on polyadenylated eRNAs and does have this limitation. However, no bidirectional transcript data are currently available for any breast cancer patient samples. The tools to profile these RNAs are not robust enough to be applied to frozen cancer tissue samples, which are extremely limited in size and availability. By using polyadenylated eRNA-seq datasets, we may lose accuracy for ER binding patterns. The same applies to other transcription factors that activate or associate with bimodal expression around enhancers. Nevertheless, our integrative analysis of stable polyadenylated eRNA loci can still identify the most relevant TF networks of each subtype.

      Furthermore, we validated this finding by analysing our own KAS-seq datasets, which capture actively transcribing bidirectional enhancers in the MCF7 cell line. In the preliminarily revised manuscript, we also added ATAC-seq, H3K27ac ChIP-seq, CAGE and GRO-seq data to the gene profiles in Fig. S3. These tracks relate the eRNA regions identified in polyadenylated RNA datasets to ER binding sites in patients and to published bidirectional transcripts. All ER binding sites are accompanied by open and active enhancer marks with bidirectional transcription (either GRO-seq- or CAGE-positive), but they do not coincide with the eRNA regions. Subtype-specific eRNA regions close to genes such as MLPH and XBP1 have two kinds of neighbouring sites. The first are actively transcribing, bidirectional ER-bound sites located around 1.5 kb from the subtype-specific eRNA loci. The second are bidirectionally transcribing ER-unbound sites. These distal ER binding sites lie close to regions in the list of 300K eRNA loci; they were simply not identified as subtype-specific regions. Hence, ER occupancy may not be present at all subtype-specific eRNA loci, but our subtype-specific eRNA sites are representative of bidirectional transcription.

      As suggested by the reviewer, the revised manuscript now discusses the potential for identifying TF networks by analysing 1D eRNAs.

      4. Despite the unsatisfactory performance of the ML approach in classifying the Her2 subtype, the hierarchical clustering in Fig. 2A and S2A appears to separate Her2 samples reasonably well, visible as a clustered green band. Could the authors quantitatively assess how effective this clustering is and compare it with the ML outcome? (OPTIONAL)

      Response: The authors acknowledge this interpretation from the reviewer. With both measures, our ML platform can identify markers for the Her2 subtype, but some of the statistical metrics are poor. The heatmaps were generated from these identified Her2 markers, so a separate analysis of this cluster would not be very informative. The poor metrics for Her2 classification were already attributed, in part, to the low number of Her2+ patients in the cohort.

      5. In Fig. 4 and S4, the authors reported to have enriched binding or motif of TFs, e.g., FOXA1, AP-2, and E2A, specifically at enhancer loci with low eRNA level, which conflicts with their established roles as transcriptional activators. The reviewer asks for an address as to why these factors would be associated with basal low-eRNA regions and whether any additional data might clarify their functional role in these contexts.

      Response: The authors appreciate the reviewer’s concern, but we would like to clarify the definition used. eRNAs that are less expressed in the basal subtype are classified as basal-low, and these regions are highly expressed in luminal patients. Hence, basal-low and luminal-high regions overlap strongly. FOXA1 and AP2 factors are well-established coactivators in luminal ER+ transcriptional signaling; hence they are associated with basal-low eRNA regions. We clarified this in the Discussion. We also cite further literature in the revised manuscript demonstrating the strong role of FOXA1 and AP2 factors in the ER+ luminal breast cancer transcriptional response.

      6. Regarding Fig. 4B, the authors state that "ER binding occupies only the strongest ssDNA and GRO-seq-positive sites". Firstly, the GRO-seq data quality is poor with indiscernible peaks. This may be insufficient for a qualified representation of nascent eRNA expression. More importantly, it appears each heatmap is ranked independently, so top loci for ssDNA are not necessarily top loci for GRO-seq, ER, Pol-II, or H3K27ac. The reviewer requests clarification on how the authors plot these heatmaps and questions whether the statement is supported by the analysis as presented.

      Response: We acknowledge the reviewer’s concern. Following their suggestion, we used another GRO-seq dataset from the same lab that is more deeply sequenced, and its average plot shows a cleaner profile. We also apologise for not providing enough detail on how the heatmaps in Fig. 4B were generated. Each profile’s heatmap was scaled separately to its own intensity range. The regions were ordered by KAS-seq intensity, and this order was kept the same across all profiles. Hence, the top ssDNA loci are not necessarily the top GRO-seq, ER, H3K27ac and Pol II loci, but they also show high intensity in those profiles, indicating a correlation. We also removed regions overlapping the hg38 blacklist. In addition, we removed regions that were over-sequenced due to amplifications and showed aberrant signals. The revised manuscript provides the new heatmaps and profile plots with different clusters of KAS-seq intensity. The Methods section has been updated to clarify how these heatmaps were made.
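      The plotting scheme described above has three parts: one shared row order defined by KAS-seq, independent scaling per track, and blacklisted regions removed first. It can be sketched as follows. This is an illustration under assumptions, not the authors' code. Regions are assumed to be (chrom, start, end) tuples, and each track maps a region to a list of binned signal values. The blacklist is a list of intervals such as the hg38 blacklist, and "autoscaling" is taken to mean dividing by each track's own maximum.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals overlap (half-open coordinates)."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def build_heatmap_matrices(tracks, reference, blacklist):
    """Order regions once by the reference track and reuse that order for every track.

    tracks: dict track name -> dict region -> list of binned signal values.
    reference: name of the track that defines row order (e.g. "KAS-seq").
    blacklist: list of (chrom, start, end) intervals to exclude.
    """
    ref = tracks[reference]
    kept = [r for r in ref if not any(overlaps(r, b) for b in blacklist)]
    # Rank by mean reference signal, strongest first; all tracks share this order
    order = sorted(kept, key=lambda r: sum(ref[r]) / len(ref[r]), reverse=True)
    matrices = {}
    for name, signal in tracks.items():
        rows = [signal[r] for r in order]
        # Scale each track to its own maximum so weak and strong assays both stay visible
        peak = max((v for row in rows for v in row), default=0.0) or 1.0
        matrices[name] = [[v / peak for v in row] for row in rows]
    return order, matrices


A, B, C = ("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 100)
tracks = {
    "KAS-seq": {A: [1, 2], B: [5, 6], C: [3, 3]},
    "ER": {A: [10, 10], B: [20, 40], C: [0, 0]},
}
order, mats = build_heatmap_matrices(tracks, "KAS-seq", blacklist=[("chr2", 50, 150)])
print(order)  # region C is dropped; B ranks above A
```

      Keeping one row order while scaling each track separately is what lets the top KAS-seq rows be compared by eye with the other assays. Absolute intensities are not comparable across tracks.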

      7. In Fig. S4B and the third plot of 4C, the averaged histogram of ER binding appears in multiple sharp peaks with drastic asymmetric positioning around the enhancer centre, which is highly atypical of most published ER ChIP-seq profiles. Could the authors discuss possible "spatial syntax" or directional patterns of ER binding in relation to eRNA loci and cite any literature showing a similar pattern? Further evidence is required to substantiate these observations, as they are remarkably unique.

      Response: The authors agree with the reviewer’s point about the asymmetric ER peaks at the luminal-specific eRNA regions. Average profile plots are sensitive to the number of regions, and very few regions were explored here. As a result, the profiles look asymmetrical and differ from the published literature; heatmaps likewise lose resolution when made on very few regions. The aim of this analysis was to highlight that ER does not bind at the centre of these subtype-specific eRNA loci but further away. This contradicts published findings from in vitro studies. We do not have solid evidence for directional patterns of ER binding in these data. To avoid confusion, we removed these average plots and instead focused on the existing single-gene profiles in Fig. S3, where we discuss our interpretations in detail.

      Minor comments

      1. When introducing eRNAs, the reviewer recommends mentioning that 1) eRNA levels correlate with enhancer activity and 2) eRNA expression precedes target gene transcription, thus reflecting upstream regulatory events. Relevant references include: Arner, E. et al. Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Science 347, 1010-1014 (2015); Carullo, N. V. N. et al. Enhancer RNAs predict enhancer-gene regulatory links and are critical for enhancer function in neuronal systems. Nucleic Acids Res. 48, 9550-9570 (2020); Kaikkonen, Minna U. et al. Remodeling of the Enhancer Landscape during Macrophage Activation Is Coupled to Enhancer Transcription. Mol. Cell 51, 310-325 (2013).

      Response: These are great recommendations from the reviewer, and we included the suggested publications in the Introduction section of the revised manuscript.

      2. H3K27ac is used initially to define these regulatory loci, and like eRNAs, H3K27ac also varies among patients. Which H3K27ac dataset(s) were used initially, and could this approach potentially overlook patient-specific enhancers? (OPTIONAL)

      Response: This is a valid point. The aim of this project is to define common subtype-specific enhancers that are regulatory and prognostic and could therefore be developed further as biomarkers benefiting more patients in the future. For that purpose, common enhancers active in multiple normal and cancer cell lines, as defined by ENCODE, are more appropriate than patient-specific enhancers. The activity of patient-specific enhancers might be influenced by specific genetic alterations. H3K27ac ChIP-seq datasets from cancer patients are very limited, and our analyses were based entirely on published work, so it is not possible to fully address this concern. The source of the H3K27ac ENCODE datasets (from 86 human cell lines and tissue samples) is clarified in the revised manuscript.

      3. In addition to the overall metrics displayed in Fig. 2B, could the authors provide precision and sensitivity values for LumA and LumB separately under the Logmc method, given the observation in Fig. 2E that LumA and LumB are not well separated in the UMAP projection?

      Response: The authors appreciate the suggestion from the reviewer. We have included the metrics separately for LumA and LumB in the revised manuscript in Fig. S1D.

      4. Could the authors elaborate, in the Discussion section, on why ML performance differs substantially depending on whether InfoGain or Logmc is used?

      Response: We have included the following text in the discussion to explain the differences between these two measures.

      “The InfoGain measure works by binarizing expression with k-means (k=2). It can capture strongly expressed eRNAs that differ between subtypes. It can also capture lowly expressed, sparser on/off eRNAs. In the first case, even when an eRNA is highly expressed in all patients, the higher expression mode becomes 1 and the lower expression mode becomes 0. In the case of low, on/off expression, however, the recentered logmc measure would not generate a strikingly high value. Furthermore, binarization aids clustering and classification, as data points become easier to distinguish.”
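      The passage above describes binarization followed by an information-based ranking. The minimal sketch below assumes the textbook definition of information gain, the reduction in subtype-label entropy after splitting patients by a binarized eRNA. It also assumes a plain one-dimensional k-means with k = 2 for the binarization step. The function names and toy values are illustrative only; the software and parameters used in the study are not specified here.

```python
import math
from collections import Counter


def kmeans_binarize(values, max_iter=100):
    """1-D k-means with k=2: 1 = higher-expression mode, 0 = lower mode."""
    c_low, c_high = float(min(values)), float(max(values))  # start at the extremes
    labels = [0] * len(values)
    for _ in range(max_iter):
        labels = [1 if abs(v - c_high) < abs(v - c_low) else 0 for v in values]
        low = [v for v, lab in zip(values, labels) if lab == 0]
        high = [v for v, lab in zip(values, labels) if lab == 1]
        new_low = sum(low) / len(low) if low else c_low
        new_high = sum(high) / len(high) if high else c_high
        if (new_low, new_high) == (c_low, c_high):
            break  # centroids are stable
        c_low, c_high = new_low, new_high
    return labels


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def info_gain(values, subtypes):
    """Drop in subtype-label entropy after splitting patients by the binarized eRNA."""
    bins = kmeans_binarize(values)
    n = len(subtypes)
    remaining = 0.0
    for b in (0, 1):
        group = [s for s, x in zip(subtypes, bins) if x == b]
        if group:
            remaining += len(group) / n * entropy(group)
    return entropy(subtypes) - remaining


subtypes = ["Basal"] * 3 + ["LumA"] * 3
print(info_gain([0.1, 0.2, 0.0, 5.0, 6.0, 5.5], subtypes))  # perfectly separating eRNA
print(info_gain([1.0] * 6, subtypes))  # flat eRNA carries no information
```

      A sparse on/off profile such as [0, 0, 0, 0, 3, 4] is binarized to [0, 0, 0, 0, 1, 1]. It can therefore still score highly if the "on" patients belong to one subtype, even though its mean-centred log value would be unremarkable.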

      5. How does the expression pattern of Basal high, Basal low, Her2, and Lum eRNA clusters behave differentially in Basal, Her2, and LumA/B subtypes? Are Basal high eRNAs downregulated in Her2 or Lum subtypes, and vice versa? Since many downstream analyses rely on these eRNA clusters, it is suggested to include a heatmap and/or boxplot that displays how each eRNA category is expressed in each subtype to confirm that these definitions are consistent.

      Response: We thank the reviewers for this suggestion and apologise for not clarifying the expression of eRNAs in other subtypes. Indeed, Basal-high eRNAs are expressed at low levels in LumA and LumB, and Basal-low eRNAs are expressed at higher levels in LumA and LumB. Her2 subtype-specific eRNAs tend to show expression intermediate between Basal and Lum, as can be seen in the UMAP and PCA. In other words, the Basal-high eRNAs are Lum-low eRNAs, and the Basal-low markers are Lum-high markers. As suggested by the reviewer, Fig. S2F-K now provides heatmaps of each subtype-specific eRNA set, showing its expression in patients of the other subtypes.
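      The consistency check requested here asks how each eRNA cluster behaves in every subtype. It can be summarised as the mean per-patient z-score of each cluster within each subtype, which is also what a cluster-by-subtype heatmap would display. The minimal sketch below assumes a per-eRNA expression vector across patients and z-scoring across all patients. The input format, names and toy values are hypothetical.

```python
from statistics import mean, pstdev


def cluster_by_subtype_means(expr, clusters, subtypes):
    """Mean z-scored expression of each eRNA cluster within each subtype.

    expr: dict eRNA id -> list of expression values, one per patient.
    clusters: dict eRNA id -> cluster label (e.g. "Basal high").
    subtypes: list of subtype labels in the same patient order as the expr values.
    """
    per_cell = {}
    for erna, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        # z-score across all patients so eRNAs on different scales are comparable
        z = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
        for st in set(subtypes):
            st_mean = mean(zi for zi, s in zip(z, subtypes) if s == st)
            per_cell.setdefault((clusters[erna], st), []).append(st_mean)
    return {cell: mean(vals) for cell, vals in per_cell.items()}


expr = {"e1": [5, 5, 1, 1], "e2": [0, 0, 2, 2]}
clusters = {"e1": "Basal high", "e2": "Basal low"}
subtypes = ["Basal", "Basal", "LumA", "LumA"]
for cell, value in sorted(cluster_by_subtype_means(expr, clusters, subtypes).items()):
    print(cell, value)
```

      Under this summary, a consistent definition would show positive values for "Basal high" in Basal patients and negative values in LumA/LumB, with the reverse pattern for "Basal low".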

      Referee cross-commenting

      I share Reviewer #1's opinion that the manuscript should assess whether mRNA or eRNA is the stronger predictor of breast cancer subtypes and clinical outcomes. It will greatly improve the novelty if eRNA is shown to be a better indicator for cancer characterization.

      Also, I strongly concur with Reviewer #3 that the current informatics approach is superficial and that several conclusions are contentious. The authors need to resolve the inconsistencies in their ML statistics and the potentially misleading interpretations of the ChIP-seq and motif enrichment results.

      It is further recommended that, building on Reviewer #3's comment, the study integrate eRNA signatures with their proximal genes to address two questions. First, are genes located near these enhancers differentially expressed across cancer subtypes, and is their expression correlated with enhancer activity? Second, does this integration provide insights into the enhancer-gene regulatory architecture in a subtype-specific context?

      Response: We thank reviewer 2 for cross-commenting on reviewer 1’s and reviewer 3’s suggestions. These are indeed interesting points that will increase the novelty of the study. Based on these suggestions, and as discussed earlier for reviewer 1’s comments, we will compare mRNAs and eRNAs as predictors of cancer subtype and prognosis. We will also examine gene-eRNA associations in cis, as discussed in the other reviewers’ comments. Our preliminary analyses show a strong subtype-specific association between eRNAs and mRNAs. They also show an observable separation of subtypes that were harder to classify using eRNA markers alone. We will therefore refine these analyses, and the manuscript, in the final revision as discussed above.

      Significance

      General Assessment

      This study provides insights into the potential use of eRNAs to classify breast cancer subtypes and refine prognostic markers. A strength is the integration of large-scale RNA-seq data with machine learning to identify eRNA signatures in biologically meaningful patient samples, revealing both established and novel TF networks. The study also discovered eRNA clusters that correlate with patient survival, thus providing strong clinical implications. However, the ML approach yields several inconsistencies, for instance, unsatisfactory classification results for the Her2 subtype and confusing statistical metrics in the results. Furthermore, the ML model struggles to differentiate more nuanced molecular classes (e.g., LumA vs. LumB) and higher-level histological subtypes (e.g., lobular vs. ductal), limiting its power to dissect more subtle pathological and molecular mechanisms. Another limitation of this ML approach is its exclusive use of polyadenylated eRNAs from RNA-seq, which perhaps excludes the more prominent 2D eRNAs expressed at regulatory enhancers. Moreover, certain datasets appear to be of suboptimal quality, leading to assertions that would benefit from additional supporting evidence. Altogether, the study offers a promising angle on eRNA-based tumor stratification, but more robust experimental validation is needed to resolve inconsistencies and clarify the mechanistic underpinnings.

      Advance

      Conceptually, the study highlights the potential for eRNA-based signatures to capture regulatory variation beyond classical markers. However, the utility of these signatures is constrained by the focus on polyadenylated transcripts alone, likely underrepresenting key enhancer regions, and certain evidence presented in this study is not substantial enough to support some statements. While the work adds an important dimension to the understanding of enhancer biology in breast cancer, the resulting insights are partly hampered by limitations in data coverage and quality.

      Audience

      The primary audience includes cancer epigenetics, functional genomics, and bioinformatics researchers who are interested in leveraging eRNAs as biomarkers and dissecting complex regulatory networks in breast cancer. Clinically oriented scientists focusing on molecular diagnostics may also find relevance in the authors' approach to stratify subtypes and outcomes. The research is most relevant to a specialized audience within basic and translational cancer genomics, as well as computational biology groups interested in eRNA analysis.

      Field of Expertise

      I evaluate this manuscript as a researcher specializing in cancer epigenetics, functional genomics, and NGS-based data analysis. Parts of the manuscript touching on clinical outcome measures may require additional review from practicing oncologists.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This study aims to classify prognostic and subtype-specific eRNAs in breast cancer, highlighting their potential as biomarkers. Data was analysed using existing machine learning algorithms. The data analysis is superficial, and it is hard to understand the key significant findings.

      This is an important topic and a highly relevant approach to identifying RNA-based biomarkers. They analyse published RNAseq datasets by focusing on molecular subtype-specific eRNAs, enhancing clinical relevance and thereby addressing the heterogeneity of the cancer type (strength of the study).

      Weaknesses include: Most of the findings are purely correlation-based and rely on a reanalysis of published datasets; the study would benefit from experimental validation to support its findings. Differential expression analysis of large datasets likely yields some differences in the transcriptome. How significant are these changes?

      Does the expression of eRNAs affect the expression of genes in cis? Although this analysis would provide only associated gene expression differences, it can also provide some insights into subtype-specific differences in gene expression programs.

      If the authors find experimental validations are not feasible, I recommend validating the eRNA signature in an independent dataset.

      Response: We acknowledge the weakness noted by the reviewer that this study relies on correlation-based analyses of published datasets. The TCGA eRNA atlas datasets are reanalysed, but they are the highest-resolution maps of eRNA expression in cancer patient samples published to date. Our study is the first to establish a subtype-specific classification of eRNAs. We believe that the eRNAs are biologically relevant, as they are strongly associated with subtype-specific pathways and epigenetic regulators. As suggested by the reviewers, we will explore the association of mRNAs and eRNAs in cis to further establish the significance and relevance of the eRNAs we identified (discussed earlier in response to reviewer 1’s comments).

      We would like to study the functional relevance of eRNAs as a separate project. Knocking down eRNAs in vitro is not straightforward. Locked nucleic acid approaches and Cas13-based RNA targeting suffer from toxicity and non-specific targeting. siRNA-based approaches do not target nuclear eRNAs effectively, even though other labs have widely used them for this purpose. Substantial optimisation would therefore be needed to validate our eRNAs functionally, which is beyond the scope and time frame of this study and revision. To provide validation and significance from independent datasets, the final revised manuscript will further explore the association of these factors with the expression of subtype-specific eRNAs. We will use the tools described above for reviewer 1 (PEGS and Perturb-seq integration). Integrating our classified eRNAs with published Perturb-seq-validated regions from ER+ and ER- breast cancer cell lines will provide functional validation of the patient-associated classified enhancers/eRNAs. Hence, our study would be the first to demonstrate validated gene-enhancer regulatory networks from breast cancer patient datasets.

      Furthermore, we included the single gene visualisation profiles of independent datasets of ER ChIP-seq from different patients (Ross-Innes et al., 2012), ATAC-seq from TCGA patients (Corces et al., 2018), H3K27ac ChIP-seq datasets from cell lines (Theodorou et al., 2013 and Hickey et al., 2021) and GRO-seq and CAGE data published in MCF7 cells close to the eRNA regions and discussed their overlap with the eRNA regions in the revised manuscript. In the final revision, we will perform further detailed integration of all these profiles. Overall, our study will provide the integratory analysis of various independent epigenetic and functional profiles to validate our classified subtype and survival-specific eRNA regions.

      Here are major points; addressing these points in the revised version is important.

      From Figure 1B, what eRNAs were identified for LumB using log2MC?

      Response: The authors acknowledge the lack of analyses on LumB eRNAs in the original version of the manuscript. In the final revised manuscript after associating with mRNAs, we will provide the heatmaps, pathway analyses and other functional annotations for LumB specific eRNAs.

      Page 8 However, sensitivity and F-measure .... It would help to include the metrics for the number of patients in each subtype. The ratio of eRNAs/number of cases in each subtype would inform if the number of eRNAs is an outcome of no. of cases or subgroup-specific.

      Response: This is a great suggestion from the reviewer, and we included the number of patients for each subtype in the table in Fig. 1D. We observed that the basal patients are low in number, but we identified more basal eRNAs. Hence, the number of eRNAs identified in subtype-specific manner is not correlated to the number of patients in the cohort.

      Page 9 "Altogether, both measurements classify eRNAs efficiently based on subtypes, InfoGain allowed us to distinguish further samples based on high and low expression of eRNAs for basal subtype and performed better in statistical metrics" Based on statistical metrics, both models seem to be performing similarly except for Her2.

      Response: We apologise for this wrong interpretation. We corrected this in the revised manuscript at page 9.

      In Fig. 1B, the F-measure metrics are wrong for basal LogMC, as it is 0.94 rather than 0.54, which could lead to a misinterpretation of the model.

      Response: We apologise for the mistake in this figure, and we included the corrected heatmap in the revised manuscript.

      In many genome browser figures, including Figure S3, the TFBSs are not at the same sites as the detected eRNAs. Are there CAGE data showing that binding of these TFs at these sites leads to eRNA expression? That would give direct evidence that the eRNAs are transcribed due to these TFs.

      Response: This is a great suggestion from the reviewer. In the preliminarily revised manuscript, we added ATAC-seq, H3K27ac ChIP-seq, CAGE and GRO-seq data to the gene profiles in Fig. S3 to validate the activity of these ER binding sites. All ER binding sites are accompanied by open and active enhancer marks with bidirectional transcription (either GRO-seq- or CAGE-positive). However, they do not coincide with the eRNA regions; the eRNA regions lie 250-1000 bp from the centre of the ER binding site. Subtype-specific eRNA regions close to genes such as MLPH and XBP1 have actively transcribing, bidirectional ER binding sites located away from the subtype-specific eRNA loci, as well as ER-unbound sites. These distal ER binding sites lie close to regions in the list of 300K eRNA loci; they were simply not identified as subtype-specific regions.
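      Offsets such as the 250-1000 bp quoted above can be measured for each eRNA centre as the distance to the nearest ER peak centre on the same chromosome. The minimal sketch below uses a binary search over sorted peak centres. It assumes peak and eRNA intervals have already been reduced to single centre coordinates per chromosome, and the coordinates shown are hypothetical.

```python
import bisect


def nearest_peak_distance(erna_center, peak_centers):
    """Distance (bp) from an eRNA centre to the closest ER peak centre.

    peak_centers: ER peak centres on the same chromosome, sorted ascending.
    Returns None when the chromosome has no peaks.
    """
    i = bisect.bisect_left(peak_centers, erna_center)
    candidates = []
    if i < len(peak_centers):
        candidates.append(peak_centers[i] - erna_center)  # closest peak at or above the centre
    if i > 0:
        candidates.append(erna_center - peak_centers[i - 1])  # closest peak below the centre
    return min(candidates) if candidates else None


peaks = [200, 1_600, 5_000]  # hypothetical, sorted ER peak centres on one chromosome
distances = [nearest_peak_distance(c, peaks) for c in (1_000, 6_000)]
print(distances)
print(sum(250 <= d <= 1_000 for d in distances), "of", len(distances), "within 250-1000 bp")
```

      Tools such as bedtools `closest` perform the same nearest-feature search genome-wide on sorted BED files. The binary search here shows the logic for one chromosome.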

      Page 10, There were 30 Her2-specific eRNA regions.... Do the same enhancers also regulate these genes as those from which eRNAs are transcribed? Is it cis-effect, or could these affect the trans-regulating of other genes?

      Response: We acknowledge the reviewer’s concern; however, this is difficult to validate. Functional experiments exploring 3D interactions between enhancers and gene promoters are not robust enough to be performed in patient samples and cannot be completed within the revision time frame. In the final revised manuscript, we will explore the association between enhancers and the ERBB2 promoter using PEGS, as discussed above. We will also use available Hi-C datasets in Her2+ cell lines (HCC1954, GSE167150, Kim et al., 2022 https://pubmed.ncbi.nlm.nih.gov/35513575/ ).

      Minor comments:

      Page 8 "InfoGain measure..." Fig. S2A also shows high and low expressed eRNAs for the basal group.

      Response: We apologise for the lack of clarity here. The InfoGain measure identifies both high- and low-expressed eRNAs, each showing a similar regulation pattern across patients. In contrast, Logmc-derived eRNAs are highly expressed in most patients, and the Logmc measure could not identify low-expressed eRNAs as strongly as InfoGain. The text in the Results has been edited in the revised manuscript to clarify this point.

      Page 11, Our analyses also identified the role of another..... The statement is misleading, as it reflects the enrichment of these TFs with the eRNAs.

      Response: We included the word “enrichment” to clarify this statement.

      Page 13, "Around 90% of eRNAs are bidirectional and non-polyadenylated [53]. TCGA expression datasets are based on RNA-seq assays, which capture only non-polyadenylated RNAs. Thus, analysing the expression of eRNAs on mRNA-seq datasets might not be adequate". It is very confusing; please check.

      Response: We apologise for the mistake, and this has been corrected in the revised manuscript.

      Reviewer #3 (Significance (Required)):

      This is an important topic and a highly relevant approach to identifying RNA-based biomarkers. They analyse published RNAseq datasets by focusing on molecular subtype-specific eRNAs, enhancing clinical relevance and thereby addressing the heterogeneity of the cancer type (strength of the study).

    1. In both differentiated instruction and arts integration, the classroom’s physical environment is flexible. In arts integration, furniture is moved to allow for movement, theatrical or dance improvisations, or for various groupings. Students carry out routines for efficiently and quietly setting and re-setting furniture. Teachers organize materials and establish efficient routines for distribution and clean-up. The classroom reflects a student-centered focus with interesting displays documenting students’ creative process and the products they have created.

      I agree with this because it is important for students to have a place that allows them to focus on their creative sides. As an example, I can draw on my own experience. I was a theatre student through high school, and one specific class comes to mind. In Drama 4, we did a process called Libby Appel, in which we were given a clean slate to work from. Our whole classroom space was cleared out, and as students we had control. Not only was it great for self-expression, but it also taught us to show our creativity starting from a clean slate. We as students were given instructions but had control of our own success in the process. I bring this up because it is important as educators to teach students to use their own creativity. I think that is when students learn the most.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This manuscript describes the role of PRDM16 in modulating BMP response during choroid plexus (ChP) development. The authors combine PRDM16 knockout mice and cultured PRDM16 KO primary neural stem cells (NSCs) to determine the interactions between BMP signaling and PRDM16 in ChP differentiation.

      They show PRDM16 KO affects ChP development in vivo and BMP4 response in vitro. They determine genes regulated by BMP and PRDM16 by ChIP-seq or CUT&TAG for PRDM16, pSMAD1/5/8, and SMAD4. They then measure gene activity in primary NSCs through H3K4me3 and find more genes are co-repressed than co-activated by BMP signaling and PRDM16. They focus on the 31 genes found to be co-repressed by BMP and PRDM16. Wnt7b is in this set and the authors then provide evidence that PRDM16 and BMP signaling together repress Wnt activity in the developing choroid plexus.

      Strengths:

      Understanding context-dependent responses to cell signals during development is an important problem. The authors use a powerful combination of in vivo and in vitro systems to dissect how PRDM16 may modulate BMP response in early brain development.

      We thank the reviewer for the thoughtful summary and positive feedback. We appreciate the recognition of our integrative in vivo and in vitro approach. We're glad the reviewer found our findings on context-dependent gene regulation and developmental signaling valuable.

      Main weaknesses of the experimental setup:

      (1) Because the authors state that primary NSCs cultured in vitro lose endogenous Prdm16 expression, they drive expression by a constitutive promoter. However, this means the expression levels are very different from endogenous levels (as explicitly shown in Supplementary Figure 2B) and the effect of many transcription factors is strongly dose-dependent, likely creating differences between the PRDM16-dependent transcriptional response in the in vitro system and in vivo.

      We acknowledge that our in vitro experiments may not fully replicate the in vivo situation, a common limitation of such experiments. However, our primary aim was to explore the molecular relationship between PRDM16 and BMP signaling in gene regulation. Such molecular investigations are challenging to conduct using in vivo tissues. BMP4-treated NSCs in vitro have been used as a model to investigate NSC proliferation and quiescence in previous studies (e.g., Helena Mira, 2010; Marlen Knobloch, 2017). Crucially, to ensure the relevance of our in vitro findings to the in vivo context, we confirmed that cultured cells could indeed be induced into quiescence by BMP4, and that this induction required the presence of PRDM16. Furthermore, upon identifying target genes co-regulated by PRDM16 and SMADs, we validated PRDM16's regulatory role on a subset of these genes in the developing choroid plexus (ChP) (Fig. 7 and Suppl. Fig. 7-8). Only by combining evidence from both in vitro and in vivo experiments could we confidently conclude that PRDM16 serves as an essential co-factor for BMP signaling in restricting NSC proliferation.

      (2) It seems that the authors compare Prdm16_KO cells to Prdm16 WT cells overexpressing flag_Prdm16. Aside from the possible expression of endogenous Prdm16, other cell differences may have arisen between these cell lines. A properly controlled experiment would compare Prdm16_KO ctrl (possibly infected with a control vector without Prdm16) to Prdm16_KO_E (i.e. the Prdm16_KO cells with and without Prdm16 overexpression.)

      We agree that Prdm16 KO cells carrying the Prdm16-expressing vector would provide a good comparison with KO cells carrying the control vector. However, despite more than 10 attempts under various optimized conditions, we were unable to establish a viable cell line after infecting Prdm16 KO cells with the Prdm16-expressing vector. The overall survival rate of primary NSCs after viral infection is low, and we observed that KO cells were particularly sensitive to infection when the viral vector was large (the Prdm16 ORF is more than 3 kb).

      As an alternative, to assess vector effects, we included two other control cell lines, wt and KO cells infected with the 3xNLS_Flag-tag viral vector, and presented the results in Supplementary Fig. 2. When we compared the responses of the four lines (wt, KO, wt infected with the Flag vector, and KO infected with the Flag vector) to the addition and removal of BMP4, we confirmed that the viral infection itself had no significant impact on the responses of these cells to these treatments, in terms of changes in cell proliferation and Ttr induction.

      Given that wt cells and the KO cells, with or without viral backbone infection behave quite similarly in terms of cell proliferation, we speculate that even if we were successful in obtaining a cell line with Prdm16-expressing vector in the KO cells, it may not exhibit substantial differences compared to wt cells infected with Prdm16-expressing vector.

      Other experimental weaknesses that make the evidence less convincing:

      (1) The authors show in Figure 2E that Ttr is not upregulated by BMP4 in PRDM16_KO NSCs. Does this appear inconsistent with the presence of Ttr expression in the PRDM16_KO brain in Figure1C?

      The reviewer’s point is that there was no significant increase in Ttr expression in Prdm16_KO cells after BMP4 treatment (Fig. 2E), whereas residual Ttr mRNA signal remained in the Prdm16 mutant ChP (Fig. 1C). We think the difference lies in the measurable level of Ttr expression induced by BMP4 in NSC culture compared with that in the ChP. This is based on our immunostaining experiment, in which we tried to detect Ttr using a Ttr antibody. This antibody could not detect Ttr protein in BMP4-treated Prdm16_expressing NSCs but clearly showed Ttr signal in the wt ChP. This means that although BMP4 can significantly increase Ttr expression in vitro to a level measurable by RT-qPCR, its absolute quantity, even in the Prdm16_expressing condition, is much lower than in vivo. Our results in Fig. 1C, Fig. 2E, and Fig. 7B all consistently show that Prdm16 depletion significantly reduces Ttr expression both in vitro and in vivo.

      (2) Figure 3: The authors use H3K4me3 to measure gene activity. This is, however, very indirect, with bulk RNA-seq providing the most direct readout and polymerase binding (ChIP-seq) another more direct readout. Transcription can be regulated without expected changes in histone methylation, see e.g. papers from Josh Brickman. They verify their H3K4me3 predictions with qPCR for a select number of genes, all related to the kinetochore, but it is not clear why these genes were picked, and one could worry whether these are representative.

      H3K4me3 has been widely used as an indicator of active transcription and marks cell identity genes. It has also been demonstrated that H3K4me3 directly regulates transcription at the step of RNA Pol II pause release. As stated in the text, using H3K4me3 has advantages and disadvantages compared with RNA-seq. RNA-seq profiles all gene products, whose levels are affected by transcription as well as by RNA stability and turnover. In contrast, H3K4me3 levels at gene promoters reflect transcriptional activity. In our case, we aimed to identify differential gene expression between proliferation and quiescence states. Because the transition between these two states is fast and dynamic, RNA-seq may fail to identify functionally relevant genes and is more likely to produce false-positive and false-negative results. Therefore, we chose H3K4me3 profiling.

      We agree that transcription may change without histone methylation changes. This may cause an under-estimation of the number of changed genes between the conditions. 

      We validated 7 out of 31 genes (Wnt7b, Id3, Mybl2, Spc24, Spc25, Ndc80 and Nuf2). We chose these genes based on two criteria: 1) their function is implicated in cell proliferation and cell-cycle regulation based on gene ontology analysis; 2) their gene products are detectable in the developing ChP based on the scRNA-seq data. Three of these genes (Wnt7b, Id3, Mybl2) are not related to the kinetochore. We now clarify this description in the revised text.

      (3) Line 256: The overlap of 31 genes between 184 BMP-repressed genes and 240 PRDM16-repressed genes seems quite small.

      This result indicates that, in addition to co-repressing cell-cycle genes, BMP and PRDM16 have independent functions. For example, it was reported that BMP regulates neuronal and astrocyte differentiation (Katada, S. 2021), while our previous work demonstrated that Prdm16 controls the temporal identity of NSCs (He, L. 2021).

      (4) The Wnt7b H3K4me3 track in Fig. 3G is not discussed in the text but it shows H3K4me3 high in _KO and low in _E regardless of BMP4. This seems to contradict the heatmap of H3K4me3 in Figure 3E which shows H3K4me3 high in _E no BMP4 and low in _E BMP4 while omitting _KO no BMP4. Meanwhile CDKN1A, the other gene shown in 3G, is missing from 3E.

      The track in Fig. 3G shows the absolute H3K4me3 signal after mapping the sequencing reads to the genome and normalizing them to library size. Comparing the signal in Prdm16_E cells with and without BMP4, the one with BMP4 has a lower peak. The same trend can be seen for the pair of Prdm16_KO cells with or without BMP4. The heatmap in Fig. 3E shows the relative level of H3K4me3 in three conditions. The Prdm16_E cells with BMP4 have the lowest level, while the other two conditions (Prdm16_KO with BMP4 and Prdm16_E without BMP4) display higher levels. These two graphs therefore show a consistent trend of H3K4me3 changes at the Wnt7b promoter across these conditions. Figure 3E only includes genes that are co-repressed by PRDM16 and BMP. CDKN1A’s H3K4me3 signals are consistent across the conditions, so it is not a PRDM16- or BMP-regulated gene; we use it as a negative control.

      (5) The authors use PRDM16 CUT&TAG on dissected dorsal midline tissues to determine if their 31 identified PRDM16-BMP4 co-repressed genes are regulated directly by PRDM16 in vivo. By manual inspection, they find that "most" of these show a PRDM16 peak. How many is most? If using the same parameters for determining peaks, how many genes in an appropriately chosen negative control set of genes would show peaks? Can the authors rigorously establish the statistical significance of this observation? And why wasn't the same experiment performed on the NSCs in which the other experiments are done so one can directly compare the results? Instead, as far as I could tell, there is only ChIP-qPCR for two genes in NSCs in Supplementary Figure 4D.

      In our text, we indicated the genes containing PRDM16 binding peaks in the figures and described them as “Text in black in Fig. 6A and Supplementary Fig. 5A”. We will add the precise number, “25 of these genes”, to the main text for clarity. As a negative control set, we used the 184 - 31 = 153 BMP-only-repressed genes (excluding the PRDM16-BMP4 co-repressed genes). By computationally determining the nearest TSS to each PRDM16 peak in the E12.5 ChP data, we identified PRDM16 peaks at 24/31 co-repressed genes and at 84/153 BMP-only-repressed genes. Fisher’s exact test comparing these proportions yields P = 0.015.
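      To make the comparison above concrete, here is a minimal pure-Python sketch of Fisher’s exact test on the 2x2 table implied by the quoted counts (24/31 co-repressed and 84/153 BMP-only-repressed genes with a nearby PRDM16 peak). It is an illustration only, not the analysis code used for the manuscript; the function names are hypothetical, and because the response does not state which alternative hypothesis was tested, the sketch reports both a one-sided (enrichment) and a two-sided P-value.

```python
from math import comb
from typing import Tuple


def hypergeom_pmf(k: int, total: int, successes: int, draws: int) -> float:
    """P(X = k) for drawing `draws` items without replacement from `total`,
    of which `successes` are marked."""
    return comb(successes, k) * comb(total - successes, draws - k) / comb(total, draws)


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Fisher's exact test for the table [[a, b], [c, d]] with fixed margins.

    Returns (p_greater, p_two_sided); p_greater tests whether cell `a`
    is larger than expected under independence (enrichment).
    """
    row1 = a + b          # genes in the first group (here: co-repressed)
    col1 = a + c          # genes with a nearby PRDM16 peak, both groups
    total = a + b + c + d
    # Feasible range of cell `a` given the fixed row and column totals.
    k_min = max(0, row1 - (total - col1))
    k_max = min(row1, col1)
    pmf = {k: hypergeom_pmf(k, total, col1, row1) for k in range(k_min, k_max + 1)}
    p_greater = sum(p for k, p in pmf.items() if k >= a)
    # Two-sided: sum the probabilities of all tables no more likely than the
    # observed one (a small relative tolerance guards against float ties).
    p_obs = pmf[a]
    p_two_sided = sum(p for p in pmf.values() if p <= p_obs * (1 + 1e-7))
    return p_greater, min(1.0, p_two_sided)


# Table built from the counts quoted in the response:
#                     with peak   without peak
# co-repressed (31)       24            7
# BMP-only (153)          84           69
p_greater, p_two = fisher_exact_2x2(24, 7, 84, 69)
print(f"one-sided P = {p_greater:.4f}, two-sided P = {p_two:.4f}")
```

      With these counts, the one-sided test gives P of about 0.015, in line with the value reported above, whereas the two-sided test gives about 0.027; stating which alternative was used would remove this ambiguity.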

      We are confused by the second part of the comment: “And why wasn't the same experiment performed on the NSCs in which the other experiments are done so one can directly compare the results? Instead, as far as I could tell, there is only ChIP-qPCR for two genes in NSCs in Supplementary Figure 4D.” If the reviewer is asking why we did not sequence the material from sequential ChIP or validate more target genes, the reason is the limited amount of material. Sequential ChIP requires large quantities of antibody and yields material barely sufficient for a few qPCR reactions after the second round of IP, far below the minimum required for library construction. The PRDM16 antibody was a gift, and the quantity we had was very limited. We made extensive efforts to optimize all available commercial antibodies for ChIP and CUT&Tag, but none of them worked in these assays.

      (6) In comparing RNA in situ between WT and PRDM16 KO in Figure 7, the authors state they use the Wnt2b signal to identify the border between CH and neocortex. However, the Wnt2b signal is shown in grey and it is impossible for this reviewer to see clear Wnt2b expression or where the boundaries are in Figure 7A. The authors also do not show where they placed the boundaries in their analysis. Furthermore, Figure 7B only shows insets for one of the regions being compared making it difficult to see differences from the other region. Finally, the authors do not show an example of their spot segmentation to judge whether their spot counting is reliable. Overall, this makes it difficult to judge whether the quantification in Figure 7C can be trusted.

      In the revised manuscript, we have included an individual Wnt2b channel and marked the boundaries. We also provide full-view images and examples of spot segmentation in the new Supplementary Figure 8.

      (7) The correlation between mKi67 and Axin2 in Figure 7 is interesting but does not convincingly show that Wnt downstream of PRDM16 and BMP is responsible for the increased proliferation in PRDM16 mutants.

      We agree that this result (the correlation between mKi67 and Axin2) alone only suggests that Wnt signaling is related to the proliferation defect in the Prdm16 mutant, and does not necessarily mean that Wnt is downstream of PRDM16 and BMP. Our conclusion is supported by two additional lines of evidence: the CUT&Tag data, in which PRDM16 binds to regulatory regions of Wnt7b and Wnt3a; and the co-repression of Wnt7b by BMP and PRDM16 in vitro.

      The ideal experiment would show that down-regulating Wnt signaling rescues the Prdm16 mutant phenotype. Such an experiment is technically challenging. Wnt plays diverse and essential roles in NSC regulation, and one would need a cell-type- and stage-specific tool to down-regulate Wnt in the Prdm16 mutant background. Moreover, Wnt genes are not the only targets regulated by PRDM16 in these cells, so down-regulating Wnt alone may not be sufficient to rescue the phenotype.

      Weaknesses of the presentation:

      Overall, the manuscript is not easy to read. This can cause confusion.

      We have revised the text to improve clarity.

      Reviewer #1 (Recommendations for the authors):

      (1) Overall, the manuscript is not easy to read. Here are some causes of confusion for which the presentation could be cleaned up:

      We are grateful for the reviewer’s suggestion. In the revised manuscript, we have made efforts to improve the clarity of the text.

      (a) Part of the first section is confusing in that some statements seem contradictory, in particular:

      "there is no overall patterning defect of ChP and CH in the Prdm16 mutant" (line 125)

      "Prdm16 depletion disrupted the transition from neural progenitors into ChP epithelia" (line 144)

      It would be helpful if the authors could reformulate this more clearly.

      We modified the text to clarify that while the BMP-patterned domain is not affected, the transition of NSCs into ChP epithelial cells is compromised in the Prdm16 mutant.

      (b) Flag_PRDM16, PRDM16_expressing, PRDM16_E, PRDM16 OE all seem to refer to the same PRDM16 overexpressing cells, which is very confusing. The authors should use consistent naming. Moreover, it would be good if they renamed these all to PRDM16_OE to indicate expression is not endogenous but driven by a constitutive promoter.

      We appreciate the comment and agree that the use of multiple terms to refer to the same PRDM16-overexpressing condition was confusing. Our original intention in using Prdm16_E was to distinguish cells expressing PRDM16 from the two other groups: wild-type cells and Prdm16_KO cells, which both lack PRDM16 protein expression. However, we acknowledge that Prdm16_E could be misinterpreted as indicating expression from the endogenous Prdm16 promoter. To avoid this confusion and ensure consistency, we have now standardized the terminology and refer to this condition as Prdm16_OE, indicating Flag-tagged PRDM16 expression driven by a constitutive promoter.

      (c) Line 179 states "generated a cell line by infecting Prdm16_KO cells with the same viral vector, expressing 3xNSL_Flag". Do the authors mean 3xNLS_Flag_Prdm16, so these are the Prdm16_KO_E cells by the notation suggested above? Or is this a control vector with Flag only? The following paragraph refers to Supplementary Figure 2C-F where the same construct is called KO_CDH, suggesting this was an empty CDH vector, without Flag, or Prdm16. This is confusing.

      We appreciate the reviewer’s careful reading and helpful comment. We acknowledge the confusion caused by the inconsistent terminology. To clarify: in line 179, we intended to describe an attempt to generate a Prdm16_KO cell line expressing 3xNLS_Flag_Prdm16, not a control vector with Flag only. However, despite repeated attempts, we were unable to establish this line due to low viral efficiency and the vulnerability of Prdm16_KO cells to infection with the large construct. Therefore, these cells were not included in the subsequent analyses.

      The term KO_CDH refers to Prdm16_KO cells infected with the empty CDH control vector, which lacks both Flag and Prdm16. This is the line used in the experiments shown in Supplementary Fig. 2C–F. We have revised the text throughout the manuscript to ensure consistent use of terminology and to avoid this confusion.

      (2) The introductory statements on lines 53-54 could use more references.

      Thanks for the suggestion. We have now included more references.

      (3) It would be helpful if all structures described in the introduction and first section were annotated in Figure 1, or otherwise, if a cartoon were included. For example, the cortical hem, and fourth ventricle.

      Thanks for the suggestion. We have now indicated the structures, ChP, CH and the fourth ventricle, in the images in Figure 1 and Supplementary Figure 1.

      (4) In line 115, "as previously shown.." - to keep the paper self-contained a figure illustrating the genetics of the KO allele would be helpful.

      Thanks for the suggestion. We have now included an illustration of the Prdm16 cGT allele in Figure 1B.

      (5) In Figure 1D, a co-stain for a ChP marker would be helpful because the ChP is hard to identify morphologically in the Prdm16 KO.

      We apologize for the lack of clarity. The KO allele contains a β-geo reporter driven by the endogenous Prdm16 promoter. The samples were co-stained for EdU, β-Gal and DAPI. To distinguish the ChP domain from the CH, we used the presence of β-Gal as a marker. We indicated this in the figure legend and have now also clarified it in the revised text.

      (6) The details in Figure 1E are hard to see, a zoomed-in inset would help.

      A zoomed-in inset is now included in the figure.

      (7) Supplementary Figure 2A does not convincingly show that PRDM16 protein is undetectable since endogenous expression may be very low compared to the overexpression PRDM16_E cells so if the contrast is scaled together it could appear black like the KO.

      We appreciate the reviewer’s point and have carefully considered this concern. We concluded that PRDM16 protein is effectively undetectable in cultured wild-type NSCs based on direct comparison with brain tissue. Both cultured NSCs and brain sections were processed under similar immunostaining and imaging conditions. While PRDM16 showed robust and specific nuclear localization in embryonic brain sections (Fig. 1B and Supplementary Fig. 1A), only a small subset of cultured NSCs exhibited PRDM16 signal, primarily in the cytoplasm (middle panel of Fig. 2A). This stark contrast supports our conclusion that endogenous PRDM16 protein is either absent or significantly downregulated in vitro. Because of this limitation, we turned to over-expressing Prdm16 in NSC culture using a constitutive promoter. 

      (9) Line 182 "Following the washout step" - no such step had been described, maybe replace by "After washout of BMP".

      Yes, we have revised the text.

      (8) Line 214: "indicating a modest level" - what defines modest? Compared to what? Why is a few thousand moderate rather than low? Does it go to zero with inhibitors for pathways?

      Here, a modest level means a level lower than that after adding BMP4. To clarify this, we revised the description to “indicating endogenous levels of …”

      (9) The way qPCR data are displayed makes it difficult to appreciate the magnitude of changes, e.g. in Supplementary Figure 2B where a gap is introduced on the scale. Displaying log fold change / relative CT values would be more informative.

      We used a segmented Y-axis in Supplementary Figure 2B because the Prdm16 overexpression samples exhibited much higher expression levels than the other conditions. In response to this suggestion, we explored alternative ways to present the result, including plotting log-transformed values and log fold changes. However, these methods did not enhance the clarity of the differences; in fact, log scaling made the magnitude of change appear less apparent. To address this, we now present the overexpression samples in a separate graph, eliminating the need for a broken Y-axis and improving the overall readability of the data.

      (10) Writing out "3 days" instead of 3D in Figure 2A would improve clarity. It would be good if the used time interval is repeated in other figures throughout the paper so it is still clear the comparison is between 0 and 3 days.

      We have changed “3D” to “3 days”. All BMP4 treatments in this study were 3 days.

      (11) Line 290: "we found that over 50% of SMAD4 and pSMAD1/5/8 binding peaks were consistent in Prdm16_E and Prdm16_KO cells, indicating that deletion of Prdm16 does not affect the general genomic binding ability of these proteins" - this only makes sense to state with appropriate controls because 50% seems like a big difference, what is the sample to sample variability for the same condition? Moreover, the next paragraph seems to contradict this, ending with "This result suggests that SMAD binding to these sites depends on PRDM16". The authors should probably clarify the writing.

      We appreciate the reviewer’s comment and agree that clarification was needed. Our point was that SMAD4 and pSMAD1/5/8 retain the ability to bind DNA broadly in the Prdm16 KO cells, with more than half of the original binding sites still occupied. This suggests that deletion of Prdm16 does not globally impair SMAD genomic binding. However, our primary interest lies in the subset of sites that show differential SMAD binding between wt and Prdm16 KO conditions, as these are likely to be PRDM16-dependent.

      In the following paragraph, we focused specifically on describing SMAD and PRDM16 co-bound sites. At these loci, SMAD4 and pSMAD1/5/8 showed reduced enrichment in the absence of PRDM16, suggesting PRDM16 facilitates SMAD binding at these particular regions. We have revised the text in the manuscript to more clearly distinguish between global SMAD binding and PRDM16-dependent sites.

      (12) Much more convincing than ChIP-qPCR for c-FOS for two loci in Figures 5F-G would be a global analysis of c-FOS ChIP-seq data.

      We agree that a global c-FOS ChIP-seq analysis would provide a more comprehensive view of c-FOS binding patterns. However, the primary focus of this study is the interaction between BMP signaling and PRDM16. The enrichment of AP-1 motifs at ectopic SMAD4 binding sites was an unexpected finding, which we validated using c-FOS ChIP-qPCR at selected loci. While a genome-wide analysis would be valuable, it falls beyond the current scope. We agree that future studies exploring the interplay among SMAD4/pSMAD, PRDM16, and AP-1 will be important and informative.

      (13) Figure 6A is hard to read. A heatmap would make it much easier to see differences in expression. Furthermore, if the point is to see the difference between ChP and CH, why not combine the different subclusters belonging to those structures? Finally, why are there 28 genes total when it is said the authors are evaluating a list of 31 genes and also displaying 6 genes that are not expressed (so the difference isn't that unexpressed genes are omitted)?

      For the scRNA-seq data, we chose violin plots because they display both gene expression levels and the number of cells expressing each gene. However, we agree that the labels in Figure 6A were too small and difficult to read. We have revised the figure by increasing the font size and moving genes with low expression to Supplementary Figure 5A. Figure 6A now includes the 17 more highly expressed genes together with three markers, and Supplementary Figure 5A contains the 13 lowly expressed genes. One gene, Mrtfb, is missing from the scRNA-seq data and is thus not included. We have revised the description of the result in the main text and figure legends.

      Reviewer #2 (Public review):

      Summary:

      This article investigates the role of PRDM16 in regulating cell proliferation and differentiation during choroid plexus (ChP) development in mice. The study finds that PRDM16 acts as a corepressor in the BMP signaling pathway, which is crucial for ChP formation.

      The key findings of the study are:

      (1) PRDM16 promotes cell cycle exit in neural epithelial cells at the ChP primordium.

      (2) PRDM16 and BMP signaling work together to induce neural stem cell (NSC) quiescence in vitro.

      (3) BMP signaling and PRDM16 cooperatively repress proliferation genes.

      (4) PRDM16 assists genomic binding of SMAD4 and pSMAD1/5/8.

      (5) Genes co-regulated by SMADs and PRDM16 in NSCs are repressed in the developing ChP.

      (6) PRDM16 represses Wnt7b and Wnt activity in the developing ChP.

      (7) Levels of Wnt activity correlate with cell proliferation in the developing ChP and CH.

      In summary, this study identifies PRDM16 as a key regulator of the balance between BMP and Wnt signaling during ChP development. PRDM16 facilitates the repressive function of BMP signaling on cell proliferation while simultaneously suppressing Wnt signaling. This interplay between signaling pathways and PRDM16 is essential for the proper specification and differentiation of ChP epithelial cells. This study provides new insights into the molecular mechanisms governing ChP development and may have implications for understanding the pathogenesis of ChP tumors and other related diseases.

      Strengths:

      (1) Combining in vitro and in vivo experiments to provide a comprehensive understanding of PRDM16 function in ChP development.

      (2) Use of a variety of techniques, including immunostaining, RNA in situ hybridization, RT-qPCR, CUT&Tag, ChIP-seq, and SCRINSHOT.

      (3) Identifying a novel role for PRDM16 in regulating the balance between BMP and Wnt signaling.

      (4) Providing a mechanistic explanation for how PRDM16 enhances the repressive function of BMP signaling. The identification of SMAD palindromic motifs as preferred binding sites for the SMAD/PRDM16 complex suggests a specific mechanism for PRDM16-mediated gene repression.

      (5) Highlighting the potential clinical relevance of PRDM16 in the context of ChP tumors and other related diseases. By demonstrating the crucial role of PRDM16 in controlling ChP development, the study suggests that dysregulation of PRDM16 may contribute to the pathogenesis of these conditions.

      We thank the reviewer for the thorough and thoughtful summary of our study. We’re glad the key findings and significance of our work were clearly conveyed, particularly regarding the role of PRDM16 in coordinating BMP and Wnt signaling during ChP development. We also appreciate the recognition of our integrated approach and the potential implications for understanding ChP-related diseases.

      Weaknesses:

      (1) Limited investigation of the mechanism controlling PRDM16 protein stability and nuclear localization in vivo. The study observed that PRDM16 protein became nearly undetectable in NSCs cultured in vitro, despite high mRNA levels. While the authors speculate that post-translational modifications might regulate PRDM16 in NSCs similar to brown adipocytes, further investigation is needed to confirm this and understand the precise mechanism controlling PRDM16 protein levels in vivo.

      While the mechanisms controlling PRDM16 protein stability and nuclear localization in the developing brain are interesting, the scope of this paper is to reveal the function of PRDM16 in the choroid plexus and its interaction with BMP signaling. We would be happy to pursue this direction in our next study.

      (2) Reliance on overexpression of PRDM16 in NSC cultures. To study PRDM16 function in vitro, the authors used a lentiviral construct to constitutively express PRDM16 in NSCs. While this approach allowed them to overcome the issue of low PRDM16 protein levels in vitro, it is important to consider that overexpressing PRDM16 may not fully recapitulate its physiological role in regulating gene expression and cell behavior.

      As stated above, we acknowledge that findings from cultured NSCs may not directly apply to ChP cells in vivo, and we are cautious in our statements. The cell culture work aimed to identify potential mechanisms by which PRDM16 and SMADs interact to regulate gene expression, and to identify target genes co-regulated by these factors. We expect that not all targets identified in cell culture are regulated by PRDM16 and SMADs in the ChP, so we validated expression changes of several target genes in the developing ChP and have now included the new data in Fig. 7 and Supplementary Fig. 7. Of the 31 genes identified from cultured cells, four cell cycle regulators, including Wnt7b, Id3, Spc24/25/Nuf2 and Mybl2, showed de-repression in the Prdm16 mutant ChP. These genes may be relevant downstream genes in the ChP, whereas other target genes may be cortical NSC-specific or less dependent on Prdm16 in vivo.

      (3) Lack of direct evidence for AP1 as the co-factor responsible for SMAD relocation in the absence of PRDM16. While the study identified the AP1 motif as enriched in SMAD binding sites in Prdm16 knockout cells, they only provided ChIP-qPCR validation for c-FOS binding at two specific loci (Wnt7b and Id3). Further investigation is needed to confirm the direct interaction between AP1 and SMAD proteins in the absence of PRDM16 and to rule out other potential co-factors.

      We agree that the finding of the AP1 motif enriched at the PRDM16 and SMAD co-binding regions in Prdm16 KO cells can only indirectly suggest AP1 as a co-factor for SMAD relocation. That’s why we used ChIP-qPCR to examine the presence of C-fos at these sites. Although we only validated two targets, the result confirms that C-fos binds to the sites only in the Prdm16 KO cells but not Prdm16_expressing cells, suggesting AP1 is a co-factor.  Our results cannot rule out the presence of other co-factors.

      Reviewer #2 (Recommendations for the authors):

      Minor typo: [7, page 3] "sicne" should be "since".

      We appreciate the reviewer’s careful reading. We have now corrected the typo and revised some part of the text to improve clarity.

      Reviewer #3 (Public review):

      Summary:

      Bone morphogenetic protein (BMP) signaling instructs multiple processes during development including cell proliferation and differentiation. The authors set out to understand the role of PRDM16 in these various functions of BMP signaling. They find that PRDM16 and BMP co-operate to repress stem cell proliferation by regulating the genomic distribution of BMP pathway transcription factors. They additionally show that PRDM16 impacts choroid plexus epithelial cell specification. The authors provide evidence for a regulatory circuit (consisting of BMP, PRDM16, and Wnt) that influences stem cell proliferation/differentiation.

      Strengths:

      I find the topics studied by the authors in this study of general interest to the field, the experiments well-controlled and the analysis in the paper sound.

      We thank the reviewer for their positive feedback and thoughtful summary. We appreciate the recognition of our efforts to define the role of PRDM16 in BMP signaling and stem cell regulation, as well as the soundness of our experimental design and analysis.

      Weaknesses:

      I have no major scientific concerns. I have some minor recommendations that will help improve the paper (regarding the discussion).

      We have revised the discussion according to the suggestions.

      Reviewer #3 (Recommendations for the authors):

      Specific minor recommendations:

      Page 18. Line 526: In a footnote, the authors point out a recent report which in parallel was investigating the link between PRDM16 and SMAD4. There is substantial non-overlap between these two papers. To aid the reader, I would encourage the authors to discuss that paper in the discussion section of the manuscript itself, highlighting any similarities/differences in the topic/results.

      Thanks for the suggestion. We have now included the comparison in the discussion. Our study and this publication reach one consistent conclusion: PRDM16 functions as a co-repressor of SMAD4. However, the proposed mechanisms differ. Our data suggest a model in which PRDM16 facilitates SMAD4/pSMAD binding to repress proliferation genes under high BMP conditions. In contrast, the other report suggests that SMAD4 steadily binds to the Prdm16 promoter and switches regulatory functions depending on its co-factors: together with PRDM16, SMAD4 represses gene expression, whereas together with SMAD3, in response to high levels of TGF-b1, it activates gene expression. These differences could be due to the different signaling pathways (BMP versus TGF-b) or contexts (NSCs versus pancreatic cancer), among other factors.

      Page 3. Line 65: typo 'since'

      We appreciate the reviewer’s careful reading. We have now corrected the typo and revised the text to improve clarity.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript describes a series of experiments documenting trophic egg production in a species of harvester ant, Pogonomyrmex rugosus. In brief, queens are the primary trophic egg producers, there is seasonality and periodicity to trophic egg production, trophic eggs differ in many basic dimensions and contents relative to reproductive eggs, and diets supplemented with trophic eggs had an effect on the queen/worker ratio produced (increasing worker production).

      The manuscript is very well prepared and the methods are sufficient. The outcomes are interesting and help fill gaps in knowledge, both on ants as well as insects, more generally. More context could enrich the study and flow could be improved.

      We thank the reviewer for these comments. We agree that the paper would benefit from more context. We have therefore greatly extended the introduction.

      Reviewer #2 (Public Review):

      The manuscript by Genzoni et al. provides evidence that trophic eggs laid by the queen in the ant Pogonomyrmex rugosus have an inhibitory effect on queen development. The authors also compare a number of features of trophic eggs, including protein, DNA, RNA, and miRNA content, to reproductive eggs. To support their argument that trophic eggs have an inhibitory effect on queen development, the authors show that trophic eggs have a lower content of protein, triglycerides, glycogen, and glucose than reproductive eggs, and that their miRNA distributions are different relative to reproductive eggs. Although the finding of an inhibitory influence of trophic eggs on queen development is indeed arresting, the egg cross-fostering experiment that supports this finding can be effectively boiled down to a single figure (Figure 6). The rest of the data are supplementary and correlative in nature (and can be combined), especially the miRNA differences shown between trophic and reproductive eggs. This means that the authors have not yet identified the mechanism through which the inhibitory effect on queen development is occurring. To this reviewer, this finding is more appropriate as a short report and not a research article. A full research article would be warranted if the authors had identified the mechanism underlying the inhibitory effect on queen development. Furthermore, the article is written poorly and lacks much background information necessary for the general reader to properly evaluate the robustness of the conclusions and to appreciate the significance of the findings.

      We thank the reviewer for these comments. We agree that the paper would benefit by having more background information and more discussion. We have followed this advice in the revision.

      Reviewer #3 (Public Review):

      In "Trophic eggs affect caste determination in the ant Pogonomyrmex rugosus" Genzoni et al. probe a fundamental question in sociobiology, what are the molecular and developmental processes governing caste determination? In many social insect lineages, caste determination is a major ontogenetic milestone that establishes the discrete queen and worker life histories that make up the fundamental units of their colonies. Over the last century, mechanisms of caste determination, particularly regulators of caste during development, have remained relatively elusive. Here, Genzoni et al. discovered an unexpected role for trophic eggs in suppressing queen development - where bi-potential larvae fed trophic eggs become significantly more likely to develop into workers instead of gynes (new queens). These results are unexpected, and potentially paradigm-shifting, given that previously trophic eggs have been hypothesized to evolve to act as an additional intracolony resource for colonies in potentially competitive environments or during specific times in colony ontogeny (colony foundation), where additional food sources independent of foraging would be beneficial. While the evidence and methods used are compelling (e.g., the sequence of reproductive vs. trophic egg deposition by single queens, which highlights that the production of trophic eggs is tightly regulated), the connective tissue linking many experiments is missing and the downstream mechanism is speculative (e.g., whether the miRNA, protein, triglyceride, or glycogen levels in trophic eggs are what suppress queen development). Overall, this research elevates the importance of trophic eggs in regulating queen and worker development but how this is achieved remains unknown.

      We thank the reviewer for these comments and agree that future work should focus on identifying the substances in trophic eggs that are responsible for caste determination.  

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Introduction:

      The context for this study is insufficiently developed in the introduction - it would be nice to have a more detailed survey of what is known about trophic eggs in insects, especially social insects. The end of the introduction nicely sets up the hypothesis through the prior work described by Helms Cahan et al. (2011) where they found JH supplementation increased trophic egg production and also increased worker size. I think that the introduction could give more context about egg production in Pogonomyrmex and other ants, including what is known about worker reproduction. For example, Suni et al. 2007 and Smith et al. 2007 both describe the absence of male production by workers in two different harvester ants. Workers tend to have underdeveloped ovaries when in the presence of the queen. Other species of ants are known to have worker reproduction seemingly for the purpose of nutrition (see Heinze and Hölldobler 1995 and subsequent studies on Crematogaster smithi). Because some ants, including Pogonomyrmex, lack trophallaxis, it has been hypothesized that they distribute nutrients throughout the nest via trophic eggs as is seen in at least one other ant (Gobin and Ito 2000). Interestingly, Smith and Suarez (2009) speculated that the difference in nutrition of developing sexual versus worker larvae (as seen in their pupal stable isotope values) was due to trophic egg provisioning - they predicted the opposite as was found in this study, but their prediction was in line with that of Helms Cahan et al. (2011). This is all to say that there is a lot of context that could go into developing the ideas tested in this paper that is completely overlooked. The inclusion of more of what is known already would greatly enrich the introduction.

      We agree that it would be useful to provide a larger context for the study. We now provide more information on the life history of ants and explain under which circumstances queens and workers may produce trophic eggs. We also mention that some ants, such as Crematogaster smithi, have a special caste of “large workers”, which are morphologically intermediate between winged queens and small workers and appear to be specialized in the production of unfertilized eggs. We now also mention the study of Gobin and Ito (2000), which shows that trophic eggs may play an important role in food distribution within the colony, in particular in species where trophallaxis is rare or absent.

      Methods:

      L49: What lineage is represented in the colonies used? The collection location is near where both dependent-lineage (genetic caste determining) P. rugosus and "H" lineage exist. This is important to know. Further, depending on what these are, the authors should note whether this has relevance to the study. Not mentioning genetic caste determination in a paper that examines caste determination is problematic.

      This is a good point. We now state at the very beginning of the Material and Method section that the queens were collected in populations known not to have dependent-lineage (genetic caste determining) mechanisms of caste determination.

      L63 and throughout: It would be more efficient to have a paragraph that cites R (must be done) and RStudio once as the tool for all analyses. It also seems that most model construction and testing was done using lme4 - so just lay this out once instead of over and over.

      We agree and have updated the manuscript accordingly.

      L95: 'lenght' needs to be 'length' in the formula.

      Thanks, corrected.

      L151: A PCA was used but not described in the methods. This should be covered here. And while a Mantel test is used, I might consider a permANOVA as this more intuitively (for me, at least) goes along with the PCA.

      We added the PCA description in the Material and Method section.

      Results:

      I love Fig. 3! Super cool.

      Thanks for this positive comment.

      Discussion:

      It would be good to have more on egg cannibalism. This is reasonably well-studied and could be good extra context.

      We have added a paragraph in the discussion to mention that egg cannibalism is ubiquitous in ants.

      Supp Table 1: P. badius is missing and citations are incorrectly attributed to P. barbatus.

      P. badius was present in the Table but not with the other Pogonomyrmex species. For some genera, the species were also not listed in alphabetical order. This has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      Comments on introduction:

      The introduction is missing information about caste determination in ants generally and Pogonomyrmex rugosus specifically. This is important because some colonies of Pogonomyrmex rugosus have been shown to undergo genetic caste determination, in which case the main result would be rendered insignificant. What is the evidence that caste determination in the lineages/colonies used is largely environmentally influenced and in what contexts/environmental factors? All of this should be made clear.

      This is a good point. We have expanded the introduction to discuss previous work on caste determination in Pogonomyrmex species with environmental caste determination and now also provide evidence at the beginning of the Material and Method section that the two populations studied do not have a system of genetic caste determination.

      Line 32 and throughout the paper: What is meant exactly by 'reproductive eggs'? Are these eggs that develop specifically into reproductives (i.e., queens/males) or all eggs that are non-trophic? If the latter, then it is best to refer to these eggs as 'viable' in order to prevent confusion.

      We agree and have updated the manuscript accordingly.

      Figure 1/Supp Table 1: It is surprising how few species are known to lay trophic eggs. Do the authors think this is an informative representation of the distribution of trophic egg production across subfamilies, or due to lack of study? Furthermore, the branches show ant subfamilies, not families. What does the question mark indicate? Also, the information in the table next to the phylogeny is not easy to understand. Having in the branches that information, in categories, shown in color for example, could be better and more informative. Finally, having the 'none' column with only one entry is confusing - discuss that only one species has been shown to definitely not lay trophic eggs in the text, but it does not add much to the figure.

      Trophic eggs are probably very common in ants, but this has not been very well studied. We added a sentence in the manuscript to make this clear.

      Thanks for noticing the family/subfamily error. This has been corrected in Figure 1 and Supplementary Table 1.

      The question mark indicates uncertainty about whether queens also contribute to the production of trophic eggs in one species (Lasius niger). We have now added information on that in the Figure legend.

      We agree with the reviewer that it would be easier to show on the branches of the tree whether queens and workers produce trophic eggs. However, placing this information on the branches would suggest that the “trait” evolved on that part of the tree. As we do not know when worker or queen production of trophic eggs evolved, we prefer to keep the figure as it is.

      Finally, as suggested by the reviewer, we have removed the “none” column from the figure and now discuss in the manuscript the fact that the absence of trophic eggs has been reported in only one ant species (Amblyopone silvestrii; Masuko 2003).

      Comments on materials and methods:

      Why did they settle on three trophic eggs per larva for their experimental setup?

      We used three trophic eggs per larva because, under natural conditions, 50-65% of the eggs are trophic. The ratio of trophic eggs to viable eggs (larvae) was thus similar to that under natural conditions.

      Line 50: In what kind of setup were the ants kept? Plaster nests? Plastic boxes? Tubes? Was the setup dry or moist? I think this information is important to know in the context of trophic eggs.

      We now explain that colonies were maintained in plastic boxes with water tubes.

      Line 60: Were all the 43 queens isolated only once, or multiple times?

      Each of the 43 queens was isolated for 8 hours every day for 2 weeks, once before and once after hibernation (so they were isolated multiple times). We have changed the text to make clear that this was done for each of the 43 queens.

      Could isolating the queen away from workers/brood have had an effect on the type of eggs laid?

      This cannot be completely ruled out. However, the proportion of viable and trophic eggs can only be reliably determined by isolating queens. Importantly, the main aim of these experiments was not to precisely determine the proportion of viable and trophic eggs, but to show that this proportion changes before and after hibernation and that queens do not lay viable and trophic eggs in a random sequence.

      Since it was established that only queens lay trophic eggs why was the isolation necessary?

      Yes, this was necessary because eggs are fragile and very difficult to collect in colonies with workers (as soon as eggs are laid they are piled up, and as soon as we disturb the nest, a worker takes them all and runs away with them). Moreover, workers might preferentially eat one type of egg, which would have required removing eggs as soon as the queens laid them. This would have been a huge disturbance for the colonies.

      Line 61: Is this hibernation natural or lab induced? What is the purpose of it? How long was the hibernation and at what temperature? Where are the references for the requirement of a diapause and its length?

      The hibernation was lab induced. We hibernated the queens because we previously showed that hibernation is important to trigger the production of gynes in P. rugosus colonies in the laboratory (Schwander et al 2008; Libbrecht et al 2013). Hibernation conditions were as described in Libbrecht et al (2013).  

      Line 73: If the queen is disturbed several times for three weeks, which effect does it have on its egg-laying rate and on the eggs laid? Were the eggs equally distributed in time in the recipient colonies with and without trophic eggs to avoid possible effects?

      It is difficult to determine the effect of disturbance on the number and type of eggs laid. However, as noted above, our aim was not to precisely determine these values but to determine whether hibernation had an effect on the proportion of trophic eggs. The recipient colonies with and without trophic eggs were formed in exactly the same way. No viable eggs were introduced into these colonies; all first instar larvae were introduced in the same way, at the same time, and with random assignment. We have clarified this in the Material and Method section.

      Line 77: Before placing the freshly hatched larvae in recipient colonies, how long were the recipient colonies kept without eggs and how long were they fed before giving the eggs? Were they kept long enough without the queen to avoid possible effects of trophic eggs, or too long so that their behavior changed?

      The recipient colonies were created 7 to 10 days before receiving the first larvae and were fed ad libitum with grass seeds, flies and honey water from the beginning. Trophic eggs that would have been left over from the source colony should have been eaten within the first few days after creating the recipient colonies. However, even if some trophic eggs would have remained, this would not influence our conclusion that trophic eggs influence caste fate, given the fully randomized nature of our treatments and the considerable number of independent replicates. The same applies to potential changes in worker behavior following their isolation from the queen.

      Line 77: Is it known at what stage caste determination occurs in this species? Here first instar larvae were given trophic eggs or not. Does caste-determination occur at the first instar stage? If not, what effect could providing trophic eggs at other stages have on caste-determination?

      A previous study showed that there is a maternal effect on caste determination in the focal species (Schwander et al 2008). The mechanism underlying this maternal effect was hypothesized to be differential maternal provisioning of viable eggs. However, as we detail in the discussion, the new data presented in our study suggest that the mechanism is in fact a difference in the abundance of trophic eggs laid by queens. There is currently no information on when exactly caste determination occurs during development.

      Comments on results:

      Line 65: How does investigating the order of eggs laid help to "inform on the mechanisms of oogenesis"?

      We agree that the aim was not to study the mechanism of oogenesis. We have changed this sentence accordingly: “To assess whether viable and trophic eggs were laid in a random order, or whether eggs of a given type were laid in clusters, we isolated 11 queens for 10 hours, eight times over three weeks, and collected every hour the eggs laid”

      Figure 2: There is no description/discussion of data shown in panels B, C, E, and F in the main text.

      We have added information in the main text that while viable eggs showed embryonic development at 25 and 65 hours (Fig. 2B, C), there was no such development for trophic eggs (Fig. 2E, F).

      Line 172: Please explain hibernation details and its significance on colony development/life cycle.

      We have added this information in the Material and Method section.

      Figure 6: How is B plotted? How could 0% of gynes have 100% survival?

      Survival is given for the larvae without considering caste. We have changed the X axis of panel B and reworded the Figure legend to clarify this.

      Is reduced DNA content just an outcome of reduced cell number within trophic eggs, i.e., was this a difference in cell type or cell number? Or is it some other adaptive reason?

      It is likely to be due to a reduction in cell number (trophic eggs have maternal DNA in the chorion, while viable eggs have in addition the cells from the developing zygote) but we do not have data to make this point.

      Is there a logical sequence to the sequence of egg production? The authors showed that the sequence is non-random, but can they identify in what way? What would the biological significance be?

      We could not identify a logical sequence. Plausibly, the production of the two types of eggs implies some changes in the metabolic processes during egg production resulting in queens producing batches of either viable or trophic eggs. This would be an interesting question to study, but this is beyond the scope of this paper.

      Figure 6b is difficult to follow, and more generally, legends for all figures can be made clearer and easier to follow.

      We agree. We have now improved the legends of Fig 6B and the other figures.

      Lines 172-174: "The percentage of eggs that were trophic was higher before hibernation...than after. This higher percentage was due to a reduced number of reproductive eggs, the number of trophic eggs laid remained stable" - are these data shown? It would be nice to see how the total egg-laying rate changes after hibernation. Also, is the proportion of trophic eggs laid similar between individual queens?

      No, the data were not shown, and we do not have sufficiently robust data to make this point. We have therefore removed the sentence “This higher percentage was due to a reduced number of reproductive eggs, the number of trophic eggs laid remained stable” from the manuscript.

      Figure 6B: Do several colonies produce 100% gynes despite receiving trophic eggs? It would be interesting if the authors discussed why this might occur (e.g., the larvae are already fully determined to be queens and not responsive to whatever signal is in the trophic eggs).

      The reviewer is correct that four colonies produced 100% gynes despite receiving trophic eggs. However, the number of individuals produced in these four colonies was small (2, 1, 2 and 1; see Supplementary Table 2), so it is likely that these colonies produced only gynes by chance.

      Figure 5: Why a separation by "size distribution variation of miRNA"? What is the relevance of looking at size distributions as opposed to levels?

      We did this because there are many different miRNA species, as reflected by the presence of multiple size peaks rather than a single one. This is why we examined the size distribution.

      Figure 2: The image of the viable embryo is not clear. If possible, redo the viable to show better quality images.

      Unfortunately, we no longer have colonies in the laboratory, so this is not possible.

      Comments on discussion:

      Lines 236-247: Can an explanation be provided as to why the effect of trophic eggs in P. rugosus is the opposite of those observed by studies referenced in this section? Could P. rugosus have any life history traits that might explain this observation?

      In the two studies mentioned, other factors co-varied with the quantity of trophic eggs. We now mention this and suggest that it would be useful to experimentally manipulate the quantity of trophic eggs in the Argentine ant and P. barbatus (the two species where an effect of trophic eggs had been suggested).

      The discussion should include implications and future research of the discovery.

      We now suggest several experiments that should be performed in the future.

      The conclusion paragraph is too short and does not represent what was discussed.

      We added two sentences at the end of the paragraph to make suggestions of future studies that could be performed.

      Lines 231 to 247: Drastically reduce and move this whole part to the introduction to substantiate the assumption that trophic eggs play a nutritional role.

      We moved most of this paragraph to the introduction, as suggested by the reviewer.

      Reviewer #3 (Recommendations For The Authors):

      I would like to commend the authors on their study. The main findings of the paper are individually solid and provide novel insight into caste determination and the nature of trophic eggs. However, the inferences made from much of the data and connections between independent lines of evidence often extend too far and are unsubstantiated.

      We thank the reviewer for the positive comment. We made many changes in the manuscript to improve the discussion of our results.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Duilio M. Potenza et al. explores the role of Arginase II in cardiac aging, mainly using whole-body arg-ii knock-out mice. In this work, the authors have found that Arg-II exerts non-cell-autonomous effects on aging cardiomyocytes, fibroblasts, and endothelial cells mediated by IL-1b from aging macrophages. The authors have used arg II KO mice and an in vitro culture system to study the role of Arg II. The authors have also reported the cell-autonomous effect of Arg-II through mitochondrial ROS in fibroblasts that contribute to cardiac aging. These findings are sufficiently novel in cardiac aging and provide interesting insights. While the phenotypic data seems strong, the mechanistic details are unclear. How Arg II regulates IL-1b and modulates cardiac aging remains to be determined. The authors still need to determine whether Arg II in fibroblasts and endothelial cells contributes to cardiac fibrosis and cell death. This study also lacks a comprehensive understanding of the pathways modulated by Arg II to regulate cardiac aging.

      We sincerely appreciate the valuable feedback provided by the reviewer. It is gratifying to hear that our work provides novel information on the role of arginase-II in cardiac aging, which is a complex process involving various cell types and mechanisms. We have devoted considerable effort to performing new experiments to address the reviewer's comments and to delineate in more detail the mechanisms of Arg-II in cardiac aging. Please see our specific answers to each of the reviewer's points below.

      Strengths:

      This study provides interesting information on the role of Arg II in cardiac aging.

      The phenotypic data in the arg II KO mice is convincing, and the authors have assessed most of the aging-related changes.

      The data is supported by an in vitro cell culture system.

      We appreciate this reviewer’s positive assessment of the strengths of our study.

      Weaknesses:

      The manuscript needs more mechanistic details on how Arg II regulates IL-1b and modulates cardiac aging.

      We have made great efforts and performed new experiments in the human monocyte cell line THP1, in which iNOS is not expressed and not inducible by LPS, and in which the arg-ii gene was knocked out by CRISPR technology. In addition, murine bone-marrow-derived macrophages in which the inos gene was ablated were used for this purpose. We found that in human THP1 monocytes, in which Arg-II but not iNOS is induced by LPS (100 ng/mL for 24 hours) (Suppl. Fig. 6A), mRNA and protein levels of the IL-1b precursor are markedly reduced in arg-ii knockout THP1<sup>arg-ii<sup>-/-</sup></sup> cells as compared to THP1<sup>wt</sup> cells (Suppl. Fig. 6B and 6C), further confirming that Arg-II promotes IL-1b production, as also shown in RAW264.7 macrophages (Suppl. Fig. 5A and 5C). Moreover, in mouse bone-marrow-derived macrophages, LPS-induced IL-1b production is inhibited by inos deficiency (BMDM<sup>inos-/-</sup> vs BMDM<sup>wt</sup>) (Suppl. Fig. 6D and 6E), while Arg-II levels are slightly enhanced in BMDM<sup>inos-/-</sup> cells (Suppl. Fig. 6D and 6F). Altogether, these results suggest that iNOS slightly reduces Arg-II expression, and that Arg-II and iNOS can be upregulated by LPS independently of each other. Both Arg-II and iNOS are required for IL-1b production upon LPS stimulation, as illustrated in Suppl. Fig. 6G. For detailed results and discussion, please see our answers to points 2 and 6 raised by this reviewer.

      The authors used whole-body KO mice, and the role of macrophages in cardiac aging is not studied in this model. A macrophage-specific arg II Ko would be a better model.

      We fully agree with this comment of the reviewer. Unfortunately, a macrophage-specific arg-ii knockout mouse model is not yet available. Future research should develop a macrophage-specific arg-ii<sup>-/-</sup> mouse model to confirm this conclusion in aging animals. Since Arg-II is also expressed in fibroblasts and endothelial cells and exerts cell-autonomous and paracrine functions, aging mouse models with conditional arg-ii knockout in specific cell types would be the next step to elucidate the cell-specific functions of Arg-II in cardiac aging. We have pointed out this aspect for future research on page 19, lines 2 to 6.

      Experiments need to validate the deficiency of Arg II in cardiomyocytes.

      As pointed out by this reviewer in comment point 10, Arg-II was previously reported to be expressed in isolated cardiomyocytes from rats (PMID: 16537391). Unfortunately, negative controls, i.e., arg-ii<sup>-/-</sup> samples, which would exclude possible background signals, were not included in that study. We have made great efforts to investigate whether Arg-II is present in cardiomyocytes from different species, including mice, rats and humans, and have included old arg-ii<sup>-/-</sup> mouse samples as a negative control. This allows us to validate antibody specificity and exclude background noise beyond any reasonable doubt. The new experiments in Suppl. Fig. 4 confirm the specificity of the antibody against Arg-II in old mouse kidney, which is known to express Arg-II in the S3 proximal tubular cells (Huang J, et al. 2021). To exclude possible species-specific differences in Arg-II expression in cardiomyocytes, aged mouse and rat heart tissues were used for cellular localization of Arg-II by confocal immunofluorescence staining. As shown in Suppl. Fig. 4B and 4C, both species show Arg-II expression only in non-cardiomyocytes (cells between striated cardiomyocytes) (red arrows) but not in striated cardiomyocytes. Even in rat myocardial infarction tissue, Arg-II was found in endocardial cells but not in cardiomyocytes (Suppl. Fig. 4B). In isolated cardiomyocytes exposed to hypoxia, a well-known strong stimulus for Arg-II protein levels, no Arg-II signal could be detected, whereas fibroblasts from the same animals show elevated Arg-II levels under hypoxia (Fig. 5B). Furthermore, RT-qPCR detected arg-ii mRNA in non-cardiomyocytes but not in cardiomyocytes (Fig. 5C). Altogether, these results demonstrate that Arg-II is not expressed, or is expressed only at negligible levels, in cardiomyocytes, but is expressed in non-cardiomyocytes. These new experiments with rat hearts are included in the Methods section on page 20, 1st paragraph. The results are described on page 7, 1st paragraph, and discussed on page 12, 2nd paragraph. The legend to Suppl. Fig. 4 is included in the file “Suppl. figure legend_R”.

      The authors have never investigated the possibility of NO involvement in this mouse model.

      As mentioned above, we made great efforts and performed new experiments in the human monocyte cell line THP1, in which iNOS is not expressed and not inducible by LPS, and in which the arg-ii gene was knocked out by CRISPR technology. Moreover, murine bone-marrow-derived macrophages in which the inos gene was ablated were also used for this purpose. The results show that Arg-II and iNOS can be upregulated by LPS independently of each other and that iNOS slightly reduces Arg-II expression. However, both Arg-II and iNOS are required for IL-1b production upon LPS stimulation. For detailed results and discussion, please see our answers to comment points 2 and 6 raised by this reviewer.

      A co-culture system would be appropriate to understand the non-cell-autonomous functions of macrophages.

      We appreciate the suggestion by this reviewer regarding the co-culture system to test the non-cell-autonomous role of Arg-II. We think that our current model, which involves treating cells with conditioned media, is a well-established and effective method for demonstrating the non-cell-autonomous role of Arg-II. This approach allows us to observe the effects of Arg-II on surrounding cells through the factors present in the conditioned media released from macrophages. A co-culture system could be considered if the released factor in the conditioned medium were not stable. This is, however, not the case. Therefore, we are confident that our experimental model with conditioned medium is sufficient to demonstrate a paracrine effect of cell-cell interaction (please also see our answer to comment point 16).

      The myocardial infarction data shown in the mouse model may not be directly linked to cardiac aging.

      As we have introduced and discussed in the manuscript, aging is a predominant risk factor for cardiovascular disease (CVD). Studies in experimental animal models and in humans provide evidence that the aging heart is more vulnerable to stressors such as ischemia/reperfusion injury and myocardial infarction than the heart of young individuals. Even in the heart of apparently healthy individuals of old age, chronic inflammation, cardiomyocyte senescence, cell apoptosis, interstitial/perivascular tissue fibrosis, endothelial dysfunction and endothelial-mesenchymal transition (EndMT), and cardiac dysfunction with either preserved or reduced ejection fraction are observed. Our study aimed to investigate the role of Arg-II in the cardiac aging phenotype and in age-associated cardiac vulnerability to stressors. Therefore, cardiac functional changes and myocardial infarction in response to ischemia/reperfusion injury are suitable surrogate parameters for this purpose.

      Reviewer #2 (Public Review):

      Summary:

      The results from this study demonstrated a cell-specific role of the mitochondrial enzyme arginase-II (Arg-II) in heart aging and revealed a non-cell-autonomous effect of Arg-II on cardiomyocytes, fibroblasts, and endothelial cells through crosstalk with macrophages via inflammatory factors such as IL-1b, as well as a cell-autonomous effect of Arg-II through mtROS in fibroblasts, contributing to the cardiac aging phenotype. These findings highlight the significance of non-cardiomyocytes in the heart and bring new insights into the understanding of the pathologies of cardiac aging. The study also provides new evidence for the development of therapeutic strategies, such as targeting Arg-II activation in macrophages.

      We're grateful for the reviewer's positive feedback, acknowledging the significant findings of our study on the role of arginase-II (Arg-II) in cardiac aging. We appreciate this reviewer’s insight into the therapeutic potential of targeting Arg-II activation in macrophages and are excited about the implications for future interventions in age-related cardiac pathologies. Thank you for recognizing the importance of our work in advancing our understanding of cardiac aging and potential therapeutic strategies.

      Strengths:

      This study targets an important clinical challenge, and the results are interesting and innovative. The experimental design is rigorous, the results are solid, and the representation is clear. The conclusion is logical and justified.

      We thank this reviewer for the positive comment.

      Weaknesses:

      The discussion could be extended a little bit to improve the realm of the knowledge related to this study.

      We appreciate this comment and have added and revised our discussion on this aspect accordingly at the end of the discussion section on page 19.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have several critical concerns, specifically about the mechanism of how Arg-II plays a role in cardiac aging.

      My major concerns are:

      (1) The authors have shown non-cell-autonomous effects on aging cardiomyocytes, fibroblasts, and endothelial cells mediated by IL-1b from aging macrophages. A macrophage-specific Arg-II knock-out mouse model is a suitable and necessary control to establish claims.

      We fully agree with this comment of the reviewer. Unfortunately, a macrophage-specific arg-ii knockout animal model is not yet available. Future research should develop a macrophage-specific arg-ii<sup>-/-</sup> mouse model to confirm this conclusion in aging animals. Since Arg-II is also expressed in fibroblasts and endothelial cells and exerts cell-autonomous and paracrine functions, aging mouse models with conditional arg-ii knockout in specific cell types would be the next step to elucidate the cell-specific functions of Arg-II in cardiac aging. We have pointed out this aspect for future research on page 19, lines 2 to 6.

      (2) This study suggests that Arg-II exerts its effect through IL-1b in cardiac ageing. However, all experiments performed to demonstrate the link between ArgII and IL-1β are correlative at best. The underlying molecular mechanism, including transcription factors involved in the regulation of IL-1β by arg-ii, has not been demonstrated.

      We sincerely appreciate this reviewer’s comment on this aspect! To clarify, a causal role of Arg-II in promoting IL-1β production in macrophages is evidenced by the experimental results showing that the old arg-ii<sup>-/-</sup> mouse heart has lower IL-1β levels than the age-matched wt mouse heart (Fig. 6A to 6D). We further showed that cellular IL-1β protein levels and release are reduced in old arg-ii<sup>-/-</sup> mouse splenic macrophages as compared to wt cells (Fig. 7A, 7C, and 7D). This result is further confirmed in the mouse macrophage cell line RAW264.7 (Suppl. Fig. 5A and Suppl. Fig. 5C), in which we demonstrate that silencing arg-ii reduces LPS-stimulated IL-1β levels.

      Following this reviewer’s comment (see comment point 6), we made further efforts to investigate the possible involvement of iNOS in Arg-II-regulated IL-1β production in LPS-stimulated macrophages. We performed new experiments in the human monocyte cell line THP1, in which iNOS is not expressed and not inducible by LPS, and in which the arg-ii gene was knocked out by CRISPR technology.

      Moreover, murine bone-marrow-derived macrophages (BMDM) in which the inos gene was ablated were also used for this purpose. We found that in human THP1 monocytes, in which Arg-II but not iNOS is induced by LPS (100 ng/mL for 24 hours) (Suppl. Fig. 6A), mRNA and protein levels of IL-1b are markedly reduced in arg-ii knockout THP1<sup>arg-ii<sup>-/-</sup></sup> cells as compared to THP1<sup>wt</sup> cells (Suppl. Fig. 6B and 6C), further confirming that Arg-II promotes IL-1b production, as also shown in RAW264.7 macrophages (Suppl. Fig. 5A and 5C). The results suggest that Arg-II promotes IL-1b production independently of iNOS. Moreover, the role of iNOS in IL-1b production was also studied in mouse BMDM in which the inos gene is ablated. The results demonstrate that LPS-induced IL-1b production is inhibited by inos deficiency (BMDM<sup>inos-/-</sup> vs BMDM<sup>wt</sup>) (Suppl. Fig. 6D and 6E), while Arg-II levels are slightly enhanced in BMDM<sup>inos-/-</sup> cells (Suppl. Fig. 6D and 6F). Since arginase and iNOS share the same metabolic substrate, L-arginine, inos deficiency would be expected to increase IL-1b production. This is, however, not the case: a strong inhibition of IL-1β production is observed in inos<sup>-/-</sup> macrophages. These results suggest that iNOS promotes IL-1β production independently of Arg-II, and that the inhibitory effect of inos deficiency on IL-1β production is dominant and able to counteract Arg-II’s stimulating effect on IL-1β production. Hence, our results demonstrate that Arg-II promotes IL-1β production in macrophages independently of iNOS. Altogether, these results suggest that iNOS slightly reduces Arg-II expression, that Arg-II and iNOS can be upregulated by LPS independently of each other, and that both Arg-II and iNOS are required for IL-1b production upon LPS stimulation (this concept is illustrated in Suppl. Fig. 6G). The new results are described on page 8, the last paragraph, and page 9, the 1st paragraph, and presented in Suppl. Fig. 6. The legend to Suppl. Fig. 6 is described in the file “Supplementary figure legend-R”. The related experimental methods are updated on page 23, the last two paragraphs, and page 26, the last paragraph. The results are discussed on page 14, the last paragraph, and page 15, the first two paragraphs.

      (3) Figure 2: The authors have not validated the whole-body Arg-II knock-out mice for arg-ii ablation.

      Thanks for pointing out this missing information! We have added the information regarding genotyping of the mice in the method section on page 20, first paragraph. Moreover, Fig. 5C also confirms the genotyping of the non-cardiomyocyte cells isolated from wt and arg-ii<sup>-/-</sup> animals.

      (4) It is unclear why the authors have chosen to focus on IL-1β specifically, among other pro-inflammatory cytokines that were also downregulated in Arg-II-/- mice as demonstrated in Fig. 2A-D.

      We appreciate the reviewer's question, which provides an opportunity to delve deeper into our findings. In our investigation, we observed that aging is accompanied by elevated levels of various proinflammatory markers. Intriguingly, our data revealed that tnf-α remained unaffected by the ablation of arg-ii during aging in heart tissues, while Il-1β showed a significant reduction in arg-ii<sup>-/-</sup> animals compared to age-matched wild-type (wt) mice (Fig. 2). Mcp1, however, is a chemoattractant for macrophages, and F4/80 serves as a pan-macrophage marker. Moreover, our previous studies demonstrate a relationship between Arg-II and IL-1β in vascular disease and obesity and in age-associated renal and pulmonary fibrosis. Finally, IL-1β has been shown to play a causal role in patients with coronary atherosclerotic heart disease, as shown by the CANTOS trial. Therefore, we have focused on IL-1β in this study. We have now explained and strengthened this aspect in the manuscript on page 7, the last two lines, and page 8, the 1st paragraph, as follows:

      “Taking into account that our previous studies demonstrated a relationship of Arg-II and IL-1β in vascular disease and obesity (Ming et al., 2012) and in age-associated organ fibrosis such as renal and pulmonary fibrosis (Huang et al., 2021; Zhu et al., 2023), and IL-1β has been shown to play a causal role in patients with coronary atherosclerotic heart disease as shown by CANTOS trials (Ridker et al., 2017), we therefore focused on the role of IL-1β in crosstalk between macrophages and cardiac cells such as cardiomyocytes, fibroblasts and endothelial cells”.

      (5) Although macrophages are shown to be involved in cardiac ageing in the arg-ii mouse model, the authors have not estimated macrophage infiltration and expression of inflammatory or senescence markers in the hearts of these mice.

      Thank you very much for raising this important point! Taking the comments of the reviewer into account, we have performed new experiments, i.e., multiple immunofluorescent staining, to analyze the infiltrated (CCR2<sup>+</sup>/F4-80<sup>+</sup>) and resident (LYVE1<sup>+</sup>/F4-80<sup>+</sup>) macrophage populations and to investigate to what extent Arg-II affects the infiltrated and resident macrophage populations in the aging heart. The results show an age-associated increase in the numbers of F4/80<sup>+</sup> cells in the wt mouse heart, which is reduced in age-matched arg-ii<sup>-/-</sup> animals (Fig. 2G). This result is in accordance with the f4/80 gene expression shown in Fig. 2A, demonstrating that arg-ii gene ablation reduces macrophage accumulation in the aging heart. Interestingly, resident macrophages, characterized as LYVE1<sup>+</sup>/F4-80<sup>+</sup> cells (Fig. 2E and 2H), predominate in the aging heart over infiltrated CCR2<sup>+</sup>/F4-80<sup>+</sup> cells (Fig. 2F and 2I). The increase in both LYVE1<sup>+</sup>/F4-80<sup>+</sup> and CCR2<sup>+</sup>/F4-80<sup>+</sup> macrophages in the aging heart is reduced in arg-ii<sup>-/-</sup> mice (Fig. 2E, 2F, 2H, and 2I). These new results are described on page 6, the 1st paragraph, presented in Fig. 2E to 2I, and discussed on page 13, the 2nd paragraph. The legend to Fig. 2 is revised. The method for this additional experiment is included on page 22, the 1st paragraph.

      Moreover, the age-associated accumulation of senescent cells, as demonstrated by p16<sup>ink4</sup>-positive cells, is significantly reduced in arg-ii<sup>-/-</sup> animals. This new result is incorporated in Fig. 1 as Fig. 1G and 1H and described and discussed on page 5, the 2nd paragraph, and page 14, the second-to-last sentence of the 1st paragraph. The method of p16<sup>ink4</sup> staining is included in the method section on page 22, the 1st paragraph, line 7. The legend to Fig. 1 is revised accordingly.

      (6) Previously, Arg-II has been reported to serve a crucial role in ageing associated with reduced contractile function in rat hearts by regulating Nitric Oxide Synthase (PMID: 22160208). Elevated NO and superoxide have been shown to play crucial roles in the etiology of cardiovascular diseases (PMID: 24180388). Therefore, it is important to assess whether Nitric Oxide (NO) is involved in the aging-related phenotype in this mouse model.

      Following the reviewer's suggestion, we conducted new experiments to investigate the role of nitric oxide (NO) in the context of Arg-II-induced IL-1b production in macrophages. We have addressed this question in our response to comment point 2.

      (7) Based on the results demonstrated in the study, ablation of Arg-II can be expected to cause a reduction in inflammation-associated phenotypes throughout the body at the multi-organ level. The observed improved cardiac phenotype could be an outcome of whole-body Arg-II ablation. It would be fruitful to develop a cardiac-specific Arg-II knockout mouse model to establish the role of Arg-II in the heart, independent of other organ systems.

      We agree with the comment of the reviewer on this point. Unfortunately, as explained above (see point 1), it is currently not possible for us to perform the requested experiments due to the lack of a cardiac-specific arg-ii knockout mouse model. Moreover, such an approach is complicated by the absence of Arg-II in cardiomyocytes and by the expression of Arg-II in multiple cell types, including endothelial cells, fibroblasts and macrophages of different origins (resident and monocyte-derived infiltrating cells). It is thus difficult to generate a cardiac-specific gene knockout mouse. Instead, the roles of cell-specific Arg-II in cardiac aging should be investigated by generating cell-specific arg-ii<sup>-/-</sup> mice. We very much appreciate this important point and have discussed this issue on page 19, lines 2 to 6.

      (8) Contrary to the findings in this paper, Arg-II has previously been reported to be essential for IL-10-mediated downregulation of pro-inflammatory cytokines, including IL-1β (PMID: 33674584).

      Thank you very much for mentioning this study! We have now thoroughly discussed this controversy on page 15, the last paragraph, and page 16, the 1st paragraph, as follows:

      “It is of note that a study reported that Arg-II is required for IL-10-mediated inhibition of IL-1b in mouse BMDM upon LPS stimulation (Dowling et al., 2021), which suggests an anti-inflammatory function of Arg-II. The results of our present study, however, demonstrate that LPS enhances Arg-II and IL-1b levels in macrophages and that knockout or silencing of Arg-II reduces IL-1b production and release, demonstrating a pro-inflammatory effect of Arg-II. Our findings are supported by a study from another group, which shows decreased production of pro-inflammatory cytokines including IL-6 and IL-1b in arg-ii<sup>-/-</sup> BMDM, most likely through suppression of the NFkB pathway, since arg-ii<sup>-/-</sup> BMDM reveal decreased activation of NFkB and lower IL-1b levels upon LPS stimulation (Uchida et al., 2023). Most importantly, our previous study also showed that re-introducing the arg-ii gene into arg-ii<sup>-/-</sup> macrophages markedly enhances LPS-stimulated pro-inflammatory cytokine production (Ming et al., 2012), providing further evidence for a pro-inflammatory role of arg-ii under LPS stimulation. In support of this conclusion, chronic inflammatory diseases such as atherosclerosis and type 2 diabetes (Ming et al., 2012), inflammaging in the lung (Zhu et al., 2023), kidney (Huang et al., 2021) and pancreas (Xiong, Yepuri, Necetin, et al., 2017) of aged animals, and acute organ injury such as acute ischemic/reperfusion or cisplatin-induced renal injury are reduced in arg-ii<sup>-/-</sup> mice (Uchida et al., 2023). The discrepancy between these studies and the IL-10 study may reflect dichotomous functions of Arg-II in macrophages, depending on the experimental context or conditions. Nevertheless, our results strongly implicate a pro-inflammatory role of Arg-II in macrophages in inflammaging of the aging heart”.

      (9) The authors have only performed immunofluorescence-based experiments to show fibrotic and apoptotic phenotypes throughout this study. To verify these findings, we suggest that they additionally perform RT-PCR or western blotting analysis for fibrotic markers and apoptotic markers.

      The fibrotic aspect was analyzed not only by microscopy but also by a quantitative biochemical assay, i.e., assessment of hydroxyproline content. Hydroxyproline is a major component of collagen and is largely restricted to collagen. Therefore, hydroxyproline levels can be used as an indicator of collagen content, as we previously did in the lung (Zhu et al., 2023). We have also measured collagen gene expression by RT-qPCR, as suggested by the reviewer, and found an age-related decline in collagen mRNA levels in both wt and arg-ii<sup>-/-</sup> mice, suggesting that age-associated cardiac fibrosis, and its prevention in arg-ii<sup>-/-</sup> mice, are due to alterations in translational and/or post-translational regulation, including collagen synthesis and/or degradation. The results are in accordance with those reported by other studies in the literature. We have pointed out this aspect on page 5, the 2nd paragraph:

      “The increased cardiac fibrosis in aging is, however, associated with decreased mRNA levels of collagen-Ia (col-Ia) and collagen-IIIa (col-IIIa), the major isoforms of pre-collagen in the heart (Suppl. Fig. 2A and 2B), which is a well-known phenomenon in cardiac fibrotic remodelling (Besse et al., 1994; Horn et al., 2016). The results demonstrate that age-associated cardiac fibrosis, and its prevention in arg-ii<sup>-/-</sup> mice, are due to alterations in translational and/or post-translational regulation, including collagen synthesis and/or degradation”.

      The results are presented in Suppl. Fig. 2, legend to Suppl. Fig. 2 is included in the file “Suppl. figure legend_R”. Suppl. table 2 for primers is revised accordingly.

      We did not use additional markers to perform apoptosis assays on whole hearts, since Fig. 3 provides good evidence that aging is associated with increased numbers of apoptotic cells in the heart, which are significantly reduced in arg-ii<sup>-/-</sup> mice. The reduction of TUNEL-positive (apoptotic) cells in aged arg-ii<sup>-/-</sup> mice is mainly due to a decrease in apoptotic cardiomyocytes. Histological analysis allows the apoptotic cell types to be identified. Moreover, biochemical assays for apoptosis, such as caspase-3 cleavage in whole heart tissue, cannot distinguish apoptotic cell types and may not be sensitive enough for the aging heart, given the relatively low number of apoptotic cells in the aging heart as compared to a myocardial infarction model.

      (10) Figure 4: arg-ii has previously been reported to be expressed in rat cardiomyocytes (PMID: 16537391). We strongly suggest the authors verify the expression of Arg-II via immunostaining in isolated cardiomyocytes (using published protocols), and by using multiple different cardiomyocyte-specific markers for colocalization studies to prove the lack of arg-ii expression beyond a reasonable doubt.

      As pointed out by this reviewer, Arg-II was previously reported to be expressed in isolated cardiomyocytes from rats (PMID: 16537391). Unfortunately, that study did not include negative controls (i.e., arg-ii<sup>-/-</sup> samples) that would have ruled out possible background signals. We made great efforts to investigate whether Arg-II is present in cardiomyocytes from different species including mice, rats and humans, and included old arg-ii<sup>-/-</sup> mouse samples as a negative control. This allows us to validate antibody specificity and exclude background noise beyond reasonable doubt. The new experiments in Suppl. Fig. 4 confirm the specificity of the antibody against Arg-II in old mouse kidney, which is known to express Arg-II in the S3 proximal tubular cells (Huang J, et al. 2021). To exclude possible species-specific differences in Arg-II expression in cardiomyocytes, aged mouse and rat heart tissues were used for cellular localization of Arg-II by confocal immunofluorescence staining. As shown in Suppl. Fig. 4B and 4C, both species show Arg-II expression only in non-cardiomyocytes (cells between striated cardiomyocytes) (red arrows) but not in striated cardiomyocytes. Even in rat myocardial infarction tissues, Arg-II was found not in cardiomyocytes but in endocardial cells (Suppl. Fig. 4B). In isolated cardiomyocytes exposed to hypoxia, a well-known strong stimulus for Arg-II protein expression, no Arg-II signal could be detected, whereas fibroblasts from the same animals showed elevated Arg-II levels under hypoxia (Fig. 5B). Furthermore, RT-qPCR could not detect arg-ii mRNA in cardiomyocytes, only in non-cardiomyocytes (Fig. 5C). Altogether, these results demonstrate that Arg-II is not expressed, or is expressed only at negligible levels, in cardiomyocytes but is expressed in non-cardiomyocytes. The new experiments with rat heart are described in the method section on page 20, the 1st paragraph. The results are described on page 7, the 1st paragraph, and discussed on page 12, the 2nd paragraph. The legend to Suppl. Fig. 4 is included in the file “Suppl. figure legend_R”.

      (11) Figure 6G: It may be worthwhile to supplement arg-ii<sup>-/-</sup> old cells with IL-1beta to see if there is an increase in TUNEL-positive cells.

      IL-1b is a well-known pro-inflammatory cytokine that causes apoptosis in various cell types including cardiomyocytes (Shen Y., et al., Tex Heart Inst J. 2015;42:109–116. doi: 10.14503/THIJ-14-4254; Liu Z. et. al., Cardiovasc Diabetol 2015;14,125. doi: 10.1186/s12933-015-0288-y; Li Z., et al., Sci Adv 2020;6:eaay0589. doi: 10.1126/sciadv.aay0589). We very much appreciate the interesting idea of this reviewer to investigate the apoptotic responses of cardiomyocytes from arg-ii<sup>-/-</sup> mice to IL-1b. We agree that cardiomyocytes from wt and arg-ii<sup>-/-</sup> mice may react differently to IL-1b, although cardiomyocytes do not express Arg-II, as demonstrated in our present study. If this is true, it must be due to non-cell-autonomous effects of the different aging microenvironment in the heart or to epigenetic modulation of the myocytes. We find this a very interesting aspect that requires further extensive investigation. Since our current study focused on the effect of wt and arg-ii<sup>-/-</sup> macrophages on cardiomyocytes and non-cardiomyocytes, we prefer not to include this suggested aspect in our manuscript and would like to explore it in a follow-up study.

      (12) Figures 4-9: It would be interesting to see if the effect of ArgII in cardiac ageing is gender-specific. It is recommended to include experimental data with male mice in addition to the results demonstrated in female mice.

      As pointed out in the manuscript, we focused on female mice because the age-associated increase in arg-ii expression is more pronounced in females than in males (Fig. 1A). As suggested by this reviewer, we performed additional experiments investigating the effects of arg-ii deficiency in male mice during aging, focusing on the pathophysiological outcomes of ischemia/reperfusion (I/R) injury in ex vivo experiments. Ex vivo functional analyses with the Langendorff system were performed in aged male mice (see Suppl. Fig. 9). Following I/R injury, wt male mice display reduced left ventricular developed pressure (LVDP) as well as reduced inotropic and lusitropic states (expressed as dP/dt max and dP/dt min, respectively). As previously reported (Murphy et al., 2007), we also found that old male mice are more prone to I/R injury than age-matched female animals. Specifically, 15 minutes of ischemia is enough to significantly impair left ventricular contractile function in male mice (Suppl. Fig. 9). In contrast, age-matched old female mice are relatively resistant to I/R injury, and at least 20 min of ischemia is necessary to induce a significant impairment of contractile function (Fig. 10). As in females, the post-I/R recovery of cardiac function is significantly improved in male arg-ii<sup>-/-</sup> mice as compared to age-matched wt animals. In addition to functional recovery, myocardial infarct size assessed by triphenyl tetrazolium chloride (TTC) staining upon I/R injury is significantly reduced in age-matched male arg-ii<sup>-/-</sup> animals (Suppl. Fig. 9C and 9D). Altogether, these results reveal a role for Arg-II in the impairment of heart function during aging in both sexes, with a higher vulnerability to stress in males. These new results are presented in Suppl. Fig. 9 and described on page 10, the last paragraph, and page 11. The results are discussed on page 18, the 2nd paragraph, as follows:

      “The fact that aged females have higher Arg-II levels but are more resistant to I/R injury seems contradictory to the detrimental effect of Arg-II in I/R injury. It is presumable that cardiac vulnerability to injurious stressors depends on multiple factors/mechanisms in aging. Other sex-associated factors/mechanisms may prevail and determine the higher sensitivity of the male heart to I/R injury, which requires further investigation. Nevertheless, the results of our study show that Arg-II plays a role in cardiac I/R injury also in males”.

      The information on the experimental methods in the male animals is included on page 20, the last paragraph and page 21, the 1st paragraph. Legend to Suppl. Fig. 9 is included in the file “Suppl. figure legend_R”.

      (13) Figure 6G: cardiomyocytes from wild-type mice, when treated with macrophages, show 0% TUNEL-positive cells. Since it is unlikely to obtain no TUNEL staining in a cell population, there may be an experimental or analytical error.

      This is now Fig. 7F and 7G. The result reflects our specific experimental procedure. After tissue digestion, cardiomyocytes were plated on laminin-coated dishes; laminin promotes the adhesion of surviving cells. Following plating, we performed extensive washing to remove damaged and partially adherent cells. This step ensures that only well-shaped, viable, and strongly adherent cells remain as bioassay cells, and only these “healthy” cells are used for the experiments. Apoptotic cells are thus removed during washing, which explains the high viability of the bioassay cells. We have added this detailed information in the method section on page 24, the 2nd paragraph.

      (14) Figure 7J: Please assess whether arg-ii depletion also affects the mtROS phenotype.

      Following the suggestion of this reviewer, we performed new experiments showing that human cardiac fibroblasts (HCFs) exposed to hypoxia (1% O<sub>2</sub>, 48 hours), a known physiological trigger of Arg-II up-regulation, exhibit increased mtROS generation, which involves Arg-II (new Fig. 8M to 8P). We found that Arg-II protein levels as well as mtROS (assessed by mitoSOX staining) were both enhanced, accompanied by increased levels of HIF1α (Fig. 8M). Moreover, mito-TEMPO pre-incubation reduces mtROS, confirming the mitochondrial origin of the ROS. Silencing of arg-ii with rAd-mediated shRNA significantly reduces mtROS levels, demonstrating a role of Arg-II in the production of mitochondrial ROS in cardiac fibroblasts (Fig. 8M to 8P). We have included these results on page 9, the last paragraph, and discussed them on page 17, the 1st paragraph. The related method is described on page 26, the 2nd paragraph. The legend to Fig. 8 is updated on page 32.

      (15) Figure 8A-E: The authors have treated human-origin endothelial cells with mice-origin macrophage-conditioned media. It would be more suitable to treat the endothelial cells with human-origin macrophage-conditioned media.

      We acknowledge the concern regarding the use of mouse-origin macrophage-conditioned media on human-origin endothelial cells. It should be noted that biological cross-reactivity of cytokines from one species on cells from a different species has been reported in the literature. It was observed that there is quite a strict threshold of 60% amino acid identity, above which cytokines tend to cross-react, and that, statistically, cytokines tend to cross-react more often as their % amino acid identity increases (Scheerlinck JPY. Functional and structural comparison of cytokines in different species. Vet Immunol Immunopathol. 1999; 72:39-44. https://doi.org/10.1016/S0165-2427(99)00115-4). Taking IL-1b as an example, the 17.5 kDa mature mouse and human IL-1b share 92% amino acid sequence identity, suggesting high cross-reactivity. Indeed, human IL-1b has shown biological cross-reactivity in mouse cells (Ledesma E., et al. Interleukin-1 beta (IL-1β) induces tumor necrosis factor alpha (TNF-α) expression on mouse myeloid multipotent cell line 32D cl3 and inhibits their proliferation. Cytokine. 2004; 26:66-72. https://doi.org/10.1016/j.cyto.2003.12.009). Moreover, our results support the reported cross-reactivity between human and mouse IL-1b: conditioned medium (CM) from mouse macrophages indeed showed biological function in human endothelial cells. The observed effects of the conditioned media from aged wild-type macrophages on endothelial cells were specifically mediated through IL-1β. This conclusion is supported by our data showing that the upregulation induced by the conditioned media was significantly reduced by the addition of an IL-1β receptor blocker.

      (16) The co-culture system would be more interesting to test the non-cell autonomous role of Arg II.

      We appreciate this reviewer's suggestion of a co-culture system to test the non-cell-autonomous role of Arg-II. We believe that our current model, in which cells are treated with conditioned media, is a well-established and effective method for demonstrating the non-cell-autonomous role of Arg-II. This approach allows us to observe the effects of Arg-II on surrounding cells through the factors present in the conditioned media. A co-culture system would be preferable if the factor released into the conditioned medium were unstable, which is not the case here. We are therefore confident that our conditioned-medium model is sufficient to demonstrate a paracrine effect of cell-cell interaction.

      Reviewer #2 (Recommendations For The Authors):

      Some minor comments may be considered to broaden the knowledge related to this study.

      We appreciate this comment and have revised the Discussion accordingly; the new text appears in the last 6 lines of the Discussion on page 19.

      (1) The current study provides strong evidence for the key role of cardiac macrophages in the pathologies of cardiac aging, particularly macrophages (MΦ) derived from circulating blood (hematogenous). The heart is known to be among the minority of organs in which substantial numbers of yolk-sac-derived MΦ persist into adulthood and play a crucial role in maintaining cardiac function. Thus, the adult mammalian heart contains two separate and discrete cardiac MΦ subgroups: resident MΦs originating from yolk sac-derived progenitors and hematogenous MΦs recruited from circulating blood monocytes. These two subtypes may play distinct roles in the aging heart and in the response to cardiac injury. The authors could extend the discussion to the possible role of resident MΦs in aging hearts, which could be investigated further in the future.

      We appreciate the suggestion and agree that it provides valuable insight into the study. Taking the comments of Reviewer 1 into account, we have performed new experiments, i.e., co-immunostaining to analyze the infiltrated (CCR2<sup>+</sup>/F4-80<sup>+</sup>) and resident (LYVE1<sup>+</sup>/F4-80<sup>+</sup>) macrophage populations and to investigate the extent to which Arg-II affects infiltrated and resident macrophage populations in the aging heart. In line with the gene expression of f4/80, immunofluorescence staining reveals an age-associated increase in the number of F4/80<sup>+</sup> cells in the wt mouse heart, which is reduced in age-matched arg-ii<sup>-/-</sup> animals (Fig. 2E, F, G), demonstrating that arg-ii gene ablation reduces macrophage accumulation in the aging heart. Interestingly, resident macrophages, characterized as LYVE1<sup>+</sup>/F4-80<sup>+</sup> cells (Fig. 2E and 2H), predominate in the aging heart over infiltrated CCR2<sup>+</sup>/F4-80<sup>+</sup> cells (Fig. 2F and 2I). The age-associated increase in both LYVE1<sup>+</sup>/F4-80<sup>+</sup> and CCR2<sup>+</sup>/F4-80<sup>+</sup> macrophages is reduced in arg-ii<sup>-/-</sup> mice (Fig. 2E, 2F, 2H, and 2I). These new results are described on page 6, 1st paragraph, presented in Fig. 2E to 2I, and discussed on page 13, 2nd paragraph. The legend to Fig. 2 has been revised. The method for this additional experiment is included on page 22, 1st paragraph.

      (2) It would be beneficial to readers if the authors could explain why Arg-II could not be detected in VSMCs in the mouse heart, and discuss the species difference between humans and mice. In addition, the authors may speculate on the possibility of cross-talk between macrophages and VSMCs in the aging heart. A little more explanation in the Discussion would be helpful.

      We acknowledge and appreciate the suggestion and have discussed these points on page 19 as follows:

      “In this context, another interesting aspect is the cross-talk between macrophages and vascular SMC in the aging heart. In our present study, we detected Arg-II in vascular SMC of the human heart but not of the mouse heart. This could be due to species-specific differences in Arg-II expression in the heart, or related to the disease conditions of the human hearts, which were harvested from patients with cardiovascular diseases. Indeed, in the apoe<sup>-/-</sup> mouse atherosclerosis model, aortic SMCs do express Arg-II (Xiong et al., 2013). Interestingly, rodents rarely develop atherosclerosis compared with humans. Whether this is partly due to differences in Arg-II expression in vascular SMC between rodents and humans requires further investigation. The cross-talk between macrophages and vascular SMC was not studied in the present work. Since such cross-talk has been implicated in atherogenesis (reviewed in Gong et al., 2025), further work should investigate whether Arg-II-expressing macrophages interact with vascular SMC in the coronary arteries and contribute to the development of coronary artery disease and/or vascular remodelling, and the underlying mechanisms.”

      (3) Please clarify the arrows in Figure 9C that indicate the infarct area in each slice from one heart.

      The arrows in Figure 9C (now Fig. 10C) indicate the slices displaying the infarcted area among all slices from one heart. We have explained the arrows in the figure legend (now Fig. 10 and also new Suppl. Fig. 9).

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Response to Reviewers and Revision Plan

      We thank all three reviewers for their thoughtful and constructive comments. We are pleased that the reviewers found our work to be "very interesting," "well written," with "high quality" data that are "convincing" and will be "of broad interest for the community of axon guidance, circuit formation and brain development." We particularly appreciate the recognition that our study provides "novel functions for Cas family genes in forebrain axon organization" and uses "state-of-the art mouse genetics" with "quantitative and statistical rigor." Below are our detailed responses to each reviewer's comments, including the additional experiments and analyses that we will perform to strengthen the manuscript.

      Reviewer #1

      We thank this reviewer for recognizing that our experiments are "carefully done and quantified" with "clear and striking" phenotypes that "support most of the conclusions in the manuscript." We appreciate their acknowledgment that this work will be "of interest to developmental neurobiologists and the axon guidance and adhesion fields."

      Major Comments:

      __The authors clearly show that misplaced TCA axons coincide with cortical layer defects (misplaced Tbr1+ neurons) in Emx1-Cre Cas and integrin knockouts, suggesting that these axons are following misplaced cells. These results are described as 100% coincident, but since there is no quantification figure, the authors need to clarify how many embryos were examined for each genotype, as this was not described in the Results or legends.__

      We apologize for this oversight and will provide detailed quantification of this important finding. We examined a total of 11 Emx1Cre;TcKO embryos with 13 controls, and 14 Emx1Cre;Itgb1 embryos with 13 littermate controls, at two developmental stages (E16.5 and P0), to quantify the coordination between misplaced Tbr1+ neurons and cortical bundle formation. This quantification will be presented in the main text and figure legend.

      The detailed breakdown is as follows: for Emx1Cre;TcKO knockouts, we examined 7 controls and 5 mutants at P0, and 6 controls and 6 mutant embryos at E16.5. For Emx1Cre;Itgb1 knockouts, we examined 5 controls and 6 mutant neonates at P0, and 8 controls and 8 mutant embryos at E16.5.

      __Are the neurons not misplaced in Nex-Cre Cas or integrin knockouts? One would presume not, but then what causes the Tbr1+ cell migration defect? I struggle with the semantics of a non-neuronal-autonomous role of Cas in the cortex, since Tbr1+ neurons are misplaced, and these are what the axons are mistargeting to. So yes, Cas or β1 is potentially not needed in those neurons, but those misplaced neurons are presumably driving the phenotype.__

      We agree that this important point requires better explanation. You are absolutely correct that Tbr1+ neurons are not misplaced in NexCre;TcKO mutants (Wong et al., 2023), which is precisely why these animals do not exhibit cortical bundle formation. In addition to our previously published data showing normal location of Tbr1+ neurons in those mutants, we can also provide similar analysis at E16.5 and P0 as a supplemental figure. The model we propose is that Cas genes are required in radial glial cells for proper positioning of deep layer cortical neurons. These correctly positioned neurons, in turn, provide appropriate guidance cues for TCA projections. Hence, our model is that while the role of Cas genes is non-neuronal-autonomous (acting in radial glia rather than in the neurons themselves), the mispositioned Tbr1+ neurons in Emx1Cre;TcKO mutants drive the TCA misprojection phenotype. We will clarify this mechanism in the discussion and provide a new graphical model as a supplemental figure to facilitate conceptualization of our conclusions.

      __The authors need to clarify in the Discussion that they cannot rule out that Cas is also needed in TCA neurons, since neither Emx1-Cre nor Nex-Cre would target those cells.__

      We will add the following clarification to the Discussion: "The analysis of cortical bundle formation in Emx1Cre;TcKO revealed a phenotype comparable to that observed in NestinCre;TcKO, strongly suggesting a cortical-autonomous role for Cas genes in CB formation. However, we cannot formally exclude a thalamus-autonomous role for Itgb1 or Cas genes in TCA pathfinding, as we did not ablate these genes exclusively in the thalamus. Future studies using thalamus-specific Cre drivers would be needed to definitively address this question."

      __Could the authors add boxes in zoomed-out brain images to denote the zoomed regions, and potentially a schematic showing the placement of DiI for the lipophilic tracing experiments?__

      We will add boxes to denote zoom regions where possible throughout the manuscript. For some high magnification panels, we selected the best representative images, which don't necessarily correspond to specific regions of the lower magnification panels, but we will note this in the figure legends. We will also add a schematic diagram to a supplemental figure illustrating DiI placement for all lipophilic tracing experiments.

      Reviewer #2

      We thank this reviewer for describing our study as "very interesting," "well written," with data that are "of high quality" and findings that are "convincing." We appreciate their recognition that we used "state-of-the art mouse genetics" and that our work will be "of broad interest for the community of axon guidance, circuit formation and brain development."

      Major Comments:

      __Immunofluorescence labeling for other β-integrin family members to examine expression in AC axons may provide insights into why β1-integrin deficiency does not replicate the Cas TcKO phenotype.__

      This is an excellent suggestion that we will address experimentally. We will perform RNAscope analysis for integrin β5, β6, and β8 in developing piriform and S1 cortex at E14.5, E16.5, and E18.5, as these are the only other β-integrins expressed during cortical development. We anticipate that this analysis may reveal expression of alternative β-integrins in the neurons that extend axons along the developing anterior commissure, which would provide a potential explanation for why β1-integrin deficiency does not replicate the AC phenotype observed in Cas TcKO animals. These new data will be presented as part of a new figure.

      __Is there any evidence that β1-integrin in developing cortical axons is colocalized with Cas proteins (in vivo or in vitro)?__

      We have tested multiple antibodies for p130Cas and CasL without success in cortical tissue. However, we will test two new integrin β1 antibodies and a new p130Cas antibody. While direct colocalization may be challenging due to species restrictions and tissue-specific antibody performance, we will attempt to show regional co-expression in consecutive sections. If the integrin antibodies work, we will present data as a supplemental figure demonstrating that p130Cas (using our BAC-EGFP reporter) and β1-integrin show overlapping expression patterns in developing cortical white matter tracts and neurons, supporting their potential functional interaction. In the end, while we will try to address this critique, we will be limited by the reagents that are available to us.

      Minor Comments:

      __How long do the Cas TcKO mice with the various Cre drivers survive?__

      We have not systematically quantified survival beyond 6 months, but surprisingly, survival up to 6 months of age appears normal for all genotypes examined. This information will be included in the Methods section.

      Reviewer #3

      We thank this reviewer for acknowledging that our "main claims and conclusions are solidly supported by the data" with "good overall data quality" and "high quantitative and statistical rigor." We appreciate their recognition that we "uncover novel functions for Cas family genes in forebrain axon organization" and that our "overall reporting and discussion of findings is data-driven and refrains from excessive speculation."

      Addressing Concerns About Novelty and Impact:

      We respectfully disagree with the characterization of our findings as "somewhat incremental." While we acknowledge that similar axonal defects have been described in other lamination mutants, our study makes several novel and significant contributions:

      (1) First demonstration of Cas family requirement in forebrain axon tract development: This is the first study to establish roles for Cas proteins in axon guidance, representing a completely new function for these well-studied signaling molecules.

      (2) Novel β1-integrin-independent role for Cas proteins: Our finding that AC defects occur in Cas mutants but not β1-integrin mutants reveals a previously unknown signaling pathway and challenges the assumption that Cas proteins always function downstream of β1-integrin.

      (3) Mechanistic insights into cortical-TCA interactions: While the general principle that cortical lamination affects TCA projections has been established, our work provides the first demonstration of how specific adhesion signaling molecules (Cas proteins) control this process through radial glial function.

      (4) Cell-type-specific requirements: Our systematic analysis using multiple Cre drivers provides unprecedented detail about where and when Cas proteins function during brain development, revealing both neuronal-autonomous (AC) and non-neuronal-autonomous (TCA) roles.

      As Reviewer #2 noted, "The main advancement is a more nuanced understanding of where and when these molecules function during brain development and insights into the origin of the defects observed." This represents significant mechanistic progress in understanding forebrain circuit assembly.

      Specific Comments:

      Suggestion about cell autonomy testing: We appreciate the optional suggestion to test strict cell autonomy using sparse deletion approaches. While this would indeed be interesting, it would be a substantial undertaking beyond the scope of the current study. However, we believe our current data comparing NexCre (early postmitotic neurons) with NestinCre (CNS-wide deletion) and Emx1Cre (cortical progenitors) provide supportive evidence for neuronal autonomy of the AC phenotype, as noted by the reviewer.

      In vitro axon guidance assays: This is an excellent suggestion for future molecular studies. In the discussion we identify specific candidate guidance molecules (e.g. Ephrins) that would be prime targets for such experiments.

      Cross-Reviewer Comments:

      We appreciate Reviewer #3's agreement with the other reviewers' suggestions and will address the quantification of neuronal mispositioning/axon bundle correlation as requested by Reviewer #1.

      Additional Improvements:

      Beyond addressing the specific reviewer comments, we will make several additional improvements to strengthen the manuscript:

      Enhanced statistical analysis: All quantifications will include appropriate statistical tests with clearly stated n values and multiple litters represented.

      Expanded discussion: We will better contextualize our findings within the broader axon guidance literature and discuss future directions (e.g. TCAs).

      New data: Additional controls, expression analysis, and quantifications will strengthen our conclusions.

      We believe these revisions, particularly the new experimental data addressing integrin family expression and the detailed quantification of phenotype coordination, will significantly strengthen our conclusions and demonstrate the novelty and impact of our findings. We hope the reviewers will find these improvements satisfactory and agree that our work makes important contributions to understanding axon guidance mechanisms in the developing forebrain.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      In this beautiful paper the authors examined the role and function of NR2F2 in testis development, and more specifically in fetal Leydig cell (FLC) development. It is now well known that FLCs develop from interstitial steroidogenic progenitors at around E12.5 and are crucial for testosterone and INSL3 production during embryonic development, which in turn shapes the internal and external genitalia of the male. Indeed, lack of testosterone or INSL3 is known to cause DSD as well as undescended testis, also termed cryptorchidism. The authors first characterized the expression pattern of the NR2F2 protein during testis development and then used two NR2F2 cKO systems, namely Wt1-creERT2 and Nr5a1-cre, to explore the phenotype of NR2F2 loss. They found that in both cases the mice present with undescended testes and a major reduction in FLC numbers. They show that NR2F2 has no effect on the abundance or gene expression of the progenitor cells, but in its absence there are fewer FLCs and they are immature.

      The effect of NR2F2 is cell autonomous and does not seem to affect other signalling pathways implicated in Leydig cell development, such as the DHH, PDGFRA and NOTCH pathways.

      Overall, this paper is excellent, very well written, fluent and clear. The data are well presented, and all the controls and statistics are in place. I think this paper will be of great interest to the field and paves the way for several interesting follow-up studies, as stated in the discussion.

      Reviewer #2 (Public review):

      The major conclusion of the manuscript is expressed in the title: "NR2F2 is required in the embryonic testis for Fetal Leydig Cell development" and also at the end of the introduction and all along the result part. All the authors' assertions are supported by very clear and statistically validated results from ISH, IHC, precise cell counting and gene expression levels by qPCR. The authors used two different conditional Nr2f2 gene ablation systems that demonstrate the same effects at the FLC level. They also showed that the haplo-insufficiency of Wt1 in the first system (knock-in Wt1-cre-ERT2) aggravated the situation in FLC differentiation by disturbing the differentiation of Sertoli cells and their secretion of pro-FLC factors, which had a confounding effect and encouraged them to use the second system. This demonstrates the great rigor with which the authors interpreted the results. In conclusion, all authors' claims and conclusions are justified by their high-quality results.

      Recommendations for the authors:

      We thank the reviewers for their comments, which have improved and strengthened our manuscript. Please see our responses to specific comments below.

      Reviewer #1 (Recommendations for the authors):

      I have several small comments:

      (1) There has recently been a preprint from the Yao lab on the role of NR2F2 in steroidogenic cells (https://www.biorxiv.org/content/10.1101/2024.09.16.613312v1). They performed cKO of NR2F2 using Wt1creERT2 and found similar results. You should present and discuss this paper in light of your results.

      Estermann et al. report a very similar phenotype of FLC hypoplasia in an independent mouse model of Nr2f2 conditional mutation. We now refer to this article in the discussion of our manuscript as suggested.

      (2) In the introduction I think it is important to mention that the steroidogenic progenitors are derived from Wnt5a positive cells (https://pubmed.ncbi.nlm.nih.gov/35705036/).

      We have mentioned this point in the introduction as suggested.

      (3) In both models you show a decrease in the number of FLC (60% or 40%), and yet both present with undescended testis. It is important to discuss the fact that complete loss of testosterone and INSL3 is not required to cause cryptorchidism.

      We have mentioned this point in the discussion as suggested.

      The fact that you get only a partial reduction in FLC is likely due to redundancy with additional factors, possibly ARX as you stated in the discussion. It will be interesting to explore this in the future, but it is beyond the scope of the current paper.

      We agree with the reviewer, this question could be addressed by analyzing Arx,Nr2f2 double mutants.

      (4) On page 8, line 11, you mention "data not shown"; I am not sure this is allowed by the journal.

      The data is now shown in Figure S5A as suggested.

      (5) In Figure 2, it would be good to add a schematic of the mouse strains used, as well as the experimental and control mice, next to the Tam scheme. A similar scheme should be included in Figure 3 for Nr5a1-cre.

      We have modified Figures 2 and 3 as suggested.

      (6) There is a clear and pronounced effect on testis cord number and size. It would be good to quantify testis cord number and diameter in the mutants, even if you do not follow the effect on Sertoli cells in detail.

      We have quantified testis cords numbers and area in E14.5 Control and Wt1<sup>CreERT2/+</sup>; Nr2f2<sup>flox/flox</sup> testes. This data is now shown in Figure S2M.

      (7) It would be good to present the undescended testis in the Wt1-cre model in Figure 2 rather than in the supplementary figure.

      The data is now shown in Figure 2H-I as suggested.

      (8) Please add labels for the testis, kidney, bladder and vas deferens in Figure 3N+O and in the Wt1-cre model.

      We have added the labels in Figures 2 and 3 as suggested.

      (9) In Figure 5, which presents both models, it would be good to use the scheme I suggested above to highlight which results refer to which KO model.

      We have modified Figure 5 as suggested.

      Reviewer #2 (Recommendations for the authors):  

      The work presented in this manuscript gave me food for thought. I have always been intrigued by the fact that, of the large number of interstitial cells in the testis, only a minority differentiate into mature androgen-producing Leydig cells. In other words, how is the number of functional steroidogenic cells defined from a large pool of progenitor cells (ARX- and NR2F2-positive ones)? This may be linked to the levels of androgens produced (a kind of feedback control) or to the effectiveness of these androgens on their target tissues (e.g., spermatogenesis efficiency in adults). In addition, there must be specific signals (probably linked to gonadotropins) that induce the recruitment of Leydig cells from the progenitor pool. Perhaps the genetic models generated in this study could help to address these questions. I leave it to the authors to judge.

      We agree with the reviewer. How NR2F2 (and other factors) integrate extrinsic cues to regulate the recruitment of a subset of interstitial steroidogenic progenitors along the Leydig cell differentiation pathway is a fascinating question beyond the scope of this work.

      In addition to this reflection, I propose a few minor modifications likely to improve the quality of the manuscript:

      (1) Page 3, line 3: I suggest replacing "growth" with "differentiation".

      We have modified the text as suggested.

      (2) Page 3, line 4: "scrotum" is missing in the parentheses. Please add it before "and penis".

      We have modified the text as suggested.

      (3) Page 5, lines 21-24: kidney hypoplasia is also evident in Fig S2H (as stated in the figure legend). It could also be mentioned in this sentence, implying "...that NR2F2 function is required for testicular and kidney development."

      We have modified the text as suggested.

      (4) Page 5, lines 28-30: In addition to the reduction in the number of HSD3B-positive cells, HSD3B staining seems clearly fainter in mutant FLC (Fig 2M) than in adrenal cells on the same section or in FLC of control gonads. This fits well with other results on the levels of steroidogenic enzymes (Fig 2O) and those presented thereafter (Fig S4 I-J and Fig 5). Perhaps the authors could mention this.

      We have modified the text as suggested in the results section “NR2F2 is required for FLC maturation” (Page 8).

      (5) Page 5, lines 31-34: testicular descent is highly sensitive to INSL3 in the mouse (in contrast to other species, where androgens seem to be more critical). I was wondering whether you could check a better phenotypic marker for the absence (or reduction) of androgens, such as the differentiation of the epididymides by HE staining or the anogenital distance at birth.

      We have measured the anogenital distance at P0 and P1 as suggested and have included the corresponding graph in Fig. S3P.

      (6) Page 8, lines 21-22: "HSD3B positive FLC were smaller and more elongated". This is clear in Fig 5F but not evident in Fig 5D. Could the authors propose another image?

      We have modified Figure 5 as suggested and provide now another example of HSD3B positive FLCs in a Nr5a1Cre; Nr2f2<sup>flox/flox</sup> mutant gonad (Fig. 5D) and the corresponding control littermate (Fig. 5C).

      (7) Page 14, line 12: "(arrow in I)" should be "(arrow in H)".

      We have modified the text as suggested. Please note that ACTA2 expression is now shown in Figure S2 G-H.

      (8) Page 15, line 6: "Arrows indicate NR5A1 positive FLC". There is no arrow in Fig 4C,D, only a kind of scale bar on the enlargement shown in C.

      We have modified Figure 4 as suggested.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This paper provides a computational model of a synthetic task in which an agent needs to find a trajectory to a rewarding goal in a 2D grid world in which certain grid blocks incur a punishment. In a completely unrelated setup without explicit rewards, they then provide a model that explains data from an approach-avoidance experiment in which an agent needs to decide whether to approach or withdraw from a jellyfish in order to avoid a pain stimulus. Both models include components that are labelled as Pavlovian; hence the authors argue that their data show that the brain uses a Pavlovian fear system in complex navigational and approach-avoid decisions.

      Thanks to the reviewer’s comments, we have now added the following text to our Discussion section (Lines 290-302):

      “When it comes to our experiments, the simulation and VR experiment models are related: both are derived from the same theoretical framework and maintain an algebraic mapping. They differ only in task-specific adaptations, i.e., in their action sets and learning rules: temporal difference learning for multi-step decisions in the grid world vs. the Rescorla-Wagner rule for single-step decisions in the VR task. This is also true for Dayan et al. [2006], who bridge Pavlovian bias in a Go-No Go task (negative auto-maintenance pecking task) and a grid world task. A further minor difference between the simulation and VR experiment models is the use of a baseline bias in the human experiment's RL and RLDDM models, where we also model reaction times with drift rates, a behaviour not often simulated in grid world simulations. As mentioned previously, we use the grid world tasks for didactic purposes, similar to Dayan et al. [2006] and as is common in test-beds for reinforcement learning algorithms [Sutton et al., 1998]. The main focus of our work is on Pavlovian fear bias in safe exploration and learning, rather than on its role in complex navigational decisions. Future work can focus on capturing more sophisticated safe behaviours, such as escapes [Evans et al., 2019, Sporrer et al., 2023] and model-based planning, which span different aspects of the threat-imminence continuum [Mobbs et al., 2020].”

      In the first setup, they simulate a model in which a component they label as Pavlovian learns about punishment in each grid block, whereas a Q-learner learns about the optimal path to the goal, using a scalar loss function for rewards and punishments. Pavlovian and Q-learning components are then weighed at each step to produce an action. Unsurprisingly, the authors find that including the Pavlovian component in the model reduces the cumulative punishment incurred, and this increases as the weight of the Pavlovian system increases. The paper does not explore to what extent increasing the punishment loss (while keeping reward loss constant) would lead to the same outcomes with a simpler model architecture, so any claim that the Pavlovian component is required for such a result is not justified by the modelling. 

      Thanks to the reviewer’s comments, we have now added the following text to our Discussion section (Line 303-313):

      “In our simulation experiments, we assume the coexistence of the Pavlovian fear system and the instrumental system to demonstrate the safety-efficiency trade-off that emerges from their interaction. Similar behaviours could possibly be modelled using an instrumental system alone with higher punishment sensitivity; we therefore do not argue for the necessity of the Pavlovian fear system here. Instead, the Pavlovian fear system itself could be a potential biologically plausible implementation of punishment sensitivity. Unlike punishment sensitivity (scaling of the punishments), which has not been robustly mapped to neural substrates in fMRI studies, the neural substrates of the Pavlovian fear system are well known (e.g., the limbic loop and amygdala; see also Supplementary Fig. 16). Additionally, the Pavlovian fear system provides a separate punishment memory that cannot be erased by greater rewards, as in [Elfwing and Seymour, 2017, Wang et al., 2018]. This fundamental point can be observed in our simple T-maze simulations, where the Pavlovian fear system encourages avoidance behaviour and the agent chooses the smaller reward instead of the greater reward.”
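      The exchange above turns on whether a separate Pavlovian punishment memory does anything that a single, more punishment-sensitive value cannot. As a purely illustrative aid, and not the model used in the manuscript, the sketch below contrasts the two in a hypothetical one-step T-maze: a scalar value `q` that nets reward against scaled punishment, and a punishment-only value `p`, both learned with a Rescorla-Wagner delta rule. The arm outcomes, the learning rate `ALPHA`, and the linear mixing weight `omega` used in `choose` are all assumptions chosen for clarity.

```python
# Hypothetical sketch, not the manuscript's model: a one-step T-maze with two arms,
# deterministic outcomes, Rescorla-Wagner (delta-rule) updates and a fixed mixing weight omega.
ALPHA = 0.5      # learning rate (assumed)
N_TRIALS = 50    # enough updates for the values to converge to their targets

# (reward, punishment magnitude) for each arm; names and numbers are illustrative only.
ARMS = {"small_safe": (1.0, 0.0), "large_risky": (3.0, 1.0)}


def rescorla_wagner(value, target, alpha=ALPHA):
    """One delta-rule update of `value` toward the observed `target`."""
    return value + alpha * (target - value)


def learn(arms=ARMS, punishment_scale=1.0):
    """Learn a netted instrumental value q and a punishment-only Pavlovian value p per arm."""
    q = {arm: 0.0 for arm in arms}  # scalar value: reward minus scaled punishment
    p = {arm: 0.0 for arm in arms}  # separate fear memory: punishment magnitude only
    for _ in range(N_TRIALS):
        for arm, (reward, punishment) in arms.items():  # every arm is sampled each trial
            q[arm] = rescorla_wagner(q[arm], reward - punishment_scale * punishment)
            p[arm] = rescorla_wagner(p[arm], punishment)
    return q, p


def choose(q, p, omega):
    """Greedy choice over the assumed linear mix (1 - omega) * q - omega * p."""
    scores = {arm: (1.0 - omega) * q[arm] - omega * p[arm] for arm in q}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    q, p = learn()
    print(choose(q, p, omega=0.0))  # instrumental only: picks the larger net value
    print(choose(q, p, omega=0.8))  # strong Pavlovian weight: avoids the punished arm
    q_scaled, p_scaled = learn(punishment_scale=3.0)
    print(choose(q_scaled, p_scaled, omega=0.0))  # scaled punishment alone also avoids it
    # A larger reward on the risky arm raises its q but leaves its fear memory p unchanged.
    bigger = {"small_safe": (1.0, 0.0), "large_risky": (10.0, 1.0)}
    q_big, p_big = learn(arms=bigger)
    print(round(q_big["large_risky"], 6), round(p_big["large_risky"], 6))
```

      In this toy setting both routes (a Pavlovian weight, or a larger punishment scale alone) produce avoidance of the punished arm, which is the reviewer's point; what differs is that `p` keeps a record of the punishment that does not shrink as the reward on that arm grows, which is the property the response appeals to.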

      In the second setup, an agent learns about punishments alone. "Pavlovian biases" have previously been demonstrated in this task (i.e. an overavoidance when the correct decision is to approach). The authors explore several models (all of which are dissimilar to the ones used in the first setup) to account for the Pavlovian biases. 

      Thanks to the reviewer’s comments, we have now added a paragraph in our Discussion section (Lines 290-302) explaining the similarity of our models and their integrated interpretation. We hope this addresses the reviewer’s concerns.

      Strengths: 

      Overall, the modelling exercises are interesting and relevant and incrementally expand the space of existing models. 

      Weaknesses: 

      I find the conclusions misleading, as they are not supported by the data. 

      First, the similarity between the models used in the two setups appears to be more semantic than computational or biological. So it is unclear to me how the results can be integrated. 

      Thanks to the reviewer’s comments, we have now added a paragraph in our Discussion section (Lines 290-302 onwards) explaining the similarity of our models and their integrated interpretation. We hope this addresses the reviewer’s concerns.

      Secondly, the authors do not show "a computational advantage to maintaining a specific fear memory during exploratory decision-making" (as they claim in the abstract). Making such a claim would require showing an advantage in the first place. For the first setup, the simulation results will likely be replicated by a simple Q-learning model when scaling up the loss incurred for punishments, in which case the more complex model architecture would not confer an advantage. The second setup, in contrast, is so excessively artificial that even if a particular model conferred an advantage here, this is highly unlikely to translate into any real-world advantage for a biological agent. The experimental setup was developed to demonstrate the existence of Pavlovian biases, but it is not designed to conclusively investigate how they come about. In a nutshell, who in their right mind would touch a stinging jellyfish 88 times in a short period of time, as the subjects do on average in this task? Furthermore, in which real-life environment does withdrawal from a jellyfish lead to a sting, as in this task? 

      Crucially, simplistic models such as the present ones can easily solve specifically designed lab tasks with low dimensionality, but they will fail in higher-dimensional settings. Biological behaviour in the face of threat is utterly complex and goes far beyond simplistic fight-flight-freeze distinctions (Evans et al., 2019). It would take a leap of faith to assume that human decision-making can be broken down into oversimplified sub-tasks of this sort (and if that were the case, this would require a meta-controller arbitrating between the systems for all the sub-tasks, and this meta-controller would then struggle with the dimensionality). 

      Thanks to the reviewer’s comments, we have now mentioned this point in Lines 299-302.

      On the face of it, the VR task provides higher "ecological validity" than previous screen-based tasks. However, in fact, it is only the visual stimulation that differs from a standard screen-based task, whereas the action space is exactly the same. As such, the benefit of VR does not become apparent, and its full potential is foregone. 

      If the authors are convinced that their model can capture such behaviour, data from naturalistic approach-avoidance VR tasks are publicly available, e.g. (Sporrer et al., 2023), so this should be rather easy to prove or disprove. In summary, I am doubtful that the models have any relevance for real-life human decision-making. 

      Finally, the authors seem to make much broader claims that their models can solve safety-efficiency dilemmas. However, a combination of a Pavlovian bias and an instrumental learner (study 1) via a fixed linear weighting does not seem to be "safe" in any strict sense. This will lead to the agent making decisions leading to death when the promised reward is large enough (outside perhaps a very specific region of the parameter space). Would it not be more helpful to prune the decision tree according to a fixed threshold (Huys et al., 2012)? So, in a way, the model is useful for avoiding cumulatively excessive pain but not instantaneous destruction. As such, it is not clear what real-life situation is modelled here. 

      We hope our additions to the Discussion section (Lines 290-313) address the reviewer’s concerns.  

      A final caveat regarding Study 1 is the use of a PH associability term as a surrogate for uncertainty. The authors argue that this term provides a good fit to fear-conditioned SCR but that is only true in comparison to simpler RW-type models. Literature using a broader model space suggests that a formal account of uncertainty could fit this conditioned response even better (Tzovara et al., 2018). 

      We have now added a sentence discussing this (Lines 356-358):

      “Future work could also use a formal account of uncertainty which could fit the fear-conditioned skin-conductance response better than Pearce-Hall associability [Tzovara et al., 2018].”

      Reviewer #2 (Public review): 

      Summary: 

      The authors tested the efficiency of a model combining Pavlovian fear valuation and instrumental valuation. This model is amenable to many behavioral decision and learning setups - some of which have been or will be designed to test differences in patients with mental disorders (e.g., anxiety disorder, OCD, etc.). 

      Strengths: 

      (1) Simplicity of the model which can at the same time model rather complex environments. 

      (2) Introduction of a flexible omega parameter. 

      (3) Direct application to a rather advanced VR task. 

      (4) The paper is extremely well written. It was a joy to read. 

      Weaknesses: 

      Almost none! In very few cases, the explanations could be a bit better. 

      Thank you. We have added further explanations to the Discussion section, and we have improved the writing in the Abstract, Introduction and Methods section, taking into account recommendations from Reviewers #2 and #3.

      Reviewer #2 (Recommendations for the authors): 

      (1) Why is there no flexible omega in Figures 3B and 3C? Did I miss this? 

      Thank you. We have now added additional text to explain our motivation in Experiment 2, which only varies the fixed omega and omits the flexible omega (Lines 136-140).

      “In this set of results, we wish to qualitatively tease apart the role of a Pavlovian bias in shaping and sculpting the instrumental value and also provide more insight into the resulting safety-efficiency trade-off. Having shown the benefits of a flexible ω in the previous section, here we only vary the fixed ω to illustrate the effect of a constant bias and are not concerned with the flexible bias in this experiment.”

      We encourage the reader to consider this akin to an additional study that explains how a Pavlovian bias to withdraw can play a role in avoiding punishments, similar to that of punishment sensitivity. This is particularly important because we do have neural correlates for Pavlovian biases but so far lack a clear neural correlate for punishment sensitivity, as mentioned in our new additions to the Discussion section (Lines 303-313).

      (2) The introduction of the flexible omega and the PAL agent in the results is a bit sudden. Some more details are needed to understand this during the first read of this passage. 

      We thank Reviewer #2 for bringing this to our notice. We have attempted to refine this passage by including sentences such as the following:

      “The standard (rational) reinforcement learning system is modelled as the instrumental learning system. The additional Pavlovian fear system biases the withdrawal actions to aid in safe exploration, in line with our hypothesis.”

      “Both systems learn using a basic temporal difference updating rule (or, in some instances, its special case, the Rescorla-Wagner rule).”

      “We implement the flexible ω using Pearce-Hall associability (see equation 15 in Methods). The Pearce-Hall associability maintains a running average of absolute temporal difference errors (δ) as per equation 14. This acts as a crude but easy-to-compute metric for outcome uncertainty, which gates the influence of the Pavlovian fear system, in line with our hypothesis. This implies that the higher the outcome uncertainty, as is the case in early exploration, the more cautious our agent will be, resulting in safer exploration.”
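      The gating described in this passage can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' equations 11-15: it assumes a Rescorla-Wagner update for the Pavlovian punishment value, the common form of Pearce-Hall associability as an exponentially weighted running average of |δ| with a hypothetical rate `eta`, ω set to the associability clipped to [0, 1], and a hypothetical linear mix `mix_propensities` of instrumental values and a Pavlovian withdrawal bias.

```python
import numpy as np


def rw_update(v, outcome, lr):
    """Rescorla-Wagner update (single-step TD); returns new value and prediction error."""
    delta = outcome - v
    return v + lr * delta, delta


def pearce_hall(assoc, delta, eta):
    """Running average of absolute prediction errors (Pearce-Hall associability)."""
    return eta * abs(delta) + (1.0 - eta) * assoc


def mix_propensities(q_instr, pav_bias, omega):
    """Hypothetical linear mix: omega gates the Pavlovian withdrawal bias."""
    return (1.0 - omega) * np.asarray(q_instr) + omega * np.asarray(pav_bias)


def simulate_omega(n_trials, outcome=-1.0, lr=0.3, eta=0.3, assoc0=1.0):
    """Track the flexible omega while a fixed punishment is repeatedly experienced."""
    v, assoc, omegas = 0.0, assoc0, []
    for _ in range(n_trials):
        v, delta = rw_update(v, outcome, lr)
        assoc = pearce_hall(assoc, delta, eta)
        omegas.append(min(max(assoc, 0.0), 1.0))  # clip to a valid weight
    return omegas


omegas = simulate_omega(10)
# omega starts near 1 (large prediction errors) and shrinks as the punishment
# becomes predictable, so the Pavlovian bias dominates early and fades later.
early = mix_propensities([0.5, 0.2], [0.0, 1.0], omegas[0])
late = mix_propensities([0.5, 0.2], [0.0, 1.0], omegas[-1])
```

      With a fixed ω the Pavlovian bias would carry the same weight throughout learning; tying ω to associability is what makes the agent more cautious while outcomes are still uncertain.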

      (3) In my view, the possibility of modeling moving predators is extremely interesting. I would include Figure 8D and the corresponding explanation in the main text. 

      Response with revision: We thank the reviewer for finding our simulation on moving predators extremely interesting. Unfortunately, since our instrumental system is not model-based, and in particular does not explicitly model the predator dynamics, our simulation might not be a very accurate representation of real environments with moving predators. As pointed out by Reviewer #1, several systems other than Pavlovian fear responses may be necessary for safe behaviour in such environments, and we hope to address these in future studies. Thanks again for taking an interest in our simulations.

      (4) The VR experiment should be mentioned more clearly in the abstract and the introduction. It should be mentioned a bit more clearly why VR was helpful and why the authors did not use a simple bird's eye grid world task. 

      I cannot assess the RLDDM and I did not check the code. 

      Thank you. We have now mentioned the VR experiment more clearly in the abstract and the introduction. We also now mention that the VR experiment “builds upon previous Go-No Go studies studying Pavlovian-Instrumental transfer (Guitart-Masip et al, 2012; Cavanagh et al, 2013). The virtual-reality approach confers a greater ecological validity and the immersive nature may contribute to better fear conditioning, making it easier to distinguish the aversive components.”

      A bird’s eye grid world may not invoke a strong withdrawal response of the kind seen in these immersive approach-withdrawal tasks, where we can clearly distinguish a Pavlovian fear-based withdrawal response. We did include immersive VR maze results in the supplementary materials, but future work is needed to isolate the different systems at play in such complex behaviour.

      Reviewer #3 (Public review): 

      Summary: 

      This paper aims to address the problem of exploring potentially rewarding environments that contain danger, based on the assumption that an independent Pavlovian fear learning system can help guide an agent during exploratory behaviour such that it avoids severe danger. This is important given that otherwise later gains seem to outweigh early threats, and agents may end up putting themselves in danger when it is advisable not to do so. 

      The authors develop a computational model of exploratory behaviour that accounts for both instrumental and Pavlovian influences, combining the two according to uncertainty in the rewards. The result is that Pavlovian avoidance has a greater influence when the agent is uncertain about rewards. 

      Strengths: 

      The study does a thorough job of testing this model using both simulations and data from human participants performing an avoidance task. Simulations demonstrate that the model can produce "safe" behaviour, where the agent may not necessarily achieve the highest possible reward but ensures that losses are limited. Interestingly, the model appears to describe human avoidance behaviour in a task that tests for Pavlovian avoidance influences better than a model that doesn't adapt the balance between Pavlovian and instrumental based on uncertainty. The methods are robust, and generally, there is little to criticise about the study. 

      Weaknesses: 

      The extent of the testing in human participants is fairly limited but goes far enough to demonstrate that the model can account for human behaviour in an exemplar task. There are, however, some elements of the model that are unrealistic (for example, the fact that pre-training is required to select actions with a Pavlovian bias would require the agent to explore the environment initially and encounter a vast amount of danger in order to learn how to avoid the danger later). The description of the models is also a little difficult to parse. 

      Thank you, we have now attempted to clarify these points in the Discussion section by adding the following text (Lines 313-321):

      “We next discuss the plausibility of pre-training to select the hardwired actions. In the human experiment, the withdrawal action is straightforwardly biased, as noted, while in the grid world, we assume a hardwired encoding of withdrawal actions for each state/grid. This innate encoding of withdrawal actions could be represented in the dPAG [Kim et al., 2013]. We implement this bias using pre-training, which we assume would be a product of evolution. Alternatively, this could be interpreted as deriving from an appropriate value initialization where the gradient over initialized values determines the action bias. Such aversive value initialization, driving avoidance of novel and threatening stimuli, has been observed in the tail of the striatum in mice, which is hypothesised to function as a Pavlovian fear/threat learning system [Menegas et al., 2018].”

      Reviewer #3 (Recommendations for the authors): 

      I have relatively little to suggest, as in my view the paper is robust, thorough, and creative, and does enough to support the primary argument being made at the most fundamental level. My suggestions for improvement are as follows: 

      (1) Some aspects of the model are potentially unrealistic (as described in the public review), and the paper may benefit from some discussion of these issues or attempts to make the model more realistic - i.e., to what extent is this plausible in explaining more complex avoidance behaviour? Primarily, the fact that pre-training is required to identify actions subject to Pavlovian bias seems unlikely to be effective in real-world situations - is there a better way to achieve this in cases where there isn't necessarily an instinctual Pavlovian response? 

      Thank you. We agree that the advantage of the Pavlovian bias is restricted to the bias/instinctual Pavlovian response conferred by evolution. Future work is needed to model more complex avoidance behaviour such as escapes. We hope to have made this clearer with our edits to the Discussion (Lines 299-302) in our response to Reviewer #1’s comments, specifically:

      “The main focus of our work is on Pavlovian fear bias in safe exploration and learning, rather than on its role in complex navigational decisions. Future work can focus on capturing more sophisticated safe behaviours, such as escapes [Evans et al., 2019, Sporrer et al., 2023] and model-based planning, which span different aspects of the threat-imminence continuum [Mobbs et al., 2020].”

      (2) The description of the model in the Methods can be a little hard to follow and would benefit from further explanation of certain parameters. In general, it would be good to ensure that all terms mentioned in equations are described clearly in the text (for example, in Equation 1 it isn't clear what k refers to). 

      Thank you. We have now added further information on all of the parameters in Equation 1 and improved the writing of the Methods section overall, for instance by using time subscripts to avoid confusion when introducing the parameters. We use the standard notation of the Sutton and Barto textbook. k refers to the number of timesteps into the future and is now explained more clearly in the Methods section.
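      For reference, the standard Sutton and Barto action value, in which k counts timesteps into the future, is written below. This assumes the manuscript's Equation 1 takes this standard form (the equation itself is not reproduced in this response); γ is the discount factor, R the reward, and the expression contains no Pavlovian term.

```latex
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \;\middle|\; S_t = s,\ A_t = a\right], \qquad 0 \le \gamma \le 1
```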

      (3) Another point of clarification in Equation 1 - does the policy account for the Pavlovian influence or is this purely instrumental? 

      Thank you, Equation 1 is purely instrumental. We have now specifically mentioned this. The Pavlovian influence follows later. They are combined into propensities for action as per equations 11-13.

      (4) I was curious whether similar outcomes could be achieved by more complex instrumental models without the need for Pavlovian influences. For example, could different risk-sensitive decision rules (e.g., conditional value at risk) that rely only on the instrumental system afford safe behaviour without the need for an additional Pavlovian system? 

      Thank you for your comment. Yes, CVaR can achieve safe exploration/cautious behaviour in choices similar to Pavlovian avoidance learning. But we think both differ in the following ways:

      (1) CVaR provides the correct solution to the wrong problem (objective that only maximises the lower tail of the distribution of outcomes)

      (2) Pavlovian bias provides the wrong solution to the right problem (normative objective, but a Pavlovian bias which may be a vestige of evolution)

      Here we use the “wrong problem, wrong solution, wrong environment” categorisation terminology from Huys et al. 2015.

      Huys, Q. J., Guitart-Masip, M., Dolan, R. J., & Dayan, P. (2015). Decision-theoretic psychiatry. Clinical Psychological Science, 3(3), 400-421.
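      To make the contrast in point (1) concrete, the sketch below uses a simple empirical CVaR estimator, the mean of the worst α-fraction of sampled outcomes. This is one standard definition of CVaR and not a model from the manuscript; the payoffs are hypothetical: a risky option with a higher mean but a rare large loss, and a safe option with a lower mean.

```python
import numpy as np


def cvar(outcomes, alpha):
    """Empirical CVaR: mean of the worst alpha-fraction of outcomes (higher = better)."""
    x = np.sort(np.asarray(outcomes, dtype=float))  # ascending, worst first
    n_tail = max(1, int(np.ceil(alpha * len(x))))
    return float(x[:n_tail].mean())


# Hypothetical outcome samples for two options.
options = {
    "risky": [10.0, 10.0, 10.0, -20.0],  # mean 2.5, rare large loss
    "safe": [1.0, 1.0, 1.0, 1.0],        # mean 1.0
}

best_by_mean = max(options, key=lambda k: np.mean(options[k]))
best_by_cvar = max(options, key=lambda k: cvar(options[k], alpha=0.25))
print(best_by_mean, best_by_cvar)  # risky safe
```

      The CVaR agent is cautious because its objective changes to the lower tail; the Pavlovian account instead keeps the expected-value objective and biases action selection, which is the distinction drawn in points (1) and (2).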

      Secondly, we find an effect of Pavlovian bias on reaction times: slowing of approach responses and faster withdrawal responses. We do not think this is best explained by a CVaR-type model, and it is a direction for future work. We think such model-based methods are slower to compute, whereas the Pavlovian withdrawal bias produces a quicker response.

      We have now included this in brief in Lines 280-288.

      (5) Figure 5 would benefit from a clearer caption as it is not necessarily clear from the current one that the left panels refer to choices and the right panels to reaction times. 

      Thank you, we have improved the caption for Fig. 5.

      (6) It would be good to include some indication of the quality of the model fits for the human behavioural study (i.e., diagnostics such as R-hat) to ensure that differences in model fit between models are not due to convergence issues with different models. This would be especially helpful for the RLDDM models as these can be difficult to fit successfully.

      Thank you. We observed that all R-hat values were strictly less than 1.05 (most parameters were below 1.01 and generally close to 1), indicating that the models converged. We have now added this line to the Results (Lines 246-248).

      Thanks to the reviewer’s comments, we have now added the following text to our Discussion section (Lines 290-302): “When it comes to our experiments, both the simulation and VR experiment models are related and derived from the same theoretical framework, maintaining an algebraic mapping. They differ only in task-specific adaptations, i.e. in their action sets and in their temporal difference learning rules: multi-step decisions in the grid world vs. the Rescorla-Wagner rule for single-step decisions in the VR task. This is also true for Dayan et al. [2006], who bridge Pavlovian bias in a Go-No Go task (negative auto-maintenance pecking task) and a grid world task. A further minor difference between the simulation and VR experiment models is the use of a baseline bias in the human experiment's RL and RLDDM models, where we also model reaction times with drift rates, a behaviour not often simulated in grid world simulations. As mentioned previously, we use the grid world tasks for didactic purposes, similar to Dayan et al. [2006] and as is common for test-beds for algorithms in reinforcement learning [Sutton et al., 1998]. The main focus of our work is on Pavlovian fear bias in safe exploration and learning, rather than on its role in complex navigational decisions. Future work can focus on capturing more sophisticated safe behaviours, such as escapes [Evans et al., 2019, Sporrer et al., 2023] and model-based planning, which span different aspects of the threat-imminence continuum [Mobbs et al., 2020].”
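      The convergence criterion reported in the response to comment (6) (R-hat below 1.05) compares between-chain and within-chain variance. The sketch below computes the classic Gelman-Rubin potential scale reduction factor for one parameter using synthetic draws; it is an illustration only, and the fitting software used by the authors may report a split or rank-normalized variant that differs slightly.

```python
import numpy as np


def gelman_rubin_rhat(chains):
    """Classic (non-split) R-hat for one parameter; chains has shape (m, n)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    between = n * chain_means.var(ddof=1)         # B: between-chain variance
    within = chains.var(axis=1, ddof=1).mean()    # W: mean within-chain variance
    var_hat = (n - 1) / n * within + between / n  # pooled posterior variance estimate
    return float(np.sqrt(var_hat / within))


rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))                      # four chains, same target
stuck = mixed + np.array([[0.0], [0.0], [0.0], [5.0]])  # one chain stuck elsewhere
print(gelman_rubin_rhat(mixed), gelman_rubin_rhat(stuck))
```

      Well-mixed chains give a value close to 1, while a single chain stuck in a different region inflates the between-chain term and pushes R-hat well above the 1.05 threshold.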

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Azlan et al. identified a novel maternal factor called Sakura that is required for proper oogenesis in Drosophila. They showed that Sakura is specifically expressed in the female germline cells. Consistent with its expression pattern, Sakura functioned autonomously in germline cells to ensure proper oogenesis. In Sakura KO flies, germline cells were lost during early oogenesis and often became tumorous before degenerating by apoptosis. In these tumorous germ cells, piRNA production was defective and many transposons were derepressed. Interestingly, Smad signaling, a critical signaling pathway for GSC maintenance, was abolished in sakura KO germline stem cells, resulting in ectopic expression of Bam in whole germline cells in the tumorous germline. A recent study reported that Bam acts together with the deubiquitinase Otu to stabilize Cyc A. In the absence of sakura, Cyc A was upregulated in tumorous germline cells in the germarium. Furthermore, the authors showed that Sakura co-immunoprecipitated Otu in ovarian extracts. A series of in vitro assays suggested that Otu (aa 1-339) and Sakura (aa 1-49) are sufficient for their direct interaction. Finally, the authors demonstrated that the loss of otu phenocopies the loss of sakura, supporting their idea that Sakura plays a role in germ cell maintenance and differentiation through interaction with Otu during oogenesis.

      Strengths:

      To my knowledge, this is the first characterization of the role of the CG14545 gene. Each experiment seems to be well-designed and adequately controlled.

      Weaknesses:

      However, the conclusions from each experiment are somewhat separate, and the functional relationships between Sakura's functions are not well established. In other words, although the loss of Sakura in the germline causes pleiotropic effects, the cause-and-effect relationships between the individual defects remain unclear.

      Reviewer #2 (Public review):

      In this study, the authors identified CG14545 (and named it Sakura), as a key gene essential for Drosophila oogenesis. Genetic analyses revealed that Sakura is vital for both oogenesis progression and ultimate female fertility, playing a central role in the renewal and differentiation of germ stem cells (GSC).

      The absence of Sakura disrupts the Dpp/BMP signaling pathway, resulting in abnormal bam gene expression, which impairs GSC differentiation and leads to GSC loss. Additionally, Sakura is critical for maintaining normal levels of piRNAs. Also, the authors convincingly demonstrate that Sakura physically interacts with Otu, identifying the specific domains necessary for this interaction, suggesting a cooperative role in germline regulation. Importantly, the loss of otu produces similar defects to those observed in Sakura mutants, highlighting their functional collaboration.

      The authors provide compelling evidence that Sakura is a critical regulator of germ cell fate, maintenance, and differentiation in Drosophila. This regulatory role is mediated through the modulation of pMad and Bam expression. However, the phenotypes observed in the germarium appear to stem from reduced pMad levels, which subsequently trigger premature and ectopic expression of Bam. This aberrant Bam expression could lead to increased CycA levels and altered transcriptional regulation, impacting piRNA expression. Given Sakura's role in pMad expression, it would be insightful to investigate whether overexpression of Mad or pMad could mitigate these phenotypic defects (UAS-Mad line is available at Bloomington Drosophila Stock Center).

      As the reviewer suggested, we tested whether overexpression of Mad could rescue or mitigate the phenotypic defects caused by the loss of sakura, using nos-Gal4-VP16 > UASp-Mad-GFP in the sakura<sup>null</sup> background. As shown in Fig S11, we did not observe any mitigation of the defects.

      We also tested whether expressing a constitutively active form of Tkv could mitigate the defects, using UAS-Dcr2, NGT-Gal4 > UASp-tkv.Q235D in the sakura<sup>RNAi</sup> background. As shown in Fig S12, this approach did not mitigate the defects either.

      A major concern is the overstated role of Sakura in regulating Orb. The data does not reveal mislocalized Orb; rather, a mislocalized oocyte and cytoskeletal breakdown, which may be secondary consequences of defects in oocyte polarity and structure rather than direct misregulation of Orb. The conclusion that Sakura is necessary for Orb localization is not supported by the data. Orb still localizes to the oocyte until about stage 6. In the later stage, it looks like the cytoskeleton is broken down and the oocyte is not positioned properly, however, there is still Orb localization in the ~8-stage egg chamber in the oocyte. This phenotype points towards a defect in the transport of Orb and possibly all other factors that need to localize to the oocyte due to cytoskeletal breakdown, not Orb regulation directly. While this result is very interesting it needs further evaluation on the underlying mechanism. For example, the decrease in E-cadherin levels leads to a similar phenotype and Bam is known to regulate E-cadherin expression. Is Bam expressed in these later knockdowns?

      We examined Bam and DE-Cadherin expression in later RNAi knockdowns driven by ToskGal4. As shown in Fig S9, Bam was not expressed in these later knockdowns compared with controls. DE-Cadherin staining suggested a disorganized structure in late-stage egg chambers.

      We agree that we overstated the role of Sakura in regulating Orb in the initial manuscript, and we have revised the text accordingly.

      The manuscript would benefit from a more balanced interpretation of the data concerning Sakura's role in Orb regulation. Furthermore, a more expanded discussion on Sakura's potential role in pMad regulation is needed. For example, since Otu and Bam are involved in translational regulation, do the authors think that Mad is not translated and therefore it is the reason for less pMad? Currently the discussion presents just a summary of the results and not an extension of possible interpretation discussed in context of present literature.

      We changed the text to avoid overstating the role of Sakura in regulating Orb localization.

      Based on our newly added results showing that transgenic overexpression of Mad could not rescue or mitigate the phenotypic defects of sakura<sup>null</sup> mutant (Fig S11), we do not think the reason for less pMad is less translation of Mad.

      Reviewer #3 (Public review):

      In this very thorough study, the authors characterize the function of a novel Drosophila gene, which they name Sakura. They start with the observation that sakura expression is predicted to be highly enriched in the ovary and they generate an anti-sakura antibody, a line with a GFP-tagged sakura transgene, and a sakura null allele to investigate sakura localization and function directly. They confirm the prediction that it is primarily expressed in the ovary and, specifically, that it is expressed in germ cells, and find that about 2/3 of the mutants lack germ cells completely and the remaining have tumorous ovaries. Further investigation reveals that Sakura is required for piRNA-mediated repression of transposons in germ cells. They also find evidence that sakura is important for germ cell specification during development and germline stem cell maintenance during adulthood. However, despite the role of sakura in maintaining germline stem cells, they find that sakura mutant germ cells also fail to differentiate properly such that mutant germline stem cell clones have an increased number of "GSC-like" cells. They attribute this phenotype to a failure in the repression of Bam by dpp signaling. Lastly, they demonstrate that sakura physically interacts with otu and that sakura and otu mutants have similar germ cell phenotypes. Overall, this study helps to advance the field by providing a characterization of a novel gene that is required for oogenesis. The data are generally high-quality and the new lines and reagents they generated will be useful for the field. However, there are some weaknesses and I would recommend that they address the comments in the Recommendations for the authors section below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      General Comments:

      (1) The gene nomenclature: As mentioned in the text, Sakura means cherry blossom and is one of the national flowers of Japan. I am not sure whether the phenotype of the CG14545 mutant is related to Sakura or not. I would like to suggest the authors reconsider the naming.

      The striking phenotypes of the sakura mutant are tumorous and germless ovarioles. The tumorous phenotype, with many round fusomes in the germarium visualized by anti-Hts staining, looks to us like cherry blossoms blooming. The germless phenotype reminds us of falling cherry blossoms, especially considering that the proportion of the tumorous phenotype decreases and that of the germless phenotype increases with fly age. Furthermore, “Sakura” symbolizes birth and renewal in Japanese culture (the last author of this manuscript is Japanese). Our findings indicate that sakura is involved in regulating the renewal and differentiation of GSCs (which leads to birth). These are the reasons for the name, which we would like to keep.

      (2) In many of the microscopic photographs in the figures, especially for the merged confocal images, the resolution looks low, and the images appear blurred, making it difficult to judge the authors' claims. Also, the AlphaFold structure in Figure 10A requires higher contrast images. The magnification of the images is often inadequate (e.g. Figures 3A, 3B, 5E, 7A, etc). The authors should take high-magnification images separately for the germarium and several different stages of the egg chambers and lay out the figures.

      We are very sorry for the low-resolution images. This was caused when the original PDF file with high-resolution images was compressed in order to meet the small file size limit in the eLife submission portal. In the revised submission, we used high-resolution images.

      Specific Comments

      (1) How Sakura can cooperate with Otu remains unanswered. Sakura does not regulate deubiquitinase activity in vitro. Both sakura and otu appear to be involved in the Dpp-Smad signaling pathway and in the spatial control of Bam expression in the germarium, whereas Otu has been reported to act in concert with Bam to deubiquitinate and stabilize Cyc A for proper cystoblast differentiation. Therefore, it is plausible that the stabilization of Cyc A in the Sakura mutant is an indirect consequence of Bam misexpression and independent of the Sakura-Otu interaction. The authors may need to provide much deeper insight into the mechanism by which Sakura plays roles in these seemingly separable steps to orchestrate germ cell maintenance and differentiation during early oogenesis.

      Yes, it is possible that the stabilization of CycA in the sakura mutant is an indirect consequence of Bam misexpression and independent of the Sakura-Otu interaction. To test the significance and role of the Sakura-Otu interaction, we attempted to identify Sakura point mutants that lose the interaction with Otu. Had we obtained such mutants, we planned to test whether their transgenic expression could rescue the sakura mutant phenotypes, as the wild-type transgene does. However, after designing over 30 point mutants and testing their interaction with Otu, we have not yet obtained such a mutant version of Sakura. We will continue these efforts, but they are beyond the scope of the current study. We hope to address this important point in future studies.

      (2) Figure 3A and Figure 4: The authors show that piRNA production is abolished in Sakura KO ovaries. It is known that piRNA amplification (the ping-pong cycle) occurs in the Vasa-positive perinuclear nuage in nurse cells. Is the nuage normally formed in the absence of Sakura? The authors provide high-magnification images in the germarium expressing Vas-GFP. How does Sakura, and possibly Otu, contribute to piRNA production? Are the defects a direct or indirect consequence of the loss of Sakura?

      We now provide higher-magnification images of germaria expressing Vasa-EGFP in the sakura mutant background (Fig 3A and 3B). Nuage formation does not appear to be disrupted in the sakura mutant. Currently, we do not know whether the piRNA defects are a direct or indirect consequence of the loss of Sakura. This question cannot be answered easily, and we hope to address it in future studies.

      (3) Figure 7 and Figure 12: The authors showed that Dpp-Smad signaling was abolished in Sakura KO germline cells. The same defects were also observed in otu mutant ovaries (Figure 12B). How does the Sakura-Otu axis contribute to the Dpp-Smad pathway in the germline?

      As we mentioned in the response to comment (1), we attempted to test the significance and role of the Sakura-Otu interaction, including in the Dpp-Smad pathway in the germline, but we have not yet been able to obtain loss-of-interaction mutant(s) of Sakura. We hope to address this in future studies.

      (4) Figure 9 and Fig 10: The authors raised antibodies against both Sakura and Otu, but their specificities were not provided. For Western blot data, the authors should provide whole gel images as source data files. Also, the authors argue that the Otu band they observed corresponds to the 98-kDa isoform (lines 302-304). The molecular weight on the Western blot alone would be insufficient to support this argument.

      When we submitted the initial manuscript, we also submitted the original, uncropped, and unmodified whole Western blot images for all gel images to eLife, as requested. We did the same for this revised submission. We understand that eLife makes all of these files available for readers to download.

      In the newly added Fig S13B, we used ovaries from very young (2-5 hour-old) flies and from 3-7 day-old flies. Ovaries from 2-5 hour-old flies contain mostly pre-differentiated germ cells, whereas older ovaries (3-7 days old in this case) contain all 14 stages of oogenesis, with later stages predominating in whole-ovary lysates.

      As reported previously (Sass et al. 1995), we detected more of the 104 kDa Otu isoform than of the 98 kDa isoform in 2-5 hour-old ovaries, and predominantly the 98 kDa isoform in 3-7 day-old ovaries (Fig S13B). These results confirm that the major Otu isoform detected in our Western blots, all of which used older ovaries except for the 2-5 hour-old samples in Fig S13B, is the 98 kDa isoform.

      (5) Otu has been reported to regulate ovo and Sxl in the female germline. Is Sakura involved in their regulation?

      We examined the alternative splicing pattern of sxl in sakura mutant ovaries. As shown in Fig S6, we detected the male-specific sxl RNA isoform and a reduced level of the female-specific sxl isoform in sakura mutant ovaries. Thus, Sakura appears to be involved in sxl splicing in the female germline, although further studies will be needed to determine whether its role is direct or indirect.

      (6) Lines 443-447: The GSC loss phenotype in piwi mutant ovaries is thought to occur in a somatic cell-autonomous manner: both piwi-mutant germline clones and germline-specific piwi knockdown do not show the GSC-loss phenotype. In contrast, the authors provide compelling evidence that Sakura functions in the germline. Therefore, the Piwi-mediated GSC maintenance pathway is likely to be independent of the Sakura-Otu axis.

      We changed the text accordingly.

      Reviewer #2 (Recommendations for the authors):

      Overall, this is a cleanly written manuscript, with some sentences/sections that are confusing the way they are constructed (i.e. Line 37-38, 334, section on Flp/FRT experiments).

      We rewrote those sections to avoid confusion.

      Comment for all merged image data: the quality of the merged images is very poor. The individual channels are better but should also be reprocessed for more resolved image data sets. Also, it would be helpful to have boundaries drawn in an individual panel to identify the regions of the germarium, as cartooned in Figure S1A (which should be brought into Figure 1). F-actin or Vsg staining would have helped throughout the manuscript to enhance the visualization of the described phenotypes.

      We are very sorry for the low-resolution images. The original PDF file contained high-resolution images, but it was compressed to meet the file size limit of the eLife submission portal. The revised submission uses high-resolution images.

      We outlined the germarium in Fig 1E.

      We moved the former Fig S1 into Fig 1A.

      We provided Phalloidin (F-Actin) staining images in Fig S7.

      All p-values seem off. I recommend running the data through the student t-test again.

      We used Student's t-test to calculate the p-values and have confirmed that they are correct. We are unsure why the reviewer considers all p-values to be off.

      In the original manuscript, as stated in each figure legend, we used a single asterisk (*) to indicate p < 0.05, without distinguishing among p < 0.001, p < 0.01, and p < 0.05.

      We assume that Reviewer 2 is suggesting that we use ***, **, and * to indicate p < 0.001, p < 0.01, and p < 0.05, respectively. If so, we have now adopted this convention.
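      For readers reproducing the figures, the three-level asterisk convention can be written as a small lookup. The sketch below is illustrative only; the function name and the "n.s." label for p >= 0.05 are our assumptions, not part of the manuscript's methods.

```python
def significance_stars(p: float) -> str:
    """Map a p-value to the asterisk convention:
    *** for p < 0.001, ** for p < 0.01, * for p < 0.05.
    The 'n.s.' label for p >= 0.05 is an assumed placeholder."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"p-value must lie in [0, 1], got {p}")
    # Check the strictest threshold first so each p gets the most stars it qualifies for.
    for threshold, stars in ((0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p < threshold:
            return stars
    return "n.s."


if __name__ == "__main__":
    for p in (0.0004, 0.003, 0.02, 0.2):
        print(p, significance_stars(p))
```

      The thresholds are strict inequalities, so a value of exactly 0.05 is labeled "n.s.", matching "p < 0.05" in the legends.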

      Figure 1

      (1) Within the text, C is mentioned before A.

      We updated the text so that Fig 1A is now mentioned before Fig 1C.

      (2) B should be the supplemental figure.

      We moved the former Fig 1B to Supplemental Figure 1.

      (3) C - How were the different egg chamber stages selected in the WB? Naming them 'oocytes' is deceiving. Recommend labeling them as 'egg chambers', since an oocyte is claimed to be just the one-cell of that cyst.

      We changed the labeling to egg chambers.

      (4) Is the antibody not detecting Sakura in IF? There is no mention of this anywhere in the manuscript.

      While our Sakura antibody detects Sakura in IF, it also appears to detect other proteins. Because we have a Sakura-EGFP fly strain (which fully rescues the sakura<sup>null</sup> phenotypes) that allows us to examine Sakura expression and localization without such nonspecific signals, we relied on Sakura-EGFP rather than the anti-Sakura antibody for IF.

      (5) Expand on the reliance of the sakura-EGFP fly line. Does this overexpression cause any phenotypes?

      Expression of sakura-EGFP does not cause any phenotypes in either the sakura[+/+] or the sakura[+/-] background.

      (6) Line 95 "as shown below" is not clear that it's referencing panel D.

      We now reference Fig 1D explicitly.

      (7) Re: Figures 1 E and F. There is no mention of Hts or Vasa proteins in the text. "Sakura-EGFP was not expressed in somatic cells such as terminal filament, cap cells, escort cells, or follicle cells (Figure 1E). In the egg chamber, Sakura-EGFP was detected in the cytoplasm of nurse cells and was enriched in developing oocytes (Figure 1F)". Outline these areas or label these structures/sites in the images. The color of the Merge labels is confusing, as the blue is not easily seen.

      We mentioned Hts and Vasa in the text. We labeled the structures/sites in the images and updated the color labeling.

      Figure 2

      (1) Entire figure is not essential to be a main figure, but rather supplemental.

      We respectfully disagree. The female fertility assay data, in which the sakura null mutant exhibits a strikingly strong phenotype that is completely rescued by our Sakura-EGFP transgene, are very important, and we would like to present them in a main figure.

      (2) 2A- one star (*) significance does not seem correct for the presented values between 0 and 100+.

      In the original manuscript, as stated in each figure legend, we used a single asterisk (*) to indicate p < 0.05, without distinguishing among p < 0.001, p < 0.01, and p < 0.05.

      We assume that Reviewer 2 is suggesting that we use ***, **, and * to indicate p < 0.001, p < 0.01, and p < 0.05, respectively. If so, we have now adopted this convention.

      (3) 2C images are extremely low quality. Should be presented as bigger panels.

      We are very sorry for the low-resolution images. The original PDF file contained high-resolution images, but it was compressed to meet the file size limit of the eLife submission portal. The revised submission uses high-resolution images, and we now present these images as larger panels.

      Figure 3

      (1) "We observed that some sakura<sup>null/null</sup> ovarioles were devoid of germ cells ("germless"), while others retained germ cells (Fig 3A)". What is described is hard to see. A zoomed-in panel is needed.

      We provided zoomed-in panels in Fig 3B.

      (2) C - The control doesn't seem to match. Must zoom in.

      We now provide a matched control and zoomed-in images.

      (3) For clarity, separate the tumorous and germless images.

      In the new image, only one tumorous and one germless ovariole are shown, with clear labeling and outlines.

      (4) Use arrows to help clearly indicate the changes that occur. As they are presented, they are difficult to see.

      We updated all the panels to enhance clarity.

      (5) Line 158 seems like a strong statement since it could be indirect.

      We softened the statement.

      Figure 4

      (1) Line 188-189 - Conclusion is an overstatement.

      We softened the statement.

      (2) Is the piRNA reduction due to a change in transcription? Or a direct effect by Sakura?

      We do not know the answers to these questions. We hope to address these in future studies.

      Figure 5

      (1) D - It might make more sense if this graph showed % instead of the numbers.

      We are not sure we understand the reviewer's point. We think that showing numbers, rather than percentages, makes more sense here.

      (2) Line 213 - explain why RNAi 2 was chosen when RNAi 1 looks stronger.

      For unknown reasons, the RNAi line 2 stock is much healthier than the RNAi line 1 stock, even without a Gal4 driver. We were concerned that RNAi line 1 might carry an unwanted genetic background, so we used RNAi line 2 to avoid this issue.

      (3) In Line 218 there's an extra parenthesis after the PGC acronym.

      We corrected the error.

      (4) TOsk-Gal4 fly is not in the Methods section.

      We mentioned TOsk-Gal4 in the Methods.

      Figure 6:

      (1) The FLP-FRT section must be rewritten.

      We rewrote the FLP-FRT section.

      (2) A - include statistics.

      We included statistics using the chi-square test.
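      As a reference for how such a comparison of phenotype counts works, the sketch below implements Pearson's chi-square test of independence for a 2x2 table of counts using only the standard library. The table layout (two genotypes by two phenotype categories) and the counts in the example are hypothetical and do not come from Fig 6A. Note that some packages (for example SciPy's `chi2_contingency`) apply Yates' continuity correction to 2x2 tables by default, whereas this sketch does not.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    No Yates continuity correction is applied. Returns (statistic, p_value)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("a row or column total is zero")
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function equals
    # erfc(sqrt(x / 2)), so no external statistics library is needed.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts: rows = genotypes, columns = phenotype categories.
    stat, p = chi_square_2x2([[10, 20], [20, 10]])
    print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

      A 2x2 table has one degree of freedom, which is what makes the closed-form p-value via `math.erfc` valid here; larger tables would need the general chi-square distribution.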

      (3) B - is not recalled in the Results text.

      We now refer to Fig 6B in the text.

      (4) Line 232 references Figure 3, but not a specific panel.

      We now refer to Fig 3A, 3C, 3D, and 3E in the text.

      Figure 7/8 - can go to Supplemental.

      We moved Fig 8 to the supplement. However, we think the Fig 7 data are important, and we would therefore like to present them as a main figure.

      (1) There should be CycA expression in the control during the first 4 divisions.

      Yes, CycA expression is observed in the control during the first 4 divisions, although it is much weaker than in sakura<sup>null</sup> clones.

      (2) Helpful to add the dotted lines to delineate (A) as well.

      We added a dotted outline for germarium in Fig 7A.

      (3) Line 263 CycA is miswritten as CyA.

      We corrected the typo.

      Figure 9

      (1) Otu antibody control?

      We validated Otu antibody in newly added Fig 10C and Fig S13A.

      (2) Which Sakura-EGFP line was used? sakura het. or null background? This isn't mentioned in the text, nor legend.

      We used Sakura-EGFP in the background of sakura[+/+]. We added this information in the methods and figure legend.

      (3) C - Why the switch to S2 cells? Not able to use the Otu antibody in the IP of ovaries?

      We can use the Otu antibody for IP from ovaries. However, in anti-Sakura Westerns after anti-Otu IP, the light-chain band of the anti-Otu antibody overlaps with the Sakura band. We therefore switched to S2 cells and used an epitope tag to avoid this issue.

      Figure 10

      (1) A- The resolution of images of the ribbon protein structure is poor.

      We are very sorry for the low-resolution images. The original PDF file contained high-resolution images, but it was compressed to meet the file size limit of the eLife submission portal. The revised submission uses high-resolution images.

      (2) A table summarizing the interactions between domains would help bring clarity to the data presented.

      We added a table summarizing the fragment interaction results.

      (3) Some images would be nice here to show that the truncations no longer colocalize.

      We are not sure we understand the reviewer's point. In our study, even for the full-length proteins, we have not shown any colocalization of Sakura and Otu in S2 cells or in ovaries, except that both are enriched in developing oocytes in egg chambers.

      Figure 12

      (1) A - control and RNAi lines do not match.

      We provided matched images.

      (2) In general, since for Sakura, only its binding to Otu was identified and since they phenocopy each other, doesn't most of the characterization of Sakura just look at Otu phenotypes? Does Sakura knockdown affect Otu localization or expression level (and vice versa)?

      We tested this by Western blotting (Fig S15) and IF (Fig 12). Sakura knockdown did not decrease the Otu protein level, and Otu knockdown did not decrease the Sakura protein level (Fig S15). In sakura<sup>null</sup> clones, the Otu level was not notably affected (Fig 12). However, in sakura<sup>null</sup> clones, Otu lost its localization to the posterior position within egg chambers.

      Figure S6

      (1) It is Luciferase, not Lucifarase.

      We corrected the typo.

      Reviewer #3 (Recommendations for the authors):

      (1) It is interesting that germless and tumorous phenotypes coexist in the same population of flies. Additional consideration of these essentially opposite phenotypes would significantly strengthen the study. For example, do they co-exist within the same fly and are the tumorous ovarioles present in newly eclosed flies or do they develop with age? The data in Figure 8 show that bam knockdown partially suppresses the germless phenotype. What effect does it have on the tumorous phenotype? Is transposon expression involved in either phenotype? Do Sakura mutant germline stem cell clones overgrow relative to wild-type cells in the same ovariole? Does sakura RNAi driven by NGT-Gal4 only cause germless ovaries or does it also cause tumorous phenotypes? What happens if the knockdown of Sakura is restricted to adulthood with a Gal80ts? It may not be necessary to answer all of these questions, but more insight into how these two phenotypes can be caused by loss of sakura would be helpful.

      We performed new experiments to answer these questions.

      do they co-exist within the same fly and are the tumorous ovarioles present in newly eclosed flies or do they develop with age?

      Tumorous and germless ovarioles coexist in the same fly (in the same ovary). Tumorous ovarioles are present in very young (0-1 day old) flies, including newly eclosed (Fig S5). The ratio of germless ovarioles increases and that of tumorous ovarioles decreases with age (Fig S5).

      The data in Figure 8 show that bam knockdown partially suppresses the germless phenotype. What effect does it have on the tumorous phenotype?

      The effect of bam knockdown on the tumorous phenotype is shown in Fig S10. bam knockdown increased the proportion of tumorous ovarioles and the number of GSC-like cells.

      Is transposon expression involved in either phenotype?

      Because our transposon-piRNA reporter is driven by the germline-specific nos promoter, it is expressed only in germline cells, so we cannot examine transposon expression in germless ovarioles.

      Do Sakura mutant germline stem cell clones overgrow relative to wild-type cells in the same ovariole?

      Yes, Sakura mutant GSC clones overgrow. Please compare Fig 6C and Fig S8.

      Does sakura RNAi driven by NGT-Gal4 only cause germless ovaries or does it also cause tumorous phenotypes?

      Fig S10 and Fig S12 show the ovariole phenotypes of sakura RNAi driven by NGT-Gal4. It causes both germless and tumorous phenotypes.

      What happens if the knockdown of Sakura is restricted to adulthood with a Gal80ts?

      Our mosaic clones were induced at the adult stage, so we already have data on adult-specific loss of function. In addition, Gal80ts does not work well with nos-Gal4.

      (2) The idea that the excessive bam expression in tumorous ovaries is due to a failure of bam repression by dpp signaling is not well-supported by the data. Dpp signaling is activated in a very narrow region immediately adjacent to the niche but the images in Figure 7A show bam expression in cells that are very far away from the niche. Thus, it seems more likely to be due to a failure to turn bam expression off at the 16-cell stage than to a failure to keep it off in the niche region. To determine whether bam repression in the niche region is impaired, it would be important to examine cells adjacent to the niche directly at a higher magnification than is shown in Figure 7A.

      We provided higher magnification images of cells adjacent to the niche in new Fig 7A.

      We found that cells adjacent to the niche also express Bam-GFP.

      That said, we agree with the reviewer. A failure to turn bam expression off at the 16-cell stage may be an additional, or even the main, cause of bam misexpression in the sakura mutant. We have added this point to the Discussion.

      (3) In addition, several minor comments should be addressed:

      a. Does anti-Sakura work for immunofluorescence?

      While our Sakura antibody detects Sakura in IF, it also appears to detect other proteins. Because we have a Sakura-EGFP fly strain that allows us to examine Sakura expression and localization without such nonspecific signals, we relied on Sakura-EGFP rather than the anti-Sakura antibody.

      b. Please provide insets to show the phenotypes indicated by the different color stars in Figure 3C more clearly.

      We provided new, higher-magnification images to show the phenotypes more clearly.

      c. Please indicate the frequency of the expression patterns shown in Figure 4D (do all ovarioles in each genotype show those patterns or is there variable penetrance?).

      We indicated the frequency.

      d. An image showing TOskGal4 driving a fluorophore should be provided so that readers can see which cells express Gal4 with this driver combination.

      This has already been done in ElMaghraby et al., GENETICS, 2022, 220(1), iyab179, so we did not repeat the experiment.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Mallimadugula et al. combined Molecular Dynamics (MD) simulations, thiol-labeling experiments, and RNA-binding assays to study and compare the RNA-binding behavior of the Interferon Inhibitory Domain (IID) from Viral Protein 35 (VP35) of Zaire ebolavirus, Reston ebolavirus, and Marburg marburgvirus. Although the structures and sequences of these viruses are similar, the authors suggest that differences in RNA binding stem from variations in their intrinsic dynamics, particularly the opening of a cryptic pocket. More precisely, the dynamics of this pocket may influence whether the IID binds to RNA blunt ends or the RNA backbone.

      Overall, the authors present important findings to reveal how the intrinsic dynamics of proteins can influence their binding to molecules and, hence, their functions. They have used extensive biased simulations to characterize the opening of a pocket which was not clearly seen in experimental results - at least when the proteins were in their unbound forms. Biochemical assays further validated theoretical results and linked them to RNA binding modes. Thus, with the combination of biochemical assays and state-of-the-art Molecular Dynamics simulations, these results are clearly compelling.

      Strengths:

      The use of extensive Adaptive Sampling combined with biochemical assays clearly points to the opening of the Interferon Inhibitory Domain (IID) as a factor for RNA binding. This type of approach is especially useful to assess how protein dynamics can affect its function.

      Weaknesses:

      Although a connection between the cryptic pocket dynamics and RNA binding mode is proposed, the precise molecular mechanism linking pocket opening to RNA binding still remains unclear.

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to determine whether a cryptic pocket in the VP35 protein of Zaire ebolavirus has a functional role in RNA binding and, by extension, in immune evasion. They sought to address whether this pocket could be an effective therapeutic target resistant to evolutionary evasion by studying its role in dsRNA binding among different filovirus VP35 homologs. Through simulations and experiments, they demonstrated that cryptic pocket dynamics modulate the RNA binding modes, directly influencing how VP35 variants block RIG-I and MDA5-mediated immune responses.

      The authors successfully achieved their aim, showing that the cryptic pocket is not a random structural feature but rather an allosteric regulator of dsRNA binding. Their results not only explain functional differences in VP35 homologs despite their structural similarity but also suggest that targeting this cryptic pocket may offer a viable strategy for drug development with reduced risk of resistance.

      This work represents a significant advance in the field of viral immunoevasion and therapeutic targeting of traditionally "undruggable" protein features. By demonstrating the functional relevance of cryptic pockets, the study challenges long-standing assumptions and provides a compelling basis for exploring new drug discovery strategies targeting these previously overlooked regions.

      Strengths:

      The combination of molecular simulations and experimental approaches is a major strength, enabling the authors to connect structural dynamics with functional outcomes. The use of homologous VP35 proteins from different filoviruses strengthens the study's generality, and the incorporation of point mutations adds mechanistic depth. Furthermore, the ability to reconcile functional differences that could not be explained by crystal structures alone highlights the utility of dynamic studies in uncovering hidden allosteric features.

      Weaknesses:

      While the methodology is robust, certain limitations should be acknowledged. For example, the study would benefit from a more detailed quantitative analysis of how specific mutations impact RNA binding and cryptic pocket dynamics, as this could provide greater mechanistic insight. This study would also benefit from providing a clear rationale for the selection of the amber03 force field and considering the inclusion of volume-based approaches for pocket analysis. Such revisions will strengthen the robustness and impact of the study.

      Reviewer #3 (Public review):

      Summary:

      The authors suggest a mechanism that explains the preference of viral protein 35 (VP35) homologs to bind the backbone of double-stranded RNA versus blunt ends. These preferences have a biological impact in terms of the ability of different viruses to escape the immune response of the host.

      The proposed mechanism involves the existence of a cryptic pocket, where VP35 binds the blunt ends of dsRNA when the cryptic pocket is closed and preferentially binds the RNA double-stranded backbone when the pocket is open.

      The authors performed MD simulation results, thiol labelling experiments, fluorescence polarization assays, as well as point mutations to support their hypothesis.

      Strengths:

      This is a genuinely interesting scientific question, which is approached through multiple complementary experiments as well as extensive MD simulations. Moreover, structural biology studies focused on RNA-protein interactions are particularly rare, highlighting the importance of further research in this area.

      Weaknesses:

      - Sequence similarity between Reston and Zaire (94% similarity) explains their similar behaviour in simulations and experimental assays. Marburg instead is a more distant homolog (~80% similarity relative to Ebola/Zaire). This difference in sequence and structure can explain the propensities, without the need to invoke the existence of a cryptic pocket.

      - No real evidence for the presence of a cryptic pocket is presented, but rather a distance probability distribution between two residues obtained from extensive MD simulations. It would be interesting to characterise the modelled RNA-protein interface in more detail.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Before assessing the overall quality and significance of this work, this reviewer needs to specify the context of this review. This reviewer's expertise lies in biased and unbiased molecular dynamics simulations and structural biology. Hence, while this reviewer can overall understand the results for thiol-labeling and RNA-binding assays, this review will not assess the quality of these biochemical assays and will mainly focus on the modelling results.

      Overall, the authors present important findings to reveal how the intrinsic dynamics of proteins can influence their binding to molecules and, hence, their functions. They have used extensive biased simulations to characterize the opening of a pocket which was not clearly seen in experimental results - at least when the proteins were in their unbound forms. Biochemical assays further validated theoretical results and linked them to RNA binding modes. Thus, with the combination of biochemical assays and state-of-the-art Molecular Dynamics simulations, these results are clearly compelling.

      Beyond the clear qualities of this work, I would like to mention a few points that may help to better contextualize and rationalize the results presented here.

      - First, both the introduction and discussion sections seem relatively condensed. Extending them to, for example, better describe the methodological context and discuss the methodological limitations and potential future developments related to biased simulations may help the reader get a better idea of the significance of this work.

      - The authors presented 3 homologs in this study: the IIDs of Reston, Zaire, and Marburg viruses. While Zaire and Reston are relatively similar in sequence (Figure S1), the Marburg sequence clearly differs from the other two. Can the authors indicate a similarity/identity score for each sequence alignment and extend Figure S1 to directly compare the Marburg sequence with Reston and Zaire? Can they also discuss how these differences may affect the comparison of the three IIDs? This may also help the reader understand why the authors sometimes compare all three viruses and sometimes focus only on Zaire and Reston.

      We thank the reviewer for raising this point, and we agree that additional details about the sequence comparison provide more context for our choice of substitutions. We have therefore updated Fig S1 to include a detailed pairwise comparison of all the IID sequences, including the percentage sequence similarity and identity. We have also added the following sentences to the Results section where we first introduce the substitutions between Zaire and Reston IIDs:

      “While the sequence of Marburg IID differs significantly from Reston and Zaire IIDs with a sequence identity of 42% and 45% respectively (Fig S1), the sequences of Reston and Zaire IID are 88% identical and 94% similar. Particularly, substitutions between these homologs are all distal to the RNA-binding interfaces and all the residues known to make contacts with dsRNA from structural studies are identical. Therefore, we reasoned that comparing these two homologs would help us identify minimal substitutions that control pocket opening probability and allow us to study its effect on dsRNA binding with minimal perturbation of other factors.”
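      Percent identity depends on the denominator chosen, which is why values from different tools can disagree. The sketch below computes identity from two pre-aligned sequences under one assumed convention (identical residues divided by gap-free alignment columns); it is a hypothetical helper, not the tool used to build Fig S1, and the toy sequences are made up rather than taken from the VP35 IIDs.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences.

    Assumed convention: identical residues / alignment columns in which
    neither sequence has a gap. Other tools divide by the full alignment
    length or by the length of the shorter sequence instead."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a residue.
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != gap and y != gap
    ]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Hypothetical toy alignment: 6 gap-free columns, 5 identical.
    print(round(percent_identity("MKV-LAG", "MRVQLAG"), 1))
```

      Similarity scores additionally require a substitution matrix or residue grouping, which this sketch deliberately leaves out.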

      - In this work, the authors mention the cryptic pocket but only illustrate its opening using a simple distance between residues (Figure 2) and the SASA of one cysteine (Figure 3). In previous work (Cruz et al., Nature Communications, 2022), the authors characterized in more detail the residues involved in RNA binding and in forming the cryptic pocket. Would it be possible to better describe this cryptic pocket (residues involved, volume, etc.) and to explain how, structurally, it can affect the RNA binding mode (blunt ends vs. backbone)?

      We thank the reviewer for pointing out the need for clarification on the residues involved in RNA binding and pocket opening and the mechanism linking them. We have performed the CARDS analysis on Reston and Marburg IID simulations as we had done on Zaire IID simulations in Cruz et al, 2022. The results are shown in Fig S3 and discussed in the main text in the first results section.

      - As a counter-example, the authors used C315 for SASA calculation and thiol labeling (Figure 3). This cysteine is mostly buried, as seen from the SASA for Reston and Marburg and from thiol labeling (Figure 3E, G, H). Would it be possible to also obtain thiol labeling rates for Cysteine 264 in Reston, and its equivalents, to show a case where the residue is solvent exposed?

      We show the SASA for C264 from the simulations in Fig S4 and the thiol labeling rates for all 4 cysteines in Reston IID in Fig S6. Comparing these rates with those obtained for all 4 cysteines in Zaire IID (Fig 4 in Cruz et al., 2022), we observe that the rates for C264, which is expected to be exposed, are significantly faster than those for C315, which is largely buried in all variants.

      - I strongly support the authors' decision to share their data by depositing them in an OSF repository. These data helped this reviewer assess some of the results and better understand the dynamics of the respective systems. I have just a few comments regarding these data that need to be addressed:

        - While there are data for WT Reston and Marburg, there are no data for Zaire. Is this because these data correspond to the previous work (Cruz et al. 2022) (in which case it would be good to make this clear in the main text), or is it an omission?
        - There is no center.xtc file in the Marburg-MSM directory.
        - There is no protmasses.pdb file in the Reston-MSM directory.

      - In general, if possible, it would be good to use the same name for each type of file presented in each directory to help a potential user understand a bit more how to use these data.

      - If possible, adding a bit more metadata and explanation on the OSF webpage would be very beneficial to help others find these data. To help in this direction, the authors may look at the guidelines presented at the end of this article: https://elifesciences.org/articles/90061

      We thank the reviewer for pointing out the omissions from the OSF repository. We have added the missing files and followed a uniform naming convention. We have also added documentation in the metadata section of the OSF repository to help others use the data.  

      Indeed, the simulation data used for Zaire IID is available on the OSF repository corresponding to Cruz et al. 2022 at https://osf.io/5pg2a. We have also clarified this in the data availability section of the main text.  

      Minor point:

      In Figure 2, there is a slight bump in the 225-295 distance around 1 nm for Reston. Can the authors comment on it? As these results are based on long AS, even if the bump is very small, do the authors think this population is significant?

      Comparing the probability distributions obtained from bootstrapping the frames used to calculate the MSM equilibrium probabilities (Revised Fig1), we observe that the bump in the Reston IID distribution persists in all bootstraps, indicating that it might indeed be significant. This is also consistent with our observation that cysteine 296 does get fully labeled in our thiol labeling experiments, albeit significantly more slowly than in the other homologs.

      Reviewer #2 (Recommendations for the authors):

      I recommend that the authors implement moderate revisions prior to the publication of this research article, addressing the identified weaknesses (see below).

      The authors should provide a rationale for their selection of the amber03 force field (Duan et al., J. Comput. Chem. 24, 1999-2012, 2003) for molecular dynamics simulations, particularly given the availability of more recent and optimized versions of the AMBER force fields. These newer force fields may offer improved parameterization for biomolecular systems, potentially enhancing the accuracy and reliability of the simulation results.

      We chose the Amber03 force field because it has performed well in much of our past work, including the original prediction of the cryptic pocket that we study in this manuscript. The results presented in this manuscript also demonstrate the predictive power of Amber03.

      Additionally, while the authors utilized solvent-accessible surface area (SASA) for cryptic pocket analysis, volume-based approaches may be more suitable for this purpose. Several studies (e.g., Sztain et al. J. Chem. Inf. Model. 2021, 61, 7, 3495-3501) have demonstrated the utility of volume analysis in identifying and characterizing cryptic pockets. The authors could consider incorporating such methodologies to provide a more comprehensive assessment of pocket dynamics.

      The authors propose that the cryptic pocket is not merely a random structural feature but functions as an allosteric regulator of dsRNA binding. To further substantiate this claim, an in-depth analysis of this allosteric effect using for instance network analysis could significantly enhance the study. Such an approach could identify key residues and interaction networks within the protein that mediate the allosteric regulation. This type of mechanistic insight would not only provide a stronger theoretical framework but also offer valuable information for the rational design of therapeutic interventions targeting the cryptic pocket.  

      We thank the reviewer for pointing out the need for clarification on the molecular mechanism linking the opening of the cryptic pocket to RNA binding. We have performed the CARDS analysis on Reston and Marburg IID simulations as was done on Zaire IID simulations in Cruz et al., 2022. The results are shown in Fig S3 and discussed in the main text in the first results section. Briefly, we do find a community (blue) comprising the pocket residues in Reston and Marburg IIDs, as we did in Zaire. Similarly, we find that many of the RNA binding residues fall into the orange and green communities, as in Zaire. However, there are differences in exactly which residues are clustered into which of these two communities. There are also differences in how strongly connected these communities are in the three homologs. Therefore, while we can conclude that pocket residues likely have varying influence on the RNA binding residues in the homologs, it is hard to say exactly what that variation is from this analysis alone.

      Reviewer #3 (Recommendations for the authors):

      - MD simulations: Were all simulations initialized from the 3 crystal structures? Was dsRNA excluded from the simulations in all cases? Were the crystallographic Mg ions in the vicinity of the binding site included? These are known to influence structural dynamics to a large extent.

      All simulations were indeed initialized using only protein atoms from the crystal structures 3FKE, 4GHL, and 3L2A. Therefore, crystallographic Mg ions were not included in the simulations. However, we do agree with the reviewer and think that the effect of parameters such as salt concentration, specifically Mg ions which are known to be important for the stability of dsRNA, on the pocket opening equilibrium merits detailed study in future work.

      - Figure 2: Would it be possible to perform e.g. a block error analysis and show the statistical errors of the distributions?

      We agree that showing the statistical variation in the MSM equilibrium probabilities is important for comparing the different distributions. Therefore, we have updated Figs 2 and 5 to show the distributions obtained from MSMs constructed using 100 and 10 random samples of the data respectively to indicate the extent of the statistical variability in the MSM construction.  
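      The variability bands described above come from re-estimating the MSM on random subsets of the data. A much simpler stand-in, shown only to illustrate the resampling idea, bootstraps whole trajectories and histograms a raw per-frame observable (for example, the 225-295 distance) without any MSM reweighting. The input format, the function name, and the synthetic data are assumptions, not the authors' protocol.

```python
import numpy as np

def bootstrap_histograms(trajectories, bins, n_boot=100, seed=0):
    """Bootstrap a per-frame observable by resampling whole trajectories.

    trajectories: list of 1D arrays, one per simulation (hypothetical format).
    bins: histogram bin edges.
    Returns the mean density and the 5th/95th percentile band per bin.
    Note: no MSM is built here; every frame is weighted uniformly.
    """
    rng = np.random.default_rng(seed)
    n = len(trajectories)
    densities = np.empty((n_boot, len(bins) - 1))
    for b in range(n_boot):
        picks = rng.integers(0, n, size=n)  # resample trajectories with replacement
        sample = np.concatenate([trajectories[i] for i in picks])
        densities[b], _ = np.histogram(sample, bins=bins, density=True)
    low, high = np.percentile(densities, [5, 95], axis=0)
    return densities.mean(axis=0), low, high

# Synthetic example: 20 "trajectories" of a distance in nm.
rng = np.random.default_rng(1)
trajs = [rng.normal(1.5, 0.2, size=500) for _ in range(20)]
bins = np.linspace(0.5, 2.5, 41)
mean, low, high = bootstrap_histograms(trajs, bins)
```

      Resampling whole trajectories rather than individual frames keeps time-correlated frames together, so the band is less likely to understate the uncertainty; a feature such as a small shoulder in the distribution can then be judged by whether it survives across resamples.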

      - More detailed structural biology experiments (such as NMR or HDX-MS) could potentially shed more light on the differential behaviour of the three different homologs, providing more evidence for the presence of the cryptic pocket.

      We agree that NMR and HDX-MS are powerful means to study dynamics and are actively exploring these approaches for our future work.

    1. Language was an important invention

      Belief in the same types of religions and customs means that people we haven't met are still recognizable when we run into them. It took thousands of years, with the invention of art and language; before you were talking, you were making art.

      Hi, my name is Alan Kay, and I'd like to apologize for having a bit of laryngitis just on the day of this shoot. I've been asked to talk about inventing the future, and of course we mostly think of inventing in the realm of technology. But I think most people watching this will have been struck by the fact that living in the 21st century in the United States is a vastly different experience than living a hundred thousand years ago anywhere in the world, and as far as we know, the brains that we have are roughly the same as the brains that belonged to the very same species back then. We mostly lived in small groups of people, hunting and gathering, falling in love, telling stories to each other, fighting other people, taking revenge, caring for the young, and gradually building up a culture that they taught to the next generation in their tribe. The first great invention of human beings, or of evolution, was this idea of culture, and it came from a slightly earlier invention of evolution, which was language. Our language is different in just a few important ways from that of our primate ancestors, and that was enough to be able to deal with sequences of things and portrayals of things, and to be able to make up things, which no other animals can do, and then to tell our made-up things to other people and get them to believe in them. That started to allow us to aggregate together in groups larger than about a hundred people, which is what we can deal with face to face. So this notion of culture, belief in the same kinds of religions, belief in the same kinds of customs, is something that can spread, so that people we've never met are still recognizable when we finally do run into them. So we can think about that as the first great inventing of the future for humanity, and it took many tens of thousands of years, with the invention of art, which may have always been with us, in between, before we started taking stock of the world around us and starting

    1. Author response:

      The following is the authors’ response to the current reviews.

      We wanted to clarify Reviewer #1’s latest comment in the last round of review: “Furthermore, the referee appreciates that the authors have echoed the concern regarding the limited statistical robustness of the observed scrambling events.” We appreciate the follow-up information from Reviewer #1 that this comment refers specifically to the low-count alternative-pathway events that we observe at the dimer interface, and not to the statistics of the manuscript overall, as they believe that “the study presents a statistically rigorous analysis of lipid scrambling events across multiple structures and conformations (Reviewer #1)”. We agree with the Reviewer and acknowledge that, overall, our coarse-grained study represents the most comprehensive single manuscript on the entire TMEM16 family to date.


      The following is the authors’ response to the original reviews.

      Public Review:

      Reviewer #1 (Public review):

      Summary:

      The manuscript investigates lipid scrambling mechanisms across TMEM16 family members using coarse-grained molecular dynamics (MD) simulations. While the study presents a statistically rigorous analysis of lipid scrambling events across multiple structures and conformations, several critical issues undermine its novelty, impact, and alignment with experimental observations.

      Critical issues:

      (1) Lack of Novelty:

      The phenomenon of lipid scrambling via an open hydrophilic groove is already well-established in the literature, including through atomistic MD simulations. The authors themselves acknowledge this fact in their introduction and discussion. By employing coarse-grained simulations, the study essentially reiterates previously known findings with limited additional mechanistic insight. The repeated observation of scrambling occurring predominantly via the groove does not offer significant advancement beyond prior work.

      We agree with the reviewer’s statement regarding the lack of novelty when it comes to our observations of scrambling in the groove of open Ca2+-bound TMEM16 structures. However, we feel that the inclusion of closed structures in this study, which attempts to address the as-yet unanswered question of how TMEM16s scramble lipids in the absence of Ca2+, offers new observations for the field. In our study, we specifically address the extent to which the induced membrane deformation, which has been theorized to help lipids cross the bilayer, especially in the absence of Ca2+, contributes to the rate of scrambling (see references 36, 59, and 66). There are also several TMEM16F structures solved under activating conditions (bound to Ca2+ and in the presence of PIP2) which feature structural rearrangements of TM6 that may be indicative of an open state (PDB 6P48) and which had not been tested in simulations. We show that these structures do not scramble and thereby present evidence against an out-of-the-groove scrambling mechanism for these states. Although we find a handful of examples of lipids being scrambled by Ca2+-free structures of TMEM16 scramblases, none of our simulations suggest that these events are related to the degree of deformation.

      (2) Redundancy Across Systems:

      The manuscript explores multiple TMEM16 family members in activating and non-activating conformations, but the conclusions remain largely confirmatory. The extensive dataset generated through coarse-grained MD simulations primarily reinforces established mechanistic models rather than uncovering fundamentally new insights. The effort, while statistically robust, feels excessive given the incremental nature of the findings.

      Again, we agree with the reviewer’s statement that our results largely confirm those published by other groups and by our own group. We think there is, however, value in comparing the scrambling competence of these TMEM16 structures consistently within a single study, to reduce the inconsistencies that may be introduced by the different simulation methods, parameters, and environmental variables (such as lipid composition) used in other published studies of individual family members. The consistency across our simulations and the high number of observed scrambling events have allowed us to confirm that the mechanism of scrambling is shared by multiple family members and relies most obviously on groove dilation.

      (3) Discrepancy with Experimental Observations:

      The use of coarse-grained simulations introduces inherent limitations in accurately representing lipid scrambling dynamics at the atomistic level. Experimental studies have highlighted nuances in lipid permeation that are not fully captured by coarse-grained models. This discrepancy raises questions about the biological relevance of the reported scrambling events, especially those occurring outside the canonical groove.

      We thank the reviewer for bringing up the possible inaccuracies introduced by coarse graining our simulations. This is also a concern for us, and we address this issue extensively in our discussion. As the reviewer pointed out above, our CG simulations have largely confirmed existing evidence in the field which we think speaks well to the transferability of observations from atomistic simulations to the coarse-grained level of detail. We have made both qualitative and quantitative comparisons between atomistic and coarse-grained simulations of nhTMEM16 and TMEM16F (Figure 1, Figure 4-figure supplement 1, Figure 4-figure supplement 5) showing the two methods give similar answers for where lipids interact with the protein, including outside of the canonical groove. We do not dispute the possible discrepancy between our simulations and experiment, but our goal is to share new nuanced ideas for the predicted TMEM16 scrambling mechanism that we hope will be tested by future experimental studies.

      (4) Alternative Scrambling Sites:

      The manuscript reports scrambling events at the dimer-dimer interface as a novel mechanism. While this observation is intriguing, it is not explored in sufficient detail to establish its functional significance. Furthermore, the low frequency of these events (relative to groove-mediated scrambling) suggests they may be artifacts of the simulation model rather than biologically meaningful pathways.

      We agree with the reviewer that our observed number of scrambling events in the dimer interface is too low to present it as strong evidence for it being the alternative mechanism for Ca2+-independent scrambling. This will require additional experiments and computational studies which we plan to do in future research. However, we are less certain that these are artifacts of the coarse-grained simulation system as we observed a similar event in an atomistic simulation of TMEM16F.

      Conclusion:

      Overall, while the study is technically sound and presents a large dataset of lipid scrambling events across multiple TMEM16 structures, it falls short in terms of novelty and mechanistic advancement. The findings are largely confirmatory and do not bridge the gap between coarse-grained simulations and experimental observations. Future efforts should focus on resolving these limitations, possibly through atomistic simulations or experimental validation of the alternative scrambling pathways.

      Reviewer #2 (Public review):

      Summary:

      Stephens et al. present a comprehensive study of TMEM16-members via coarse-grained MD simulations (CGMD). They particularly focus on the scramblase ability of these proteins and aim to characterize the "energetics of scrambling". Through their simulations, the authors interestingly relate protein conformational states to the membrane's thickness and link those to the scrambling ability of TMEM members, measured as the trespassing tendency of lipids across leaflets. They validate their simulation with a direct qualitative comparison with Cryo-EM maps.

      Strengths:

      The study demonstrates an efficient use of CGMD simulations to explore lipid scrambling across various TMEM16 family members. By leveraging this approach, the authors are able to bypass some of the sampling limitations inherent in all-atom simulations, providing a more comprehensive and high-throughput analysis of lipid scrambling. Their comparison of different protein conformations, including open and closed groove states, presents a detailed exploration of how structural features influence scrambling activity, adding significant value to the field. A key contribution of this study is the finding that groove dilation plays a central role in lipid scrambling. The authors observe that for scrambling-competent TMEM16 structures, there is substantial membrane thinning and groove widening. The open Ca2+-bound nhTMEM16 structure (PDB ID 4WIS) was identified as the fastest scrambler in their simulations, with scrambling rates as high as 24.4 ± 5.2 events per μs. This structure also shows significant membrane thinning (up to 18 Å), which supports the hypothesis that groove dilation lowers the energetic barrier for lipid translocation, facilitating scrambling.

      The study also establishes a correlation between structural features and scrambling competence, though analyses often lack statistical robustness and quantitative comparisons. The simulations differentiate between open and closed conformations of TMEM16 structures, with open-groove structures exhibiting increased scrambling activity, while closed-groove structures do not. This finding aligns with previous research suggesting that the structural dynamics of the groove are critical for scrambling. Furthermore, the authors explore how the physical dimensions of the groove qualitatively correlate with observed scrambling rates. For example, TMEM16K induces increased membrane thinning in its open form, suggesting that membrane properties, along with structural features, play a role in modulating scrambling activity.

      Another significant finding is the concept of "out-of-the-groove" scrambling, where lipid translocation occurs outside the protein's groove. This observation introduces the possibility of alternate scrambling mechanisms that do not follow the traditional "credit-card model" of groove-mediated lipid scrambling. In their simulations, the authors note that these out-of-the-groove events predominantly occur at the dimer interface between TM3 and TM10, especially in mammalian TMEM16 structures. While these events were not observed in fungal TMEM16s, they may provide insight into Ca2+-independent scrambling mechanisms, as they do not require groove opening.

      Weaknesses:

      A significant challenge of the study is the discrepancy between the scrambling rates observed in CGMD simulations and those reported experimentally. Despite the authors' claim that the rates are in line with experiment, the observed differences can imply large energetic discrepancies in describing scrambling (larger than a 1 kT barrier in reality). For instance, the authors report scrambling rates of 10.7 events per μs for TMEM16F and 24.4 events per μs for nhTMEM16, which are several orders of magnitude faster than experimental rates. While the authors suggest that this discrepancy could be due to the faster diffusion dynamics of the Martini 3 force field, this explanation does not fully account for the large difference in rates. A more thorough discussion of how the choice of force field and simulation parameters influences the results, and of how these discrepancies can be reconciled with experimental data, would strengthen the conclusions. Likewise, rate calculations in the study are based on 10 μs simulations, while experimental scrambling occurs over seconds. This timescale discrepancy limits the study's accuracy, as the simulations may not capture rare or slow scrambling events that are observed experimentally and therefore might underestimate the kinetics of scrambling. It is, however, important to recognize that it is hard (borderline unachievable) to pinpoint reasonable kinetics for systems like this using the currently available computational power and force field accuracy. The faster diffusion in simulations may lead to overestimated scrambling rates, making the simulation results less comparable to real-world observations. I would therefore read the findings qualitatively rather than quantitatively.

      An interesting observation is the asymmetry in the scrambling rates of the two monomers. Since MARTINI is known to be limited in correctly sampling protein dynamics, the authors, in order to preserve the fold, have applied a strong (500 kJ mol⁻¹ nm⁻²) elastic network. However, I am wondering how the ENM applies across the dimer and whether any asymmetry can be noticed in the application of restraints to each monomer and at the dimer interface. How could this have biased the asymmetry in the scrambling rates observed between the monomers? Is the asymmetry an artifact of restraining the initial structure, or does it somehow gatekeep scrambling so that it occurs mostly through a single monomer? Answering this question would have far-reaching implications for describing the mechanism of scrambling.

      The main aim of our computational survey was to directly compare all relevant published TMEM16 structures in both open and closed states using the Martini 3 CGMD force field. Our standardized simulation and analysis protocol allowed us to quantitatively compare scrambling rates across the TMEM16 family, something that has never been done before. We do acknowledge that direct comparison between simulated and experimental scrambling rates is complicated and is best interpreted qualitatively. In line with other reports (e.g., Li et al., PNAS 2024), lipid scrambling in CGMD is 2-3 orders of magnitude faster than typical experimental findings. In the CG simulation field, these increased dynamics, due to the smoother energy landscape, are a well-known phenomenon. In our view, this is a valuable trade-off for being able to capture statistically robust scrambling dynamics and gain mechanistic understanding in the first place, since these are currently challenging to obtain otherwise. For example, with all-atom MD it would have been near-impossible to conclude that groove openness and high scrambling rates are closely related, simply because one would only measure a handful of scrambling events in (at most) a handful of structures.
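      To put a 2-3 order-of-magnitude speed-up in energetic terms, a back-of-the-envelope conversion can assume an Arrhenius-like rate, k = A exp(-ΔG‡/k_BT), with the same prefactor A in simulation and experiment. That assumption is strong: the smoother Martini landscape also speeds up diffusion, which changes the effective prefactor, so the numbers below are the barrier difference that would account for the entire speed-up, not a measured quantity.

```latex
\frac{k_\mathrm{sim}}{k_\mathrm{exp}}
  = \exp\!\left(\frac{\Delta G^\ddagger_\mathrm{exp} - \Delta G^\ddagger_\mathrm{sim}}{k_B T}\right)
\quad\Longrightarrow\quad
\Delta\Delta G^\ddagger = k_B T \,\ln\frac{k_\mathrm{sim}}{k_\mathrm{exp}}

\frac{k_\mathrm{sim}}{k_\mathrm{exp}} = 10^{2}:\; \Delta\Delta G^\ddagger = \ln(100)\,k_B T \approx 4.6\,k_B T
\qquad
\frac{k_\mathrm{sim}}{k_\mathrm{exp}} = 10^{3}:\; \Delta\Delta G^\ddagger = \ln(1000)\,k_B T \approx 6.9\,k_B T
```

      Under this assumption, a 2-3 order-of-magnitude rate difference corresponds to a few k_BT, which is consistent with both the reviewer's point that the gap exceeds 1 kT and the authors' view that the comparison is best read qualitatively.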

      Considering the elastic network: the reviewer is correct in that the elastic network restrains the overall structure to the experimental conformation. This is necessary because the Martini 3 force field does not accurately model changes in secondary (and tertiary) structure. In fact, by retaining the structural information from the experimental structures, we argue that the elastic network helped us arrive at the conclusion that groove openness is the major contributing factor in determining a protein’s scrambling rate. This is best exemplified by the asymmetric X-ray structure of TMEM16K (5OC9), in which the groove of one subunit is more dilated than the other. In our simulation, this information was stored in the elastic network, yielding a 4x higher rate in the open groove than in the closed groove, within the same trajectory.
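      A minimal sketch of how per-subunit scrambling rates and a Poisson counting error might be tabulated from a list of events, assuming each event has already been assigned to the subunit whose groove it passed through. The event format, the function name, and the numbers are hypothetical, chosen only to produce a 4-fold ratio like the one described above.

```python
import math
from collections import Counter

def per_subunit_rates(event_subunits, sim_time_us):
    """Scrambling rate (events per microsecond) and Poisson standard error per subunit.

    event_subunits: one subunit label per observed scrambling event,
                    e.g. ["A", "B", "A", ...] (hypothetical format).
    sim_time_us: total simulated time in microseconds.
    Subunits with zero events do not appear in the result.
    """
    counts = Counter(event_subunits)
    return {
        subunit: (n / sim_time_us, math.sqrt(n) / sim_time_us)
        for subunit, n in counts.items()
    }

# Hypothetical trajectory: 40 events through subunit A and 10 through B in 10 us.
rates = per_subunit_rates(["A"] * 40 + ["B"] * 10, sim_time_us=10.0)
ratio = rates["A"][0] / rates["B"][0]
```

      The Poisson error treats events as independent, which is an approximation; when several replicas are available, the spread of per-replica rates is a more direct estimate of the uncertainty.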

      Notably, the manuscript does not explore the impact of membrane composition on scrambling rates. While the authors use a specific lipid composition (DOPC) in their simulations, they acknowledge that membrane composition can influence scrambling activity. However, the study does not explore how different lipids or membrane environments, or varying membrane curvature and tension, could alter scrambling behaviour. I appreciate that this might have been beyond the scope of this particular paper and that the authors plan to pursue these questions further, as this work sets a strong protocol for such studies. Placing scrambling in the context of membrane composition is particularly relevant since the authors note that TMEM16K's scrambling rate increases tenfold in thinner membranes, suggesting that lipid-specific or membrane-thickness-dependent effects could play a role.

      Considering different membrane compositions: for this study, we chose to keep the membranes as simple as possible. We opted for pure DOPC membranes because DOPC (1) has negligible intrinsic curvature, (2) forms fluid membranes, and (3) was used previously by others (Li et al., PNAS 2024). As mentioned by the reviewer, we believe our current study defines a good, standardized protocol and a solid baseline for future efforts investigating the additional effects of membrane composition, tension, and curvature, which could all affect TMEM16-mediated lipid scrambling.

      Reviewer #3 (Public review):

      Strengths:

      The strength of this study emerges from a comparative analysis of multiple structural starting points and from understanding global and local motions of the protein with respect to lipid movement. Although the protein is well studied, both experimentally and computationally, understanding the conformational events in different family members, especially with respect to membrane thickness compared with the fungal scramblases, offers good insights.

      We appreciate the reviewer recognizing the value of the comparative study. In addition to valuable insights from previous experimental and computational work, we hope to put forward a unifying framework that highlights various TMEM16 structural features and membrane properties that underlie scrambling function.

      Weaknesses:

      The weakness of the work is that it does not fully reconcile its results with the experimental evidence of Ca²⁺-independent scrambling rates observed in prior studies, although this is also challenging to do with coarse-grained molecular simulations. Previous reports have identified lipid crossing, packing defects, and other associated events, so it is difficult to place this paper in that context. In addition, the absence of validation leaves certain claims, like alternative scrambling pathways, speculative.

      Answer: It is generally difficult to quantitatively compare bulk measurements of scrambling phenomena with simulation results. The advantage of simulations is to directly observe the transient scrambling events at a spatial and temporal resolution that is currently unattainable for experiments. The current experimental evidence for the precise mechanism of Ca2+-independent scrambling is still under debate. We therefore hope to leverage the strength of MD and statistical rigor of coarse-grained simulations to generate testable hypotheses for further structural, biochemical, and computational studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The findings are largely confirmatory and do not bridge the gap between coarse-grained simulations and experimental observations. Future efforts should focus on resolving these limitations, possibly through atomistic simulations or experimental validation of the alternative scrambling pathways.

      While we agree with what the reviewer may be hinting at regarding the limitations of coarse-grained MD simulations, we believe that our study holds much more merit than this comment suggests. We have provided something that has yet to be done in the field: a comprehensive study that directly compares the scrambling rates of multiple TMEM16 family members in different conformations using identical simulation conditions. Our work clearly shows that a sufficiently dilated groove is the major structural feature that enables robust scrambling in all TMEM16 scramblases with solved structures. While all TMEM16s cause significant distortion and thinning of the membrane, we assert that the extreme thinning observed around open grooves is significantly enhanced by the lipid scrambling itself, as the two leaflets merge through lipid exchange. We saw no evidence that membrane thinning/distortion alone, in the absence of an open groove, could support scrambling at the rates observed under activating conditions, or even at the low rates observed in Ca2+-independent scrambling. Moreover, our handful of observations of scrambling events outside of the groove, which have not yet been reported in any study, opens an exciting new direction for studying alternative scrambling mechanisms. That said, we are currently following up on many of the observations reported here, such as scrambling events outside the groove, the kinetics of scrambling, and the possibility that lipids line the groove of non-scramblers like TMEM16A. This is being done experimentally with our collaborators through site-directed mutagenesis and with all-atom MD in our lab. Unfortunately, it is well beyond the scope of the current study to include all of this in the current paper.

      Reviewer #2 (Recommendations for the authors):

      Major comments and questions:

      (1) Line 214 and Figure 1- Figure Supplement 1: why have you only compared the final frame of the trajectory to the cryo-EM structure? Even if these comparisons are qualitative, they should be representative of the entire trajectory, not a single frame.

      We thank the reviewer for this suggestion and have replaced the single-frame snapshots in Figure 1-figure supplement 1 with ensemble-averaged head-group densities. The overall agreement between membrane shapes in CGMD and cryo-EM was not affected by this change.

      (2) Lines 228-231: You comment 'Residues in this site on nhTMEM16 and TMEMF also seem to play a role in scrambling but the mechanism by which they do so is unclear.' This is something you could attempt to quantify in the simulations by calculating the correlation between scrambling and protein-membrane interactions/contacts in this site. Can you speculate on a mechanism that might be a contributing factor?

      We probed the correlation between these residues and scrambling lipids, as suggested by the reviewer, and interestingly not all scrambling lipids interact with these residues. Yet there is strong lipid density in this vicinity (see insets in Figure 1 and Figure 4-figure supplement 2). These observations lead us to suspect these residues impact scrambling indirectly through influencing the conformation of the protein or flexibility and shape of the membrane. This interpretation fits with mutagenesis studies highlighting a role for these residues in scrambling (see refs 59, 62, and 67). Specifically, Falzone et al. 2022 (ref 59) suggested that they may thin the membrane near the groove, but this has not been tested via structure determination and a detailed model of how they impact scrambling is missing. We could address this question with in silico mutations; however, CG simulation is not an appropriate method to study large scale protein dynamics, and AA simulations are likely best, but beyond the scope of this paper.

      (3) Lines 240-245 and Figure 1B: This section discusses the coupling between membrane distortions and the sinusoidal curve around the protein, however, Figure 1B only shows snapshots of the membrane distortions. Is it possible to understand how these two collective variables are correlated quantitatively (as opposed to the current qualitative analysis)?

      We believe that it may be possible to quantitatively capture these two key features of the membrane, as we did previously with nhTMEM16 using our continuum elasticity-based model of the membrane (Bethel and Grabe 2016). Our model agreed with all-atom MD surfaces to within ~1 Å, hence showing good quantitative agreement throughout the entire membrane. However, we doubt that we could distill the essence of our model down to a simple functional relationship between the sinusoidal wave and pinching, which we think is what the reviewer is asking for. Rather, we believe that the large-scale sinusoidal distortion (collective variable 1) and the pinching/distortion near the groove (collective variable 2) arise from the interplay between the membrane and the specific surface chemistry of each protein (patterning of polar and non-polar residues). This is why we chose to simply report the distinct patterns that the family members impose on the surrounding membrane, which we think is fascinating. Specifically, Fig. 1B shows that different TMEM16 family members distort the membrane in different ways. Most notably, fungal TMEM16s feature a more pronounced sinusoidal deformation, whereas the mammalian members primarily produce local pinching. Then, in Fig. 3A we show that thinning at the groove happens in all structures and is more pronounced in open, scrambling-competent conformations. In other words, proteins can show very strong thinning (e.g. TMEM16K, 5OC9) even though the membrane generally remains flat.

      (4) Lines 257-258: Authors comment that TMEM16A lacks scramblase activity yet can achieve a fully lipid-lined groove (note the typo - should be lipid-lined, not lipid-line). Is a fully lipid-lined groove a prerequisite for scramblase activity? Are lipid-lined grooves the only requirement for scramblase activity? Could the authors clarify exactly what the prerequisite for scramblase activity is to avoid any confusion; this will be useful for later descriptions (i.e. line 295) where scrambling competence is again referred to. Additionally, the associated figure panel (Figure 1D) shows a snapshot of this finding but lacks any statistical quantifications - is a fully lipid-lined groove a single event? Perhaps the additional analyses, such as the groove-lipid contacts, may be useful here.

      The definition of lipid scrambling is that a lipid fully transitions from one membrane leaflet to the other. While a single lipid could transition through the groove on its own, it is well documented in both atomistic and CG MD simulations that lipid scrambling typically happens through a lipid-lined groove, as shown in Fig. 1A-B. The lipids tend to form strong choline-to-phosphate interactions with nearest neighbors that make this energetically favorable. That said, lipid-lined grooves are not sufficient for robust scrambling, as shown in Fig. 1D, where the non-scrambler TMEM16A did in fact feature a lipid-lined groove. As suggested, we performed contact analysis and found that residue K645 on TM6, in the middle of the groove, contacts lipids in 9.2% of the simulation frames.

      To get a better understanding of how populated the TM4-TM6 pathway is with lipids across all simulated structures, we determined for every simulation frame how many headgroup beads resided in the groove. This analysis indicates that the ion-conductive state of TMEM16A (5OYB*, Fig. 1D) had on average only 1 lipid in the pathway, meaning that the configuration shown in Fig. 1D is indeed exceptional. As a reference, our strongest scrambler, nhTMEM16 4WIS, had an average of 2.8 lipids in the groove. We added a table containing the means and standard deviations that resulted from this analysis as Figure 1-Table supplement 1.
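      The per-frame groove-occupancy count described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: it assumes headgroup bead coordinates have already been extracted per frame, counts one headgroup bead per lipid (so bead counts approximate lipid counts), and approximates the groove by a hypothetical cylinder parallel to the membrane normal. The function names, cylinder axis, radius and z-bounds are illustrative assumptions.

```python
import math
from statistics import mean, pstdev


def beads_in_groove(frame, axis_xy, radius, z_min, z_max):
    """Count headgroup beads inside a cylinder approximating the groove.

    frame: list of (x, y, z) headgroup-bead coordinates for one frame.
    axis_xy: (x, y) of a hypothetical groove axis, assumed parallel to z.
    """
    cx, cy = axis_xy
    count = 0
    for x, y, z in frame:
        # A bead is "in the groove" if it lies within the slab and the radius.
        if z_min <= z <= z_max and math.hypot(x - cx, y - cy) <= radius:
            count += 1
    return count


def groove_occupancy(frames, axis_xy, radius, z_min, z_max):
    """Return the mean and population standard deviation of per-frame counts."""
    counts = [beads_in_groove(f, axis_xy, radius, z_min, z_max) for f in frames]
    return mean(counts), pstdev(counts)
```

      For example, three toy frames containing 2, 1 and 0 beads inside the cylinder give a mean occupancy of 1 bead per frame; all coordinates below are made up for illustration.

```python
frames = [
    [(0, 0, 0), (1, 0, 5), (10, 10, 0)],  # two beads inside the cylinder
    [(0, 0, 20), (2, 0, 0)],              # one bead inside (the other is above z_max)
    [],                                   # empty groove
]
print(groove_occupancy(frames, axis_xy=(0, 0), radius=3, z_min=-10, z_max=10))
```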

      (5) Lines 295-298: Because the scrambling rates of the Ca²⁺-bound and Ca²⁺-free structures fall within overlapping error margins, it is difficult to definitively state that Ca²⁺ binding significantly enhances scrambling activity. This undermines the claim that the Ca²⁺-bound structure is the strongest scrambler. The authors should conduct statistical analyses to determine whether the difference between the two conditions is statistically significant.

      Contrary to the reviewer’s reading, we do not claim that Ca2+ binding itself enhances lipid scrambling. Instead, what we show is that WT structures that were solved in an open conformation (all of which are Ca2+-bound, except 6QM6) are robust scramblers. For nhTMEM16, we did not observe any scrambling events for the closed-groove proteins, making further statistical analysis unnecessary.

      (6) The authors claim that the scrambling rates derived from their MD simulations are in "excellent agreement" with experimental findings (lines 294-295), despite a significant discrepancy between simulated and experimentally measured rates. For example, the simulated rate of 24.4 ± 5.2 events/µs for the open, Ca²⁺-bound fungal nhTMEM16 (PDB ID 4WIS) corresponds to approximately 24 million events per second, which is vastly higher than experimental rates. Experimental studies have reported scrambling rate constants of ~0.003 s⁻¹ for TMEM16 family members in the absence of Ca²⁺, measured under physiological conditions (https://doi.org/10.1038/s41467-019-11753-1 ). Even with Ca²⁺ activation, scrambling rates remain several orders of magnitude lower than the rates observed in simulations. Moreover, this highlights a larger problem: lipid scrambling occurs over timescales that are not captured by these simulations. While the authors allude to these discrepancies (lines 605-606), they should be emphasised in the main text rather than in the table caption. These discrepancies should also be traced back to differences between the membrane compositions used in different studies.

      We agree with the spirit of the reviewer’s comment, and because of that, we were very careful not to claim that we reproduce experimental scrambling rates, just that the trends (scrambling-competent, or not) are correct. On lines 294-295, we actually said that the scrambling rates in our simulations excellently agree with “the presumed scrambling competence of each experimental structure”, which is true. 

      As explained extensively in the discussion section of our paper (and by many others), direct comparison between MD (e.g., Martini 3, but also atomistic force fields) dynamics and experimental measurements is challenging. The primary goal of our paper is to quantify and compare the scrambling capacity of different TMEM16 family members and different states, within a CGMD context.

      That said, we agree with the reviewer that we may have missed rare or long-timescale events (as is the case in any MD experiment) and added this point to the discussion.

      (7) To address these discrepancies, the authors should: i) emphasize that simulated rates serve as qualitative indicators of scrambling competence rather than absolute values comparable to experimental findings and ii) discuss potential reasons for the divergence, such as simulation timescale limitations or lipid bilayer compositions that may favor scrambling and force field inaccuracies.

      Please see our answer to question 6. Within the context of our CGMD survey, we confidently call our results quantitative. However, we agree with the reviewer that comparison with experimental scrambling rates is qualitative and should be interpreted with caution. To reflect this, we rewrote the first sentence of the relevant paragraph in the discussion section.

      (8) Line 310: Can the authors provide a rationale as to why one monomer has a wider groove than the other? Perhaps a contact analysis could be useful. See the comment above about ENM.

      The simulation of Ca2+-bound TMEM16K was initiated from an asymmetric X-ray structure in which chain B features a more dilated groove than chain A (PDB 5OC9). The backbones of TM4 and TM6 in the closed groove (A) are close enough together to be directly interconnected by the elastic network. In contrast, TM4 and TM6 in the more dilated subunit (B) are not restricted by the elastic network and, as a consequence, display some “breathing” behavior (Fig. 3B and Fig. 3-Suppl. 6A), giving rise to a ~4x higher scrambling rate. We explicitly added the word “cryo-EM” and the PDB ID to the sentence to emphasize that the asymmetry stems from the original experimental structure.

      While answering this question, we also corrected a mislabeled chain identifier in Fig. 2-Suppl. 3A, which read ‘chain A’ in the original manuscript but should be ‘chain B’.

      (9) Line 312: Authors speculate that increased groove width likely accounts for increased scrambling rates. For statistical significance, authors should attempt to correlate scrambling rates and groove width over the simulation period.

      The Reviewer is referring to our description of the scrambling rates we measured for TMEM16K, where we noted that the groove with the higher scrambling rate is, on average, wider than that of the opposite subunit, whose average width is below 6 Å. We do not suggest that the relationship between scrambling and groove width is continuous, as the Reviewer may have interpreted from our original submission; rather, we think it is a binary outcome – lipids cannot easily enter narrow grooves (< 6 Å), and hence scrambling can only occur once this threshold is reached, at which point it occurs at a near-constant rate. We showed this for 4 different family members in the original Fig. 3B, where scrambling events (black dots) were much more likely during, or right after, groove dilation to distances > 6 Å.

      (10) Line 359: Authors have plotted the minimum distance between residues on TM4 and TM6 in Fig. 3A/B, claiming that a wide groove is required for scrambling. Upon closer examination, it is clear that several of these distributions overlap, reducing the statistical significance of these claims. Statistical tests (i.e. KS-tests) should be performed to determine whether the differences in distributions are significant.

      The Reviewer appears to be asking for a statistical test between the six distance distributions represented by the data in Fig. 3A for the scrambling competent structures (6QP6*, 8B8J, 6QM6, 7RXG, 4WIS, 5OC9), and we think this is being asked because it is believed that we are making a claim that the greater the distance, the greater the scrambling rate. If we have interpreted this comment correctly, we are not making this claim. Rather, we are simply stating that we only observe robust scrambling when the groove width regularly separates beyond 6 Å. The full distance distributions can now be found in Figure 3-figure supplement 6B, and we agree there is significant overlap between some of these distributions. However, the distinguishing characteristic of the 6 distributions from scrambling competent proteins is that they all access large distances, while the others do not. Notably, TMEM16F proteins (6QP6*, 8B8J) are below the 6 Å threshold on average, but they have wide standard deviations and spend well over ¼ of their time in the permissive regime (the upper error bar in the whisker plots in Fig. 3A is the 75% boundary).
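      The threshold argument above can be illustrated with a short sketch: a distance series whose mean lies below 6 Å can still spend more than a quarter of its frames in the permissive regime, which is exactly what an upper quartile above 6 Å indicates. This is a minimal illustration using made-up distances, not data from the simulations; the helper names are hypothetical, and Python's `statistics.quantiles` (default "exclusive" method) stands in for whatever percentile convention the plotting software used.

```python
from statistics import mean, quantiles

THRESHOLD_A = 6.0  # groove-width threshold (Å) discussed in the text


def permissive_fraction(distances, threshold=THRESHOLD_A):
    """Fraction of frames whose minimum TM4-TM6 distance exceeds the threshold."""
    return sum(d > threshold for d in distances) / len(distances)


def upper_quartile(distances):
    """75th percentile of the distance series (upper box edge in a box plot)."""
    return quantiles(distances, n=4)[2]


# Hypothetical per-frame minimum TM4-TM6 distances (Å) for one subunit.
distances = [4.0, 5.0, 5.0, 5.0, 6.5, 7.0, 8.0, 4.0]
print(mean(distances))                 # below the 6 Å threshold on average
print(upper_quartile(distances))       # yet the upper quartile exceeds 6 Å
print(permissive_fraction(distances))  # so over 1/4 of frames are permissive
```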

      (11) Line 363-364: The authors state that all TMEM16 structures thin the membrane. Could the authors include a description of how membrane thinning is calculated, for instance, is the entire membrane considered, or is thinning calculated on a membrane patch close to the protein? Do membrane patches closer to the transmembrane protein increase or decrease thickness due to hydrophobic packing interactions? The latter question is of particular concern since Martini3 has been shown to induce local thinning of the membrane close to transmembrane helices, yielding thicknesses 2-3 Å thinner than those reported experimentally (https://doi.org/10.1016/j.cplett.2023.140436). This could be an important consideration in the authors' comparison to the bulk membrane thickness (line 364). Finally, how is the 'bulk membrane thickness' measured (i.e., from the CG simulations, from AA simulations, or from experiments)?

      Regarding the calculation of thinning and bulk membrane thickness, as described in Method “Quantification of membrane deformations”, the minimal membrane thickness, or thinning, is defined as the shortest distance between any two points from the interpolated upper and lower leaflet surfaces constructed using the glycerol beads (GL1 and GL2). Bulk membrane thickness is calculated by taking the vertical distance between the averaged glycerol surfaces at the membrane edge.
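      The two thickness definitions above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the glycerol-bead (GL1/GL2) positions have already been interpolated and time-averaged onto a common square xy grid for each leaflet, stored as hypothetical height maps `upper[i][j]` and `lower[i][j]`, and it treats the outermost ring of grid cells as the "membrane edge". The brute-force pairwise search is chosen for clarity rather than speed.

```python
import math


def surface_points(z_grid, spacing):
    """Convert a height map z_grid[i][j] on a square xy grid to 3D points."""
    return [(i * spacing, j * spacing, z)
            for i, row in enumerate(z_grid)
            for j, z in enumerate(row)]


def minimal_thickness(upper, lower, spacing):
    """Shortest distance between any upper-leaflet and any lower-leaflet point."""
    up = surface_points(upper, spacing)
    lo = surface_points(lower, spacing)
    return min(math.dist(p, q) for p in up for q in lo)


def bulk_thickness(upper, lower):
    """Mean vertical separation over the outermost ring of grid cells (edge)."""
    n_rows, n_cols = len(upper), len(upper[0])
    gaps = [upper[i][j] - lower[i][j]
            for i in range(n_rows) for j in range(n_cols)
            if i in (0, n_rows - 1) or j in (0, n_cols - 1)]
    return sum(gaps) / len(gaps)
```

      For a toy 3 × 3 grid (1 Å spacing) in which both leaflets are pinched toward each other at the central cell, the minimal thickness picks up the local thinning while the bulk value reflects the flat edge; the numbers are invented for illustration.

```python
upper = [[20, 20, 20], [20, 15, 20], [20, 20, 20]]
lower = [[-20, -20, -20], [-20, -14, -20], [-20, -20, -20]]
print(minimal_thickness(upper, lower, spacing=1.0))  # 29.0
print(bulk_thickness(upper, lower))                  # 40.0
```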

      The concern about localized membrane deformation due to force-field artifacts is well founded. However, the sinusoidal deformations shown here are much larger than the 2-3 Å Martini 3 imperfections, and they extend for up to 10 Å radially away from the protein into the bulk membrane (see Figure 3-figure supplements 1-5 for a more detailed description). Most importantly, the sinusoidal wave patterns set up by the proteins are very similar to those described in the previous continuum calculations and all-atom MD for nhTMEM16 (https://www.pnas.org/doi/full/10.1073/pnas.1607574113).

      (12) Line 374: The authors state a 'positive correlation' between membrane thinning/groove opening and scrambling rates. To support this claim, the authors should report the correlation coefficients.

      We have removed any discussion concerning correlations between the magnitude of the scrambling rate and the degree of membrane thinning/groove opening. Rather we simply state that opening beyond a threshold distance is required for robust scrambling, as shown in our analysis in Fig. 3A.

      Concerning the relation between thinning and scrambling: Instantaneous membrane thinning is poorly defined (because it is governed by fluctuations of single lipids), and therefore difficult to correlate with the timing of individual scrambling events in a meaningful way.  Moreover, as we state later in that same section, “we argue that the extremely thin membranes are likely correlated with groove opening, rather than being an independent contributing factor to lipid scrambling”.

      (13) Line 396: It is stated that TMEM16A is not a scramblase, but the simulated scrambling activity is not zero. How can you be sure that you are monitoring the correct collective variable if you are getting a false positive with respect to experiments?

      We only observe 2 scrambling events in 10 ms, which is a very small rate compared to the scrambling-competent states. In a previous large survey of Martini CG simulations that inspired our protocol (Li et al., PNAS 2024), the authors employed a 1 event/ms cut-off to distinguish scramblers from non-scramblers. Hence, they would have classified TMEM16A as a non-scrambler as well. We expect that false positives in this context might be an artifact of the CG force field, or it could be that TMEM16A can scramble, but too slowly to be detected experimentally. Regarding the collective variable for lipid flipping: it is correct, and we know that these lipids actually flipped.

      (14) Line 402: Distance distributions for the electrostatic interactions between E633 and K645 should be included in the manuscript. This is also the case for the interactions between E843-K850 (lines 491-492).

      Our description of interactions between lipid headgroups and E633 and K645 in TMEM16A (5OYB*) is based on qualitative observations of the MD trajectory, and we highlight an example of this interaction in Figure 3-video 4. The video clearly shows that the lipid headgroups in the center of the groove orient themselves such that the phosphate bead (red) rests just above K645 (blue), and at other times the choline bead (blue) rests just below E633 (red). We do not think an additional plot with the distance distributions between lipids and these residues would add to our understanding of how lipids interact with residues in the TMEM16A pore.

      We made a similar qualitative observation for the interaction between the POPC choline and E843 and between the POPC phosphate and K850 while watching the AAMD simulation trajectory of TMEM16F (PDB ID 6QP6). Given that this was a single observation, and that the same interactions do not appear in the CG simulation of the same structure (see simulation snapshots in Figure 4-figure supplement 5), we do not think additional analysis would add significantly to our understanding of which residues may stabilize lipids at the dimer interface.

      (15) Lines 450-451: 'As the groove opens, water is exposed to the membrane core and lipid headgroups insert themselves into the water-filled groove to bridge the leaflets.' Is this a qualitative observation? Could the authors report the correlation between groove dilation and the number of water permeation events?

      Yes, this is qualitative, and it sketches the order of events during scrambling, and we revised the main text starting at line 450 to indicate this. As illustrated by the density isosurfaces in Appendix 1-Figure 2A, the amount of water found in the closed versus open grooves is striking – there is a significant flood of water that connects the upper and lower solutions upon groove opening. Moreover, Appendix 1-Figure 2B shows much greater water permeation for open structures (4WIS, 7RXG, 5OC9, 8B8J, …) compared to closed structures (6QMB, 6QMA, 8B8Q, and many of the non-labeled data in the figure that all have closed grooves and near 0 water permeation). A notable exception is TMEM16A (7ZK3*8), which has water permeation but a closed groove and little-to-no lipid scrambling.

      Minor Comments:

      (1) Inconsistent use of '10' and 'ten' throughout.

      We would like to kindly point out that we did not find any examples of inconsistent use.

      (2) Line 32: 'TM6 along with 3, 4 and 5...' should be 'TM6 along with TM3, TM4 and TM5...'. Same in line 142. Naming should stay consistent.

      Changes are reflected in the updated manuscript.

      (3) Line 141: do you mean traverse (i.e. to travel across)? Or transverse (i.e. to extend across the membrane)?

      This is a typo. We meant “traverse”. Thanks for pointing it out.

      (4) Line 142: 'greasy' should be 'strongly hydrophobic'.

      Changes are reflected in the updated manuscript.

      (5) Line 143-144: "credit card mechanism" requires quotation marks.

      Changes are reflected in the updated manuscript.

      (6) Line 144: state if Nectria haematococca is mammalian or fungal, this is not obvious for all readers.

      Changes are reflected in the updated manuscript.

      (7) Line 147-148: Is TMEM16A/TMEM16K fungal or mammalian? What was the residue before the mutation and which residue is mutated? Perhaps the nomenclature should read as TMEM16X10Y where X=the residue prior to the mutation, 10 is a placeholder for the residue number that is mutated and Y=the new residue following mutation.

      “TMEM16” is the protein family. “A” denotes the specific homolog rather than a residue.

      (8) Lines 157-158: same as 10, it is unclear if these are fungal or mammalian.

      Clarifications added.

      (9) Line 184: "...CGMD simulation" should be "...CGMD simulations".

      Changes made.

      (10) Line 191-192: It would help to create a table of all of the mutants (including if they are mammalian or fungal) summarizing the salt concentrations, lipid and detergent environments, the presence of modulators/activators, etc.

      We added this information to Appendix 1-Table 1 in the supplemental information. We did not specify NaCl concentrations, because all experimental procedures used standard physiological values (100-150 mM).

      (11) Line 210: inconsistencies with 'CG' and 'coarse-grain'.

      Changes made.

      (12) Figure 1 caption: '...totaling ~2μs (B)...' is missing the full stop after 2μs.

      Changes made.

      (13) Figure 1B: it may be useful to label where the Ca2+ ion binds or include a schematic.

      We updated Fig. 1A to illustrate where Ca2+ binds.

      (14) Line 311: Are these mean distances? The authors should add standard deviations.

      Yes, they are. We added the standard deviations to the text.

      (15) Line 321-322: Perhaps a schematic in Figure 2 would be useful to visualize the structural features described here.

      We would kindly refer interested readers to reference [60].

      (16) Line 377: '...are likely a correlate of groove opening...' should read as: '...are likely correlated to groove opening...'.

      Thank you for pointing it out. Changes made.

      (17) Line 398: the '...empirically determined 6Å threshold for scrambling.' Was this determined from the simulations or from experiments? What does "empirically" mean here? Please state this.

      This value was determined from the simulations. Based on our analysis of the correlation between scrambling rate and groove dilation, we found that the minimal TM4/6 distance of 6 Å can distinguish between the high and low activity scramblers. The exact numerical value is somewhat arbitrary as there is a range of values around 6 Å that serve to distinguish scramblers from non-scramblers.

      (18) Figure 4: This figure should be labelled as A, B, C and D, with the figure caption updated accordingly.

      We updated Figure 4 and its caption.

      Reviewer #3 (Recommendations for Authors):

      The authors must do additional simulations to further validate their claim with different lipids and further substantiate dimer interface independent of Ca2+ ions.

      Thank you for the suggestion. We completely agree that studying scrambling in the context of a diverse lipid environment is an exciting area to explore. We are indeed actively working on a closely related project. We decided not to include that study because we think the additional discussion involved would be excessive for the current manuscript. We, however, look forward to publishing our findings in a separate manuscript in the near future. In terms of Ca2+-independent scrambling, we are planning mutagenesis studies with our experimental collaborators that target the residues we identified along the dimer interface.

      Since calcium ions are critical for the stability of these structures, authors should show that they were placed throughout the simulations consistently.

      As stated in the Methods section “Coarse-grained system preparation and simulation detail”, all Ca2+ ions are manually placed into the coarse-grained structure at the start of the simulation, at the same positions they occupy in the experimental structure, and are harmonically bonded to adjacent acidic residues for the entire duration of the simulation. We have also added a label to Fig. 1A to indicate where the two Ca2+ ions are located.

      The comparison with experimental structures should be consistent with complete simulation, and not the last structure of the trajectory. Depending on the conformational variability, this might be misleading.

      We agree and updated Figure 1-figure supplement 1 accordingly. The overall agreement between membrane shapes in CGMD and cryo-EM was not affected by this change.

    1. Author response:

      Reviewer #1 (Public Review):

      In this manuscript, Tran et al. investigate the interaction between BICC1 and ADPKD genes in renal cystogenesis. Using biochemical approaches, they reveal a physical association between Bicc1 and PC1 or PC2 and identify the motifs in each protein required for binding. Through genetic analyses, they demonstrate that Bicc1 inactivation synergizes with Pkd1 or Pkd2 inactivation to exacerbate PKD-associated phenotypes in Xenopus embryos and potentially in mouse models. Furthermore, by analyzing a large cohort of PKD patients, the authors identify compound BICC1 variants alongside PKD1 or PKD2 variants in trans, as well as homozygous BICC1 variants in patients with early-onset and severe disease presentation. They also show that these BICC1 variants repress PC2 expression in cultured cells.

      Overall, the concept that BICC1 variants modify PKD severity is plausible, the data are robust, and the conclusions are largely supported. However, several aspects of the study require clarification and discussion:

      (1) The authors devote significant effort to characterizing the physical interaction between Bicc1 and Pkd2. However, the study does not examine or discuss how this interaction relates to Bicc1's well-established role in posttranscriptional regulation of Pkd2 mRNA stability and translation efficiency.

      The reviewer is correct that the present study has not addressed the downstream consequences of this interaction, considering that Bicc1 is a posttranscriptional regulator of Pkd2 (and potentially Pkd1). We think that the Bicc1/Pkd1/Pkd2 complex retains Bicc1 in the cytoplasm and thus restricts its participation in posttranscriptional regulation. As we do not yet have experimental data to support this model, we have not included it in the manuscript. Yet, we will update the discussion of the manuscript to further elaborate on the potential mechanism of the Bicc1/Pkd1/Pkd2 complex.

      (2) Bicc1 inactivation appears to downregulate Pkd1 expression, yet it remains unclear whether Bicc1 regulates Pkd1 through direct interaction or by antagonizing miR-17, as observed in Pkd2 regulation. This should be further examined or discussed.

      This is a very interesting comment. The group of Vishal Patel published that PKD1 is regulated by a miR-17 binding site in its 3’UTR (PMID: 35965273). We, however, have not evaluated whether BICC1 participates in this regulation. A definitive answer would require us to utilize some of the mice described in the above reference, which is beyond the scope of this manuscript. We will, however, revise the discussion to elaborate on this potential mechanism.

      (3) The evidence supporting Bicc1 and ADPKD gene cooperativity, particularly with Pkd1, in mouse models is not entirely convincing, likely due to substantial variability and the aggressive nature of Bpk/Bpk mice. Increasing the number of animals or using a milder Bicc1 strain, such as jcpk heterozygotes, could help substantiate the genetic interaction.

      We initially performed the analysis using our complete Bicc1 knockout, which we previously reported (PMID 20215348), focusing on compound heterozygotes. Yet, as with the Pkd1/Pkd2 compound heterozygotes (PMID 12140187), no cyst development was observed by P21, when we sacrificed the mice. Our strain is similar to the above-mentioned jcpk, which is characterized by a short, abnormal transcript thought to result in a null allele (PMID: 12682776). We thank the reviewer for pointing us to the reference showing that heterozygous mice develop glomerular cysts as adults (PMID: 7723240). This is an interesting idea that we will investigate. In general, we agree with the reviewer that a better understanding of the contribution of Bicc1 to the adult PKD phenotype will be critical. To this end, we are currently generating a floxed allele of Bicc1 that will allow us to address the cooperativity in the adult kidney, e.g. when crossed to the Pkd1<sup>RC/RC</sup> mice. Yet, these experiments are unfortunately beyond the scope of this manuscript.

      Reviewer #2 (Public Review):

      Tran and colleagues report evidence supporting the expected yet undemonstrated interaction between the Pkd1 and Pkd2 gene products Pc1 and Pc2 and the Bicc1 protein in vitro, in mice, and collaterally, in Xenopus and HEK293T cells. The authors go on to convincingly identify two large and non-overlapping regions of the Bicc1 protein important for each interaction and to perform gene dosage experiments in mice that suggest that Bicc1 loss of function may compound with Pkd1 and Pkd2 decreased function, resulting in PKD-like renal phenotypes of different severity. These results led to examining a cohort of very early onset PKD patients to find three instances of co-existing mutations in PKD1 (or PKD2) and BICC1. Finally, preliminary transcriptomics of edited lines gave variable and subtle differences that align with the theme that Bicc1 may contribute to the PKD defects, yet are mechanistically inconclusive.

      These results are potentially interesting, despite the limitation, also recognized by the authors, that BICC1 mutations seem exceedingly rare in PKD patients and may not "significantly contribute to the mutational load in ADPKD or ARPKD". The manuscript has several intrinsic limitations that must be addressed.

      As mentioned above, the study was designed to explore whether there is an interaction between BICC1 and PKD1/PKD2 and whether this interaction is functionally important. How this translates into clinical relevance will require additional studies (and we have addressed this in the discussion of the manuscript).

      The manuscript contains factual errors, imprecisions, and language ambiguities. This has the effect of making this reviewer wonder how thorough the research reported and analyses have been.

      We respectfully disagree with the reviewer on the latter interpretation. The study was performed with rigor. We have carefully assessed the critiques raised by the reviewer. Most of the criticisms raised by the reviewer will be easily addressed in the revised version of the manuscript. Yet, none of the critiques raised by the reviewer seems to directly impact the overall interpretation of the data.

      Reviewer #3 (Public Review):

      Summary:

      This study investigates the role of BICC1 in the regulation of PKD1 and PKD2 and its impact on cystogenesis in ADPKD. By utilizing co-IP and functional assays, the authors demonstrate physical, functional, and regulatory interactions between these three proteins.

      Strengths:

      (1) The scientific principles and methodology adopted in this study are excellent, logical, and reveal important insights into the molecular basis of cystogenesis.

      (2) The functional studies in animal models provide tantalizing data that may lead to a further understanding and may consequently lead to the ultimate goal of finding a molecular therapy for this incurable condition.

      (3) In describing the patients from the Arab cohort, the authors have provided excellent human data for further investigation in large ADPKD cohorts. Even though there was no patient material available, such as HUREC, the authors have studied the effects of BICC1 mutations and demonstrated its functional importance in a Xenopus model.

      Weaknesses:

      This is a well-conducted study and could have been even more impactful if primary patient material was available to the authors. A further study in HUREC cells investigating the critical regulatory role of BICC1 and potential interaction with mir-17 may yet lead to a modifiable therapeutic target.

      This is an excellent suggestion. We agree with the reviewer that it would have been interesting to analyze HUREC material from the affected patients. Unfortunately, apart from DNA and the phenotypic analysis described in the manuscript, neither human tissue nor primary patient-derived cells were collected before the two patients with the BICC1 p.Ser240Pro mutation passed away. To address this missing link, we have – as a first pass - generated HEK293T cells carrying the BICC1 p.Ser240Pro variant. While these admittedly are not kidney epithelial cells, they indeed show a reduced level of PC2 expression. These data are shown in the manuscript. We have not yet addressed how this relates to its crosstalk with miR-17.

      Conclusion:

      The authors achieve their aims. The results reliably demonstrate the physical and functional interaction between BICC1 and PKD1/PKD2 genes and their products.

      The impact is hopefully going to be manifold:

      (1) Progressing the understanding of the regulation of the expression of PKD1/PKD2 genes.

      (2) The role of BICC1 in the miR/PKD1/2 complex should be the next step in the quest for a modifiable therapeutic target.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity

      In their manuscript, de las Mercedes Carro et al. investigated the role of Ago proteins during spermatogenesis by producing a triple knockout of Ago1, Ago3 and Ago4. They first describe the pattern of expression of each protein and of Ago2 during the differentiation of male germ cells; they then describe the spermatogenesis phenotype of triple knockout males, study gene deregulation by scRNA seq and identify novel interacting proteins by co-IP mass spectrometry, in particular BRG1/SMARCA4, a chromatin remodeling factor, and ATF2, a transcription factor. The main message is that Ago3 and Ago4 are involved in the regulation of XY gene silencing during meiosis, and also in the control of autosomal gene expression during meiosis. Overall, the manuscript is well written, the topic is very interesting and the experiments are well executed. However, some parts of the methodology and data interpretation are unclear (see below).

      Major comments

      1= Please clarify how the triple KO was obtained, and whether it is constitutive or specific to the male germline. In the results section a Cre (which Cre?) is mentioned, but it is not mentioned in the M&M. In Figure S1, a MICER vector is shown instead of a deletion, but nothing is explained in the text or legend. Could the authors provide more details in the results section as well as in the M&M? This is essential to fully interpret the results obtained for this KO line and to compare its phenotype to other lines (e.g., lines 184-9, the comparison of the triple KO phenotype with that of the Ago4 KO). Also, if it is a constitutive KO, the authors should mention whether they observed other phenotypes in triple KO mice, since AGO proteins are not only expressed in the male germline.

      Response: We apologize for omitting this vital information. We have now incorporated a more detailed description of how the Ago413 mutant was created in the results and M&M sections (line 120 and 686 respectively).

      As mentioned in the manuscript, Ago4, Ago1 and Ago3 are widely expressed in mammalian somatic tissues. Mutations or deletions of these genes do not disrupt development; however, there is limited research on the impact of these mutations in mammalian models in vivo. In humans, mutations in the Ago1 and Ago3 genes are associated with neurological disorders, autism and intellectual disability (Tokita, M.J., et al. 2015- doi: 10.1038/ejhg.2014.202., Sakaguchi et al. 2019- doi: 10.1016/j.ejmg.2018.09.004, Schalk et al 2021- doi: 10.1136/jmedgenet-2021-107751). In mouse, simultaneous global deletion of Ago1 and Ago3 was shown to increase susceptibility to influenza virus through impaired inflammatory responses (Van Stry et al 2012- doi.org/10.1128/jvi.05303-11). Studies performed in female Ago413 mutants (the same mutant line used herein) have shown that knockout mice present postnatal growth retardation with elevated circulating leukocytes (Guidi et al 2023- doi: 10.1016/j.celrep.2023.113515). Other studies of a double conditional knockout of Ago1 and Ago3 in the skin associated the loss of these Argonautes with decreased offspring weight and severe skin morphogenesis defects (Wang et al 2012- doi: 10.1101/gad.182758.111). In our study, we did not observe major somatic or overt behavioral phenotypes, and we did not observe statistical differences in the body weights of null males compared to WT, as shown in the figure below.

      2= The paragraph corresponding to G2/M analysis is unclear to me. Why was this analysis performed? What does the heatmap show in Figure S4? What is G2/M score? (Fig 2D). Lines 219-220, do the authors mean that Pachytene cells are in a cell phase equivalent to G2/M? All this paragraph and associated figures require more explanation to clarify the method and interpretation.

      Response: We have modified the methods to include more information about how the cell cycle scores used in Figures 2D and S4 were calculated, and we will add more information regarding the interpretation of these figures.

      3= I have concerns regarding Fig 2G: to be convincing, the analysis needs to be performed on several replicates, and it is essential to compare tubules of the same stage, which does not appear to be the case. In addition, co-immunofluorescence staining with markers of different cell types should be shown to demonstrate the earlier expression of some markers and their colocalization with markers of the earlier stages.

      Response: We agree with the Reviewer. New images with staged tubules will be added to the analysis of Figure 2G.

      4= One important question that I think the authors should discuss regarding their scRNAseq: clusters are defined using well-characterized markers, but the Ago triple KO appears to alter the timing of gene expression. Could this deregulation affect the interpretation of the scRNAseq clusters and results?

      Response: We thank the reviewer for this suggestion and agree that including this information is important. We expect that, at most, this dysregulation slightly affects the edges of these clusters. Given that the marker genes used to define cell types in these data are consistently expressed between knockout and wild-type mice (see Figure S4A), we do not think that the cells in these clusters have different identities, just dysregulated expression programs. We have added the relevant sentence to the discussion and will include additional supplemental figure panels to document this point more comprehensively.

      5= XY gene deregulation is mentioned throughout the results section, but only X chromosome genes seem to have been investigated. Even though the gene content of the Y is highly repetitive, it would be very interesting to show the level of expression of Y single-copy and Y multicopy genes in a Figure 3 panel.

      Response: We agree with the reviewer that including analysis of Y-linked genes is important. We will add a supplemental figure that includes the Y:autosome ratio and differential expression analysis.

      6= Can the authors elaborate on the observation that X gene upregulation is visible in the KO before MSCI, that is, in the lepto/zygotene clusters (and in spermatogonia, if the difference visible in 3A is significant)?

      Response: We do see that X gene expression is upregulated before pachynema. Previous scRNA-seq studies of MSCI have found that silencing of genes on the X and Y chromosomes starts before the cell clusters defined as pachynema, although silencing is not complete until pachynema. We have clarified this point in the manuscript.

      7 = miRNA analysis: could the authors indicate whether X-encoded miRNAs were identified and found to be deregulated? Loss of Ago4 has been shown to lead to a downregulation of miRNAs, many of which are X-encoded. It is therefore puzzling that the triple KO does not recapitulate this observation. Were the analyses performed differently in the present study and in the Ago4 KO study?

      Response: The analysis identifying downregulation of miRNAs in the original Ago4 mutant study was conducted relative to total small RNA expression. Among the altered miRNA families in the Ago4 mutants, we demonstrated both upregulation and downregulation. We agree that confirming a similar global downregulation of miRNA counts relative to other small RNAs is important. Therefore, in the revised manuscript, we will add this information to the miRNA analysis section, highlighting in particular the X chromosome-associated miRNAs, as well as whether the ratios between other small RNA classes change.

      8 = The last results paragraph would also benefit from some additional information. It is not clear why the authors focused on enhancers and did not investigate promoters (or perhaps they did, but this is unclear). Which regions (size and location relative to the TSS) were investigated in the motif enrichment analyses? What do the "transcriptional regulatory regions previously identified using dREG" mentioned in the M&M correspond to? I understand this is based on a previous article, but more information in the present manuscript would be useful.

      Response: We thank the reviewer for this suggestion. The regions used for motif enrichment will be included as supplementary information in the fully revised manuscript. We have also clarified in the methods that these transcriptional regulatory regions were obtained from a previous analysis of ChRO-seq data downloaded from GEO. These data were run through the dREG pipeline, which identifies regions predicted to contain transcription start sites, including promoters and enhancers.

      Minor comments

      1) In the introduction: the sentence "Ago1 is not expressed in the germline from the spermatogonia stage onwards allowing us to use this model to study the roles of Ago4 and Ago3 in spermatogenesis." is misleading because Ago1 is expressed at least in spermatogonia; it would be more precise to write "after the spermatogonia stage" and rephrase the sentence. Otherwise, it is surprising to see AGO1 protein in the testis lysate, and the statement is not in line with the scRNAseq shown in Figure 2.

      Response: We agree with the Reviewer's suggestion and have edited the sentence on line 100. This sentence now reads "Ago1 is not expressed in the germline after the spermatogonia stage allowing us to use this model to study the roles of Ago4 and Ago3 in spermatogenesis".

      2) Could the authors specify whether AGO proteins are expressed in other tissues? In somatic testicular cells?

      Response: Expression patterns of mammalian AGOs in mouse somatic and testicular tissues have been described by qPCR by Gonzales-Gonzales et al (2008). They found that Ago2 is expressed in all the somatic tissues analyzed (brain, spleen, heart, muscle and lung) as well as in the testis, with the highest expression in brain and the lowest in heart. Ago1 is highly expressed in spleen compared to all other tissues analyzed, while Ago3 and Ago4 showed the highest expression in testis and brain. Within the somatic cells of the testis, all four Argonautes are expressed in Sertoli cells; however, Ago1, 3 and 4 expression is very low compared to Ago2, with the latter showing a 10-fold higher transcript level. We have included a sentence with this information in the introduction at line 89.

      3) Pattern of expression: How do the authors explain that AGO3 disappears at the diplotene stage and reappears in spermatids?

      Response: Single-cell RNAseq data from the germline show reduced Ago3 transcript levels from the pachytene stage onwards, suggesting minimal, if any, new transcription in round spermatids. We hypothesize that the AGO3 protein present at the round spermatid stage is cytoplasmic, presumably coming from the pool of AGO3 in the chromatoid body, a cytoplasmic structure with a functional association with the nucleus in round spermatids (Kotaja et al, 2003 doi: 10.1073/pnas.05093331).

      4) It would be useful to show the timing of expression of AGO 1 to 4 throughout spermatogenesis in the first paragraph of the article. Maybe the authors could present data from fig2B earlier?

      Response: We understand the Reviewer's concern; however, given that Ago expression throughout spermatogenesis was obtained from scRNA seq, we consider that these data should be presented after introducing the Ago413 knockout and the scRNA seq experiment. As Ago1-4 expression in the mouse male germline was also described in an earlier manuscript by Gonzales-Gonzales et al, and our data align with this report, we included a sentence about these previous findings earlier in the results section.

      5) Line 190: please modify the sentence "reveal no differences in cellular architecture of the seminiferous tubules when compared to wild-type males" to " reveal no gross differences..." since even without quantification of the different cell types it is visible that KO seminiferous tubules are different from WT tubules.

      Response: We agree with the reviewer and have modified line 190 (now 173) as suggested. Grossly, seminiferous tubules from Ago413 null males contain the same cell types as wild-type tubules, including spermatozoa. However, our studies show that the number and quality of germ cells are compromised in knockouts, as shown by sperm counts and TUNEL staining.

      6) TUNEL analysis: please stage the tubules to determine the stage(s) at which apoptosis is the most predominant.

      Response: We have followed the reviewer's suggestion. Figure 1G now shows staged seminiferous tubules, and we have replaced the wild-type image with one in which the tubule stages match those in the knockout image.

      7) Figure S4B does not show an increase of cells at Pachytene stage but at Lepto/zygotene stage (as well as an increase of spermatogonia). Please comment this discrepancy with results shown in Fig2.

      Response: Figures 2 and S4 show the distribution of cells across substages of spermatogenesis and prophase I measured with very different methods: a cytological approach using chromosome spreads versus a transcriptomic approach based on clustering of cells. We attribute the differences in cell-type distribution to differences in the sensitivity of each method in identifying each cell type. Moreover, our scRNA-seq data group the leptotene and zygotene stages together, while the cytological approach allows these two substages to be separated. Importantly, both results show that Ago413 spermatocytes progress more slowly from pachynema into diplonema and/or die after pachynema, as stated in line 194 of our manuscript.

      8) Fig 5H and 5I are not mentioned in the results section. Also, it would be useful to label them "all chromosomes" and "XY" so they can be easily distinguished.

      Response: We apologize for the omission and have now cited Figures 5H and 5I in the manuscript (line 453). We have added the suggested labels.

      9) Line 530 "data provide further evidence for a functional association between AGO-dependent small RNAs and heterochromatin formation, maintenance and/or silencing." Please rephrase, the present article does not really show that AGO nuclear role depends on small RNAs.

      Response: We agree with the reviewer that these data do not directly show a dependence on small RNAs. However, because the localization of AGO proteins to pericentric heterochromatin that we identified coincides with the localization of DICER shown previously by Yadav and collaborators (2020, doi: 10.1093/nar/gkaa460), we believe that our data further implicate small RNAs in the silencing of heterochromatin. Yadav et al. show that DICER localizes to pericentromeric heterochromatin and processes major satellite transcripts into small RNAs in mouse spermatocytes, and that cKO germ cells have reduced localization of SUV39H2 and H3K9me3 at pericentromeric heterochromatin. Given the colocalization of both the small RNA-producing machinery and the AGOs at pericentromeric heterochromatin, the AGOs may bind these small RNAs. The statement in line 530 refers to how our results provide evidence for the involvement of other RNAi machinery, which likely includes small RNAs, in the silencing of pericentromeric heterochromatin investigated by Yadav et al.

      To clarify this point, we have modified the text accordingly.

      10) Line 1256: replace "cite here " by appropriate reference

      Response: The reference was added to line 1256.

      11) Please use SMARCA4 instead of BRG1 name as it is its official name.

      Response: We have replaced BRG1 with SMARCA4 in the text and figures.

      Figures:

      Figure 1: Are the pictures shown for Ago3-tagged and floxed from the same stages? The leptotene stage in 1A looks like a zygotene, while some pachytene/diplotene stage pictures do not look alike.

      Response: New representative images have been added to Figure 1 to match the same substages across the figure.

      Figure 1D, please label the Y scale properly (testis weight related to body weight)

      Response: We have fixed this.

      FigS1: Please comment the presence of non-specific bands in the figure legend

      Response: We have added a sentence to the Figure S1 legend.

      Fig 2E and F, please indicate on the figure (in addition to its legend), what are the X and Y axes respectively to facilitate its reading.

      Response: X and Y axes are now labelled in Figure 2E and F.

      2F: please use a less ambiguous abbreviation for spermatocyte than Sp (which could denote spermatogonia, sperm, etc.), such as Scyte I (same comment for Fig 3C).

      Response: The abbreviation for spermatocyte was changed from Sp to Scyte I in Figures 2 and 3.

      Overall, for all figures showing GSEA analyses, could the authors explain what a High positive NES and a High negative NES mean in the results section?

      Response: Thank you for this suggestion. We have added this information where the GSEA score of the cell markers is initially introduced.

      Significance

      Ago proteins are known for their roles in post-transcriptional gene regulation via small RNA-mediated cleavage of mRNA, which takes place in the cytoplasm. Some Ago proteins have also been shown to localize to the nucleus, suggesting other, non-canonical roles. This is the case for Ago4, which has been shown to localize to the transcriptionally silenced sex chromosomes (the sex body) of the spermatocyte nucleus, where it contributes to their silencing (Modzelewski et al 2012). Interestingly, Ago4 knockout leads to Ago3 upregulation, including on the sex body, indicating that Ago3 and Ago4 are involved in the same nuclear process. In their manuscript, de las Mercedes Carro et al. investigate the consequences of the loss of both Ago3 and Ago4 in the male germline by producing a triple knockout of Ago1, 3 and 4 in the mouse. With this model, the authors describe the role of Ago3 and Ago4 during spermatogenesis and show that they are involved in sex chromosome gene repression in spermatocytes and round spermatids, as well as in the control of autosomal meiotic gene expression. Triple KO males have impaired meiosis and spermiogenesis, with fewer and abnormal spermatozoa, resulting in reduced fertility. Since Ago1 male germline expression is restricted to pre-meiotic germ cells, it is not expected to contribute to the meiotic and post-meiotic phenotypes observed in the triple KO. The strengths of the study are i) the thorough analyses of mRNA expression at the single-cell level and in purified spermatocytes and spermatids (bulk RNAseq), and ii) the identification of novel nuclear partners of AGO3/4 relevant to their described nuclear role: ATF2, which they show also colocalizes with the sex body, and BRG1/SMARCA4, a SWI/SNF chromatin remodeler. The main limitation of the study is the lack of information in the methods regarding the production of the triple KO, as well as some aspects of the transcriptome and motif analyses. It is also surprising that the triple KO does not recapitulate the miRNA deregulation observed in the Ago4 KO. The characterization of a non-canonical role of AGO3/4 in male germ cells will certainly influence researchers in the field and will also interest a broader audience studying Argonaute proteins and gene regulation at the transcriptional and post-transcriptional levels.

      Reviewer #2

      Evidence, reproducibility and clarity

      In the manuscript titled "Argonaute proteins regulate the timing of the spermatogenic transcriptional program" by Carro et al., the authors present their findings on how Argonaute proteins regulate spermatogenic development. They utilize a mouse model featuring a deletion of the gene cluster on chromosome 4 that contains Ago1, Ago3, and Ago4 to investigate the cumulative roles of AGO3 and AGO4 in spermatogenic cells. The authors characterize the distribution of AGO proteins and their effects on key meiotic milestones such as synapsis, recombination, meiotic transcriptional regulation, and meiotic sex chromosome inactivation (MSCI). They analyze stage-specific transcriptomes in spermatogenic cells using single-cell and bulk RNA sequencing and determine the interactome of AGO3 and AGO4 through mass spectrometry to examine how AGO proteins may regulate gene expression in these cells during meiotic and post-meiotic development. The authors conclude that both AGO3 and AGO4 are essential for regulating the overall gene expression program in spermatogenic cells and specifically modulate MSCI to repress sex-linked genes in pachytene spermatocytes, which may be partially mediated by the proper distribution of DNA damage repair factors. Additionally, AGO3 is suggested to interact with the chromatin remodeler SWI/SNF factor BRG1, facilitating its removal from the sex-chromatin to enable the repression of sex-linked genes during MSCI.

      Major Comments: 1. The study utilized a triple knockout mouse model to determine the effect of AGO3 on spermatogenesis, following up on their previous report about the role of AGO4 in spermatogenesis, which resulted from an upregulation of AGO3 in Ago4-/- spermatocytes. However, the results are more difficult to interpret and ascertain the role of AGO3 in these cells, given the absence of any observable phenotype from Ago3 interruption. AGO4 regulates sex body formation, meiotic sex chromosome inactivation (MSCI), and miRNA production in spermatocytes, all of which were noted in the absence of both AGO3 and AGO4, with only an increased incidence of cells containing abnormal RNAPII at the sex chromosomes. It will be necessary to characterize how AGO3 regulates spermatogenic development, including meiotic progression and the regulation of the meiotic transcriptome, and compare these findings with the current observations to determine if the proposed mechanism involving AGO3, BRG1, and possibly AP2 is relevant in this context.

      Response: While we agree with the Reviewer that a single Ago3 knockout would help clarify the distinct roles of AGO3 and AGO4 in spermatogenesis, the time and resources required to generate a new mouse model are substantial. The analysis included in this current manuscript has already taken over seven years, and with the lengthy production of a new single mutant mouse, validation of the new mouse, and then final analysis, we would be looking at another 3-5 years of work. In the current funding climate, and with strong concerns over reducing the use of laboratory mice, we consider this request to be far in excess of what is required to move this important story forward.

      The Ago413-/- mouse model has allowed us to associate a nuclear role of Argonaute proteins with a strong reproductive phenotype in the mouse germline. Given the redundancy between Ago3 and Ago4, it is likely that a single Ago3 knockout would have a mild phenotype just like the Ago4 KO. All this said, we agree with the reviewer that analysis of an Ago3 knockout mouse is a valuable next step, just not within this chapter of the story.

      2. Do Ago413-/- mice recapitulate the early meiotic entry phenotype observed in Ago4-/- mice? If not, could AGO3 promote meiotic entry, given its strong mRNA expression in spermatogonia according to the scRNAseq data (Fig. 2B)?

      Response: Our scRNA-seq data shows strong expression of Ago3 in spermatogonia, as mentioned by the Reviewer. Analysis of cell cycle marker expression also shows that the transcriptomic profile of spermatogonia is altered, with higher levels of transcripts corresponding to the later G2/M stages (Figure 2D). Moreover, Ago413 knockouts present an increase in the number of spermatogonial stem cells (Supplementary Figure S4B). However, this cluster represents a pool of quiescent and mitotically active cells entering meiosis, therefore interpretation of these data might be challenging. While specific experiments could be conducted to answer this question, this is outside of the scope of our manuscript. The manuscript as it stands is already rather large, and a full analysis of meiotic entry dynamics would dilute the core message relating to chromatin regulation in the sex body.

      3. The authors suggested that the removal of BRG1 by AGO3 is necessary during sex body formation and the eventual establishment of MSCI. However, the BAF complex subunit ARID1A has been shown to facilitate MSCI by regulating promoter accessibility. It will be interesting to determine how BRG1 distribution changes across the genome in the absence of AGO proteins and how that correlates with alterations in sex-linked gene expression.

      Response: We agree that identifying changes in BRG1 distribution across the genome would be very interesting. However, in this work we show that BRG1/SMARCA4 protein changes its localization in the sex body very rapidly between early and late pachynema. These two substages are only discernible by immunofluorescence using synaptonemal complex markers, as there are currently no available techniques to enrich for these subfractions. Therefore, studying the genome occupancy of BRG1 in these specific substages with techniques such as CUT&Tag is not currently possible. We are, however, working on new methods to distinguish these cell populations and hope eventually to use these purification strategies to perform the studies suggested by this reviewer. Alternatively, we hope that single-cell CUT&Tag methods will become reliable enough to allow us to address these questions. Neither of these options is currently available to us. The studies by Menon et al (2024-doi:10.7554/eLife.88024.5) provide strong evidence that ARID1A is needed to reduce promoter accessibility of XY silenced genes in prophase I through modulation of H3.3 distribution. However, this mechanism and our identification of the removal of BRG1 between early and late pachynema are not inconsistent with one another: either SMARCA4 or SMARCA2 can associate with ARID1A as part of the cBAF complex, and ARID1A is not present in all forms of the BAF complex that contain BRG1. The differences between our results and those of Menon et al likely indicate that multiple forms of the BAF complex are differentially regulated during MSCI and play different roles in silencing transcription. Further studies of specific BAF subunits are needed to elucidate how different flavors of the BAF complex act at specific genomic locations and meiotic time points.

      4. The observations presented in this manuscript (Fig. 1D, 2C, 3D, and 4) suggest a haploinsufficiency of the deleted locus in spermatogenic development. How does this compare with the ablation of either Ago3 or Ago4? Please explain.

      Response: Our previous studies of single Ago4 knockouts did not reveal a heterozygous phenotype (Modzelewski et al 2012, doi: 10.1016/j.devcel.2012.07.003, data not shown). Triple Ago413 knockouts show a much stronger fertility phenotype than single Ago4 knockouts. Testis weight is reduced by 30% in Ago413 homozygous null males and by 15% in heterozygous males (Figure 1D), the latter comparable to the 13% reduction previously observed in Ago4-/- males. Sperm counts of Ago413 null and heterozygous males are reduced by 60% and 39%, respectively, compared to wild type (Figure 1E), whereas Ago4 null mice have a milder phenotype, with only a 22% reduction in sperm counts. At the MSCI level, homozygous and heterozygous Ago413 mutants show similar increases (35% and 30%, respectively, relative to wild type) in the proportion of pachytene spermatocytes with RNA pol II ingression into the sex body. Ago4 single knockouts show an almost 18% increase in Pol II ingression compared to wild type. These comparisons are now included in our manuscript in lines 170, 172 and 288. The milder phenotype of the Ago4 knockout, and the haploinsufficiency seen in triple Ago413 knockouts but not in Ago4 single knockouts, are likely a consequence of the overlapping functions of Ago3 and Ago4 in mammals (and/or the overexpression of Ago3 in Ago4 knockouts). In the context of their role in RISC, Wang et al (doi: 10.1101/gad.182758.111) studied the effects of single and double conditional knockouts of Ago1 and Ago2 on miRNA-mediated silencing. They found that the interaction between miRNAs and AGOs is highly correlated with the abundance of each AGO protein, and only double knockouts presented an observable phenotype.

      Minor Comments: Based on the interactome analysis, it was argued that AGO3 and AGO4 may function separately. Please discuss how AGO3 might compensate for AGO4 (Line 109).

      Response: We hypothesize that the combined function of AGO3 and AGO4 is needed for proper sex chromosome inactivation during meiosis. We base this hypothesis on the facts that (i) both proteins localize to the sex body in pachytene spermatocytes, (ii) loss of Ago4 leads to upregulation of Ago3, and (iii) the MSCI phenotype of Ago413 knockout mice is much stronger than that of the single Ago4 knockout (see above). However, AGO3 and AGO4 might not induce silencing through the same mechanism or pathway. In this work, we observed that their temporal expression in prophase I differs: while AGO3 protein seems to disappear by the diplotene stage, AGO4 is still present in the sex body of these cells. Moreover, the proteomic analysis revealed very few common interactors, an observation that could support the idea that AGO3 and AGO4 act through different (albeit perhaps related) mechanisms to achieve MSCI. It is also possible that common interactors were not identified in our proteomic analysis because of the low abundance of AGO3 and AGO4 in germ cells, which limits the resolution of the proteomic analysis (note that to visualize AGO proteins in WB experiments, at least 60 μg of enriched germ cell lysate must be loaded per lane). In addition, given the difficulty of obtaining enough isolated pachytene and diplotene spermatocytes for immunoprecipitation, we performed IP experiments on whole germ cell lysates, which limits the interpretation of our analysis. If AGO3 and AGO4 protein interactors overlap, then AGO3 would directly substitute for AGO4 to achieve silencing in single Ago4 knockouts. However, if AGO3 and AGO4 work together through different, complementary mechanisms, then Ago4 mutant mice likely compensate for the loss of Ago4 by upregulating Ago3 along with specific interactors of the relevant pathway. We have added a sentence addressing this matter in line 411 of the results section and in lines 506 and 513 of the discussion in the revised manuscript.

      In line 221, it is unclear what is meant by 'cell cycle transcripts'. Does this refer to meiotic transcripts? It is also important to discuss the relevance of the G2/M cell cycle marker genes at later stages of meiotic prophase.

      Response: Thank you for this suggestion. We have changed the relevant text to remove redundancies and include more information. We agree that considering the importance of these genes across meiotic prophase is needed, as cells which are in the dividing stage will already have produced the proteins necessary for division. These cells likely correspond to the diplotene/M cluster cells that have a lower G2/M score, potentially causing the bimodal distribution seen in Figure 2D. We have added a sentence addressing this to the manuscript.

      While identified as a common interactor of both AGO3 and AGO4 in lines 440-445, HNRNPD is not listed among AGO4 interactors in Table S6. Please correct or explain this discrepancy.

      Response: HNRNPD was originally identified as an AGO4 interactor using a less strict criterion than the one used in our manuscript, which requires consistent enrichment in at least two rounds of IP-MS experiments. The reference to HNRNPD was a mistake, given that HNRNPD was enriched in only one of our three replicates. We apologize and have removed the sentence in lines 440-445.

      It is unclear whether wild-type cell lysate or lysate containing FLAG-tagged AGO3 was used for BRG1 immunoprecipitation, and which antibody was used to detect AGO3 in the BRG1 IP sample. A co-IP experiment demonstrating interaction between BRG1 and wild-type AGO3 would be ideal in this context. Furthermore, co-localization by IF would be beneficial to determine the subcellular localization and the cell stages the interaction may be occurring. Additionally, co-IP and Western blot methodologies should be included in the methods section.

      Response: MYC-FLAG-tagged AGO3 protein lysates were used for BRG1 co-immunoprecipitation, and an anti-MYC antibody was used to detect AGO3. This is now detailed in the Methods section of our revised manuscript (line 1133).

      Regarding BRG1 and AGO3 colocalization by IF, we can confidently show that both AGO3 and BRG1 localize to the sex chromosomes in early pachynema by comparing BRG1/SYCP3 and FLAG-AGO3/SYCP3 stained spreads. We were not able to show colocalization simultaneously in the same cells because appropriate antibodies are lacking: our anti-FLAG antibody is raised in mouse and the anti-BRG1 antibody in rabbit, so a non-rabbit, non-mouse anti-SYCP3 antibody would be needed to identify prophase I substages, and our lab does not possess such a validated antibody. However, we now have access to a multiplexing kit that allows the use of same-species antibodies for immunofluorescence, and we can perform these experiments for a revised manuscript.

      Response: The Methods section now includes a description of the co-IP methodology (line 1132). The Western blot methodology is described in line 718, under the "Immunoblotting" heading.

      In line 599, it is unclear what is meant by 'persistence of sex chromosome de-repression'. Please correct or clarify this.

      Response: This sentence has been changed and reads: "The persistence of sex chromosome gene expression".

      If possible, please add an illustration to summarize the findings together.

      Response: We thank the reviewer for this suggestion and have now added this as Figure 6.

      Significance

      Overall, this study enhances the understanding of gene expression regulation by AGO proteins during spermatogenesis. Several approaches, including functional, histological, and molecular characterization of the triple knockout phenotype, were instrumental in elucidating the role of AGO proteins in MSCI and meiotic as well as postmeiotic gene regulation. The main limitation of the study is that it is challenging to appreciate the role of AGO3 in addition to the previously published role of AGO4 without the inclusion of necessary control groups. Furthermore, the mechanism of action for AGO proteins in meiotic gene regulation was left relatively unexplored. This study presents new findings that will be significant for the research community interested in gene regulation, chromatin biology, and reproductive biology with the above suggestions considered.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The authors characterize a CRISPR-Cas9 mouse mutant that targets three genes encoding AGO family proteins, two of which are expressed during spermatogenesis (AGO3 and AGO4) and one of which is reportedly not expressed (AGO1). This mouse mutant showed that AGO3 and AGO4 both contribute to successful spermatogenesis, as the "Ago413" mutation gave rise to an additive reduction in testis weight, due to spermatocyte apoptosis, and a reduction in sperm count. Furthermore, they use insertion mouse mutants for Ago3 and Ago2 that express tagged versions of the corresponding proteins, which they use in combination with pan-AGO antibodies and Ago mutants to show differential expression and localization of AGO2, AGO3, and AGO4 (and the absence of AGO1) during spermatogenesis, with a particular focus on meiotic prophase. They perform single-cell RNA-seq and intricate analyses to demonstrate a change in the distribution of meiotic stages in Ago413 mutants and show that the overall cell cycle in spermatogonia and spermatocytes is altered. This analysis shows that the mutation leads to an inability to downregulate earlier spermatogonia/spermatocyte-stage transcripts in a timely manner. On the other hand, later-stage spermatocytes abnormally express spermiogenesis genes. As in the previously characterized Ago4 mutant, MSCI is disrupted. The authors also show that AGO3 has different interaction partners than AGO4 and focus their final assessment on a novel interaction partner of AGO3, BRG1. They show that this factor, which is involved in chromatin remodeling, is aberrantly localized to the sex body during meiotic prophase and diplonema. As BRG1 is associated with open chromatin, they propose that AGO3 restricts BRG1 (and related proteins) from the XY chromosomes to ensure MSCI. Overall, this paper is very well constructed, with mechanistic insights that make it a very impactful contribution to the research community.

      Major Comments:

      1. The abstract contains "Ago413-/- mouse" without any explanation of what that is. The abstract needs to be a stand-alone document that does not require any referencing for context.

      Response: We have included a sentence describing Ago413 in line 27.

      Figure 2C. - The significance bars are confusing as they appear to overlap strangely.

      Response: We have modified this figure and now present the significance bars on top of the data points.

      On line 235, the authors state that "we first identified the top non-overlapping upregulated genes for Ago413+/+ germ cells in each cluster." Why did the authors not also select down-regulated genes in each cluster to perform a similar analysis?

      Response: Thank you for this question. As our goal was to identify genes that mark the transcriptional program of each cell type, we used only uniquely upregulated genes for each cluster. Genes that are downregulated in a cluster may reflect transcription in several other cell types, which is not easily interpretable. For a revised manuscript, we will perform this analysis to determine whether there are any specific alterations in these downregulated genes.

      Their Ago413 mutant characterization does a good job of assessing meiotic prophase and spermatozoa. However, their assessment of the stages in between these is lacking (meiotic divisions and spermiogenesis).

      Response: We understand the reviewer's concern; however, it is not usual to study the stages between the first meiotic division and spermiogenesis, because meiosis II is so rapid that we lack tools to dissect it. In general, any defect that impacts meiosis I (and particularly prophase I) leads to cell death during prophase I or at metaphase I because stringent checkpoints eliminate defective cells. Thus, the increased TUNEL staining in prophase I indicates to us that defective cells are cleared before exit from meiosis I, and that the cells progressing to the spermatid stage are "normal" for meiosis II progression. For the cells that did complete meiosis I and progressed normally through meiosis II, we analyzed the spermiogenic outcome extensively (see the section entitled "Post-meiotic spermatids from Ago413-/- males exhibit defective spermiogenesis and poor spermatozoa function"). This section included extensive analyses of sperm morphology, sperm motility, and sperm fertility through in vitro fertilization assays. That said, we have added a sentence on line 268 to explain the transit through meiosis II.

      The discovery of the interaction between BRG1 and AGO3 is exciting. They should assess BRG1 localization in later sub-stages, including late diplonema and diakinesis.

      Response: BRG1 (SMARCA4) was analyzed throughout prophase I, as shown in Figure 5G, and the quantification of fluorescence intensity included diplonema (Figure 5H-I). Diakinesis was not included because no BRG1 signal was observable in these cells. We have explained this in line 459.

      ATF2 should have been assessed in more detail, as was done for BRG1 in Figure 5.

      Response: We agree with the reviewer; however, staining of chromosome spreads with the anti-ATF2 antibody was not possible in our hands despite several attempts and changes in staining conditions. Because staining of sections was successful, we showed the localization of ATF2 in spermatocytes by co-staining sections for SYCP3 and ATF2.

      Reviewer #3 (Significance (Required)): Overall, this paper is very well constructed with mechanistic insights, as described in my reviewer comments, that make this a very impactful contribution to the research community.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

      Strengths: 

      The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as: 

      (1) a large set of behavioral attributes, 

      (2) with inter-individual variability, that are 

      (3) stable over time. 

      A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw718). The present study replicates some of those findings and extends the experiments from temporal stability to examining the correlation of locomotion features between different contexts.

      The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.

      We thank the reviewer for his exceptionally kind assessment of our work!

      Weaknesses: 

      The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results. 

      We have now uploaded a high-resolution PDF to GitHub (https://github.com/LinneweberLab/Mathejczyk_2024_eLife_Individuality/blob/main/S8.pdf); this is also mentioned in the legend of Fig. S8.

      Why were five or so parameters selected from the full set? How were these selected? 

      The five parameters (% of time walked, walking speed, vector strength, angular velocity, and centrophobicity) were selected because they describe key aspects of the investigated behaviors that can be compared directly across assays. Importantly, several parameters we typically use (e.g., Linneweber et al., 2020) cannot be applied under certain conditions, such as darkness or the absence of visual cues. Furthermore, these five parameters encompass three critical aspects of navigation across standard visual behavioral arenas: (1) The “exploration” category is characterized by parameters describing the fly’s activity. (2) Parameters related to “attention” reflect heightened responses to visual cues, but unlike commonly used metrics such as angle or stripe deviations (e.g., Colomb et al., 2012; Linneweber et al., 2020), they can also be measured in the absence of visual cues and are therefore suitable for cross-assay comparisons. (3) The parameter “centrophobicity,” used as a potential indicator of anxiety, is conceptually linked to the open-field test in mice, where the ratio of wall-to-open-field activity is frequently calculated as a measure of anxiety (see, for example, Carter and Shieh, 2015, chapter 2, https://www.sciencedirect.com/book/9780128005118/guide-to-research-techniques-in-neuroscience). Admittedly, this interpretation is frequently challenged in mice, but it has a long history, which is why we use it.

      Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset? 

      As noted above, we included only a subset of parameters in our final analysis, as many were unsuitable for comparison across assays, although they still provide valuable assay-specific information that is important for relating these results to previous publications.

      The correlation analysis is used to establish stability between assays. For temporal retesting, "stability" is certainly the appropriate word, but between contexts, it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency". 

      Thank you for this suggestion. During the preparation of the manuscript, we indeed frequently alternated between the terms “stability” and “consistency” and decided to use “stability” as the only descriptor to keep it simple. We now fully agree with the reviewer’s argument and have replaced “stability” with “consistency” throughout the current version of the manuscript to increase clarity and coherence.

      The parameters are considered one by one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability' and analyses of single-parameter variability stability.

      We agree with the reviewer that a multivariate analysis, in addition to our chosen approach, offers clear advantages in terms of statistical power. On the one hand, we believe that the simplicity of our initial analysis, of both correlational and mean data, makes it easy for readers to understand and reproduce our results. While preparing the previous version of the manuscript, we were skeptical of more complex analyses, since they often involve numerous choices that can complicate reproducibility. For instance, a recent study in personality psychology (Paul et al., 2024) highlighted the risks of “forking paths” in statistical analysis, showing that certain choices of statistical methods could even reverse findings, a concern mitigated by our simple, straightforward approach. Still, in preparing this revised version of the manuscript, we accepted the reviewer’s advice and reanalyzed the data using a generalized linear model. This analysis recapitulates our initial findings and is now summarized in a single figure (Fig. 9).

      The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23 °C and 32 °C are correlated by 0.263, which corresponds to an R² of 0.069, i.e. just 7% of the 32 °C variance is predictable by the 23 °C variance. Is it fair to say that a 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R²) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?

      We agree that this is an important question. Our paper clearly demonstrates that individuality always plays a role in decision-making (and, in this context, any behavioral output can be considered a decision). However, the non-linear relationship between certain situations and the individual’s behavior often reduces the predictive value (or correlation) across contexts, sometimes quite drastically.

      For instance, temperature has a relatively linear effect on certain behavioral parameters, leading to predictable changes across individuals. As a result, correlations across temperature conditions are often similar to those observed across time within the same situation. In contrast, this predictability diminishes when comparing conditions like the presence or absence of visual stimuli, the use of different arenas, or different modalities.

      For this reason, we believe that significance remains the best indicator for describing how measurable individuality persists, even across vastly different situations.
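      The reviewer's arithmetic above rests on one relation: for a single-predictor linear fit, the coefficient of determination equals the square of Pearson's r. The short sketch below simply reproduces the two quoted figures (r = 0.263 and r = -0.197); the helper name r_squared and the dictionary labels are illustrative only and are not taken from the manuscript or its code.

```python
# Coefficient of determination for a simple (one-predictor) linear fit.
# For Pearson's r, R^2 = r^2: the share of variance in one condition that a
# linear fit on the other condition explains. The sign of r does not matter.

def r_squared(r: float) -> float:
    """Return R^2 for a Pearson correlation coefficient r (illustrative helper)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    return r * r


# The two coefficients quoted by the reviewer.
examples = {
    "% time walked, 23 °C vs 32 °C": 0.263,
    "vector strength, Buridan vs Y-maze": -0.197,
}

for label, r in examples.items():
    explained = r_squared(r)
    print(f"{label}: r = {r:+.3f}, R^2 = {explained:.3f}, "
          f"unexplained = {1 - explained:.1%}")
```

      A small R² and a significant r are not contradictory: significance reflects how confidently r differs from zero given the sample size, whereas R² reflects how much of one condition's variance a linear fit on the other can predict.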

      The authors describe a dissociation between inter-group differences and inter-individual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining the correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?  

      We thank the reviewer for this suggestion, and we have now addressed this point. To account for slope effects, we have now introduced in-group ranks for our linear model computation (see Fig. 9). 
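      The in-group rank transform the reviewer suggests can be sketched as follows, assuming one paired score per fly in each of two conditions. Flies are ranked within each condition, with tied values sharing their mean rank, and the ranks are then correlated. This is Spearman's rank correlation. It is unaffected by any condition-wide shift or monotonic rescaling, so it asks only whether individuals keep their rank order. The function names and the walking-speed values below are hypothetical, and this is not the linear model behind Fig. 9, whose details are not given here.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values 1..n within one condition; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def within_condition_rank_correlation(cond_a, cond_b):
    """Correlate each individual's within-condition ranks (Spearman's rho)."""
    if len(cond_a) != len(cond_b):
        raise ValueError("need one paired score per individual")
    return pearson(average_ranks(cond_a), average_ranks(cond_b))


# Hypothetical walking speeds for the same five flies in two conditions.
speed_23c = [10.0, 12.0, 9.0, 15.0, 11.0]
# Condition-wide shift plus a nonlinear (but monotonic) rescaling.
speed_32c = [s ** 2 / 4 + 5 for s in speed_23c]

print(round(pearson(speed_23c, speed_32c), 3))
print(within_condition_rank_correlation(speed_23c, speed_32c))
```

      Because every fly keeps its position in the ordering, the rank correlation is exactly 1, while the raw Pearson coefficient is slightly below 1 owing to the nonlinear rescaling.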

      What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general and with regard to these specific parameters? Is the increased walking speed at higher temperatures necessarily due to an increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?

      We agree that grouping our parameters into traits such as exploration, attention, and anxiety always involves subjective decisions. This classification is even considered partially controversial in the mouse literature, which uses the term “anxiety” in similar experiments (see, for example, Carter and Shieh, 2015, chapter 2, https://www.sciencedirect.com/book/9780128005118/guide-to-research-techniques-in-neuroscience). Nevertheless, we believe that readers greatly benefit from these categories, since they make it easier to understand (beyond mathematical correlations) which aspects of the flies’ individuality can be considered consistent across situations. Furthermore, these categories serve as a bridge for comparing insights from very distinct models.

      The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.

      We assume the reviewer is referring to Figure 3a. The detailed experimental protocol can be found in the Materials and Methods section under Setup 2: IndyTrax Multi-Arena Platform. We have now clarified this in the mentioned figure legend.

      Using the current single-correlation analysis approach, the aims would benefit from rewording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.

      The reviewer raises an important point about hierarchies within the concept of animal individuality or personality. We agree that this is best addressed by first focusing on single behavioral traits/parameters and then integrating several trait properties into a cohesive concept of animal personality (holistic individuality). To ensure consistency throughout the text, we have now thoroughly reviewed the entire manuscript to clearly distinguish between single-parameter variability stability/consistency and holistic individuality/personality.

      The study presents a bounty of new technology to study visually guided behaviors. The GitHub link to the software was not available. To verify the successful transfer of open hardware and open-software, a report would demonstrate transfer by collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.

      We have now uploaded all codes and materials to GitHub and made them available as soon as we received the reviewers’ comments. All files and materials can be accessed at https://github.com/LinneweberLab/Mathejczyk_2024_eLife_Individuality, which is now frequently mentioned throughout the revised manuscript.

      The study discusses a number of interesting, stimulating ideas about inter-individual variability, and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.

      While the current work does not present any mechanistic analysis of inter-individual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms. 

      We thank the reviewer again for the extensive and constructive feedback.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.

      Strengths: 

      The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great and I'm sure other folks will be interested in using and adapting it to their own needs.

      We thank the reviewer for highlighting the strengths of our study.

      Weaknesses/Limitations: 

      I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting and temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low-risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context. 

      We agree with the reviewer that the definition of environmental context can differ between fields and that behavioral context is defined differently, particularly in ecology. Nevertheless, we highlight that our alterations of environmental context are highly stereotypic, well defined, and free of interpretive bias: we modified only what is stated in the experimental description, without designing specific situations that might themselves be perceived differently by different individuals. For example, comparing a context with a predator to one without might result in a binary response, because one fraction of the tested individuals might perceive the predator in the predator situation while the other does not.

      The analytical framework in terms of statistical methods is lacking. It appears as though the authors used correlations across time/situations to estimate individual variation; however, far more sophisticated and elegant methods exist. The paper would be a lot stronger, and my guess is, much more streamlined if the authors employed hierarchical mixed models to analyse these data; these models could capture and estimate differences in individual behavior across time and situations simultaneously. Along with this, it's currently unclear whether and how any statistical inference was performed. Right now, it appears as though any results describing how individuality changes across situations are largely descriptive (i.e. a visual comparison of the strengths of the correlation coefficients?). 

      The reviewer raises an important point, also raised by Reviewer #1. On the one hand, we agree with both reviewers that a more aggregated analysis has clear advantages, such as more statistical power, and has the potential to streamline our manuscript, which is why we added such an analysis (see below). On the other hand, we would also like to defend our initial approach, since we think that the simplicity of the analysis for both correlational and mean data makes it easy to understand and reproduce. More complex analyses necessarily require the experimenters to select a specific statistical toolbox, and depending on these decisions, analyses become less comparable and harder to reproduce unless the entire decision tree is flawlessly documented. For instance, a recent personality psychology paper investigated the relationship between statistical paths within the decision tree (forking analysis) and their results, with very surprising outcomes (Paul et al., 2024): some paths even reversed the findings. Such variance in conclusions is hardly possible with the rather simple and easily reproducible analysis we performed. One of the major strengths of our study is its simple experimental design, which allows for simple, easy-to-understand analyses.

      We nevertheless took the reviewer’s advice very seriously and reanalyzed the data using a generalized linear model, which largely recapitulated the findings of our previously performed “low-tech” analysis in a single figure (Fig. 9).

      Another pretty major weakness is that right now, I can't find any explicit mention of how many flies were used and whether they were re-used across situations. Some sort of overall schematic showing exactly how many measurements were made in which rigs and with which flies would be very beneficial. 

      We apologize for this inconvenience. A detailed overview of male and female sample sizes was listed in the supplemental boxplots next to the plots (e.g., Fig. S6). Apparently, this was not visible enough. Therefore, we have now also uniformly added the sample sizes to the main figure legends.

      I don't necessarily doubt the robustness of the results and my guess is that the author's interpretations would remain the same, but a more appropriate modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation.

      As described above, we have now added the suggested analyses. We hope that the reviewer will appreciate the new Fig. 9, which, in our opinion, largely confirms our previous findings using a more appropriate generalized linear modelling framework.

      Reviewer #3 (Public Review): 

      This manuscript is a continuation of past work by the last author where they looked at stochasticity in developmental processes leading to inter-individual behavioural differences. In that work, the focus was on a specific behaviour under specific conditions while probing the neural basis of the variability. In this work, the authors set out to describe in detail how stable the individuality of animal behaviours is in the context of various external and internal influences. They identify a few behaviours to monitor (read outs of attention, exploration, and 'anxiety'); some external stimuli (temperature, contrast, nature of visual cues, and spatial environment); and two internal states (walking and flying).

      They then use high-throughput behavioural arenas - most of which they have built and made plans available for others to replicate - to quantify and compare combinations of these behaviours, stimuli, and internal states. This detailed analysis reveals that:

      (1) Many individualistic behaviours remain stable over the course of many days. 

      (2) Some of these (walking speed) remain stable over changing visual cues. Others (walking speed and centrophobicity) remain stable at different temperatures.

      (3) All the behaviours they tested failed to remain stable over the spatially varying environment (arena shape).

      (4) Only angular velocity (a readout of attention) remains stable across varying internal states (walking and flying).

      Thus, the authors conclude that there is a hierarchy in the influence of external stimuli and internal states on the stability of individual behaviours.

      The manuscript is a technical feat, with the authors having built many new high-throughput assays. The number of animals is large and many variables have been tested - different types of behavioural paradigms, flying vs walking, varying visual stimuli, and different temperatures among others. 

      We thank the reviewer for this extraordinarily kind assessment of our work!

      Recommendations for the authors:  

      Reviewing Editor (Recommendations For The Authors): 

      While appreciating the effort and quality of the work that went into this manuscript, the reviewers identified a few key points that would greatly benefit this work.

      (1) Statistical methods adopted. The dataset produced through this work is large, with multiple conditions and comparisons that can be made to infer parameters that both define and affect the individualistic behaviour of an animal. Hierarchical mixed models would be a more appropriate approach to handle such datasets and infer statistically the influence of different parameters on behaviours. We recommend that the authors take this approach in the analyses of their data.

      (2) Brevity in the text. We urge the authors to take advantage of eLife's flexible template and take care to elaborate on the text in the results section, the methods adopted, the legends, and the guides to the legends embedded in the main text. The findings are likely to be of interest to a broad audience, and the writing currently targets the specialist.

      Reviewer #2 (Recommendations For The Authors): 

      I want to start by saying this seems like a really cool study! It's an impressive amount of work and addresses a pretty basic question that is interesting (at least I think so!).

      We thank the reviewer again for this assessment!

      That said, I would really strongly recommend the authors embrace using mixed/hierarchical models to analyze their data. They're producing some really impressive data, and just computing Pearson correlation coefficients across time points and situations is very clunky and actually loses a lot of information. The most up-to-date, state-of-the-art approach is mixed models: these models can handle very complex (or not so complex) random-effect structures, which can estimate the variance and, importantly, the covariance of individual intercepts both over time and across situations. I actually think this could add some really cool insights into the data and allow you to characterize the patterns you're seeing in far more detail. It's datasets exactly like this that are tailor-made for these complex variance-partitioning models!
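      The variance partitioning the reviewer describes comes down to asking how much of the total variance in a behaviour lies among individuals rather than within them, a quantity usually called repeatability. As a minimal sketch, assuming a balanced design in which every individual is measured the same number of times, the one-way ANOVA estimator below computes that ratio in plain Python. It is a simpler stand-in for a full mixed model (it cannot estimate covariance across situations), not the GLM-based analysis the authors adopted, and the individual IDs and values are hypothetical.

```python
from statistics import mean


def repeatability(scores: dict) -> float:
    """One-way ANOVA estimate of repeatability, ICC(1), for a balanced design.

    `scores` maps a (hypothetical) individual ID to its k repeated
    measurements of one behaviour, e.g. walking speed on k different days.
    """
    groups = list(scores.values())
    n = len(groups)
    k = len(groups[0])
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 individuals, each measured the same number (>= 2) of times")
    grand = mean(x for g in groups for x in g)
    means = [mean(g) for g in groups]
    # Mean square between individuals: spread of the individual averages.
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    # Mean square within individuals: fluctuation around each individual's average.
    ms_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n * (k - 1))
    # Among-individual variance component (the ANOVA estimator can go negative).
    var_among = (ms_between - ms_within) / k
    return var_among / (var_among + ms_within)


# Hypothetical walking speeds for three flies, each measured on three days.
speeds = {"fly1": [1.0, 1.2, 0.9], "fly2": [2.1, 2.0, 2.3], "fly3": [1.5, 1.4, 1.6]}
print(f"repeatability = {repeatability(speeds):.2f}")
```

      Values near 1 mean that individuals differ consistently from one another; values near 0 mean that the differences seen on any one day are mostly noise. A mixed model generalizes this by estimating the among-individual variance and covariance jointly across days and situations.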

      As mentioned before, we have now adopted a more appropriate GLM-based data analysis (see above).

      Regardless of which statistical methods you decide to use, please explicitly state in your methods exactly what analyses you did. That is completely lacking now and was a bit frustrating. As such, it's completely unclear whether or how statistical inference was performed. How did you do the behavioral clustering? 

      We apologize that these points were not clearly represented in the previous version of the manuscript. We have now significantly extended the methods section to include a separate paragraph on the statistical methods used, in order to address this critique and hope that the revised version is clear now.

      Also, I could not for the life of me figure out how many flies had been measured. Were they reused across the situation? Or not?

      We reused the same flies across situations whenever possible. However, having one fly experience all assays consecutively was not feasible due to their fragility. Instead, individual flies were exposed to at least two of the three groups of assays used here: the Indytrax setup, the Buridan arenas and variants thereof, and the virtual arenas. Hence, we have compared flies across entirely different setups, but the number of times flies can be retested is limited (otherwise, sample sizes would drop over time and the flies would have gone through too many experimental alternations). To make this clearer, we have elaborated on this point in the main text and added group sample sizes to the figure legends.

      What are these "groups" and "populations" that are referred to in the results (e.g. lines 384, 391, 409)?

      We apologize for using these two terms somewhat interchangeably without proper introduction or distinction. We have now made this clearer at the beginning of the Results in the main text by focusing on the term ‘group’. By ‘group’ we refer to the average of all individuals tested in the same situation. Sample sizes in the figure legends now indicate group/population sizes to make this clearer.

      Some of the rationale for the development of the behavioral rigs would have actually been nice to include in the intro, rather than in the results.

      This rationale is introduced at the beginning of the last paragraph of the introduction. We hope that this now becomes clear in the revised version of the manuscript.

      Reviewer #3 (Recommendations For The Authors): 

      This manuscript would do well to take advantage of eLife's flexible word limit. I sense that it has been written in brevity for a different journal but I would urge the authors to revisit this and unpack the language here - in the text, in the figure legends, in references to the figures within the text. The way it's currently written, though not misleading, will only speak to the super-specialist or the super-invested :). But the findings are nice, and it would be nice to tailor it to a broader audience.

      We appreciate this suggestion. Initially, we had hoped that we had described our results as clearly and briefly as possible, and we apologize if that was not always the case. The comments and requests of all three reviewers have now led to a series of additions to both the main text and the methods, resulting in a significantly expanded manuscript. We hope that these additions are helpful for a general, non-expert audience.

    1. Author response:

      The following is the authors’ response to the original reviews

      Overview of changes in the revision

      We thank the reviewers for the very helpful comments and have extensively revised the paper. We provide point-by-point responses below and here briefly highlight the major changes:

      (1) We expanded the discussion of the relevant literature in children and adults.

      (2) We improved the contextualization of our experimental design within previous reinforcement studies in both cognitive and motor domains highlighting the interplay between the two.

      (3) We reorganized the primary and supplementary results to better communicate the findings of the studies.

      (4) The modeling has been significantly revised and extended. We now formally compare 31 noise-based models and one value-based model and this led to a different model from the original being the preferred model. This has to a large extent cleaned up the modeling results. The preferred model is a special case (with no exploration after success) of the model proposed in Therrien et al. (2018). We also provide examples of individual fits of the model, fit all four tasks and show group fits for all, examine fits vs. data for the clamp phases by age, provide measures of relative and absolute goodness of fit, and examine how the optimal level of exploration varies with motor noise.

      Reviewer #1 (Public review):

      Summary:

      Here the authors address how reinforcement-based sensorimotor adaptation changes throughout development. To address this question, they recruited many participants with ages ranging from small children (3 years old) to adulthood (18+ years old). The authors used four experiments to manipulate whether binary and positive reinforcement was provided probabilistically (e.g., 30 or 50%) versus deterministically (e.g., 100%), and continuous (infinite possible locations) versus discrete (binned possible locations) when the probability of reinforcement varied along the span of a large redundant target. The authors found that both movement variability and the extent of adaptation changed with age.

      Thank you for reviewing our work. One note of clarification: this work focuses on reinforcement-based learning throughout development but does not evaluate sensorimotor adaptation. The four tasks presented in this work are completed with veridical trajectory feedback (no perturbation).

      The goal is to understand how children at different ages adjust their movements in response to reward feedback. We now explain this distinction on line 35.

      Strengths:

      The major strength of the paper is the number of participants collected (n = 385). The authors also answer their primary question, that reinforcement-based sensorimotor adaptation changes throughout development, which was shown by utilizing established experimental designs and computational modelling.

      Thank you.

      Weaknesses:

      Potential concerns involve inconsistent findings in the secondary analyses, current assumptions that affect both interpretation and computational modelling, and a lack of clearly stated hypotheses.

      (1) Multiple regression and Mediation Analyses.

      The challenge with these secondary analyses is that:

      (a) The results are inconsistent between Experiments 1 and 2, and the analysis was not performed for Experiments 3 and 4,

      (b) The authors used a two-stage procedure of using multiple regression to determine what variables to use for the mediation analysis, and

      (c) The authors already have a trial-by-trial model that is arguably more insightful.

      Given this, some suggested changes are to:

      (a) Perform the mediation analysis with all the possible variables (i.e., not informed by multiple regression) to see if the results are consistent.

      (b) Move the regression/mediation analysis to Supplementary, since it is slightly distracting given current inconsistencies and that the trial-by-trial model is arguably more insightful.

      Based on these comments, we have chosen to remove the multiple regression and mediation analyses. We agree that they were distracting and that the trial-by-trial model allows for differentiation of motor noise from exploration variability in the learning block.

      (2) Variability for different phases and model assumptions:

      A nice feature of the experimental design is the use of success and failure clamps. These clamped phases, along with baseline, are useful because they can provide insights into the partitioning of motor and exploratory noise. Based on the assumptions of the model, the success clamp would only reflect variability due to motor noise (excludes variability due to exploratory noise and any variability due to updates in reach aim). Thus, it is reasonable to expect that the success clamps would have lower variability than the failure clamps (which it obviously does in Figure 6), and presumably baseline (which provides success and failure feedback, thus would contain motor noise and likely some exploratory noise).

      However, in Figure 6, one visually observes greater variability during the success clamp (where variability is assumed to come only from motor noise) than during baseline, where variability would come from:

      (a) Motor noise.

      (b) Likely some exploratory noise, since there were some failures.

      (c) Updates in reach aim.

      Thanks for this comment. It made us realize that some of our terminology was unintentionally misleading. Reaching to discrete targets in the Baseline block was done to a) determine if participants could move successfully to targets that are the same width as the 100% reward zone in the continuous targets and b) determine if there are age dependent changes in movement precision. We now realize that the term Baseline Variability was misleading and should really be called Baseline Precision.

      This is an important distinction that bears on this reviewer's comment. In clamp trials, participants move to continuous targets. In baseline, participants move to discrete targets presented at different locations. Clamp Variability cannot be directly compared to Baseline Precision because they are qualitatively different. Since the target changes on each baseline trial, we would not expect updating of desired reach (the target is the desired reach) and there is therefore no updating of reach based on success or failure. The SD we calculate over baseline trials is the endpoint variability of the reach locations relative to the target centers. In success clamp, there are no targets so the task is qualitatively different.

      We have updated the text to clarify terminology, expand upon our operational definitions, and motivate the distinct role of the baseline block in our task paradigm (line 674).

      Given the comment above, can the authors please:

      (a) Statistically compare movement variability between the baseline, success clamp, and failure clamp phases.

      Given our explanation in the previous point we don't think that comparing baseline to the clamp makes sense as the trials are qualitatively different.

      (b) The authors have examined how their model predicts variability during success clamps and failure clamps, but can they also please show predictions for baseline (similar to that of Cashaback et al., 2019; Supplementary B, which alternatively used a no feedback baseline)?

      Again, we do not think it makes sense to predict the baseline which as we mention above has discrete targets compared to the continuous targets in the learning phase.

      (c) Can the authors show whether participants updated their aim towards their last successful reach during the success clamp? This would be a particularly insightful analysis of model assumptions.

      We have now compared 31 models (see full details in the next response), which include the 7 models in Roth et al. (2023). Several of these model variants include updating even after success (with so-called planning noise). We also now fit the model to the data including the clamp phases (we cannot easily fit the success clamp alone, as it has only 10 trials). We find that the preferred model is one that does not include updating after success.

      (d) Different sources of movement variability have been proposed in the literature, as have different related models. One possibility is that the nervous system has knowledge of 'planned (noise)' movement variability that is always present, irrespective of success (van Beers, R.J. (2009). Motor learning is optimally tuned to the properties of motor noise. Neuron, 63(3), 406-417). The authors have used slightly different variations of their model in the past. Roth et al. (2023) directly compared several different plausible models with various combinations of motor, planned, and exploratory noise (Roth A, 2023, "Reinforcement-based processes actively regulate motor exploration along redundant solution manifolds." Proceedings of the Royal Society B 290: 20231475: see Supplemental). Their best-fit model seems similar to the one the authors propose here, but the current paper has the added benefit of the success and failure clamps to tease the different potential models apart. In light of the results of (a), (b), and (c), the authors are encouraged to provide a paragraph on how their model relates to the various sources of movement variability and other models proposed in the literature.

      Thank you for this. We realized that the models presented in Roth et al. (2023), as well as in other papers, are all special cases of a more general model. In total there are 30 possible variants of the full model, so we have now fit all 31 models to our larger datasets and performed model selection (Results and Methods). All the models can be efficiently fit to the actual data with a Kalman smoother (rather than to summary statistics, as has sometimes been done). For model selection, we fit only the 100 learning trials and chose the preferred model based on BIC on the children's data (Figure 5—figure Supplement 1). After selecting the preferred model, we refit it to all trials, including the clamps, to obtain the best parameter estimates.
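      A minimal sketch of BIC-based model selection follows, assuming each candidate model has already been fit to each participant (for example with a Kalman smoother) and yields a maximised log-likelihood over that participant's learning trials. Summing per-participant BIC values is one common way to compare models across a group; whether the authors aggregated in exactly this way is an assumption, and the model names and numbers below are hypothetical.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_model(per_participant_ll: dict, n_params: dict, n_obs: int):
    """Return (best model name, summed BIC per model).

    per_participant_ll: model name -> list of maximised log-likelihoods, one per participant.
    n_params: model name -> number of free parameters fitted per participant.
    n_obs: number of trials each participant's fit was based on.
    """
    totals = {
        model: sum(bic(ll, n_params[model], n_obs) for ll in lls)
        for model, lls in per_participant_ll.items()
    }
    best = min(totals, key=totals.get)
    return best, totals


# Hypothetical fits of two candidate models to two participants' 100 learning trials.
lls = {"simple": [-150.0, -160.0], "full": [-148.0, -158.0]}
k = {"simple": 3, "full": 5}
best, totals = select_model(lls, k, n_obs=100)
print(best, {m: round(v, 1) for m, v in totals.items()})
```

      In this made-up example the richer model fits slightly better (higher log-likelihood), but its two extra parameters cost more than the improvement in fit, so BIC prefers the simpler model. This trade-off is what lets BIC discriminate among many nested variants of a general model.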

      The preferred model was the same whether we combined the continuous and discrete probabilistic data or examined each task separately, either for the children only or for the children and adults combined. The preferred model is a special case (no exploration after success) of the one proposed in Therrien et al. (2018): it has exploration variability (after failure) and motor noise, and after a success the desired reach is fully updated by the exploration (if any) on that trial. This model differs from the model in the original submission, which included a partial update of the desired reach after exploration; this partial update was considered the learning rate. The current model suggests a unity learning rate.
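      To make the preferred model concrete, the sketch below simulates it under stated assumptions: the desired reach starts at a hypothetical `aim0`; zero-mean Gaussian exploration with SD `sigma_explore` is added only on trials that follow a failure; Gaussian motor noise with SD `sigma_motor` corrupts every reach; and after a rewarded trial the aim shifts by that trial's full exploratory offset (unity learning rate). The function names, units, the assumption that the first trial carries no exploration, and the `linear_reward` schedule are illustrative only, not the authors' code or task parameters.

```python
import random
from typing import Callable, List, Optional, Tuple


def simulate_reaches(
    n_trials: int,
    sigma_explore: float,
    sigma_motor: float,
    reward_fn: Callable[[float, random.Random], bool],
    aim0: float = 0.0,
    seed: Optional[int] = None,
) -> Tuple[List[float], List[bool]]:
    """Simulate reach endpoints and binary rewards (sketch of the preferred model)."""
    rng = random.Random(seed)
    aim = aim0
    explore_next = False  # assumption: the first trial carries no exploration
    reaches: List[float] = []
    rewards: List[bool] = []
    for _ in range(n_trials):
        # Exploration is drawn only on trials that follow a failure.
        explore = rng.gauss(0.0, sigma_explore) if explore_next else 0.0
        # Motor noise corrupts every reach and is never learned from.
        reach = aim + explore + rng.gauss(0.0, sigma_motor)
        success = bool(reward_fn(reach, rng))
        if success:
            # Unity learning rate: the aim absorbs the whole exploratory offset.
            aim += explore
        reaches.append(reach)
        rewards.append(success)
        explore_next = not success
    return reaches, rewards


def linear_reward(reach: float, rng: random.Random,
                  lo: float = -10.0, hi: float = 10.0) -> bool:
    """Hypothetical schedule: reward probability rises linearly from 0 at lo to 1 at hi."""
    p = min(1.0, max(0.0, (reach - lo) / (hi - lo)))
    return rng.random() < p


reaches, rewards = simulate_reaches(100, sigma_explore=3.0, sigma_motor=1.0,
                                    reward_fn=linear_reward, seed=0)
print(f"rewarded on {sum(rewards)} of {len(rewards)} trials")
```

      Passing `lambda y, rng: True` as `reward_fn` mimics a success clamp: no exploration is ever drawn, so reaches scatter around a fixed aim with SD `sigma_motor`. Passing `lambda y, rng: False` mimics a failure clamp: the aim never moves and every trial after the first adds exploration, so the spread approaches sqrt(sigma_explore² + sigma_motor²). Under this model, the success clamp should therefore be less variable than the failure clamp.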

      In addition, as suggested by another reviewer, we also fit a value-based model which we adapted from the model described in Giron et al. (2023). This model was not preferred.

      We have added a paragraph to the Discussion highlighting different sources of variability and links to our model comparison.

      (e) line 155. Why would the success clamp be composed of both motor and exploratory noise? Please clarify in the text

      This sentence was written to refer to clamps in general and not just success clamps. However, in the revision this sentence seemed unnecessary so we have removed it.

      (3) Hypotheses:

      The introduction did not have any hypotheses of development and reinforcement, despite the discussion above setting up potential hypotheses. Did the authors have any hypotheses related to why they might expect age to change motor noise, exploratory noise, and learning rates? If so, what would the experimental behaviour look like to confirm these hypotheses? Currently, the manuscript reads more as an exploratory study, which is certainly fine if true, it should just be explicitly stated in the introduction. Note: on line 144, this is a prediction, not a hypothesis. Line 225: this idea could be sharpened. I believe the authors are speaking to the idea of having more explicit knowledge of action-target pairings changing behaviour.

      We have included our hypotheses and predictions at two points in the paper. In the introduction, we modified the text to:

      "We hypothesized that children's reinforcement learning abilities would improve with age, and depend on the developmental trajectory of exploration variability, learning rate (how much people adjust their reach after success), and motor noise (here defined as all sources of noise associated with movement, including sensory noise, memory noise, and motor noise). We think that these factors depend on the developmental progression of neural circuits that contribute to reinforcement learning abilities (Raznahan et al., 2014; Nelson et al., 2000; Schultz, 1998)."

      In results we modified the sentence to:

      "We predicted that discrete targets could increase exploration by encouraging children to move to a different target after failure.”

      Reviewer #2 (Public review):

      Summary:

      In this study, Hill and colleagues use a novel reinforcement-based motor learning task ("RML"), asking how aspects of RML change over the course of development from toddler years through adolescence. Multiple versions of the RML task were used in different samples, which varied on two dimensions: whether the reward probability of a given hand movement direction was deterministic or probabilistic, and whether the solution space had continuous reach targets or discrete reach targets. Using analyses of both raw behavioral data and model fits, the authors report four main results: First, developmental improvements reflected 3 clear changes, including increases in exploration, an increase in the RL learning rate, and a reduction of intrinsic motor noise. Second, changes to the task that made it discrete and/or deterministic both rescued performance in the youngest age groups, suggesting that observed deficits could be linked to continuous/probabilistic learning settings. Overall, the results shed light on how RML changes throughout human development, and the modeling characterizes the specific learning deficits seen in the youngest ages.

      Strengths:

      (1) This impressive work addresses an understudied subfield of motor control/psychology - the developmental trajectory of motor learning. It is thus timely and will interest many researchers.

      (2) The task, analysis, and modeling methods are very strong. The empirical findings are rather clear and compelling, and the analysis approaches are convincing. Thus, at the empirical level, this study has very few weaknesses.

      (3) The large sample sizes and in-lab replications further reflect the laudable rigor of the study.

      (4) The main and supplemental figures are clear and concise.

      Thank you.

      Weaknesses:

      (1) Framing.

      One weakness of the current paper is the framing, namely w/r/t what can be considered "cognitive" versus "non-cognitive" ("procedural?") here. In the Intro, for example, it is stated that there are specific features of RML tasks that deviate from cognitive tasks. This is of course true in terms of having a continuous choice space and motor noise, but spatially correlated reward functions are not a unique feature of motor learning (see e.g. Giron et al., 2023, NHB). Given the result here that simplifying the spatial memory demands of the task greatly improved learning for the youngest cohort, it is hard to say whether the task is truly getting at a motor learning process or more generic cognitive capacities for spatial learning, working memory, and hypothesis testing. This is not a logical problem with the design, as spatial reasoning and working memory are intrinsically tied to motor learning. However, I think the framing of the study could be revised to focus in on what the authors truly think is motor about the task versus more general psychological mechanisms. Indeed, it may be the case that deficits in motor learning in young children are mostly about cognitive factors, which is still an interesting result!

      Thank you for these comments on the framing of our study. We now clearly acknowledge that all motor tasks have cognitive components (new paragraph at line 65). We also explain why we think our task has features not present in typical cognitive tasks.

      (2) Links to other scholarship.

      If I'm not mistaken, a common observation in studies of the development of reinforcement learning is a decrease in exploration over development (e.g., Nussenbaum and Hartley, 2019; Giron et al., 2023; Schulz et al., 2019); this contrasts with the current results, which instead show an increase. It would be nice to see a more direct discussion of previous findings showing decreases in exploration over development, and why the current study deviates from that. It could also be useful for the authors to bring in concepts of different types of exploration (e.g. "directed" vs "random"), in their interpretations and potentially in their modeling.

      We recognize that our results differ from prior work. The optimal exploration pattern differs from task to task, and we now discuss that exploration is not one-size-fits-all: its benefits vary depending on the task. We have added the following paragraphs to the Discussion section:

      "One major finding from this study is that exploration variability increases with age. Some other studies of development have shown that exploration can decrease with age, indicating that adults explore less than children (Schulz et al., 2019; Meder et al., 2021; Giron et al., 2023). We believe the divergence between our work and these previous findings is largely due to the experimental design of our study and the role of motor noise. In the paradigm used initially by Schulz et al. (2019) and replicated in different age groups by Meder et al. (2021) and Giron et al. (2023), participants push buttons on a two-dimensional grid to reveal continuous-valued rewards that are spatially correlated. Participants are unaware that there is a maximum reward available, and therefore children may continue to explore to reduce uncertainty if they have difficulty evaluating whether they have reached a maximum. In our task, by contrast, participants are given binary reward and told that there is a region in which reaches will always be rewarded. Motor noise is an additional factor which plays a key role in our reaching task but a minimal role, if any, in the discretized grid task. As we show in simulations of our task, as motor noise goes down (as it is known to do through development) the optimal amount of exploration goes up (see Figure 7—figure Supplement 2 and Appendix 1). Therefore, the behavior of our participants is rational in terms of increasing exploration as motor noise decreases.

      A key result in our study is that exploration in our task reflects sensitivity to failure. Older children make larger adjustments after failure compared to younger children to find the highly rewarded zone more quickly. Dhawale et al. (2017) discuss the different contexts in which a participant may explore versus exploit (i.e., stick at the same position). Exploration is beneficial when reward is low as this indicates that the current solution is no longer ideal, and the participant should search for a better solution. Konrad et al. (2025) have recently shown this behavior in a real-world throwing task where 6 to 12 year old children increased throwing variability after missed trials and minimized variability after successful trials. This has also been shown in a postural motor control task where participants were more variable after non-rewarded trials compared to rewarded trials (Van Mastrigt et al., 2020). In general, these studies suggest that the optimal amount of exploration is dependent on the specifics of the task."

      (3) Modeling.

      First, I may have missed something, but it is unclear to me whether the model actually accounts for the gradient of rewards (e.g., if I get a probabilistic reward moving at 45° but then don't get one at 40°, I should be more likely to try 50° next than 35°). I couldn't tell from the current equations whether this was the case or whether exploration was essentially "unsigned," nor whether the multiple-trials-back regression analysis would truly capture signed behavior. If the model is sensitive to the gradient, it would be nice if this were clearer in the Methods. If not, it would be interesting to have a model that does "function approximation" of the task space, and see if that improves the fit or explains developmental changes.

      The model we use (similar to Roth et al. (2023) and Therrien et al. (2016, 2018)) does not model the gradient. Exploration is always zero-mean Gaussian. As suggested by the reviewer, we now also fit a value-based model (described starting at line 810) which we adapted from the model presented in Giron et al. (2023). We show that the exploration and noise-based model is preferred over the value-based model.

      The multiple-trials-back regression was unsigned as the intent was to look at the magnitude and not the direction of the change in movement. We have decided to remove this analysis from the manuscript as it was a source of confusion and secondary analysis that did not add substantially to the findings of these studies.

      Second, I am curious if the current modeling approach could incorporate a kind of "action hysteresis" (aka perseveration), such that regardless of previous outcomes, the same action is biased to be repeated (or, based on parameter settings, avoided).

      In some sense, the learning rate in the model in the original submission is closely related to perseveration. For example, if the learning rate is 0, there is complete perseveration, as you simply repeat the same desired movement. If the rate is 1, there is no perseveration, and values in between reflect different amounts of perseveration. Therefore, it is not easy to separate learning rate from perseveration, and adding perseveration as another parameter would likely make it and the learning rate unidentifiable. However, we now compare 31 models, and those with a non-unity learning rate are not preferred, suggesting there is little perseveration.
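      The perseveration argument can be written as a single update rule. Assuming the learning rate enters as a partial update of the desired reach after a rewarded trial (our reading of the original model as described above, not a quotation of its equations), with $x_t$ the desired reach, $e_t$ the exploratory offset on trial $t$, $r_t \in \{0, 1\}$ the binary reward, and $\eta$ the learning rate:

```latex
x_{t+1} = x_t + \eta \, r_t \, e_t, \qquad 0 \le \eta \le 1
```

      With $\eta = 0$ the desired reach never moves ($x_{t+1} = x_t$ on every trial, complete perseveration); with $\eta = 1$ a rewarded exploratory offset is adopted in full, which corresponds to the unity learning rate of the preferred model.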

      (4) Psychological mechanisms. There is a line of work that shows that when children and adults perform RL tasks they use a combination of working memory and trial-by-trial incremental learning processes (e.g., Master et al., 2020; Collins and Frank 2012). Thus, the observed increase in the learning rate over development could in theory reflect improvements in instrumental learning, working memory, or both. Could it be that older participants are better at remembering their recent movements in short-term memory (Hadjiosif et al., 2023; Hillman et al., 2024)?

      We agree that cognitive processes, such as working memory or visuospatial processing, play a role in our task and describe cognitive elements of our task in the introduction (new paragraph at line 65). However, the sensorimotor model we fit to the data does a good job of explaining the variation across age, which suggests that age-dependent cognitive processes probably play a smaller role.

      Reviewer #3 (Public review):

      Summary:

      The study investigates reinforcement learning across the lifespan with a large sample of participants recruited for an online game. It finds that children gradually develop their abilities to learn reward probability, possibly hindered by their immature spatial processing and probabilistic reasoning abilities. Motor noise, reinforcement learning rate, and exploration after a failure all contribute to children's subpar performance.

      Strengths:

      (1) The paradigm is novel because it requires continuous movement to indicate people's choices, as opposed to discrete actions in previous studies.

      (2) A large sample of participants were recruited.

      (3) The model-based analysis provides further insights into the development of reinforcement learning ability.

      Thank you.

      Weaknesses:

      (1) The adequacy of model-based analysis is questionable, given the current presentation and some inconsistency in the results.

      Thank you for raising this concern. We have substantially revised the model from our first submission. We now compare 31 noise-based models and 1 value-based model and fit all of the tasks with the preferred model. We perform model selection using the two tasks with the largest datasets to identify the preferred model. From the preferred model, we found the parameter fits for each individual dataset and simulated the trial by trial behavior allowing comparison between all four tasks. We now show examples of individual fits as well as provide a measure of goodness of fit. The expansion of our modeling approach has resolved inconsistencies and sharpened the conclusions drawn from our model.

      (2) The task should not be labeled as reinforcement motor learning, as it is not about learning a motor skill or adapting to sensorimotor perturbations. It is a classical reinforcement learning paradigm.

      We now make it clear that our reinforcement learning task has both motor and cognitive demands, but does not fall entirely within one of these domains. We use the term motor learning because it captures the fact that participants maximize reward by making different movements, corrupted by motor noise, to unmarked locations on a continuous target zone. When we look at previous publications, it is clear that our task is similar to others that are also referred to as reinforcement motor learning: Cashaback et al. (2019) (reaching task using a robotic arm in adults), Van Mastrigt et al. (2020) (weight-shifting task in adults), and Konrad et al. (2025) (real-world throwing task in children). All of these tasks involve trial-by-trial learning through reinforcement to make the movement that is most effective for a given situation. We feel it is important to link our work to these previous studies and prefer to preserve the terminology of reinforcement motor learning.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Thank you for this summary. Rather than repeat the extended text from the responses to the reviewers here, we point the Editor to the appropriate reviewer responses for each issue raised.

      The reviewers and editors have rated the significance of the findings in your manuscript as "Valuable" and the strength of evidence as "Solid" (see eLife evaluation). A consultancy discussion session to integrate the public reviews and recommendations per reviewer (listed below) has resulted in key recommendations for increasing the significance and strength of evidence:

      To increase the Significance of the findings, please consider the following:

      (1) Address and reframe the paper around whether the task is truly getting at a motor learning process or more generic cognitive decision-making capacities such as spatial memory, reward processing, and hypothesis testing.

      We have revised the paper to address the comments on the framing of our work. Please see responses to the public review comments of Reviewers #2 and #3.

      (2) It would be beneficial to specify the differences between traditional reinforcement algorithms (i.e., using softmax functions to explore, and build representations of state-action-reward) and the reinforcement learning models used here (i.e., explore with movement variability, update reach aim towards the last successful action), and compare present findings to previous cognitive reinforcement learning studies in children.

      Please see response to the public review comments of Reviewer #1 in which we explain the expansion of our modeling approach to fit a value-based model as well as 31 other noise-based models. In our response to the public review comments of Reviewer #2, we comment on our expanded discussion of how our findings compare with previous cognitive reinforcement learning studies.

      To move the "Strength of Evidence" to "Convincing", please consider doing the following:

      (1) Address some apparently inconsistent and unrealistic values of motor noise, exploration noise, and learning rate shown for individual participants (e.g., Figure 5b; see comments from Reviewer 1), and take the following additional steps: plot R² for individual participants, discuss whether individual values of the fitted parameters are plausible, and test whether model parameters in each age group can extrapolate to the two clamp conditions and baselines.

      We have substantially updated our modeling approach. Now that we compare 31 noise-based models, the preferred model does not show any inconsistent or unrealistic values (see response to Reviewer #3). Additionally, we now show example individual fits and provide both relative and absolute goodness of fit (see response to Reviewer #3).

      (2) Relatedly, to further justify if model assumptions are met, it would be valuable to show that the current learning model fits the data better than alternative models presented in the literature by the authors themselves and by others (reviewer 1). This could include alternative development models that formalise the proposed explanations for age-related change: poor spatial memory, reward/outcome processing, and exploration strategies (reviewer 2).

      Please see our response to the public review comments of Reviewer #1, in which we explain that we have now fit a value-based model as well as 31 other noise-based models, providing a comparison of previous models as well as novel models. This led to a slightly different model being preferred over the model in the original submission (the updated model has a learning rate of unity). These models span many of the processes previously proposed for such tasks. We feel that these 32 models span a reasonably broad model space and do not believe we have the power to include memory issues or heuristic exploration strategies in the model.

      (3) Perform the mediation analysis with all the possible variables (i.e., not informed by multiple regression) to see if the results are more consistent across studies and with the current approach (see comments reviewer 1).

      Please see response to public review comments of Reviewer #1. We chose to focus only on the model based analysis because it allowed us to distinguish between exploration variability and motor noise.

      Please see below for further specific recommendations from each reviewer.

      Reviewer #1 (Recommendations for the author):

      (1) In general, there should be more discussion and contextualization of other binary reinforcement tasks used in the motor literature. For example, work from Jeroen Smeets, Katinka van der Kooij, and Joseph Galea.

      Thank you for this comment. We have edited the Introduction to better contextualize our work within the reinforcement motor learning literature (see line 67 and line 83).

      (2) Line 32. Very minor. This sentence is fine, but perhaps could be slightly improved: "select a location along a continuous and infinite set of possible options (anywhere along the span of the bridge)".

      Thank you for this comment. We have edited the sentence to reflect this suggestion.

      (3) Line 57. To avoid some confusion in successive sentences: Perhaps, "Both children over 12 and adolescents...".

      Thank you for this comment. We have edited the sentence to reflect this suggestion.

      (4) Line 80. This is arguably not a mechanistic model, since it is likely not capturing the reward/reinforcement machinery used by the nervous system, such as updating the expected value using reward prediction errors/dopamine. That said, this phenomenological model, and other similar models in the field, do very well to capture behaviour with a very simple set of explore and update rules.

      We use mechanistic in the standard sense used in modeling, as in Levenstein et al. (2023), for example. The contrast is not with neural modeling, but with normative modeling, in which one develops a model to optimize a function (or descriptive models of what a system is trying to achieve). In mechanistic modeling one proposes a mechanism, and this can be at a state-space level (as in our case) or a neural level (as suggested by the reviewer); both are considered mechanistic, just at different levels. Quoting Levenstein "... mechanistic models, in which complex processes are summarized in schematic or conceptual structures that represent general properties of components and their interactions, are also commonly used." We now reference the Levenstein paper to clarify what we mean by mechanistic.

      (5) Figure 1. It would be useful to state that the x-axis in Figure 1 is in normalized units, depending on the device.

      Thank you for this comment. We have added a description of the x-axis units to the Fig. 1 caption.

      (6) Were there differences in behaviour for these different devices? e.g., how different was motor noise for the mouse, trackpad, and touchscreen?

      Thank you for this question. We did not find a significant effect of device on learning or precision in the baseline block. We have added these one-way ANOVA results for each task in Supplementary Table 1.

      (7) Line 98. Please state that participants received reinforcement feedback during baseline.

      Thank you for this comment. We have updated the text to specify that participants receive reward feedback during the baseline block.

      (8) Line 99. Did the distance from the last baseline trial influence whether the participant learned or did not learn? For example, would it place them too far from the peak success location such that it impacted learning?

      Thank you for this question. We looked at whether the position of movement on the last baseline block trial was correlated with the first movement position in the learning block. We did not find any correlations between these positions for any of the tasks. Interestingly, we found that the majority of participants move to the center of the workspace on the first trial of the learning block for all tasks (either in the presence of the novel continuous target scene or the presentation of 7 targets all at once). We do not think that the last movement in the baseline block "primed" the participant for the location of the success zone in the learning block. We have added the following sentence to the Results section:

      "Note that the reach location for the first learning trial was not affected by (correlated with) the target position on the last baseline trial (p > 0.3 for both children and adults, separately)."

      (9) The term learning distance could be improved. Perhaps use distance from target.

      Thank you for this comment. We appreciate that learning distance defined with 0 as the best value is counterintuitive. We have changed the language to "distance from target" as the learning metric.

      (10) Line 188. This equation is correct, but to estimate what the standard deviation by the distribution of changes in reach position is more involved. Not sure if the authors carried out this full procedure, which is described in Cashaback et al., 2019; Supplemental 2.

      There appears to be no Supplemental 2 in the referenced paper, so we assume the reviewer is referring to Supplemental B, which deals with a shuffling procedure to examine lag-1 correlations.

      In our tasks, we are limited to only 9 trials to analyze in each clamp phase and so do not feel a shuffling analysis is warranted. In these blocks, we are not trying to 'estimate what the standard deviation by the distribution of changes in reach position' but instead are calculating the standard deviation of the reach locations and comparing the model fit (for which the reviewer says the formula is correct) with the data. We are unclear what additional steps the reviewer is suggesting. In our updated model analysis, we fit the data including the clamp phases for better parameter estimation. We use simulations to estimate the s.d. in the clamp phase (ensuring in simulation that the data do not fall outside the workspace), so the previous analytic formulas, which were only an approximation, are no longer used.

      (11) Line 197-199. Having done the demo task, it is somewhat surprising that a 3-year-old could understand these instructions (whose comprehension can be very different from even a 5-year old).

      Thank you for raising this concern. We recognize that the younger participants likely have different comprehension levels compared to older participants. However, we believe that the majority of even the youngest participants were able to sufficiently understand the goal of the task to move in a way to get the video clip to play. We intentionally designed the tasks to be simple such that the only instructions the child needed to understand were that the goal was to get the video clip to play as much as possible and the video clip played based on their movement. Though the majority of younger children struggled to learn well on the probabilistic tasks, they were able to learn well on the deterministic tasks where the task instructions were virtually identical with the exception of how many places in the workspace could gain reward. On the continuous probabilistic task, we did have a small number (n = 3) of 3 to 5 year olds who exhibited more mature learning ability which gives us confidence that the younger children were able to understand the task goal.

      (12) Line 497: Can the authors please report the F-score and p-value separately for each of these one-way ANOVA (the device is of particular interest here).

      Thank you for this request. We have added a supplementary table (Supplementary Table 1) with the results of these ANOVAs.

      (13) Past work has discussed how motivation influences learning, which is a function of success rate (van der Kooij, K., in 't Veld, L., & Hennink, T. (2021). Motivation as a function of success frequency. Motivation and Emotion, 45, 759-768.). Can the authors please discuss how that may change throughout development?

      Thank you for this comment. While motivation most probably plays a role in learning, in particular in a game environment, this was outside the direct focus of this work and not something that our studies were designed to test. We have added the following sentence to the Discussion section to address this comment:

      "We also recognize that other processes, such as memory and motivation, could affect performance on these tasks; however, our study was not designed to test these processes directly, and future work would benefit from exploring these other components more explicitly."

      (14) Supplement 6. This analysis is somewhat incomplete because it does not consider success.

      Pekny and colleagues (2015) looked at 3 trials back but considered both success and reward. However, their analysis has issues since successive time points are not i.i.d., and spurious relationships can arise. This issue is brought up by Dhawale et al. (Dhawale, A. K., Miyamoto, Y. R., Smith, M. A., & Ölveczky, B. P. (2019). Adaptive regulation of motor variability. Current Biology, 29(21), 3551-3562.). Perhaps it is best to remove this analysis from the paper.

      Thank you for this comment. We have decided to remove this secondary analysis from the paper as it was a source of confusion and did not add to the understanding and interpretation of our behavioral results.

      Reviewer #2 (Recommendations for the author):

      (1) The path length ratio analyses in the supplement are interesting but are not mentioned in the main paper. I think it would be helpful to mention these, as they are somewhat dramatic effects.

      Thank you for this comment. Path length ratios are defined in the Methods, and the results are briefly summarized in the Results section with a pointer to the supplementary figures. We have updated the text to more explicitly report the age-related differences in path length ratios.

      (2) The second to last paragraph of the intro could use a sentence motivating the use of the different task features (deterministic/probabilistic and discrete/continuous).

      Thank you for this comment. We have added an additional motivating sentence to the introduction.

      Reviewer #3 (Recommendations for the author):

      The paper labeled the task as one for reinforcement motor learning, which is not quite appropriate in my opinion. Motor learning typically refers to either skill learning or motor adaptation, the former for improving speed-accuracy tradeoffs in a certain (often new) motor skill task and the latter for accommodating some sensorimotor perturbations for an existing motor skill task. The gaming task here is for neither. It is more like a decision-making task with a slight contribution to motor execution, i.e., motor noise. I would recommend the authors label the learning as reinforcement learning instead of reinforcement motor learning.

      Thank you for this comment. As noted in the response to the public review comments, we agree that this task has components of classical reinforcement learning (i.e. responding to a binary reward) but we specifically designed it to require the learning of a movement within a novel game environment. We have added a new paragraph to the introduction where we acknowledge the interplay between cognitive and motor mechanisms while also underscoring the features in our task that we think are not present in typical cognitive tasks.

      My major concern is whether the model adequately captures subjects' behavior and whether we can conclude with confidence from model fitting. Motor noise, exploration noise, and learning rate, which fit individual learning patterns (Figure 5b), show some quite unrealistic values. For example, some subjects have nearly zero motor noise and a 100% learning rate.

      We have now compared 31 models and the preferred model is different from the one in the first submission. The parameter fits of the new model do not saturate in any way and appear reasonable to us. The updates to the model analysis have addressed the concern of previously seen unrealistic values in the prior draft.

      Currently, the paper does not report the fitting quality for individual subjects. It is good to have an exemplary subject's fit shown, too. My guess is that the r-squared would be quite low for this type of data. Still, given that the children's data is noisier, it might be good to use the adult data to show how good the fitting can be (individual fits, r squares, whether the fitted parameters make sense, whether it can extrapolate to the two clamp phases). Indeed, the reliability of model fitting affects how we should view the age effect of these model parameters.

      We now show fits to individual subjects. However, because this is a Kalman smoother, it fits the data perfectly by generating its best estimate of motor noise and exploration variability on each trial to fully account for the data; in that sense R² is always 1, which is not informative.

      While the BIC analysis with the other model variants provides a relative goodness of fit, it is not straightforward to provide an absolute goodness of fit, such as the standard R², for a feedforward simulation of the model given the parameters (rather than the output of the Kalman smoother). There are two problems. First, there is no single model output: each time the model is simulated with the fitted parameters, it produces a different output (due to motor noise, exploration variability, and reward stochasticity). Second, the model is not meant to reproduce the actual motor noise, exploration variability, and reward stochasticity of a given trial. For example, for a poor learner whose data are pure Gaussian motor noise across trials, the model could accurately fit the standard deviation of that noise but would not be expected to match each data point, and so would have a traditional R² of 0.

      To provide an overall goodness of fit, we have to reduce the noise component. To do so, we examined the traditional R² between the average of all the children's data and the average simulation of the model (from the median of 1000 simulations per participant), which reduces the stochastic variation. The resulting R² values for the continuous probabilistic and discrete probabilistic tasks are 0.41 and 0.72, respectively.
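
      As an illustration of why averaging is needed before a traditional R² becomes informative, the Python sketch below uses a hypothetical toy learner (an exponential approach to the target plus independent Gaussian trial-to-trial noise), not the reinforcement learning model described in this response. The 120 trials mirror the task, but the noise level, time constant, number of participants, and number of simulations (100 rather than 1000, for speed) are arbitrary assumptions. Comparing one noisy trajectory with one simulation generated from the correct parameters gives an R² near or below zero, whereas comparing the group-average data with the group average of per-participant median simulations recovers a high R².

```python
import math
import random
import statistics


def r_squared(observed, predicted):
    """Traditional R^2 = 1 - SS_res / SS_tot, with `observed` as the reference."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def simulate_learner(n_trials, noise_sd, rng):
    """Hypothetical toy learner (NOT the authors' model): an exponential
    approach to the target plus independent Gaussian noise on every trial."""
    return [1.0 - math.exp(-t / 20.0) + rng.gauss(0.0, noise_sd)
            for t in range(n_trials)]


def trialwise(curves, summary):
    """Apply `summary` (e.g. mean or median) across curves, trial by trial."""
    return [summary(values) for values in zip(*curves)]


rng = random.Random(0)
N_TRIALS, NOISE_SD = 120, 0.5
N_PARTICIPANTS, N_SIMS = 30, 100

# Stand-in "observed" data: one noisy trajectory per participant.
observed = [simulate_learner(N_TRIALS, NOISE_SD, rng) for _ in range(N_PARTICIPANTS)]

# (a) One participant vs. one simulation with the *correct* parameters.
single_r2 = r_squared(observed[0], simulate_learner(N_TRIALS, NOISE_SD, rng))

# (b) Group average of the data vs. group average of per-participant
#     median simulations, which suppresses the stochastic component.
per_participant_median = [
    trialwise([simulate_learner(N_TRIALS, NOISE_SD, rng) for _ in range(N_SIMS)],
              statistics.median)
    for _ in range(N_PARTICIPANTS)
]
averaged_r2 = r_squared(trialwise(observed, statistics.fmean),
                        trialwise(per_participant_median, statistics.fmean))

print(f"single-trajectory R^2: {single_r2:.2f}")  # typically below 0 here
print(f"averaged R^2:          {averaged_r2:.2f}")  # much closer to 1
```

      The sketch only shows that stochastic simulations must be averaged before a traditional R² is meaningful; the specific values depend on the arbitrary toy settings.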

      Note that variability in the "success clamp" does not change across ages (Figure 4C) and does not contribute to the learning effect (Figure 4F). However, it is regarded as reflecting motor noise (Figure 5C), which then decreases with age according to the model fitting (Figure 5B). How do we reconcile these contradictions? Again, this calls the model fitting into question.

      For the success clamp, we have only 9 trials with which to calculate variability, which limits our power to detect an effect of age. In contrast, the model uses all 120 trials to estimate motor noise. There is a downward trend with age in the behavioral data, which we now show overlaid on the model fits for both probabilistic conditions (Figure 5—figure Supplement 4 and Figure 6—figure Supplement 4). These show a reasonable match: although the variance explained is 16% and 56% (we limit the analysis to 9 trials to match the fail clamp), the correlations are 0.52 and 0.78, suggesting a reasonable relationship, although there may be other small sources of variability not captured by the model.
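
      Variance explained can sit well below the squared correlation (here 0.52² ≈ 0.27 and 0.78² ≈ 0.61) if it is computed as 1 - SS_res/SS_tot of the data around the raw model predictions, without refitting a slope or intercept; that this is how the metric was computed is an assumption. The minimal Python sketch below, with made-up numbers, shows predictions that are perfectly correlated with the data (r = 1) but compressed in range, which yields a variance explained of only 0.75.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient, computed from centered sums."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def variance_explained(observed, predicted):
    """1 - SS_res / SS_tot, with residuals taken about the raw predictions
    (no regression slope or intercept is refitted)."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up values: the predictions track the data perfectly in order and
# shape but span only half of the observed range.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [2.0, 2.5, 3.0, 3.5, 4.0]

r = pearson_r(observed, predicted)
ve = variance_explained(observed, predicted)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}, variance explained = {ve:.2f}")
```

      Refitting an ordinary least-squares line with an intercept would make the variance explained equal to r², so under this assumption the gap between the two numbers reflects calibration (slope and offset) rather than a lack of association.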

      Figure 5C: it appears one bivariate outlier contributes a lot to the overall significant correlation here for the "success clamp".

      After removing that point from the original Fig. 5C, the correlation was still significant, and we feel the plots mentioned in the previous point add useful information on this issue. With the new model, this figure has changed.

      It is still a concern that the young children did not understand the instructions. Nine 3-to-8-year-old children (out of 48) were better explained by the noise-only model than by the full model. In contrast, ten of the rest of the participants (out of 98) were better explained by the noise-only model. It appears that a higher percentage of the "young" children than of the older ones did not understand the instructions.

      Thank you for this comment. We did take participant comprehension of the task into consideration during the task design. We specifically designed it so that the instructions were simple and straightforward. The child simply needs to understand the underlying goal to make the video clip play as often as possible and that they must move the penguin to certain positions to get it to play. By having a very simple task goal, we are able to test a naturalistic response to reinforcement in the absence of an explicit strategy in a task suited even for young children.

      We used the updated reinforcement learning model to assess whether an individual's performance is consistent with understanding the task. In the case of a child who does not understand the task, we expect that they simply have motor noise on their reach and, crucially, that they would not explore more after failure, nor update their reach after success. Therefore, we used a likelihood ratio test to examine whether the preferred model was significantly better at explaining each participant's data compared to the model variant which had only motor noise (Model 1). Focusing on only the youngest children (age 3-5), this analysis showed that 43, 59, 65, and 86% of children (out of N = 21, 22, 20, and 21) for the continuous probabilistic, discrete probabilistic, continuous deterministic, and discrete deterministic conditions, respectively, were better fit with the preferred model, indicating non-zero exploration after failure. In the 3-5 year old group for the discrete deterministic condition, 18 out of 21 had performance better fit by the preferred model, suggesting this age group understands the basic task of moving in different directions to find a rewarding location.
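
      The per-participant comparison described above can be sketched as a standard likelihood ratio test between nested models. The Python sketch below is illustrative only: the log-likelihood values, the number of extra parameters in the preferred model relative to Model 1 (`extra_params`), and the 0.05 threshold are hypothetical placeholders, not values from the study. Because the extra exploration parameters may lie on the boundary of their range under the null model (e.g., zero exploration variability), the plain chi-square reference used here is conservative; a chi-square mixture gives a more accurate asymptotic reference in that case.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with an
    integer number of degrees of freedom df >= 1.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) with y = x / 2,
    starting from Q(1, y) = exp(-y) (even df) or Q(1/2, y) = erfc(sqrt(y)) (odd df).
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def likelihood_ratio_test(loglik_reduced, loglik_full, extra_params):
    """Compare a nested reduced model (e.g. motor noise only) with a fuller
    model; returns the LR statistic and its chi-square p-value."""
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return stat, chi2_sf(stat, extra_params)


# Hypothetical maximized log-likelihoods for one participant.
stat, p = likelihood_ratio_test(loglik_reduced=-150.0, loglik_full=-144.0, extra_params=2)
print(f"LR = {stat:.1f}, p = {p:.4f}, preferred model better: {p < 0.05}")
```

      With these placeholder numbers the statistic is 12 on 2 degrees of freedom, giving p = exp(-6) ≈ 0.0025, so this hypothetical participant would be counted as better fit by the fuller model.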

      The lower proportions of children better fit by the preferred model in the other conditions likely reflect differences in the task conditions (continuous and/or probabilistic) rather than a lack of understanding of the goal of the task. We include this analysis as a new subsection at the end of the Results.

      Supplementary Figure 2: the first panel should belong to a 3-year-old not a 5-year-old? How are these panels organized? This is kind of confusing.

      Thank you for this comment. Figure 2—figure Supplement 1 and Figure 2—figure Supplement 2 are arranged with devices in the columns and a sample from each age bin in the rows. For example, in Figure 2—figure Supplement 1, column 1, row 1 shows a participant aged 3 to 5 years using a mouse, while column 3, row 2 shows a participant aged 6 to 8 years using a touch screen. We have edited the labeling on both figures to make the arrangement of the data clearer.

      Line 222: make this a complete sentence.

      This sentence has been edited to a complete sentence.

      Line 331: grammar.

      This sentence has been edited for grammar.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work tried to map the synaptic connectivity between the inputs and outputs of the song premotor nucleus HVC in zebra finches to understand how sensory (auditory) and motor circuits interact to coordinate song production and learning. The authors optimized an AAV-based optogenetic technique to manipulate auditory inputs from specific auditory areas one by one and recorded synaptic activity from individual neurons with whole-cell recordings in slice preparations, identifying each neuron's projection target by retrograde neuronal tracing. This thorough and detailed analysis provides compelling evidence of synaptic connections between 4 major auditory inputs (3 forebrain and 1 thalamic region) and three classes of projection neurons in HVC; all areas give monosynaptic excitatory inputs and polysynaptic inhibitory inputs, but the proportions of inputs onto each projection neuron class varied. They also find specific reciprocal connections between mMAN and Av. Taken together, the authors provide a map of the synaptic connections between intercortical sensory and motor areas, which are suggested to be involved in zebra finch song production and learning.

      Strengths:

      The authors optimized optogenetic tools with eGtACR1 using AAV, which allowed them to manipulate synaptic inputs in a projection-specific manner in zebra finches. They also identified HVC cell types based on projection area. With their technical advances and thorough experiments, they provided a detailed map of synaptic connections.

      Weaknesses:

      As this is a brain slice study, the functional implications of the synaptic connectivity are limited. In particular, as all the experiments were done in adult preparations, there could be a gap in discussing the functions of developmental song learning.

      We thank the reviewer for their appreciation of our work. Although we agree that there can be limitations to brain slice preparations, the approaches used here for synaptic connectivity mapping are well-designed to identify long-range synaptic connectivity patterns. Optogenetic stimulation of axon terminals in brain slices does not require intact axons and works well when axons are cut, allowing identification of all inputs expressing optogenetic channels from afferent regions. Terminal stimulation in slices yields stable post-synaptic responses for hours without rundown, assuring that polysynaptic and monosynaptic connections can be reliably identified in our brain slices. Additionally, conducting similar types of experiments in vivo can run into important limitations. First, the extent of TTX and 4-AP diffusion, which is necessary for identification of long-range monosynaptic connections, can be difficult to verify in vivo, potentially confounding identification of monosynaptic connectivity. Second, conducting whole-cell patch-clamp experiments in vivo, particularly in deeper brain regions, is technically challenging, and would limit the number of cells that can be patched and increase the number of animals needed.

      We agree that there may well be important differences between adult connectivity and connectivity patterns in the juvenile brain. Indeed, learning and experience during development almost certainly shape connectivity patterns and these patterns of connectivity may change incrementally and/or dynamically during development. Ultimately, adult connectivity patterns are the result of changes in the brain that accrue over development. Given that this is the first study mapping long-range connectivity of HVC input-output pathways, we reasoned that the adult connectivity would provide a critical reference allowing future studies to map different stages of juvenile connectivity and the changes in connectivity driven by milestones like forming a tutor song memory, sensorimotor learning, and song crystallization.

      In this revision we worked to better highlight the points raised above and thank the reviewer for their comments.

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes the synaptic connectivity of the songbird cortex's four main classes of sensory afferents onto the three known classes of projection neurons of the pre-motor cortical region HVC. HVC is a region associated with the generation of learned bird songs. The investigators use only male zebra finches to examine the functional anatomy of this region, using patch-clamp methods combined with optogenetic activation of select neuronal groups.

      Strengths:

      The quality of the recordings is extremely high and the quantity of data is on a very significant scale, this will certainly aid the field.

      Weaknesses:

      The authors could make the figures a little easier to navigate. Most of the figures use actual anatomical images, but it would be nice to link these to a zebra finch atlas, with a cartoon schematic accompanying each fluorescence image. Additionally, for the most part, figures showing the labeling lack scale bar values (in µm). These should be added to the figures, not just given in the legends.

      The authors could make it clear in the abstract that this is all male zebra finches - perhaps this is obvious given the bird song focus, but it should be stated. The number of recordings from each neuron class and the overall number of birds employed should be clearly stated in the methods (this is in the figures, but it should say n=birds or cells as appropriate).

      The authors should consider sharing the actual electrophysiology records as data.

      We thank the reviewer for their assessment of our research and suggestions. We have implemented many of these suggestions and provide details in our response to their specific Recommendations. Additionally, we are organizing our data and will make it publicly available with the version of record.

      Reviewer #3 (Public review):

      Nucleus HVC is critical for both song production and learning, and arguably, sitting at the top of the song control system, is the most critical node in this circuit, receiving a multitude of inputs and sending precisely timed commands that determine the temporal structure of song. The complexity of this structure and its underlying organization seem to become more apparent with each experimental manipulation, and yet the underlying circuit organization remains relatively poorly understood. In this study, Trusel and Roberts use classic whole-cell patch clamp techniques in brain slices coupled with optogenetic stimulation of select inputs to provide a careful characterization and quantification of synaptic inputs into HVC. By identifying individual projection neurons using retrograde tracer injections combined with pharmacological manipulations, they classify monosynaptic inputs onto each of the three main classes of glutamatergic projection neurons in HVC (RA-, Area X- and Av-projecting neurons). This study is remarkable in the amount of information that it generates, and the tremendous labor involved for each experiment, from the expression of opsins in each of the target inputs (Uva, NIf, mMAN, and Av), the retrograde labelling of each type of projection neuron, and ultimately the optical stimulation of infected axons while recording from identified projection neurons. Taken together, this study makes an important contribution to increasing our identification, and ultimately understanding, of the basic synaptic elements that make up the circuit organization of HVC, and how external inputs, which we know to be critical for song production and learning, contribute to the intrinsic computations within this critical circuit.

      This study is impressive in its scope, rigorous in its implementation, and thoughtful regarding its limitations. The manuscript is well-written, and I appreciate the clarity with which the authors use our latest understanding of the evolutionary origins of this circuit to place these studies within a larger context and their relevance to the study of vocal control, including human speech. My comments are minor and primarily about legibility, clarification of certain manipulations, and organization of some of the summary figures.

      We thank the reviewer for their thoughtful assessment of our research.

      Recommendations for the authors:

      The following recommendations were considered by all reviewers to be important to incorporate for improving this paper:

      (1) Clarify the site of viral injection and the possibility of labeling other structures

      a) Show images of viral injection sites.

      We provide a representative image of viral expression for each pathway studied in this manuscript. Please see panel A in Figures 2-3 and 5-6 showing our viral expression in Uva, NIf, mMAN, and Av respectively.  

      b) Include in the discussion caveats that the virus may spread beyond the boundaries of structures (e.g., injections into NIf in particular could spread into Field L).

      For each HVC afferent nucleus we have now included a sentence in the Results describing the possible spread of viral infection into surrounding structures. We have also expanded the image from the Av section to include NIf, to showcase the lack of viral expression in NIf (see Fig. 6A).

      (2) Clarify the logic and precise methods of the TTX and 4-AP experiments

      a) Please see the detailed issue raised by Reviewer 3, Major Point 1 below.

      The TTX and 4AP application is the gold standard of opsin-assisted synaptic circuit interrogation, pioneered by the Svoboda lab in 2009 (Petreanu, Mao et al. 2009) and widely used to assess monosynaptic connectivity in multiple brain circuits, as summarized in a recent review (Linders, Supiot et al. 2022). We now better describe the logic of this approach in the second paragraph of the Results section and cite the first description of this method from the Svoboda lab and a recent review comparing this method with other optogenetic methods for tracing synaptic connections in the brain.

      (3) Include caveats in discussion

      a) Note that there may be other inputs to HVC that were not examined in this study (e.g. CMM, Field L)

      In our original manuscript we did state “Although a complete description of HVC circuitry will require the examination of other potential inputs (i.e. RAHVC PNs, A11 glutamatergic neurons (Roberts, Klein et al. 2008, Ben-Tov, Duarte et al. 2023)) and a characterization of interneuron synaptic connectivity, here we provide a map of the synaptic connections between the 4 best described afferents to HVC and its 3 populations of projection neurons” in the last paragraph of the Discussion. We have now edited this sentence to include the projection from NCM to HVC and cited Louder et al., 2024.

      We have extensively mapped input pathways to HVC, and consistent with Vates (Vates, Broome et al. 1996) we have not found evidence that Field L projects to HVC; rather, it projects to the shelf region outside HVC. Consistent with this, we do not see retrogradely labeled neurons in Field L following tracer injections confined to HVC (see Fig. 3G). Additionally, we find that CM projections to HVC arise from the nucleus Avalanche (Roberts, Hisey et al. 2017), which we specifically examine in this study. We do not dispute that there may be other pathways projecting to HVC that will need to be examined in the future, including known projections from neuromodulatory regions and RA, from developmentally restricted pathway(s) like NCM (Louder, Kuroda et al. 2024), and from yet unidentified pathways.

      b) Also note that birds in this study were adults and that some inputs to HVC likely to be important for learning may recede during development (e.g. Louder et al, 2024).

      In the second-to-last paragraph of the Discussion we now state: “While our opsin-assisted circuit mapping provides us with a new level of insight into HVC synaptic circuitry, there are limitations to this research that should be considered. All circuit mapping in this study was carried out in brain slices from adult male zebra finches. Future studies will be needed to examine how this adult connectivity pattern relates to patterns of connectivity in juveniles during sensory or sensorimotor phases of vocal learning and connectivity patterns in female birds.”

      (4) Consider cosmetic changes to figures as suggested by Reviewers 2-3 below.

      We thank the reviewers for their suggestions and have implemented the changes as best we can.

      (5) Address all minor issues raised below.

      Reviewer #1 (Recommendations for the authors):

      I see this study as well designed to answer the authors' specific question: mapping synaptic auditory-motor connections within HVC. Their experiments, using advanced techniques of projection-specific optogenetic manipulation of synaptic inputs and retrograde identification of projection areas, revealed synaptic maps specific to each input-output combination.

      As I found that this study advances our knowledge with a compelling dataset, I have only some minor comments here.

      (1) One technical concern is that we cannot see how well the viral infection was confined to the target area, and whether we can ignore the effect of synaptic connectivity from surrounding areas. As the amount of virus injected is large (1.5 µl) and the target areas are small, we assume the virus might spread to surrounding areas, such as Field L, which also projects to HVC, when targeting NIf. While I think the majority of the projections were from the target areas, it would be better to mention (and to show images with larger fields of view) the possibility of projections from surrounding areas.

      We share the reviewer's concern about the specificity of viral expression. For this reason, we included sample images of the viral expression in each target area (panel A in Fig. 2, 3, 5, 6). We have now also included a sentence at the beginning of each subsection of the Results describing how we ensured the interpretability of the results. The areas surrounding Uva and mMAN are not known to project to HVC. Possible cross-infection is an issue for Av and NIf, and we checked each bird’s injection site to ensure that eGtACR1+ cells were not visible in the unintended HVC-projecting areas.

      As mentioned in our response to the public comment, consistent with Vates (Vates, Broome et al. 1996) we do not see evidence that Field L projects directly to HVC (see Fig. 3G).

      (2) Another technical concern is damage to axonal projections. While I understand that the authors stimulated axonal terminals, the axonal projections were presumably cut, and their ability to release neurotransmitter would be reduced, especially after long survival times or repeated stimulation. It would be helpful to mention whether the projection pathways were contained within the 230 µm-thick slice (which probably depends on the input site) and the effect of axonal cutting.

      We agree that slice electrophysiology has limitations. However, we disagree with the claim of reduced reliability or stability of the evoked response. We and others find that repeated electrical and optogenetic terminal stimulation in slices can yield stable post-synaptic responses for tens of minutes and even hours (Bliss and Gardner-Medwin 1973, Bliss and Lomo 1973, Liu, Kurotani et al. 2004, Pastalkova, Serrano et al. 2006, Xu, Yu et al. 2009, Trusel, Cavaccini et al. 2015, Trusel, Nuno-Perez et al. 2019). Indeed, long-term synaptic plasticity experiments in most preparations and across brain areas rely on such stability of the presynaptic release machinery, despite axons being severed from their parent soma. We assume that the vast majority, if not all, of the connections between axon terminals and their cell bodies in the afferent regions have been cut in our preparations. Nonetheless, the diversity of outcomes we report (currents returning after TTX+4AP or not, depending on the specific combination of input and HVC PN class) is consistent with the robustness of the synaptic interrogation method.

      (3) While I understand this study focused on 4 major input areas and the authors provide a good picture of the synaptic connections to HVC from those areas, HVC has been reported to receive auditory inputs from other areas as well (CMM, Field L, etc.). It is worth mentioning that there are other auditory inputs, and it would be interesting to discuss how they might be coordinated with the inputs from these areas.

      We have extensively mapped input pathways to HVC, and consistent with Vates (Vates, Broome et al. 1996) we have not found evidence that Field L projects to HVC; rather, it projects to the shelf region outside HVC. Consistent with this, we do not see retrogradely labeled neurons in Field L following tracer injections confined to HVC (see Fig. 3G). Additionally, we find that CM projections to HVC arise from the nucleus Avalanche (Roberts, Hisey et al. 2017), which we specifically examine in this study. We do not dispute that there may be other pathways projecting to HVC that will need to be examined in the future, including known projections from neuromodulatory regions and RA, from developmentally restricted pathway(s) like NCM (Louder, Kuroda et al. 2024), and from yet unidentified pathways.

      (4) HVC local neuronal connections have been reported to be modified, and a recent study revealed transient auditory inputs into HVC during the song learning period. The authors discuss the functions of HVC synaptic connections in song learning (the title also refers to synaptic connections for song learning); however, the experiments were done in adults, and the authors do not discuss the possibility of different synaptic connection maps in juveniles during the song learning period. Mentioning the changes in neuronal activity and connectivity during song learning is important. It would also be helpful for readers to discuss potential differences between juveniles and adults if the authors want to discuss functions in song learning.

      We now mention in the Discussion that this is an important caveat of our research and that future studies will be needed to examine how these adult connectivity patterns relate to connectivity patterns in juveniles during sensory or sensorimotor phases of vocal learning and connectivity patterns in female birds. Nonetheless, the title and abstract cite song learning because it is important for the broader public to understand that at least some of these afferent brain regions play an essential role in song learning (Foster and Bottjer 2001, Roberts, Gobes et al. 2012, Roberts, Hisey et al. 2017, Zhao, Garcia-Oscos et al. 2019, Koparkar, Warren et al. 2024).

      Reviewer #2 (Recommendations for the authors):

      The work is very detailed and will be an important resource to those working in the field. The recordings are of high quality, and lots of information is included, such as measures of response kinetics and amplitude and pharmacological confirmation of excitatory and inhibitory synaptic responses. In general, I feel the quality is extremely high and the quantity of data is on a very significant, exhaustive scale that will certainly aid the field. I have come to this conclusion as a non-zebra finch person, but I feel the connection information shown will be of benefit given its high quality.

      Figure 7 is a nice way of showing the overall organization. Optional suggestion: consider highlighting anything in Figure 7 that results in a new understanding of the song system as compared to previous work on anatomy and function.

      We thank the reviewer for the kind comments about our research. We have highlighted our newly found connection between mMAN and Av; all of the connections onto HVC PNs in Panel B are newly identified in this study.

      Reviewer #3 (Recommendations for the authors):

      Major points

      (1) Clarification regarding methods for determining monosynaptic events:

      One of the manipulations that I struggled with the most was the use of TTX + 4AP to isolate monosynaptic events. Not being familiar with the optical photostimulation of axons to release transmitter locally, I was initially confused by statements such as "we found that oEPSC returned after application of TTX+4AP". This might be clear to someone performing these manipulations, but a bit more clarification would be helpful. Should I assume that an existing monosynaptic EPSC would be masked by co-occurring polysynaptic IPSCs, which disappear following application of TTX + 4AP, thereby unmasking the monosynaptic EPSC and causing the EPSC to "return" (a word that I am not sure works)? Continuing my confusion with these experiments, I am unsure how this cocktail of drugs is added, or if it is even added as a cocktail, which is what I initially assumed. The methods and the results do not make clear whether the drugs are added in sequence and why, or whether traces are recorded after the addition of both drugs or recorded for TTX and then again for TTX + 4AP. Finally, looking at the traces in the experimental figures (e.g. Figures 2F, 3F, 5F, and 6F), it is difficult, at least for me, to see what is being shown. First, the authors need to describe better in the results why they stimulate twice in short succession and why they seem to use the response to the second pulse (unless I am mistaken) to measure the monosynaptic event. Second, I was confused by the traces (which are very small) in the presence of TTX. I would have expected to see a response if there was a monosynaptic EPSC, but I only seem to see a flat line.

      The confusion that I list above might be due in part to my ignorance, but it is important in these types of papers not to assume too much expertise if you want readers with a less sophisticated understanding of synaptic physiology to understand the data. In other words, a little bit more clarity and hand-holding would be welcome.

      We understand the reviewer’s confusion about the methodology. In voltage clamp, the amplifier injects current through the electrode to hold the membrane voltage at -70 mV, which is near the equilibrium (reversal) potential for Cl-; therefore, the only synaptic current evoked by light stimulation is due to cation influx, mainly through AMPA receptors (see Fig. 1). Co-occurring polysynaptic IPSCs would therefore not be visible at this holding potential. We examine IPSCs by holding the membrane voltage at +10 mV (see Fig. 1). TTX blocks voltage-gated Na+ channels and therefore abolishes action-potential-dependent neurotransmission. We show the traces recorded in TTX to demonstrate that the currents recorded prior to TTX application were of synaptic origin, and not due to accidental expression of opsin in the patched cell. This step also ensures that any current visible after 4AP application is due to monosynaptic transmission and not to a failure of TTX application.
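      As an illustration of why the holding potential separates excitatory from inhibitory currents, the Nernst equation gives the Cl- reversal potential, and the driving force at each holding potential follows from it. The sketch below is a minimal Python example under stated assumptions: the Cl- concentrations (133 mM outside, 8 mM inside), the 32 °C recording temperature, and the ~0 mV reversal potential for AMPA receptor currents are illustrative values, not the solutions or conditions used in this study.

```python
import math

# Physical constants (SI units)
R = 8.314462618   # gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol


def nernst_mV(c_out_mM: float, c_in_mM: float, z: int, temp_c: float = 32.0) -> float:
    """Equilibrium (reversal) potential in mV for an ion of valence z."""
    temp_k = temp_c + 273.15
    return 1000.0 * (R * temp_k) / (z * F) * math.log(c_out_mM / c_in_mM)


def driving_force_mV(v_hold_mV: float, e_rev_mV: float) -> float:
    """Driving force (V_hold - E_rev); a value near zero means little net current."""
    return v_hold_mV - e_rev_mV


# Hypothetical concentrations: 133 mM Cl- outside, 8 mM Cl- inside (low-Cl- internal).
e_cl = nernst_mV(133.0, 8.0, z=-1)
e_ampa = 0.0  # assumed reversal potential of AMPA receptor currents

for v_hold in (-70.0, 10.0):
    print(f"V_hold = {v_hold:+.0f} mV: "
          f"Cl- driving force {driving_force_mV(v_hold, e_cl):+.1f} mV, "
          f"AMPA driving force {driving_force_mV(v_hold, e_ampa):+.1f} mV")
```

      With these assumed values E_Cl is about -74 mV, so at -70 mV the Cl- driving force is only a few mV while the AMPA driving force is about -70 mV; at +10 mV the balance reverses and outward IPSCs dominate.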

      After recording light-evoked responses in TTX, we then add 4AP, a blocker of voltage-gated K+ channels, including those in presynaptic terminals. This prevents the repolarization of the terminals that would otherwise follow opsin-mediated local depolarization. 4AP application therefore allows local opsin-driven depolarizations to reach the threshold for Ca2+-dependent vesicle release. This procedure selectively reveals, or unmasks, monosynaptic currents, because any neuron that is not monosynaptically connected would still need voltage-gated Na+ channels to effectively produce indirect neurotransmission onto the patched cell. TTX and 4AP application is the gold standard of opsin-assisted synaptic circuit interrogation, pioneered by the Svoboda lab in 2009 and widely used to assess monosynaptic connectivity in multiple brain circuits, as summarized in a recent review (Linders et al., 2022). We now include 2 more sentences near the beginning of the Results to clarify this process and directly point to the Linders review for researchers wanting a deeper explanation of this technique.
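      The decision logic described above can be summarized as a small classification function. This is a hypothetical sketch: the function name, the `Connection` categories, and the 5 pA detection threshold are illustrative assumptions, and real response detection would use proper statistical criteria rather than a single fixed cutoff.

```python
from enum import Enum


class Connection(Enum):
    NONE = "no light-evoked response"
    DIRECT_PHOTOCURRENT = "response persists in TTX (possible opsin in the patched cell)"
    MONOSYNAPTIC = "abolished by TTX, restored by TTX + 4AP"
    POLYSYNAPTIC = "abolished by TTX, not restored by TTX + 4AP"


def classify_connection(baseline_pA: float, ttx_pA: float, ttx_4ap_pA: float,
                        threshold_pA: float = 5.0) -> Connection:
    """Classify a light-evoked input from peak amplitudes in three sequential conditions.

    Amplitudes are peak currents in pA (sign ignored); threshold_pA is a
    hypothetical detection cutoff.
    """
    def present(amplitude_pA: float) -> bool:
        return abs(amplitude_pA) >= threshold_pA

    if not present(baseline_pA):
        return Connection.NONE
    if present(ttx_pA):
        # TTX should abolish action-potential-dependent release; a surviving
        # current suggests a direct photocurrent rather than a synaptic one.
        return Connection.DIRECT_PHOTOCURRENT
    if present(ttx_4ap_pA):
        # 4AP lets local opsin-driven depolarization trigger release from the
        # stimulated terminals themselves, so a restored current indicates a
        # monosynaptic input.
        return Connection.MONOSYNAPTIC
    # Synaptic at baseline, but no evidence of direct release from the
    # stimulated terminals: attributed to polysynaptic transmission.
    return Connection.POLYSYNAPTIC


print(classify_connection(-120.0, -1.0, -40.0).name)  # prints MONOSYNAPTIC
```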

      The double stimulation is unrelated to our testing of monosynaptic connections. We originally conducted the experiments by delivering 2 pulses of light separated by 50 ms, a common way to examine the paired-pulse ratio (PPR), a physiological measure used to probe synapses for short-term plasticity and release probability. However, through discussions with colleagues we realized that the slow decay time of eGtACR1 may complicate interpretation of the response to the second light pulse. Thus, we elected not to report these results and indicated this in the Methods section: “We calculated the paired-pulse ratio (PPR) as the amplitude of the second peak divided by the amplitude of the first peak elicited by the twin stimuli; however, due to the slow kinetics of eGtACR1 the results would be difficult to interpret, and therefore we are not currently reporting them.”
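      For reference, the paired-pulse ratio as defined above can be computed as in the following minimal Python sketch. The function name, the 5 ms pre-pulse baseline and 20 ms peak-search windows, the assumption of inward (negative) currents, and the convention of measuring each peak from the baseline just before its own pulse are all illustrative assumptions, not the analysis settings used in this study.

```python
def mean(values):
    return sum(values) / len(values)


def paired_pulse_ratio(trace_pA, fs_hz, pulse_times_s, baseline_s=0.005, window_s=0.02):
    """Return PPR = amplitude(peak 2) / amplitude(peak 1) for inward currents.

    trace_pA: sequence of current samples (pA); fs_hz: sampling rate (Hz);
    pulse_times_s: onset times (s) of the two light pulses.
    """
    if len(pulse_times_s) != 2:
        raise ValueError("expected exactly two pulse times")
    n_base = int(round(baseline_s * fs_hz))
    n_win = int(round(window_s * fs_hz))
    amplitudes = []
    for t in pulse_times_s:
        i0 = int(round(t * fs_hz))
        baseline = mean(trace_pA[i0 - n_base:i0])  # local pre-pulse baseline
        peak = min(trace_pA[i0:i0 + n_win])        # most negative (inward) point
        amplitudes.append(baseline - peak)         # positive amplitude
    return amplitudes[1] / amplitudes[0]


# Synthetic example: two square inward responses 50 ms apart, sampled at 10 kHz.
fs = 10_000
trace = [0.0] * 2000
trace[500:600] = [-100.0] * 100
trace[1000:1100] = [-80.0] * 100
print(paired_pulse_ratio(trace, fs, [0.05, 0.10]))  # prints 0.8
```

      If the first response has not decayed back to baseline 50 ms later, as can happen with a slowly decaying current, the second amplitude depends on whether it is measured from the local baseline (as here) or from the pre-stimulus baseline; this is one way in which slow kinetics can complicate interpretation of the second pulse.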

      (2) Suggestions for improving summary figures:

      Summary Figure 1a: The circuit diagram (schematic to the right of 1a) is OK, but I initially found it a bit difficult to interpret. For example, it is not clear why pink RA-projecting neurons don't reach as far to the right as X- or Av-projecting neurons, suggesting that they are not really projection neurons. Also, the big question marks in the intermediate zone are not entirely intuitive. It seems there might be a better way of representing this. It might also be worth stating in the figure legend that the interconnectivity patterns shown in the figure between PNs in HVC are based on specific prior studies.

      We thank the reviewer for the constructive criticism. We have modified the figure to extend the RA projection line and mentioned in the figure legend that connectivity between PNs is based on prior studies.

      Summary Figure 1a: I am not sure I love this figure. There are a few minor issues. First, there are too many browns [NIf/Av and mMAN], which makes it more challenging to clearly disambiguate the different projections. Second, it is unclear why this figure does not represent projections from RA to HVC. My biggest concern with this figure is that it oversimplifies some of the findings. From the figure, one gets the impression that Uva only projects to RA-PNs and that Av only projects to X-PNs, even though the authors show connections to other PNs. With the small sample size in the current study for each projection and each PN type, one really cannot conclude that these "minority" projections are unimportant. I therefore suggest that the authors qualitatively represent the strength/probability of connections by weighting the thickness of afferent connections.

      We assume the reviewer is commenting on our summary figure panel 7B. We agree with the referee that this is a simplified representation of our findings. We had indeed indicated in the legend that this was just a “Schematic of the HVC afferent connectivity map resulting from the present work” and that “For conceptualization purposes, afferent connectivity to HVC-PNs is shown only when the rate of monosynaptic connectivity reaches 50% of neurons examined”. We have added a title to highlight that this is but a simplification, and we have adjusted the colors to make the figure easier to follow. Based on the reviewers' critique, we searched for a better method of summarizing the complex connectivity patterns described in this research and settled on a Sankey diagram of connectivity, which is now Figure 7C. In this diagram, we are able to show the proportion of connections from each input pathway onto each class of neuron and whether these connections are poly- or monosynaptic. We find this a straightforward way of displaying all of the connectivity patterns identified in Figures 2-3 and 5-6, and we look forward to learning whether the reviewers find it a useful way of illustrating our findings.

      Minor points:

      (1) Line 50 - typo - song circuits.

      Thank you for catching this.

      (2) Lines 106-111: The findings suggest that 100% of Uva projections onto HVCRA neurons are monosynaptic. However, because the authors only tested 6 neurons, their statements that their findings are so different from those of other studies should be somewhat tempered, since these other studies (e.g. Moll et al.) looked at 251 neurons in HVC and sampling bias could still partly explain the difference.

      We observed oEPSCs in 43 of 51 (84.3%) HVC-RA neurons recorded (mean rise time = 2.4 ms) and monosynaptic connections onto 100% of the HVC-RA neurons tested (n = 6). Moll et al. combined electrical stimulation of Uva with two-photon calcium imaging (GCaMP6s) of putative HVC-RA neurons (n = 251 neurons). We note that these are putative HVC-RA neurons because they were not visually identified using retrograde tracing or another molecular handle. They report that only ~16% of HVC-RA neurons showed reliable calcium responses following Uva stimulation. Although the experiments by Moll et al. are technically impressive, calcium imaging is an insensitive technique for measuring post-synaptic responses, particularly subthreshold responses, compared with whole-cell patch-clamp recordings. This approach cannot identify monosynaptic connections and is likely sensitive only to suprathreshold activity, which likely relies on the recruitment of other polysynaptic inputs onto HVC neurons. Furthermore, as indicated in the Discussion, our opsin-mediated synaptic interrogation recruits any eGtACR1+ Uva terminal in the slice and is therefore highly likely to reveal any existing connections.

      A limitation of whole-cell patch-clamp recording is that it is a laborious, low-throughput technique. Future experiments using better imaging approaches, such as voltage imaging, may be able to weigh in on the differences between what we report here, using whole-cell patch-clamp recordings from visually identified HVC-RA neurons combined with optogenetic manipulation of Uva terminals, and the calcium imaging results reported by Moll et al. Nonetheless, whole-cell patch-clamp recording combined with optogenetic manipulation is likely to remain the most sensitive method for identifying synaptic connectivity.

      (3) Figure 2G - the significance of white circles is not clear.

      The figure legend indicates that the white circles mark the position of “retrogradely labeled HVC-projecting neurons in Uva (cyan, white circles)” to facilitate identification of colocalization with the in-situ markers.

      (4) Line 135 - Cardin et al. (J. Neurophys. 2004) is the first to show that song production does not require Nif.

      We thank the reviewer for pointing this out, and we have cited this important study.

      (5) Line 183 - This is a confusing sentence because I initially thought that mMAN-mMANHVC PNs was a category!

      We switched the dash with a colon.

      (6) Figure 4d could use some arrows to identify what is shown. It is assumed that the box represents mMAN. Should it be assumed that Av is not in the plane of this section? If not, this should be stated in the legend. It is also unclear where the anterograde projections are. Is this the dark highway that goes from the box to the dorsal surface? If so, this should be indicated, but it should also be made clear why the projections go in both the dorsal and the ventral directions.

      The inset, as indicated by the lines around it, is a magnification of the terminal fields in Av. We added an explanation of the inset.

      (7) Discussion. In the introduction, the authors mention projections from RA to HVC but never end up studying them in the current manuscript which seems like a missed opportunity and perhaps even a weakness of the study. In the discussion, it would certainly be good for the authors to at least discuss the possible significance of these projections and perhaps why they decided not to study them.

      We thank the reviewer for the comment. Unfortunately, we couldn’t reliably evoke interpretable currents from RA, and we elected to publish the current version of the paper with these 4 major inputs. Nonetheless, we have indicated in the Introduction and in the Discussion that more inputs (e.g. RA, A11, NCM) remain to be evaluated. 

      (8) Line 622 - Is this reference incomplete?

      We thank the reviewer. We have corrected the reference.

      • Ben-Tov, M., F. Duarte and R. Mooney (2023). "A neural hub for holistic courtship displays." Curr Biol 33(9): 1640-1653 e1645.

      • Bliss, T. V. and A. R. Gardner-Medwin (1973). "Long-lasting potentiation of synaptic transmission in the dentate area of the unanaestetized rabbit following stimulation of the perforant path." J Physiol 232(2): 357-374.

      • Bliss, T. V. and T. Lomo (1973). "Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path." J Physiol 232(2): 331-356.

      • Foster, E. F. and S. W. Bottjer (2001). "Lesions of a telencephalic nucleus in male zebra finches: Influences on vocal behavior in juveniles and adults." J Neurobiol 46(2): 142-165.

      • Koparkar, A., T. L. Warren, J. D. Charlesworth, S. Shin, M. S. Brainard and L. Veit (2024). "Lesions in a songbird vocal circuit increase variability in song syntax." Elife 13.

      • Linders, L. E., L. F. Supiot, W. Du, R. D'Angelo, R. A. H. Adan, D. Riga and F. J. Meye (2022). "Studying Synaptic Connectivity and Strength with Optogenetics and Patch-Clamp Electrophysiology." Int J Mol Sci 23(19).

      • Liu, H. N., T. Kurotani, M. Ren, K. Yamada, Y. Yoshimura and Y. Komatsu (2004). "Presynaptic activity and Ca2+ entry are required for the maintenance of NMDA receptor-independent LTP at visual cortical excitatory synapses." J Neurophysiol 92(2): 1077-1087.

      • Louder, M. I. M., M. Kuroda, D. Taniguchi, J. A. Komorowska-Muller, Y. Morohashi, M. Takahashi, M. Sanchez-Valpuesta, K. Wada, Y. Okada, H. Hioki and Y. Yazaki-Sugiyama (2024). "Transient sensorimotor projections in the developmental song learning period." Cell Rep 43(5): 114196.

      • Pastalkova, E., P. Serrano, D. Pinkhasova, E. Wallace, A. A. Fenton and T. C. Sacktor (2006). "Storage of spatial information by the maintenance mechanism of LTP." Science 313(5790): 1141-1144.

      • Petreanu, L., T. Mao, S. M. Sternson and K. Svoboda (2009). "The subcellular organization of neocortical excitatory connections." Nature 457(7233): 1142-1145.

      • Roberts, T. F., S. M. Gobes, M. Murugan, B. P. Olveczky and R. Mooney (2012). "Motor circuits are required to encode a sensory model for imitative learning." Nat Neurosci 15(10): 1454-1459.

      • Roberts, T. F., E. Hisey, M. Tanaka, M. G. Kearney, G. Chattree, C. F. Yang, N. M. Shah and R. Mooney (2017). "Identification of a motor-to-auditory pathway important for vocal learning." Nat Neurosci 20(7): 978-986.

      • Roberts, T. F., M. E. Klein, M. F. Kubke, J. M. Wild and R. Mooney (2008). "Telencephalic neurons monosynaptically link brainstem and forebrain premotor networks necessary for song." J Neurosci 28(13): 3479-3489.

      • Trusel, M., A. Cavaccini, M. Gritti, B. Greco, P. P. Saintot, C. Nazzaro, M. Cerovic, I. Morella, R. Brambilla and R. Tonini (2015). "Coordinated Regulation of Synaptic Plasticity at Striatopallidal and Striatonigral Neurons Orchestrates Motor Control." Cell Rep 13(7): 1353-1365.

      • Trusel, M., A. Nuno-Perez, S. Lecca, H. Harada, A. L. Lalive, M. Congiu, K. Takemoto, T. Takahashi, F. Ferraguti and M. Mameli (2019). "Punishment-Predictive Cues Guide Avoidance through Potentiation of Hypothalamus-to-Habenula Synapses." Neuron 102(1): 120-127.e124.

      • Vates, G. E., B. M. Broome, C. V. Mello and F. Nottebohm (1996). "Auditory pathways of caudal telencephalon and their relation to the song system of adult male zebra finches." J Comp Neurol 366(4): 613-642.

      • Xu, T., X. Yu, A. J. Perlik, W. F. Tobin, J. A. Zweig, K. Tennant, T. Jones and Y. Zuo (2009). "Rapid formation and selective stabilization of synapses for enduring motor memories." Nature 462(7275): 915-919.

      • Zhao, W., F. Garcia-Oscos, D. Dinh and T. F. Roberts (2019). "Inception of memories that guide vocal learning in the songbird." Science 366: 83-89.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Wang et al. recorded concurrent EEG-fMRI in 107 participants during nocturnal NREM sleep to investigate brain activity and connectivity related to slow oscillations (SO), sleep spindles, and in particular their co-occurrence. The authors found SO-spindle coupling to be correlated with increased thalamic and hippocampal activity, and with increased functional connectivity from the hippocampus to the thalamus and from the thalamus to the neocortex, especially the medial prefrontal cortex (mPFC). They concluded that the brain-wide activation pattern resembles episodic memory processing but is dissociated from task-related processing, and suggested that the thalamus plays a crucial role in coordinating the hippocampal-cortical dialogue during sleep.

      The paper offers an impressively large and highly valuable dataset that provides the opportunity to gain important new insights into the network substrate involved in SOs, spindles, and their coupling. However, the paper unfortunately does not exploit the full potential of this dataset with the analyses currently provided, and the interpretation of the results is often not backed up by the results presented. I have the following specific comments.

      Thank you for your thoughtful and constructive feedback. We greatly appreciate your recognition of the strengths of our dataset and findings. Below, we address your specific comments and respond to each point you raised to ensure our methods and results are as transparent and comprehensible as possible. We hope these revisions address your comments and further strengthen our manuscript. Thank you again for the constructive feedback.

      (1) The introduction is lacking sufficient review of the already existing literature on EEG-fMRI during sleep and the BOLD-correlates of slow oscillations and spindles in particular (Laufs et al., 2007; Schabus et al., 2007; Horovitz et al., 2008; Laufs, 2008; Czisch et al., 2009; Picchioni et al., 2010; Spoormaker et al., 2010; Caporro et al., 2011; Bergmann et al., 2012; Hale et al., 2016; Fogel et al., 2017; Moehlman et al., 2018; Ilhan-Bayrakci et al., 2022). The few studies mentioned are not discussed in terms of the methods used or insights gained.

      We acknowledge the need for a more comprehensive review of prior EEG-fMRI studies investigating BOLD correlates of slow oscillations and spindles. However, not all of these articles are related to sleep SOs or spindles. Several (Hale et al., 2016; Horovitz et al., 2008; Laufs, 2008; Laufs, Walker, & Lund, 2007; Spoormaker et al., 2010) mainly focus on EEG-fMRI methodology, sleep stages, or brain networks, which are not the focus of our study. Thank you again for your attention to the comprehensiveness of our literature review; we will expand the introduction to include a more detailed discussion of the existing literature, ensuring that the contributions of previous EEG-fMRI sleep studies are adequately acknowledged.

      Introduction, Page 4 Lines 62-76

      “Investigating these sleep-related neural processes in humans is challenging because it requires tracking transient sleep rhythms while simultaneously assessing their widespread brain activation. Recent advances in simultaneous EEG-fMRI techniques provide a unique opportunity to explore these processes. EEG allows for precise event-based detection of neural signals, while fMRI provides insight into the broader spatial patterns of brain activation and functional connectivity (Horovitz et al., 2008; Huang et al., 2024; Laufs, 2008; Laufs, Walker, & Lund, 2007; Schabus et al., 2007; Spoormaker et al., 2010). Previous EEG-fMRI studies on sleep have focused on classifying sleep stages or examining the neural correlates of specific waves (Bergmann et al., 2012; Caporro et al., 2012; Czisch et al., 2009; Fogel et al., 2017; Hale et al., 2016; Ilhan-Bayrakcı et al., 2022; Moehlman et al., 2019; Picchioni et al., 2011). These studies have generally reported that slow oscillations are associated with widespread cortical and subcortical BOLD changes, whereas spindles elicit activation in the thalamus, as well as in several cortical and paralimbic regions. Although these findings provide valuable insights into the BOLD correlates of sleep rhythms, they often do not employ sophisticated temporal modeling (Huang et al., 2024) to capture the dynamic interactions between different oscillatory events, e.g., the coupling between SOs and spindles.”

      (2) The paper falls short in discussing the specific insights gained into the neurobiological substrate of the investigated slow oscillations, spindles, and their interactions. The validity of the inverse inference approach ("Open ended cognitive state decoding"), which assumes that certain cognitive functions are related to these oscillations because of the brain regions/networks activated in temporal association with these events, is debatable at best. It is also unclear why eventually only episodic memory processing-like brain-wide activation is discussed further, even though 16 of 50 feature terms from the NeuroSynth v3 dataset were significant (episodic memory, declarative memory, working memory, task representation, language, learning, faces, visuospatial processing, category recognition, cognitive control, reading, cued attention, inhibition, and action).

      Thank you for pointing this out, particularly regarding the use of inverse inference approaches such as “open-ended cognitive state decoding.” Given the concerns about the indirectness of this approach, we decided to move the related content and results from Figure 3 in the main text to Supplementary Figure 7. We will refocus the main text on direct neurobiological insights gained from our EEG-fMRI analyses, particularly emphasizing the hippocampal-thalamocortical network dynamics underlying SO-spindle coupling, and we will acknowledge the exploratory nature of these findings and highlight their limitations.

      Discussion, Page 17-18 Lines 323-332

      “To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      (3) Hippocampal activation during SO-spindles is stated as a main hypothesis of the paper - for good reasons - however, other regions (e.g., several cortical as well as thalamic) would be equally expected given the known origin of both oscillations and the existing sleep-EEG-fMRI literature. However, this focus on the hippocampus contrasts with the focus on investigating the key role of the thalamus instead in the Results section.

      We appreciate your insight regarding the relative emphasis on hippocampal and thalamic activation in our study. We recognize that the manuscript may currently present an inconsistency between our initial hypothesis and the main focus of the results. To address this concern, we will ensure that our Introduction and Discussion sections explicitly discuss both regions, highlighting the complementary roles of the hippocampus (memory processing and reactivation) and the thalamus (spindle generation and cortico-hippocampal coordination) in SO-spindle dynamics.

      Introduction, Page 5 Lines 87-103

      “To address this gap, our study investigates brain-wide activation and functional connectivity patterns associated with SO-spindle coupling, and employs a cognitive state decoding approach (Margulies et al., 2016; Yarkoni et al., 2011)—albeit indirectly—to infer potential cognitive functions. In the current study, we used simultaneous EEG-fMRI recordings during nocturnal naps (detailed sleep staging results are provided in the Methods and Table S1) in 107 participants. Although directly detecting hippocampal ripples using scalp EEG or fMRI is challenging, we expected that hippocampal activation in fMRI would coincide with SO-spindle coupling detected by EEG, given that SOs, spindles, and ripples frequently co-occur during NREM sleep. We also anticipated a critical role of the thalamus, particularly thalamic spindles, in coordinating hippocampal-cortical communication.

      We found significant coupling between SOs and spindles during NREM sleep (N2/3), with spindle peaks occurring slightly before the SO peak. This coupling was associated with increased activation in both the thalamus and hippocampus, with functional connectivity patterns suggesting thalamic coordination of hippocampal-cortical communication. These findings highlight the key role of the thalamus in coordinating hippocampal-cortical interactions during human sleep and provide new insights into the neural mechanisms underlying sleep-dependent brain communication. A deeper understanding of these mechanisms may contribute to future neuromodulation approaches aimed at enhancing sleep-dependent cognitive function and treating sleep-related disorders.”

      Discussion, Page 16-17 Lines 292-307

      “When modeling the timing of these sleep rhythms in the fMRI, we observed hippocampal activation selectively during SO-spindle events. This suggests the possibility of triple coupling (SOs–spindles–ripples), even though our scalp EEG was not sufficiently sensitive to detect hippocampal ripples—key markers of memory replay (Buzsáki, 2015). Recent iEEG evidence indicates that ripples often co-occur with both spindles (Ngo, Fell, & Staresina, 2020) and SOs (Staresina et al., 2015; Staresina et al., 2023). Therefore, the hippocampal involvement during SO-spindle events in our study may reflect memory replay from the hippocampus, propagated via thalamic spindles to distributed cortical regions.

      The thalamus, known to generate spindles (Halassa et al., 2011), plays a key role in producing and coordinating sleep rhythms (Coulon, Budde, & Pape, 2012; Crunelli et al., 2018), while the hippocampus is essential for memory consolidation (Buzsáki, 2015; Diba & Buzsáki, 2007; Singh, Norman, & Schapiro, 2022). The increased hippocampal and thalamic activity, along with strengthened connectivity between these regions and the mPFC during SO-spindle events, underscores a hippocampal-thalamic-neocortical information flow. This aligns with recent findings suggesting the thalamus orchestrates neocortical oscillations during sleep (Schreiner et al., 2022). The thalamus and hippocampus thus appear central to memory consolidation during sleep, guiding information transfer to the neocortex, e.g., mPFC.”

      (4) The study included an impressive number of 107 subjects. It is surprising, though, that only 31 subjects had to be excluded under these difficult recording conditions, especially since no adaptation night was performed. Since only subjects were excluded who slept less than 10 min (or had excessive head movements), there are likely several datasets included with comparatively short durations and only a small number of SOs and spindles, and even fewer combined SO-spindle events. A comprehensive table should be provided (supplement) including for each subject (included and excluded) the duration of included NREM sleep, number of SOs, spindles, and SO+spindle events. Also, some descriptive statistics (mean/SD/range) would be helpful.

      We appreciate your recognition of our sample size and the challenges associated with simultaneous EEG-fMRI sleep recordings. We acknowledge the importance of transparently reporting individual subject data, particularly regarding sleep duration and the number of detected SOs, spindles, and SO-spindle events. To address this, we will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics (Table S1), as well as detailed information about sleep waves at each sleep stage for all 107 subjects (Tables S2-S4), listing for each subject: (1) the duration of each sleep stage; (2) the number of detected SOs; (3) the number of detected spindles; (4) the number of detected SO-spindle coupling events; (5) the density of detected SOs; (6) the density of detected spindles; and (7) the density of detected SO-spindle coupling events.

      However, most of the excluded participants were unable to fall asleep or slept too briefly to reach NREM sleep. Because they had essentially no NREM sleep period, their NREM sleep duration and SO, spindle, and coupling counts could not be computed.

      Supplementary Materials, Page 42-54, Table S1-S4
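      To make the density columns concrete, the sketch below computes per-subject event densities (events per minute of N2/N3 sleep) and the mean/SD/range summary the reviewer requested. It is a minimal illustration, not the authors' analysis code: the `SubjectCounts` record, the function names, and the example numbers are hypothetical placeholders, and density is assumed to be defined as event count divided by scored N2/N3 minutes. Subjects without scored NREM sleep return `None` rather than a density, mirroring the situation described above for most excluded participants.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class SubjectCounts:
    # Hypothetical per-subject record; field names are illustrative only.
    subject_id: str
    n23_minutes: float  # scored N2 + N3 duration in minutes (assumed definition)
    n_so: int
    n_spindle: int
    n_coupled: int


def densities(s: SubjectCounts) -> dict:
    """Events per minute of N2/N3 sleep; None when no N2/N3 sleep was scored."""
    if s.n23_minutes <= 0:
        return {"so": None, "spindle": None, "coupled": None}
    return {
        "so": s.n_so / s.n23_minutes,
        "spindle": s.n_spindle / s.n23_minutes,
        "coupled": s.n_coupled / s.n23_minutes,
    }


def describe(values: list) -> dict:
    """Mean, SD, and range, skipping subjects without a defined density."""
    vals = [v for v in values if v is not None]
    return {
        "n": len(vals),
        "mean": mean(vals),
        "sd": stdev(vals) if len(vals) > 1 else 0.0,
        "min": min(vals),
        "max": max(vals),
    }


# Placeholder values for illustration only (not study data).
subjects = [
    SubjectCounts("s01", 20.0, 50, 60, 40),
    SubjectCounts("s02", 40.0, 120, 90, 100),
    SubjectCounts("s03", 0.0, 0, 0, 0),  # no NREM sleep: density undefined
]
so_density = [densities(s)["so"] for s in subjects]
print(describe(so_density))
```

      Returning `None` instead of zero keeps subjects without NREM sleep from pulling the group mean toward zero; they are reported but excluded from the density summary.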

      (5) Was the 20-channel head coil dedicated for EEG-fMRI measurements? How were the electrode cables guided through/out of the head coil? Usually, the 64-channel head coil is used for EEG-fMRI measurements in a Siemens PRISMA 3T scanner, which has a cable duct at the back that allows the cables to be guided straight out of the head coil (to minimize MR-related artifacts). The choice of the 20-channel head coil should be motivated. Photos of the recording setup would also be helpful.

      Thank you for your comment regarding our choice of the 20-channel head coil for EEG-fMRI measurements. We acknowledge that the 64-channel head coil is commonly used in Siemens PRISMA 3T scanners; however, the 20-channel coil was selected due to specific practical and technical considerations in our study. In particular, the 20-channel head coil was compatible with our EEG system and ensured sufficient signal-to-noise ratio (SNR) for both EEG and fMRI acquisition. The EEG electrode cables were guided through the lateral and posterior openings of the head coil, secured with foam padding to reduce motion and minimize MR-related artifacts. Moreover, given the extended nature of nocturnal sleep recordings, the 20-channel coil allowed us to maintain participant comfort while still achieving high-quality simultaneous EEG-fMRI data.

      We have made this clearer in the revised manuscript. 

      Methods, Page 20 Lines 385-392

      “All MRI data were acquired using a 20-channel head coil on a research-dedicated 3-Tesla Siemens Magnetom Prisma MRI scanner. Earplugs and cushions were provided for noise protection and head motion restriction. We chose the 20-channel head coil because it was compatible with our EEG system and ensured sufficient signal-to-noise ratio (SNR) for both EEG and fMRI acquisition. The EEG electrode cables were guided through the lateral and posterior openings of the head coil, secured with foam padding to reduce motion and minimize MR-related artifacts. Moreover, given the extended nature of nocturnal sleep recordings, the 20-channel coil helped maintain participant comfort while still achieving high-quality simultaneous EEG-fMRI data.”

      (6) Was the EEG sampling synchronized to the MR scanner (gradient system) clock (the 10 MHz signal; not referring to the volume TTL triggers here)? This is a requirement for stable gradient artifact shape over time and thus accurate gradient noise removal.

      Thank you for raising this important point. We confirm that the EEG sampling was synchronized to the MR scanner’s 10 MHz gradient system clock, ensuring a stable gradient artifact shape over time and enabling accurate artifact removal. This synchronization was achieved using the standard clock synchronization interface of the EEG amplifier, minimizing timing jitter and drift. As a result, the gradient artifact waveform remained stable across volumes, allowing for more effective artifact correction during preprocessing. We appreciate your attention to this critical aspect of EEG-fMRI data acquisition.

      We have made this clearer in the revised manuscript. 

      Methods, Page 19-20 Lines 371-383

      “EEG was recorded simultaneously with fMRI data using an MR-compatible EEG amplifier system (BrainAmps MR-Plus, Brain Products, Germany), along with a specialized electrode cap. The recording was done using 64 channels in the international 10/20 system, with the reference channel positioned at FCz. In order to adhere to polysomnography (PSG) recording standards, six electrodes were removed from the EEG cap: one for electrocardiogram (ECG) recording, two for electrooculogram (EOG) recording, and three for electromyogram (EMG) recording. EEG data was recorded at a sample rate of 5000 Hz, the resistance of the reference and ground channels was kept below 10 kΩ, and the resistance of the other channels was kept below 20 kΩ. To synchronize the EEG and fMRI recordings, the BrainVision recording software (BrainProducts, Germany) was utilized to capture triggers from the MRI scanner. The EEG sampling was synchronized to the MR scanner’s 10 MHz gradient system clock, ensuring a stable gradient artifact shape over time and enabling accurate artifact removal. This was achieved via the standard clock synchronization interface of the EEG amplifier, minimizing timing jitter and drift.”

      (7) The TR is quite long and the voxel size is quite large in comparison to state-of-the-art EPI sequences. What was the rationale behind choosing a sequence with relatively low temporal and spatial resolution?

      We acknowledge that our chosen TR and voxel size are relatively long and large compared to state-of-the-art EPI sequences. This decision was made to optimize the signal-to-noise ratio (SNR) and reduce susceptibility-related distortions, which are particularly critical in EEG-fMRI sleep studies where head motion and physiological noise can be substantial. A longer TR allowed us to sample whole-brain activity with sufficient coverage, while a larger voxel size helped enhance BOLD sensitivity and minimize partial volume effects in deep brain structures such as the thalamus and hippocampus, which are key regions of interest in our study. We appreciate your concern and hope this clarification provides sufficient rationale for our sequence parameters.

      We have made this clearer in the revised manuscript. 

      Methods, Page 20-21 Lines 398-408

      “Then, the “sleep” session began after the participants were instructed to try and fall asleep. For the functional scans, whole-brain images were acquired using a k-space and steady-state T2*-weighted gradient echo-planar imaging (EPI) sequence that is sensitive to the BOLD contrast, which measures local magnetic changes caused by changes in blood oxygenation that accompany neural activity (sequence specification: 33 slices in interleaved ascending order, TR = 2000 ms, TE = 30 ms, voxel size = 3.5 × 3.5 × 4.2 mm³, FA = 90°, matrix = 64 × 64, gap = 0.7 mm). A relatively long TR and larger voxel size were chosen to optimize SNR and reduce susceptibility-related distortions, which are critical in EEG-fMRI sleep studies where head motion and physiological noise can be substantial. The longer TR allowed whole-brain coverage with sufficient temporal resolution, while the larger voxel size helped enhance BOLD sensitivity and minimize partial volume effects in deep brain structures (e.g., the thalamus and hippocampus), which are key regions of interest in this study.”

      (8) The anatomically defined ROIs are quite large. It should be elaborated on how this might reduce sensitivity to sleep rhythm-specific activity within sub-regions, especially for the thalamus, which has distinct nuclei involved in sleep functions.

      We appreciate your insight regarding the use of anatomically defined ROIs and their potential limitations in detecting sleep rhythm-specific activity within sub-regions, particularly in the thalamus. Given the distinct functional roles of thalamic nuclei in sleep processes, we acknowledge that using a single, large thalamic ROI may reduce sensitivity to localized activity patterns. To address this, we will discuss this limitation in the revised manuscript, acknowledging that our approach prioritizes whole-structure effects but may not fully capture nucleus-specific contributions.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (9) The study reports SO & spindle amplitudes & densities, as well as SO+spindle coupling, to be larger during N2/3 sleep compared to N1 and REM sleep, which is trivial but can be seen as a sanity check of the data. However, the amount of SOs and spindles reported for N1 and REM sleep is concerning, as per definition there should be hardly any (if SOs or spindles occur in N1 it becomes by definition N2, and the interval between spindles has to be considerably large in REM to still be scored as such). Thus, on the one hand, the report of these comparisons takes too much space in the main manuscript as it is trivial, but on the other hand, it raises concerns about the validity of the scoring.

      We appreciate your concern regarding the reported presence of SOs and spindles in N1 and REM sleep and its potential implications. Our methods for detecting SOs, spindles, and their coupling were originally designed only for N2 and N3 sleep data, based on the characteristics of the data itself, and they are widely recognized and used in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because the SO and spindle detection methods are percentile-based, they will always detect a certain number of events when applied to data from other sleep stages (N1 and REM), and how these events differ from those detected in N2/N3 remains unclear. We will explain the reasons for these results in the Methods section and emphasize that they are used only as sanity checks.

      Methods, Page 25 Lines 515-524

      “We note that the above methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).”
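      The point about percentile-based thresholds can be illustrated with a simplified detector. The sketch below is hypothetical and is not the authors' pipeline: it flags runs where a smoothed amplitude envelope exceeds the 75th percentile of that same recording for 0.5 to 3 s (illustrative values chosen here, not taken from the Methods), and the "stages" are synthetic white noise with no real oscillations. Because the threshold is relative, a quarter of the samples lie above it in any recording, whether the input is high-amplitude or low-amplitude noise, so candidate events are always generated; only the duration criteria and the underlying data determine how many survive.

```python
import numpy as np


def smoothed_envelope(x: np.ndarray, fs: float, win_s: float = 0.2) -> np.ndarray:
    """Moving average of |x|; a crude stand-in for a band-limited RMS envelope."""
    n = max(1, int(round(win_s * fs)))
    kernel = np.ones(n) / n
    return np.convolve(np.abs(x), kernel, mode="same")


def detect_events(envelope, fs, pct=75.0, min_dur=0.5, max_dur=3.0):
    """Return (start, stop) sample indices of supra-threshold runs of allowed duration.

    The threshold is a percentile of the envelope itself, so it adapts to the
    overall amplitude of whatever data it is given.
    """
    thr = np.percentile(envelope, pct)
    above = (envelope > thr).astype(int)
    # Pad with zeros so every run has a rising (+1) and falling (-1) edge.
    edges = np.diff(np.concatenate(([0], above, [0])))
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)  # exclusive stop index
    events = [
        (int(s), int(e))
        for s, e in zip(starts, stops)
        if min_dur <= (e - s) / fs <= max_dur
    ]
    return events, thr


rng = np.random.default_rng(0)
fs = 100.0
n_samples = int(600 * fs)  # 10 min of synthetic data per "stage"
stages = {
    "N2-like": 5.0 * rng.standard_normal(n_samples),   # large-amplitude noise
    "REM-like": 1.0 * rng.standard_normal(n_samples),  # small-amplitude noise
}
for name, sig in stages.items():
    env = smoothed_envelope(sig, fs)
    events, thr = detect_events(env, fs)
    frac_above = float(np.mean(env > thr))
    print(f"{name}: threshold={thr:.2f}, fraction above={frac_above:.3f}, events={len(events)}")
```

      The threshold for the low-amplitude recording is simply scaled down, which is why a fixed, stage-independent criterion (or thresholds estimated only from N2/N3 data) is needed before event counts can be compared across stages.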

      (10) Why was electrode F3 used to quantify the occurrence of SOs and spindles? Why not a midline frontal electrode like Fz (or a number of frontal electrodes for SOs) and Cz (or a number of centroparietal electrodes) for spindles to be closer to their maximum topography?

      We appreciate your suggestion regarding electrode selection for SO and spindle quantification. Our choice of F3 was primarily based on previous studies (Massimini et al., 2004; Mölle et al., 2011), where bilateral frontal electrodes are commonly used for detecting SOs and spindles. Additionally, we considered the impact of MRI-related noise and, after a comprehensive evaluation, determined that F3 provided an optimal balance between signal quality and artifact minimization. We also acknowledge that alternative electrode choices, such as Fz for SOs and Cz for spindles, could provide additional insights into their topographical distributions.

      (11) Functional connectivity (hippocampus -> thalamus -> cortex (mPFC)) is reported to be increased during SO-spindle coupling and interpreted as evidence for coordination of hippocampo-neocortical communication likely by thalamic spindles. However, functional connectivity was only analysed during coupled SO+spindle events, not during isolated SOs or isolated spindles. Without the direct comparison of the connectivity patterns between these three events, it remains unclear whether this is specific for coupled SO+spindle events or rather associated with one or both of the other isolated events. The PPIs need to be conducted for those isolated events as well and compared statistically to the coupled events.

      We appreciate your critical perspective on our functional connectivity analysis and the interpretation of hippocampus-thalamus-cortex (mPFC) interactions during SO-spindle coupling. We acknowledge that, in the current analysis, functional connectivity was only examined during coupled SO-spindle events, without direct comparison to isolated SOs or isolated spindles. To address this concern, we have conducted PPI analyses for all three ROIs (hippocampus, thalamus, and mPFC) and all three event types (SO-spindle couplings, isolated SOs, and isolated spindles). Our results indicate that neither isolated SOs nor isolated spindles yielded significant connectivity changes in any of the three ROIs, as all failed to survive multiple comparison corrections. This suggests that the observed connectivity increase is specific to SO-spindle coupling, rather than being independently driven by either SOs or spindles alone.

      Results, Page 14 Lines 248-255

      “Crucially, the interaction between FC and SO-spindle coupling revealed that only the functional connectivity of hippocampus -> thalamus (ROI analysis, t(106) = 1.86, p = 0.0328) and thalamus -> mPFC (ROI analysis, t(106) = 1.98, p = 0.0251) significantly increased during SO-spindle coupling, with no significant changes in any other pathway (Fig. 4e). We also conducted PPI analyses for the other two events (SOs and spindles), and neither yielded significant connectivity changes in any of the three ROIs, as all failed to survive whole-brain FWE correction at the cluster level (p < 0.05). Together, these findings suggest that the thalamus, likely via spindles, coordinates hippocampal-cortical communication selectively during SO-spindle coupling, but not during isolated SO or spindle events.”
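      For readers less familiar with PPI, the core idea is a regression in which the target region's time series is modelled by the seed time series, the event (psychological) regressor, and their product; a positive interaction coefficient means seed-target coupling is stronger during the events. The sketch below is a simplified, hypothetical illustration on synthetic data, not the authors' SPM analysis: it forms the interaction directly at the BOLD level, whereas SPM's PPI forms it at an estimated neural level after deconvolution, and a real event regressor would be convolved with a haemodynamic response function. Per-subject interaction betas such as `beta[3]` would then be tested at the group level, e.g., with a one-sample t-test across subjects.

```python
import numpy as np


def ppi_betas(seed: np.ndarray, target: np.ndarray, psy: np.ndarray) -> np.ndarray:
    """OLS fit of target ~ 1 + psy + seed + psy*seed (regressors mean-centred).

    Returns [intercept, psychological, physiological, interaction] coefficients.
    """
    psy_c = psy - psy.mean()
    seed_c = seed - seed.mean()
    X = np.column_stack([np.ones_like(seed_c), psy_c, seed_c, psy_c * seed_c])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta


# Synthetic example: seed-target coupling is stronger during hypothetical events.
rng = np.random.default_rng(1)
n = 2000
seed = rng.standard_normal(n)
psy = (rng.random(n) < 0.2).astype(float)  # 1 during hypothetical coupled events
psy_c, seed_c = psy - psy.mean(), seed - seed.mean()
target = 0.3 * seed_c + 0.8 * psy_c * seed_c + 0.5 * rng.standard_normal(n)

beta = ppi_betas(seed, target, psy)
print("interaction beta:", round(float(beta[3]), 3))
```

      Comparing event types, as done above for isolated SOs and spindles, amounts to repeating this fit with a different psychological regressor and contrasting the resulting interaction betas.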

      (12) The limited temporal resolution of fMRI does indeed not allow for easily distinguishing between fMRI activation patterns related to SO-up- vs. SO-down-states. For this, one could try to extract the amplitudes of SO-up- and SO-down-states separately for each SO event and model them as two separate parametric modulators (with the risk of collinearity as they are likely correlated).

      We appreciate your insightful comment regarding the challenge of distinguishing fMRI activation patterns related to SO-up vs. SO-down states due to the limited temporal resolution of fMRI. While our current analysis does not differentiate between these two phases, we acknowledge that separately modeling SO-up and SO-down states using parametric modulators could provide a more refined understanding of their distinct neural correlates. However, as you note, this approach carries the risk of collinearity, and there is indeed a high correlation between the two amplitudes across all subjects in our results (r = 0.98). Future studies could leverage high-temporal-resolution techniques to explore this further. Although implementing this in the current study is beyond our scope, we will acknowledge this limitation in the Discussion section.
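      The collinearity concern can be quantified with the variance inflation factor (VIF). For two regressors with correlation r, VIF = 1/(1 - r^2), so r = 0.98 implies that the variance of each parametric-modulator estimate would be inflated roughly 25-fold relative to uncorrelated regressors, well above the rule-of-thumb cut-offs of 5 to 10 often used in regression diagnostics. The short sketch below computes this; the general function uses the standard result that the VIFs are the diagonal of the inverse correlation matrix of the regressors, and the synthetic amplitudes are hypothetical.

```python
import numpy as np


def vif_pair(r: float) -> float:
    """VIF of either regressor when only two regressors with correlation r are modelled."""
    return 1.0 / (1.0 - r ** 2)


def vif_all(X: np.ndarray) -> np.ndarray:
    """VIF of each column of X (observations x regressors): diagonal of the inverse correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))


print(round(vif_pair(0.98), 2))  # prints 25.25

# Synthetic check: two strongly correlated hypothetical modulators.
rng = np.random.default_rng(2)
trough_amp = rng.normal(size=500)
peak_amp = trough_amp + 0.2 * rng.normal(size=500)
X = np.column_stack([trough_amp, peak_amp])
print(vif_all(X))
```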

      Discussion, Page 17 Lines 308-322

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses, given that it occurs during deep sleep. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SO events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition of the DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.”

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (13) L327: "It is likely that our findings of diminished DMN activity reflect brain activity during the SO DOWN-state, as this state consistently shows higher amplitude compared to the UP-state within subjects, which is why we modelled the SO trough as its onset in the fMRI analysis." This conclusion is not justified as the fact that SO down-states are larger in amplitude does not mean their impact on the BOLD response is larger.

      We appreciate your concern regarding our interpretation of diminished DMN activity reflecting the SO down-state. We acknowledge that the current wording is somewhat misleading. Our revised interpretation is that the effect could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SO events was obtained when modelled at the SO peak instead). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition of the DOWN-state. We will make this clear in the Discussion section.

      Discussion, Page 17 Lines 308-322

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses, given that it occurs during deep sleep. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SO events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition of the DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.”
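      Why trough-locked and peak-locked regressors are nearly indistinguishable follows from the slow haemodynamic response: shifting every event by about half an SO cycle barely changes the convolved, TR-sampled regressor. The sketch below is illustrative only and is not the authors' design matrix; it assumes a double-gamma HRF with SPM-style default shape parameters, an assumed 0.5 s trough-to-peak lag, randomly placed hypothetical events, and the study's TR of 2 s, and reports the correlation between the two regressors.

```python
import math

import numpy as np


def double_gamma_hrf(dt: float, length_s: float = 32.0) -> np.ndarray:
    """Double-gamma HRF (peak gamma shape 6, undershoot shape 16, ratio 1/6), unit sum."""
    t = np.arange(0.0, length_s, dt)
    peak = t ** 5 * np.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)
    h = peak - undershoot / 6.0
    return h / h.sum()


def event_regressor(onsets_s, total_s, dt, tr, hrf):
    """Stick functions at the onsets, convolved with the HRF, sampled once per TR."""
    n = int(round(total_s / dt))
    sticks = np.zeros(n)
    idx = np.round(np.asarray(onsets_s) / dt).astype(int)
    np.add.at(sticks, idx[idx < n], 1.0)  # handles coincident onsets
    bold = np.convolve(sticks, hrf)[:n]
    return bold[:: int(round(tr / dt))]


dt, tr, total_s = 0.1, 2.0, 1200.0
hrf = double_gamma_hrf(dt)
rng = np.random.default_rng(3)
trough_onsets = np.sort(rng.uniform(0.0, total_s - 10.0, size=200))  # hypothetical SO troughs
peak_onsets = trough_onsets + 0.5  # assumed trough-to-peak lag of 0.5 s

x_trough = event_regressor(trough_onsets, total_s, dt, tr, hrf)
x_peak = event_regressor(peak_onsets, total_s, dt, tr, hrf)
r = np.corrcoef(x_trough, x_peak)[0, 1]
print(f"correlation between trough- and peak-locked regressors: {r:.3f}")
```

      With regressors this similar, choosing the trough or the peak as onset mainly changes labelling rather than which variance the model can explain, which is consistent with the similar maps obtained with either onset.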

      (14) Line 77: "In the current study, while directly capturing hippocampal ripples with scalp EEG or fMRI is difficult, we expect to observe hippocampal activation in fMRI whenever SOs-spindles coupling is detected by EEG, if SOs- spindles-ripples triple coupling occurs during human NREM sleep". Not all SO-spindle events are associated with ripples (Staresina et al., 2015), but hippocampal activation may also be expected based on the occurrence of spindles alone (Bergmann et al., 2012).

      We appreciate your clarification regarding the relationship between SO-spindle coupling and hippocampal ripples. We acknowledge that not all SO-spindle events are necessarily accompanied by ripples (Staresina et al., 2015). However, previous research indicates that hippocampal ripples are significantly more likely to occur during SO-spindle coupling events. This suggests that while ripple occurrence is not guaranteed, SO-spindle coupling creates a favorable network state for ripple generation and potential hippocampal activation. To ensure accuracy, we will delete this misleading sentence from the Introduction and acknowledge in the Discussion that our results cannot directly demonstrate the triple coupling of SOs, spindles, and hippocampal ripples.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      Reviewer #2 (Public review):

      In this study, Wang and colleagues aimed to explore brain-wide activation patterns associated with NREM sleep oscillations, including slow oscillations (SOs), spindles, and SO-spindle coupling events. Their findings reveal that SO-spindle events corresponded with increased activation in both the thalamus and hippocampus. Additionally, they observed that SO-spindle coupling was linked to heightened functional connectivity from the hippocampus to the thalamus, and from the thalamus to the medial prefrontal cortex, three key regions involved in memory consolidation and episodic memory processes.

      This study's findings are timely and highly relevant to the field. The authors' extensive data collection, involving 107 participants sleeping in an fMRI while undergoing simultaneous EEG recording, deserves special recognition. If shared, this unique dataset could lead to further valuable insights. While the conclusions of the data seem overall well supported by the data, some aspects with regard to the detection of sleep oscillations need clarification.

      The authors report that coupled SO-spindle events were most frequent during NREM sleep (2.46 ± 0.06 events/min), but they also observed a surprisingly high occurrence of these events during N1 and REM sleep (2.23 ± 0.09 and 2.32 ± 0.09 events/min, respectively), where SO-spindle coupling would not typically be expected. Combined with the relatively modest SO amplitudes reported (~25 µV, whereas >75 µV would be expected when using mastoids as reference electrodes), this raises the possibility that the parameters used for event detection may not have been conservative enough, or that sleep staging was inaccurately performed. This issue could present a significant challenge, as the fMRI findings are largely dependent on the reliability of these detected events.

      Thank you very much for your thorough and encouraging review. We appreciate your recognition of the significance and relevance of our study and dataset, particularly in highlighting how simultaneous EEG-fMRI recordings can provide complementary insights into the temporal dynamics of neural oscillations and their associated spatial activation patterns during sleep. In the sections that follow, we address each of your comments in detail. We have revised the text and conducted additional analyses wherever possible to strengthen our arguments and clarify our methodological choices. We believe these revisions improve the clarity and rigor of our work, and we thank you for helping us refine it.

      We appreciate your insightful comments regarding the detection of sleep oscillations. Our methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM. We will acknowledge the reasons for these results in the Methods section and emphasize that they are used only for sanity checks.
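
      Why a percentile rule always returns events, whatever the stage, can be shown with a short sketch. It assumes a generic zero-crossing SO detector (candidate waves between consecutive positive-to-negative zero crossings, 0.8-2 s long, kept if their amplitude is at or above the 75th percentile of all candidates); the function name and these parameters are illustrative and are not taken from the study. Applied to band-limited noise that contains no genuine SOs, it still reports events, because the threshold is defined relative to the recording rather than in absolute µV.

```python
import numpy as np


def detect_slow_oscillations(x, fs, pct=75.0, min_dur=0.8, max_dur=2.0):
    """Hypothetical percentile-based SO detector, for illustration only.

    Returns the sample index of the trough of every kept wave.
    """
    x = np.asarray(x, dtype=float)
    # Sample i where x goes from >= 0 (at i) to < 0 (at i + 1).
    down = np.where((x[:-1] >= 0) & (x[1:] < 0))[0]
    troughs, amps = [], []
    for start, stop in zip(down[:-1], down[1:]):
        if min_dur <= (stop - start) / fs <= max_dur:
            seg = x[start:stop + 1]  # one full wave: negative half, then positive half
            troughs.append(start + int(np.argmin(seg)))
            amps.append(seg.max() - seg.min())  # peak-to-trough amplitude
    if not amps:
        return []
    # The threshold is relative: the pct-th percentile of this recording's candidates.
    thr = np.percentile(amps, pct)
    return [t for t, a in zip(troughs, amps) if a >= thr]


# Band-limited Gaussian noise (0.16-1.25 Hz): no physiological SOs at all.
rng = np.random.default_rng(0)
fs = 100
n = 600 * fs  # 10 minutes
spec = np.fft.rfft(rng.standard_normal(n))
freqs = np.fft.rfftfreq(n, d=1 / fs)
spec[(freqs < 0.16) | (freqs > 1.25)] = 0
noise = np.fft.irfft(spec, n=n)

events = detect_slow_oscillations(noise, fs)
print(f"{len(events)} 'SO' events found in pure noise")
```

      This is why event counts in N1 and REM are best read as a sanity check: the rule guarantees some detections, and only the comparison with N2/N3 within the same subject is informative.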

      Regarding the reported SO amplitudes (~25 µV): during preprocessing, we applied the Signal Space Projection (SSP) method to more effectively remove MRI gradient artifacts and cardiac pulse noise. While this approach enhances data quality, it also reduces overall signal power, leading to systematically lower reported amplitudes. Despite this, the SOs we detected in NREM sleep (especially N2/N3) remain physiologically meaningful and are consistent with previous fMRI studies using similar artifact removal techniques. We appreciate your careful evaluation and valuable suggestions.

      In addition, we will provide comprehensive tables in the Supplementary Materials containing descriptive information about sleep-related characteristics (Table S1), as well as detailed information about sleep waves at each sleep stage for all 107 subjects (Tables S2-S4), listing for each subject: (1) duration of each sleep stage; (2) number of detected SOs; (3) number of detected spindles; (4) number of detected SO-spindle coupling events; (5) density of detected SOs; (6) density of detected spindles; (7) density of detected SO-spindle coupling events.
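
      The densities listed in these tables are event counts normalized by the time spent in a stage. A minimal sketch with hypothetical numbers (not study data); the group summary assumes that "±" in the values quoted by Reviewer #2 denotes the standard error across subjects, which this section does not state.

```python
import numpy as np


def event_density(n_events, stage_minutes):
    """Events per minute of a sleep stage; NaN where a subject had none of that stage."""
    n = np.asarray(n_events, dtype=float)
    minutes = np.asarray(stage_minutes, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        density = n / minutes
    return np.where(minutes > 0, density, np.nan)


def mean_sem(values):
    """Group mean and standard error of the mean, ignoring NaNs."""
    v = np.asarray(values, dtype=float)
    v = v[~np.isnan(v)]
    return v.mean(), v.std(ddof=1) / np.sqrt(v.size)


# Hypothetical per-subject coupling counts and N2/N3 minutes (not study data).
counts = [50, 72, 30, 5]
minutes = [20.0, 30.0, 12.0, 0.0]  # the last subject never reached N2/N3
density = event_density(counts, minutes)
m, sem = mean_sem(density)
print(density, f"{m:.2f} ± {sem:.2f} events/min")
```

      Reporting both counts and densities, as planned, lets readers separate a genuinely higher event rate from a stage that simply lasted longer.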

      Methods, Page 25 Lines 515-524

      “We note that the above methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).”

      Supplementary Materials, Page 42-54, Table S1-S4

      Reviewer #3 (Public review):

      Summary:

      Wang et al. examined brain activity patterns during sleep, especially when locked to canonical sleep rhythms such as SOs, spindles, and their coupling. Analyzing data from a large sample, the authors found significant coupling between spindles and SOs, particularly during the upstate of the SO. Moreover, the authors examined the patterns of whole-brain activity locked to these sleep rhythms. To understand the functional significance of these brain activities, the authors further conducted open-ended cognitive state decoding and found that a variety of cognitive processes may be involved during SO-spindle coupling and during other sleep events. The authors next conducted functional connectivity analyses and found enhanced connectivity between the hippocampus, the thalamus, and the medial PFC. These results reinforced the theoretical model of sleep-dependent memory consolidation, such that SO-spindle coupling is conducive to systems-level memory reactivation and consolidation.

      Strengths:

      There are obvious strengths in this work, including the large sample size, state-of-the-art neuroimaging and neural oscillation analyses, and the richness of results.

      Weaknesses:

      Despite these strengths and the insights gained, there are weaknesses in the design, the analyses, and inferences.

      Thank you for your detailed and thoughtful review of our manuscript. We are delighted that you recognize our advanced analysis methods, the richness of our neuroimaging and neural oscillation results, and our large sample size. In the following sections, we provide detailed responses to each of your comments. We have also revised the text and conducted additional analyses to strengthen our arguments and clarify our methodological choices. We believe these revisions enhance the clarity and rigor of our work, and we sincerely appreciate your thoughtful feedback in helping us refine the manuscript.

      (1) A repeating statement in the manuscript is that brain activity could indicate memory reactivation and thus consolidation. This is indeed a highly relevant question that could be informed by the current data/results. However, an inherent weakness of the design is that there is no memory task before and after sleep. Thus, it is difficult (if not impossible) to make a strong argument linking SO/spindle/coupling-locked brain activity with memory reactivation or consolidation.

      We appreciate your suggestion regarding the lack of a pre- and post-sleep memory task in our study design. We acknowledge that, in the absence of behavioral measures, it is hard to directly link SO-spindle coupling to memory consolidation in an outcome-driven manner. Our interpretation is instead based on the well-established role of these oscillations in memory processes, as demonstrated in previous studies. We sincerely appreciate this feedback and will adjust our Discussion accordingly to reflect a more precise interpretation of our findings.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (2) Relatedly, to understand the functional implications of the sleep rhythm-locked brain activity, the authors employed the "open-ended cognitive state decoding" method. While this method is interesting, it is rather indirect given that there were no behavioral indices in the manuscript. Thus, discussions based on these analyses are speculative at best. Please either tone down the language or find additional evidence to support these claims.

      Moreover, the results from this method are difficult to understand. Figure 3e showed that for all three types of sleep events (SO, spindle, SO-spindle), the same mental states (e.g., working memory, episodic memory, declarative memory) showed opposite directions of activation (left and right panels showed negative and positive activation, respectively). How should these conflicting results be interpreted? This ambiguity is also reflected in the terms used: declarative memory and episodic memory are both indexed in the results, yet these two processes can largely overlap. So which specific memory processes do these brain activity patterns reflect? The Discussion should address these results and the limitations of this method.

      We appreciate your critical assessment of the open-ended cognitive state decoding method and its interpretational challenges. Given the concerns about the indirectness of this approach, we decided to remove the related content and results from Figure 3 in the main text and move them to Supplementary Figure 7.

      Due to the complexity of memory-related processes, we acknowledge that distinguishing between episodic and declarative memory based solely on this approach is not straightforward. We will revise the Supplementary Materials to explicitly discuss these limitations and clarify that our findings do not isolate specific cognitive processes but rather suggest general associations with memory-related networks.

      Discussion, Page 17-18 Lines 323-332

      “To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      (3) The coupling strength is somewhat inconsistent with prior results (Hahn et al., 2020, eLife; Helfrich et al., 2018, Neuron). Specifically, Helfrich et al. showed that among young adults, the spindle is coupled to the peak of the SO. Here, the authors reported that the spindles were coupled to down-to-up transitions of SO and before the SO peak. It is possible that participants' age may influence the coupling (see Helfrich et al., 2018). Please discuss the findings in the context of previous research on SO-spindle coupling.

      We appreciate your concern regarding the temporal characteristics of SO-spindle coupling. We acknowledge that the SO-spindle coupling phase results in our study are not identical to those reported by Hahn et al. (2020) and Helfrich et al. (2018). However, these differences may arise from slight variations in event detection parameters, which can influence the precise phase estimation of coupling. Notably, Hahn et al. (2020) also reported slight discrepancies in their group-level coupling phase results, highlighting that methodological differences can contribute to variability across studies. Furthermore, our findings are consistent with those of Schreiner et al. (2021), further supporting the robustness of our observations.

      That said, we acknowledge that our original description of SO-spindle coupling as occurring at the "transition from the lower state to the upper state" was not entirely precise. The -π/2 phase represents the true transition point, while our observed coupling phase is actually closer to the SO peak rather than strictly at the transition. We will revise this statement in the manuscript to ensure clarity and accuracy in describing the coupling phase.  

      Discussion, Page 16 Lines 283-291

      “Our data provide insights into the neurobiological underpinnings of these sleep rhythms. SOs, originating mainly in neocortical areas such as the mPFC, alternate between DOWN- and UP-states. The thalamus generates sleep spindles, which in turn couple with SOs. Our finding that spindle peaks consistently occurred slightly before the UP-state peak of SOs (in 83 out of 107 participants) concurs with prior studies, including Schreiner et al. (2021). Yet it differs from some results suggesting spindles might peak right at the SO UP-state (Hahn et al., 2020; Helfrich et al., 2018). Such discrepancies could arise from differences in detection algorithms, participant age (Helfrich et al., 2018), or subtle variations in cortical-thalamic timing. Nonetheless, these results underscore the importance of coordinated SO-spindle interplay in supporting sleep-dependent processes.”
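
      The phase language used here ("slightly before the UP-state peak", "-π/2 transition") can be made concrete. A minimal sketch, assuming the common Hilbert-transform convention in which the phase of the SO-band signal is 0 at the UP-state peak, ±π at the DOWN-state trough and -π/2 at the DOWN-to-UP zero crossing (consistent with the -π/2 transition point mentioned in the response above). The function names, the 1 Hz cosine and the 0.1 s spindle lead are synthetic assumptions, not the authors' pipeline.

```python
import numpy as np
from scipy.signal import hilbert


def coupling_phases(so_band, spindle_peaks):
    """SO phase (rad) at each spindle peak.

    Convention: 0 = UP-state peak, +/-pi = DOWN-state trough,
    -pi/2 = rising (DOWN-to-UP) zero crossing.
    """
    phase = np.angle(hilbert(so_band))
    return phase[np.asarray(spindle_peaks, dtype=int)]


def circular_mean(angles):
    """Preferred phase: angle of the mean resultant vector."""
    return float(np.angle(np.mean(np.exp(1j * np.asarray(angles)))))


def resultant_length(angles):
    """Coupling consistency: 0 = uniform phases, 1 = identical phases."""
    return float(np.abs(np.mean(np.exp(1j * np.asarray(angles)))))


# Synthetic 1 Hz "SO" (UP-state peaks at whole seconds) and spindle peaks 0.1 s earlier.
fs = 100
t = np.arange(60 * fs) / fs
so = np.cos(2 * np.pi * t)
spindle_peaks = np.arange(1, 60) * fs - 10
ph = coupling_phases(so, spindle_peaks)
mu = circular_mean(ph)
print(f"preferred phase {mu:.3f} rad, before peak: {-np.pi / 2 < mu < 0}")
```

      Under this convention, a preferred phase between -π/2 and 0 means the spindle peak falls after the DOWN-to-UP transition but before the UP-state peak; a circular mean is used because phases near +π and -π are neighbours, not opposites.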

      (4) The discussion is rather superficial, with only two pages, and does not delve into many important arguments regarding the possible functional significance of these results. For example, the authors wrote, "This internal processing contrasts with the brain patterns associated with external tasks, such as working memory," without any references on working memory and without explaining why WM is considered an external task, even though working memory operations can be internal. Similarly, for the interesting results on SO and reduced DMN activity, the authors wrote "The DMN is typically active during wakeful rest and is associated with self-referential processes like mind-wandering, daydreaming, and task representation (Yeshurun, Nguyen, & Hasson, 2021). Its reduced activity during SOs may signal a shift towards endogenous processes such as memory consolidation." This argument is flawed. The DMN is active during self-referential processing and mind-wandering, i.e., when the brain shifts from external stimulus processing to internal mental processing. During sleep, endogenous memory reactivation and consolidation are also part of internal mental processing, given the lack of external environmental stimulation. So why would DMN activity be reduced during SOs or during memory consolidation? Were there differences in DMN activity between SO and SO-spindle coupling events?

      We appreciate your concerns regarding the brevity of the discussion and the need for clearer theoretical arguments. We will expand this section to provide more in-depth interpretations of our findings in the context of prior literature. Regarding working memory (WM), we acknowledge that our phrasing was ambiguous. We will modify this statement in the Discussion section.

      For the SO-related reduction in DMN activity, we recognize the need for a more precise explanation. This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimulus processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses, given that these events occur during deep sleep. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SO events was obtained if modelled at the SO peak instead). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from the DOWN-state.
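
      The claim that trough-locked and peak-locked regressors are nearly collinear can be checked with a small simulation. This is a sketch under stated assumptions: an SPM-style double-gamma HRF, 150 randomly timed events in 20 minutes, and a fixed 0.5 s trough-to-peak delay (roughly half a cycle of a ~1 Hz SO). None of these values come from the study, and the helper names are hypothetical.

```python
import numpy as np
from scipy.stats import gamma


def canonical_hrf(dt, duration=32.0):
    """Double-gamma HRF with SPM-style shape defaults (6 and 16, undershoot ratio 1/6)."""
    t = np.arange(0, duration, dt)
    h = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
    return h / h.max()


def event_regressor(onsets_s, total_s, dt, hrf):
    """Stick function at the event onsets, convolved with the HRF."""
    sticks = np.zeros(int(round(total_s / dt)))
    idx = np.round(np.asarray(onsets_s) / dt).astype(int)
    sticks[idx[idx < sticks.size]] = 1.0
    return np.convolve(sticks, hrf)[: sticks.size]


rng = np.random.default_rng(1)
dt, total = 0.1, 1200.0
troughs = np.sort(rng.uniform(0, total - 5, size=150))  # hypothetical SO troughs (s)
hrf = canonical_hrf(dt)
x_trough = event_regressor(troughs, total, dt, hrf)
x_peak = event_regressor(troughs + 0.5, total, dt, hrf)  # assumed 0.5 s to the UP-state peak
r = np.corrcoef(x_trough, x_peak)[0, 1]
print(f"correlation between trough- and peak-locked regressors: {r:.3f}")
```

      Because the HRF unfolds over several seconds, a half-second shift barely changes the convolved regressor, so the BOLD data alone cannot attribute the response to the DOWN- or the UP-state.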

      To address your final question, we have conducted an additional post hoc comparison of DMN activity between isolated SOs and SO-spindle coupling events. Our results indicate that DMN activation during SOs was significantly lower than during SO-spindle coupling (t(106) = -4.17, p < 1e-4). This suggests that SO-spindle coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. We appreciate your constructive feedback and will integrate these expanded analyses and discussions into our revised manuscript.

      Results, Page 11 Lines 199-208

      “Spindles were correlated with positive activation in the thalamus (ROI analysis, t(106) = 15.39, p < 1e-4), the anterior cingulate cortex (ACC), and the putamen, alongside deactivation in the DMN (Fig. 3c). Notably, SO-spindle coupling was linked to significant activation in both the thalamus (ROI analysis, t(106) = 3.38, p = 0.0005) and the hippocampus (ROI analysis, t(106) = 2.50, p = 0.0070, Fig. 3d). However, no decrease in DMN activity was found during SO-spindle coupling, and DMN activity during SO was significantly lower than during coupling (ROI analysis, t(106) = -4.17, p < 1e-4). For more detailed activation patterns, see Tables S5-S7. We also varied the threshold used to detect SO events to assess its effect on hippocampal activation during SO-spindle coupling and observed that hippocampal activation remained significant when the percentile thresholds for SO detection ranged between 71% and 80% (see Fig. S6).”
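
      As a reader's check on the ROI statistics above: a sketch, assuming each ROI test is a t-test across the 107 subjects (df = 106), that converts a t value into a p-value. With these inputs the one-tailed p-values come out close to the reported 0.0070 and 0.0005, while two-tailed values would be about twice as large, so the reported values appear to be one-tailed. The text here does not state the tail, so treat this as an inference; `t_to_p` is a hypothetical helper.

```python
from scipy import stats


def t_to_p(t_value, df, tails=1):
    """p-value for a t statistic; tails=1 tests in the direction of the observed sign."""
    p_one = stats.t.sf(abs(t_value), df)
    return p_one if tails == 1 else 2 * p_one


for t_val in (2.50, 3.38, -4.17):
    print(t_val, round(t_to_p(t_val, 106), 5), round(t_to_p(t_val, 106, tails=2), 5))
```

      Stating the tail explicitly in the Results would remove this ambiguity for readers.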

      Discussion, Page 17-18 Lines 308-332

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimulus processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses, given that these events occur during deep sleep. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SO events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from the DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.

      To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      Recommendations for the authors:

      Reviewing Editor Comment:

      The reviewers think that you are working on a relevant and important topic. They are praising the large sample size used in the study. The reviewers are not all in line regarding the overall significance of the findings, but they all agree the paper would strongly benefit from some extra work, as all reviewers raise various critical points that need serious consideration.

      We appreciate your recognition of the relevance and importance of our study, as well as your acknowledgment of the large sample size as a strength of our work. We understand that there are differing perspectives regarding the overall significance of our findings, and we value the constructive critiques provided. We are committed to addressing the key concerns raised by all reviewers, including refining our analyses, clarifying our interpretations, and incorporating additional discussions to strengthen the manuscript. Below, we address your specific recommendations and provide responses to each point you raised to ensure our methods and results are as transparent and comprehensible as possible. We believe that these revisions will significantly enhance the rigor and impact of our study, and we sincerely appreciate your thoughtful feedback in helping us improve our work.

      Reviewer #1 (Recommendations for the authors):

      (1) The phrase "overnight sleep" suggests an entire night, while these were rather "nocturnal naps". Please rephrase.

      Response: Thank you for pointing this out. We have revised the phrasing in our manuscript to "nocturnal naps" instead of "overnight sleep" to more accurately reflect the duration of the sleep recordings.

      (2) Sleep staging results (macroscopic sleep architecture) should be provided in more detail (at least min and % of the different sleep stages, sleep onset latency, total sleep duration, total recording duration), at least mean/SD/range.

      Thank you for this suggestion. We will provide comprehensive tables in the Supplementary Materials containing descriptive information about sleep-related characteristics. This information will help provide a clearer overview of the macroscopic sleep architecture in our dataset.

      Reviewer #2 (Recommendations for the authors):

      In order to allow for a better estimation of the reliability of the detected sleep events, please:

      (1) Provide densities and absolute numbers of all detected SOs and spindles (N1, NREM, and REM sleep).

      Thank you for pointing this out. We will provide comprehensive tables in the Supplementary Materials containing detailed information about sleep waves at each sleep stage for all 107 subjects (Tables S2-S4), listing for each subject: 1) duration of each sleep stage; 2) number of detected SOs; 3) number of detected spindles; 4) number of detected SO-spindle coupling events; 5) density of detected SOs; 6) density of detected spindles; 7) density of detected SO-spindle coupling events.

      Supplementary Materials, Page 43-54, Table S2-S4

      (2) Show ERPs for all detected SOs and spindles (per sleep stage).

      Thank you for the suggestion. We will provide ERPs for all detected SOs and spindles, separated by sleep stage (N1, N2&N3, and REM) in supplementary Fig. S2-S4. These ERP waveforms will help illustrate the characteristic temporal profiles of SOs and spindles across different sleep stages.

      Methods, Page 25, Line 525-532

      “Event-related potential (ERP) analysis. After completing the detection of each sleep rhythm event, we performed ERP analyses for SOs, spindles, and coupling events in different sleep stages. Specifically, for SO events, we took the trough of the DOWN-state of each SO as the zero-time point, then extracted data in a [-2 s to 2 s] window from the broadband (0.1–30 Hz) EEG and used [-2 s to -0.5 s] for baseline correction; the results were then averaged across 107 subjects (see Fig. S2a). For spindle events, we used the peak of each spindle as the zero-time point and applied the same data extraction window and baseline correction before averaging across 107 subjects (see Fig. S2b). Finally, for SO-spindle coupling events, we followed the same procedure used for SO events (see Fig. 2a, Figs. S3–S4).”
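
      The epoching and baseline-correction steps described above can be sketched as follows, assuming a single EEG channel that has already been band-passed to 0.1-30 Hz and event times (SO troughs or spindle peaks) given as sample indices. The function name and the edge-handling rule (events too close to the recording edges are skipped) are hypothetical choices, not the authors' code.

```python
import numpy as np


def event_locked_average(eeg, fs, event_samples, window=(-2.0, 2.0), baseline=(-2.0, -0.5)):
    """Baseline-corrected average of EEG epochs around events (times in seconds)."""
    eeg = np.asarray(eeg, dtype=float)
    pre = int(round(-window[0] * fs))
    post = int(round(window[1] * fs))
    times = np.arange(-pre, post + 1) / fs
    # Baseline indices relative to the start of each epoch.
    b0 = int(round((baseline[0] - window[0]) * fs))
    b1 = int(round((baseline[1] - window[0]) * fs))
    epochs = []
    for s in np.asarray(event_samples, dtype=int):
        if s - pre < 0 or s + post >= eeg.size:
            continue  # skip events too close to the recording edges
        ep = eeg[s - pre : s + post + 1]
        epochs.append(ep - ep[b0 : b1 + 1].mean())  # subtract mean of the baseline window
    if not epochs:
        return times, np.full(times.size, np.nan), 0
    return times, np.mean(epochs, axis=0), len(epochs)


# Toy check: a constant offset plus a one-sample dip at each event.
fs = 100
eeg = np.full(3000, 5.0)
eeg[[500, 1500, 2500]] -= 10.0
times, erp, n_used = event_locked_average(eeg, fs, [500, 1500, 2500])
```

      Per-subject averages from a function like this would then be averaged across subjects to give the group ERPs; the baseline ends 0.5 s before the event, so it excludes most of the SO waveform itself.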

      (3) Provide detailed info concerning sleep characteristics (time spent in each sleep stage etc.).

      Thank you for this suggestion. As in our response above, we will provide comprehensive tables in the Supplementary Materials containing descriptive information about sleep-related characteristics.

      Supplementary Materials, Page 42, Table S1 (same as above)

      (4) What would happen if more stringent parameters were used for event detection? Would the authors still observe a significant number of SO spindles during N1 and REM? Would this affect the fMRI-related results?

      Thank you for this suggestion. Our methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).

      Furthermore, to explore the impact of this on our fMRI results, we conducted an additional sensitivity analysis by applying different detection parameters for SOs. Specifically, we adjusted the amplitude percentile threshold for SO detection (the parameter with the greatest impact on the results). Using hippocampal activation during N2&N3 SO-spindle coupling as an anchor value, we found that when the parameters gradually became stricter, the results were similar to, or even better than, the current results. However, as we continued to increase the threshold, the results gradually weakened until the threshold was increased to 80%, at which point they were no longer significant. This indicates that our results are robust within a specific range of parameters, but as the threshold increases, the number of trials decreases, ultimately weakening the statistical power of the fMRI analysis.
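
      The trade-off described above, where stricter thresholds keep fewer events, follows directly from the percentile rule. A sketch with synthetic candidate amplitudes (the gamma-distributed values and the helper name are assumptions, not study data): raising the percentile from 70 to 85 cuts the retained events from 30% to 15% of the candidates.

```python
import numpy as np


def kept_events(amplitudes, percentiles):
    """Number of candidates at or above each amplitude percentile."""
    amps = np.asarray(amplitudes, dtype=float)
    return {p: int(np.sum(amps >= np.percentile(amps, p))) for p in percentiles}


rng = np.random.default_rng(2)
amps = rng.gamma(shape=4.0, scale=10.0, size=1000)  # synthetic candidate amplitudes (µV)
counts = kept_events(amps, range(70, 86, 5))
print(counts)
```

      Fewer events per subject make each subject's event-locked estimate noisier, which is the loss of statistical power invoked above.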

      Thank you again for your suggestions on sleep rhythm event detection. We will add these results to the Supplementary Materials and revise our manuscript accordingly.

      Results, Page 11, Line 199-208

      “Spindles were correlated with positive activation in the thalamus (ROI analysis, t(106) = 15.39, p < 1e-4), the anterior cingulate cortex (ACC), and the putamen, alongside deactivation in the DMN (Fig. 3c). Notably, SO-spindle coupling was linked to significant activation in both the thalamus (ROI analysis, t(106) = 3.38, p = 0.0005) and the hippocampus (ROI analysis, t(106) = 2.50, p = 0.0070, Fig. 3d). However, no decrease in DMN activity was found during SO-spindle coupling, and DMN activity during SO was significantly lower than during coupling (ROI analysis, t(106) = -4.17, p < 1e-4). For more detailed activation patterns, see Tables S5-S7. We also varied the threshold used to detect SO events to assess its effect on hippocampal activation during SO-spindle coupling and observed that hippocampal activation remained significant when the percentile thresholds for SO detection ranged between 71% and 80% (see Fig. S6).”

      Finally, we sincerely thank you all again for your thoughtful and constructive feedback. Your insights have been invaluable in refining our analyses, strengthening our interpretations, and improving the clarity and rigor of our manuscript. We appreciate the time and effort you have dedicated to reviewing our work, and we are grateful for the opportunity to enhance our study based on your recommendations.

      References:

      Bergmann, T. O., Mölle, M., Diedrichs, J., Born, J., & Siebner, H. R. (2012). Sleep spindle-related reactivation of category-specific cortical regions after learning face-scene associations. NeuroImage, 59(3), 2733-2742. 

      Buzsáki, G. (2015). Hippocampal sharp wave‐ripple: A cognitive biomarker for episodic memory and planning. Hippocampus, 25(10), 1073-1188. 

      Caporro, M., Haneef, Z., Yeh, H. J., Lenartowicz, A., Buttinelli, C., Parvizi, J., & Stern, J. M. (2012). Functional MRI of sleep spindles and K-complexes. Clinical Neurophysiology, 123(2), 303-309.

      Coulon, P., Budde, T., & Pape, H.-C. (2012). The sleep relay—the role of the thalamus in central and decentral sleep regulation. Pflügers Archiv-European Journal of Physiology, 463, 53-71. 

      Crunelli, V., Lőrincz, M. L., Connelly, W. M., David, F., Hughes, S. W., Lambert, R. C., Leresche, N., & Errington, A. C. (2018). Dual function of thalamic low-vigilance state oscillations: rhythm-regulation and plasticity. Nature Reviews Neuroscience, 19(2), 107-118. 

      Czisch, M., Wehrle, R., Stiegler, A., Peters, H., Andrade, K., Holsboer, F., & Sämann, P. G. (2009). Acoustic oddball during NREM sleep: a combined EEG/fMRI study. PLoS ONE, 4(8), e6749.

      Diba, K., & Buzsáki, G. (2007). Forward and reverse hippocampal place-cell sequences during ripples. Nature Neuroscience, 10(10), 1241. 

      Diekelmann, S., & Born, J. (2010). The memory function of sleep. Nature Reviews Neuroscience, 11(2), 114-126. 

      Fogel, S., Albouy, G., King, B. R., Lungu, O., Vien, C., Bore, A., Pinsard, B., Benali, H., Carrier, J., & Doyon, J. (2017). Reactivation or transformation? Motor memory consolidation associated with cerebral activation time-locked to sleep spindles. PLoS ONE, 12(4), e0174755.

      Hahn, M. A., Heib, D., Schabus, M., Hoedlmoser, K., & Helfrich, R. F. (2020). Slow oscillation-spindle coupling predicts enhanced memory formation from childhood to adolescence. eLife, 9, e53730.

      Halassa, M. M., Siegle, J. H., Ritt, J. T., Ting, J. T., Feng, G., & Moore, C. I. (2011). Selective optical drive of thalamic reticular nucleus generates thalamic bursts and cortical spindles. Nature Neuroscience, 14(9), 1118-1120. 

      Hale, J. R., White, T. P., Mayhew, S. D., Wilson, R. S., Rollings, D. T., Khalsa, S., Arvanitis, T. N., & Bagshaw, A. P. (2016). Altered thalamocortical and intra-thalamic functional connectivity during light sleep compared with wake. NeuroImage, 125, 657-667. 

      Helfrich, R. F., Lendner, J. D., Mander, B. A., Guillen, H., Paff, M., Mnatsakanyan, L., Vadera, S., Walker, M. P., Lin, J. J., & Knight, R. T. (2019). Bidirectional prefrontal-hippocampal dynamics organize information transfer during sleep in humans. Nature Communications, 10(1), 3572. 

      Helfrich, R. F., Mander, B. A., Jagust, W. J., Knight, R. T., & Walker, M. P. (2018). Old brains come uncoupled in sleep: slow wave-spindle synchrony, brain atrophy, and forgetting. Neuron, 97(1), 221-230.e4.

      Horovitz, S. G., Fukunaga, M., de Zwart, J. A., van Gelderen, P., Fulton, S. C., Balkin, T. J., & Duyn, J. H. (2008). Low frequency BOLD fluctuations during resting wakefulness and light sleep: A simultaneous EEG‐fMRI study. Human Brain Mapping, 29(6), 671-682.

      Huang, Q., Xiao, Z., Yu, Q., Luo, Y., Xu, J., Qu, Y., Dolan, R., Behrens, T., & Liu, Y. (2024). Replay-triggered brain-wide activation in humans. Nature Communications, 15(1), 7185. 

      Ilhan-Bayrakcı, M., Cabral-Calderin, Y., Bergmann, T. O., Tüscher, O., & Stroh, A. (2022). Individual slow wave events give rise to macroscopic fMRI signatures and drive the strength of the BOLD signal in human resting-state EEG-fMRI recordings. Cerebral Cortex, 32(21), 4782-4796. 

      Laufs, H. (2008). Endogenous brain oscillations and related networks detected by surface EEG‐combined fMRI. Human Brain Mapping, 29(7), 762-769. 

      Laufs, H., Walker, M. C., & Lund, T. E. (2007). ‘Brain activation and hypothalamic functional connectivity during human non-rapid eye movement sleep: an EEG/fMRI study’—its limitations and an alternative approach. Brain, 130(7), e75. 

      Margulies, D. S., Ghosh, S. S., Goulas, A., Falkiewicz, M., Huntenburg, J. M., Langs, G., Bezgin, G., Eickhoff, S. B., Castellanos, F. X., & Petrides, M. (2016). Situating the default-mode network along a principal gradient of macroscale cortical organization. Proceedings of the National Academy of Sciences, 113(44), 12574-12579. 

      Massimini, M., Huber, R., Ferrarelli, F., Hill, S., & Tononi, G. (2004). The sleep slow oscillation as a traveling wave. Journal of Neuroscience, 24(31), 6862-6870. 

      Moehlman, T. M., de Zwart, J. A., Chappel-Farley, M. G., Liu, X., McClain, I. B., Chang, C., Mandelkow, H., Özbay, P. S., Johnson, N. L., & Bieber, R. E. (2019). All-night functional magnetic resonance imaging sleep studies. Journal of Neuroscience Methods, 316, 83-98. 

      Mölle, M., Bergmann, T. O., Marshall, L., & Born, J. (2011). Fast and slow spindles during the sleep slow oscillation: disparate coalescence and engagement in memory processing. Sleep, 34(10), 1411-1421. 

      Ngo, H.-V., Fell, J., & Staresina, B. (2020). Sleep spindles mediate hippocampal-neocortical coupling during long-duration ripples. eLife, 9, e57011. 

      Picchioni, D., Horovitz, S. G., Fukunaga, M., Carr, W. S., Meltzer, J. A., Balkin, T. J., Duyn, J. H., & Braun, A. R. (2011). Infraslow EEG oscillations organize large-scale cortical–subcortical interactions during sleep: a combined EEG/fMRI study. Brain Research, 1374, 63-72. 

      Schabus, M., Dang-Vu, T. T., Albouy, G., Balteau, E., Boly, M., Carrier, J., Darsaud, A., Degueldre, C., Desseilles, M., & Gais, S. (2007). Hemodynamic cerebral correlates of sleep spindles during human non-rapid eye movement sleep. Proceedings of the National Academy of Sciences, 104(32), 13164-13169. 

      Schreiner, T., Kaufmann, E., Noachtar, S., Mehrkens, J.-H., & Staudigl, T. (2022). The human thalamus orchestrates neocortical oscillations during NREM sleep. Nature Communications, 13(1), 5231. 

      Schreiner, T., Petzka, M., Staudigl, T., & Staresina, B. P. (2021). Endogenous memory reactivation during sleep in humans is clocked by slow oscillation-spindle complexes. Nature Communications, 12(1), 3112. 

      Singh, D., Norman, K. A., & Schapiro, A. C. (2022). A model of autonomous interactions between hippocampus and neocortex driving sleep-dependent memory consolidation. Proceedings of the National Academy of Sciences, 119(44), e2123432119. 

      Spoormaker, V. I., Schröter, M. S., Gleiser, P. M., Andrade, K. C., Dresler, M., Wehrle, R., Sämann, P. G., & Czisch, M. (2010). Development of a large-scale functional brain network during human non-rapid eye movement sleep. Journal of Neuroscience, 30(34), 11379-11387. 

      Staresina, B. P., Bergmann, T. O., Bonnefond, M., van der Meij, R., Jensen, O., Deuker, L., Elger, C. E., Axmacher, N., & Fell, J. (2015). Hierarchical nesting of slow oscillations, spindles and ripples in the human hippocampus during sleep. Nature Neuroscience, 18(11), 1679-1686. 

      Staresina, B. P., Niediek, J., Borger, V., Surges, R., & Mormann, F. (2023). How coupled slow oscillations, spindles and ripples coordinate neuronal processing and communication during human sleep. Nature Neuroscience, 1-9. 

      Yarkoni, T., Poldrack, R. A., Nichols, T. E., Van Essen, D. C., & Wager, T. D. (2011). Large-scale automated synthesis of human functional neuroimaging data. Nature Methods, 8(8), 665-670. 

      Yeshurun, Y., Nguyen, M., & Hasson, U. (2021). The default mode network: where the idiosyncratic self meets the shared social world. Nature Reviews Neuroscience, 1-12.

    1. Author response:

      The following is the authors’ response to the original reviews

      Main revision made to the manuscript

      The main revision made to the manuscript is to reconcile our findings with the line attractor model. The revision is based on Reviewer 1’s suggestion to reinterpret our results as a superposition of an attractor model with fast timescale dynamics. We extended our analysis to the start of a trial and characterized the overall within-trial dynamics to reinterpret our findings.

      We first acknowledge that our results do not contradict evidence integration on a line attractor. As pointed out by the reviewers, our finding that the integration of reward outcomes explains the reversal probability activity x_rev (Figure 3) is compatible with the line attractor model. However, the reward integration equation is an algebraic relation and does not characterize the dynamics of the reversal probability activity. A closer analysis of the neural dynamics is therefore needed to assess whether a line attractor is feasible.

      In the revised manuscript, we show that x_rev exhibits two distinct activity modes (Figure 4). First, x_rev has substantial non-stationary dynamics during a trial; as claimed in the original manuscript, this non-stationary activity is incompatible with the line attractor model. Second, we present new results showing that x_rev is stationary (i.e., constant in time) and stable (i.e., contracting) at the start of a trial. These two properties indicate that x_rev sits at a point attractor at the start of a trial, which is compatible with the line attractor model.
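One simple way to operationalize the two properties named above, stationarity (dx_rev/dt close to 0) and contractivity (a perturbed state returns toward the unperturbed one), is sketched here. This is a minimal illustration on synthetic 1D traces, not the analysis used for Figure 4; the tolerance, the sampling step and the toy traces are assumptions.

```python
import math


def time_derivative(x, dt):
    """Finite-difference estimate of dx/dt for a sampled 1D trace (len(x) >= 2)."""
    n = len(x)
    d = [0.0] * n
    d[0] = (x[1] - x[0]) / dt      # forward difference at the start
    d[-1] = (x[-1] - x[-2]) / dt   # backward difference at the end
    for i in range(1, n - 1):
        d[i] = (x[i + 1] - x[i - 1]) / (2 * dt)  # central differences inside
    return d


def is_stationary(x, dt, tol):
    """True if |dx/dt| stays below tol over the whole window."""
    return max(abs(v) for v in time_derivative(x, dt)) < tol


def contraction_ratio(perturbed, reference):
    """Final / initial distance between a perturbed and an unperturbed trace; < 1 means contracting."""
    return abs(perturbed[-1] - reference[-1]) / abs(perturbed[0] - reference[0])


# Toy traces (hypothetical): a 0.5 s window sampled every 10 ms.
dt = 0.01
t = [i * dt for i in range(50)]
x_start = [0.4 for _ in t]                                            # flat: stationary
x_kicked = [0.4 + 0.1 * math.exp(-ti / 0.1) for ti in t]              # perturbation that decays back
x_trial = [0.4 + 0.2 * math.sin(2 * math.pi * ti / 0.5) for ti in t]  # within-trial excursion
```

Under these assumptions, `x_start` passes the stationarity test, `x_trial` fails it, and the decaying perturbation gives a contraction ratio well below 1.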

      We further analyze how the two activity modes are linked (Figure 4, Support vector regression). We show that the non-stationary activity is predictable from the stationary activity, provided that the underlying dynamics can be inferred. In other words, the non-stationary activity during a trial is generated by the underlying dynamics, with the initial condition set by the stationary state at the start of the trial.
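The claim that the within-trial trajectory is predictable from the start-of-trial state can be illustrated with a deliberately simple stand-in: a separate ordinary least-squares fit per time bin that maps the stationary value x0 to the trajectory value at that bin. This swaps in linear regression for the support vector regression named above, and the toy gain/offset data are hypothetical; the sketch only shows the structure of the prediction problem.

```python
import math


def fit_per_bin(x0s, trajs):
    """For each time bin t, fit y[t] = a[t] * x0 + b[t] by ordinary least squares.

    x0s[k]   : stationary value of x_rev at the start of trial k
    trajs[k] : within-trial trajectory of x_rev on trial k (all the same length)
    """
    n = len(x0s)
    mx = sum(x0s) / n
    sxx = sum((x - mx) ** 2 for x in x0s)
    coefs = []
    for t in range(len(trajs[0])):
        ys = [tr[t] for tr in trajs]
        my = sum(ys) / n
        sxy = sum((x - mx) * (y - my) for x, y in zip(x0s, ys))
        a = sxy / sxx
        coefs.append((a, my - a * mx))
    return coefs


def predict(coefs, x0):
    """Predicted within-trial trajectory for a new initial state x0."""
    return [a * x0 + b for a, b in coefs]


# Toy data (hypothetical): each trajectory applies a time-varying gain and offset to x0.
gain = [1.0 + 0.5 * math.sin(0.3 * t) for t in range(20)]
offset = [0.1 * t for t in range(20)]
x0s = [0.1, 0.3, 0.5, 0.7]
trajs = [[g * x0 + h for g, h in zip(gain, offset)] for x0 in x0s]
coefs = fit_per_bin(x0s, trajs)
```

A nonlinear regressor such as SVR would replace the per-bin linear map when the dependence on x0 is not linear.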

      These results suggest an extension of the line attractor model in which an attractor state at the start of a trial provides the initial condition from which non-stationary activity is generated during the trial by underlying dynamics associated with task-related behavior (Figure 4, Augmented model). 

      The separability of non-stationary trajectories (Figure 5 and 6) is a property of the non-stationary dynamics that allows separable points in the initial stationary state to remain separable during a trial, thus making it possible to represent distinct probabilistic values in non-stationary activity.

      This revised interpretation of our results (1) retains our original claim that the non-stationary dynamics during a trial are incompatible with the line attractor model and (2) introduces an attractor state at the start of a trial, which is compatible with the line attractor model. Our analysis shows that the two activity modes are linked by the underlying dynamics, and the attractor state serves as the initial state that launches the non-stationary activity.

      Responses to the Public Reviews:

      Reviewer #1:

      (1) To better explain the reversal learning task and the network training method, we added a detailed description of the RNN and monkey task structure (Result Section 1), included a schematic of the target outputs (Figure 1B), explained the rationale for using an inhibitory network model (Method Section 1) and explained the supervised RNN training scheme (Result Section 1). This information can also be found in the Methods.

      (2) Our understanding is that the augmented model discussed above is aligned with the model suggested by Reviewer 1: “a curved line attractor, with faster timescale dynamics superimposed on this structure”. The “fast” non-stationary activity observed during the trial is likely driven by task-related behavior and is thus transient. For instance, we do not observe such non-stationary activity in the inter-trial interval, when task-related behavior is absent. For this reason, the non-stationary trajectories were not considered part of the attractor. Instead, they are transient activity generated by the underlying neural dynamics associated with task-related behavior. We believe this characterization of faster timescale dynamics is consistent with Reviewer 1’s view, and we wanted to clarify that there are two distinct activity modes.

      (3) We appreciate the comment by Reviewers 1 and 2 that TDR may be limited in isolating the neural subspace of interest. Our study presents what can be learned from TDR, but it is by no means the only way to interpret the neural data. Applying other methods to isolate task-related neural activity is left for future work.
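For readers less familiar with TDR, its core step can be sketched with a single task variable: regress each neuron's activity on reversal probability across trials, collect the slopes into a population vector, normalize it, and project activity onto it to obtain a 1D signal such as x_rev. This is a simplified sketch under stated assumptions (one regressor, and none of the denoising or orthogonalization against other task variables used in the full method); the function names and toy data are hypothetical.

```python
import math


def tdr_axis(rates, prob):
    """Unit-norm vector of per-neuron regression slopes of activity on reversal probability.

    rates[k][i] : activity of neuron i on trial k (hypothetical layout)
    prob[k]     : reversal probability on trial k
    """
    n_trials = len(prob)
    n_neurons = len(rates[0])
    mp = sum(prob) / n_trials
    spp = sum((p - mp) ** 2 for p in prob)
    slopes = []
    for i in range(n_neurons):
        ri = [rates[k][i] for k in range(n_trials)]
        mr = sum(ri) / n_trials
        slopes.append(sum((p - mp) * (r - mr) for p, r in zip(prob, ri)) / spp)
    norm = math.sqrt(sum(b * b for b in slopes))
    return [b / norm for b in slopes]


def project(rate_vec, axis):
    """1D projection of one population activity vector onto the axis."""
    return sum(r * a for r, a in zip(rate_vec, axis))


# Toy data: three neurons whose activity depends linearly on reversal probability.
prob = [0.1, 0.3, 0.5, 0.7, 0.9]
rates = [[1 + 2 * p, 3 - p, 0.2 + 0.5 * p] for p in prob]
axis = tdr_axis(rates, prob)
x_rev = [project(r, axis) for r in rates]
```

In this toy case the recovered axis is the normalized slope vector (2, -1, 0.5), and the projection increases monotonically with reversal probability.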

      We would appreciate it if the reviewers could share thoughts on what other alternative methods could better isolate the reversal probability activity.

      Reviewer #2:

      (1) (i) We respectfully disagree with Reviewer 2’s comment that “no action is required to be performed by neurons in the RNN”. In our network setup, the RNN output learns to choose a sign (+ or -) to make a choice, as Reviewer 2 pointed out. This is how the RNN takes an action. It is unclear to us what Reviewer 2 intended by “action” and how reaching a target value (rather than just taking a sign) would make a significant difference in how the network performs the task. 

      (ii) From Reviewer 2’s comment that “no intervening behavior is thus performed by neurons”, we realized that the term “intervening behavior” has caused confusion. It refers to task-related behavior, such as making choices or receiving reward, that the subject must perform across trials before reversing its preferred choice; these behaviors intervene before the reversal of the preferred choice. To clarify its meaning, in the revised manuscript we changed the term to “task-related behavior” and put it in context. For example, in the Introduction we state that “However, during a trial, task-related behavior, such as making decisions or receiving feedback, produced …”

      (iii) As pointed out by Reviewer 2, the lack of a fixation period in the RNN could make the neural dynamics of the RNN and PFC differ, especially at the start of a trial. We examine this issue in Result Section 4, where we analyze the stationary activity at the start of a trial. We find that fixing the choice output at zero before making a choice promotes stationary activity and makes the RNN activity more similar to the PFC activity.

      Reviewer #3:

      (1) (i) In the previous study (Figure 1 in [Bartolo and Averbeck ‘20]), it was shown that neural activity can predict the behavioral reversal trial. This is the reason we examined the neural activity in the trials centered at the behavioral reversal trial. We explained in Result Section 2 that we followed this line of analysis in our study.

      (ii) We would like to emphasize that the main point of Figures 4 and 5 is to show the separability of neural trajectories: the entire trajectory shifts without overlapping. It is not obvious that high-dimensional neural population activity from two trials should remain separated when compressed into a one-dimensional subspace; the one-dimensional activities can easily collide because of this compression. We revised the manuscript to bring out these points: we added an opening paragraph that discusses the separability of trajectories and revised the main text to highlight the findings on separability. 
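One concrete way to state the separability property is that projected trajectories never cross: if the trajectories are ordered by their value at the start of the window, that strict ordering holds at every time bin. The check here is a minimal sketch of that criterion on toy data; the function name and the strict-ordering definition are assumptions, not the quantification used in the manuscript.

```python
def are_separable(trajs):
    """True if 1D trajectories keep the same strict ordering at every time bin (no crossings).

    trajs: list of equal-length 1D trajectories; their order is set by the first time bin.
    """
    order = sorted(range(len(trajs)), key=lambda k: trajs[k][0])
    for t in range(len(trajs[0])):
        vals = [trajs[k][t] for k in order]
        if any(b <= a for a, b in zip(vals, vals[1:])):
            return False
    return True


# Toy examples (hypothetical): shifted copies stay separable; opposite trends cross.
base = [0.1 * t for t in range(10)]
shifted = [[b + c for b in base] for c in (0.0, 0.2, 0.4)]
crossing = [[0.1 * t for t in range(10)], [0.5 - 0.1 * t for t in range(10)]]
```

The shifted family passes because each trajectory is a constant offset of the others; the crossing pair fails once the rising trace overtakes the falling one.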

      (iii) We agree with Reviewer 3 that it would be interesting to examine other subspaces of neural activity that are not related to reversal probability and to characterize how different neural subspaces interact with each other. However, the focus of this paper is the reversal probability activity, and we consider these questions beyond the scope of the current paper. We note that neural activity related to other experimental variables in the same dataset was analyzed in other papers [Bartolo and Averbeck ’20; Tang, Bartolo and Averbeck ’21] 

      (2) (i) In the revised manuscript, we added an explanation of the rationale behind choosing an inhibitory network as a simplified model of the balanced state. In brief, a network with strong inhibitory recurrent connections and strong excitatory external input operates in the balanced state, as does the standard excitatory-inhibitory network. We included references that studied this inhibitory network. We also explained the technical reason (GPU memory) for choosing the inhibitory model.
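The balanced-state argument can be illustrated with a one-population, threshold-linear mean-field toy: with external input I = sqrt(K)·i0 and recurrent inhibition J = sqrt(K)·j0, the fixed point r = I/(1 + J) stays of order 1, while I and J·r both grow like sqrt(K) and nearly cancel. This is a textbook scaling illustration with hypothetical constants i0, j0 and K, not the network used in the manuscript.

```python
import math


def balanced_fixed_point(K, i0=1.0, j0=2.0):
    """Fixed point of r = [I - J * r]_+ with I = sqrt(K) * i0 and J = sqrt(K) * j0.

    Returns (external input I, recurrent inhibition J*r, net input I - J*r, rate r).
    """
    I = math.sqrt(K) * i0
    J = math.sqrt(K) * j0
    r = I / (1.0 + J)  # solves r = I - J * r (the net input is positive here)
    return I, J * r, I - J * r, r


for K in (100, 10_000, 1_000_000):
    I, inh, net, r = balanced_fixed_point(K)
    print(f"K={K:>9}: I={I:8.1f}  J*r={inh:8.1f}  net={net:.3f}  r={r:.3f}")
```

As K grows, the rate approaches i0/j0 while the net input becomes a vanishing fraction of the external drive, which is the cancellation that defines the balanced state.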

      (ii) We thank the reviewer for pointing out that the original manuscript did not mention how the feedback and cue inputs were initialized. They were random vectors sampled from a Gaussian distribution. We added this information in the revised manuscript. In our opinion, it is common to use random external inputs for training RNNs, as it is a priori unclear how to choose them. In fact, the effects of random feedback on the one-dimensional x_rev dynamics can be analyzed by projecting the random feedback vector onto the reversal probability vector. This is shown in Figure 4F.
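Projecting a random input vector onto the reversal-probability direction reduces it to a single scalar drive on x_rev. A minimal sketch, assuming a unit-norm axis and a Gaussian feedback vector of hypothetical size n = 200 (both generated randomly here for illustration rather than taken from a trained network):

```python
import math
import random


def effective_input(feedback, axis):
    """Scalar drive on x_rev: the component of a feedback vector along a unit-norm axis."""
    return sum(f * a for f, a in zip(feedback, axis))


rng = random.Random(0)
n = 200  # hypothetical number of neurons
feedback = [rng.gauss(0.0, 1.0) for _ in range(n)]  # random Gaussian feedback vector
raw_axis = [rng.gauss(0.0, 1.0) for _ in range(n)]  # stand-in for the reversal-probability vector
norm = math.sqrt(sum(a * a for a in raw_axis))
axis = [a / norm for a in raw_axis]
drive = effective_input(feedback, axis)
```

For a unit-norm axis independent of a feedback vector with i.i.d. standard normal entries, this scalar is itself standard normal regardless of n; components of the feedback orthogonal to the axis contribute nothing to it.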

      (iii) We agree that it would be more natural to train the RNN to solve the task without using the Bayesian model. We point out this issue in the Discussion in the revised manuscript.

      Recommendations for the authors:

      Reviewer #1:

      (1) My understanding of network training was that a Bayesian ideal observer signaled target output based on previous reward outcomes. However, the authors never mention that networks are trained by supervised learning in the main text until the last paragraph of the discussion. There is no mention that there was an offset in the target based on the behavior of the monkeys in the main text. These are really important things to consider in the context of the network solution after training. I couldn't actually find any figure that presents the target output for the network. Did I miss something key here?

      In Result Section 1, we added a paragraph that describes in detail how the RNN is trained. We explained that the network is first simulated and then the choice outputs and reward outcomes are fed into the Bayesian model to infer the scheduled reversal trial. A few trials are added to the inferred reversal trial to obtain the behavioral reversal trial, as found in a previous study [Bartolo and Averbeck ‘20]. Then the network weights are updated by backpropagation-through-time via supervised learning. 

      In the original manuscript, the target output for the network was described in Methods Section 2.5, Step 4. To make this information readily accessible, we added a schematic in Figure 1B that shows the scheduled, inferred and behavioral reversal trials. It also shows how the target choice outputs are defined: they switch abruptly at the behavioral reversal trial.
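The Bayesian inference step in training can be sketched as a posterior over the trial at which the reward schedule reversed, computed from the choices and reward outcomes of a simulated block. This minimal sketch assumes two options, a symmetric reward probability p_high = 0.7 for the better option (a hypothetical value) and a uniform prior over the reversal trial; the MAP trial is the argmax of the returned posterior. It is an illustration of the idea, not the model code used for training.

```python
import math


def reversal_posterior(choices, rewards, p_high=0.7, candidates=None):
    """Posterior over the trial r at which the reward schedule reversed.

    choices[t] in {0, 1}, rewards[t] in {0, 1}.
    Assumed model: option 0 pays with probability p_high on trials t < r and option 1
    pays with p_high on trials t >= r; the other option pays with 1 - p_high.
    Uniform prior over candidate r; r = len(choices) means "no reversal yet".
    """
    if candidates is None:
        candidates = range(len(choices) + 1)
    log_lik = {}
    for r in candidates:
        ll = 0.0
        for t, (c, rew) in enumerate(zip(choices, rewards)):
            better = 0 if t < r else 1
            p = p_high if c == better else 1.0 - p_high  # P(reward | choice, r)
            ll += math.log(p if rew else 1.0 - p)
        log_lik[r] = ll
    m = max(log_lik.values())  # subtract the max for numerical stability
    z = sum(math.exp(v - m) for v in log_lik.values())
    return {r: math.exp(v - m) / z for r, v in log_lik.items()}


# Toy block: always choose option 0; rewarded for 10 trials, then unrewarded.
choices = [0] * 20
rewards = [1] * 10 + [0] * 10
post = reversal_posterior(choices, rewards)
map_trial = max(post, key=post.get)
```

Calling the function on growing prefixes of a block gives a running MAP estimate, which is one way to see how the inferred reversal can lag the scheduled one.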

      (2) The role of block structure in the task is an important consideration. What are the statistics of block switches? The authors say on average the reversals are every 36 trials, but also say there are random block switches. The reviewer's notes suggest that both the networks and monkeys may be learning about the typical duration of blocks, which could influence their expectations of reversals. This aspect of the task design should be explained more thoroughly and considered in the context of Figure 1E and 5 results.

      We provided a more detailed description of the reversal learning task in Result Section 1. We clarified that (1) a task is completed by executing a block of a fixed number of trials and (2) the reversal of the reward schedule occurs at a random trial around the middle of a block. The difference in the number of trials per block performed by the RNNs (36) and the monkeys (80) is also explained. We also pointed out the differences in how the reversal trial is randomly sampled.

      However, it is unclear to us what Reviewer 1 meant by random block switches. Our reversal learning task is completed when a block of a fixed number of trials is executed. The reward schedule reverses only once, on a randomly selected trial in the block, and the reversed reward schedule is maintained until the end of the block. This differs from other versions of reversal learning in which the reward schedule switches multiple times across trials. We clarified this point in Result Section 1.
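The block structure described above (a fixed number of trials, with a single reversal at a random trial near the middle that then persists to the end of the block) can be sketched as follows. The 36-trial block length for the RNNs is taken from this response; the reward probability p_high and the ±4-trial jitter window are hypothetical.

```python
import random


def make_block(n_trials=36, p_high=0.7, jitter=4, rng=None):
    """One block with a single reversal of the reward schedule.

    The reversal trial is drawn uniformly from [n_trials // 2 - jitter, n_trials // 2 + jitter]
    and the reversed schedule persists to the end of the block.
    Returns (reversal_trial, probs) with probs[t] = (P(reward | option 0), P(reward | option 1)).
    """
    if rng is None:
        rng = random.Random()
    mid = n_trials // 2
    rev = rng.randint(mid - jitter, mid + jitter)  # inclusive bounds
    before = (p_high, 1.0 - p_high)
    after = (1.0 - p_high, p_high)
    probs = [before if t < rev else after for t in range(n_trials)]
    return rev, probs


rev, probs = make_block(rng=random.Random(1))
```

Because the schedule switches exactly once, a block of this kind contains one and only one transition in `probs`, unlike multi-reversal variants of the task.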

      (3) The relationship between the supervised learning approach used in the RNNs and reinforcement learning was confused in the discussion. "Although RNNs in our study were trained via supervised learning, animals learn a reversal-learning task from reward feedback, making it into a reinforcement learning (RL) problem." This is fundamentally not true. In the case of this work, the outcome of the previous trial updates the target output, rather than the trial and error type learning as is typical in reinforcement learning. Networks are not learning by reinforcement learning and this statement is confusing.

      We agree with Reviewer 1 that the statement in the original manuscript is confusing. Our intention was to point out that our study used supervised learning, which differs from how animals learn by reinforcement learning in real life. We revised the sentence in the Discussion as follows:

      “The RNNs in our study were trained via supervised learning. However, in real life, animals learn a reversal learning task via reinforcement learning (RL), i.e., learn the task from reward outcomes.”

      (4) The distinction between line attractors and the dynamic trajectories described by the authors deserves further investigation. A significant concern arises from the authors' use of targeted dimensionality reduction (TDR), a form of regression, to identify the axis determining reversal probability. While this approach can reveal interesting patterns in the data, it may not necessarily isolate the dimension along which the RNN computes reversal probability. This limitation could lead to misinterpretation of the underlying neural dynamics.

      a) This manuscript cites work described in "Prefrontal cortex as a meta-reinforcement learning system," which examined a similar task. In that study, the authors identified a v-shaped curve in the principal component space of network states, representing the probability of choosing left or right.

      Importantly, this curve is topologically equivalent to a line and likely represents a line attractor. However, regressing against reversal probability in such a case would show that a single principal component (PC2) directly correlates with reversal probability.

      b) The dynamics observed in the current study bear a striking resemblance to this structure, with the addition of intervening loops in the network state corresponding to within-trial state evolution. Crucially, these observations do not preclude the existence of a line attractor. Instead, they may reflect the network's need to produce fast timescale dynamics within each trial, superimposed on the slower dynamics of the line attractor.

      c) This alternative interpretation suggests that reward signals could function as inputs that shift the network state along the line attractor, with information being maintained across trials. The fast "intervening behaviors" observed by the authors could represent faster timescale dynamics occurring on top of the underlying line attractor dynamics, without erasing the accumulated evidence for reversals.

      d) Given these considerations, the authors' conclusion that their results are better described by separable dynamic trajectories rather than fixed points on a line attractor may be premature. The observed dynamics could potentially be reconciled with a more nuanced understanding of line attractor models, where the attractor itself may be curved and coexist with faster timescale dynamics.

      We appreciate the insightful comments on (1) the similarity of the work by Wang et al ’18 with our findings and (2) an alternative interpretation that augments the line attractor with fast timescale dynamics. 

      (1) We added a discussion of the work by Wang et al ’18 in Result Section 2 to point out the similarity between their findings in the principal component space and ours in the x_rev and x_choice space. We commented that such network dynamics could emerge when learning to perform the reversal learning task, regardless of the training scheme. 

      We also mention that the RL approach in Wang et al ’18 does not consider within-trial dynamics and therefore lacks the non-stationary activity observed during the trial in the PFC of monkeys and in our trained RNNs.

      (2) We revised our original manuscript substantially to reconcile the line attractor model with the non-stationary activity observed during a trial. 

      Here are the highlights of the revised interpretation of the PFC and RNN activity:

      - The dynamics of x_rev consist of two activity modes, i.e., stationary activity at the start of a trial and non-stationary activity during the trial. A schematic of the augmented model that reconciles the two activity modes is shown in Figure 4A. The time derivative (dx_rev/dt) and the contractivity of the stationary state are analyzed in Figure 4B,C to demonstrate the two activity modes.

      - In the main text of Result Section 4, we discuss that the stationary activity is consistent with the line attractor model, but the non-stationary activity deviates from the model. 

      - The two activity modes are linked dynamically: underlying dynamics map the stationary state to the non-stationary trajectory. This is shown by predicting the non-stationary trajectory from the stationary state with a support vector regression model. The prediction results are shown in Figure 4D,E,F.

      - We discuss in Result Section 4 an extension of the standard line attractor model: points on the line attractor can serve as initial states that launch non-stationary activity associated with task-related behavior.

      - The separability of neural trajectories presented in Result Section 5 is framed as a property of the non-stationary dynamics associated with task-related behavior.

      To strengthen their claims, the authors should:

      (1) Provide a more detailed description of their RNN training paradigm and task structure, including clear illustrations of target outputs.

      (2) Discuss how their findings relate to and potentially extend previous work on similar tasks, particularly addressing the similarities and differences with the v-shaped state organization observed in reinforcement learning contexts. (https://www.nature.com/articles/s41593-018-0147-8 Figure1).

      (3) Explore whether their results could be consistent with a curved line attractor model, rather than treating line attractors and dynamic trajectories as mutually exclusive alternatives.

      Our response to these three comments is described above.

      Addressing these points would significantly enhance the impact of the study and provide a more nuanced understanding of how reversal probabilities are represented in neural circuits.

      In conclusion, while this study provides interesting insights into the neural representation of reversal probability, there are several areas where the methodology and interpretations could be refined.

      Additional Minor Concerns:

      (1) Network Training and Reversal Timing: The authors mention that the network was trained to switch after a reversal to match animal behavior, stating "Maximum a Posterior (MAP) of the reversal probability converges a few trials past the MAP estimate." More explanation of how this training strategy relates to actual animal behavior would enhance the reader's understanding of the meaning of the model's similarity to animal behavior in Figure 1.

      In Method Section 2.5, we described how our observation that the running estimate of MAP converges a few trials after the actual MAP is analogous to the animal’s reversal behavior.

      “This observation can be interpreted as follows. If a subject performing the reversal learning task employs the ideal observer model to detect the trial at which reward schedule is reversed, the subject can infer the reversal of reward schedule a few trials past the actual reversal and then switch its preferred choice. This delay in behavioral reversal, relative to the reversal of reward schedule, is analogous to the monkeys switching their preferred choice a few trials after the reversal of reward schedule.”

      In Step 4, we also mentioned that the target choice outputs are defined based on our observation in Step 3.

      “We used the observation from Step 3 to define target choice outputs that switch abruptly a few trials after the reversal of reward schedule, denoted as $t^*$ in the following. An example of the target outputs is shown in Fig. 1B.”
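Given an inferred reversal trial, target choice outputs that switch abruptly a few trials later can be built as a sign sequence (+1 / -1), matching the sign-based choice readout of the RNN. In this sketch the delay of 3 trials is a hypothetical stand-in for "a few trials", and the function name is illustrative only.

```python
def target_choices(n_trials, inferred_reversal, delay=3, initial_sign=1):
    """Target choice outputs (+1 / -1) that switch abruptly at t* = inferred_reversal + delay.

    delay is a hypothetical value for "a few trials"; t* is capped at n_trials.
    """
    t_star = min(inferred_reversal + delay, n_trials)
    return [initial_sign if t < t_star else -initial_sign for t in range(n_trials)]


targets = target_choices(36, 17)
```

With an inferred reversal at trial 17 in a 36-trial block, the target stays at +1 through trial 19 and flips to -1 from trial 20 onward.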

      (2) How is the network simulated in step 1 of training? Is it just randomly initialized? What defines this network structure?

      The initial state at the start of a block was random. We think the initial state is less relevant because the external inputs (i.e., cue and feedback) are strong and drive the network dynamics. We describe this setup and observation in Step 1 of training.

      “Step 1. Simulate the network starting from a random initial state, apply the external inputs, i.e., cue and feedback inputs, at each trial and store the network choices and reward outcomes at all the trials in a block. The network dynamics is driven by the external inputs applied periodically over the trials.”
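Step 1 can be sketched as Euler integration of a generic rate network that starts from a random state and is driven by periodic cue and feedback inputs. This is an illustration only: the rate equation, the purely inhibitory random weights, the threshold-linear nonlinearity, dt = 1 ms and the cue/feedback timing are all assumptions (Eq. (1) of the manuscript is not reproduced here), while tau = 20 ms and the 500 ms trial duration come from this response. For simplicity the same feedback vector is applied on every trial, whereas in the task it would depend on the reward outcome.

```python
import math
import random


def relu(v):
    """Threshold-linear rate function (an assumption)."""
    return max(v, 0.0)


def simulate_block(n_neurons=20, n_trials=3, tau=20.0, dt=1.0, trial_ms=500, seed=0):
    """Euler-integrate tau * dx/dt = -x + W @ relu(x) + u(t) from a random initial state.

    Returns the network state at the end of each trial.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_neurons)
    # Hypothetical purely inhibitory recurrent weights (all entries <= 0).
    W = [[-abs(rng.gauss(0.0, 1.0)) * scale for _ in range(n_neurons)] for _ in range(n_neurons)]
    cue = [abs(rng.gauss(1.0, 0.5)) for _ in range(n_neurons)]  # hypothetical excitatory cue input
    feedback = [rng.gauss(0.0, 1.0) for _ in range(n_neurons)]   # random Gaussian feedback input
    x = [rng.gauss(0.0, 1.0) for _ in range(n_neurons)]          # random initial state
    steps = int(trial_ms / dt)
    end_states = []
    for _ in range(n_trials):
        for s in range(steps):
            u = cue if s < steps // 2 else feedback  # hypothetical timing: cue, then feedback
            r = [relu(v) for v in x]
            rec = [sum(w * rv for w, rv in zip(row, r)) for row in W]
            x = [v + dt / tau * (-v + rc + ui) for v, rc, ui in zip(x, rec, u)]
        end_states.append(list(x))
    return end_states
```

Because the inputs are applied every trial and the recurrent weights are inhibitory, the end-of-trial states in this toy are dominated by the inputs rather than by the random starting point.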

      (3) Clarification on Learning Approach: More description of the approach in the main text would be beneficial. The statement "Here, we trained RNNs that learned from a Bayesian inference model to mimic the behavioral strategies of monkeys performing the reversal learning task [2, 4]" is somewhat confusing, as the model isn't directly fit to monkey data. A more detailed explanation of how the Bayesian inference model relates to monkey behavior and how it's used in RNN training would improve clarity.

      We described the learning approach in more detail, but also tried to be concise without going into technical details.

      We revised the sentence in Introduction as follows:

      “We sought to train RNNs to mimic the behavioral strategies of monkeys performing the reversal learning task. Previous studies [Costa et al. ’15; Bartolo and Averbeck ’20] have shown that a Bayesian inference model can capture a key aspect of the monkey's behavioral strategy, i.e., adhere to the preferred choice until the reversal of reward is detected and then switch abruptly. We trained the RNNs to replicate this behavioral strategy by training them on target behaviors generated from the Bayesian model.”

      We also added a paragraph in Result Section 1 that explains in detail how the training approach works.

      (4) In Figure 1B, it would be helpful to show the target output.

      We added a schematic in Fig. 1B that shows how the target output is generated.

      (5) An important point to consider is that a line attractor can be curved while still being topologically equivalent to a line. This nuance makes Figure 4A somewhat difficult to interpret. It might be helpful to discuss how the observed dynamics relate to potentially curved line attractors, which could provide a more nuanced understanding of the neural representations.

      As discussed above, we interpret the “curved” activity during the trial as non-stationary activity, and we do not think it should be characterized as an attractor. An attractor is (1) a minimal set of states that is (2) invariant under the dynamics and (3) attracting when the state is perturbed into its neighborhood [Strogatz, Nonlinear Dynamics and Chaos]. If we take the autonomous system without the behavior-related external input as the base system, the non-stationary states could satisfy (2) and (3) but not (1), so they are not part of the attractor. If we include the behavior-related external input in the autonomous dynamics, the non-stationary trajectories may be part of the attractor. We adopted the former interpretation because the behavior-related inputs are external and transient.
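The distinction drawn above can be illustrated with a toy 2D linear system in which the line y = 0 is a line attractor of the autonomous dynamics (x is neutral, y relaxes to 0): points on the line are invariant, and perturbations off the line decay. A transient input during a "trial" then produces a non-stationary excursion that leaves the line and returns to it, possibly at a shifted position. This is a hypothetical sketch with a straight line (a curved attractor would be topologically equivalent); none of the parameters come from the manuscript.

```python
def step(state, u, dt, tau):
    """Euler step of a toy line attractor: x is neutral (dx/dt = u_x), y relaxes to 0 (tau dy/dt = -y + u_y)."""
    x, y = state
    ux, uy = u
    return (x + dt * ux, y + dt / tau * (-y + uy))


def run(state, inputs, dt=1.0, tau=20.0):
    """Trajectory (including the initial state) under a sequence of (u_x, u_y) inputs."""
    traj = [state]
    for u in inputs:
        state = step(state, u, dt, tau)
        traj.append(state)
    return traj


no_input = [(0.0, 0.0)]
rest = run((0.3, 0.0), no_input * 100)                              # on the line: invariant
kicked = run((0.3, 0.5), no_input * 300)                            # off the line: attracted back
trial = run((0.3, 0.0), [(0.001, 1.0)] * 100 + no_input * 300)      # transient input, then return
```

Under these assumptions, `rest` never moves, `kicked` returns to the line with x unchanged, and `trial` makes a large excursion in y before settling back onto the line at x close to 0.4.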

      (6) The results of the perturbation experiments seem to follow necessarily from the way x_rev was defined. It would be valuable to clarify if there's more to these results than what appears to be a direct consequence of the definition, or if there are subtleties in the experimental design or analysis that aren't immediately apparent.

      The neural activity x_rev is correlated with the reversal probability, but it is unclear whether the activity in this neural subspace is causally linked to behavioral variables, such as choice output. We added this explanation at the beginning of Result Section 7 to clarify the reason for performing the perturbation experiments.

      “The neural activity $x_{rev}$ is obtained by identifying a neural subspace correlated to reversal probability. However, it remains to be shown if activity within this neural subspace is causally linked to behavioral variables, such as choice output.”

      Reviewer #2:

      Below is a list of things I have found difficult to understand, and been puzzled/concerned about while reading the manuscript:

      (1) It would be nice to say a bit more about the dataset that has been used for PFC analysis, e.g. number of neurons used and in what conditions is Figure 2A obtained (one has to go to supplementary to get the reference).

      We added information about the PFC dataset in the opening paragraph of Result Section 2 to provide an overview of what type of neural data we’ve analyzed. It includes information about the number of recorded neurons, recording method and spike binning process.

      (2) It would be nice to give more detail about the monkey task and better explain its trial structure.

      In Result Section 1 we added a description of the overall task structure (and how it differs from other versions of the reversal learning task), the RNN and monkey trial structures, and the differences between the RNN and monkey tasks.

      (3) In the introduction it is mentioned that during the hold period, the probability of reversal is represented. Where does this statement come from?

      The fact that neural activity during a hold period, i.e., fixation period before presenting the target images, encodes the probability of reversal was demonstrated in a previous study (Bartolo and Averbeck ’20). 

      We realize that the sentence did not convey our intention, which was to state that, during the hold period, the reversal probability activity is stationary as in the line attractor model, rather than to emphasize that the probability of reversal is represented during this period. We revised the sentence to convey this message. In addition, we revised the entire paragraph to reflect our reinterpretation: there are two activity modes, where the stationary activity is consistent with the line attractor model but the non-stationary activity deviates from it.

      (4) "Around the behavioral reversal trial, reversal probabilities were represented by a family of rank-ordered trajectories that shifted monotonically". This sentence is confusing and hard to understand.

      Thank you for pointing this out. We rewrote the paragraph to reflect our revised interpretation. This sentence was removed, as it can be considered part of the result on separable trajectories.

      (5) For clarity, in the first section, where it is written that "The reversal behavior of trained RNNs was similar to the monkey's behavior on the same task", it would be nice to state more precisely that this is expected given the strategy used to train the network.

      We removed this sentence as it makes a blanket statement. Instead, we compared the behavioral outputs of the RNNs and the monkeys one by one.

      We added a sentence to Result Section 1 stating that the RNNs’ abrupt behavioral reversal is expected, as they were trained to mimic the target choice outputs of the Bayesian model.

      “Such abrupt reversal behavior was expected as the RNNs were trained to mimic the target outputs of the Bayesian inference model.”

      (6) What is the value of tau used in eq (1), and how does it compare to trial duration?

      We now state the value of the time constant tau in Eq (1) and discuss in Result Section 1 that tau = 20 ms is much shorter than the 500 ms trial duration; the persistent behavior seen in trained RNNs is therefore due to learning.
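      As a rough illustration of why tau = 20 ms cannot by itself sustain activity across a 500 ms trial, consider a single unit with no recurrent or external input. Eq (1) is not reproduced in this response, so the leaky form tau dx/dt = -x used below is an assumption, and the function name is hypothetical. The sketch integrates it with forward Euler and compares the result with the analytic decay exp(-t/tau).

```python
import math


def leaky_decay(x0, tau_ms, duration_ms, dt_ms=1.0):
    """Forward-Euler integration of tau * dx/dt = -x (assumed form, no input).

    Returns the value of x after `duration_ms` milliseconds.
    """
    x = x0
    n_steps = int(round(duration_ms / dt_ms))
    for _ in range(n_steps):
        x += (dt_ms / tau_ms) * (-x)  # one Euler step of the leak term
    return x


TAU_MS = 20.0     # time constant quoted in the response
TRIAL_MS = 500.0  # trial duration quoted in the response

# Fraction of the initial activity left after one trial without recurrence.
analytic = math.exp(-TRIAL_MS / TAU_MS)
euler = leaky_decay(1.0, TAU_MS, TRIAL_MS, dt_ms=1.0)

print(f"analytic fraction remaining: {analytic:.2e}")
print(f"Euler (dt = 1 ms) fraction remaining: {euler:.2e}")
```

      Both numbers are vanishingly small (exp(-25) is about 1.4e-11), so any activity that persists across a trial in this kind of model has to be supported by the learned recurrent connectivity rather than by the leak time constant.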

      (7) It would be nice to expand on the notion of « temporally flexible representation » to help readers grasp what it means.

      Instead of stating that the separable dynamic trajectories have a “temporally flexible representation”, we now explain in what sense they are temporally flexible: separable dynamic trajectories can accommodate the effects that task-related behavior has on generating non-stationary neural dynamics.

      “In sum, our results show that, in a probabilistic reversal learning task, recurrent neural networks encode reversal probability by adopting, not only stationary states as in a line attractor, but also separable dynamic trajectories that can represent distinct probabilistic values while accommodating non-stationary dynamics associated with task-related behavior.”

      Reviewer #3:

      (1) Data:

      It would be useful to describe the experimental task, recording setup, and analyses in much more detail - both in the text and in the methods. What part of PFC are the recordings from? How many neurons were recorded over how many sessions? Which other papers have they been used in? All of these things are important for the reader to know, but are not listed anywhere. There are also some inconsistencies, with the main text e.g. listing the 'typical block length' as 36 trials, and the methods listing the block length as 24 trials (if this is a difference between the biological data and RNN, that should be more explicit and motivated).

      We provided a more detailed description of the monkey experimental task and PFC recordings in Result Section 1. We also added a new section, Methods 2.1, to describe the monkey experiment.

      The experimental analyses should be explained in more detail in the methods. There is e.g. no detailed description of the analysis in Figure 6F.

      We added a new section in Methods 6 to describe how the residual PFC activity is computed. It also describes the RNN perturbation experiments.

      Finally, it would be useful to include more analyses of monkey behaviour and performance, either in the main text or in supplementary figures.

      We did not pursue this comment as it is unclear how additional behavioral analyses would improve the manuscript.

      (2) Model:

      When fitting the network, 'step 1' of training in 2.3 seems superfluous. The posterior update from getting a reward at A is the same as that from not getting a reward at B (and vice versa), and it is therefore completely independent of the network choice. The reversal trial can therefore be inferred without ever simulating the network, simply by generating a sample of which trials have the 'good' option being rewarded and which trials have the 'bad' option being rewarded.

      We respectfully disagree with Reviewer 3’s comment that the reversal trial can be inferred without ever simulating the network. The only way for the network to know about the underlying reward schedule is to perform the task by itself. By simulating the network, it can sample the options and the reward outcomes. 

      Our understanding is that Reviewer 3 described a strategy that a human would use to perform this task. Our goal was to train the RNN to perform the task.

      Do the blocks always start with choice A being optimal? Is everything similar if the network is trained with a variable initial rewarded option? E.g. in Fig 6, would you see the appropriate swap in the effect of the perturbation on choice probability if choice B was initially optimal?

      Thank you for pointing out that the initial high-value option can be random. When setting up the reward schedule, the initial high-value option was chosen randomly from two choice outputs and, at the scheduled reversal, it was switched to the other option. We did not describe this in the original manuscript.

      We added a description in Training Scheme Step 4 stating that the initial high-value option is selected randomly. This is also explained in Result Section 1, where we give an overview of the RNN training procedure.

      (3) Content:

      It is rarely explained what the error bars represent (e.g. Figures 3B, 4C, ...) - this should be clear in all figures.

      We added that the error bars represent the standard error of the mean.

      Figure 2A: this colour scheme is not great. There are abrupt colour changes both before and after the 'reversal' trial, and both of the extremes are hard to see.

      We changed the color scheme to contrast pre- and post-reversal trials without the abrupt color change.

      Figure 3E/F: how is prediction accuracy defined?

      We added that the prediction accuracy is based on Pearson correlation.
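      For readers less familiar with this metric, a Pearson-correlation accuracy score between predicted and observed values can be computed as in the minimal sketch below. The function name and the toy vectors are hypothetical; this is not the authors' analysis code.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Covariance and standard deviations (unnormalized; the 1/n factors cancel).
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    return cov / (sd_p * sd_o)


# Toy example with hypothetical values, not data from the study.
pred = [0.1, 0.4, 0.35, 0.8]
obs = [0.0, 0.5, 0.3, 0.9]
print(round(pearson_r(pred, obs), 3))
```

      Because the Pearson correlation is invariant to shifts and rescaling of either sequence, an accuracy defined this way rewards predicting the correct shape of the signal rather than its absolute magnitude.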

      Figure 4B: why focus on the derivative of the dynamics? The subsequent plots looking at the actual trajectories are much easier to understand. Also - what is 'relative trial' relative to?

      The derivative was analyzed to demonstrate stationarity or non-stationarity of the neural activity. We think it will be clearer in the revised manuscript that the derivative allows us to characterize those two activity modes.

      Relative trial number indicates the trial position relative to the behavioral reversal trial. We added this description to the figures wherever “relative trial” is used.

      Figure 4C: what do these analyses look like if you match the trial numbers for the shift in trajectories? As it is now, there will presumably be more rewarded trials early and late in each block, and more unrewarded trials around the reversal point. Does this introduce biases in the analysis? A related question is (i) why the black lines are different in the top and bottom plots, and (ii) why the ends of the black lines are discontinuous with the beginnings of the red/blue lines.

      We could not understand what Reviewer 3 was asking in this comment. It would help if Reviewer 3 could clarify the following question:

      “Figure 4C: what do these analyses look like if you match the trial numbers for the shift in trajectories?”

      Question (i): We wanted to look at how the trajectory shifts in the subsequent trial depending on whether a reward is or is not received in the current trial. The top panel analyzed all the trials in which the subsequent trial did not receive a reward. The bottom panel analyzed all the trials in which the subsequent trial received a reward. So, the trials analyzed in the top and bottom panels are different, and the black lines (x_rev of the “current” trial) in the top and bottom panels are different.

      Question (ii): The black line comes from the trial preceding the red/blue lines, so if trials were continuous across the inter-trial interval, the black and red/blue lines would be continuous. However, in the monkey experiment the inter-trial intervals were variable, so the end of the current trial does not match the start of the next trial. The neural trajectories presented in the manuscript do not include the activity in this inter-trial interval.

      Figure 6C: are the individual dots different RNNs? Claiming that there is a decrease in Delta x_choice for a v_+ stimulation is very misleading.

      Yes, individual dots are different RNN perturbations. We added an explanation of the dots to the Figure 7C caption.

      We agree with the comment that \Delta x_choice did not decrease, and this sentence was removed. Instead, we revised the manuscript to state that x_choice for v_+ stimulation was smaller than x_choice for v_- stimulation, and we performed a Kolmogorov-Smirnov (KS) test to confirm statistical significance.

      Discussion: "...exhibited behaviour consistent with an ideal Bayesian observer, as found in our study". The RNN was explicitly trained to reproduce an ideal Bayesian observer, so this can only really be considered an assumption (not a result) in the present study.

      We agree that the statement in the original manuscript is inaccurate. It was revised to reflect that, in the other study, behavioral outputs similar to those of a Bayesian observer emerged by simply learning to do the task, instead of by directly mimicking the outputs of a Bayesian observer as done in our study.

      “The authors showed that trained RNNs exhibited behavioral outputs consistent with an ideal Bayesian observer without explicitly learning from the Bayesian observer. This finding shows that the behavioral strategies of monkeys could emerge by simply learning to do the task, instead of by directly mimicking the outputs of a Bayesian observer as done in our study.”

      Methods: Would the results differ if your Bayesian observer model used the true prior (i.e. the reversal happens in the middle 10 trials) rather than a uniform prior? Given the extensive literature on prior effects on animal behaviour, it is reasonable to expect that monkeys incorporate some non-uniform prior over the reversal point.

      Thank you for pointing out the non-uniform prior. We have not conducted this analysis, but we would expect the posterior distribution to converge faster. Determining whether the resulting posterior distribution would differ from the one we obtained with the uniform prior requires further analysis, which is beyond the scope of this paper.
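      To make the prior question concrete, the sketch below computes a posterior over the reversal trial for a two-option task in which the high-value option switches once per block, and compares a uniform prior with a prior restricted to the middle 10 trials. Everything specific here is a hypothetical placeholder rather than the authors' Bayesian model: the block length (36), the reward probability of the better option (0.7), the choice of trials 13 to 22 as the "middle" 10, the function names, and the toy choice and reward sequence.

```python
def reversal_posterior(choices, rewards, prior, p_good=0.7):
    """Posterior over the reversal trial r (0-indexed), given observed trials.

    Assumed generative model (hypothetical): option "A" is the good option on
    trials t < r and option "B" on trials t >= r; the good option is rewarded
    with probability p_good and the other option with probability 1 - p_good.
    """
    post = list(prior)
    for t, (choice, rewarded) in enumerate(zip(choices, rewards)):
        for r in range(len(post)):
            good = "A" if t < r else "B"
            p_reward = p_good if choice == good else 1.0 - p_good
            post[r] *= p_reward if rewarded else 1.0 - p_reward
    total = sum(post)
    return [p / total for p in post]


def map_trial(post):
    """Index of the most probable reversal trial."""
    return max(range(len(post)), key=post.__getitem__)


BLOCK_LEN = 36                 # hypothetical block length
MIDDLE = range(13, 23)         # hypothetical "middle 10 trials" (0-indexed)
uniform_prior = [1.0 / BLOCK_LEN] * BLOCK_LEN
middle_prior = [1.0 / len(MIDDLE) if r in MIDDLE else 0.0 for r in range(BLOCK_LEN)]

# Toy history: always choose A; rewarded on the first 15 trials, then not.
choices = ["A"] * 20
rewards = [True] * 15 + [False] * 5

post_uniform = reversal_posterior(choices, rewards, uniform_prior)
post_middle = reversal_posterior(choices, rewards, middle_prior)
print("MAP reversal trial (uniform prior):", map_trial(post_uniform))
print("MAP reversal trial (middle prior): ", map_trial(post_middle))
```

      With the uniform prior, late candidate trials that have not yet been observed keep some posterior mass; the restricted prior removes that mass outright, which is one way a non-uniform prior can make the posterior concentrate more quickly.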

      Making the code available would make the work more transparent and useful to the community.

      The code is available in the following Github repository: https://github.com/chrismkkim/LearnToReverse

    1. Author response:

      Reviewer #1 (Public review):

      This study investigates the sex determination mechanism in the clonal ant Ooceraea biroi, focusing on a candidate complementary sex determination (CSD) locus, one of the key mechanisms supporting haplodiploid sex determination in hymenopteran insects. Using whole genome sequencing, the authors analyze diploid females and the rarely occurring diploid males of O. biroi, identifying a 46 kb candidate region that is consistently heterozygous in females and predominantly homozygous in diploid males. This region shows elevated genetic diversity, as expected under balancing selection. The study also reports the presence of an lncRNA near this heterozygous region, which, though only distantly related in sequence, resembles the ANTSR lncRNA involved in female development in the Argentine ant, Linepithema humile (Pan et al. 2024). Together, these findings suggest a potentially conserved sex determination mechanism across ant species. However, while the analyses are well conducted and the paper is clearly written, the insights are largely incremental. The central conclusion - that the sex determination locus is conserved in ants - was already proposed and experimentally supported by Pan et al. (2024), who included O. biroi among the studied species and validated the locus's functional role in the Argentine ant. The present study thus largely reiterates existing findings without providing novel conceptual or experimental advances.

      Although it is true that Pan et al., 2024 demonstrated (in Figure 4 of their paper) that the synteny of the region flanking ANTSR is conserved across aculeate Hymenoptera (including O. biroi), Reviewer 1’s claim that that paper provides experimental support for the hypothesis that the sex determination locus is conserved in ants is inaccurate. Pan et al., 2024 only performed experimental work in a single ant species (Linepithema humile) and merely compared reference genomes of multiple species to show synteny of the region, rather than functionally mapping or characterizing these regions.

      Other comments:

      The mapping is based on a very small sample size: 19 females and 16 diploid males, and these all derive from a single clonal line. This implies a rather high probability for false-positive inference. In combination with the fact that only 11 out of the 16 genotyped males are actually homozygous at the candidate locus, I think a more careful interpretation regarding the role of the mapped region in sex determination would be appropriate. The main argument supporting the role of the candidate region in sex determination is based on the putative homology with the lncRNA involved in sex determination in the Argentine ant, but this argument was made in a previous study (as mentioned above).

      Our main argument supporting the role of the candidate region in sex determination is not based on putative homology with the lncRNA in L. humile. Instead, our main argument comes from our genetic mapping (in Fig. 2), and the elevated nucleotide diversity within the identified region (Fig. 4). Additionally, we highlight that multiple genes within our mapped region are homologous to those in mapped sex determining regions in both L. humile and Vollenhovia emeryi, possibly including the lncRNA.

      In response to the Reviewer’s assertion that the mapping is based on a small sample size from a single clonal line, we want to highlight that we used all diploid males available to us. Although the primary shortcoming of a small sample size is to increase the probability of a false negative, small sample sizes can also produce false positives. We used two approaches to explore the statistical robustness of our conclusions. First, we generated a null distribution by randomly shuffling sex labels within colonies and calculating the probability of observing our CSD index values by chance (shown in Fig. 2). Second, we directly tested the association between homozygosity and sex using Fisher’s Exact Test (shown in Supplementary Fig. S2). In both cases, the association of the candidate locus with sex was statistically significant after multiple-testing correction using the Benjamini-Hochberg False Discovery Rate. These approaches are clearly described in the “CSD Index Mapping” section of the Methods.
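      Two of the statistical steps named here, a Fisher's exact test of homozygosity against sex and a Benjamini-Hochberg correction across tested sites, can be sketched with the Python standard library alone. The function names are hypothetical, the label-shuffling null for the CSD index is not reproduced, and the 2x2 counts in the usage line are illustrative only: they take at face value the numbers quoted in this exchange (11 of 16 diploid males homozygous, and the 19 females assumed all heterozygous at the peak). This is not the authors' genome-wide pipeline.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: males, females. Columns: homozygous, heterozygous.
    Returns P(X >= a), where X is the number of homozygous males under the
    hypergeometric null with all row and column margins fixed.
    """
    n_males = a + b
    n_hom = a + c
    n = a + b + c + d
    x_max = min(n_males, n_hom)
    denom = comb(n, n_males)
    tail = sum(comb(n_hom, x) * comb(n - n_hom, n_males - x) for x in range(a, x_max + 1))
    return tail / denom


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative counts only (see lead-in); other p-values are hypothetical.
p_site = fisher_exact_greater(11, 5, 0, 19)
print(f"illustrative one-sided p = {p_site:.2e}")
print(benjamini_hochberg([p_site, 0.2, 0.03]))
```

      In practice, `scipy.stats.fisher_exact` and the `fdr_bh` method of `statsmodels.stats.multitest.multipletests` provide tested implementations of the same two procedures.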

      We also note that, because complementary sex determination loci are expected to evolve under balancing selection, our finding that the mapped region exhibits a peak of nucleotide diversity lends orthogonal support to the notion that the mapped locus is indeed a complementary sex determination locus.

      The fourth paragraph of the results and the sixth paragraph of the discussion are devoted to explaining the possible reasons why only 11/16 genotyped males are homozygous in the mapped region. The revised manuscript will include an additional sentence (in what will be lines 384-388) in this paragraph that includes the possible explanation that this locus is, in fact, a false positive, while also emphasizing that we find this possibility to be unlikely given our multiple lines of evidence.

      In response to Reviewer 1’s suggestion that we carefully interpret the role of the mapped region in sex determination, we highlight our careful wording choices, nearly always referring to the mapped locus as a “candidate sex determination locus” in the title and throughout the manuscript. For consistency, the revised manuscript version will change the second results subheading from “The O. biroi CSD locus is homologous to another ant sex determination locus but not to honeybee csd” to “O. biroi’s candidate CSD locus is homologous to another ant sex determination locus but not to honeybee csd,” and will add the word “candidate” in what will be line 320 at the beginning of the Discussion, and will change “putative” to “candidate” in what will be line 426 at the end of the Discussion.

      In the abstract, it is stated that CSD loci have been mapped in honeybees and two ant species, but we know little about their evolutionary history. But CSD candidate loci were also mapped in a wasp with multi-locus CSD (study cited in the introduction). This wasp is also parthenogenetic via central fusion automixis and produces diploid males. This is a very similar situation to the present study and should be referenced and discussed accordingly, particularly since the authors make the interesting suggestion that their ant also has multi-locus CSD and neither the wasp nor the ant has tra homologs in the CSD candidate regions. Also, is there any homology to the CSD candidate regions in the wasp species and the studied ant?

      In response to Reviewer 1’s suggestion that we reference the (Matthey-Doret et al. 2019) study in the context of diploid males being produced via losses of heterozygosity during asexual reproduction, the revised manuscript will include the following sentence: “Therefore, if O. biroi uses CSD, diploid males might result from losses of heterozygosity at sex determination loci (Fig. 1C), similar to what is thought to occur in other asexual Hymenoptera that produce diploid males (Rabeling and Kronauer 2012; Matthey-Doret et al. 2019).”

      We note, however, that in their 2019 study, Matthey-Doret et al. did not directly test the hypothesis that diploid males result from losses of heterozygosity at CSD loci during asexual reproduction, because the diploid males they used for their mapping study came from inbred crosses in a sexual population of that species.

      We address this further below, but we want to emphasize that we do not intend to argue that O. biroi has multiple CSD loci. Instead, we suggest that additional, undetected CSD loci are one possible explanation for the absence of diploid males from any clonal line other than clonal line A. In response to Reviewer 1’s suggestion that we reference the (Matthey-Doret et al. 2019) study in the context of multilocus CSD, the revised manuscript version will include the following additional sentence in the fifth paragraph of the discussion: “Multi-locus CSD has been suggested to limit the extent of diploid male production in asexual species under some circumstances (Vorburger 2013; Matthey-Doret et al. 2019).”

      Regarding Reviewer 1’s question about homology between the putative CSD loci from the (Matthey-Doret et al. 2019) study and O. biroi, we note that there is no homology. The revised manuscript version will have an additional Supplementary Table (which will be the new Supplementary Table S3) that will report the results of this homology search. The revised manuscript will also include the following additional sentence in the Results: “We found no homology between the genes within the O. biroi CSD index peak and any of the genes within the putative L. fabarum CSD loci (Supplementary Table S3).”

      The authors used different clonal lines of O. biroi to investigate whether heterozygosity at the mapped CSD locus is required for female development in all clonal lines of O. biroi (L187-196). However, given the described parthenogenesis mechanism in this species conserves heterozygosity, additional females that are heterozygous are not very informative here. Indeed, one would need diploid males in these other clonal lines as well (but such males have not yet been found) to make any inference regarding this locus in other lines.

      We agree that a full mapping study including diploid males from all clonal lines would be preferable, but as stated earlier in that same paragraph, we have only found diploid males from clonal line A. We stand behind our modest claim that “Females from all six clonal lines were heterozygous at the CSD index peak, consistent with its putative role as a CSD locus in all O. biroi.” In the revised manuscript version, this sentence (in what will be lines 199-201) will be changed slightly in response to a reviewer comment below: “All females from all six clonal lines (including 26 diploid females from clonal line B) were heterozygous at the CSD index peak, consistent with its putative role as a CSD locus in all O. biroi.”

      Reviewer #2 (Public review):

      The manuscript by Lacy et al. is well written, with a clear and compelling introduction that effectively conveys the significance of the study. The methods are appropriate and well-executed, and the results, both in the main text and supplementary materials, are presented in a clear and detailed manner. The authors interpret their findings with appropriate caution.

      This work makes a valuable contribution to our understanding of the evolution of complementary sex determination (CSD) in ants. In particular, it provides important evidence for the ancient origin of a non-coding locus implicated in sex determination, and shows that, remarkably, this sex locus is conserved even in an ant species with a non-canonical reproductive system that typically does not produce males. I found this to be an excellent and well-rounded study, carefully analyzed and well contextualized.

      That said, I do have a few minor comments, primarily concerning the discussion of the potential 'ghost' CSD locus. While the authors acknowledge (line 367) that they currently have no data to distinguish among the alternative hypotheses, I found the evidence for an additional CSD locus presented in the results (lines 261-302) somewhat limited and at times a bit difficult to follow. I wonder whether further clarification or supporting evidence could already be extracted from the existing data. Specifically:

      We agree with Reviewer 2 that the evidence for a second CSD locus is limited. In fact, we do not intend to advocate for there being a second locus, but we suggest that a second CSD locus is one possible explanation for the absence of diploid males outside of clonal line A. In our initial version, we intentionally conveyed this ambiguity by titling this section “O. biroi may have one or multiple sex determination loci.” However, we now see that this leads to undue emphasis on the possibility of a second locus. In the revised manuscript, we will split this into two separate sections: “Diploid male production differs across O. biroi clonal lines” and “O. biroi lacks a tra-containing CSD locus.”

      (1) Line 268: I doubt the relevance of comparing the proportion of diploid males among all males between lines A and B to infer the presence of additional CSD loci. Since the mechanisms producing these two types of males differ, it might be more appropriate to compare the proportion of diploid males among all diploid offspring. This ratio has been used in previous studies on CSD in Hymenoptera to estimate the number of sex loci (see, for example, Cook 1993, de Boer et al. 2008, 2012, Ma et al. 2013, and Chen et al., 2021). The exact method might not be applicable to clonal raider ants, but I think comparing the percentage of diploid males among the total number of (diploid) offspring produced between the two lineages might be a better argument for a difference in CSD loci number.

      We want to re-emphasize here that we do not wish to advocate for there being two CSD loci in O. biroi. Rather, we want to explain that this is one possible explanation for the apparent absence of diploid males outside of clonal line A. We hope that the modifications to the manuscript described in the previous response help to clarify this.

      Reviewer 2 is correct that comparing the number of diploid males to diploid females does not apply to clonal raider ants. This is because males are vanishingly rare among the vast numbers of females produced. We do not count how many females are produced in laboratory stock colonies, and males are sampled opportunistically. Therefore, we cannot report exact numbers. However, we will add the following sentence to the revised manuscript: “Despite the fact that we maintain more colonies of clonal line B than of clonal line A in the lab, all the diploid males we detected came from clonal line A.”

      (2) If line B indeed carries an additional CSD locus, one would expect that some females could be homozygous at the ANTSR locus but still viable, being heterozygous only at the other locus. Do the authors detect any females in line B that are homozygous at the ANTSR locus? If so, this would support the existence of an additional, functionally independent CSD locus.

      We thank the reviewer for this suggestion, and again we emphasize that we do not want to argue in favor of multiple CSD loci. We just want to introduce it as one possible explanation for the absence of diploid males outside of clonal line A.

      The 26 sequenced diploid females from clonal line B are all heterozygous at the mapped locus, and the revised manuscript will clarify this in what will be lines 199-201. Previously, only six of those diploid females were included in Supplementary Table S2, and that will be modified accordingly.

      (3) Line 281: The description of the two tra-containing CSD loci as "conserved" between Vollenhovia and the honey bee may be misleading. It suggests shared ancestry, whereas the honey bee csd gene is known to have arisen via a relatively recent gene duplication from fem/tra (10.1038/nature07052). It would be more accurate to refer to this similarity as a case of convergent evolution rather than conservation.

      In the sentence that Reviewer 2 refers to, we are representing the assertion made in the (Miyakawa and Mikheyev 2015) paper in which, regarding their mapping of a candidate CSD locus that contains two linked tra homologs, they write in the abstract: “these data support the prediction that the same CSD mechanism has indeed been conserved for over 100 million years.” In that same paper, Miyakawa and Mikheyev write in the discussion section: “As ants and bees diverged more than 100 million years ago, sex determination in honey bees and V. emeryi is probably homologous and has been conserved for at least this long.”

      As noted by Reviewer 2, this appears to conflict with a previously advanced hypothesis: because fem and csd were found in Apis mellifera, Apis cerana, and Apis dorsata, but only fem was found in Melipona compressipes, Bombus terrestris, and Nasonia vitripennis, the csd gene was inferred to have evolved after the honeybee (Apis) lineage diverged from other bees (Hasselmann et al. 2008). However, it remains possible that the csd gene evolved after ants and bees diverged from N. vitripennis, but before the divergence of ants and bees, and was subsequently lost in B. terrestris and M. compressipes. This view was previously put forward based on bioinformatic identification of putative orthologs of csd and fem in bumblebees and in ants [(Schmieder et al. 2012), see also (Privman et al. 2013)]. However, subsequent work disagreed and argued that the duplications of tra found in ants and in bumblebees represented convergent evolution rather than homology (Koch et al. 2014). Distinguishing between these possibilities will be aided by additional sex determination locus mapping studies and functional dissection of the underlying molecular mechanisms in diverse Aculeata.

      Distinguishing between these competing hypotheses is beyond the scope of our paper, but the revised manuscript will include additional text to incorporate some of this nuance. We will include these modified lines below:

      “A second QTL region identified in V. emeryi (V.emeryiCsdQTL1) contains two closely linked tra homologs, similar to the closely linked honeybee tra homologs, csd and fem (Miyakawa and Mikheyev 2015). This, along with the discovery of duplicated tra homologs that undergo concerted evolution in bumblebees and ants (Schmieder et al. 2012; Privman et al. 2013) has led to the hypothesis that the function of tra homologs as CSD loci is conserved with the csd-containing region of honeybees (Schmieder et al. 2012; Miyakawa and Mikheyev 2015). However, other work has suggested that tra duplications occurred independently in honeybees, bumblebees, and ants (Hasselmann et al. 2008; Koch et al. 2014), and it remains to be demonstrated that either of these tra homologs acts as a primary CSD signal in V. emeryi.”

      (4) Finally, since the authors successfully identified multiple alleles of the first CSD locus using previously sequenced haploid males, I wonder whether they also observed comparable allelic diversity at the candidate second CSD locus. This would provide useful supporting evidence for its functional relevance.

      As is already addressed in the final paragraph of the results and in Supplementary Fig. S4, there is no peak of nucleotide diversity in any of the regions homologous to V.emeryiQTL1, which is the tra-containing candidate sex determination locus (Miyakawa and Mikheyev 2015). In the revised manuscript, the relevant lines will be 307-310. We want to restate that we do not propose that there is a second candidate CSD locus in O. biroi, but we simply raise the possibility that multi-locus CSD *might* explain the absence of diploid males from clonal lines other than clonal line A (as one of several alternative possibilities).

      Overall, these are relatively minor points in the context of a strong manuscript, but I believe addressing them would improve the clarity and robustness of the authors' conclusions.

      Reviewer #3 (Public review):

      Summary:

      The sex determination mechanism governed by the complementary sex determination (CSD) locus is one of the mechanisms that support the haplodiploid sex determination system evolved in hymenopteran insects. While many ant species are believed to possess a CSD locus, it has only been specifically identified in two species. The authors analyzed diploid females and the rarely occurring diploid males of the clonal ant Ooceraea biroi and identified a 46 kb CSD candidate region that is consistently heterozygous in females and predominantly homozygous in males. This region was found to be homologous to the CSD locus reported in distantly related ants. In the Argentine ant, Linepithema humile, the CSD locus overlaps with an lncRNA (ANTSR) that is essential for female development and is associated with the heterozygous region (Pan et al. 2024). Similarly, an lncRNA is encoded near the heterozygous region within the CSD candidate region of O. biroi. Although this lncRNA shares low sequence similarity with ANTSR, its potential functional involvement in sex determination is suggested. Based on these findings, the authors propose that the heterozygous region and the adjacent lncRNA in O. biroi may trigger female development via a mechanism similar to that of L. humile. They further suggest that the molecular mechanisms of sex determination involving the CSD locus in ants have been highly conserved for approximately 112 million years. This study is one of the few to identify a CSD candidate region in ants and is particularly noteworthy as the first to do so in a parthenogenetic species.

      Strengths:

      (1) The CSD candidate region was found to be homologous to the CSD locus reported in distantly related ant species, enhancing the significance of the findings.

      (2) Identifying the CSD candidate region in a parthenogenetic species like O. biroi is a notable achievement and adds novelty to the research.

      Weaknesses:

      (1) Functional validation of the lncRNA's role is lacking, and further investigation through knockout or knockdown experiments is necessary to confirm its involvement in sex determination.

      See response below.

      (2) The claim that the lncRNA is essential for female development appears to reiterate findings already proposed by Pan et al. (2024), which may reduce the novelty of the study.

      We do not claim that the lncRNA is essential for female development in O. biroi, but simply mention the possibility that, as in L. humile, it is somehow involved in sex determination. We do not have any functional evidence for this, so this is purely based on its genomic position immediately adjacent to our mapped candidate region. We agree with the reviewer that the study by Pan et al. (2024) decreases the novelty of our findings. Another way of looking at this is that our study supports and bolsters previous findings by partially replicating the results in a different species.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We would like to thank the reviewers for their comments; we see great value in the suggestions they made to strengthen our work. We are glad to see that they are in general positive about the manuscript. In the following, we include a point-by-point response to their comments, which are in general consistent with each other.


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Sanchez-Cisneros and colleagues examine how tracheal cell adhesion to the ECM underneath the epidermis helps shape the tracheal system. They show that if cell-ECM adhesion is perturbed, the development of the tracheal system and the epidermis is disrupted. They also detect protrusions extending from the dorsal trunk cells towards the ECM. The work is novel, the figures are clear, and the questions are well addressed. However, I find that some of the claims are not completely supported by the data presented. I have some suggestions that will, I believe, clarify certain points.

      Major comments

      At the beginning of the Results section, as in the Introduction, the authors claim that "It is generally assumed that trunk displacement occurs due to tip cells pulling on the trunks so that they follow their path dorsally." This sentence is not referenced, and I do not know where it has been shown or proposed to be like this. In addition, the comparison with the ventral branches is also not referenced, and the movie does not really show this. Forces generated by tracheal branch migration have been shown to drive intercalation (Caussinus E, Colombelli J, Affolter M. Tip-cell migration controls stalk-cell intercalation during Drosophila tracheal tube elongation. Curr Biol. 2008;18(22):1727-1734. doi:10.1016/j.cub.2008.10.062), but not dorsal trunk (DT) displacement.

      We agree that dorsal trunk displacement has not been discussed in previous works, which only show that tip-cell migration influences stalk-cell intercalation. We will rephrase this sentence, stating that dorsal trunk displacement has not been studied.

      However, to rule out the possibility that DT displacement and the phenotype observed in XXX is due to dorsal branch pulling forces, the authors should analyze what happens in the absence of dorsal branches (in condition of Dpp signalling inhibition as in punt mutants or Dad overexpression conditions).

      This is a great idea, and we thank the reviewer for suggesting this. We tried to achieve a similar goal by expressing a Dominant Negative FGFR (Breathless-DN) in the tracheal system, since its expression under btl-gal4 affects tip cell migration. However, the phenotype arises too late to have an effect in dorsal branch migration during the stages we were interested in analyzing. The alternative proposed by the reviewer should be more efficient, as blocking Dpp signalling prevents the formation of dorsal branches completely. We have just received flies carrying the UAS-Dad construct. We will express Dad under btl-gal4 and see how this affects dorsal trunk displacement.

      I am concerned about the TEM observations. The authors claim they can identify tracheal cells by their lumen (Fig. 2 C'). However, at stage 15, the tracheal lumen should be clearly identifiable, and the interluminal DT space should be wider relative to the size of the cells. In this case, there is nothing telling us that we are not looking at a dorsal branch or lateral trunk cell. Furthermore, at embryonic stage 15, the tracheal lumen is filled with a chitin filament, which is not visible in these micrographs. Also, there is quite a lot of tissue detachment and empty space between cells, which might be a sign of problems with sample fixation. Better images and more accurate identification of dorsal trunk cells are necessary to support the claim that "These experiments revealed a novel anatomical contact between the epidermis and tracheal trunks".

      The protocol that we use for TEM involves performing 1-μm sections that allow us to stage embryos and to identify the anatomical regions using light microscopy and then switch to ultra-thin sections for electron microscopy once we have found the right position within the sample. This approach also allows us to determine the integrity of the sample. We attach here a micrograph of the last section we analyzed before we decided to do the EM analysis. The asterisk (*) points to a region where the multicellular lumen of the trunk is visible. Due to its proximity to the posterior spiracles, we are confident this is the dorsal trunk and not the lateral trunk. We realize now, after comparing this image with an atlas of development (Campos-Ortega and Hartenstein, 2013), that the stage we chose to illustrate the interaction is a stage 14 embryo instead of the stage 15 we indicated in the manuscript. We will change the stage but given that dorsal closure has already started by stage 14, this does not affect our analysis. Still, we apologize for the mis-staging of the embryo.

      In the light-microscopy image, we have overlaid the EM section on the corresponding region of interest. We agree that the lumen should be thicker compared to the length of the cells if the section cut the trunk through its largest diameter. However, the protrusions we see do not emerge from the middle part of the trunk where the lumen is found but are seen towards the dorsal side of the trunk, where the lumen will no longer be visible in a longitudinal section such as the ones we present. In the embryo shown in Figure 2A-C, our interpretation is that the section passed through a very shallow portion of the lumen (represented below). We infer this from the abundant electron-dense areas, which we think are adherens junctions from multiple cells. These junctions are visible in Figure 2C but are currently not labelled. We will add arrows to increase their visibility.

      Given that protruding cells lie at the base of dorsal branches, it would be expected that in some sections we would find the protrusions close to the dorsal branches. This is in fact what we show in the micrograph shown in Figure 2D, with a lower magnification overview image shown in Figure S2D. In this case, we see a cell in close proximity to the tendon cells on one side (Figure 2D), which is connected to a dorsal branch on the opposite side (shown in Figure S2D). This dorsal branch is clearly autocellular and chitin deposition is visible as expected for the developmental stage. Again, in Figure S2E we see an electron-dense patch near the lumen that corresponds to the adherens junctions that seal the lumen. We see that all this needs to be better explained in the manuscript, so we will elaborate on the descriptions, and incorporate the light microscopy micrograph to the supplemental figures. This should also aid with the anatomical descriptions requested by Reviewer #3. Nevertheless, we think these observations confirm that what we are describing are the contact points between the dorsal trunk and tendon cells.

      Time-lapse imaging of the protrusions in DT cells is done with frames every 4 minutes (Video S3). This is not enough to properly show cellular protrusions, and the images do not really show interaction with the epidermis. Video S4 has better time resolution, but it is very short and only shows the moment of the cut; the reported (and quantified) recoil is not clear. Nevertheless, the results are noteworthy and should be further analysed.

      We will acquire high temporal resolution time-lapse images using E-Cadherin::GFP and btl-gal4, UAS-PH::mCherry to show the behaviour of the protrusions on a short time scale.

      Provided these embryos survive, would it be possible to check whether they develop wavy DTs after laser cutting?

      We think it would be interesting to carry out this experiment, but the laser cut experiments were done during a collaborative visit, and we would not be able to repeat them in the short term.

      What happens to the larvae under the genetic conditions presented in Fig.S3? Do they reach pupal stages? Do these animals reach adult stages?

      We have seen escapers out of these crosses, but we have not quantified the lethality of the experiment. We will analyse this and include it in the manuscript.

      The kayak phenotypes are very interesting and perhaps the authors could explore them more. As in inhibition of adhesion to the ECM, kay mutants display wavy dorsal trunks. Do they have defective adhesion? Fos being a transcription factor, this is a possibility. The authors should at least discuss the kay phenotypes more extensively and present a suitable hypothesis for the phenotype.

      We agree that the kayak experiments might have consequences beyond preventing dorsal closure. We will complement this approach by blocking dorsal closure by other independent means. We will use pannier-gal4 (a lateral epidermis driver), engrailed-gal4 (a driver for the epidermal posterior compartment), and 332-gal4 (an amnioserosa driver) to express dominant-negative Moesin. In our experience, this also delays dorsal closure, and it should result in a tracheal phenotype similar to the one we see in kayak embryos.

      Minor comments

      Page 2 Line 9/10 The sentence "tracheal tubes branch and migrate over neighbouring tissues of different biochemical and mechanical properties to ventilate them." should be rewritten. Tracheal cells do not migrate over other tissues to ventilate them.

      We meant to say that tracheal cells migrate over other tissues at the same time as they branch and interconnect to allow gas exchange in their surroundings after tracheal morphogenesis is completed. Ventilation is used here as a synonym for gas exchange or breathing. We will rephrase this if the reviewer considers it confusing.

      Page 2 Line 24/25 The sentence "It has been generally assumed that trunks reach the dorsal side of the embryo because of the pulling forces of dorsal branch migration." needs to be backed up by a reference.

      As explained above, we will rephrase this sentence.

      Page 7 Line 32/23 In this sentence, the references are not related to dorsal closure "Similarly, the signals that regulate epidermal dorsal closure do not participate in tracheal development, or vice versa (Letizia et al., 2023; Reichman-Fried et al., 1994)."

      Our goal in this sentence was to explain that while JNK is required for proper epidermal dorsal closure, loss of JNK signaling in the trachea does not affect tracheal development (as shown by Letizia et al., 2023). At the same time, Reichman-Fried et al., 1994 described the phenotypes of loss of breathless (btl). We will remove this last reference as the work does not study the epidermis. We will rephrase the sentence as: “Similarly, the signals that regulate epidermal dorsal closure do not participate in tracheal development; namely, JNK signaling (Letizia et al., 2023).”

      Page 12 Line 1 "Muscles attach to epidermal tendon cells through a dense meshwork of ECM" this sentence must be referenced.

      We will add the corresponding references for this statement: (Fogerty et al., 1994; Prokop et al., 1998; Urbano et al., 2009). We will change “dense” for “specialized”.

      Fig. S1- Single channel images (A'-C' and A'-C') should be presented in grayscale.

      Fig. S4- Single channel images (A'-D' and A'-D') should be presented in grayscale.

      We will add the grayscale, single-channel images for these figures.

      Reviewer #1 (Significance (Required)):

      The findings shown in this manuscript shed light on the interactions and cooperation between two organs, the tracheal system and the epidermis. These interactions are mediated by cell-ECM contacts, which are important for the correct morphogenesis of both systems. The strengths of the work lie in its novelty and live analysis of these interactions. However, its weaknesses are related to some claims not completely backed by the data, some technical issues regarding imaging and some over-interpreted conclusions.

      This basic research work will be of interest to a broad cell and developmental biology community, as it provides a functional advance on the importance of cell-ECM interactions for the morphogenesis of a tubular organ. It is of specific interest to the specialized field of tubulogenesis and tracheal morphogenesis.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: In this paper, the authors explore the relationships between two Drosophila tissues - the epidermis and tracheal dorsal trunk (DT) - that get dorsally displaced during mid-late embryogenesis. They show a nice temporal correlation between the movements of the epithelia during dorsal closure and DT displacement. They also show a correlation between the movement of an endogenously tagged version of collagen and the DT, suggesting that the ECM may contribute to this coordinated movement. Through high magnification TEM, they show that tracheal cells make direct contact with the subset of epithelial cells, known as tendon cells, that also serve as muscle attachment sites. In between these contact sites, tracheae are separated from the epithelia by the muscles. Furthermore, the TEMs and confocal imaging of tracheal cells expressing a membrane marker at these contact sites show that the tracheal cells are extending filopodia toward the tendon cells. The authors then explore how a variety of perturbations to the ECM produced by the tendon and DT cells affect DT and epithelial movement. They find that expressing membrane-associated matrix metalloproteases (MMP1 or MMP2) in tendon cells as well as perturbations in integrin or integrin signaling components leads to delays in dorsal displacement as well as defective lengthening of the tracheal DT tubes. They find that defects in the association between the tracheal and epidermal ECM attachments affect dorsal displacement of the epidermis, disrupting dorsal closure.

      Major comments: I like the goals of this paper testing the idea that the ECM plays important roles in the coordination of tissue placement, and I think they have good evidence of that from this study. However, I disagree with the conclusions of the authors that disrupting contact between DT and the tendon cells has no effect on DT dorsal displacement. DT tracheal positioning is clearly delayed; the fact that it takes a lot longer indicates that the ECM does affect the process. It's just that there are likely backup systems in place - clearly not as good since the tracheal tubes end up being the wrong length.

      We agree with this view; in our deGradFP experiments we see a delayed DT displacement. We focused our analyses on the coordination with epidermal remodelling, which remained unaltered, but we in fact see a delayed progression in dorsal displacement of both tissues (Figure 5I-J). We will emphasize this in the corresponding section of the Results.

      It also seems important that the parts of the DT where the dorsal branches (DB) emanate are moving dorsally ahead of the intervening portions of the trachea. This suggests to me that the DB normally does contribute to DT dorsal displacement and that this activity may be what helps the DT eventually get into its final position. The authors should test whether the portions of the DT that contact the DB are under tension. If the DB migration is providing some dorsal pulling force on the DT, this may also contribute to the observed increases in DT length observed with the perturbations of the ECM between the tendon cells and the trachea - if tube lengthening is a consequence of the pulling forces that would be created by parts of the trachea moving dorsally ahead of the other parts. Here again, it would be good to test if the DT itself is under additional tension when the ECM is disrupted.

      We thank the reviewer for the suggested experiments. We agree that the dorsal branches should pull on the dorsal trunk and that this interaction should generate tension. Unfortunately, we are unable to test this with the experiments proposed by the reviewer, but we propose an alternative strategy to overcome this. We understand that the reviewer suggests we do laser cut experiments in dorsal branches to see if there is a recoil in the opposite direction of dorsal branch migration. We carried out our laser cut experiments using a 2-photon laser during a visit to the EMBL imaging facility, using funds from a collaborative grant. Funding a second visit would require us to apply for extra funding, which would delay the preparation of the experiments. We are aware of UV-laser setups within our university; however, UV-laser cuts would also affect the epidermis above the dorsal branches, which we think might contribute to the recoil we would expect to see.

      Instead of doing laser cuts, we have designed an experiment based on the suggestion of reviewer #1 of blocking Dpp signaling (with UAS-Dad), which would prevent the formation of dorsal branches. We expect that in this experimental setup, the trunk will bend ventrally in response to the pulling forces of the ventral branches. We will also co-express UAS-Dad (to prevent dorsal branch formation) and UAS-Mmp2 (to ‘detach’ the dorsal trunk from the epidermis), and we would expect to at least partially rescue the wavy trunk phenotype.

      Minor comments: The authors need to do a much better job in the intro and in the discussion of citing the work of the people who made many of the original findings that are relevant to this study. Many citations are missing (especially in the introduction) or the authors cite their own review (which most people will not have read) for almost everything (especially in the discussion). This fails to give credit to decades of work by many other groups and makes it necessary for someone who would want to see the original work to first consult the review before they can find the appropriate reference. I know it saves space (and effort) but I think citing the original work is important.

      The reviewer is right; we apologize for falling into this practice. We will reference the original works wherever it is needed.

      Figure 7 is not a model. It is a cartoon depicting what they see with confocal and TEM images.

      We will change the figure; we will include our interpretations of the phenotypes we observed under different experimental manipulations.

      Reviewer #2 (Significance (Required)):

      Overall, this study is one of the first to focus on how the ECM affects coordination of tissue placement. The coordination of tracheal movement with that of the epidermis is very nicely documented here and the observation that the trachea make direct contact with the tendon cells/muscle attachment sites is quite convincing. It is less clear from the data how exactly the cells of the trachea and the ECM are affected by the different perturbations of the ECM. It seems like this could be better done with immunostaining of ECM proteins (collagen-GFP?), cell type markers, and super resolution confocal imaging with combinations of these markers. What happens right at the contact site between the tendon cell and the trachea with the perturbation? I think that at the level of analysis presented here, this study would be most appropriate for a specialized audience working in the ECM or fly embryo development field.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary The manuscript by Sanchez-Cisneros et al provides a detailed description of the cellular interactions between cells of the Drosophila embryonic trachea and nearby tendon and epidermal cells. The researchers use a combination of genetic experiments, light-sheet-style live imaging and transmission electron microscopy. The live imaging is particularly clear and detailed, and reveals protruding cells. The results overall suggest that interactions mediated through the ECM contribute to development of the trachea and dorsal closure of the epidermis. One new aspect is the existence of dorsal trunk filopodia that are under tension and may impact tracheal morphogenesis through required integrin/ECM interactions.

      Major comments: - Are the key conclusions convincing? Generally, the key conclusions are well supported by the data, and the movies are very impressive. Interactions between the cell types are clearly shown, as is the correlations in their development. However, some of the images are challenging to decipher for a non-expert in Drosophila trachea, especially the EM images, and some of the data is indirect or a bit weak.

      We thank the reviewer for their observations. As mentioned above in response to Reviewer #1, we will add an overview image of the embryo we processed for TEM that is presented in Figure 2.

      The data related to failure of dorsal closure affecting trachea relies on one homozygous allele of one gene (kayak), and so this is somewhat weak evidence. Even though kay is not detected in trachea, there could be secondary effects of the mutation or another lesion on the mutant chromosome. The segments look a bit uneven in the mutant examples.

      The reviewer is right; as we proposed before, we will complement the kayak experiments with independent approaches that will delay dorsal closure.

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? Some of the experiments have low n values, especially in imaging experiments, so these may be more preliminary, but they are in concordance with other data.

      The problem we face in our live-imaging experiments is related to the probability of finding the experimental embryos. In most of our experiments we combine double-tissue labelling plus the expression of genetic tools. This generally corresponds to a very small proportion of the progeny. We will aim to have at least 4 embryos per condition.

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. Higher n-values would substantiate the claims. To strengthen the argument that dorsal closure affects trachea morphogenesis mechanically, the authors might consider using a combination of kay mutant alleles or other mutant genes in this pathway to provide stronger evidence. Or they could try a rescue experiment in epidermis and trachea separately for the kay mutants.

      We think our experiments delaying dorsal closure using the Gal4/UAS system and a variety of drivers should address the point of the possible indirect effects of kay in tracheal development.

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. Imaging data can take a while to obtain, but the genetic experiments could be done in a couple of months, and the authors should be able to obtain any needed lines within a few weeks.

      The reviewer is correct; we will be able to plan our crosses for the proposed experiments within a couple of months.

      • Are the data and the methods presented in such a way that they can be reproduced? Generally, yes. For the deGrad experiments, it is not clear how the fluorescent intensity was normalized - was this against a reference marker?

      Briefly, we used signals from within the embryo as internal controls. In the case of en-gal4, we normalized the signal to the sections of the embryo where en is not expressed and therefore, beta-integrin levels should not be affected. In the case of btl-gal4, we normalized against the signal surrounding the trunks which should also not be affected by the deGradFP system. We will elaborate on these analyses in the methods section.

      Are the experiments adequately replicated and statistical analysis adequate? There are several experiments with low n values, so this could fall below statistical significance. For example, data shown in Fig 1G: n=3; Fig 4D n=4, n=3; Fig 6J n=4

      As mentioned above, we will increase our sample sizes.

      Minor comments: - Specific experimental issues that are easily addressable. To make the TEM images more easily interpreted, it would be helpful to provide a fluorescent image of all the relevant cell types (especially trachea, epidermis, muscle, and tendon cells, plus segmental boundaries) labelled accordingly, so that reader can correlate them more easily with the TEM images. They might also include a schematic of an embryo to show where the TEM field of view is.

      We believe this should be addressed by adding the light microscopy section of the embryo with the TEM image overlaid as illustrated above.

      It is hard to be confident that the EM images reflect the cells they claim and that the filopodia are in fact that, at least for people not used to looking at these types of images.

      As we explained in the response to Reviewer #1, we will elaborate on the descriptions of our TEM data. We think that adding the reference micrograph will aid with the interpretations of the TEM images.

      • Are prior studies referenced appropriately? yes
      • Are the text and figures clear and accurate? yes

      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions? The writing could be revised to be a bit clearer. Since the results of the experiments do not support the initial hypothesis, I found it a bit confusing as I read along. It may help to introduce an alternative hypothesis earlier to make the paper more logical and easy to follow. To be more specific, on page 3, the authors say they "show that dorsal trunk displacement is mechanically coupled to the remodelling of the epidermis" and also in the results comment that "With two opposing forces pulling the trunks other factors likely participate in their dorsal displacement, but so far these have remained unstudied." But that doesn't end up being what they find. The results from figure 5 and related interpretation on page 17 say "cell-ECM interactions are important for proper trunk morphology, but not for its displacement." So this was confusing to read, and I would encourage the authors to frame the issues a bit differently in terms of tube morphogenesis.

      We see how this might be confusing. We will rewrite the introduction so that the work is easier to follow. To achieve this, we will state from the beginning the mechanisms we anticipate that regulate trunk displacement: 1) adhesion to the epidermis, 2) pulling forces from the dorsal branches and 3) a combination of both.

      Some minor presentation issues: What orientation is the cross-sectional view in figure 1C and movie 1?

      We will add a dotted box that indicates the region that we turned 90° to show the cross-section.

      On page 12, the authors say the "Electron micrographs also suggested high filopodial activity" but activity suggests dynamics that are not clear from EM. This could be re-phrased.

      As the reviewer indicates, we cannot conclude dynamics from a static image. We will replace “suggested high filopodial activity” with “revealed filopodial abundance”.

      Reviewer #3 (Significance (Required)):

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field. The results of the paper are significant in that they characterize a mechanical interaction between two tissue types in development, which are linked by the extracellular matrix that sits between them. It is not clear to me that this describes a "novel mechanism for tissue coordination" as stated in the abstract, but it does characterize this type of interaction in a detailed cellular way.

      • Place the work in the context of the existing literature (provide references, where appropriate). For specialists, the work identifies a novel protruding cell type in the fly embryonic trachea, and provides beautiful and detailed imaging data on tracheal development. The "wavy" trachea phenotype is also uncommon and very interesting, so this result could be linked to the few papers that also describe this phenotype and be built up.

      • State what audience might be interested in and influenced by the reported findings. As it stands, this is most interesting for a specialized audience because it requires some understanding of the development of this system in particular. As it characterizes this to a new level of detail, it could be influential to those in the field. Some additional clarification of the results and re-framing could make the manuscript clearer and more interesting for non-specialists.

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. I work with Drosophila and have studied embryonic and adult cell types, although not trachea specifically. I am familiar with all the genetic techniques and imaging techniques used here.

    1. Reviewer #3 (Public review):

      In a characteristically bold fashion, Lee Berger and colleagues argue here that markings they have found in a dark isolated space in the Rising Star Cave system are likely over a quarter of a million years old and were made intentionally by Homo naledi, whose remains nearby they have previously reported. As in a European and much later case they reference ('Neanderthal engraved 'art' from the Pyrenees'), the entangled issues of demonstrable intentionality, persuasive age and likely authorship will generate much debate among the academic community of rock art specialists. The title of the paper and the reference to 'intentional designs', however, leave no room for doubt as to where the authors stand, despite an avoidance of the word art, entering a very disputed terrain. Iain Davidson's (2020) 'Marks, pictures and art: their contributions to revolutions in communication', also referenced here, forms a useful and clearly articulated evolutionary framework for this debate. The key questions are: 'are the markings artefactual or natural?', 'how old are they?' and 'who made them?', questions often intertwined and here, as in the Pyrenees, completely inseparable. I do not think that these questions are definitively answered in this paper, and I guess from the language used by the authors (may, might, seem, etc.) that they do not think so either.

      Before considering the specific arguments of the authors to justify the claims of the title, we should recognise the shift in the academic climate of those concerned with 'ancient markings' that has taken place over the past two or three decades. Before those changes, most specialists would probably have expected all early intentional markings to have been made by Homo sapiens after the African diaspora as part of the explosion of innovative behaviours thought to characterise the 'origins of modern humans'. Now, claims for earlier manifestations of such innovations from a wider geographic range are more favourably received, albeit often fiercely challenged as the case for Pyrenean Neanderthal 'art' shows (White et al. 2020). This change in intellectual thinking does not, however, alter the strict requirements for a successful assertion of earlier intentionality by non-sapiens species. We should also note that stone, despite its ubiquity in early human evolutionary contexts, is a recalcitrant material not easily directly dated whether in the form of walling, artefact manufacture or potentially meaningful markings. The stakes are high but the demands no less so.

      Why are the markings not natural? Berger and co-authors seem to find support for the artefactual nature of the markings in their location along a passage connecting chambers in the underground Rising Star Cave system. The presumption is that the hominins passed by the marked panel frequently. I recognise the thinking but the argument is weak. More confidently they note that "In previous work researchers have noted the limited depth of artificial lines, their manufacture from multiple parallel striations, and their association into clear arrangement or pattern as evidence of hominin manufacture (Fernandez-Jalvo et al. 2014)". The markings in the Rising Star Cave are said to be shallow, made by repeated grooving with a pointed stone tool that has left striations within the grooves, and to form designs that are "geometric expressions" including crosshatching and cruciform shapes. "Composition and ordering" are said to be detectable in the set of grooved markings. Readers of this and their texts will no doubt have various opinions about these matters, mostly related to rather poorly defined or quantified terminology. I reserve judgement, but would draw little comfort from the similarities among equally unconvincing examples of early, especially very early, 'designs'. Two or even three half-convincing arguments do not add up to one convincing one.

      The authors draw our attention to one very interesting issue: given the extensive grooving into the dolomite bedrock by sharp stone objects, where are these objects? Only one potential 'lithic artefact' is reported, a "tool-shaped rock [that] does resemble tools from other contexts of more recent age in southern Africa, such as a silcrete tool with abstract ochre designs on it that was recovered from Blombos Cave (Henshilwood et al. 2018)", also figured by Berger and colleagues. A number of problems derive from this comparison. First, 'tool-shaped rock' is surely a meaningless term: in a modern toolshed 'tool-shaped' would surely need to be refined into 'saw-shaped', 'hammer-shaped' or 'chisel-shaped' to convey meaning? The authors here seem to mean that the Rising Star Cave object is shaped like the Blombos painted stone fragment? But the latter is a painted fragment not a tool and so any formal similarity is surely superficial and offers no support to the 'tool-ness' of the Rising Star Cave object. Does this mean that Homo naledi took (several?) pointed stone tools down the dark passageways, used them extensively and, whether worn out or still usable, took them all out again when they left? Not impossible, of course. And the lighting?

      The authors rightly note that the circumstance of the markings "makes it challenging to assess whether the engravings are contemporary with the Homo naledi burial evidence from only a few metres away" and, more pertinently, whether the hominins made the markings. Despite this honest admission, they are prepared to hypothesise that the hominins made the markings, without, it seems, any convincing evidence. If archaeologists took juxtaposition to demonstrate authorship, there would be any number of unlikely claims for the authorship of rock paintings or even stone tools. The idea that there were no entries into this Cave system between the Homo naledi individuals and the last two decades is an assertion not an observation and the relationship between hominins and designs no less so. In fact the only 'evidence' for the age of the markings is given by the age of the Homo naledi remains, as no attempt at the, admittedly very difficult, perhaps impossible, task of geochronological assessment, has been made.

      The claims relating to artificiality, age and authorship made here seem entangled, premature and speculative. Whilst there is no evidence to refute them, there isn't convincing evidence to confirm them.

      References:

      Davidson, I. 2020. Marks, pictures and art: their contribution to revolutions in communication. Journal of Archaeological Method and Theory 27(3): 745-770.

      Henshilwood, C.S. et al. 2018. An abstract drawing from the 73,000-year-old levels at Blombos Cave, South Africa. Nature 562: 115-118.

      Rodriguez-Vidal, J. et al. 2014. A rock engraving made by Neanderthals in Gibraltar. Proceedings of the National Academy of Sciences.

      White, Randall et al. 2020. Still no archaeological evidence that Neanderthals created Iberian cave art.

      Comments on latest version:

      The authors have not modified their stance or the authority of their arguments since the original paper.

    2. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their very constructive and helpful comments on the previous version of this manuscript. They have focused on some important issues and have raised many valuable questions that we expect to answer as research begins on these markings. As is often the case with preprints, a number of experts beyond the four reviewers and editor have provided comments, questions, and suggestions, and we have taken these on board in our revision of the manuscript. In particular, Martinón-Torres et al. (2024) directed several comments at this manuscript and raised some points that were not considered by the reviewers, so we discuss those points here in addition to the reviewer comments.

      Some of us have been engaged in other aspects of the possible cultural activities of Homo naledi. After the discovery of these markings we considered it indefensible to publish further research on the activity of H. naledi within this part of the cave system without making readers aware that the H. naledi skeletal remains occur in a spatial context near markings on cave walls. Of course, the presence of markings leaves many questions open. A spatial context does not answer all questions about the temporal context. The situation of the Dinaledi Subsystem does entail some constraints that would not apply to markings within a more open cave or rock wall, and we discuss those in the text.

      We find ourselves in agreement with most of the reviewers on many points. As reflected by several of the reviewers, and most pointedly in the remarks by reviewer 1, the purpose of this preprint is a preliminary report on the observation of the markings in a very distinctive location. This initial report is an essential step to enable further research to move forward. That research requires careful planning due to the difficulty of working within the Dinaledi Subsystem where the markings are located. This pattern of initial publication followed by more detailed study is common with observations of rock art and other markings identified in South Africa and elsewhere. We appreciate that the reviewers have understood the role of this initial study in that process of research.

      Because of this, the revised manuscript includes relatively minimal changes, all of them made at the advice of reviewers. Many thanks to all the reviewers for noting various typographic errors, missed references and other issues that we have done our best to fix in the revised manuscript.

      Expertise of authors. Reviewer 4 mentions that the expertise of the authors does not include previous publication history on the identification of rock art, and other reviewers briefly comment that experts in this area would enhance the description. AF does have several publications on ancient engravings and other markings; LRB has geological training and field experience with rock art. Notwithstanding this, we do take on board the advice to include a wider array of subject experts in this research, and this is already underway.

      Image enhancement. We appreciate the suggestions of some reviewers for possible strategies to use software filters to bring out details that may not be obvious even with our cross-polarization lighting and filtering. These are great ideas to try. In this manuscript we thought that going very far into software editing or image enhancement might be perceived by some readers as excessive manipulation, particularly in an age of AI. In future work we will experiment with the suggested approaches. 

      Natural weathering. In the process of review and commentary by experts and the public there has been broad acceptance that many of the markings illustrated in this paper are artificial and not a product of natural weathering of the dolomite rock. We deeply appreciate this. At the same time, we accept the comments from reviewers that some markings may be difficult to differentiate from natural weathering, and that some natural features that were elaborated or altered may be among the markings we recognize. On pages 3 and 4 we present a description of the process of natural subaerial weathering of dolomite, which we have rooted in several references as well as our own observations of the natural weathering visible on dolomite cave walls in the Rising Star cave system. This includes other cave walls within the Dinaledi Subsystem. We discuss the “elephant skin” patterning of natural dolomite surface weathering, how that patterning emerges, and how that differs from the markings that are the subject of this manuscript.

      Animal claw marks. Martinón-Torres et al. 2024 accept that some of the markings illustrated on Panel A are artificial, but they offer the hypothesis that some of those markings may be consistent with claw marks from carnivores or other mammals. They provide a photo of claw marks within a limestone cave in Europe to illustrate this point. On pages 5 and 6 of the revised manuscript we discuss the hypothesis of claw marks. We discuss the presence of animals in southern Africa that may dig in caves or mark surfaces. However, the key aspect of the Malmani dolomite caves is that dolomitic limestone is much harder than the limestone of many caves in other regions, such as Europe and Australia, where claw marks have been noted in rock walls. As we discuss, we have not been able to find evidence of claw marks within the dolomite host bedrock of caves in this region, although carnivores, porcupines, and other animals dig into the soft sediments within and around caves. The form of the markings themselves also counter-indicates the hypothesis that they are claw marks.

      Recent manufacture. One comment that occurs within the reviews and from other readers of the preprint is that recent human visitors to the cave, either in historic or recent prehistoric times, may have made these marks. We discuss this hypothesis on page 6 of the revised manuscript. The simple answer is that no evidence suggests that any human groups were in the Dinaledi Subsystem between the presence of H. naledi and the entry of explorers within the last 25 years. The list of all explorers and scientific visitors to have entered this portion of the cave system is presented in a table. We can attest that these people did not make the marks. More generally, such marks have not been known to be made by cavers in other contexts within southern Africa.

      Panels B and C. We have limited the text related to these areas, other than indicating that we have observed them. The analysis of these areas and quantification of artificial lines does not match what we have done for the Panel A area and we leave these for future work. 

      Presence of modern humans. We have observed no evidence of modern humans or other hominin populations within the Dinaledi Subsystem, other than H. naledi. Several reviewers raise the question of whether the absence of evidence is evidence of absence of modern humans in this area. This is connected by two of the reviewers to the observation that the investigation of other caves in recent years has shown that markings or paintings were sometimes made by different groups over tens of thousands of years, in some cases including both Neanderthals and modern humans. We have decided it is best for us not to attempt to prove a negative. It is simple enough to say that there is no evidence for modern humans in this area, while there is abundant evidence of H. naledi there.

      Association with H. naledi. Reviewer 2 made an incisive point that the previous version contained some text that appeared contradictory: on the one hand we argued that modern humans were not present in the subsystem due to the absence of evidence of them, yet we accepted that H. naledi may have been present for a longer time than currently established by geochronological methods.

      We appreciate this comment because it helped us to think through the way to describe the context and spatial association of these markings and the skeletal remains, and how it may relate to their timeline. Other reviewers also raised similar questions, whether the context by itself demonstrates an association with H. naledi. We have revised the text, in particular on pages 5 and 7, to simply state that we accept as the most parsimonious alternative at present the hypothesis that the engravings were made by H. naledi, which is the only hominin known to be present in this space.

      Age of H. naledi in the system. At one place in the previous manuscript we indicated that we cannot establish that H. naledi was only active in the cave system within the constraints of the maximum and minimum ages for the Dinaledi Subsystem skeletal remains (viz., 335 ka – 241 ka), because some localities with skeletal material are undated. We have adjusted this paragraph on page 7 to be clear that we are discussing this only to acknowledge uncertainty about the full range of H. naledi use of the cave system.

      Geochronological methods. Several reviewers discuss the issue of geochronology as applied to these markings. This is an area of future investigation for us after the publication of this initial report. As some reviewers note, the prospects for successful placement of these engraved features and other markings with geochronological methods depend on factors that we cannot predict without very high-resolution investigation of the surfaces. We have included greater discussion of the challenges of geochronological placement of engravings on page 6, including more references to previous work on this topic. We also briefly note the ethical problems that may arise as we go further with potentially invasive, destructive or contact studies of these engravings, which must be carefully considered not just by us, but by the entire academy.

      Title. Some reviewers suggested that the title should be rephrased because this paper does not use chronological methods to derive date constraints for the markings. We have rephrased the title to reflect less certainty while hopefully retaining the clear hypothesis discussed in the paper.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      1.1. It would be helpful if the authors could discuss whether there is any correlation between cryptic sites and the extent of experimental validation in the Phosphosite database (e.g. those that were only identified in one or a few MS experiments). It is difficult to determine stoichiometry of phosphorylation experimentally, but can any inference be made on the extent of phosphorylation of cryptic sites vs. more conventional sites located in IDRs or on the surface of globular domains?

      We thank the reviewer for this valuable suggestion. To investigate the extent of the experimental validation of phosphosites, we examined the number of supporting studies for each site reported in the PhosphoSitePlus database. Specifically, we summed the values of the LT_LIT (literature-based experiments), MS_LIT (mass spectrometry literature), and MS_CST (Cell Signaling Technology mass spectrometry) fields to count the number of independent studies supporting each phosphorylation site, either cryptic or non-cryptic. To visualize the results, we plotted the number of supporting references vs the relative solvent accessibility (RSA) distribution of phosphosites (Figure R1). The analysis revealed a direct correlation between the RSA of phosphosites and the number of studies supporting their phosphorylation. This observation may arise from an intrinsic difficulty in studying cryptic phosphosites due to their destabilizing effects on native proteins. Notably, no differences were observed in the number of supporting studies within cryptic phosphosites (Figure R1B). We have not mentioned these analyses in the new version of the manuscript. However, we would gladly add them if the editor or the reviewer advises accordingly.

      1.2. The authors note that a larger percentage of tyrosine phosphorylation sites are cryptic compared with serine/threonine sites. I assume that tyrosine itself is more highly enriched in the hydrophobic cores of proteins relative to serine or threonine, due to its bulky hydrophobic side chain. Is the increased proportion of cryptic tyrosine phosphorylation sites more, less, or the same as the proportion of tyrosine in hydrophobic cores relative to serine and threonine?

      We thank the reviewer for this insightful comment. As correctly noted, tyrosine residues tend to be enriched in the hydrophobic cores of proteins, as reflected by their generally lower relative solvent accessibility (RSA) values, regardless of phosphorylation state. This enrichment is likely due to the tyrosine side chain's bulky and partially hydrophobic nature. To address the reviewer's question, we compared the RSA distributions of phosphorylated tyrosine, serine, and threonine residues with that of the same residues non-phosphorylated in the human proteome (Figure R2). In order to statistically compare the two distributions, we employed the Mann-Whitney test. The large sample size inevitably yields very low p-values, even when the distributions differ mildly (pThr, pSer vs non-p Thr, Ser, p

      1.3. Fig. 5D and E: I had some trouble interpreting these figures. Indicating where the native state is in the plots would be helpful (stated in text as lower right, but a rectangle on the plot would make this more obvious). The text discusses three metastable intermediates, but what is the fourth one shown on the figures (well A, close to the native state)? This could be more explicitly explained.

      We added the missing rectangles into the original Fig. 5D and E (see below Figure R3 and R4). The three metastable intermediates discussed in the original text reflect protein conformers in which the cryptic site is exposed to the solvent. Conversely, the fourth state, and the final native state, are conformations in which the site is already partially or fully cryptic. The observation that the masking of cryptic sites coincides with the latest folding steps allows us to hypothesize a mechanism by which cryptic phosphorylation may regulate protein folding. Following the reviewer's suggestion, we now specify each conformation more explicitly in the new version of the legends of the corresponding figures (text file with track changes, lines 950 and 1017).

      1.4. The fact that phosphomimetic mutations of cryptic sites in SMAD2 and CHK1 lead to lower expression levels and shorter half-lives is not surprising, given the expected disruption of the hydrophobic core by introduction of a charged residue. The results certainly show that if phosphorylated, these sites would decrease expression and half-life. With respect to half-life, however, if the authors are correct and cryptic sites are predominately phosphorylated co-translationally, one would expect that the half-life curves for the wt protein would not be a simple exponential, but would instead reflect two distinct populations: those that are phosphorylated during translation, and are almost immediately degraded, and those that escape phosphorylation and have the same half-life as the non-phosphorylatable mutant. Are the actual experimental results consistent with this two-population model? If not, this would be evidence that some of these cryptic sites can be exposed post-translation, either by thermal fluctuation or biological interactions.

      We thank the reviewer for this insightful point. The readout employed in our study (i.e., western blotting) measures the aggregate signal from the total protein population in the cell culture. It thus reflects average protein levels rather than the dynamics of individual molecules. As such, it is not well-suited to resolving coexisting populations with distinct half-lives. We agree that if phosphorylation of cryptic sites occurs strictly co-translationally, one might expect a biphasic decay curve. However, due to methodological constraints, our assay provides only a single exponential fit to the global turnover kinetics. While we cannot entirely exclude the possibility that cryptic sites may become exposed post-translationally (e.g., due to thermal fluctuations or interactions), our molecular dynamics simulations did not reveal such exposure events within the simulated timescales. Therefore, while the two-population model remains plausible in principle, our results are consistent with a co-translational phosphorylation and degradation model. Forthcoming experiments aimed at characterizing the phosphorylation of ribosome-associated nascent chains in the human proteome may further validate this conclusion.
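      The reviewer's two-population model can be made concrete with a small numerical sketch. Assuming first-order decay for both pools, the aggregate signal is f·exp(-k_fast·t) + (1 - f)·exp(-k_slow·t). Fitting a single exponential to that aggregate still returns one apparent half-life, which is shorter than the slow pool's whenever a fast-decaying fraction is present; the biphasic shape only becomes visible with enough early time points. The fraction, rate constants, time points and function names are illustrative assumptions, not values from the study.

```python
import math


def two_population_signal(t, frac_fast, k_fast, k_slow):
    """Remaining total protein at time t for two first-order decaying pools.

    frac_fast: hypothetical fraction phosphorylated co-translationally (fast pool).
    k_fast, k_slow: decay constants (1/h) of the fast and slow pools.
    """
    return frac_fast * math.exp(-k_fast * t) + (1 - frac_fast) * math.exp(-k_slow * t)


def apparent_half_life(times, signal):
    """Least-squares fit of ln(signal) = a - k*t; return ln(2)/k for the single fit."""
    logs = [math.log(s) for s in signal]
    n = len(times)
    mt, ml = sum(times) / n, sum(logs) / n
    slope = sum((t - mt) * (v - ml) for t, v in zip(times, logs)) / sum(
        (t - mt) ** 2 for t in times
    )
    return math.log(2) / -slope
```

      For example, with hypothetical chase points at 0, 2, 4 and 8 h and a slow pool with a 6 h half-life, `frac_fast=0` recovers 6 h exactly, while adding a 30% fast pool pulls the single-exponential estimate below 6 h without any obvious sign of two populations in the fitted number itself.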

      1.5. The authors make a point that cryptic phosphosites are more highly conserved than non-cryptic phosphosites, but it is not clear to me whether it is the side chain itself or its ability to be phosphorylated that is conserved. Supplemental Fig. 9, if I am interpreting it correctly, would suggest it is the residue itself and not its phosphorylation that is conserved. If so, wouldn't this suggest that phosphorylation of these cryptic sites is just an inevitable consequence of the conservation of serine, threonine, and tyrosine residues in hydrophobic core regions? If the authors have evidence that argues against this simple hypothesis, they should discuss it (e.g., cryptic phosphosites are more highly conserved in some cases than non-phosphorylated tyrosine, serine, and threonine residues that are not solvent accessible).

      We agree with the reviewer's interpretation. The higher conservation of cryptic phosphosites likely reflects the evolutionary constraint on hydrophobic core residues, which tend to be more conserved due to their role in structural stability. This conservation does not imply phosphorylation at those sites is functionally selected across species. Instead, when such residues are phosphorylated, as we observe in the human proteome, the effect is often destabilizing and associated with protein degradation. Our analysis does not establish that the phosphorylation of cryptic residues is conserved across species, only that the residues themselves are. We appreciate the reviewer's suggestion and now explicitly discuss this point in the revised manuscript to clarify the distinction between residue conservation and phosphorylation conservation (text file with track changes, line 618).

      1.6. Regarding the evolutionary conservation of cryptic sites, have the authors taken into consideration that tyrosine-specific kinases, phosphatases, and reader domains first appeared in the first metazoans, and are for the most part not seen in non-metazoan eukaryotes? I notice some of the proteomes used for the conservation analysis include plants and yeast, which lack most tyrosine phosphorylation.

      We thank the reviewer for this insightful comment. In response to the suggestion, we have recalculated the entropic conservation score by restricting the analysis to metazoan species. This analysis ensures that the evolutionary context more accurately reflects the presence and functional relevance of tyrosine-specific kinases, phosphatases, and reader domains. The entropic score distributions calculated with and without non-metazoan orthologues show statistically significant differences for serine and threonine, as well as for tyrosine. However, the large sample sizes inevitably translate into statistically significant p-values, even when the differences in mean are minimal and the standard deviations relatively small. To better assess the practical relevance of these differences, we calculated Cohen's d as a measure of effect size (Table R1). The coefficient helps assess the size and biological significance of a difference (>0.2 = small effect; >0.5 = medium effect; >0.8 = large effect). The analysis indicates only a very modest shift in entropic scores when non-metazoan orthologues are included or excluded.
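      The response does not spell out how the entropic conservation score is defined. A common choice, assumed here, is the Shannon entropy (in bits) of the residues observed at one alignment position across orthologues, with gaps ignored; restricting the analysis to metazoa then simply means dropping non-metazoan sequences before counting. The function name and the per-orthologue `is_metazoan` flags are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column, is_metazoan=None, metazoa_only=False):
    """Shannon entropy (bits) of one alignment column; lower means more conserved.

    column: residues from each orthologue at the aligned position ('-' marks a gap).
    is_metazoan: optional list of booleans, one per orthologue (hypothetical input).
    metazoa_only: if True, keep only orthologues flagged as metazoan.
    """
    residues = list(column)
    if metazoa_only:
        if is_metazoan is None:
            raise ValueError("metazoa_only requires is_metazoan flags")
        residues = [r for r, keep in zip(residues, is_metazoan) if keep]
    residues = [r for r in residues if r != "-"]  # gaps are ignored in this sketch
    if not residues:
        return float("nan")
    counts = Counter(residues)
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

      Computing the score twice per site, once on all orthologues and once with `metazoa_only=True`, gives the paired distributions whose difference the response then summarizes with Cohen's d.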

      1.7. I find the argument that phosphorylation of exposed core residues is part of normal protein quality control/proteostasis to be convincing. Can the authors provide any experimental evidence to support this model (for example, greater phosphorylation of cryptic sites under stress conditions)? I don't think these experiments are necessary, but would seem to be a logical next step and could be done quite easily through collaboration.

      We appreciate the reviewer's suggestion and fully agree that showing increased phosphorylation of cryptic sites under stress conditions could represent an exciting future direction. We are conducting experiments on individual tumor suppressors such as p53 and PTEN, which harbor cryptic phosphosites, to test whether cellular stress conditions enhance phosphorylation at these positions. These studies assess whether such modifications contribute to altered protein stability or function in stress or disease contexts, particularly cancer. We plan to communicate these results in forthcoming publications and are currently open to collaborations to broaden this line of investigation.

      1.8. The authors note at the end of the discussion that targeting cryptic phosphosites might be a strategy to selectively degrade some proteins in cancer. Practically, how would this work? I can't think of how, but perhaps the authors can provide more specific suggestions.

      We thank the reviewer for raising this important point. One promising approach to therapeutically exploit cryptic phosphosites builds on the PPI-FIT principles (Pharmacological Protein Inactivation by Folding Intermediate Targeting). This strategy targets transient structural pockets appearing only in folding intermediates (Spagnolli et al., Comm Biology 2021). In this context, kinases that phosphorylate cryptic sites could be modulated, either inhibited or redirected, so that misfolded or oncogenic proteins are selectively marked for degradation. For example, selectively enhancing the phosphorylation of a cryptic site on an oncogenic protein could destabilize it and promote its degradation via the proteasome. Conversely, preventing phosphorylation at a cryptic site on a tumor suppressor (e.g., by inhibiting the specific kinase) could enhance protein stability and restore function. While this concept is still emerging, it offers an exciting therapeutic avenue that complements our findings. We added a paragraph addressing this point in the discussion section of the new version of the manuscript (text file with track changes, line 716).

      1.9. Introduction: "It involves the addition of a phosphate to an hydroxyl group found in the side chain of specific amino acids, typically serine, threonine or tyrosine residues." Of course serine, threonine, and tyrosine are the only standard amino acids with a simple hydroxyl group, so "typically" is not needed here.

      We have removed the word "typically" to reflect the accurate chemical specificity of phosphorylation events (text file with track changes, line 82).

      1.10. In my view this is an important study, bringing rigor and a broad proteomic perspective to a phenomenon that (to my knowledge) had not been carefully examined previously. In terms of the big picture, I am of two minds. On the one hand, showing that phosphorylation of hydrophobic core residues exposed during translation or the early stages of folding can regulate steady state levels of some proteins provides an intriguing new mechanism to control the complement of proteins in the cell, and is potentially an area of regulation in normal physiology or in disease. On the other hand, if this is just part of the normal proteostatic mechanisms (hydrophobic core residues exposed for too long consign the protein to degradation, before it can lead to aggregation and other problems), that is a little less interesting to me. I think future work to tease out whether this mechanism is actually regulated and used by the cell to transmit information will be key. But the first step is showing that the phenomenon is real and widespread, and in my view this preprint accomplishes that goal very well.

      We appreciate the reviewer's thoughtful summary and agree that distinguishing between passive proteostatic clearance and active regulatory function is essential. Toward this goal, we plan to carry out a phosphoproteomic analysis of ribosome-associated nascent chains. By mapping phosphorylation events during translation, we aim to validate our cryptic phosphosite dataset in a co-translational context and potentially identify novel regulatory modifications. This approach will also help us assess whether phosphorylation at cryptic sites is modulated in a context-dependent manner, thereby supporting a role in regulated protein expression rather than solely in quality control.

      2.1. Evolutionary comparison of whether cryptic and non-cryptic sites are differently conserved. Two distinct distributions for cryptic and non-cryptic phospho-sites are observed, and Figure 6 shows two entropy distributions of cryptic vs non-cryptic. Here it is unclear whether this is significant given the different distributions of the two types when unmodified.

      We thank the reviewer for raising this critical point. Due to the large sample sizes in our analysis, statistical tests inevitably yield very low p-values, even when differences in mean are minimal and the standard deviations relatively small. To better assess the practical relevance of these differences, we calculated Cohen's d as a measure of effect size (Table R2). The comparison between cryptic and non-cryptic phosphosites yielded an effect size (Cohen's d = 0.4028) slightly lower than the one obtained for residues lying within protein cores or exposed on protein surfaces (Cohen's d = 0.5126), both indicating a modest but meaningful shift in entropic scores. In contrast, the comparisons between cryptic phosphosites and all core residues, as well as non-cryptic phosphosites and all surface residues, showed negligible effect sizes (Cohen's d = 0.0245 and 0.1326, respectively). These findings suggest that while statistical significance is achieved in all cases, only the difference between cryptic and non-cryptic phosphosites, or core and surface residues, reflects a meaningful biological signal. We have now included these data in the new version of the manuscript (text file with track changes, line 544).
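      The effect-size argument can be illustrated with a short sketch. Cohen's d divides the difference in means by the pooled standard deviation and is labelled here with the thresholds quoted in the response to comment 1.6 (0.2, 0.5, 0.8). For contrast, the last function assumes a two-sample z-test with equal group sizes and a common standard deviation, where z = d·sqrt(n/2): holding d fixed while n grows drives the p-value towards zero, which is the large-sample effect the authors describe. The z-test is a simplification chosen for illustration, not the test used in the manuscript, and the function names are hypothetical.

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d: difference in means divided by the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled


def effect_label(d):
    """Label |d| with the thresholds quoted in the response (>0.2, >0.5, >0.8)."""
    size = abs(d)
    if size > 0.8:
        return "large"
    if size > 0.5:
        return "medium"
    if size > 0.2:
        return "small"
    return "negligible"


def z_test_p_value(d, n_per_group):
    """Two-sided p-value of a two-sample z-test with equal n and common SD.

    Under these assumptions z = d * sqrt(n / 2), and p = erfc(z / sqrt(2)).
    """
    z = abs(d) * math.sqrt(n_per_group / 2)
    return math.erfc(z / math.sqrt(2))
```

      For example, d = 0.1 counts as negligible on these thresholds, yet with 10,000 sites per group the sketch gives z of about 7 and a p-value far below 1e-10, whereas the same d with 100 sites per group is nowhere near significance.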

      2.2. The identification of buried modification sites and what the biological meaning / implications are is a very interesting topic. However, PTM distribution on proteins is very skewed (many papers have identified clusters, hot spots, structural dependencies, etc.), and therefore comparing modified sites on different residues and in different protein regions, and with non-modified residues, has to be very stringently controlled.

      We fully agree with the reviewer that PTM distribution is non-random and influenced by structural and functional constraints, making comparative analyses challenging. To ensure rigor, we implemented a robust computational pipeline. Unlike other PTMs found almost exclusively on solvent-exposed residues, phosphorylation uniquely showed a distinct subset of sites with extremely low solvent accessibility. This pattern held even after applying stringent structural and dynamical filters. Specifically, we excluded low-confidence residues, small or unstructured domains, and sites that become exposed due to thermal fluctuations, using the SPECTRUS-based dynamic analysis. While we cannot entirely rule out context-specific exposure in fully folded proteins (e.g., during protein-protein interactions), we validated selected cryptic sites experimentally, and our findings were consistent with the computational predictions. We believe this multilayered approach strengthens the reliability of our classification and distinguishes cryptic phosphosites from the broader PTM landscape.
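      The filters described in this reply, together with the thresholds stated elsewhere in the response (pLDDT ≥ 65 in point 3.1, the 40-residue minimum in point 3.3, and Pidc ≥ 0.8 in point 2.6), can be summarized as a single per-site predicate. This is a minimal sketch only: the record layout and field names are hypothetical, each filter is reduced to a precomputed number, and the RSA threshold that defines a buried site is not stated in this part of the response, so it is left as a required parameter (the 0.05 used below is an arbitrary example value).

```python
from dataclasses import dataclass


@dataclass
class PhosphoSite:
    # Hypothetical per-site record; field names are illustrative only
    protein: str
    position: int
    plddt: float        # AlphaFold per-residue confidence (0-100)
    rsa: float          # relative solvent accessibility (0-1)
    domain_length: int  # residues in the site's SPECTRUS quasi-rigid domain (0 if unassigned)
    pidc: float         # fraction of the side chain's contacts that are intra-domain


def passes_cryptic_filters(site, rsa_cutoff, min_plddt=65.0,
                           min_domain_length=40, min_pidc=0.8):
    """True if a site survives every filter; rsa_cutoff is a hypothetical parameter."""
    return (
        site.plddt >= min_plddt                      # drop low-confidence residues
        and site.domain_length >= min_domain_length  # drop small or unstructured domains
        and site.pidc >= min_pidc                    # keep sites well integrated in a rigid domain
        and site.rsa <= rsa_cutoff                   # keep buried sites only
    )


sites = [
    PhosphoSite("P1", 10, plddt=92.0, rsa=0.02, domain_length=120, pidc=0.9),
    PhosphoSite("P2", 55, plddt=50.0, rsa=0.01, domain_length=120, pidc=0.9),   # low pLDDT
    PhosphoSite("P3", 77, plddt=90.0, rsa=0.03, domain_length=30, pidc=0.95),   # domain too small
]
print([s.protein for s in sites if passes_cryptic_filters(s, rsa_cutoff=0.05)])  # ['P1']
```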

      2.3. Very basic question: How did you assess the RSA value of the residues from the AlphaFold structure? If it is sequence based, then it is unclear what the AlphaFold structure actually contributes in this step. Although I assume it is structure based, it is not well described, only referenced.

      We calculated the RSA values using the Shrake-Rupley algorithm implemented in the MDTraj Python library. This is a structure-based metric: for each PTM-carrying residue, we evaluated the absolute SASA from the 3D AlphaFold structure and normalized it against the theoretical maximum exposure for that residue in a Gly-X-Gly tripeptide, as defined in Tien et al. (2013). Thus, AlphaFold structures directly provide the atomic coordinates necessary for solvent accessibility estimation. We have now revised the Methods section to describe this process more explicitly (text file with track changes, lines 110 and 113).
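      The normalization step described above can be sketched as follows. Only the final division is shown in runnable form; the MDTraj call that produces per-residue SASA is mentioned in the docstring so that the sketch runs without MDTraj installed. The maximum-ASA values are the theoretical ones for Ser, Thr and Tyr as listed in Tien et al. (2013) and should be checked against that paper's table before reuse; RSA values above 1 are possible and are not clipped here.

```python
# Theoretical maximum accessible surface areas (Å^2) for the three
# phospho-acceptor residues, taken from Tien et al. (2013); verify
# against the paper's table before reusing these numbers.
MAX_ASA_A2 = {"SER": 155.0, "THR": 172.0, "TYR": 263.0}


def relative_sasa(resname, sasa_nm2):
    """Normalize an absolute per-residue SASA (in nm^2) to RSA.

    MDTraj reports SASA in nm^2, e.g. from
        mdtraj.shrake_rupley(traj, mode="residue")
    which returns an (n_frames, n_residues) array. 1 nm^2 = 100 Å^2.
    """
    sasa_a2 = sasa_nm2 * 100.0
    return sasa_a2 / MAX_ASA_A2[resname]


# A tyrosine with 0.263 nm^2 (26.3 Å^2) exposed is about 10% accessible
print(round(relative_sasa("TYR", 0.263), 3))  # 0.1
```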

      2.4. Given that the different residues (S, T, Y, but also K for glycosylations, etc.) have very different baseline RSA distributions, the distributions of modified residues as such are not so informative. Are the distributions of residues with the AlphaFold LOD 0.65 different between modified and non-modified?

      2.5. Same point: it is very clear that "tyrosine presenting a larger proportion of cryptic phosphor-sites", as they mainly are within folded domains to begin with. The pattern of phosphorylation and clustering is very different between the modified residues T, S, and Y and needs consideration; given the large number of PTMs, a simple distribution is not sufficient to argue.

      As already discussed in point 1.2 above, and as correctly noted also by this reviewer, tyrosine residues are generally enriched in the hydrophobic cores of proteins, which is reflected by their typically low RSA regardless of phosphorylation status. This tendency likely arises from the bulky and partially hydrophobic nature of the tyrosine side chain. To address the reviewer's question, we compared the RSA distributions of phosphorylated tyrosine, serine, and threonine residues with those of all these amino acids in the human proteome. We found that phosphorylated residues consistently exhibit higher RSA values than the overall averages for their respective amino acids. This is expected, as phosphorylation within protein cores would likely be destabilizing. Indeed, the existence of low-RSA phosphorylated residues represents a significant deviation from the intrinsic tendency of tyrosine, serine, and threonine residues and suggests that cryptic sites may become accessible only transiently along protein folding pathways.
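      Stratifying by residue type, as in the comparison described above, keeps tyrosine's low baseline RSA from being mistaken for an effect of phosphorylation. A minimal sketch, assuming each residue is given as a (one-letter code, RSA) pair; summarizing with medians is a choice of this sketch, not necessarily the statistic the authors used, and the toy values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_rsa_by_residue(records):
    """Group (residue, rsa) pairs by residue type and return the median RSA of each."""
    groups = defaultdict(list)
    for residue, rsa in records:
        groups[residue].append(rsa)
    return {residue: median(values) for residue, values in groups.items()}


def rsa_shift_by_residue(phospho, background):
    """Median RSA of phosphorylated sites minus the proteome-wide median, per residue type."""
    p = median_rsa_by_residue(phospho)
    b = median_rsa_by_residue(background)
    return {res: p[res] - b[res] for res in p if res in b}


# Hypothetical toy values, not data from the manuscript
phospho = [("S", 0.5), ("S", 0.7), ("Y", 0.3)]
background = [("S", 0.2), ("S", 0.4), ("S", 0.6), ("Y", 0.1), ("Y", 0.2)]
shifts = rsa_shift_by_residue(phospho, background)
print({res: round(d, 2) for res, d in shifts.items()})  # {'S': 0.2, 'Y': 0.15}
```

      A positive shift for every residue type corresponds to the observation that phosphorylated residues are more exposed than their amino-acid background.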

      2.6. Figure 3E (proteins need names in the figure): the cryptic site T222 (Chk1) is not in the quasi-rigid domain, it is in a light color region. What is actually the SPECTRUS cutoff? Is the Pidc described in only one sentence in the main text? It says fewer than 80% intradomain contacts in rigid domains, i.e. >0.8, right, but is the domain rigid?

      We have revised the original figure in the new version of the manuscript to include protein names and to clarify the domain assignments. The cryptic phosphosite T222 in Chk1 lies within a quasi-rigid domain, as identified by SPECTRUS. The colors in the image do not reflect any structural property; they are used only to distinguish different quasi-rigid domains. In particular, black regions identify unstructured domains, whereas shades from dark grey to white identify quasi-rigid domains. We apologize for the lack of clarity. We have corrected the figure legend accordingly (text file with track changes, line 912).

      SPECTRUS does not use a cutoff to identify quasi-rigid domains. Non-quasi-rigid regions are simply regions of the protein that SPECTRUS cannot process properly, that is, regions that, because of their large intrinsic fluctuations, cannot be modelled as quasi-rigid.

      We also expanded the description of Pidc in the main text to clarify that it quantifies the proportion of intra-domain contacts made by the phosphosite's side chain, and that a cutoff of ≥ 0.8 was used to retain only residues well integrated within rigid domains (text file with track changes, line 243).
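      Following the description of Pidc given above, the metric is a simple fraction over the side chain's contacts. A minimal sketch, assuming the contact list (for example, residues within some distance cutoff of the side chain) and the SPECTRUS domain labels have already been computed; the contact definition, function name and argument layout are hypothetical, and returning 0 for a site with no contacts is a choice of this sketch.

```python
def pidc(contacts, domain_of, site_domain):
    """Fraction of a phosphosite side chain's contacts that lie in the site's own domain.

    contacts:    residue indices contacted by the side chain (precomputed)
    domain_of:   dict mapping residue index -> SPECTRUS domain label
    site_domain: domain label of the phosphosite itself
    """
    if not contacts:
        return 0.0  # no contacts: treat as not integrated (a choice of this sketch)
    intra = sum(1 for r in contacts if domain_of.get(r) == site_domain)
    return intra / len(contacts)


domain_of = {1: "A", 2: "A", 3: "A", 4: "A", 5: "B"}
value = pidc([1, 2, 3, 4, 5], domain_of, "A")
print(value, value >= 0.8)  # 0.8 True
```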

      We hope these updates will resolve the ambiguities noted and more clearly define the criteria used in our filtering pipeline.

      2.7. The evolutionary comparison (which is not my core expertise) seems again like comparing different things. Why not compare cryptic and non-cryptic sites in the same protein regions? Also, p-Y are, evolutionarily speaking, very different from p-S and p-T. How can these be considered in one distribution? p-Y analysis needs to be separated from the p-T and p-S analyses here.

      We want to clarify that our evolutionary analyses compare residues at aligned positions in orthologous proteins across multiple species. This approach ensures that each cryptic or non-cryptic phosphosite is assessed in its native structural and sequence context. Therefore, the comparison is not between different regions; it evaluates the evolutionary conservation of specific sites across species, allowing a direct and meaningful comparison of cryptic and non-cryptic phosphosites. To address the second point, we report below the entropic score distributions for serine/threonine and tyrosine separately (Figure R5).

      2.8. Have the authors thought of randomization of their data to see whether the distributions are significant?

      We are unsure we fully understand what the referee means by randomizing the data in this case.

      However, according to the mathematical definition of the entropic score (ES), the limit case in which, within each orthogroup, the phosphorylated amino acid is replaced by a completely random residue yields an ES of 1. The opposite limit, in which all members of an orthogroup have the same amino acid at the position of the phosphorylated residue, yields an ES of 0. We have added a paragraph in the Methods to stress this point (text file with track changes, line 354).
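      The two limits quoted above are what a Shannon entropy normalized by log 20 produces. The sketch below assumes that definition; it is consistent with the stated limits, but the exact formula, gap handling and any sequence weighting used by the authors are not given here. Note that with fewer than 20 sequences in an orthogroup, even fully random residues cannot reach exactly 1; the value of 1 is the limit for large orthogroups.

```python
import math
from collections import Counter


def entropic_score(column, alphabet_size=20):
    """Normalized Shannon entropy of the residues at one aligned position.

    column: residues (one-letter codes) at the phosphosite position across an orthogroup.
    Returns 0 when all residues are identical; approaches 1 when residues are
    uniformly distributed over the alphabet.
    """
    n = len(column)
    h = sum((c / n) * math.log(n / c) for c in Counter(column).values())
    return h / math.log(alphabet_size)


print(entropic_score("SSSSSS"))                         # 0.0
print(round(entropic_score("ACDEFGHIKLMNPQRSTVWY"), 6))  # 1.0
print(round(entropic_score("SSTT"), 3))                  # 0.231
```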

      2.9. Labeling in Suppl Figures is insufficient. E.g., in S6, what do the various WT, A and D numberings refer to? Are these independent stable transfections/clones? In Figure S7, what is R?

      Thank you for pointing this out. We have now corrected the missing information in the revised version of the manuscript (text file with track changes, from line 992 to 1008).

      2.10. Whether or not findings are "impressive" should be up to the reader, please remove these attributes in the text.

      We agree with the reviewer's suggestion. We have removed subjective language such as "impressive" from the revised manuscript to ensure an objective and neutral tone, allowing readers to independently evaluate the significance of our findings (text file with track changes, line 454).

      3.1. Residues with pLDDT scores below 65 were excluded from the analysis. The high-confidence measure applies to individual residues, regardless of whether the domains they belong to are also predicted with high confidence. Identifying the number of domains containing PTMs with overall high-confidence predictions could provide better insights into the orientation of modified residues within domain structures. To assess the relationship between residue-specific confidence and domain stability, we can analyze the correlation between high-confidence modified residues and the overall prediction accuracy of their domains. This could be quantified using the average error scores of domain residues. Additionally, using the average pLDDT score would indicate how many individual residues were predicted with high local structural confidence. In contrast, the average PAE (Predicted Aligned Error) score would provide insights into how well each residue's position is predicted relative to others within the domain, reflecting overall domain structural confidence.

      Our analysis excluded residues with pLDDT scores below 65 to ensure high local confidence. While pLDDT provides residue-level structural confidence, assessing domain-wide prediction quality offers additional insights into modified residues' spatial organization and exposure. However, a domain-level interpretation is currently limited by the format of AlphaFold structural predictions. Specifically, AlphaFold does not provide Predicted Aligned Error (PAE) matrices for sequences split into overlapping fragments, a method used for proteins longer than 2,700 amino acids. These fragment predictions are only available in the downloadable AlphaFold proteome archives, not through the web interface, and lack the global alignment metrics (such as PAE) necessary for analyzing domain stability or inter-residue confidence within the domain context.

      3.2. "Approximately 65% of proteins with cryptic phosphosites contained only one or two such residues, while less than 10% had five or more sites (Supp. Figure 3)." To better interpret this trend, it would be useful to analyze the total number of cryptic PTMs on proteins that are part of this study, including all modification types, not just phosphorylation. This would help determine whether the observed pattern is specific to phosphorylation or extends to other post-translational modifications as well.

      To compare the occurrence of different cryptic PTMs, we extended our analysis to include all cryptic post-translational modifications annotated in PhosphoSitePlus, including phosphorylation, glycosylation, methylation, sumoylation, and ubiquitination. The approach allowed us to assess whether the observed distribution of cryptic phosphosites is unique or represents a more general feature of all cryptic PTMs. We observed extensive variation among the different PTMs in the proportion of proteins carrying 1, 2, or more of the same cryptic PTM (see Table R3). However, it must be noted that the relatively low number of cryptic PTMs, excluding phosphorylation, could make it difficult to determine whether these patterns reflect actual biological trends or are simply influenced by the sample size. We have not included these data in the new version of the manuscript, but we would be willing to add them if the editor or the reviewer advises us accordingly.
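      The per-PTM tally described above reduces to counting cryptic sites per protein and binning those counts. A minimal sketch, assuming one (protein, PTM type) record per cryptic site; the bins follow the categories in the reviewer's quotation (one, two, five or more), with a hypothetical "3-4" bin added to cover the remainder, and the input is invented for illustration.

```python
from collections import Counter, defaultdict


def bin_label(n_sites):
    """Bins: 1, 2, 3-4, and 5 or more sites per protein."""
    if n_sites >= 5:
        return "5+"
    if n_sites >= 3:
        return "3-4"
    return str(n_sites)


def site_count_distribution(sites):
    """sites: iterable of (protein_id, ptm_type) pairs, one per cryptic site.

    Returns {ptm_type: {bin: fraction of proteins carrying that many sites}}.
    """
    per_ptm = defaultdict(Counter)  # ptm_type -> Counter(protein_id -> n_sites)
    for protein, ptm in sites:
        per_ptm[ptm][protein] += 1
    result = {}
    for ptm, counts in per_ptm.items():
        bins = Counter(bin_label(n) for n in counts.values())
        total = len(counts)
        result[ptm] = {b: k / total for b, k in sorted(bins.items())}
    return result


# Hypothetical toy input
sites = [("P1", "phospho"), ("P1", "phospho"), ("P2", "phospho"),
         ("P3", "phospho")] + [("P4", "ubiquitin")] * 5
print(site_count_distribution(sites))
```

      With few cryptic sites for a given PTM, each protein shifts these fractions substantially, which is the sample-size caveat raised in the reply.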

      3.3. For the validation of cryptic sites, selecting domains under 200 amino acids was mentioned. However, was there also a minimum length threshold applied, similar to the filtering criteria used for false positives (less than 40 ignored)?

      The 40-residue threshold was applied because protein domains that are too small cannot be reliably subdivided into quasi-rigid domains. Trying to run SPECTRUS on structures with fewer than 40 residues inevitably returns a warning, reflecting the intrinsic cooperative nature of quasi-rigid domains. In fact, entities composed of too few amino acids cannot properly arrange themselves into 3D structures and tend to be disordered. The same reasoning was applied when choosing the proteins to simulate. In particular, for the refolding simulations, we selected protein domains possessing the following properties:

      1. Shorter than 200 amino acids to limit the computational demands.
      2. Long enough to fold into an ordered 3-dimensional conformation reliably.
      3. Have an experimentally determined NMR or X-ray crystal structure.

      3.4. To test their hypothesis that phosphorylation affects protein expression, they selected candidates for serine and threonine but excluded tyrosine. What were the reasons for not including tyrosine-related PTMs in their analysis?

      Our experimental assays relied on phosphomimetic substitutions to mimic the effect of phosphorylation. While serine/threonine phosphorylation can be reasonably mimicked by E or D substitutions, there is no reliable single-residue mimic for phosphotyrosine. Indeed, E or D substitutions do not recapitulate the structural or electronic features of pTyr. Given these limitations, we excluded tyrosine phosphosites from experimental validation to avoid generating inconclusive or misleading data.

      3.5. Do we know that the regulatory role of S300 on PYST1 is associated with the dual specificity of the phosphatase, and is this why it was selected as a negative regulator? While the regulatory roles of the other analyzed phosphosites on SMAD and CHK1 are discussed, there is limited mention of the specific role of S300 on PYST1 within the scope of the study.

      S300 of PYST1 was selected not due to known regulatory relevance, but for technical convenience. PYST1 is a relatively small protein, facilitating computational simulations. We also had suitable reagents for detection (i.e., expression vector), and importantly, S300 was identified as a false-positive cryptic phosphosite removed by our dynamic filtering. It was a practical and structurally matched negative control for validating our computational pipeline.

      3.6. When comparing the entropic scores between cryptic and non-cryptic residues, the medians are 0.43 and 0.52, respectively. Although this difference is not very high, they do observe that cryptic residues have lower scores than non-cryptic ones. The distributions also show greater overlap (Figure 6). I'm wondering if any statistical testing would help assess how distinct these two groups really are.

      We thank the reviewer for this comment, which was also raised by Reviewer #2 and is answered above (point 2.1). Briefly, given our large sample sizes, statistical tests often yield very low p-values even for minor differences. To assess biological significance, we calculated Cohen's d (Table R2 above). The effect size between cryptic and non-cryptic phosphosites (d = 0.4028) was modest but meaningful, and slightly lower than that between core and surface residues (d = 0.5126).

      3.7. Why did the authors choose to rely on AlphaFold data instead of examining PDB structures? I didn't see any explanation or rationale provided for preferring AlphaFold predictions over experimentally determined structures from the PDB.

      We appreciate the value of this comment. We focused on AlphaFold to maximize proteome-wide coverage. Although PDB structures offer experimentally validated conformations, their sparse and uneven coverage of the proteome (particularly for membrane proteins, low-abundance factors, and intrinsically disordered regions) precludes a truly global analysis. AlphaFold2 models, by contrast, deliver accurate, full-length structures for nearly the entire human proteome, enabling unbiased, large-scale mapping of cryptic phosphosites. Nonetheless, we also performed the same analysis using high-resolution structures from the Protein Data Bank (PDB). The results were fully consistent with those based on AlphaFold predictions, indicating that our findings do not depend on the choice of structural source (see Figure R6 below).

      3.8. Novelty - The concept that cryptic site modifications can dysregulate signaling in cancer and other diseases is known, but systematically categorizing PTM sites into cryptic and non-cryptic to generate hypotheses for a wide range of identified PTMs remains an underdeveloped approach. This study establishes a framework for classifying PTMs based on their structural accessibility, integrating AlphaFold predictions, molecular dynamics simulations, solvent accessibility analysis, and phylogenetic conservation metrics. This approach not only enhances our understanding of PTM-mediated regulatory mechanisms but also provides a foundation for exploring how cryptic modifications contribute to protein function, stability, and disease progression.

      We appreciate the reviewer's comment. To our knowledge, this is the first study to introduce and define "cryptic phosphosites" as a structurally distinct and functionally relevant subset of phosphorylation sites. While some individual cases of buried amino acids influencing cancer-related proteins have been reported, no previous study has systematically mapped, filtered, and analyzed these sites across the human proteome using integrated structural, dynamical, evolutionary, and experimental criteria.

      3.9. The study relies primarily on predicted protein structures (e.g., AlphaFold), without exploring experimentally derived structures, which could provide more accurate and physiologically relevant insights.

      We have addressed this point above (see reply to #3.7).

      3.10. While the research demonstrates the impact of cryptic PTMs on protein function, it would be valuable to also investigate non-cryptic sites from their annotated data. By examining the effects of modifications on these non-cryptic sites, the study could further validate the importance of the cryptic versus non-cryptic classifications and help clarify the functional relevance of both types of sites.

      We thank the referee for this thoughtful suggestion. We compared the proportion of cryptic or non-cryptic phosphosites associated with cancer- and disease-related mutations in each group from the COSMIC and PTMVar datasets. The percentage of phosphosites associated with the two repositories is essentially the same for cryptic and non-cryptic sites. This observation suggests that, despite their different structural and regulatory features, both site types occur similarly in disease contexts (see Table R4). We have included these data in the new version of the manuscript (text file with track changes, line 1067; and new Supp. Table 3).

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      Review on Gasparotto et al "Mapping Cryptic Phosphorylation Sites in the Human Proteome"

      Gasparotto et al. assess the solvent accessibility of 87,138 post-translationally modified amino acids in the human proteome (from PhosphoSitePlus). Their initial observation is that a large fraction of modified sites are buried, a finding that is pronounced for phosphorylation but not for other modifications. Their approach uses AlphaFold 3D structures (0.65 cutoff) and RSA prediction to obtain a set of buried sites. Further refinement includes removing low-confidence segments (such as loops, linkers, or short disordered regions) and using SPECTRUS to identify quasi-rigid domains. The idea is that quasi-rigid domains may not breathe, and thus sites buried within them would be modified during synthesis or folding.

      They generated a final dataset of 10,606 cryptic T, S and Y phospho-sites in 5,496 proteins and state that: "These data indicate that ~5% of all known phospho-sites are cryptic. Impressively, the number translates to ~33% of phosphorylated proteins in the human proteome presenting at least one cryptic phospho-site." They focus on S417 of SMAD2 and T382 of Chk1, which are known to be associated with loss-of-function effects or proteasomal degradation, and on S300 of PYST1 as a negative control. They stably express these proteins as phosphomimetic (D) or alanine-substitution mutants in HEK293 cells. Expression levels were reduced in the phospho-D mutant versions, and upon cycloheximide treatment a reduction in the turnover time of phospho-D CHK1 was observed. I think we are looking at large clonal differences in the supplemental figures.

      The examples are supported by MD simulations suggesting that phosphorylation at cryptic sites can occur during the folding process and affect protein homeostasis by drastically increasing the degradation rate and leading to rapid turnover; essentially, the phospho-versions show solvent exposure. An evolutionary comparison assesses whether cryptic and non-cryptic sites are differently conserved. Two distinct distributions for cryptic and non-cryptic phospho-sites are observed, and Figure 6 shows two entropy distributions of cryptic vs. non-cryptic sites. Here it is unclear whether this is significant, given the different distributions of the two types when non-modified. Finally, an overlay of the sites with cancer mutations lists 221 mutations in COSMIC associated with cryptic phosphosites that have been annotated as cancer-related and 138 mutations in PTMVar linked to cancer and other human pathologies. The identification of buried modification sites and what the biological meaning / implications are is a very interesting topic. However, PTM distribution on proteins is very skewed (many papers have identified clusters, hot spots, structural dependencies, etc.), and therefore comparing modified sites on different residues and in different protein regions, and with non-modified residues, has to be very stringently controlled.

      Points for consideration

      • Very basic question: How did you assess the RSA value of the residues from the AlphaFold structure? If it is sequence based, then it is unclear what the AlphaFold structure actually contributes in this step. Although I assume it is structure based, it is not well described, only referenced.
      • Given that the different residues (S, T, Y, but also K for glycosylations, etc.) have very different baseline RSA distributions, the distributions of modified residues as such are not so informative. Are the distributions of residues with the AlphaFold LOD 0.65 different between modified and non-modified?
      • Same point: it is very clear that "tyrosine presenting a larger proportion of cryptic phosphor-sites", as they mainly are within folded domains to begin with. The pattern of phosphorylation and clustering is very different between the modified residues T, S, and Y and needs consideration; given the large number of PTMs, a simple distribution is not sufficient to argue.
      • Figure 3E (proteins need names in the figure): the cryptic site T222 (Chk1) is not in the quasi-rigid domain, it is in a light color region. What is actually the SPECTRUS cutoff? Is the Pidc described in only one sentence in the main text? It says fewer than 80% intradomain contacts in rigid domains, i.e. >0.8, right, but is the domain rigid?
      • The evolutionary comparison (which is not my core expertise) seems again like comparing different things. Why not compare cryptic and non-cryptic sites in the same protein regions? Also, p-Y are, evolutionarily speaking, very different from p-S and p-T. How can these be considered in one distribution? p-Y analysis needs to be separated from the p-T and p-S analyses here.
      • Have the authors thought of randomizing their data to see whether the distributions are significant?
      • Labeling in Suppl Figures is insufficient. E.g., in S6, what do the various WT, A and D numberings refer to? Are these independent stable transfections/clones? In Figure S7, what is R?
      • Whether or not findings are "impressive" should be up to the reader; please remove these attributes from the text.

      Significance

      The identification of buried modification sites and what the biological meaning / implications are is a very interesting topic. However, PTM distribution on proteins is very skewed (many papers have identified clusters, hot spots, structural dependencies, etc.), and therefore comparing modified sites on different residues and in different protein regions, and with non-modified residues, has to be very stringently controlled.

      Main conclusion: 5% of all known phospho-sites are cryptic, with at least one cryptic site in 1/3 of structured protein regions.

    1. Reviewer #1 (Public review):

      Bredenberg et al. aim to model some of the visual and neural effects of psychedelics via the Wake-Sleep algorithm. This is an interesting study with findings that go against certain mainstream ideas in psychedelic neuroscience (that I largely agree with). I cannot speak to the math in this manuscript, but it seems like quite a conceptual leap to set a parameter of the model in between wake and sleep and state that this is a proxy to acute psychedelic effects (point #20). My other concerns below are related to the review of the psychedelic literature:

      (1) Page 1, Introduction, "...they are agonists for the 5-HT2a serotonin receptor commonly expressed on the apical dendrites of cortical pyramidal neurons..." It is a bit redundant to say "5-HT2A serotonin receptor," as serotonin is already captured by its abbreviation (i.e., 5-HT).

      While psychedelic research has focused on 5-HT2A expression on cortical pyramidal cells, note that the 5-HT2A receptor is also expressed on interneurons in the medial temporal lobe (MTL; entorhinal cortex, hippocampus, and amygdala), with some estimates suggesting expression in >50% of these neurons (https://doi.org/10.1016/j.brainresbull.2011.11.006, https://doi.org/10.1007/s00221-013-3512-6, https://doi.org/10.7554/eLife.66960, https://doi.org/10.1016/j.mcn.2008.07.005, https://doi.org/10.1038/npp.2008.71, https://doi.org/10.1038/s41386-023-01744-8, https://doi.org/10.1016/j.brainres.2004.03.016, https://doi.org/10.1016/S0022-3565(24)37472-5, https://doi.org/10.1002/hipo.22611, https://doi.org/10.1016/j.neuron.2024.08.016). However, with a ~1:4 ratio of inhibitory to excitatory neurons in the brain (https://doi.org/10.1101/2024.09.24.614724), this can make it seem as if 5-HT2A expression is negligible in the MTL. I think it might be important to mention these receptors, as this manuscript discusses replay.

      I see now that Figure 1 mentions that PV cells also express 5-HT2A receptors. This should probably be mentioned earlier.

      (2) Page 1, Introduction, "They have further been used for millennia as medicine and in religious rituals..." This might be a romanticization of psychedelics and indigenous groups, as anthropological evidence suggests that intentional psychedelic use might actually be more recent (see work by Manvir Singh and Andy Letcher).

      (3) When discussing oneirogens, it could be worth differentiating psychedelics from kappa opioid agonists such as ibogaine and salvinorin A, another class of hallucinogens that some refer to as "oneirogens" (similar to how "psychedelic" is the colloquial term for 5-HT2A agonists). Note that studies have found the effects of Salvia divinorum (which contains salvinorin A) to be described as more similar to dreams than to psychedelics (https://doi.org/10.1007/s00213-011-2470-6). This makes me wonder why the present study is more applicable to 5-HT2A psychedelics than to kappa opioid agonists or other classes of hallucinogens (e.g., NMDA antagonists, muscarinic antagonists, GABAA agonists).

      (4) Page 2, Introduction, "Replay sequences have been shown to be important for learning during sleep [14, 15, 16, 17, 18]: we propose that mechanisms supporting replay-dependent learning during sleep are key to explaining the increases in plasticity caused by psychedelic drug administration." I'm not sure I follow the logic of this point. Dreams happen during REM sleep, whereas replay is most prominent during non-REM sleep. Moreover, while it's not clear what psychedelics do to hippocampal function, most evidence would suggest they impair it. As mentioned, most 5-HT2A receptors in the hippocampus seem to be on inhibitory neurons, and human and animal work finds that psychedelics impair hippocampal-dependent memory encoding (https://doi.org/10.1037/rev0000455, https://doi.org/10.3389/fnbeh.2014.00180, https://doi.org/10.1002/hipo.22712). One study even found that psilocin impairs hippocampal-dependent memory retrieval (https://doi.org/10.3389/fnbeh.2014.00180). Note that this is all in reference to the acute effects (psychedelics may post-acutely enhance hippocampal-dependent memory, https://doi.org/10.1007/s40265-024-02106-4).

      (5) Page 2, Introduction, "In total, our model of the functional effect of psychedelics on pyramidal neurons could provide a explanation for the perceptual psychedelic experience in terms of learning mechanisms for consolidation during sleep..." In contrast to my previous point, I think this could be possible. Three datasets have found that psychedelics may enhance cortical-dependent memory encoding (i.e., familiarity; https://doi.org/10.1037/rev0000455), and two studies found that post-encoding administration of psychedelics retroactively enhanced memory that may be less hippocampal-dependent/more cortical-dependent (https://doi.org/10.1016/j.neuropharm.2012.06.007, https://doi.org/10.1016/j.euroneuro.2022.01.114). Moreover, and as mentioned below, five studies have found decoupling between the hippocampus and the cortex (https://doi.org/10.3389/fnhum.2014.00020, https://doi.org/10.1002/hbm.22833, https://doi.org/10.1016/j.celrep.2021.109714, https://doi.org/10.1162/netn_a_00349, https://doi.org/10.1038/s41586-024-07624-5), something potentially also observed during REM sleep that is thought to support consolidation (https://doi.org/10.1073/pnas.2123432119). These findings should probably be discussed.

      (6) Page 2, Introduction, "In this work, we show that within a neural network trained via Wake-Sleep, it is possible to model the action of classical psychedelics (i.e. 5-HT2a receptor agonism)..." Note that 5-HT2A agonism alone is not sufficient to explain the effects of psychedelics, given that there are 5-HT2A agonists that are non-hallucinogenic (e.g., lisuride).

      (7) Page 2, Introduction, "...by shifting the balance during the wake state from the bottom-up pathways to the top-down pathways, thereby making the 'wake' network states more 'dream-like'." I could have included this in the previous point, but I felt that this idea deserved its own point. There has been a rather dogmatic assertion that psychedelics diminish top-down processing and/or enhance bottom-up processing, and I appreciate that the authors have not accepted this as fact. However, because this is an unfortunately prominent idea, I think it ought to be fleshed out more by first mentioning that it's one of the tenets of REBUS. REBUS has become a popular model of psychedelic drug action, but it's largely unfalsifiable (it's based on two unfalsifiable models, predictive processing and integrated information theory), so the findings from this study could tighten it up a bit. Second, there have now been a handful of studies that have attempted to study directionality in information flow under psychedelics, and the findings are rather mixed including increased bottom-up/decreased top-down effects (https://doi.org/10.7554/eLife.59784, https://doi.org/10.1073/pnas.1815129116; note that the latter "bottom-up" effect involves subcortical-cortical connections in which it's less clear what's actually "higher-/lower-level"), increased top-down/decreased bottom-up effects (https://doi.org/10.1038/s41380-024-02632-3, https://doi.org/10.1016/j.euroneuro.2016.03.018), or both (https://doi.org/10.1016/j.neuroimage.2019.116462, https://doi.org/10.1016/j.neuropharm.2017.10.039), though most of these studies are aggregating across largely inhomogeneous states (i.e., resting-state). 
Lastly, and somewhat problematically, facilitated top-down processing is also an idea proposed in psychosis that's based partially on findings with acute ketamine administration (note that all hallucinations to some degree might rely on top-down facilitation, as a hallucination involves a high-level concept that impinges on lower-level sensory areas; see work by Phil Corlett). While psychosis and the effects of ketamine have some similarities with psychedelics, there are certainly differences, and I think the goal of this manuscript is to uniquely describe 5-HT2A psychedelics (again, I'm left wondering why tweaking alpha in the Wake-Sleep algorithm is any more applicable to psychedelics than other hallucinogenic conditions).

      (8) Figure 2 equates alpha with a "psychedelic dose," but this is a bit misleading, as neither the algorithm nor an individual was administered a psychedelic. Alpha is instead a hypothetical proxy for a psychedelic dose. Moreover, if the model were recapitulating the effects of psychedelics, shouldn't these images look more psychedelic as alpha increases (e.g., like images put through the DeepDream algorithm)?

      (9) Page 11, Methods, "...and the gate α ensures that learning only occurs during sleep mode... The (1 − α) gate in this case ensures that plasticity only occurs during the Wake mode." Much of the math escapes me, so perhaps I'm misunderstanding these statements, but learning and plasticity certainly happen during both wake and sleep, making me wonder what is meant by these statements. Moreover, if plasticity is simply neural changes, couldn't plasticity be synonymous with neural learning? Perhaps plasticity and learning are meant to refer to different types of neural changes. It might be worth clarifying this, as a general problem in psychedelic research is that psychedelics are described as facilitating plasticity when brains are changing at every moment (hence not experiencing every moment as the same), and psychedelics don't impact all forms of plasticity equally. For example, psychedelics may not necessarily enhance neurogenesis or the addition of certain receptor types, and they impair certain forms of learning (i.e., episodic memory encoding). What is typically meant by plasticity enhancements induced by psychedelics (and where there's the most evidence) is dendritic plasticity (i.e., the growth of dendrites and spines). Whatever is meant by "plasticity" should be clarified in its first instance in this manuscript.

      (10) Page 12, Methods, "During training, neural network activity is either dominated entirely by bottom-up inputs (Wake, α = 0) or by top-down inputs (Sleep, α = 1)." Again, I could be misunderstanding the mathematical formulation, but top-down inputs operate during wake, and bottom-up inputs can operate during sleep (people can wake up or even incorporate noise from their environments into sleep).

      (11) Page 4, Results, "Thus, we can capture the core idea behind the oneirogen hypothesis using the Wake-Sleep algorithm, by postulating that the bottom-up basal synapses are predominantly driving neural activity during the Wake phase (when α is low)." However, several pieces of evidence (and the first circuit model of psychedelic drug action) suggest that psychedelics enhance functional connectivity and potentially even effective connectivity from the thalamus to the cortex (https://doi.org/10.1093/brain/awab406). Note that psychedelics may not equally impact all subcortical structures. REBUS proposes the opposite of the current study's proposal, that psychedelics facilitate bottom-up information flow, with one of the few explicit predictions being that psychedelics should facilitate information flow from the hippocampus to the default mode network. However, as mentioned earlier, five studies have found that psychedelics diminish functional connectivity between the hippocampus and cortex (including the DMN but also V1).

      (12) Page 4, Results, "...and have an excitatory effect that positively modulates glutamatergic transmission..." Note that this may not be brainwide. While psychedelics were found to increase glutamatergic transmission in the cortex, they were also found to decrease hippocampal glutamate (consistent with inhibition of the hippocampus, https://doi.org/10.1038/s41386-020-0718-8).

      (13) Page 5, "...which are similar to the 'breathing' and 'rippling' phenomena reported by psychedelic drug users at low doses..." Although it's sometimes unclear what is meant by "low doses," the breathing/rippling effect of psychedelics occurs at moderate and high doses as well.

      (14) I watched the videos, and it's hard for me to say there was some stark resemblance to psychedelic imagery. In contrast, for example, when the DeepDream algorithm came out, it did seem to capture something quite psychedelic.

      (15) Page 5, "This form of strongly correlated tuning has been observed in both cortex and the hippocampus." If this has been observed under non-psychedelic conditions, what does this tell us about this supposed model of psychedelics?

      (16) Page 6, with regard to neural variability, "...but whether this phenomenon [increased variability] is general across tasks and cortical areas remains to be seen." First, is variability here measured as variance? In fMRI datasets that have been used to support the Entropic Brain Hypothesis, note that variance tends to decrease, though certain measures of entropy increase (e.g., Figure 4A here https://doi.org/10.1073/pnas.1518377113 shows global variance decreases, and this reanalysis of those data https://doi.org/10.1002/hbm.23234 finds some entropy increases). Thus, variance and entropy should not be confused (in theory, one could cycle through several more brain states that are, however, similar to each other, which would produce more entropy with decreased variance). Second, and perhaps more problematically for the EBH, the entropy effects of psychedelics completely disappear when one does a task, and unfortunately, the authors of these findings have misinterpreted them. What they'll say is that engaging in boring cognitive tasks or watching a video decreases entropy under psychedelics, but what you can see in Figure 1b of https://doi.org/10.1021/acschemneuro.3c00289 and Figure 4b of https://doi.org/10.1038/s41586-024-07624-5 is that entropy actually increases under sober conditions when you do a task. That is, it's a rather boring finding. Essentially, when resting in a scanner while sober, many may actually rest (including falling asleep, especially when subjects are asked to keep their eyes closed), and if you perform a task, brain activity should become more complex relative to doing nothing/falling asleep. When under a psychedelic, one can't fall asleep, and thus there's less change (though note that both of the above studies found numerical increases when performing tasks).
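      A toy calculation shows how variance and entropy can move in opposite directions. This is a minimal sketch and not an analysis from any cited study. The two state sequences are hypothetical. "Entropy" here is simply the Shannon entropy of the empirical distribution over discrete states, which is not the same estimator as the sample-entropy or Lempel-Ziv measures used in the imaging literature.

```python
import math
from collections import Counter
from statistics import pvariance


def shannon_entropy_bits(states):
    """Shannon entropy (in bits) of the empirical distribution over discrete states."""
    counts = Counter(states)
    n = len(states)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical sequence A: alternates between two far-apart states.
seq_a = [0, 10] * 50
# Hypothetical sequence B: cycles through eight nearby states.
seq_b = list(range(8)) * 50

for name, seq in [("A", seq_a), ("B", seq_b)]:
    print(name, "variance:", pvariance(seq), "entropy (bits):", shannon_entropy_bits(seq))
```

      Sequence A has variance 25 and 1 bit of entropy. Sequence B has variance 5.25 and 3 bits of entropy. B visits more states (higher entropy), but those states lie close together (lower variance).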
Lastly, again I should note that the findings of the present study actually go against EBH/REBUS, given that the findings are increased top-down effects when EBH/REBUS predicts decreased top-down/increased bottom-up effects.

      (17) Page 6, "Because psychedelic drug administration increases influence of apical dendritic inputs on neural activity in our model, we found that silencing apical dendritic activity reduced across stimulus neural variability more as the psychedelic drug dose increases." I again want to point out that alpha is not the equivalent of a psychedelic dose here, but rather a parameter in the model that is being proposed as a proxy.

      (18) Page 8, "Experimentally, plasticity dynamics which could, theoretically, minimize such a prediction error have been observed in cortex [66, 67], and it has also been proposed that behavioral timescale plasticity in the hippocampus could subserve a similar function [68]. We found that plasticity rules of this kind induce strong correlations between inputs to the apical and basal dendritic compartments of pyramidal neurons, which have been observed in the hippocampus and cortex [55, 56]." Note that the plasticity effects of psychedelics are sometimes not observed in the hippocampus or are even observed as decreases (reviewed in https://doi.org/10.1038/s41386-022-01389-z).

      (19) Page 9, as is mentioned, REBUS proposes that there should be a decrease in top-down effects under psychedelics, which goes against what is found here, but as I describe above, the effects of psychedelics on various measures of directionality have been quite mixed.

      (20) Unless I'm misunderstanding something, it seems to be a bit of a jump to infer that simply changing alpha in your model is akin to psychedelic dosing. Perhaps if the model implemented biologically plausible 5-HT2A expression and/or its behavior were constrained by common features of a psychedelic experience (e.g., fractal-like visuals imposed onto perception, inability to fall asleep, etc.), I'd be more inclined to see the parallels between alpha and psychedelics dosing. However, it would still need to recapitulate unique effects of psychedelics (e.g., impairments in hippocampal-dependent memory with sparing/facilitation of cortical memory). At the moment, it seems like whatever the model is doing is applicable to any hallucinogenic drug or even psychosis.

    2. Author response:

      We thank the reviewers for the valuable and constructive reviews. Thanks to these, we believe the article will be considerably improved. We have organized our response to address points that are relevant to both reviewers first, after which we address the unique concerns of each individual reviewer separately. We briefly paraphrase each concern and provide comments for clarification, outlining the precise changes that we will make to the text.

      Common Concerns (Reviewer 1 & Reviewer 2):

      Can you clarify how NREM and REM sleep relate to the oneirogen hypothesis?

      Within the submission draft we tried to stay agnostic as to whether mechanistically similar replay events occur during NREM or REM sleep; however, upon a more thorough literature review, we think that there is moderately greater evidence in favor of Wake-Sleep-type replay occurring during REM sleep, in a manner related to the mechanisms of action of classical psychedelic drugs.

      First, we should clarify that replay has been observed during both REM and NREM sleep, and dreams have been documented during both sleep stages, though the characteristics of dreams differ across stages, with NREM dreams being more closely tied to recent episodic experience and REM dreams being more bizarre/hallucinatory (see Stickgold et al., 2001 for a review). Replay during sleep has been studied most thoroughly during NREM sharp-wave ripple events, in which significant cortical-hippocampal coupling has been observed (Ji & Wilson, 2007). However, it is critical to note that the quantification methods used to identify replay events in the hippocampal literature usually focus on identifying what we term ‘episodic replay,’ which involves a near-identical recapitulation of neural trajectories that were recently experienced during waking experimental recordings (Tingley & Peyrach, 2020). In contrast, our model focuses on ‘generative replay,’ where one expects only a statistically similar reproduction of neural activity, without any particular bias towards recent or experimentally controlled experience. This latter form of replay may look closer to the ‘reactivation’ observed in cortex by many studies (e.g. Nguyen et al., 2024), where correlation structures of neural activity similar to those observed during stimulus-driven experience are recapitulated. Under experimental conditions in which an animal is experiencing highly stereotyped activity repeatedly, over extended periods of time, these two forms of replay may be difficult to dissociate.

      Interestingly, though NREM replay has been shown to couple hippocampal and cortical activity, a similar study in waking animals administered psychedelics found hippocampal replay without any obvious coupling to cortical activity (Domenico et al., 2021). This could be because the coupling was not strong enough to produce full trajectories in the cortex (psychedelic administration did not increase ‘alpha’ enough), and that a causal manipulation of apical/basal influence in the cortex may be necessary to observe the increased coupling. Alternatively, as Reviewer 1 noted, it may be that psychedelics induce a form of hippocampus-decoupled replay, as one would expect from the REM stage of a recently proposed complementary learning systems model (Singh et al., 2022). 

      Evidence in favor of a similarity between the mechanism of action of classical psychedelics and the mechanism of action of memory consolidation/learning during REM sleep is actually quite strong. In particular, studies have shown that REM sleep increases the activity of soma-targeting parvalbumin (PV) interneurons and decreases the activity of apical dendrite-targeting somatostatin (SOM) interneurons (Niethard et al., 2021), that this shift in balance is controlled by higher-order thalamic nuclei, and that this shift in balance is critical for synaptic consolidation of both monocular deprivation effects in early visual cortex (Zhou et al., 2020) and for the consolidation of auditory fear conditioning in the dorsal prefrontal cortex (Aime et al., 2022). These last studies were not discussed in the present manuscript; we will add them, in addition to a more nuanced description of the evidence connecting our model to NREM and REM replay.

      Can you explain how synaptic plasticity induced by psychedelics within your model relates to learning at a behavioral level?

      While the Wake-Sleep algorithm is a useful model for unsupervised statistical learning, it is not a model of reward or fear-based conditioning, which likely occur via different mechanisms in the brain (e.g. dopamine-dependent reinforcement learning or serotonin-dependent emotional learning). The Wake-Sleep algorithm is a ‘normative plasticity algorithm,’ that connects synaptic plasticity to the formation of structured neural representations, but it is not the case that all synaptic plasticity induced by psychedelic administration within our model should induce beneficial learning effects. According to the Wake-Sleep algorithm, plasticity at apical synapses is enhanced during the Wake phase, and plasticity at basal synapses is enhanced during the Sleep phase; under the oneirogen hypothesis, hallucinatory conditions (increased ‘alpha’) cause an increase in plasticity at both apical and basal sites. Because neural activity is in a fundamentally aberrant state when ‘alpha’ is increased, there are no theoretical guarantees that plasticity will improve performance on any objective: psychedelic-induced plasticity within our model could perhaps better be thought of as ‘noise’ that may have a positive or negative effect depending on the context.
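      The gating described here, and quoted from the Methods in reviewer point (9), can be sketched for a single layer. This is a hypothetical, simplified illustration and not the authors' implementation. It assumes linear units, and each compartment learns with a delta rule to predict the unit's mixed activity. The names `gated_step` and `matvec` are illustrative. The sketch shows why intermediate alpha makes both sites plastic. At alpha = 0, activity is purely basal, so the basal prediction error is zero by construction, and the basal gate is closed as well. At intermediate alpha, both gates are open and both prediction errors are nonzero.

```python
def matvec(w, x):
    """Matrix-vector product for a weight matrix stored as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def gated_step(w_basal, w_apical, x_below, x_above, alpha, lr=0.1):
    """One illustrative update for a layer of linear 'pyramidal' units.

    basal  = bottom-up drive from the layer below
    apical = top-down drive from the layer above
    Activity mixes the two with weight alpha (alpha = 0: Wake, alpha = 1: Sleep).
    Apical weights learn to predict activity, gated by (1 - alpha);
    basal weights learn to predict activity, gated by alpha.
    Weights are updated in place; the mixed activity r is returned.
    """
    basal = matvec(w_basal, x_below)
    apical = matvec(w_apical, x_above)
    r = [(1 - alpha) * b + alpha * a for b, a in zip(basal, apical)]
    for i in range(len(r)):
        err_apical = r[i] - apical[i]  # instructive signal for apical synapses
        err_basal = r[i] - basal[i]    # instructive signal for basal synapses
        for j, xj in enumerate(x_above):
            w_apical[i][j] += lr * (1 - alpha) * err_apical * xj
        for j, xj in enumerate(x_below):
            w_basal[i][j] += lr * alpha * err_basal * xj
    return r


for alpha in (0.0, 0.5, 1.0):
    wb, wa = [[1.0, 0.0]], [[0.0, 1.0]]
    r = gated_step(wb, wa, [1.0, 2.0], [3.0, 4.0], alpha)
    print(alpha, r, "basal:", wb, "apical:", wa)
```

      In this toy, only apical weights change at alpha = 0, and only basal weights change at alpha = 1. Both sets change at alpha = 0.5.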

      In particular, such ‘noise’ may be beneficial for individuals or networks whose synapses have become locked in a suboptimal local minimum. The addition of large amounts of random plasticity could allow a system to extricate itself from such local minima over subsequent learning (or with careful selection of stimuli during psychedelic experience), similar to simulated annealing optimization approaches. If our model were fully validated, this view of psychedelic-induced plasticity as ‘noise’ could have relevance for efforts to alleviate the adverse effects of PTSD, early life trauma, or sensory deprivation; it may also provide a cautionary note against repeated use of psychedelic drugs within a short time frame, as the plasticity changes induced by psychedelic administration under our model are not guaranteed to be good or useful in and of themselves without subsequent re-learning and compensation.
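      The simulated-annealing analogy can be illustrated on a toy problem. This sketch is only an analogy for "plasticity as noise" and is not part of the authors' model. The loss function, temperature schedule, and step sizes are all hypothetical. Plain gradient descent started at x = 1.5 settles in a shallow local minimum near x ≈ 1.13. A Metropolis-style search with a decaying temperature can accept uphill moves, so it may escape into the deeper basin near x ≈ -1.30.

```python
import math
import random


def loss(x):
    # Hypothetical 1-D loss: shallow local minimum near x = 1.13,
    # deeper global minimum near x = -1.30.
    return x**4 - 3 * x**2 + x


def grad(x):
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x, lr=0.01, steps=1000):
    """Deterministic descent: stays in whichever basin it starts in."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def anneal(x, t0=2.0, cooling=0.995, steps=2000, step_size=0.5, seed=0):
    """Metropolis-style simulated annealing; returns the best state visited."""
    rng = random.Random(seed)
    best = x
    t = t0
    for _ in range(steps):
        candidate = x + rng.gauss(0.0, step_size)  # random "plasticity" kick
        delta = loss(candidate) - loss(x)
        # Always accept improvements; accept worse moves with probability exp(-delta / t).
        if delta < 0 or rng.random() < math.exp(-delta / t):
            x = candidate
        if loss(x) < loss(best):
            best = x
        t *= cooling  # noise level decays over time
    return best


x_local = gradient_descent(1.5)
x_best = anneal(x_local)
print("descent:", round(x_local, 3), round(loss(x_local), 3))
print("annealing:", round(x_best, 3), round(loss(x_best), 3))
```

      The search only records improvements, so the returned state is never worse than the starting point. Whether it actually reaches the deeper basin depends on the seed and the schedule. This mirrors the point that noise-driven plasticity is not guaranteed to help on its own.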

      We should also note that we have deliberately avoided connecting the oneirogen hypothesis model to fear extinction experimental results that have been observed through recordings of the hippocampus or the amygdala (Bombardi & Giovanni, 2013; Jiang et al., 2009; Kelly et al., 2024; Tiwari et al., 2024). Both regions receive extensive innervation directly from serotonergic synapses originating in the dorsal raphe nucleus, which have been shown to play an important role in emotional learning (Lesch & Waider, 2012); because classical psychedelics may play a more direct role in modulating this serotonergic innervation, it is possible that fear conditioning results (in addition to the anxiolytic effects of psychedelics) cannot be attributed to a shift in balance between apical and basal synapses induced by psychedelic administration. We will provide a more detailed review of these results in the text, as well as more clarity regarding their relation to our model.

      Reviewer 1 Concerns:

      Is it reasonable to assign a scalar parameter ‘alpha’ to the effects of classical psychedelics? And is your proposed mechanism of action unique to classical psychedelics? E.g. Could this idea also apply to kappa opioid agonists, ketamine, or the neural mechanisms of hallucination disorders?

      We will clarify that within our model ‘alpha’ is a parameter that reflects the balance between apical and basal synapses in determining the activity of neurons in the network. For the sake of simplicity we used a single ‘alpha’ parameter, but realistically, each neuron would have its own ‘alpha’ parameter, and different layers or individual neurons could be affected differentially by the administration of any particular drug; therefore, our scalar ‘alpha’ value can be thought of as a mean parameter for all neurons, disregarding heterogeneity across individual neurons.
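      A per-neuron version of this parameter can be sketched as follows. The names are hypothetical, and linear mixing of the two compartments' drives is an assumption made for illustration. The example also shows that replacing heterogeneous per-neuron values by their mean changes the resulting activity. This is the heterogeneity that the scalar simplification disregards.

```python
from statistics import fmean


def mix_activity(basal, apical, alphas):
    """Per-neuron mixing: alpha_i = 0 is purely basal (bottom-up), 1 purely apical (top-down)."""
    return [(1 - a) * b + a * p for b, p, a in zip(basal, apical, alphas)]


basal, apical = [1.0, 2.0], [3.0, 0.0]
per_neuron = [0.0, 0.5]      # hypothetical heterogeneous drug effect
scalar = fmean(per_neuron)   # 0.25: a single "mean alpha"
print(mix_activity(basal, apical, per_neuron))        # [1.0, 1.0]
print(mix_activity(basal, apical, [scalar, scalar]))  # [1.5, 1.5]
```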

      There are many different mechanisms that could theoretically affect this ‘alpha’ parameter, including: 5-HT2a receptor agonism, kappa opioid receptor binding, ketamine administration, or possibly the effects of genetic mutations underlying the pathophysiology of complex developmental hallucination disorders. We focused exclusively on 5-HT2a receptor agonism for this study because the mechanism is comparatively simple and extensively characterized, but similar mechanisms may well be responsible for the hallucinatory symptoms of a variety of drugs and disorders.

      Can you clarify the role of 5-HT2a receptor expression on interneurons within your model?

      While we mostly focused on the effects of 5-HT2a receptors on the apical dendrites of pyramidal neurons, these receptors are also expressed on soma-targeting parvalbumin (PV) interneurons. This expression on PV interneurons is consistent with our proposed psychedelic mechanism of action, because it could lead to a coordinated decrease in the influence of somatic and proximal dendritic inputs while increasing the influence of apical dendritic inputs. We will elaborate on this point, and will move the discussion earlier in the text.

      Discussions of indigenous use of psychedelics over millennia may amount to over-romanticization.

      We will take great care to conduct a more thorough literature review to reevaluate our statement regarding indigenous psychedelic use (including the citations you suggested), and will either provide a more careful statement or remove this discussion from our introduction entirely, as it has little bearing on the rest of the text. The Ethics Statement will also be modified accordingly.

      You isolate the 5-HT2a agonism as the mechanism of action underlying ‘alpha’ in your model, but there exist 5-HT2a agonists that do not have hallucinatory effects (e.g. lisuride). How do you explain this?

      Lisuride has much-reduced hallucinatory effects compared to other psychedelic drugs at clinical doses (though it does indeed induce hallucinations at high doses; Marona-Lewicka et al., 2002), and we should note that serotonin (5-HT) itself is pervasive in the cortex without inducing hallucinatory effects during natural function. Similarly, MDMA is a partial agonist for 5-HT2a receptors, but it has much-reduced perceptual hallucination effects relative to classical psychedelics (Green et al., 2003) in addition to many other effects not induced by classical psychedelics.

      Therefore, while we argue that 5-HT2a agonism induces an increase in influence of apical dendritic compartments and a decrease in influence of basal/somatic compartments, and that this change induces hallucinations, we also note that there are many other factors that control whether or not hallucinations are ultimately produced, so that not all 5-HT2a agonists are hallucinogenic. We will discuss two such factors in our revision: 5-HT receptor binding affinity and cellular membrane permeability.

      Importantly, many 5-HT2a receptor agonists are also 5-HT1a receptor agonists (e.g. serotonin itself and lisuride), while MDMA has also been shown to increase serotonin, norepinephrine, and dopamine release (Green et al., 2003). While 5-HT2a receptor agonism has been shown to reduce sensory stimulus responses (Michaiel et al., 2019), 5-HT1a receptor agonism inhibits spontaneous cortical activity (Azimi et al., 2020); thus one might expect the net effect of administering serotonin or a nonselective 5-HT receptor agonist to be widespread inhibition of a circuit, as has been observed in visual cortex (Azimi et al., 2020). Therefore, selective 5-HT2a agonism is critical for the induction of hallucinations according to our model, though any intervention that jointly excites pyramidal neurons’ apical dendrites and inhibits their basal/somatic compartments across a broad enough area of cortex would be predicted to have a similar effect. Lisuride has a much higher binding affinity for 5-HT1a receptors than, for instance, LSD (Marona-Lewicka et al., 2002).

      Secondly, it has recently been shown that both the head-twitch effect (a coarse behavioral readout of hallucinations in animals) and the plasticity effects of psychedelics are abolished when administering 5-HT2a agonists that are impermeable to the cellular membrane because of high polarity, and that these effects can be rescued by temporarily rendering the cellular membrane permeable (Vargas et al., 2023). This suggests that the critical hallucinatory effects of psychedelics (apical excitation according to our model) may be mediated by intracellular 5-HT2a receptors. Notably, serotonin itself is not membrane permeable in the cortex.

      Therefore, either of these two properties could play a role in whether a given 5-HT2a agonist induces hallucinatory effects. We will provide a considerably extended discussion of these nuances in our revision.

      Your model proposes that an increase in top-down influence on neural activity underlies the hallucinatory effects of psychedelics. How do you explain experimental results that show increases in bottom-up functional connectivity (either from early sensory areas or the thalamus)?

      Firstly, we should note that our proposed increase in top-down influence is a causal, biophysical property, not necessarily a statistical/correlative one. As such, we will stress that the best way to test our model is via direct intervention in cortical microcircuitry, as opposed to correlative approaches taken by most fMRI studies, which have shown mixed results with regard to this particular question. Correlative approaches can be misleading due to dense recurrent coupling in the system, and due to the coarse temporal and spatial resolution provided by noninvasive recording technologies (changes in statistical/functional connectivity do not necessarily correspond to changes in causal/mechanistic connectivity, i.e. correlation does not imply causation).

      There are two experimental results that appear to contradict our hypothesis that deserve special consideration in our revision. The first shows an increase in directional thalamic influence on the distributed cortical networks after psychedelic administration (Preller et al., 2018). To explain this, we note that this study does not distinguish between lower-order sensory thalamic nuclei (e.g. the lateral and medial geniculate nuclei receiving visual and auditory stimuli respectively) and the higher-order thalamic nuclei that participate in thalamocortical connectivity loops (Whyte et al., 2024). Subsequent more fine-grained studies have noted an increase in influence of higher order thalamic nuclei on the cortex (Pizzi et al., 2023; Gaddis et al., 2022), and in fact extensive causal intervention research has shown that classical psychedelics (and 5-HT2a agonism) decrease the influence of incoming sensory stimuli on the activity of early sensory cortical areas, indicating decoupling from the sensory thalamus (Evarts et al., 1955; Azimi et al., 2020; Michaiel et al., 2019). The increased influence of higher-order thalamic nuclei is consistent with both the cortico-striatal-thalamo-cortical (CSTC) model of psychedelic action as well as the oneirogen hypothesis, since higher-order thalamic inputs modulate the apical dendrites of pyramidal neurons in cortex (Whyte et al., 2024).

      The second experimental result notes that DMT induces traveling waves during resting state activity that propagate from early visual cortex to deeper cortical layers (Alamia et al., 2020). There are several possibilities that could explain this phenomenon: 1) it could be due to the aforementioned difficulties associated with directed functional connectivity analyses, 2) it could be due to a possible high binding affinity for DMT in the visual cortex relative to other brain areas, or 3) it could be due to increases in apical influence on activity caused by local recurrent connectivity within the visual cortex which, in the absence of sensory input, could lead to propagation of neural activity from the visual cortex to the rest of the brain. This last possibility is closest to the model proposed by Ermentrout & Cowan (1979), and we believe it would be best explained within our framework by a topographically connected recurrent network architecture trained on video data; a potentially fruitful direction for future research.

      Shouldn’t the hallucinations generated by your model look more ‘psychedelic,’ like those produced by the DeepDream algorithm?

      We believe that the differences in hallucination visualization quality between our algorithm and DeepDream are mostly due to differences in the scale and power of the models used across these two studies. We are confident that with more resources (and potentially theoretical innovations to improve the Wake-Sleep algorithm’s performance) the produced hallucination visualizations could become more realistic, but we believe this falls outside the scope of the present study.

      We note that more powerful generative models trained with backpropagation are able to produce surreal images of comparable quality (Rezende et al., 2014; Goodfellow et al., 2020; Vahdat & Kautz, 2020), though these have not yet been used as a model of psychedelic hallucinations. However, the DeepDream model operates on top of large pretrained image processing models, and does not provide a biologically mechanistic/testable interpretation of its hallucination effects. When training smaller models with a local synaptic plasticity rule (as opposed to backpropagation), the hallucination effects are less visually striking due to the reduced quality of our trained generative model, though they are still strongly tied to the statistics of sensory inputs, as quantified by our correlation similarity metric (Fig. 5b). We will provide a more detailed explanation of this phenomenon when we discuss our model limitations in our revised manuscript.

      Your model assumes domination by entirely bottom-up activity during the ‘wake’ phase, and domination entirely by top-down activity during ‘sleep,’ despite experimental evidence indicating that a mixture of top-down and bottom-up inputs influence neural activity during both stages in the brain. How do you explain this?

      Our use of the Wake-Sleep algorithm, in which top-down inputs (Sleep) or bottom-up inputs (Wake) dominate network activity, is an over-simplification made within our model for computational and theoretical reasons. Models that receive a mixture of top-down and bottom-up inputs during ‘Wake’ activity do exist (in particular the closely related Boltzmann machine (Ackley et al., 1985)), but these models are considerably more computationally costly to train due to a need to run extensive recurrent network relaxation dynamics for each input stimulus. Further, these models do not generalize as cleanly to processing temporal inputs. For this reason, we focused on the Wake-Sleep algorithm, at the cost of some biological realism, though we note that our model should certainly be extended to support mixed apical-basal waking regimes. We will make sure to discuss this in our ‘Model Limitations’ section.

      Your model proposes that 5-HT2a agonism enhances glutamatergic transmission, but this is not true in the hippocampus, which shows decreases in glutamate after psychedelic administration.

      We should note that our model suggests only compartment-specific increases in glutamatergic transmission; as such, our model does not predict any particular directionality for measures of glutamatergic transmission that include signaling at both apical and basal compartments in aggregate, as was measured in the provided study (Mason et al., 2020).

      You claim that your model is consistent with the Entropic Brain theory, but you report increases in variance, not entropy. In fact, it has been shown that variance decreases while entropy increases under psychedelic administration. How do you explain this discrepancy?

      Unfortunately, ‘entropy’ and ‘variance’ are heavily overloaded terms in the noninvasive imaging literature, and the particularities of the method employed can exert a strong influence on the reported effects. The reduction in variance reported by (Carhart-Harris et al., 2016) is a very particular measure: they are reporting the variance of resting state synchronous activity, averaged across a functional subnetwork that spans many voxels; as such, the reduction in variance in this case is a reduction in broad, synchronous activity. We do not have any resting state synchronous activity in our network due to the simplified nature of our model (particularly an absence of recurrent temporal dynamics), so we see no reduction in variance in our model due to these effects.
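      Two variance measures can be contrasted with simulated Gaussian data. The first is the one described here: the variance over time of a subnetwork-averaged, synchronous signal. The second is the variability across units after that synchronous component is subtracted at each time step. This is a toy sketch under stated assumptions and not a reanalysis of any dataset. Each unit is modeled as a shared signal plus private noise, and the amplitudes are hypothetical. Shifting weight from the shared component to the private component lowers the first measure while raising the second.

```python
import random
from statistics import pvariance


def simulate(shared_amp, private_amp, n_units=20, n_steps=500, seed=0):
    """units[i][t] = shared_amp * shared[t] + private_amp * (unit-specific noise)."""
    rng = random.Random(seed)
    shared = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
    return [[shared_amp * shared[t] + private_amp * rng.gauss(0.0, 1.0)
             for t in range(n_steps)] for _ in range(n_units)]


def synchronous_variance(units):
    """Variance over time of the subnetwork-averaged signal."""
    n_steps = len(units[0])
    mean_signal = [sum(u[t] for u in units) / len(units) for t in range(n_steps)]
    return pvariance(mean_signal)


def residual_variance(units):
    """Variance across units after subtracting the subnetwork mean at each time step."""
    residuals = []
    for t in range(len(units[0])):
        m = sum(u[t] for u in units) / len(units)
        residuals.extend(u[t] - m for u in units)
    return pvariance(residuals)


baseline = simulate(shared_amp=2.0, private_amp=0.5)  # strongly synchronous (hypothetical)
shifted = simulate(shared_amp=0.5, private_amp=1.5)   # weaker shared, stronger private
for name, units in [("baseline", baseline), ("shifted", shifted)]:
    print(name, round(synchronous_variance(units), 3), round(residual_variance(units), 3))
```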

      Other studies estimate ‘entropy’ or network state disorder via three different methods that we have been able to identify. 1) (Carhart-Harris et al., 2014) uses a different measure of variance: in this case, they subtract out synchronous activity within functional subnetworks, and calculate variability across units in the network. This measure reports increases in variance (Fig. 6), and is the closest measure to the one we employ in this study. 2) (Lebedev et al., 2016) uses sample entropy, which is a measure of temporal sequence predictability. It is specifically designed to disregard highly predictable signals, and so one might imagine that it is a measure that is robust to shared synchronous activity (e.g. resting state oscillations). 3) (Mediano et al., 2024) uses Lempel-Ziv complexity, which is, similar to sample entropy, a measure of sequence diversity; in this case the signal is binarized before calculation, which makes this method considerably different from ours. All three of the preceding methods report increases in sequence diversity, in agreement with our quantification method. Our strongest explanation for why the variance calculation in (Carhart-Harris et al., 2016) produces a variance reduction is therefore a reduction in low-rank synchronous activity in subnetworks during resting state.
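      For readers unfamiliar with Lempel-Ziv complexity, a minimal sketch of the measure on a binarized signal is given below. It is an illustration under assumptions and not the pipeline of (Mediano et al., 2024). The signal is binarized at its median; studies differ in the threshold they use and in how the count is normalized. Phrases are counted with the Kaspar & Schuster formulation of the 1976 Lempel-Ziv parsing. Periodic strings yield low counts, while less predictable strings yield higher ones.

```python
from statistics import median


def binarize(signal):
    """Binarize at the median (an assumed threshold; studies differ in this choice)."""
    m = median(signal)
    return "".join("1" if v > m else "0" for v in signal)


def lz76_complexity(s):
    """Number of phrases in the Lempel-Ziv (1976) parsing, Kaspar & Schuster counting scheme."""
    n = len(s)
    if n < 2:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1                      # extend the current match
            if l + k > n:
                c += 1                  # string exhausted mid-phrase
                break
        else:
            k_max = max(k, k_max)
            i += 1                      # try the next earlier start position
            if i == l:                  # no earlier copy: close a new phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


print(binarize([1, 5, 2, 8]))                    # "0101"
print(lz76_complexity("0" * 16))                 # 2
print(lz76_complexity("01" * 8))                 # 3
print(lz76_complexity("0001101001000101"))       # 6: 0·001·10·100·1000·101
```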

      As for whether the entropy increase is meaningful: we share Reviewer 1’s concern that increases in entropy could simply reflect a higher degree of cognitive engagement during resting state recordings, whether due to the presence of sensory hallucinations or to an inability to fall asleep. This could explain why entropy increases relative to non-hallucinating conditions are much smaller during audiovisual task performance (Siegel et al., 2024; Mediano et al., 2024). However, our model is consistent with the Entropic Brain Theory without including any form of ‘cognitive processing’. We observe increases in variability during resting state in our model, but highly similar distributions of activity when averaging over a wide variety of sensory stimulus presentations (Fig. 5b-c). This is because variability in our model is not due to unstructured noise: it corresponds to an exploration of network states that would ordinarily be visited by some stimulus. Therefore, when averaging across a wide variety of stimuli, the distributions of network states under hallucinating and non-hallucinating conditions should be highly similar.

      One final point of clarification: here we are distinguishing Entropic Brain Theory (EBT) from the REBUS model. The oneirogen hypothesis is consistent with the increase in entropy observed experimentally, but in our model this entropy increase is not due to increased influence of bottom-up inputs; it is instead due to an increase in top-down influence. Therefore, one could view the oneirogen hypothesis as consistent with EBT but inconsistent with REBUS.

      You relate your plasticity rule to behavioral-timescale plasticity (BTSP) in the hippocampus, but plasticity has been shown to be reduced in the hippocampus after psychedelic administration. Could you elaborate on this connection?

      When we were establishing a connection between our ‘Wake-Sleep’ plasticity rule and BTSP learning, the intended connection was exclusively to the mathematical form of the plasticity rule, in which activity in the apical dendrites of pyramidal neurons functions as an instructive signal for plasticity in basal synapses (and vice versa): we will clarify this in the text. Similarly, we point out that such a plasticity rule tends to result in correlated tuning between apical and basal dendritic compartments, which has been observed in hippocampus and cortex: this is intended as a sanity check of our mapping of the Wake-Sleep algorithm to cortical microcircuitry, and has limited further bearing on the effects of psychedelics specifically.

      Reduced plasticity in the hippocampus after psychedelic administration could be explained by a complementary learning systems-type model, in which the hippocampus becomes partly decoupled from the cortex during REM sleep (Singh et al., 2022). If this were the case, it would not be incompatible with our model, which focuses mostly on the cortex. Notably, potentiating 5-HT2A receptors in the ventral hippocampus does not induce the head-twitch response, though it does produce anxiolytic effects (Tiwari et al., 2024). This indicates that the hallucinatory and anxiolytic effects of classical psychedelics may be partly decoupled.

      Reviewer 2 Concerns:

      Could you provide visualizations of the ‘ripple’ phenomenon that you’re referring to?

      We will do this! For now, you can get a decent understanding of what the ‘ripple effect’ looks like from the ‘eyes closed’ hallucination condition for networks trained on CIFAR10 (Fig. 2d). The ripple effect that we are referring to is very similar, except it is superimposed on a naturalistic image under ordinary viewing conditions; to give a higher quality visualization of the ripple phenomenon itself, we will subtract out the static contribution of the image itself, leaving only the ripple phenomenon.

      Could you provide a more nuanced description of alternative roles for top-down feedback, beyond being used exclusively for learning as depicted in your model?

      For the sake of simplicity, our model treats top-down inputs in only three roles: as a source of an instructive teaching signal, as the originator of generative replay events during the Sleep phase, and as the mechanism of hallucination generation. However, as discussed in response to a previous question, pyramidal neurons in the cortex receive and respond to a mixture of top-down and bottom-up inputs.

      There are a variety of theories for what role top-down inputs could play in determining network activity. To name several, top-down input could function as: 1) a denoising/pattern completion signal (Kadkhodaie & Simoncelli, 2021), 2) a feedback control signal (Podlaski & Machens, 2020), 3) an attention signal (Lindsay, 2020), 4) ordinary inputs for dynamic recurrent processing that play no specialized role distinct from bottom-up or lateral inputs except to provide inputs from higher-order association areas or other sensory modalities (Kar et al., 2019; Tugsbayar et al., 2025). Though our model does not include these features, they are perfectly consistent with our approach.

      In particular, denoising/pattern completion signals in the predictive coding framework (closely related to the Wake-Sleep algorithm) also play a role as an instructive learning signal (Salvatori et al., 2021); and top-down control signals can play a similar role in some models (Gilra & Gerstner, 2017; Meulemans et al., 2021). Thus, options 1 and 2 are heavily overlapping with our approach, and are a natural consequence of many biologically plausible learning algorithms that minimize a variational free energy loss (Rao & Ballard, 1997; Ackley et al., 1985). Similarly, top-down attentional signals can exist alongside top-down learning signals, and some models have argued that such signals can be heavily overlapping or mutually interchangeable (Roelfsema & van Ooyen, 2005). Lastly, generic recurrent connectivity (from any source) can be incorporated into the Wake-Sleep algorithm (Dayan & Hinton, 1996), though we avoided doing this in the present study due to an absence of empirical architecture exploration in the literature and the computational complexity associated with training on time series data.

      To conclude, there are a variety of alternative functions proposed for top-down inputs onto pyramidal neurons in the cortex, and we view these additional features as mutually compatible with our approach; for simplicity we did not include them in our model, but we believe that these features are unlikely to interfere with our testable predictions or empirical results.

    1. Reviewer #3 (Public review):

      Summary:

      In this study, the authors looked for neural correlates of behavioural differences within the genus Macaca. To do so, they carried out real-world dissections of dead animals (unconnected to the present study) obtained from a range of different institutions. They then compared two brain areas, the amygdala and the hippocampus, across species. Crucially, these species have been sorted by grade of social tolerance (from 1 to 4). Twelve species are represented across 42 individuals. The sampling has weaknesses: it covers "only half" of the species in the genus, and Macaca mulatta, the rhesus macaque, accounts for 13 of the individuals. It also has strengths, since the species are decently well represented across the 4 grades, given the purpose and the amount of work required here. I will not judge the dissection process as I am not a neuroanatomist. I will assume that the different interventions do not significantly alter volume, and that the different conditions in which the bodies were kept did not produce the documented differences across species.

      There are two main results of the study. First, in line with their predictions, the authors find that more tolerant macaque species have a larger amygdala, whereas the hippocampus remains undifferentiated across species. Second, they also identify developmental effects, although with different trends: in tolerant species, the relative volume of the amygdala decreases across the lifespan, while in intolerant species the opposite occurs. The results look quite strong, although the authors could provide more clarity in their replies regarding the data they are working with. From one figure to the next, we switch from model-calculated ratio to model-predicted volume. Suppose one sampled a brain at age 20 in each grade according to the model-predicted volumes. The amygdala would then not seem to differ much across grades. The difference is mostly driven by Grade 1 being smaller (in line with the main result). Beyond that, Grade 2 is bigger than Grade 3, and Grade 4 is bigger once again but not very different from Grade 2.

      Overall, despite this, I think the results are quite strong and the correlations are not to be contested, but I also wonder about their real meaning and implications. This can be considered under three aspects:

      (1) Classification of the social grade

      This may be familiar to readers of Thierry and collaborators, or to researchers of the macaque world. Still, the manuscript does not list the 18 behavioral traits used to define the three main cognitive requirements (socio-cognitive demands, predictability of the environment, inhibitory control). It would be important to know which traits correspond to which requirement, whether they overlap, and, crucially, how they are realized in the 12 study species, as there could be drastic differences from one species to the next. For now, we can only see from Table S1 where the species align. It would be a good addition to match each species individually to the 18 behavioral traits, or at least to the 3 broad categories of cognitive requirements.

      (2) Issue of nature vs nurture

      Another way to look at the debate between nature and nurture is through phylogeny. For now, there is no phylogenetic tree showing where the different grades are realized. For example, it would be illuminating to know whether more closely related species, independently of grade, have similar amygdala or hippocampus sizes. The next question would then be whether the grades map onto particular phylogenetic subdivisions. This would be in line with the authors' general point that there could be general species differences.

      Nurture is likely more complicated, because one needs to take into account the idiosyncrasies of each individual's life. For example, some of the cited literature in humans or macaques suggests that the bigger the social network, the bigger the brain structure considered. However, this finding is at the individual level, with a documented life history. Do we have any such information for the individuals considered? (Analysing this is likely out of the scope of this paper, especially for individuals that did not originate from CdP.)

      (3) Issue of the discussion of the amygdala's function

      The entire discussion/goal of the paper rests on the claim that the amygdala is connected to social life. Yet, before being a "social center", the amygdala has been connected to the emotional life of humans and non-humans alike. The authors state (L333/34) that "These findings challenge conventional expectations of the amygdala's primary involvement in emotional processes and highlight the complexity of the amygdala's role in social cognition". First, there is no dichotomy between social cognition and emotion. Emotion is part of social cognition (unless we and macaques are robots). Second, nowhere in the paper is it demonstrated that the differences highlighted here are connected to social cognition differences per se. For example, the authors have not tested whether grade 4 species are more afraid of snakes than grade 1 species. If they were, one could predict that they would also have a bigger amygdala, and the model would probably detect this too. My point is not that the authors should try to correlate their data with every aspect that has been connected to the amygdala in the literature (see for example the nice review by Domínguez-Borràs and Vuilleumier, https://doi.org/10.1016/B978-0-12-823493-8.00015-8). Rather, they should refrain from saying they have challenged a particular aspect if they have not tested it. I would rather encourage the authors to discuss the amygdala as a multipurpose center that includes both social cognition and emotion.

      Strengths:

      Methods & breadth of species tested.

      Weaknesses:

      Interpretation, which can be described as 'oriented' and should rather offer additional views.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      We thank reviewer 1 for the helpful comments. As indicated in the responses below, we have taken all comments and suggestions into consideration in this revised version of the manuscript.

      Weaknesses:

      While this study convincingly describes the phenotype seen upon Drp1 loss, my major concern is that the mechanism underlying these defects in zygotes remains unclear. The authors refer to mitochondrial fragmentation as the mechanism ensuring organelle positioning and partitioning into functional daughters during the first embryonic cleavage. However, could Drp1 have a role beyond mitochondrial fission in zygotes? I raise these concerns because, as opposed to other Drp1 KO models (including those in oocytes) which lead to hyperfused/tubular mitochondria, Drp1 loss in zygotes appears to generate enlarged yet not tubular mitochondria. Lastly, while the authors discard the role of mitochondrial transport in the clustering observed, more refined experiments should be performed to reach that conclusion.

      It would be difficult to answer from this study whether Drp1 plays a role beyond mitochondrial fission in zygotes. However, the reasons why Drp1 KO zygotes differ from the somatic Drp1 KO model can be discussed as follows.

      First, the reviewer mentioned that the loss of Drp1 in oocytes leads to hyperfused/tubular mitochondria. In fact, unlike in somatic cells, EM images of Drp1 KO oocytes show enlarged mitochondria rather than tubular structures (Udagawa et al., Curr Biol. 2014, PMID: 25264261, Fig. 2C and Fig. S1B-D), as with the zygotes in this study. Mitochondria in oocytes/zygotes are small spheres with irregular, peripherally located cristae. These structural features may confer insensitivity or resistance to inner membrane fusion. This would explain why they fail to form the tubular mitochondria seen in somatic cell models. Nonetheless, quantitative analysis of EM images in the revised version confirmed that the mitochondria of Drp1-depleted embryos were not only enlarged but also significantly elongated (Figure 2J-2M). Therefore, Drp1-depleted embryos showed significant structural and functional (e.g., asymmetry between daughters) changes in mitochondria, and these are expected to cause defects in embryonic development.

      As for mitochondrial transport, we do not fully understand the intent of this question, but we do not entirely rule out a role for mitochondrial transport. We did observe that clustered mitochondria did not disperse again. As the reviewer pointed out, however, how mitochondria move along the cytoskeleton within clusters will require further study.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors show no effect of Myo19 Trim-Away, yet it remains unclear whether myo19 is involved in the positioning of mitochondria around the spindle. Judging by their co-localization during that stage, it might be. Therefore, in the absence of myo19, mitochondria might remain evenly distributed throughout mitosis, thus passively resulting in equal partitioning to daughter cells, with no severe developmental defects. Could the authors show a video of the whole process and discuss it?

      We have newly performed live imaging of mitochondria and chromosomes in Myo19 Trim-Away zygotes (n=13). As shown in Figure 1-figure supplement 2 and Figure 1-Video 2, there were no obvious changes in mitochondrial (or chromosomal) dynamics throughout the first cleavage, and no significant mitochondrial asymmetry was observed. Therefore, we conclude that depletion of Myo19 does not cause mitochondrial asymmetry during embryonic cleavage. These results are described in the revised manuscript (Line 218-221).

      (2) Mitochondrial aggregation upon Drp1 depletion should be characterized in more detail: for example, % of mitochondria free, % in small clusters (> X diameter), and % in big clusters (>Y diameter).

      In the revised version, mitochondrial aggregation has been quantified by comparing cluster size and number in control embryos, Drp1 Trim-Away embryos, and Drp1 Trim-Away embryos expressing exogenous Drp1 (mCh-Drp1) (Figure 2G, 2H). In control embryos, mitochondria were interspersed in a large number of small clusters. In Drp1-depleted embryos, mitochondria became highly aggregated into a small number of large clusters, an effect that was reversed by expression of mCh-Drp1. These results are described in the revised manuscript (Line 242-245).
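      One common way to obtain cluster size and number is to threshold the mitochondrial channel and label connected regions. The sketch below illustrates that idea in pure NumPy, using 4-connectivity and a user-supplied threshold. The function names and threshold are hypothetical, and this is not the pipeline used for Figure 2G, 2H. In practice, `scipy.ndimage.label` performs the same labelling step.

```python
from collections import deque

import numpy as np

def label_clusters(mask):
    """Return the sizes (in pixels) of 4-connected foreground regions in a 2D boolean mask."""
    mask = np.asarray(mask, dtype=bool)
    visited = np.zeros_like(mask)
    rows, cols = mask.shape
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if mask[r, c] and not visited[r, c]:
                # Breadth-first flood fill from this seed pixel
                queue = deque([(r, c)])
                visited[r, c] = True
                size = 0
                while queue:
                    y, x = queue.popleft()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny, nx] and not visited[ny, nx]):
                            visited[ny, nx] = True
                            queue.append((ny, nx))
                sizes.append(size)
    return sizes

def summarize_clusters(image, threshold):
    """Hypothetical summary: cluster count and mean cluster size above a fixed threshold."""
    sizes = label_clusters(np.asarray(image) > threshold)
    mean_size = float(np.mean(sizes)) if sizes else 0.0
    return {"n_clusters": len(sizes), "mean_size": mean_size}
```

      Under this kind of summary, a shift from many small clusters to a few large ones shows up as a lower `n_clusters` together with a higher `mean_size`.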

      (3) The discrepancies with parthenogenetic embryos derived from Drp1 (-/-) parthenotes should be commented on. Quantification of the dimensions of the clusters would help establish the degree of similarity/difference. Could the authors comment on their hypothesis as to why the clusters are remarkably larger in Drp1 depleted zygotes?

      In the revised version, we have quantified mitochondrial aggregation in Drp1 KO parthenotes (Figure 2-figure supplement 1). The Drp1 KO parthenote data have been moved to this supplemental figure because the quantitative data added for Drp1 Trim-Away embryos left no space in Figure 2. The size of mitochondrial clusters in Drp1 KO parthenotes was significantly increased compared to controls. However, as the reviewer noted, this aggregation appears moderate compared to that in Drp1-depleted embryos. The phenotypic discrepancies between the two Drp1-deficient embryo models are discussed below.

      First, the phenotypic severity of Drp1 KO oocytes clearly depends on the age of the female. Oocytes collected from 8-week-old females arrested in meiosis after NEB, mainly due to marked mitochondrial aggregation (Udagawa et al., Curr Biol. 2014, PMID: 25264261). In contrast, oocytes from juvenile females completed meiosis (Adhikari et al., Sci Adv. 2022, PMID: 35704569). Drp1 KO parthenotes were therefore obtained from juvenile females in the present study. Comparing mitochondrial morphology in Drp1 KO oocytes across the two papers also suggests that aggregation is more intense in adult mice (Udagawa et al., Curr Biol. Fig. 2A) than in juvenile mice (Adhikari et al., Sci Adv. 2022: Fig. 1G, 1H). The adult pattern appears similar to that in the Drp1-depleted embryos in this study (Figure 2E). The level of Drp1 depletion may differ among these Drp1-deficient oocytes/zygotes. A previous paper reported similar differences between juvenile and adult KO females (Yueh et al., Development 2021, PMID: 34935904). Adult-derived Smc3<sup>Δ/Δ</sup> zygotes arrested at the 2-cell stage, whereas juvenile-derived Smc3<sup>Δ/Δ</sup> zygotes had developmental competence comparable to the wild type. Remarkably, SMC3 protein levels in juvenile Smc3<sup>Δ/Δ</sup> oocytes were also comparable to those in Smc3<sup>fl/fl</sup> oocytes. The authors surmised that the decline in maternal SMC3 between the juvenile stage and sexual maturity is probably due to continuous induction of the promoter-Cre driver. Similar induction may also occur in Drp1 KO oocytes. In addition, we observed batch differences as well as age differences in Drp1 KO oocytes (and the resulting embryos). Oocytes collected from some juvenile KO colonies showed little mitochondrial aggregation.
      Therefore, for KO models showing gradual, age (sexual maturation)-dependent phenotypic changes, Trim-Away may provide more reproducible results, as it induces acute degradation of maternal proteins.

      (4) Mitochondrial clusters in Drp1 trim-away zygotes resemble those seen when defects in mitochondrial positioning are obtained by TRAK2 induction (PMID: 38917013), pointing again to a role of actin in the clustering process. Could the authors explore the role of actin further?

      TRAK2 and microtubule-dependent mechanisms may also be involved in mitochondrial dynamics during the first cleavage division, possibly in association with the migration of the two pronuclei. The mitochondrial aggregation induced by TRAK2 overexpression resembles that in Drp1-depleted embryos. However, TRAK2 overexpression is unlikely to produce the EM-level changes seen in Drp1-depleted embryos (enlarged mitochondria, etc.). In addition, TRAK2-overexpressing embryos showed unequal daughter blastomere sizes after cleavage rather than uneven partitioning of mitochondria. This makes it difficult to assess precisely how similar the two models are.

      Regarding the role of F-actin, we show that cytoplasmic actin overlaps with mitochondria throughout the first cleavage. Actin seems to accumulate in aggregated mitochondria, particularly during the mitotic phase, when a higher correlation was observed (Figure 1E). Somatic cell-based studies report that actin and the Myo19 motor regulate mitochondrial partitioning, but we did not observe this. Even so, actin accumulated at mitochondria may be indirectly involved in mitochondrial dynamics via mitochondrial fission. For example, inverted formin 2 (INF2) enhances actin polymerization and acts upstream of Drp1, where it is required for efficient mitochondrial fission (Korobova et al., Science 2013, PMID: 23349293). We have added a description of this point to the revised manuscript (Line 452-456).

      (5) Electron microscopy images showed indeed aberrant morphology of the mitochondria, yet not a hyperfused morphology. Aspect ratio (long/short axis) quantification should be included, besides the current measurement, since mitochondria in Drp1 trim-away look bigger yet as round as in the control.

      In the revised version, detailed quantitative data on EM images have been added (Figure 2J-2M). Drp1-depleted embryos showed significant increases in both the major and minor axes of mitochondria. Like the reviewer, we had assumed that mitochondria in depleted embryos were enlarged rather than elongated. However, quantification of the aspect ratio shows that significant elongation occurred. These results have been described in the revised manuscript (Line 252-256).

      (6) Why are mitochondria in golgi-mcherry-expressing cells showing a different morphology of the clusters?

      As noted by the reviewer, Drp1-depleted embryos expressing Golgi-mCherry appear to have less mitochondrial aggregation than in other mitochondrial images. The exact reason is not known, but it may be due to inter-lot variation in the Trim21 mRNA used in this experiment. Nevertheless, these embryos still showed substantial mitochondrial aggregation compared to the control, so this variation does not seem to affect the conclusion.

      (7) Authors comment on ROS being enriched (highly accumulated) in mitochondria. However, while quantification is missing, it might seem that ROS are equally distributed in control or Drp1 Trim-Away embryos. Could the authors quantify ROS signal inside and outside of the mitochondria, perhaps using a mask drawn by mitotracker? Furthermore, it would make these data more convincing to artificially induce/deplete ROS to validate the sensitivity of the technique to variations. Also, why is ROS pattern referred to as ectopic?

      Thank you for your useful suggestions. In the revised version, masked binary images were created from mitochondrial images to quantify ROS levels inside and outside mitochondria (Line 734-741). The result shows accumulation of ROS in mitochondria in Drp1-depleted embryos (Figure 4-figure supplement 1E). We used the term "ectopic" to mean excessive accumulation of ROS in the mitochondria compared to normal embryos. The term has been deleted because it is not very accurate.
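      The masking approach can be sketched under a simple assumption: a global intensity threshold on the mitochondrial channel defines the mask. The threshold choice and function name are hypothetical. The procedure actually used is the one described at Line 734-741 of the revised manuscript.

```python
import numpy as np

def ros_inside_outside(ros, mito, threshold):
    """Mean ROS signal inside vs. outside a binary mask drawn from the mitochondrial channel.

    Returns (inside_mean, outside_mean, inside/outside ratio); a ratio above 1
    indicates ROS enrichment in mitochondria.
    """
    ros = np.asarray(ros, dtype=float)
    mask = np.asarray(mito) > threshold          # hypothetical global threshold
    if not mask.any() or mask.all():
        raise ValueError("mask must contain both mitochondrial and non-mitochondrial pixels")
    inside = ros[mask].mean()
    outside = ros[~mask].mean()
    return inside, outside, inside / outside
```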

      Minor comments:

      (A) Video 1: images at t=-00:20 and t=00:00 of the mtGFP are actually the same images as H2B-mCherry.

      A faulty filter/shutter control probably failed to capture GFP fluorescence at these time points, and the autocontrast function appears to have picked up a small amount of mCherry fluorescence leakage. We could have replaced it with another video, but the affected frames were unrelated to the analysis, so the original video was kept as is. The same problem also occurs in the newly added Myo19-depleted zygote movie (Figure 1-Video 2, 03:15).

      (B) Could you calculate the degree of colocalization between mt-GFP and ER-mCherry in ctrl and Drp1 trim-away? While it is apparent that ER is somehow more associated with mitochondrial clusters, it would be informative to quantify it.

      Since the ER is partially confined to the mitochondrial aggregation site, it was difficult to calculate correlation coefficients from fluorescence images of mt-GFP and ER-mCherry to quantitatively assess colocalization. Instead, line scan analysis of whole mitochondrial clumps showed that the peak of the ER-mCherry signal overlaps with that of mt-GFP, but this is not the case for Golgi-mCherry or peroxisome-mCherry (Figure 2-figure supplement 2A-2C).

      (C) Regarding the developmental arrest: The quantification of the different stages at each developmental time could be more informative. For example, at E4.5 how many embryos are at each stage (2-cell, 4-cell, ... blastocyst)? Also, could the authors comment on the reduction in developmental competence in Figure 4C, regarding the blastocyst stage?

      Many arrested embryos do not maintain their morphology and instead undergo a distinctive degenerative process over time, known as cell fragmentation. Therefore, it is difficult to accurately determine the number of embryos at each developmental stage at, for example, E4.5. In this study, the 2-cell stage was observed at E1.5, the 4-8 cell stage at E2.5-E3.0, the morula at E3.5 and the blastocyst at E4.5.

      The rate of embryos reaching the blastocyst stage was reduced compared to that of normal embryos. Overexpression of mCh-Drp1 itself may explain this incomplete restoration of developmental competence, since embryos injected solely with mCh-Drp1 mRNA also showed reduced developmental competence. For rescue experiments, comparison with internal controls is more important, and we have therefore described the result as follows. This is a specific effect of Drp1 deletion, because none of the internal control conditions increased arrest at the 2-cell stage. Moreover, arrest was completely reversed by microinjecting Trim-Away-insensitive exogenous mCh-Drp1 mRNA (Line 337-340).

      (D) In lines 103 to 105, proliferation should be changed to division or development.

      In the revised version, proliferation has been changed to division (Line 103).

      (E) Could the authors reference the statement in lines 168-169?

      The following 3 references have been added (Hardy et al., 1993, PMID: 8410824; Meriano et al., 2004, PMID: 15588469; Seikkula et al., 2018, PMID: 29525505).

      (F) Line 448: "Cells lacking Drp1 have highly elongated mitochondria that cannot be divided into transportable units,..." This is clearly not the case for zygotes, so why are then these mitochondria still clustering and not transported elsewhere?

      Although it is difficult to answer this question precisely, EM images of Drp1-depleted embryos suggest that individual mitochondria are not only enlarged but also show increased outer membrane attachment due to excessive aggregation. These large mitochondrial clumps may therefore prevent transport.

      Reviewer #2 (Public review):

      We thank reviewer 2 for the helpful comments. As indicated in the responses below, we have taken all comments and suggestions into consideration in this revised version of the manuscript.

      Weaknesses:

      The authors first describe the redistribution of mitochondria during normal development, followed by alterations induced by Drp1 depletion. It would be useful to indicate the time post-hCG for imaging of fertilised zygotes (first paragraph of the results/Figure 1) to compare with subsequent Drp1 depletion experiments.

      In the revised version, the time after hCG has been indicated (Line 176-182). For the subsequent Drp1 depletion experiments, the revised version notes that “no significant delay in cell cycle progression was observed following Drp1 depletion (data not shown) compared to control embryos (Figure 1A)” (Line 291-193). The time post-hCG differed slightly between live imaging and immunofluorescence analysis (Figure 1-figure supplement 1A). This may be due to handling of the zygotes outside the incubator during mRNA microinjection.

      It is noted that Drp1 protein levels were undetectable 5h post-injection, suggesting earlier times were not examined, yet in Figure 3A it would seem that aggregation has occurred within 2 hours (relative to Figure 1).

      As the reviewer pointed out, Drp1 depletion is likely to have occurred at an earlier stage. In this study, we injected various mRNAs to visualize organelles such as mitochondria and chromosomes. Observations were therefore started after about 5 h of incubation, to allow the fluorescent proteins to be sufficiently expressed. Accordingly, samples for Western blot analysis were prepared at the time point at which observation started.

      Mitochondria appear to be slightly more aggregated in Drp1 fl/fl embryos than in control, though comparison with untreated controls does not appear to have been undertaken. There also appears to be some variability in mitochondrial aggregation patterns following Drp1 depletion (Figure 2-suppl 1 B) which are not discussed.

      In the revised version, mitochondrial aggregation has been quantified by comparing cluster size and number in control embryos, Drp1 Trim-Away embryos, and Drp1 Trim-Away embryos expressing exogenous Drp1 (mCh-Drp1) (Figure 2G, 2H). We have also quantified mitochondrial aggregation in Drp1<sup>fl/fl</sup> and Drp1<sup>Δ/Δ</sup> parthenotes (Figure 2-figure supplement 1). Note that the Drp1 KO parthenote data have been moved to this supplemental figure, because the quantitative data added for Drp1 Trim-Away embryos left no space in Figure 2. Mitochondria appear slightly more aggregated in Drp1<sup>fl/fl</sup> embryos than in controls, but no significant differences in cluster size or number were observed (data not shown). On the other hand, mitochondrial clusters in Drp1 Trim-Away embryos were remarkably larger than those in Drp1<sup>Δ/Δ</sup> parthenotes. Please refer to the response to Reviewer 1's comment (3) for a discussion of this discrepancy.

      As noted by the reviewer, Drp1-depleted embryos expressing Golgi-mCherry appear to have less mitochondrial aggregation than in other mitochondrial images. The exact reason is not known, but it may be due to inter-lot variation in the Trim21 mRNA used in this experiment. Nevertheless, these embryos still showed substantial mitochondrial aggregation compared to the control, so this variation does not seem to affect the conclusion.

      The authors use western blotting to validate the depletion of Drp1, however do not quantify band intensity. It is also unclear whether pooled embryo samples were used for western blot analysis.

      In the revised version, we quantified the band intensities in the Western blot analyses, and the quantification validated our previous results (Figure 1H for Myo19 depletion, Figure 2B for Drp1 expression during preimplantation development, Figure 2D for Drp1 depletion). The number of embryos analyzed is now given in the figure legends (pooled samples of 20 to 100 embryos were used).

      Likewise, intracellular ROS levels are examined however quantification is not provided. It is therefore unclear whether 'highly accumulated levels' are of significance or related to Drp1 depletion.

      In the revised version, masked binary images were created from the mitochondrial images to quantify ROS levels inside and outside mitochondria (Line 734-741). The results show that ROS accumulate in mitochondria in Drp1-depleted embryos (Figure 4-figure supplement 1E).
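
The masking approach can be sketched as follows, assuming a ROS image (for example H2DCF fluorescence) and a binary mitochondrial mask of the same shape. The optional cell mask, the use of mean intensity, and the inside/outside ratio are illustrative assumptions, not a description of the exact procedure in Lines 734-741; the function name is hypothetical.

```python
def mean_inside_outside(ros_image, mito_mask, cell_mask=None):
    """Mean ROS signal inside vs. outside a mitochondrial mask.

    All inputs are 2D lists with the same shape. `mito_mask` marks
    mitochondrial pixels (1) versus non-mitochondrial pixels (0); the
    optional `cell_mask` restricts the analysis to pixels inside the embryo.
    Returns (mean_inside, mean_outside, inside/outside ratio).
    """
    inside, outside = [], []
    for r, row in enumerate(ros_image):
        for c, value in enumerate(row):
            if cell_mask is not None and not cell_mask[r][c]:
                continue  # skip background pixels outside the cell
            if mito_mask[r][c]:
                inside.append(value)
            else:
                outside.append(value)
    if not inside or not outside:
        raise ValueError("both compartments need at least one pixel")
    mean_in = sum(inside) / len(inside)
    mean_out = sum(outside) / len(outside)
    return mean_in, mean_out, mean_in / mean_out


ros = [[10, 2], [8, 4]]
mito = [[1, 0], [1, 0]]
print(mean_inside_outside(ros, mito))  # (9.0, 3.0, 3.0)
```

A ratio above 1 would indicate that the ROS signal is enriched in mitochondria relative to the surrounding cytoplasm; background subtraction, which would normally precede this step, is not shown.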

      In previous work, Drp1 was found to have a role as a spindle assembly checkpoint (SAC) protein. It is therefore unclear from the experiments performed whether aggregation of mitochondria separating the pronuclei physically (or other aspects of mitochondrial function) prevents appropriate chromosome segregation or whether Drp1 is acting directly on the SAC.

      In the revised manuscript, we have discussed this reference (Zhou et al., Nature Communications, PMID: 36513638) (Line 482-483).

      Reviewer #2 (Recommendations For The Authors):

      The authors report that disruption of F-actin organization led to asymmetry in mitochondrial inheritance, however depletion of Myo19 does not impact inheritance. The authors note in the discussion that loss of another mitochondrial motor protein, Miro, has been shown to affect mitochondrial inheritance. They suggest this may be due to reduced levels of Myo19, despite data from the present study suggesting a lack of involvement of Myo19. Given that Miro1 also interacts with microtubules, and crosstalk between actin filaments and microtubules has been reported, have the authors considered whether other motor proteins, such as KIF5, may be involved in mitochondrial movement in the zygote and therefore inheritance? Myo19 also plays a role in mitochondrial architecture. Were any differences noted at the EM level?

      During oocyte meiosis and early embryonic cleavage, kinesin-5 has been reported to be important for bipolar spindle formation (Fitzharris, Curr Biol., 2009, PMID: 19465601) and may have some involvement in mitochondrial dynamics. Given that the migration of the two pronuclei towards the zygotic centre is dynein-dependent (Scheffler, Nat Commun. 2021, PMID: 33547291), dynein may also be involved in the accumulation of mitochondria around the pronuclei. Nevertheless, whether microtubule-dependent mechanisms regulate mitochondrial partitioning remains controversial. Mitochondria largely dissociate from microtubules at the onset of mitosis, and indeed Miro1-deleted zygotes did not show asymmetric mitochondrial partitioning (Lee et al., Front Cell Dev Biol. 2022, PMID: 36325364). More recently, overexpression of TRAK2 was reported to cause significant mitochondrial aggregation in embryos (Lee et al., Proc Natl Acad Sci U S A. 2024, PMID: 36325364), but because overexpression might disrupt the regulatory balance maintained by other motor/adaptor complexes, further investigation using TRAK2-deficient embryos will be needed.

      As noted by the reviewer, Myo19 appears to be important for maintaining mitochondrial cristae architecture and, consequently, for regulating mitochondrial function (Shi et al., Nat Commun. 2022, PMID: 35562374). We have not examined Myo19-depleted embryos by EM, but we assessed their membrane potential and ROS levels by TMRM and H2DCF staining, respectively, and confirmed that these were comparable to those of control embryos (data not shown). Because loss of Myo19 in zygotes/embryos did not cause detectable functional changes in mitochondria, mitochondrial architecture may not be substantially affected either.

      Transcriptomic analysis would be useful to identify alterations in cell cycle checkpoint regulators, as well as immunofluorescence to identify changes in spindle assembly checkpoint protein recruitment.

      The present results show that the majority of Drp1-depleted embryos arrest at the G2 stage, possibly owing to cell cycle checkpoint mechanisms. Transcriptome analysis would certainly be beneficial, but ultimately a more detailed analysis of the proteins involved and their modifications, such as phosphorylation, will be needed for an accurate assessment. These studies will be the subject of future work.

      Minor comments:

      There are many instances where the English could be improved, particularly the overuse of the word 'the'.

      We have carefully checked the manuscript again and hope that the English has been improved.

      Line 144: replace 'took' with 'take'.

      We have corrected this in the revised version (Line 140).

      Line 157: it is unclear what is meant by 'hinders the functional importance of Drp1 in mature oocytes and embryos'.

      This description has been corrected to “complicates the functional analysis of Drp1 in mature oocytes and embryos” (Line 152-153).

      Line 198: replace with 'displayed a mitochondrial distribution pattern closely associated with'

      We have corrected this in the revised version (Line 195-196).

      Line 200: provide a time to clarify when the cytoplasmic meshwork was 'subsequently reorganized'

      In the revised version, “at the metaphase” has been added (Line 198).

      Line 204: replace 'to' with 'for'

      We have corrected this in the revised version (Line 203).

      Lines 285-87: consider rearranging the text to improve the flow.

      To improve the flow of the text, we have added the following sentence: “We postulated that this asymmetry was due to non-uniformity in the distribution of mitochondria around the spindle” (Line 295-297).

      Line 418: replace 'central' with 'centre'

      We have corrected this in the revised version (Line 430).

      Line 427: replace 'pertaining' with 'partitioning'

      We have corrected this in the revised version (Line 438).

      Line 574: clarify to what '1-5% of that of the oocytes' refers

      We have corrected it to “1-5% of the total volume of the zygote.” (Line 587-588).

      Line 619: indicate the dilution used

      We apologize for the previous incorrect description. We used a part of the extract as the template, not a dilution, and have corrected it to be accurate (Line 631-632).

      Line 634: replace 'on' with 'in' and detail in which medium embryos were mounted.

      We have corrected this in the revised version (Line 647).

      Please check all spelling in the figures.

      Figure 1J - inheritance is spelt incorrectly.

      Figure-Suppl 1, D: Interphase (PN) and (2-cell) is spelt incorrectly. G: inheritance is spelt incorrectly.

      Figure 5F - bottom section prior to cytokinesis, spindle is spelt 'spincle'

      Ensure consistency in abbreviation use (e.g. use of NEB and NEBD).

      Thank you for your careful correction of typographical errors. In the revised version, all points raised by the reviewers have been corrected.

      Reviewer #3 (Public review):

      We thank reviewer 3 for the helpful comments. As indicated in the responses below, we have taken all comments and suggestions into consideration in this revised version of the manuscript.

      Seemingly, there are few apparent shortcomings. Following are the specific comments to activate the further open discussion.

      Line 246: Comments on cristae morphology of mitochondria in Drp1-depleted embryos would better be added.

      In the revised manuscript, we have added the following comment: swollen or partially elongated mitochondria with lamellar cristae in the inner membrane were observed in Drp1-depleted embryos. In addition, quantification of the aspect ratio (long/short axis) showed significant mitochondrial elongation (Figure 2M). These results have been described in the revised manuscript (Line 251-256).
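
One common way to obtain a long/short axis ratio for a segmented mitochondrion is a moment-based ellipse fit. The sketch below assumes each mitochondrion is available as a list of pixel coordinates; whether the study used this method or a different measurement (for example manual axis tracing on EM images) is not stated, and the function name is hypothetical.

```python
import math


def aspect_ratio(pixels):
    """Aspect ratio (long axis / short axis) of a blob from its pixel coordinates.

    `pixels` is a list of (x, y) coordinates. The ratio is taken from the
    eigenvalues of the 2x2 covariance matrix (second central moments),
    i.e. a moment-based ellipse fit.
    """
    n = len(pixels)
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    sxx = sum((x - mx) ** 2 for x, _ in pixels) / n
    syy = sum((y - my) ** 2 for _, y in pixels) / n
    sxy = sum((x - mx) * (y - my) for x, y in pixels) / n
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    major, minor = half_trace + spread, half_trace - spread
    if minor <= 0:
        raise ValueError("blob is degenerate (a single line or point)")
    # Ellipse axis lengths scale with the square root of the eigenvalues.
    return math.sqrt(major / minor)


rod = [(x, y) for x in range(8) for y in range(2)]  # elongated toy object
print(aspect_ratio(rod))
```

Because the ratio comes from eigenvalues, it does not depend on the orientation of the mitochondrion in the image; a round object gives a value close to 1.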

      - Regarding Figure 2H: If possible, a representative picture of Ateam would better be included in the figure. As the authors discussed in line 458, Ateam may be able to detect whether any alterations of local energy demand occurred in the Drp1-depleted embryos.

      Thank you for your very useful comments. Although it would be interesting to investigate whether ATP levels change in localized areas (e.g., around the spindle), in the present study ATeam fluorescence was observed with a conventional fluorescence microscope rather than a confocal laser scanning microscope, in order to quantify the fluorescence intensity over the whole embryo (or whole blastomere). We therefore cannot currently provide the images the reviewer requested. As shown in Figure-figure supplement 1C, ATP levels tend to be higher at the cell periphery in control embryos and in areas of mitochondrial aggregation in Drp1-depleted embryos, but high-resolution confocal images would be needed to show this clearly.

      - Line 282: In Figure 3-Video 1, mitochondria were seemingly more aggregated around female pronucleus. Is it OK to understand that there is no gender preference of pronuclei being encircled by more aggregated mitochondria?

      Review of multiple videos showed that aggregated mitochondria localized toward the cell center but did not preferentially concentrate near the female pronucleus.

      - Line 317: A little more explanation of the "variability" would be fine. Does that basically mean that the Ca<sup>2+</sup> response in both Drp1-depleted blastomeres were lower than control and blastomere with more highly aggregated mitochondria show severer phenotype compared to the other blastomere with fewer mito?

      We think that the reviewer's interpretation is largely correct. There is clearly a bias in Ca<sup>2+</sup> store levels between the blastomeres of Drp1-depleted embryos. However, because mitochondria were not stained simultaneously in this experiment, we cannot draw detailed conclusions, for example that the daughter blastomere inheriting more mitochondria has higher Ca<sup>2+</sup> stores, or that the blastomere with more aggregated mitochondria has lower Ca<sup>2+</sup> stores.

      - Regarding Figure 5B (& Figure 1-figure supplement 1B): Do authors think that there would be less abnormalities in the embryos if Drp1 is trim-awayed after 2-cell or 4-cell, in which mitochondria are less involved in the spindle?

      The marked centration of mitochondrial clusters in Drp1-depleted embryos appears to be associated with the migration of the pronuclei toward the cell center, which is unique to the first embryonic cleavage. Because the assembly of the male and female pronuclei at the cell center is also unique to the first cleavage, binucleation due to mitochondrial misplacement was observed only at the first cleavage. Therefore, if Drp1 were depleted at the 2-cell or 4-cell stage, chromosome segregation errors might be less frequent. However, because unequal partitioning of mitochondria would still be expected, some abnormalities in embryonic development are likely to be observed.

      Reviewer #3 (Recommendations For The Authors):

      Specific comments

      - Line 262: "Since mitochondrial dynamics are spatially coordinated at the ER-mitochondria MCSs," adequate ref. would better be added.

      We have added an adequate reference to the revised manuscript (Friedman et al., 2011, PMID: 21885730).

      - Line 333-336: "...as assessed by the presence of the nuclear envelope." Do authors show the data? In Figure 4-figure supplement 1A, the difference of the phosphoH3-ser10 signal between control and Trim-Away group might be weak. For clarity, it would be helpful if authors indicate the different points to note in the figure.

      Although the data are not shown, nuclear staining of arrested 2-cell stage embryos revealed clear nuclear envelopes, similar to the DAPI image in Figure 4-figure supplement 1A. We have indicated that these data are not shown in the revised version (Line 345). Based on a report that histone H3 phosphorylated at Ser10 localizes to pericentromeric heterochromatin that can be visualized by DAPI staining in late G2 interphase cells (Hendzel et al., 1997, Chromosoma, PMID: 9362543), we qualitatively identified G2 phase from the phosphorylated histone H3 signal and the DAPI-counterstained images. We have noted this point in the revised figure legend (Line 1012-1014).

      Typos or points for reword/rephrase

      - Line 149: "molecular identification" may better be " molecular characteristics".

      We have corrected this in the revised version (Line 145).

      - Line 157: "hinders the functional importance" would be "implies the functional importance" or "complicates the functional analysis".

      We have corrected this in the revised version (Line 152-153).

      - Line 208: "Since the role of F-actin in many cellular events, such as cytokinesis, preclude them as targets for experimentally manipulating mitochondrial distribution, " may better be "Given many cellular roles, disruption of F-actin per se was unsuitable as a strategy for manipulating mitochondrial distribution", for example.

      We have corrected this in the revised version (Line 207-208).

      - Line 260: "with MCSs with the plasma.." may better be "with MCSs such as with the plasma..".

      We have corrected this in the revised version (Line 267-268).

      - Line 312: "distribution and segregation" may better be "distribution and the resulting segregation of the inter-organelle contacts".

      We have corrected this in the revised version (Line 324-325).

      - Line 427: "pertaining" might be "partitioning".

      We have corrected this in the revised version (Line 438).

      - Line 463: "loss of Drp1 induced mitochondrial aggregation disturbs" may better be "mitochondrial aggregation induced by the loss of Drp1 disturbs".

      We have corrected this in the revised version (Line 478-479).

      - Line 752: "endoplasmic reticulum (pink) " would be " endoplasmic reticulum (aqua) ".

      We have corrected this in the revised version (Line 780).

      - Figure 5E: "(Noma 2-cell embryos)" would be "(Normal 2-cell embryos)".

      - Figure 5F: "Mitochondrial centration prevents dual spincle assembly" would be "Mitochondrial centration prevents dual spindle assembly".

      Thank you for your careful correction of typographical errors. We have corrected all the words/expressions the reviewer pointed out in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The selection of inactivated conformations based on AlphaFold modeling seems a bit biased. The authors base their selection of the “most likely” inactivated conformation on the expected flipping of V625 and the constriction at G626 carbonyls. This follows a bit of the “Streetlight effect”. It would be better to have selection criteria that are independent of what they expect to find for the inactivated state conformations. Using cues that favour sampling/modeling of the inactivated conformation, such as the deactivated conformation of the VSD used in the modeling of the closed state, would be more convincing. There may be other conformations that are more accurately representing the inactivated state. I see no objective criteria that justify the non-consideration of conformations from cluster 3 of the inactivated state modeling. I am not sure whether pLDDT is a good selection criterion. It reports on structural confidence, but that may not relate to functional relevance.

      We sincerely thank the reviewer for their perceptive critique highlighting potential bias in selecting the inactivated conformation. We recognize that over-relying on preconceived traits could limit exploration of diverse inactivated states, and we appreciate the opportunity to address this concern.

      Although we selected the model with the flipped V625 in the selectivity filter (SF) from the first round of inactivated-state sampling as the template for the second round, the resulting models still exhibited substantial diversity in their SF conformations. This selection primarily served to steer predictions away from the open-state SF configuration observed in PDB 5VA2, and we have clarified this rationale in the Methodology section. To assess conformational variability, we examined the backbone dihedral angles (φ and ψ) at key residues in the selectivity filter (S624 – G628) and the drug-binding region of the pore-lining S6 segment (Y652, F656) in all 100 models sampled in the subsequent inactivated-state-sampling attempt. By overlaying the φ and ψ dihedral angles from different models, including the open state (PDB 5VA2-based), the closed state, and representative models from AlphaFold inactivated-state-sampling Cluster 2 and Cluster 3, we found that these conformations consistently fall within or near high-probability regions of the dihedral angle distributions. This indicates that these structural states are well represented within the ensemble of conformations sampled by AlphaFold in this study, particularly at functionally critical positions.
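
For readers unfamiliar with the dihedral analysis, the sketch below shows how φ and ψ are computed from backbone atom coordinates using the standard four-point torsion formula. Parsing the 100 AlphaFold models (for example from PDB files) is not shown, and the helper names are hypothetical; established trajectory-analysis libraries provide equivalent routines.

```python
import math


def _sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees, in [-180, 180], defined by four 3D points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(v / norm for v in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond b1.
    v = _sub(b0, tuple(_dot(b0, b1) * u for u in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * u for u in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """Backbone phi of residue i from atoms C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """Backbone psi of residue i from atoms N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)
```

Collecting (φ, ψ) pairs for a residue such as G626 across all models and overlaying them on a 2D histogram gives the kind of distribution described above, against which the representative models can be placed.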

      Following the analysis above and consistent with the reviewer’s suggestion, we evaluated the top representative model from inactivated-state-sampling Cluster 3 (named “AF ic3”), which we had initially excluded. In this model, the carbonyl oxygen of SF residue G626 is flipped away from the conduction pathway, hinting at a potential impact on ion conduction, yet its pore region structurally resembles the open state (Figure S9a, b). To test this objectively, we ran molecular dynamics (MD) simulations (two 1-μs runs with an applied voltage of 750 mV) with varied initial ion/water configurations in the SF and found the channel consistently open and conducting throughout (Figure S9c, d), consistent with our previous observation in Figure S11 that ion conduction can still occur when the upper SF is dilated. Drug docking (Figure S12) further revealed that the model exhibited binding affinities similar to those of the PDB 5VA2-based open-state structure. Together, these findings led us to classify it as a possible alternative open-state conformation.

      Models from Cluster 4 were not tested due to extensive steric clashes, where residues in the SF overlapped with neighboring residues from adjacent subunits. The remaining models displayed SF conformations that combined features from earlier clusters. However, due to subunit-to-subunit variability, where individual subunits adopted differing conformations, they were classified as outliers. This combination of features may be valuable to investigate further in a follow-up study.

      We acknowledge that our approach is just one of many ways to sample different states, and alternative strategies, such as generating more models, varying multiple sequence alignment (MSA) subsampling, or testing different templates, might reveal improved models. Given that hERG channel inactivation likely spans a spectrum of conformations, our resource limitations may have restricted us to exploring and validating only part of this diversity. Nevertheless, the non-conductivity of the putative inactivated (AlphaFold Cluster 2) model and its improved affinity for drugs that target the inactivated state suggest that this approach may be capturing relevant features of the inactivated-state conformation. We look forward to investigating other possibilities in more depth in a future study and are grateful for the reviewer’s feedback.

      (2) The comparison of predicted and experimentally measured binding affinities lacks an appropriate control. Using binding data from open-state conformations only is not the best control. A much better control is the use of alternative structures predicted by AlphaFold for each state (e.g. from the outlier clusters or not considered clusters) in the docking and energy calculations. Using these docking results in the calculations would reveal whether the initially selected conformations (e.g. from cluster 2 for the inactivated state) are truly doing a better job in predicting binding affinities. Such a control would strengthen the overall findings significantly.

      We appreciate the reviewer’s insightful suggestion. To address this, we extended our analysis by incorporating an alternative AlphaFold2-predicted model from inactivated-state-sampling cluster 3 as a structural control. Our analysis in response to comment #1 established that this model is open and conducting, so we refer to it as Open (AF ic3) to distinguish it from Open (PDB 5VA2). We evaluated this new model in single-state and multi-state contexts alongside our original open-state model based on the experimental PDB 5VA2 structure. Additionally, we expanded the drug docking procedure to explore a broader region around the putative drug binding site by increasing the sampling space, and we adopted an improved approach for selecting representative docking poses to better capture relevant binding modes.

      Shown in Figure 7 are comparisons of experimental drug potencies with the binding affinities from the molecular docking calculations under the following conditions:

      (a) Single-state docking using the experimentally derived open-state structure (PDB 5VA2)

      (b) Multi-state docking incorporating open (PDB 5VA2), inactivated, and closed-state conformations weighted by experimentally observed state distributions

      (c) Single-state docking using an alternative AlphaFold-predicted open-state (inactivated-state-sampling cluster 3, AF ic3)

      (d) Multi-state docking combining the AlphaFold-predicted open state (inactivated-state-sampling cluster 3, AF ic3) with the inactivated and closed-state conformations, weighted as in (b)
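
The exact formula used to weight the states in conditions (b) and (d) is not given in this response. One standard choice, shown here only as an assumed illustration, treats binding as conformational selection with fixed state populations p<sub>i</sub>, so that association constants (not free energies) are averaged: ΔG<sub>eff</sub> = -RT ln Σ p<sub>i</sub> exp(-ΔG<sub>i</sub>/RT). The function name and all numbers below are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def effective_binding_free_energy(dg_by_state, population_by_state, temperature=298.15):
    """Population-weighted binding free energy across conformational states.

    Assumes conformational selection with pre-equilibrium state populations
    p_i that are not shifted by the drug: K_eff = sum_i p_i * K_i with
    K_i = exp(-dG_i / RT), so dG_eff = -RT * ln(sum_i p_i * exp(-dG_i / RT)).
    Energies are in kcal/mol.
    """
    total = sum(population_by_state.values())
    if not math.isclose(total, 1.0, rel_tol=1e-6):
        raise ValueError("state populations must sum to 1")
    rt = R_KCAL * temperature
    k_eff = sum(p * math.exp(-dg_by_state[s] / rt)
                for s, p in population_by_state.items())
    return -rt * math.log(k_eff)


dg = {"open": -8.0, "inactivated": -9.5, "closed": -7.0}  # hypothetical scores
pops = {"open": 0.2, "inactivated": 0.5, "closed": 0.3}   # hypothetical populations
print(round(effective_binding_free_energy(dg, pops), 2))
```

Under this weighting, the tightest-binding state dominates more strongly than in a simple population-weighted mean of ΔG values, which is one reason the choice of weighting scheme matters when comparing with experimental potencies.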

      Using only the open-state model (PDB 5VA2) yielded a moderate correlation with experimental data (R<sup>2</sup> = 0.43, r = 0.66, Figure 7a). Incorporating multi-state binding (weighted by the experimentally observed state distributions) improved the correlation substantially (R<sup>2</sup> = 0.63, r = 0.79, Figure 7b), a 47% relative increase in R<sup>2</sup> that underscores the value of multi-state modeling. Importantly, this improvement was achieved without considering potential drug-induced allosteric effects on hERG channel conformation and gating, which will be addressed in future work.

      Next, we substituted the PDB 5VA2-based open-state model with the AF ic3 open-state model. Docking to this alternative model alone produced similar performance (R<sup>2</sup> = 0.44, r = 0.66, Figure 7c), and incorporating it into the multi-state ensemble further improved the correlation with experiments (R<sup>2</sup> = 0.64, r = 0.80, Figure 7d), representing a 45% gain in R<sup>2</sup> and matching the performance of multi-state docking results based on the PDB 5VA2-derived model.
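
The percentage gains quoted above are relative increases in R<sup>2</sup>, not absolute differences. The short check below reproduces them from the reported values; the Pearson helper is included because, for a simple linear fit of experiment against prediction, R<sup>2</sup> equals r<sup>2</sup>. The function names are hypothetical, and no underlying docking data are reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def relative_gain(before, after):
    """Relative change of a metric such as R^2, as a fraction of its baseline."""
    return (after - before) / before


# Reported R^2 values: single-state -> multi-state docking.
for label, r2_single, r2_multi in (("PDB 5VA2", 0.43, 0.63), ("AF ic3", 0.44, 0.64)):
    print(f"{label}: {relative_gain(r2_single, r2_multi):.0%} gain in R^2")
```

Both ensembles gain the same absolute 0.20 in R<sup>2</sup>; the slightly different percentages (47% and 45%) reflect only the different single-state baselines.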

      These findings suggest that the predictive power of computational drug docking is enhanced not merely by the accuracy of individual models, but by the structural diversity and complementarity provided by an ensemble of protein conformations. Rather than relying solely on a single experimentally determined protein structure, the ensemble benefits from incorporating AlphaFold-predicted models that capture alternative conformations identified through our state-specific sampling approach. These diverse protein models reflect different structural features, which together offer a more comprehensive representation of the ion channel’s binding landscape and enhance the predictive performance of computational drug docking. Overall, these results reinforce that multi-state modeling offers a more realistic and predictive framework for understanding drug – ion channel interactions than traditional single-state approaches, emphasizing the value of both individual model evaluation and their collective integration. We are grateful for the reviewer’s suggestion.

      (3) Figures where multiple datapoints are compared across states generally lack assessment of the statistical significance of observed trends (e.g. Figure 3d).

      We appreciate the reviewer’s comment on the assessment of statistical significance in Figure 3d. To clarify, the comparisons shown in the subpanels are based on three selected representative models for each state rather than a broader population sample (similarly for Figure 3b). In the closed-state predicted models, the strong convergence of the voltage-sensing domain (VSD), with an all-atom RMSD of 0.36 Å between the cluster 1 and cluster 2 closed-state sampling models and 0.95 Å to the outlier cluster, indicates minimal structural variation. These RMSD values, reported in the main text, demonstrate good convergence and by themselves provide a quantitative measure of the variability among these models. This trend extends to the open-state and inactivated-state AlphaFold models, which show similarly small differences in their VSD regions. Given this convergence, population-based statistical analysis is unlikely to reveal meaningful deviations, as the low variability among models limits the insights beyond those obtained by comparing representative structures.
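
As a reminder of what the quoted all-atom RMSD values measure, a minimal sketch is given below. It assumes the two models have already been superimposed (for example by a least-squares fit) and that atoms are matched one-to-one; the superposition step and atom selection for the VSD are not shown, and the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched atom sets (in Å).

    Assumes the structures are already superimposed and that atom i in
    `coords_a` corresponds to atom i in `coords_b`.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


model_1 = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
model_2 = [(1.0, 0.0, 0.0), (2.0, 1.0, 1.0)]  # every atom shifted by 1 Å in x
print(rmsd(model_1, model_2))
```

Because RMSD averages over all atoms, a value of 0.36 Å for an entire domain indicates that no large local rearrangement is hidden in the average, although a small number of atoms could still differ more than the mean suggests.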

      Nonetheless, we acknowledge this limitation. In future studies, we plan to explore alternative modeling approaches to introduce greater variability, enabling a more robust statistical evaluation of state-specific trends in the predictions.

      (4) Figure 3 and Figures S1-S4 compare structural differences between states. However, these differences are inferred from the initial models. The collection of conformations generated via the MD runs allow for much more robust comparisons of structural differences.

      We have explored these conformational state dynamics through MD simulations of the Open (5VA2-based), Inactivated (AlphaFold Cluster 2), and Closed-state models, as presented in Figures S7, S8, S10, and S11. These figures provide detailed insights: Figures S7 and S8 analyze SF and pore conformational dynamics, including averaged pore radii with and without applied voltage and superimposed conformational ensembles; Figure S10 tracks cross-subunit distances between protein backbone carbonyl oxygens, revealing sequential SF dilation steps near residues F627 and G628; and Figure S11 illustrates this SF dilation process over time, highlighting the flipping of the F627 carbonyl and SF expansion. We appreciate the opportunity to clarify our approach.

      Reviewer #2 (Recommendations for the authors):

      Major concerns:

      (1) Protein fragments are used to model the closed and inactivated states of hERG, but the choices of fragments are not well justified. For instance, in Figure 1a, helices from 8EP1 (deactivated voltage-sensing domain) and a helix+loop from 5VA2 (selectivity filter) are used. Why just the selectivity filter and not the cytosolic domain, for instance? Why not some parts of the helices attached to the selectivity filter, or the whole membrane inserted domain of 8EP1? Same for the inactivated conformation in Figure 1c: why the cytosolic domain only?

      We thank the reviewer for their thoughtful questions regarding our choice of protein fragments for modeling the closed and inactivated states of hERG in Figures 1a and 1c, and we appreciate the opportunity to justify these selections more clearly. Our approach to template selection was guided by our experience that providing AlphaFold2 with larger templates often leads it to overly constrain predictions to the input structure, reducing its flexibility to explore alternative conformations. In contrast, smaller, targeted fragments increase the likelihood that AlphaFold2 will incorporate the desired structural features while predicting the rest of the protein. We have provided a more detailed discussion of this in the methodology section.

      For the closed state (Figure 1a), we chose the deactivated voltage-sensing domain (VSD) from the rat EAG channel (PDB 8EP1) to inspire AlphaFold2 to predict a similarly deactivated VSD conformation characteristic of hERG channel closure, as this domain’s downward shift is a hallmark of potassium channel closure. We paired this with the selectivity filter (SF) and adjacent residues from the open-state hERG structure (PDB 5VA2) to maintain its conductive conformation, as it is generally understood that K<sup>+</sup> channel closure primarily involves the intracellular gate rather than significant SF distortion. Including additional helices (e.g., S5–S6) or the entire membrane domain from PDB 8EP1 risked biasing the model toward the EAG channel’s pore structure, which differs from hERG’s, while omitting the cytosolic domain ensured focus on the VSD-driven closure without over-constraining cytoplasmic domain interactions.

      For the inactivated state (Figure 1c), we initially used only the cytosolic domain from PDB 5VA2 to anchor the prediction while allowing AlphaFold2 to freely sample transmembrane domain conformations, particularly the SF, where the inactivation occurs via its distortion. Excluding the SF or attached helices at this stage avoided locking the model into the open-state SF, and the cytosolic domain alone provided a minimal scaffold to maintain hERG’s intracellular architecture without dictating pore dynamics. Following the initial prediction, we initiated more extensive sampling by using one of the predicted SFs that differs from the open-state SF (PDB 5VA2) as a structural seed, aiming to guide predictions away from the open-state configuration. The VSD and cytosolic domain were also included in this state to discourage pore closure during prediction. Using larger fragments, like the full membrane-spanning domains or additional cytosolic regions from the open-state structure might reduce AlphaFold2’s ability to deviate from the open-state conformation, undermining our goal of capturing more diverse, state-specific features.

      It is worth noting that multiple strategies could potentially achieve the predicted models in our study, and here we only present examples of the paths we took and validated. It is likely that many of the steps may be unnecessary and could be skipped, and future work building on our approach can further explore and streamline this process. A consistent theme underlies our choices: for the closed state, we know the VSD should adopt a deactivated (“down”) conformation, so we provide AlphaFold2 with a specific fragment to guide this outcome; for the inactivated state, we recognize that the SF must change to a non-conductive conformation, so we grant AlphaFold2 flexibility to explore diverse conformations by minimizing initial constraints on the transmembrane region.

      With greater sampling and computational resources, it is possible we could identify additional plausible, non-conductive conformations that might better represent an inactivated state, as hERG inactivation may encompass a spectrum of states. In this study, due to resource limitations, we focused on generating and validating a subset of conformations. Still, we acknowledge that broader exploration could further refine these models, which could be pursued in future studies. We updated the Methods and Discussion sections to reflect this perspective, and we are grateful for the reviewer’s input, which encourages us to clarify our rationale and highlight the adaptability of our approach.

      To demonstrate the broader feasibility of this approach, we applied it to another ion channel, the voltage-gated sodium channel Na<sub>V</sub>1.5, as illustrated in Figure S14. In this example, a deactivated VSD II from the cryo-EM structure of a homologous ion channel, Na<sub>V</sub>1.7 (PDB 6N4R) (DOI: 10.1016/j.cell.2018.12.018), which was trapped in a deactivated state by a bound toxin, was used as a structural template. This guided AlphaFold to generate a Na<sub>V</sub>1.5 model in which all four voltage sensor domains (VSD I–IV) exhibit S4 helices in varying degrees of deactivation. Compared with the cryo-EM open-state Na<sub>V</sub>1.5 structure (PDB 6LQA) (DOI: 10.1002/anie.202102196), the predicted model displays a visibly narrower pore, representing a plausible closed state. This example underscores the versatility of our strategy in modeling alternative conformational states across diverse ion channels.

      (2) While the authors rely on AF2 (ColabFold) for the closed and inactivated states, they use Rosetta to model loops of the open state. Why not just supply 5VA2 as a template to ColabFold and rebuild the loops that way? Without clear explanations, these sorts of choices give the impression that the authors were looking for specific answers that they knew from their extensive knowledge of the hERG system. While the modeling done in this paper is very nice, its generalizability is not obvious.

      We appreciate the reviewer’s question about our use of Rosetta to model loops in the open-state hERG channel (PDB 5VA2) rather than rebuilding it entirely with ColabFold. In this study, we conducted a control experiment in which we supplied parts of PDB 5VA2 to ColabFold to rebuild the loops, generating 100 models (Figure 2a: predicted open state). The top-ranked model (by pLDDT) differed from our Rosetta-modeled structure by only 0.5 Å RMSD, primarily due to the flexible extracellular loops, as expected, while the pore and selectivity filter (our areas of focus) remained nearly identical. We chose the Rosetta-refined cryo-EM structure because this structure and approach have been widely used as an open-state reference in our other hERG channel studies, such as by Miranda et al. (DOI: 10.1073/pnas.1909196117) and Yang et al. (DOI: 10.1161/CIRCRESAHA.119.316404), which makes our results more directly comparable to prior work in the field. Nonetheless, because both models (with loops modeled by Rosetta or AlphaFold) were virtually identical, we would expect no significant differences if either were used to represent the open state in our study. We have incorporated this clarification into the main text.
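      A small overall RMSD can be decomposed by region, which is how a deviation can be attributed mainly to flexible loops while a core region such as the pore stays nearly unchanged. The sketch below is a minimal illustration, not the analysis pipeline used in the study: it assumes the two models have already been superposed (for example with a Kabsch fit), and the residue numbers and coordinates in the usage example are hypothetical toy values.

```python
import math
from typing import Dict, Iterable, Optional, Tuple

Coord = Tuple[float, float, float]


def rmsd(model_a: Dict[int, Coord], model_b: Dict[int, Coord],
         residues: Optional[Iterable[int]] = None) -> float:
    """RMSD (in the coordinate units, e.g. angstroms) over matched C-alpha atoms.

    Both models are assumed to be superposed already; this function only
    measures the residual deviation. If `residues` is None, every residue
    number shared by the two models is used.
    """
    keys = sorted(set(model_a) & set(model_b)) if residues is None else list(residues)
    if not keys:
        raise ValueError("no residues to compare")
    total = 0.0
    for key in keys:
        total += sum((a - b) ** 2 for a, b in zip(model_a[key], model_b[key]))
    return math.sqrt(total / len(keys))


# Hypothetical toy models: residues 1-3 form a rigid "core",
# residue 4 stands in for a flexible loop displaced by 2 angstroms.
model_a = {1: (0.0, 0.0, 0.0), 2: (1.0, 0.0, 0.0), 3: (2.0, 0.0, 0.0), 4: (3.0, 0.0, 0.0)}
model_b = {1: (0.0, 0.0, 0.0), 2: (1.0, 0.0, 0.0), 3: (2.0, 0.0, 0.0), 4: (3.0, 2.0, 0.0)}

print(rmsd(model_a, model_b))             # overall: 1.0
print(rmsd(model_a, model_b, [1, 2, 3]))  # core only: 0.0
print(rmsd(model_a, model_b, [4]))        # loop only: 2.0
```

      Reporting per-region values alongside the global number makes it explicit which part of the structure drives the difference.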

      (3) pLDDT scores were used as a measure of reliable and accurate predictions, but pLDDT is not always reliable for selecting new/alternative conformations (see https://doi.org/10.1038/s41467-024-51507-2 and https://www.nature.com/articles/s41467-024-51801-z).

      We acknowledge that while pLDDT is a valuable indicator of structural confidence in AlphaFold2 predictions, its limitations warrant consideration. In our revision, we mitigated this by not relying solely on pLDDT: we also performed a protein backbone dihedral angle analysis of the regions of focus in all predicted models to ensure comprehensive coverage of conformational variations. From our AlphaFold modeling results, we also tested a model from cluster 3 of the inactivated-state sampling process, which had lower pLDDT scores, and included these results in our revised analysis. We included a note in the Discussion section of the revised manuscript: “As noted in recent studies, pLDDT scores are not reliable indicators for selecting alternative conformations (DOI: 10.1038/s41467-024-51507-2 and DOI: 10.1038/s41467-024-51801-z). To address this, we performed a protein backbone dihedral angle analysis in the regions of interest to ensure that our evaluation captured a representative range of sampled conformations.”
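      A backbone dihedral analysis rests on a standard geometric calculation: the signed torsion angle defined by four atom positions. The sketch below uses the common atan2 formulation in plain Python; φ and ψ for residue i follow from the usual backbone atoms, as noted in the closing comments. It is illustrative only and does not reproduce the authors' scripts; the function and helper names are ours.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Sequence[float], p1: Sequence[float],
             p2: Sequence[float], p3: Sequence[float]) -> float:
    """Signed dihedral angle in degrees (-180 to 180) defined by four atoms."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Backbone torsions of residue i, given atom coordinates from a model:
#   phi_i = dihedral(C[i-1], N[i], CA[i], C[i])
#   psi_i = dihedral(N[i], CA[i], C[i], N[i+1])
```

      Comparing the φ/ψ distributions of a region across all predicted models, rather than only the top-pLDDT model, is what lets low-confidence but distinct conformations surface.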

      (4) Extensive work has been done using AF2 to model alternative protein conformations (https://www.biorxiv.org/content/10.1101/2024.05.28.596195v1.abstract, along with some references the authors cite, such as work by McHaourab); another group recently modeled the ion channel GLIC (https://www.biorxiv.org/content/10.1101/2024.09.05.611464v1.abstract). Therefore, this work, though generally solid and thorough, seems more like a variation on a theme than a groundbreaking new methodology, especially because of the generalizability issues mentioned above.

      We sincerely thank the reviewer for acknowledging the solidity of our study and for drawing our attention to the impressive recent efforts using AlphaFold2 to explore alternative protein conformations. These studies are valuable contributions that highlight the versatility of AlphaFold2, and we are grateful for their context in evaluating our work.

      Building on these efforts, our approach not only enhances the prediction of conformational diversity but also incorporates structural templates to guide AlphaFold2 toward specific functional protein states. More significantly, our study goes beyond structural modeling by rigorously validating these conformations with multiple simulation approaches tested against experimental data, revealing that AlphaFold-predicted conformations can align with distinct physiological ion channel states. A key finding is that drug binding predictions using AlphaFold-derived hERG channel states substantially improve correlation with experimental data. This addresses a longstanding challenge in computational screening of multi-state proteins like the hERG channel, for which previous structural models have been mostly limited to the open state based on cryo-EM structures. Our approach not only captures this critical state dependence but also reveals potential molecular determinants underlying enhanced drug binding during hERG channel inactivation, a phenomenon observed experimentally but poorly understood. These insights advance drug safety assessment by improving predictive screening for hERG-related cardiotoxicity, a major cause of drug attrition and withdrawal.

      We view our methodology as a natural evolution of the advancements cited by the reviewer, offering an approach that predicts diverse hERG channel conformational states and links them to meaningful functional and pharmacological outcomes. To address the reviewer’s concern about generalizability, we have expanded the methodology section to make it easier to follow and include additional details. As an example, we show how our approach can be applied to model another ion channel system, Na<sub>V</sub>1.5, in Figure S14.

      Furthermore, to enhance the applicability of our methodology, we have uploaded the scripts for analyzing AlphaFold-predicted models to GitHub (https://github.com/k-ngo/AlphaFold_Analysis), with extensive documentation so that they can be adapted to a wide range of scenarios. This enables users, even those not focused on ion channels, to apply our tools effectively to AlphaFold predictions for their own projects and to produce publication-ready figures.

      While it is likely that multiple modeling approaches could lead AlphaFold to model alternative protein conformations, the key challenge lies in validating the physiological relevance of those predicted states. This study is intended to support other researchers in applying our template-guided approach to different protein systems and, more importantly, in rigorously testing and validating in silico the biological significance of the conformation-specific structural models they generate.

      Minor concerns:

      (1) The authors mention in the Introduction section that capturing conformational states, especially for membrane proteins that may be significant as drug targets, is crucial. It would be helpful to relate their work to NMR studies of hERG channel domains, particularly the N-terminal “eag” domain, which is crucial for channel function and can provide insights into conformational changes associated with different channel states (https://doi.org/10.1016/j.bbrc.2010.10.132).

      We appreciate the reviewer’s insightful comment regarding the PAS domain and the potential influence of other regions, such as the N-linker and distal C-region, on drug binding and state transitions.

      The PAS domain did appear in the starting templates used for initial structural modeling (as shown in Figure 1a, b, c), but it was not included in the final models used for subsequent analyses. The omission was primarily due to hardware-imposed constraints: including these additional regions would exceed the memory capacity of our current graphics processing unit (GPU) card, causing failures during the prediction step.

      The PAS domain, even if not serving as a conventional direct drug-binding site, can influence the gating kinetics of hERG channels. By altering the probability and duration with which channels occupy specific states, it can indirectly affect how well drugs bind. For example, if the presence of the PAS domain shifts hERG channel gating so that more channels enter (and remain in) the inactivated state as was shown previously (e.g., DOI: 10.1085/jgp.201210870), drugs with a higher affinity for that state would appear to bind more potently, as observed in previous electrophysiological experiments (e.g., DOI: 10.1111/j.1476-5381.2011.01378.x). It is also plausible that the PAS domain could exert allosteric effects that alter the conformational landscape of the hERG channel during gating transitions, potentially impacting drug accessibility or binding stability. This is an intriguing hypothesis and an important avenue for future research.
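      The argument that a shift in state occupancy changes apparent potency can be made concrete with a simple equilibrium model. The sketch below assumes that all drug-free and drug-bound states are at full equilibrium, that the occupancies are those of the drug-free channel, and it ignores trapping and kinetics. Under these assumptions the state-specific association constants add, weighted by occupancy: 1/Kd,app = Σ f_i / Kd,i. The affinities and occupancies in the usage example are hypothetical, not hERG measurements.

```python
from typing import Sequence


def apparent_kd(fractions: Sequence[float], kds: Sequence[float]) -> float:
    """Apparent dissociation constant of a drug for a multi-state channel.

    fractions: drug-free equilibrium occupancy of each state (must sum to 1)
    kds:       dissociation constant of the drug for each state (same units)

    At full equilibrium the state-specific association constants (1/Kd)
    add, weighted by occupancy, so 1/Kd_app = sum(f_i / Kd_i).
    """
    if len(fractions) != len(kds):
        raise ValueError("one occupancy per state is required")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("state occupancies must sum to 1")
    return 1.0 / sum(f / kd for f, kd in zip(fractions, kds))


# Hypothetical drug: Kd = 100 nM for the open state, 10 nM for the inactivated state.
kds_nm = [100.0, 10.0]  # [open, inactivated]
for f_inact in (0.2, 0.5, 0.8):
    print(f_inact, round(apparent_kd([1.0 - f_inact, f_inact], kds_nm), 1))
```

      As the inactivated fraction grows, the apparent Kd moves toward the inactivated-state value, so a gating modifier that favors inactivation would make the same drug look more potent without any change at the binding site itself.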

      With access to more powerful computational resources, it would be valuable to explore the full-length hERG channel, including the PAS domain and associated regions, to assess their potential contributions to drug binding and gating dynamics. We incorporated a discussion of these points into the main text, acknowledging the limitations of our current models and highlighting the need for future studies to explore these regions in greater detail. The addition reads: “…Our models excluded the N-terminal PAS domain due to GPU memory limitations, despite its inclusion in initial templates. This omission may overlook its potential roles in gating kinetics and allosteric effects on drug binding (e.g., PMID: 21449979, PMID: 23319729, PMID: 29706893, PMID: 30826123, DOI: 10.4103/jpp.JPP_158_17). Future research will explore the full-length hERG channel with enhanced computational resources to assess these regions’ contributions to conformational state transitions and pharmacology.”

      (2) In the second-to-last paragraph of the Introduction, the authors describe how AlphaFold2 works. They write, “AlphaFold2 primarily requires the amino acid sequence of a protein as its input, but the method utilizes other key elements: in addition to the amino acid sequence, AlphaFold2 can also utilize multiple sequence alignments (MSAs) of similar sequences from different species, templates of related protein structures when available, and/or homologous proteins (Jumper et al., 2021a). Evolutionarily conserved regions over multiple isoforms and species indicated that the sequence is crucial for structural integrity”. The last sentence is confusing; if the authors mean that all information required to fold the protein into its 3D structure is present in its primary sequence, that has been the paradigm. It is unclear from this paragraph what the authors wanted to convey.

      We apologize for any confusion caused by this phrasing. Our intent was not to restate the well-established paradigm that a protein’s primary sequence contains the information needed for its 3D structure, but rather to emphasize how AlphaFold2 leverages evolutionary conservation, via multiple sequence alignments (MSAs), to infer structural constraints beyond what a single sequence alone might reveal. Specifically, we aimed to highlight that conserved regions across species and isoforms provide additional context that AlphaFold2 uses to enhance the accuracy of its predictions, complementing the use of templates and homologous structures as described in Jumper et al. (2021). To clarify this, we revised the sentence in the manuscript to read: “AlphaFold2 primarily requires a protein's amino acid sequence as input, but it also leverages other critical data sources. In addition to the sequence, it incorporates multiple sequence alignments (MSAs) of related proteins from different species, available structural templates, and information on homologous proteins. While the primary sequence encodes the 3D structure, AlphaFold2 harnesses evolutionary conservation from MSAs to reveal structural insights that extend beyond what a single sequence can provide.” We thank the reviewer for pointing out this ambiguity.

      (3) In the Results section, the authors state that the predictions generated by their method are evaluated by standard accuracy metrics, please elaborate - what standard metrics were used to judge the predictions and why (some references would be a nice addition). Further, on Page 6, the sentence “There are fewer differences between the open- and closed-state models (Figure S2b, d)” is confusing, fewer differences than what? or there are a few differences between the two states/models? Please clarify.

      The original sentence referring to “standard accuracy metrics” is somewhat misplaced, as our intent was not to apply any conventional “benchmarking” to judge the predictions, but rather to evaluate functional and structural relevance in a physiologically meaningful context. Specifically, we assessed drug binding affinities from molecular docking simulations (in Rosetta Energy Units, R.E.U.) against experimental drug potency data (e.g., IC<sub>50</sub> values converted to free energies in kcal/mol, Figure 7), analyzed differences in interaction networks across states in relation to known mutations affecting hERG inactivation (Figure 4, Table 2), validated ion conduction properties through MD simulations with the applied voltage against expected state-dependent hERG channel behavior (Figure 5), and compared predicted structural models to available experimental cryo-EM structures (Figure 3). We clarified in the text that our assessment emphasized the physiological plausibility of the generated conformations, drawing on evidence from existing computational and experimental studies at each step of the analysis above.
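      As a rough illustration of the comparison described above, the sketch converts an IC<sub>50</sub> to an approximate binding free energy with ΔG ≈ RT ln(IC<sub>50</sub> / 1 M), which treats IC<sub>50</sub> as a dissociation constant (an approximation; the exact conversion used in the study may differ), and then computes a Pearson correlation between docking scores and experimental free energies. The function names are ours, and no hERG data are supplied here; the score and IC<sub>50</sub> lists would be the user's own.

```python
import math
from typing import Sequence

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ic50_to_dg(ic50_molar: float, temperature_k: float = 298.15) -> float:
    """Approximate binding free energy (kcal/mol), treating IC50 as Kd.

    dG = R * T * ln(IC50 / 1 M); a 10 nM blocker gives about -10.9 kcal/mol at 298 K.
    """
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return R_KCAL * temperature_k * math.log(ic50_molar)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Usage: r = pearson_r(docking_scores_reu, [ic50_to_dg(c) for c in ic50s_molar])
```

      Because both docking scores (R.E.U.) and ΔG are more negative for tighter binding, a state-specific model that captures the relevant pocket should yield a positive r.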

      As for the sentence on page 6, “There are fewer differences between the open- and closed-state models,” we apologize for the ambiguity; we meant that the hydrogen bond networks in the selectivity filter region exhibit fewer differences between the open and closed states compared to the more pronounced variations seen between the open and inactivated states. We revised this sentence to read: “The open- and closed-state models show fewer differences in their selectivity filter hydrogen bond networks compared to those between the open and inactivated states,” to enhance readability.
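      The revised sentence compares states by how much their hydrogen bond networks differ. One simple way to express that as a number, offered here as an assumption rather than the metric used in the manuscript, is to count the bonds present in one state but not the other (the size of the symmetric difference of the two bond sets). The donor/acceptor labels below are placeholders, not hERG data.

```python
from typing import Set, Tuple

HBond = Tuple[str, str]  # (donor atom label, acceptor atom label)


def hbond_difference(state_a: Set[HBond], state_b: Set[HBond]) -> int:
    """Number of hydrogen bonds found in exactly one of the two states."""
    return len(state_a ^ state_b)


# Placeholder networks (labels are illustrative only).
open_state = {("d1", "a1"), ("d2", "a2"), ("d3", "a3")}
closed_state = {("d1", "a1"), ("d2", "a2"), ("d4", "a4")}
inactivated_state = {("d5", "a5"), ("d6", "a6")}

print(hbond_difference(open_state, closed_state))       # 2
print(hbond_difference(open_state, inactivated_state))  # 5
```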

      (4) In the Discussion, the authors reiterate that this methodology can be extended to sample multiple protein conformations, and their system of choice was hERG potassium channel. I think this methodology can be applied to a system when there is enough knowledge of static structures, and some information on dynamics (through simulations) and mutagenesis analysis available. A well-studied system can benefit from such a protocol to gauge other conformational states.

      We agree that this approach is well-suited to systems with sufficient static structures, dynamic insights from simulations, and mutagenesis data, as seen with the hERG channel. We appreciate the reviewer’s implicit concern about generalizability to less-characterized systems and addressed this in the Discussion as a limitation, noting that the method’s effectiveness may depend on prior knowledge. Future studies can explore whether the advent of AlphaFold3 and other deep learning approaches can enhance its applicability to systems with more limited data. We have added this comment to the Discussion: “…A limitation of our methodology is its reliance on well-characterized systems with ample static structures, molecular dynamics simulation data, and mutagenesis insights, as demonstrated with the hERG channel, which may limit its applicability to less-studied proteins.”

      (5) The Methods section must be broken down into steps to make it easier to follow for the reader (if they want to implement these steps for themselves on their system of choice).

      a. Is it possible to share example scripts and code used to piece templates together for AF2? Also, since the AF3 code is now available, the authors may comment on how their protocol could be applied there or whether they plan to implement their protocol using AF3 (which is designed to work better for binding small molecules). Please see https://github.com/google-deepmind/alphafold3 for the recently released code for AF3.

      We appreciate the reviewer’s suggestion to improve the Methods section and their comments on scripts and AlphaFold3 (AF3). We revised the Methods to separate it into clear steps (e.g., template preparation, AF2 setup, clustering, and refinement) for better readability and reproducibility, and uploaded the sample scripts along with the instructions to GitHub (https://github.com/k-ngo/AlphaFold_Analysis).

      Regarding AF3’s recent code release, we plan to explore the applicability of our methodology to AF3 in a follow-up study, leveraging its advanced features to refine conformational predictions and state-specific drug docking, and added a brief comment to the Discussion to reflect this future direction: “…Following the recent release of AlphaFold3’s source code, we plan to explore the applicability of our template-guided methodology in a follow-up study, leveraging AF3’s advanced diffusion-based architecture to enhance protein conformational state predictions and state-specific drug docking, particularly given its improved capabilities for modeling small molecule – protein interactions…”

      b. The authors modified the hERG protein by removing a segment, the N-terminal PAS domain (residues M1 - R397) because of graphics card memory limitation. Would the removal of the PAS domain affect the structure and function of the channel protein? HERG and other members of the “eag K<sup>+</sup> channel” family contain a PAS domain on their cytoplasmic N terminus. Removal of this domain alters a physiologically important gating transition in HERG, and the addition of the isolated domain to the cytoplasm of cells expressing truncated HERG reconstitutes wild-type gating. (see https://doi.org/10.1371/journal.pone.0059265). Please elaborate on this.

      We thank the reviewer for raising an important point about the removal of the N-terminal PAS domain and for highlighting its physiological role in hERG channel gating transitions. In our study, unlike experimental settings where PAS removal alters gating, we believe this omission has minimal impact on our key analyses.

      The drug docking procedure focuses on optimizing drug binding poses with minor protein structural refinement around the putative drug binding site, which in our case is the hERG channel pore region, where hERG-blocking drugs predominantly bind. The cytoplasmic PAS domain, located distally from this site, remains outside the protein structure refinement zone during drug docking simulations. However, one aspect we have not yet considered is the potential modulation of hERG channel gating by drugs, and vice versa, particularly given the PAS domain’s role in gating. This interplay could be significant but requires investigation beyond our current drug docking framework. We plan to explore this in future studies using alternative simulation methodologies, such as extended MD simulations or enhanced sampling techniques, to comprehensively capture these dynamic protein–ligand interactions.

      Similarly, in our 1 μs MD simulations assessing ion conductivity (Figure 4), the timescale is too short for PAS-mediated gating changes to propagate through the protein and meaningfully influence ion conduction and channel activation dynamics, which occur on a millisecond timescale (see e.g., DOI: 10.3389/fphys.2018.00207). To fully address this limitation, we plan to explore the inclusion of the PAS domain in a follow-up study with enhanced computational resources, allowing us to investigate its structural and functional contributions more comprehensively.

      (6) The first paragraph of the Methods reads as though AF2 has layers that recycle structures. We doubt that the authors meant it that way. Please update the language to clarify that recycling is an iterative process in which the pairwise representation, MSA, and predicted structures are passed (“recycled”) through the model multiple times to improve predictions.

      We agree that the phrasing might suggest physical layers recycling structures, which was not our intent. Instead, we meant to describe AlphaFold2’s iterative refinement process, where intermediate outputs, such as the pairwise residue representations, multiple sequence alignments (MSAs), and predicted structures, are iteratively passed (or “recycled”) through the model to enhance prediction accuracy. To clarify this, we revised the relevant sentence to read: “A critical feature of AlphaFold2 is its iterative refinement, where pairwise residue representations, MSAs, and initial structural predictions are recycled through the model multiple times, improving accuracy with each iteration.”
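      The recycling described in the revised sentence can be sketched as control flow. In this schematic, `model` is a hypothetical stand-in for one full AlphaFold2 forward pass and `run_with_recycling` is our own name; the point illustrated is that the same network, with the same weights, is applied repeatedly and that its previous pair representation, first MSA row, and predicted structure are passed back in as additional inputs. It is not the AlphaFold2 or ColabFold API.

```python
from typing import Any, Callable, Dict, Optional

Outputs = Dict[str, Any]
ForwardPass = Callable[[Dict[str, Any], Optional[Outputs]], Outputs]


def run_with_recycling(model: ForwardPass, features: Dict[str, Any],
                       num_recycles: int = 3) -> Outputs:
    """Apply the same model num_recycles + 1 times, feeding outputs back in."""
    if num_recycles < 0:
        raise ValueError("num_recycles must be non-negative")
    prev: Optional[Outputs] = None  # first pass has nothing to recycle
    out: Outputs = {}
    for _ in range(num_recycles + 1):
        out = model(features, prev)
        # Only these intermediate outputs are recycled into the next pass.
        prev = {key: out[key] for key in ("pair", "msa_first_row", "structure")}
    return out


# Toy forward pass (hypothetical): records how many passes have refined it.
def toy_model(features: Dict[str, Any], prev: Optional[Outputs]) -> Outputs:
    step = 0 if prev is None else prev["structure"] + 1
    return {"pair": None, "msa_first_row": None, "structure": step}


print(run_with_recycling(toy_model, {}, num_recycles=3)["structure"])  # 3
```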

      Reviewer #3 (Recommendations for the authors):

      The authors should integrate the very recently published CryoEM experimental data of hERG inhibition by several drugs (Miyashita et al., Structure, 2024; DOI: 10.1016/j.str.2024.08.021).

      We thank the reviewer for the suggestion. Here, we compare drug binding in our open-state models (PDB 5VA2-derived and an additional AlphaFold-predicted model from Cluster 3 of the inactivated-state sampling attempt, named “AF ic3”) and inactivated-state model, using the cationic forms of astemizole and E-4031, with the corresponding experimental structures (Figure S13). Drug binding in the closed state is excluded because its pore architecture deviates too much from that in the cryo-EM structures. Experimental data (DOI: 10.1124/mol.108.049056) indicate that both astemizole and E-4031 bind more potently to the inactivated state of the hERG channel.

      Astemizole (Figure S13a):

      - In the PDB 5VA2-derived open-state model, astemizole binds centrally within the pore cavity, adopting a bent conformation that allows both aromatic ends of the molecule to engage in π–π stacking with the side chains of Y652 from two opposing subunits. Hydrophobic contacts are observed with S649 and F656 residues.

      - In the AF ic3 open-state model, the ligand is stabilized through multiple π–π stacking interactions with Y652 residues from three subunits, forming a tight aromatic cage around its triazine and benzimidazole rings. Hydrophobic interactions are observed with hERG residues T623, S624, Y652, F656, and S660.

      - In the inactivated-state model, astemizole adopts a compact, horizontally oriented pose deeper in the channel pore, forming the most extensive interaction network among all the states. The ligand is tightly stabilized by multiple π–π stacking interactions with Y652 residues across three subunits, and forms hydrogen bonds with residues S624 and Y652. Additional hydrophobic contacts are observed with residues F557, L622, S649, and Y652.

      - Consistent with our findings, an electrophysiology study by Saxena et al. identified hERG residues F557 and Y652 as crucial for astemizole binding, as determined through mutagenesis (DOI: 10.1038/srep24182).

      - In the cryo-EM structure (PDB 8ZYO) (DOI: 10.1016/j.str.2024.08.021), astemizole is stabilized by π–π stacking with Y652 residues. However, no hydrogen bonds are detected, which may reflect limitations in cryo-EM resolution rather than a true absence of contacts. Additional hydrophobic interactions are observed with L622 and G648 residues.

      E-4031 (Figure S13b):

      - In the PDB 5VA2-derived open-state model, E-4031 binds within the central cavity primarily through polar interactions. It forms a π–π stacking interaction with residue Y652, anchoring one end of the molecule. Polar interactions are observed with residues A653 and S660. Additional hydrophobic contacts are observed with residues A652 and Y652.

      - In the AF ic3 open-state model, E-4031 adopts a slightly deeper pose within the central cavity stabilized by dual π–π stacking interactions between its aromatic rings and hERG residue Y652. Additional hydrogen bonds are observed with residues S624 and Y652, and hydrophobic contacts are observed with residues T623 and S624.

      - In the inactivated-state model, E-4031 adopts its deepest and most stabilized binding pose, consistent with its experimentally observed preference for this state. The ligand is stabilized by multiple π–π stacking interactions between its aromatic rings and hERG residues Y652 from opposing subunits. The sulfonamide nitrogen engages in hydrogen bonding with residue S649, while the piperidine nitrogen hydrogen bonds with residue Y652. Hydrophobic contacts with residues S624, Y652, and F656 further reinforce the binding, enclosing the ligand in a densely packed aromatic and polar environment.

      - A previous mutagenesis study showed that mutations involving hERG residues F557, T623, S624, Y652, and F656 affect E-4031 binding (DOI: 10.3390/ph16091204).

      - In the cryo-EM structure (PDB 8ZYP) (DOI: 10.1016/j.str.2024.08.021), E-4031 engages in a single π–π stacking interaction with hERG residue Y652, anchoring one end of the molecule. The remainder of the ligand is stabilized predominantly through hydrophobic contacts involving residues S621, L622, T623, S624, M645, G648, S649, and additional Y652 side chains, forming a largely nonpolar environment around the binding pocket.

      In both cryo-EM structures, astemizole and E-4031 adopt binding poses that closely resemble the inactivated-state model in our docking study, consistent with experimental evidence that these drugs preferentially bind to the inactivated state (DOI: 10.1124/mol.108.049056). This raises the possibility that the cryo-EM structures may capture an inactivated-like channel state. However, closer examination of the SF reveals that the cryo-EM conformations more closely resemble the open-state PDB 5VA2 structure (DOI: 10.1016/j.cell.2017.03.048), which has been shown to be conductive here and in previous studies (DOI: 10.1073/pnas.1909196117, 10.1161/CIRCRESAHA.119.316404).

      The conformational differences between the cryo-EM and open-state docking results may reflect limitations of the docking protocol itself, as GALigandDock assumes a rigid protein backbone and cannot account for large ligand-induced conformational changes. In our open-state models, the hydrophobic pocket beneath the selectivity filter is too small to accommodate bulky ligands (Figure 3a, b), whereas the cryo-EM structures show a slight outward shift in the S6 helix that expands this space (Figure S13). These allosteric rearrangements, though small, fall outside the scope of the current docking protocol, which lacks the flexibility to capture such local, ligand-induced adjustments (DOI: 10.3389/fphar.2024.1411428).

      In contrast, docking to the AlphaFold-predicted inactivated-state model reveals a reorganization beneath the selectivity filter that creates a larger cavity, allowing deeper ligand insertion. Notably, neither our inactivated-state docking nor the available cryo-EM structures show strong interactions with F656 residues. However, in the AlphaFold-predicted inactivated model, the more extensive protrusion of F656 into the central cavity may further occlude the drug’s egress pathway, potentially trapping the ligand more effectively. This could explain why mutation of F656 significantly reduces the binding affinity of E-4031 (DOI: 10.3390/ph16091204). These findings suggest that inactivation may trigger a series of modular structural rearrangements that influence drug access and binding affinity, with different aspects potentially captured in various computational and experimental studies, rather than resulting from a single, uniform conformational change.

      Discussion of the original Wang and MacKinnon finding (DOI: 10.1016/j.cell.2017.03.048) regarding C-type inactivation, the pore mutation S631A, and F627 rearrangement is likely warranted. Since hERG inactivation is present at 0 mV in WT channels (the likely voltage for the cryo-EM study), please discuss how this might affect the interpretation of using this structure as a template for the models presented here, perhaps as part of Figure S1.

      We sincerely thank the reviewer for bringing up the insightful findings from Wang and MacKinnon regarding hERG C-type inactivation as well as the voltage context of their cryo-EM structure (PDB 5VA2). We recognize that WT hERG exhibits inactivation at 0 mV, likely the condition of the cryo-EM study, raising the possibility that PDB 5VA2, while classified as an open state, might subtly reflect features of inactivation. Notably, PDB 5VA2 has been widely adopted in numerous studies and consistently found to represent a conducting state, such as in Yang et al. (DOI: 10.1161/CIRCRESAHA.119.316404) and Miranda et al. (DOI: 10.1073/pnas.1909196117). Our MD simulations further support this, showing K<sup>+</sup> conduction in the 5VA2-based open-state model (Figure 4a, c), consistent with its selectivity filter conformation (Figure S1a). Although we used PDB 5VA2 as a starting template for predicting inactivated and closed states, our AlphaFold2 predictions did not rigidly adhere to this structure, as evidenced by distinct differences in hydrogen bond networks, drug binding affinities, pore radii, and ion conductivity between our state-specific hERG channel models (Figures S2, 5, 3b, 4). Nevertheless, this does not preclude the possibility that potential inactivated-like traits of PDB 5VA2 at 0 mV could subtly influence our predictions elsewhere in the model, which warrants further exploration in future studies. In our revised analysis, we also tested an alternative AlphaFold-predicted conformation, referred to as Open (AlphaFold cluster 3), which, while sharing some similarities with PDB 5VA2, exhibits subtle differences in the selectivity filter and pore conformations. This structure was also found to conduct ions and showed a drug binding profile similar to that of the PDB 5VA2-based open-state model. We greatly appreciate this feedback, which helped us refine and strengthen our analysis.

      Page 8, the significance of 750 and 500 mV in terms of physiological role?

      We appreciate this opportunity to clarify the methodological rationale. Although these voltages significantly exceed typical physiological membrane potentials, their use in MD simulations is a well-established practice to accelerate ion conduction events. This approach helps overcome the inherent timescale limitations of conventional MD simulations, as demonstrated in previous studies of hERG and other ion channels. For instance, Miranda et al. (DOI: 10.1073/pnas.1909196117), Lau et al. (DOI: 10.1038/s41467-024-51208-w), and Yang et al. (DOI: 10.1161/CIRCRESAHA.119.316404) applied similarly high voltages (500–750 mV) to study hERG K<sup>+</sup> conduction. Because the hERG single-channel conductance is notably small under physiological conditions (~2 pS; DOI: 10.1161/01.CIR.94.10.2572), amplification is necessary to observe meaningful permeation within nanosecond-to-microsecond timescales. Likewise, studies of other K<sup>+</sup> channels, such as Woltz et al. (DOI: 10.1073/pnas.2318900121) on the small-conductance calcium-activated K<sup>+</sup> channel SK2 and Wood et al. (DOI: 10.1021/acs.jpcb.6b12639) on the Shaker K<sup>+</sup> channel, have used elevated voltages (250–750 mV) to probe ion conduction mechanisms via MD simulations. In addition, the typical timescale of these simulations (1 μs) is too short to capture major structural effects, such as those leading to inactivation or deactivation, which occur over milliseconds under physiological conditions.
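      The need for amplified voltages can be checked with back-of-the-envelope arithmetic. The sketch below assumes ohmic behavior (I = gV), symmetric ionic conditions so that the applied voltage is the full driving force, and the ~2 pS conductance quoted above. It also shows the uniform field, E = V / L<sub>z</sub>, commonly used to impose a transmembrane potential across a periodic simulation box of height L<sub>z</sub>; the 10 nm box height is a hypothetical value. Real hERG conduction is not strictly ohmic, so these are order-of-magnitude estimates only.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs


def field_v_per_nm(voltage_mv: float, box_z_nm: float) -> float:
    """Uniform field (V/nm) that imposes voltage_mv across a box of height box_z_nm."""
    return (voltage_mv * 1e-3) / box_z_nm


def ions_per_microsecond(conductance_ps: float, voltage_mv: float) -> float:
    """Expected monovalent ion crossings per microsecond, assuming I = g * V."""
    current_a = (conductance_ps * 1e-12) * (voltage_mv * 1e-3)
    return current_a / ELEMENTARY_CHARGE_C * 1e-6


for v in (100.0, 500.0, 750.0):
    print(v, round(ions_per_microsecond(2.0, v), 2))
print(field_v_per_nm(750.0, 10.0))  # hypothetical 10 nm box
```

      Under these assumptions, a 1 μs trajectory at 100 mV would be expected to show only about one permeation event, versus roughly six to nine at 500 to 750 mV, which is why elevated voltages are used to obtain statistically meaningful conduction counts.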

      The abstract could be edited a bit to more clearly state the novel findings in this study.

      We thank the reviewer for their suggestion. We have revised the abstract to read: “To design safe, selective, and effective new therapies, there must be a deep understanding of the structure and function of the drug target. One of the most difficult problems to solve has been resolution of discrete conformational states of transmembrane ion channel proteins. An example is K<sub>V</sub>11.1 (hERG), which carries the primary cardiac repolarizing current, I<sub>Kr</sub>. hERG is a notorious drug antitarget against which all promising drugs are screened to determine potential for arrhythmia. Drug interactions with the hERG inactivated state are linked to elevated arrhythmia risk, and drugs may become trapped during channel closure. While prior studies have applied AlphaFold to predict alternative protein conformations, we show that the inclusion of carefully chosen structural templates can guide these predictions toward distinct functional states. This targeted modeling approach is validated through comparisons with experimental data, including proposed state-dependent structural features, drug interactions from molecular docking, and ion conduction properties from molecular dynamics simulations. Remarkably, AlphaFold not only predicts inactivation mechanisms of the hERG channel that prevent ion conduction but also uncovers novel molecular features explaining enhanced drug binding observed during inactivation, offering a deeper understanding of hERG channel function and pharmacology. Furthermore, leveraging AlphaFold-derived states enhances computational screening by significantly improving agreement with experimental drug affinities, an important advance for hERG as a key drug safety target where traditional single-state models miss critical state-dependent effects. By mapping protein residue interaction networks across closed, open, and inactivated states, we identified critical residues driving state transitions validated by prior mutagenesis studies.
This innovative methodology sets a new benchmark for integrating deep learning-based protein structure prediction with experimental validation. It also offers a broadly applicable approach using AlphaFold to predict discrete protein conformations, reconcile disparate data, and uncover novel structure-function relationships, ultimately advancing drug safety screening and enabling the design of safer therapeutics.”

      Many of the Supplemental figures would fit in better in the main text, if possible, in my opinion. For instance, the network analysis (Fig. S2) appears to be novel and is mentioned in the abstract so may fit better in the main text. The discussion section could be focused a bit more, perhaps with headers to highlight the key points.

      We agree with the reviewer and have made the suggested changes. We moved Figure S2 into the main text as a new figure.

      Additionally, we revised the Discussion section to improve focus and clarity.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      In this study, Hama et al. explored the molecular regulatory mechanisms underlying the formation of the ULK1 complex. By employing the AlphaFold structural prediction tool, they showed notable differences in the complex formation mechanisms between ULK1 in mammalian cells and Atg1 in yeast cells. Their findings revealed that in mammalian cells, ULK1, ATG13, and FIP200 form a complex with a stoichiometry of 1:1:2. These predicted interaction regions were validated through both in vivo and in vitro assays, enhancing our understanding of the molecular mechanisms governing ULK1 complex formation in mammalian cells. Importantly, they identified a direct interaction between ULK1 and FIP200, which is crucial for autophagy. However, some aspects of this manuscript require further clarification, validation, and correction by the authors.

      Thank you for your thorough evaluation of our manuscript. We have carefully revised the manuscript to address your concerns by performing extra experiments and providing additional clarifications, validations, and corrections as written below.

      Reviewer #2 (Public review):

      Summary:

      This is important work that helps to uncover how the process of autophagy is initiated, via structural analyses of the initiating ULK1 complex. High-resolution structural details of this complex, and mechanistic insight into it, have been lacking, and understanding how it assembles and functions is a major goal of a field that impacts many aspects of cell and disease biology. While we know components of the ULK1 complex are essential for autophagy, how they physically interact is far from clear. The work presented makes use of AlphaFold2 to structurally predict interaction sites between the different subunits of the ULK1 complex (namely ULK1, ATG13, and FIP200). Importantly, the authors go on to experimentally validate that these predicted sites are critical for complex formation by using site-directed mutagenesis and then go on to show that the three-way interaction between these components is necessary to induce autophagy in cells.

      Strengths:

      The data are very clear. Each binding interface of ATG13 (ATG13 with FIP200/ATG13 with ULK1) is confirmed biochemically with ITC and IP experiments from cells. Likewise, IP experiments with ULK1 and FIP200 also validate interaction domains. A real strength of the work is in their analyses of the consequences of disrupting ATG13's interactions in cells. The authors introduce the binding-interface point mutations by CRISPR KI. This is not a trivial task and is the best approach as everything is monitored under endogenous conditions. Using these cells the authors show that ATG13's ability to interact with both ULK1 and FIP200 is essential for a full autophagy response.

      Thank you for your thoughtful review and for highlighting the importance of our approach.

      Weaknesses:

      I think a main weakness here is the failure to acknowledge and compare results with an earlier preprint that shows essentially the same thing (https://doi.org/10.1101/2023.06.01.543278). Arguably this earlier work is much stronger from a structural point of view as it relies not only on AlphaFold2 but also on actual experimental structural determinations (and takes the mechanisms of autophagy activation further by providing evidence for a super complex between the ULK1 and VPS34 complexes). That is not to say that this work is not important, as at the least it independently helps to build a consensus for ULK1 complex structure. Another weakness is that the downstream "functional" consequences of disrupting the ULK1 complex are only minimally addressed. The authors perform a Halotag-LC3 autophagy assay, which essentially monitors the endpoint of the process. There are a lot of steps in between, knowledge of which could help with mechanistic understanding. Not least is the kinase activity of ULK1: how is this altered by disrupting its interactions with ATG13 and/or FIP200?

      Thank you for this valuable feedback. In response, we performed a detailed structural comparison between the cryo-EM structure reported in the referenced preprint and our AlphaFold-based model. We have summarized both the similarities and differences in newly included figures (revised Figure 2A, B, 3B, S1F) and provided an in-depth discussion in the main text. Furthermore, to address the downstream consequences of ULK1 complex disruption, we have investigated the impact on ULK1 kinase activity, specifically examining how mutations affecting ATG13 or FIP200 interaction alter ULK1's phosphorylation of a key substrate, ATG14. In addition, we analyzed the effect on ATG9 vesicle recruitment. We provide the corresponding data as Figure S3C-E and detailed discussions in the revised manuscript.

      Reviewer #3 (Public review):

      In this study, the authors employed the protein complex structure prediction tool AlphaFold-Multimer to obtain a predicted structure of the protein complex composed of ULK1-ATG13-FIP200 and validated the structure using mutational analysis. This complex plays a central role in the initiation of autophagy in mammals. Previous attempts at resolving its structure have failed to obtain high-resolution structures that can reveal atomic details of the interactions within the complex. The results obtained in this study reveal extensive binary interactions between ULK1 and ATG13, between ULK1 and FIP200, and between ATG13 and FIP200, and pinpoint the critical residues at each interaction interface. Mutating these critical residues led to the loss of binary interactions. Interestingly, the authors showed that the ATG13-ULK1 interaction and the ATG13-FIP200 interaction are partially redundant for maintaining the complex.

      We are grateful for your positive evaluation of our work.

      The experimental data presented by the authors are of high quality and convincing. However, given the core importance of the AlphaFold-Multimer prediction for this study, I recommend the authors improve the presentation and documentation related to the prediction, including the following:

      (1) I suggest the authors consider depositing the predicted structure to a database (e.g. ModelArchive) so that it can be accessed by the readers.

      We have deposited the AlphaFold model to ModelArchive with the accession code ma-jz53c, which is indicated in the revised manuscript.

      (2) I suggest the authors provide more details on the prediction, including explaining why they chose to use the 1:1:2 stoichiometry for ULK1-ATG13-FIP200 and whether they have tried other stoichiometries, and explaining why they chose to use the specific fragments of the three proteins and whether they have used other fragments.

      We appreciate your suggestion. As we noted in the original manuscript, previous studies have shown that the C-terminal region of ULK1 and the C-terminal intrinsically disordered region of ATG13 bind to the N-terminal region of the FIP200 homodimer (Alers, Loffler et al., 2011; Ganley, Lam du et al., 2009; Hieke, Loffler et al., 2015; Hosokawa, Hara et al., 2009; Jung, Jun et al., 2009; Papinski and Kraft, 2016; Wallot-Hieke, Verma et al., 2018). We relied on these findings when determining the specific regions to include in our complex prediction and when selecting a 1:1:2 stoichiometry for ULK1–ATG13–FIP200, which was reported previously (Shi et al., 2020). We also used AlphaFold2 to predict the structures of the full-length ULK1–ATG13 complex and of the FIP200N dimer with full-length ATG13, confirming that our choice of regions was appropriate (revised Figure S1A-C). In the revised manuscript, we have provided a more detailed explanation of our rationale based on the previous reports and additional AlphaFold predictions.

      (3) I suggest the authors present the PAE plot generated by AlphaFold-Multimer in Figure S1. The PAE plot provides valuable information on the prediction.

      We provided the PAE plot in the revised Figure S1C.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 1D, the labels for the input and IP of ATG13-FLAG should be corrected to ATG13-FLAG FIP3A.

      We thank the reviewer for pointing out these labeling mistakes. We revised the labels based on the suggestions.

      (2) In the discussion section, the authors should address why ATG13-FLAG ULK1 2A in Fig. 2D leads to a significantly lower expression of ULK1 and provide possible explanations for this observation.

      ATG13 and ATG101, both core components of the ULK1 complex, are known to stabilize each other through their mutual interaction. Loss or reduction of one protein typically leads to the destabilization of the other. In this context, ULK1 is similarly stabilized by binding to ATG13. Therefore, the ATG13-FLAG ULK2A mutant, which has reduced binding to ULK1, likely loses this stabilizing activity, so that ULK1 becomes destabilized, resulting in lower ULK1 expression levels. We have added this discussion to the revised manuscript.

      (3) In Figure 4B, the authors should explain why Atg13-FLAG KI significantly affects the expression of endogenous ULK1. Could Atg13-FLAG KI be interfering with its binding to ULK1? Experimental evidence should be provided to support this. Additionally, does Atg13-FLAG KI affect autophagy? Wild-type HeLa cells should be included as a control in Figure 4C and 4D to address this question.

      Thank you for your constructive suggestion. We found a technical error in the ULK1 blot of Figure 4B. Therefore, we repeated the experiment. The results show that ULK1 expression did not significantly change in the ATG13-FLAG KI. These findings are consistent with Figure S3A. We have replaced Figure 4B with this new data.

      We agree that including wild-type HeLa cells as a control is essential to determine whether ATG13-FLAG KI affects autophagy. We performed the same experiments in wild-type HeLa cells and found that ATG13-FLAG KI does not significantly impact autophagic flux. Accordingly, we have replaced Figures 4D and 4E with these new data.

      (4) In Figure 3C, the authors used an in vitro GST pulldown assay to detect a direct interaction between ULK1 and FIP200, which was also confirmed in Figure 3E. However, since FLAG-ULK1 FIP2A affects its binding with ATG13 (Fig. 3E), it is possible that ULK1 FIP2A inhibits autophagy by disrupting this interaction. The authors should therefore use an in vitro GST pulldown assay to determine whether GST-ULK1 FIP2A affects its binding with ATG13. Additionally, the authors should investigate whether the interaction between ULK1 and FIP200 in cells requires the involvement of ATG13 by using ATG13 knockout cells to confirm if the ULK1-FIP200 interaction is affected in the absence of ATG13.

      Thank you for the valuable suggestion. We examined the effect of the FIP2A mutation on the ULK1–ATG13 interaction using isothermal titration calorimetry (ITC) to obtain quantitative binding data. The results showed that the FIP2A mutation does not markedly alter the affinity between ULK1 and ATG13 (revised Figure S2B), suggesting that FIP2A mainly weakens the ULK1–FIP200 interaction. Regarding experiments in ATG13 knockout cells, ULK1 becomes destabilized in the absence of ATG13, making it technically difficult to assess how the ULK1–FIP200 interaction is affected under those conditions.

      Reviewer #2 (Recommendations for the authors):

      I feel the manuscript would benefit from a more detailed comparison with the Hurley lab paper - are the structural binding interfaces the same, or are there differences?

      We appreciate the suggestion to compare our results more closely with the work from the Hurley lab. We performed a detailed structural comparison between the cryo-EM structure reported in the referenced preprint and our AlphaFold-based model (revised Figure 2A, B, 3B, S1F) and provided an in-depth discussion in the main text.

      As mentioned, what happens downstream of disrupting the ULK1 complex? How is ULK1 activity changed, both in vitro and in cells? Does disruption of the ULK1 complex binding sites impair VPS34 activity in cells (for example by looking at PtdIns3P levels/staining)?

      Thank you for your insightful comments. We focused on elucidating how disrupting the ULK1 complex leads to impaired autophagy. To assess ULK1 activity, we measured ULK1-dependent phosphorylation of ATG14 at Ser29 (PMID: 27046250; PMID: 27938392). In FIP3A and FU5A knock-in cells, ATG14 phosphorylation was significantly reduced, indicating decreased ULK1 activity (revised Figure S3D, E). This observation is consistent with previous work showing that FIP200 recruits the PI3K complex. Notably, in ATG13 knockout cells, ATG14 phosphorylation became almost undetectable, though the underlying mechanism remains to be fully investigated. Altogether, these data point to reduced ULK1 activity as a key factor explaining the autophagy deficiency observed in FU5A knock-in cells.

      We also explored possible downstream mechanisms. One well-established function of ATG13 is to recruit ATG9 vesicles (PMID: 36791199). These vesicles serve as an upstream platform for the PI3K complex, providing the substrate for phosphoinositide generation (PMID: 38342428). To clarify how our mutations impact this step, we starved ATG13-FLAG knock-in cells and observed ATG9 localization. Unexpectedly, even in FU5A knock-in cells where ATG13 is almost completely dissociated from the ULK1 complex, ATG9A still colocalized with FIP200 (revised Figure S3C). These puncta also overlapped with p62, likely because p62 bodies recruit both FIP200 and ATG9 vesicles. Although we suspect that ATG9 recruitment is nonetheless impaired under these conditions, we were unable to definitively demonstrate this experimentally and consider it an important avenue for future study.

      Reviewer #3 (Recommendations for the authors):

      Here are some additional minor suggestions:

      (1) The UBL domains are only mentioned in the abstract but not anywhere else in the manuscript. I suggest the authors add descriptions related to the UBL domains in the Results section.

      We thank the reviewer for pointing out the lack of description of the UBL domains, which we have now added to the Results section of the revised manuscript.

      (2) The authors may want to consider adding a diagram in Figure 1A to show the domain organization of the three full-length proteins and the ranges of the three fragments in the predicted structure.

      We have added the suggested diagram as the new Figure 1A.

      (3) I suggest the authors consider highlighting in Figure 1A the positions of the binding sites shown in Figure 1B, for example, by adding arrows in Figure 1A.

      We have added arrows in the revised Figure 1B (which was Figure 1A in the original submission).

      (4) In Figure 1D, "Atg13-FLAG" should be "Atg13-FLAG FIP3A".

      We have revised the labeling in Figure 1D.

      (5) "the binding of ATG13 and ULK1 to the FIP200 dimer one by one" may need to be re-phrased. "One by one" conveys a meaning of "sequential", which is probably not what the authors meant to say.

      We have revised the sentence as “the binding of one molecule each of ATG13 and ULK1 to the FIP200 dimer”.

      (6) In "Wide interactions were predicted between the four molecules", I suggest changing "wide" to "extensive".

      We have changed “wide” to “extensive” in the revised manuscript.

      (7) In "which revealed that the tandem two microtubule-interacting and transport (MIT) domains in Atg1 bind to the tandem two MIT interacting motifs (MIMs) of ATG13", I suggest changing the two occurrences of "tandem two" to "two tandem" or simply "tandem".

      We simply used "tandem" in the revised manuscript.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      General Statements

      We sincerely thank all three reviewers for their thoughtful and constructive feedback. Your comments were invaluable in improving the clarity and quality of our work.

      In this study, we revisit a previously overlooked lipophilic dye, R18, and demonstrate its utility for live-cell imaging: it is transported via a non-vesicular pathway and labels autophagy-related structures. Against the backdrop of increasing attention to membrane contact sites (MCSs), bridge-like lipid transfer proteins (BLTPs), and organelle biogenesis, we propose that phospholipid transfer, which is generally assumed to be one-way, can be reversible in living cells.

      As Reviewer #1 noted, recent cryo-EM studies (e.g., Oikawa et al.) have highlighted the importance of lipids in autophagosome formation, and several in vitro studies have also addressed this question. However, we believe it is important to consider whether findings from simplified in vitro reconstitution systems hold in the complex environment of living cells. In addition, to our knowledge, no studies have directly tracked lipid flow dynamics over time in living cells. We believe our work helps fill this gap by combining three technical approaches: (a) R18 as a lipid-tracing dye, (b) FRAP analysis of the isolation membrane, and (c) Ape1 overexpression to stall autophagosome closure, which together enabled us to visualize reversible lipid flow in vivo. While these techniques may not appear "fancy," we hope they offer new insights that can inspire further exploration of lipid dynamics in the cellular environment.

      We appreciate Reviewer #2's comments on the high quality of our imaging and Reviewer #3's recognition of our approach as an elegant way to study lipid transfer. We have revised the manuscript accordingly and included additional explanations, figure clarifications, and planned experiments to address remaining concerns.

      Because two key concerns were raised by all reviewers, we would like to address them here:

      1. Regarding the concern that the evidence for reversible lipid transfer from the IM to the ER is not sufficiently strong:

      We are deeply grateful to Reviewer #2 for the insightful suggestion to compare the fluorescence recovery of the adjacent bleached ER with that of the ER-IM MCS, in order to exclude the possibility that recovery at the ER-IM MCS originates from nearby ER rather than from the IM. Following this suggestion, we performed a quantitative analysis using unbleached ER as the background reference. In every sample, the adjacent bleached ER consistently showed significantly lower fluorescence recovery than the ER-IM MCS. When we instead used the IM as the reference for normalization, the difference became even more pronounced, further supporting the idea that the adjacent ER cannot account for the recovery signal at the ER-IM MCS. These findings strengthen our conclusion that phospholipid recovery at the MCS derives, at least in part, from the IM. The updated analysis and corresponding figure panels (Figure 5K, 5L, and 5M), along with the relevant text (lines 384-396), have been revised accordingly.

      2. Regarding the concern that the evidence for R18 transfer via Atg2 as a bridge-like lipid transfer protein is not sufficiently direct:

      In addition to the evidence presented in this manuscript, we have now cited our parallel study currently under revision (Sakai et al., bioRxiv 2025.05.24.655882v1), where we provide direct evidence that Atg2 indeed functions as a bridge-like lipid transfer protein, rather than a shuttle. Importantly, we also show in that study that R18 transfer requires the bridge-like structure of Atg2. This new reference has been cited in the revised manuscript, and relevant textual explanations have been added to provide further support.

      We hope that the revisions and our revision plan address the reviewers' key concerns. Please find our detailed point-by-point responses below.

      Response to Reviewer #1

      In their study, Hao and colleagues exploited the fluorescent fatty acid R18 to follow phospholipid (PL) transfer in vivo from the endoplasmic reticulum to the IM during autophagosome formation. Although the results are interesting, especially the retrograde transport of PLs, based on the provided data, additional control experiments are needed to firmly support the conclusions.

      We sincerely thank the reviewer for the positive assessment and agree that additional controls are necessary to support our conclusion. Detailed responses and corresponding revisions are provided below.

      An additional point is that the authors also study the internalization of R18 into cells and found a role of lipid flippases and oxysterol binding proteins. While this information could be useful for researchers using this dye, these analyses/findings have no specific connection with the topic of the manuscript, i.e. the PL transfer during autophagosome formation. Therefore, they must be removed.

      We thank the reviewer for the thoughtful comment. We understand the concern that the R18 internalization analysis may appear peripheral to the manuscript's main focus on phospholipid transfer during autophagosome formation. However, we respectfully believe that this section is critical for establishing the mechanistic basis of our approach, as this study represents the first detailed in vivo application of R18 for tracing lipid dynamics. Notably, R18 entry is not due to passive chemical diffusion or non-specific adsorption but occurs through a biologically regulated, non-vesicular lipid transport pathway. This mechanistic context underpins the reliability of using R18 to monitor ER-to-IM lipid transport in the autophagy pathway.

      To improve clarity and coherence, we have added explanatory text in the Introduction and at the start of the Results section to explicitly link the internalization assay to the subsequent autophagy-related experiments (lines 94-98 and 185-187). We hope this helps guide the reader through the rationale and relevance of this part of the study.

      Major points:

      1) In general, the quality of the microscopy images is quite poor, and this makes it difficult to assess some of the authors' conclusions.

      We thank the reviewer for the feedback. To better address this concern, we would appreciate clarification regarding which specific images or figure panels were found to be of low quality. Overall, we believe the microscopy data presented are of sufficient resolution and clarity to support our main conclusions, as also noted by Reviewer #2 ("the high-quality images and FRAP experiments").

      We acknowledge that certain phenomena, such as occasional R18 labeling of the vacuole, were not clearly explained in the original manuscript. We have now included additional clarification in the Results section and mentioned this limitation in the Discussion (lines 170-171, 436-438), along with a note on ongoing experiments to further investigate this point.

      2) It would be important to perform some lipidomics analysis to determine into which PLs and other lipids or lipid intermediates R18 is incorporated. First, it will be important to know which are the major PL species labelled under the conditions of the experiments done in this study. Second, the authors assume that all the R18 is exclusively incorporated into PLs and this is what they follow in their in vivo experiments. What about acyl-CoA, which has been shown to be a key player in the IM elongation (Graef lab, Cell)?

      We thank the reviewer for raising this point. However, we believe this concern stems from a misunderstanding of the chemical nature of R18. R18 is not a free fatty acid analog and cannot be incorporated into phospholipids or acyl-CoA via metabolic pathways. Because of its chemical structure (a bulky rhodamine headgroup attached to a long alkyl chain), it cannot undergo enzymatic conjugation or incorporation into membrane lipids. This is why we did not pursue lipidomics analysis. Instead, we characterized the biological behavior of R18 through a range of live-cell assays, including its temperature and ATP dependency and the involvement of flippases, OSBP proteins, and Atg2, all of which support a regulated, non-vesicular lipid transport pathway. Additionally, the AF3 structural model presented in this study is consistent with this interpretation, showing no evidence of R18 forming chemical bonds with phospholipids.

      3) Figure 1A and 1B. The authors conclude that Atg2 is involved in the lipid transfer since R18 does not localize to the PAS/ARS in the atg2KO cells. However, another possible explanation is that in those cells the IM is not formed and does not expand, and consequently R18 is present in low amounts not detectable by fluorescence microscopy. To support their conclusion, the authors must assess PAS-labelling with R18 in cells lacking another ATG gene in which Atg2 is still recruited to the PAS.

      We thank the reviewer for this important suggestion. As noted, the absence of R18 at the PAS in atg2Δ cells may reflect a lack of membrane formation rather than impaired lipid transfer. However, in support of our interpretation, our previous work (Hirata E, Ohya Y, Suzuki K, 2017) has shown that R18 accumulates at PAS-like structures in delipidation mutants, where the IM fails to expand but Atg2 is still recruited (please refer to the attached revision plan for further details). This suggests that the presence of Atg2, rather than the mere existence of a mature IM, contributes to R18 localization.

      To address this, we revised our statement to the more cautious wording "R18 was undetectable at the PAS in atg2Δ cells" to avoid overinterpretation (lines 119-120).

      4) Figure 2. As written, the paragraph describing this figure seems to indicate that flippases are directly involved in the translocation of R18 from the PM to the ER. As correctly indicated by the authors, flippases flip PLs, not fatty acids. Moreover, PLs are not synthesized at the PM, and thus R18 is probably not flipped upon incorporation into PLs. As a result, the relevance of flippases in R18 internalization is probably indirect. This must be explained clearly to avoid confusion/misunderstandings.

      We thank the reviewer for this important clarification. We fully agree that flippases act on phospholipids, not fatty acids, and that R18 is not metabolically incorporated into phospholipids at the plasma membrane. However, our ongoing work (Rev. Figure 1) indicates that, in vivo, R18 preferentially associates with PS and PE (based on yeast phospholipid synthesis mutants), which is consistent with its flippase-dependent localization, as flippases are known to specifically flip PS and PE. Thus, although R18 itself is not enzymatically modified or incorporated into phospholipids, its membrane distribution may depend on the lipid environment and on the activity of lipid-translocating proteins.

      Preliminary data supporting this observation are included in the "Supplementary Figures for reviewer reference only" and are not part of the public submission.

      5) A couple of manuscripts have shown a (partial) role of Drs2 in autophagy. The authors must explain the discrepancy between their own results and what has been published, especially because they use the GFP-Atg8 processing assay, which is less sensitive than the Pho8delta60 assay used in the other studies.

      We thank the reviewer for raising this important point. We are aware of prior reports implicating Drs2 in autophagy and in fact discussed this work directly with the authors during the course of our experiments, who kindly provided helpful suggestions. While our GFP-Atg8 processing assay did not show significant defects upon Drs2 deletion, strain background differences may explain this discrepancy. We also appreciate the suggestion to use the Pho8Δ60 assay and plan to include it in future experiments.

      Additionally, the authors should check whether Atg2 and Atg18 are present at the IM-ER membrane contact sites at the same levels after nutrient replenishment as when cells are nitrogen-starved, since this complex would determine the lipid transfer dynamics at this membrane contact site.

      We thank the reviewer for the helpful suggestion. We plan to perform additional experiments to monitor Atg18 localization during the nutrient replenishment assay.

      6) The authors used a predicted Atg2 lipid-transfer mutant (Srinivasan et al, J Cell Biol, 2024) but provide no direct proof that this mutant is defective for this activity. As previously done for other Atg2/ATG2-related manuscripts (Osawa et al, Nat Struct Mol Biol, 2019; Valverde et al, J Cell Biol, 2019), this must be measured in vitro. Moreover, they do not show whether other known functions of Atg2 are unaffected when expressing this Atg2 mutant, e.g. formation of the IM-ER MCSs, Atg2 interaction with Atg9 and localization at the extremity of the IM...

      We thank the reviewer for raising this concern. The lipid-transfer-deficient Atg2 mutant used here is based on the same structural rationale as in our recent parallel study (Sakai et al., bioRxiv 2025; https://www.biorxiv.org/content/10.1101/2025.05.24.655882v1, currently under revision). In that study, we addressed whether Atg2 indeed functions as a bridge-like lipid transfer protein, and also used R18 to directly demonstrate the lipid transfer defect of this Atg2 mutant in vivo.

      We therefore believe that referencing this study provides mechanistic support for the use of this Atg2 mutant in the current manuscript. A citation and brief explanation have now been added to the revised text (lines 315-316 and 439-441). We also plan to perform the lipid transfer assay in vitro.

      7) The mNG-Atg8 signal is not recovered in the fluorescence recovery assays. Based on the observation that the R18 signal comes back after photobleaching, the authors suggest that the supply of Atg8 is not required for IM expansion. This idea is opposite to data showing that the levels of Atg8 and the deconjugation of lipidated Atg8 determine the size of the forming autophagosomes (e.g., Xie et al, Mol Biol Cell, 2008; Nair et al, Autophagy, 2012). Similar results have also been obtained in mammalian cells (Lazarou and Mizushima results in cells lacking components of the two ubiquitin-like conjugation systems). This discrepancy requires an explanation.

      We thank the reviewer for pointing out this imprecise interpretation, and we sincerely apologize for the confusion it may have caused. We fully agree that Atg8 is essential for the expansion of the isolation membrane (IM), as supported by previous studies. In our FRAP data, mNG-Atg8 showed gradual recovery at later time points, indicating that Atg8 can be replenished over time. R18 recovery likely appears much more rapid because of the inherently fast lipid transfer activity of Atg2, a bridge-like lipid transport protein. In contrast, Atg8 signal recovery may have been delayed for two reasons: (1) slower recruitment kinetics to the IM, and (2) partial depletion of the available mNG-Atg8 protein pool due to photobleaching during the experiment.

      We have revised the relevant paragraph in the manuscript (lines 326-330) to clarify these points and avoid potential misinterpretation.

      8) Although the authors claim that there is retrograde lipid transfer from the IM to the ER, it is quite difficult to draw this conclusion from the data, as they show a decrease in lipid flow dynamics rather than an inversion of the lipid flow per se. Can the authors exclude that ER microdomains form at the ERES in contact with the IM, and that what they measure is consequently a slow diffusion of R18-labeled lipid from other parts of the ER to these ERES?

      We appreciate the reviewer's insightful comment. Indeed, we are also considering the possibility that lipid-enriched microdomains may form in the ER and contribute to complex lipid dynamics at contact sites. However, direct visualization of such domains in cells remains technically challenging, and this is one of the important directions we aim to pursue in future studies. While our current data do not allow us to state definitively that all recovered lipids originate from the IM, our FRAP experiments provide indirect yet strong support for the possibility that at least a substantial portion of the recovered lipid signal in the ER derives from the IM. Moreover, following Reviewer 2's major point No. 4, we performed a direct comparison of R18 fluorescence recovery between the photobleached ER-IM MCS region and the adjacent bleached ER region (Figure 5K and 5M). Interestingly, each sample consistently showed lower fluorescence recovery in the adjacent bleached ER near the ER-IM MCS (mean = 0.20) than in the ER-IM MCS region (mean = 0.28). To further validate this observation, we also used the IM as a background reference for normalization. This analysis revealed a more pronounced difference, with the adjacent bleached ER near the ER-IM MCS showing lower recovery (mean = 0.47) than the ER-IM MCS (mean = 0.80).

      As Reviewer 2 pointed out, these results support our reversible lipid transfer model by demonstrating that fluorescence recovery at the ER-IM MCS is due to signal coming from the IM, rather than from the adjacent bleached ER, which recovers more slowly and less efficiently. We have incorporated this new analysis into Figure 5 and revised the figure legend and main text accordingly (lines 384-396).

      9) The retrograde PL transfer is studied in cells overexpressing Ape1, in which IM elongation is stalled. This is a non-physiological experimental setup and consequently it is unclear whether what observed applies to normal IM/autophagosomes. This event should be shown to occur in WT cells as well.

      We thank the reviewer for this point. Indeed, it remains technically difficult to visualize lipid flow during normal IM expansion in vivo, as this process is rapid and transient. To date, no reports have directly addressed lipid flow in this process.

      However, the Ape1 overexpression system provides a strategic advantage by temporally extending the IM elongation phase and spatially enlarging the IM, thus offering a unique opportunity to capture membrane behavior that would otherwise be transient and difficult to resolve. Importantly, this system arrests autophagosome closure, which we leveraged to investigate the potential reversibility of phospholipid transfer in a controlled and prolonged context. Without this system, it would be exceedingly difficult to examine the directionality of lipid flow in living cells.

      Furthermore, Ape1 overexpression has been widely used in previous high-impact autophagy studies. We emphasize that our aim is to understand Atg2-mediated lipid transfer, and in this context, the Ape1 system provides a valuable and informative tool without compromising the validity of our conclusions.

      10) From the images provided, it appears that R18 also labels the vacuole. The vacuole forms MCSs with the IM. Can the authors exclude passage of R18 from the vacuole to the IM?

      We thank the reviewer for the insightful comment. Our data suggest that R18 traffics from the plasma membrane to the ER and then to autophagy-related structures. Subsequently, autophagosomes eventually reach and fuse with the vacuole. This explains the occasional weak R18 signal at the vacuole membrane, particularly in late-stage cells. We have revised the figure and clarified this point in the text to avoid oversimplification of R18 localization (lines 169-171, 426-428).

      We also include results from our ongoing work (in preparation). R18 tends to accumulate in a dot-like compartment after prolonged rapamycin treatment and incubation (Rev. Figure 2), and the vacuolar labeling of R18 correlates with the degradation status of autophagosomes rather than with reverse lipid transport from the vacuole to the IM (Rev. Figure 2). Taken together, we believe that R18 transport from the vacuole back to the IM is unlikely.

      Preliminary data supporting this response are included in the "Supplementary Figures for reviewer reference only" and are not part of the public submission.

      Minor points:

      1) L66. One report has indicated that Vps13 may also play a role in the transfer of lipids from the ER to the IM (Graef lab, J. Cell Biol).

      Thank you for pointing this out. Their excellent work also suggested that the inherent lipid transfer activity of Atg2 is required for IM expansion. We have revised the sentence (lines 67-68, 312-314) and included the appropriate citation at these two places.

      2) L70. It should be noted that the IM is also called the phagophore.

      We have revised the sentence (lines 70-71). Thank you for pointing this out.

      3) L74. It is mentioned "Additionally, a hydrophobic cavity in the N-terminal region of Atg2 directly tethers Atg2 to the ER, particularly the ER exit site (ERES), which is considered a key hub for autophagosome biogenesis", but there is no experimental evidence supporting that Atg2 is involved in the tethering with the ERES.

      Thank you for pointing this out. We have removed the reference to the N-terminal region and revised the sentence accordingly (lines 79-81) to avoid overstatement.

      4) L90. The PAS must be listed among the ARS.

      We have revised the sentence (lines 97-98). Thank you for pointing this out.

      5) Upon deletion of ATG39 and ATG40, there is a pronounced reduction of mNG-Atg8 labelled with R18. This would suggest that these two ER-phagy receptors are required for the PL transfer from the ER to the IM, which is not the case as autophagy is mildly affected by the absence of them (e.g., Zhang et al, Autophagy, 2020).

      We thank the reviewer for the important comment and agree that Atg39 and Atg40 are not required for phospholipid transfer from the ER to the IM. We have revised the text (lines 155-157). We would appreciate it if the reviewer could provide the DOI or PubMed ID for this paper.

      6) The authors state that "no direct evidence has been found to confirm lipid transfer at the ER-IM MCS in living cells" (lines 282-283). However, a recent paper has shown that de novo-synthesized phosphatidylcholine is incorporated from the ER into autophagosomes and autophagic bodies (Orii et al, J Cell Biol, 2021). This reference should be mentioned in the manuscript.

      Thank you for your insightful reminder. This paper beautifully demonstrated the importance of de novo-synthesized phosphatidylcholine in autophagy using electron microscopy. We have now included its citation and a brief discussion in the revised manuscript (lines 74-76, 297-298). However, we respectfully note that direct observation of lipid transfer at the ER-IM MCS in living cells has not yet been achieved.

      7) In lines 252-253, the sentence "R18 transport from the PM to the ER was partially impaired in osh1Δ osh2Δ, osh6Δ osh7Δ, and oshΔ osh4-1 cells (Figure S3). These results suggest that Osh proteins participate in transferring R18 from the PM to the ER" does not reflect what is observed in Fig. S3. Moreover, the Emr lab has generated a tetra-deletion mutant in which the PM-ER MCSs are abolished. The authors could examine this mutant.

      We thank the reviewer for this helpful comment and sincerely apologize for the lack of clarity in our original description. Our conclusion was primarily based on the partial PM accumulation of R18 observed in some osh mutant strains shown in Figure S3, which motivated us to further investigate this pathway using the OSW-1 inhibitor. We have revised the corresponding text to improve the logic and clarity of this section.

      We appreciate the recommendation of the tether∆ mutant. Our preliminary tests indicate that R18 still properly labels the ER in tether∆ cells, suggesting that its localization is not due to passive diffusion at membrane contact sites, but rather involves specific transport mechanisms. As this is an initial observation, we plan to confirm the result and include it in a future revision.

      Reviewer #1 (Significance (Required)):

      General assessment: Strength: potential new system to monitor lipid flow. Limitations: indirect evidence; in the case of the retrograde transport of phospholipids, it could be an artefact of the experimental approach employed. Advance: limited, because this has in part already been shown in vitro; no new mechanisms uncovered. Audience: autophagy and membrane contact site fields.

      We sincerely thank the reviewer for the overall evaluation. We agree that our current system offers indirect but promising evidence for lipid transfer events at ER-IM contact sites in vivo. While Atg2-mediated lipid transport has been proposed in vitro, our study adds value by (1) establishing a live-cell imaging approach to monitor lipid flow in a non-vesicular transport pathway, (2) proposing a model of reversible one-way lipid transfer activity, and (3) addressing whether findings from simplified in vitro reconstitution accurately reflect the dynamics in the more complex cellular environment.

      We recognize the limitations of our current approach and plan to include additional analyses to more cautiously interpret the observed retrograde movement. Although we do not claim to identify a new mechanism, we believe our work provides an interesting framework to inspire future efforts aimed at directly probing lipid flow at membrane contact sites in vivo.

      We also sincerely appreciate the reviewer's recognition of the potential value of this system for the autophagy and membrane contact site communities.

      Response to Reviewer #2

      Non-vesicular lipid transfer plays an essential role in organelle biogenesis. Compared to vesicular lipid transfer, it is faster and more efficient at maintaining proper lipid levels in organelles. In this study, Hao et al. introduced the highly lipophilic dye octadecyl rhodamine B (R18), which specifically labels ER structures and autophagy-related structures in yeast and mammalian cells. They characterised its distinct entry into yeast cells via the lipid flippases Neo1 and Drs2 at the plasma membrane, rather than through the endocytic pathway. They then demonstrated that R18 trafficking from the plasma membrane to the ER depends on the "box-like" lipid transfer Osh proteins. They further examined the "bridge-like" lipid transfer protein Atg2, using R18 as a lipid probe to track lipid transfer from the ER to the isolation membrane (IM) during membrane expansion, and reversible lipid transfer from the IM to the ER-IM membrane contact sites (MCS) when autophagy is terminated by nutrient replenishment. The authors provide an interesting model of reversible directionality of Atg2 lipid transfer during autophagy induction and termination.

      We sincerely thank the reviewer for the thoughtful and constructive summary of our work. We are grateful for the recognition of the novelty of using R18 to visualize non-vesicular lipid transfer in vivo and for highlighting the conceptual contribution of our proposed model of reversible Atg2-mediated transport during autophagy.

      In response to the reviewer's valuable suggestions, we have revised key parts of the manuscript and prepared a detailed revision plan to address the specific concerns. We truly appreciate the reviewer's insights, which have been instrumental in improving the clarity of our study.

      Major points:

      1. Lines 299-309: The FRAP assays were interesting and well performed. The authors photobleached the R18 and Atg8 signals and found that R18 fluorescence recovered but Atg8 did not, which suggests that lipid transfer occurs between the ER and the IM and is faster than the Atg8 lipidation process during IM expansion. These results gave clear evidence that R18 can be transferred during IM expansion. The supply of Atg8 may not be trackable within this time frame, or the recovered amount of Atg8 may not be visualized due to the detection threshold of confocal microscopy. This does not imply that the supply of Atg8 to the IM is not required during IM expansion. This should be clarified.

      We thank the reviewer for this valuable comment and fully agree that Atg8 is essential for IM expansion. We apologize for any ambiguity that may have suggested otherwise.

      As pointed out, the lack of mNG-Atg8 recovery in our FRAP assay likely reflects the slower turnover of lipidated Atg8, limited observation time, and photobleaching of the existing protein pool. Notably, we observed a weak but gradual signal recovery at later time points, supporting this view. We have revised the relevant paragraph in the manuscript (lines 326-330) to clarify these points and avoid potential misinterpretation.

      Please clarify how the length of the IM is measured and determined in Figure 4H and Figure 5D.

      We thank the reviewer for the valuable comment. We have now clarified the method for quantifying IM length in the revised manuscript. Specifically, we modified the Statistical Analysis section of the Methods (lines 642-643).

      Lines 336-342: The description of the results should be clarified. Based on Figure 5H, the authors observed a significant decrease in the mNG-Atg8 signal during photobleaching of the R18 signal.

      We thank the reviewer for pointing out the ambiguity. We have now clarified the description in the revised manuscript. The sentence has been modified (lines 360-362) as follows: "To determine whether nutrient replenishment terminates autophagy, we selectively photobleached the R18 signal and monitored the R18 (photobleached) and mNG-Atg8 (without photobleaching) signals following nutrient replenishment."

      The authors photobleached ER-IM MCS and the ER region (boxed region in Figure 5J) and quantified fluorescence recovery, normalized to the IM region and an ER control. The ER control was taken from the other cell. It would be helpful to compare and analyse the fluorescence recovery of R18 in the bleached ER region near the ER-IM MCS to that in the ER-IM MCS. This would help to confirm the ER-IM MCS fluorescence recovery is due to signal coming from the IM.

      We sincerely thank the reviewer for this insightful suggestion. We have now performed the suggested comparison. Interestingly, each sample consistently showed lower fluorescence recovery in the adjacent bleached ER near the ER-IM MCS (mean = 0.20) than in the ER-IM MCS region (mean = 0.28). To further validate this observation, we also used the IM as a background reference for normalization. This analysis revealed a more pronounced difference, with the adjacent bleached ER near the ER-IM MCS showing lower recovery (mean = 0.47) than the ER-IM MCS (mean = 0.80).

      As the reviewer pointed out, these results support our reversible lipid transfer model by demonstrating that fluorescence recovery at the ER-IM MCS is due to signal coming from the IM, rather than from the adjacent bleached ER, which recovers more slowly and less efficiently. We have incorporated this new analysis into Figure 5 and revised the figure legend and main text accordingly (lines 384-396). Again, we appreciate this constructive and helpful suggestion.

      In Figure 5K, the autophagic structure or IM labelled by R18 seems to be maintained when the mNG-Atg8 signal decreases or dissociates from the IM. Could the authors comment on how they interpret the termination of the prolonged IM structure and IM shrinkage?

      We thank the reviewer for this insightful observation. Based on our live-cell imaging, we speculate that following the initial dissociation of Atg8, the IM undergoes a relatively slow disassembly process, potentially retracting toward the ER-IM MCS, which often localizes near ER exit sites (ERES). This suggests that IM shrinkage may proceed via Atg8-independent mechanisms. Although the precise pathway remains unclear, we occasionally observed vesiculation events during this phase, supporting the idea that membrane remodeling continues even in the absence of Atg8. In response to this comment, we have revised our manuscript to reflect these interpretations (lines 494-496).

      The authors have shown that the Atg2Δ and Atg2LT lipid transfer mutants impair R18 labelling of autophagic structures in Figure 4C. However, the evidence that R18 fluorescence recovery at the ER-IM MCS is mediated by reversible Atg2 lipid transfer is not direct. It would be helpful to clarify whether Atg2 stays on the enlarged autophagic membranes once the membrane has reached its maximum length and no longer grows.

      We thank the reviewer for this important suggestion. As noted in our response to Reviewer 1 (Major Point 8-2), clarifying whether Atg2/Atg18 remains at the ER-IM contact sites after IM expansion is indeed important for supporting the reversible lipid transfer model. We plan to monitor the localization of Atg18 during the nutrient replenishment assay.

      Minor points:

      1. Figure 2A "Dpm-GFP" is missing. The experiment replicates in Figure 2M should be indicated.

      We thank the reviewer for pointing out these issues. The label for "Dpm-GFP" has been added in Figure 2A, and the number of experimental replicates for Figure 2M is now indicated in the figure legend.

      Figure S2, the magenta panel should be "R18".

      We thank the reviewer for catching this labeling error. We have corrected the magenta panel label in Figure S2 to "R18" in the revised version of the figure.

      Line 341-342: "Figure 5H and 5J" should be "Figure 5H and 5I"

      We thank the reviewer for pointing out this error. The citation has been corrected from "Figure 5H and 5J" to "Figure 5H and 5I" in the revised manuscript.

      Please describe how the lipid docking model of Atg2 is generated.

      We thank the reviewer for this question. We have added a description of the modeling approach in the Methods section of the revised manuscript (lines 640-646). We also added the configuration files of AlphaFold3 to the supplementary information.

      Reviewer #2 (Significance (Required)):

      Currently, lipid probes are emerging as powerful tools to understand membrane dynamics, integrity, and the lipid-mediated cellular functions. In this manuscript, the authors performed a detailed characterisation of octadecyl rhodamine B (R18) as a potential lipid probe, which specifically labels ER and autophagic membranes. They present high quality imaging data and performed FRAP experiments to monitor the membrane dynamics and investigate the lipid transfer directionality between the ER and autophagic structure. However, the evidence of Atg2-mediated reversible lipid transfer may not be direct and sufficient. The proposed reversible lipid transfer model is interesting and provides an explanation of lipid level regulation during autophagosome formation.

      We sincerely thank the reviewer for the positive assessment of our work and for acknowledging the potential of R18 as a lipid probe, as well as the quality of our imaging and FRAP experiments. We are particularly grateful that the reviewer found the proposed model of reversible lipid transfer both interesting and relevant to the broader question of lipid regulation during autophagosome formation.

      Regarding the reviewer's concern that the evidence for Atg2-mediated reversible lipid transfer may not be sufficiently direct, we agree this is a critical point. While technical limitations currently prevent direct visualization of lipid flow reversal at single-molecule resolution in vivo, we hope our revision plan will strengthen the proposed model and better convey its biological relevance, while also acknowledging the current limitations and the need for further mechanistic work.

      Response to Reviewer #3

      The authors address the question of how autophagic membrane seeds expand into autophagosomes. After nucleation, IMs expand in dependence of the bridge-like lipid transfer protein Atg2, which has been shown to tether the IM to the ER. Several studies have shown in vitro evidence for direct lipid transfer by Atg2 between tethered membranes, and previous evidence has shown that the hydrophobic groove of Atg2 implicated in lipid transfer is required for autophagosome biogenesis in vivo in yeast and mammalian cells.

      In this manuscript, the authors take advantage of the dye R18, which they show accumulates mainly in the ER after a few minutes. They show specifically that the import of R18 into cells and its transfer to the ER depend on the activity of flippases in the plasma membrane and OSBP-related lipid transporters. Using different sets of FRAP experiments, the authors track the fluorescence recovery of R18 in the IM, the IM-ER membrane contact site and the neighboring ER. From these experiments the authors conclude that R18 (a) is transferred to IMs from the ER when IMs expand and (b) can be transferred from IMs back to the ER when autophagy is deactivated.

      The use of a lipophilic dye to monitor lipid dynamics during IM expansion or dissolution is an elegant way to probe the mechanisms of lipid transfer across ER-IM contact sites. Quantitative in vivo data is critically needed to address this fundamental question in autophagy and contact site biology. However, the study remains limited in providing direct evidence that it is indeed the lipid transfer activity of Atg2, which underlies the R18 dynamics in IMs in vivo.

      We sincerely thank the reviewer for this thoughtful and encouraging summary. We appreciate the recognition of our approach using R18 to visualize lipid dynamics at ER-IM contact sites, and agree that in vivo quantitative data are critically needed to advance our understanding of autophagic membrane expansion.

      We also fully agree with the reviewer that our current study provides indirect, but conceptually informative, support for Atg2-mediated reversible one-way lipid transfer. While prior in vitro studies have demonstrated the lipid transfer capability of Atg2, our goal here was to develop a live-cell system that allows dynamic tracking of lipid flow in vivo, and to explore the possibility of reversible transport during autophagy termination. We hope our study will offer unique insights for future studies aiming to directly probe lipid transfer mechanisms in live cells.

      Regarding the reviewer's concern about the lack of direct evidence that Atg2's lipid transfer activity underlies the observed R18 dynamics, we fully acknowledge this limitation. To address this point, we would like to cite our parallel study currently under revision (Sakai et al., bioRxiv 2025.05.24.655882v1), which provides additional mechanistic evidence linking R18 dynamics to the lipid transfer function of Atg2. Further details and planned revisions are described in the responses below.

      Major points:

      (1) The authors use R18 in FRAP experiments to follow its transfer from the ER into IMs. However, evidence that this transfer is mediated by Atg2 via its inherent lipid transfer activity remains indirect. The only evidence that implicates Atg2 directly is the observation that a lipid transfer-deficient Atg2 variant fails to support IM expansion and autophagosome biogenesis. A similar full-length Atg2 mutant has previously been shown to block autophagosome formation in yeast (Dabrowski et al. 2023), which the authors do not cite or discuss, suggesting that the inherent lipid transfer activity of Atg2 is required for IM expansion. However, aside from this experiment, the mechanisms underlying R18 transfer remain unclear and, while they likely depend on or are at least partially mediated by Atg2, they may involve alternative mechanisms including vesicle transport or continuous membrane contacts. Moreover, for the assays with stalled or dissolving IMs, it is essential for the authors to test whether Atg2 is still associated with these IMs. It is quite possible that Atg2 dissociates from maximally expanded or dissolving IMs, which would make their interpretation of the data very unlikely. Thus, it will be critical to provide consistent evidence that lipid transfer from the IM to the ER is mediated by Atg2. Ideally, the authors would label IMs with BFP-Atg8, R18, and Atg2-GFP and perform their in vivo analysis.

      We sincerely thank the reviewer for the critical comments and valuable suggestions. To further support the link between R18 transfer and Atg2, we would like to highlight two complementary findings. As noted in our response to Reviewer 1 (Major Point 3), R18 can still label the PAS even when Atg2 is recruited but IM expansion is impaired, suggesting that R18 trafficking occurs in an Atg2-dependent manner. In addition, in our parallel study (bioRxiv, 2025.05.24.655882v1), we demonstrated that Atg2 acts as a bridge-like lipid transfer protein. Notably, when we mutated the bridge-forming region of Atg2, R18 transport to the IM was also disrupted.

      We greatly appreciate the reviewer's reminder regarding the study by Dabrowski et al., 2023, which we have now cited and discussed in the revised manuscript (lines 66-68, 312-314). Their findings that the inherent lipid transfer activity of Atg2 is required for autophagosome formation in vivo strongly reinforce our model.

      Regarding the possibility of vesicle transport, we consider this contribution minimal based on R18's preferential labeling of continuous membranes and its divergence from FM4-64 staining. As for the role of continuous membrane contacts, as also mentioned in our response to Reviewer 1, our preliminary tests indicate that R18 still properly labels the ER in tether∆ cells, suggesting that its localization is not due to passive diffusion at membrane contact sites, but rather involves specific transport mechanisms. As this is an initial observation, we plan to confirm the result and include it in a future revision.

      We also thank the reviewer for the suggestion to monitor Atg2 localization at the dissolving IM. As similarly pointed out by two other reviewers, we plan to track Atg18 during the nutrient replenishment assay.

      Finally, we appreciate the idea of triple-labeling with BFP-Atg8, R18, and Atg2-GFP. While our preliminary attempts encountered technical difficulties such as abnormal BFP-Atg8 localization and severe bleaching during long-term imaging in yeast, we plan to optimize this approach in future experiments.

      (2) Given that the ER forms contact sites with many organelles using bridge-like lipid transfer proteins, how do the authors explain the preferential accumulation of R18 in ARS rather than, for example, in the PM (Fmp27), mitochondria, endosomes or vacuole (Vps13)? Why should R18 be transferred specifically by Atg2, and not, or at a much lower rate, by Fmp27 or Vps13?

      We sincerely thank the reviewer for raising this insightful question. Indeed, we have carefully considered this point. Our data indicate that R18 labeling of autophagy-related structures (ARS) depends on Atg2, as demonstrated in the present manuscript and supported by our parallel study currently under revision (bioRxiv, 2025.05.24.655882v1).

      We speculate that the preferential accumulation of R18 in ARS may arise from structural and contextual differences among bridge-like LTPs, such as Atg2, Vps13, and Fmp27. Although all are capable of mediating lipid transfer, these proteins differ in their membrane tethering modes, cargo specificity, and spatial regulation. For example, Atg2 localizes specifically to ER-IM contact sites during autophagosome formation, where membrane expansion requires rapid lipid supply. In contrast, Vps13 and Fmp27 may function at more stable or less dynamic contacts, where lipid turnover or probe accessibility is more limited. We have added a brief discussion of this point in the revised manuscript to reflect this important consideration (lines 439-444).

      (3) Does R18 label autophagic bodies after they have formed? Could the authors add R18 after autophagic bodies have formed in atg15 or pep4 cells?

      We thank the reviewer for this excellent suggestion. To address whether R18 can label autophagic bodies post-formation, we plan to perform additional experiments by adding R18 after autophagic bodies have accumulated in atg15Δ or pep4Δ cells. This will help clarify whether R18 incorporates into pre-formed autophagic bodies or requires earlier membrane dynamics for its labeling.

      (4) Since Neo1- or OSBP-defective cells do not transfer R18 from the PM to the ER or other membranes, the authors should include these strains as controls for ER-dependent R18 transfer to ARSs.

      We thank the reviewer for this insightful suggestion. To further validate the ER-dependency of R18 transfer to autophagy-related structures, we plan to include Neo1- and OSBP-deficient strains as additional controls.

      Comments:

      The authors neglect to mention or discuss important recent literature directly related to their study:

      Schutter et al., Cell (2020); Orii et al., JCB (2021); Polyansky et al., EMBOJ (2022); Dabrowski et al., JCB (2023); Shatz et al., Dev Cell (2024)

      We sincerely thank the reviewer for pointing out these important and highly relevant studies. We apologize for our oversight in not citing them earlier. Each of these works has provided valuable insights that are directly related to and have greatly informed our current study. We have now cited and discussed these references in appropriate sections of the revised manuscript.

      Figure 1A and B: The authors need to describe how these cells were stained with R18 in the figure legend or text to help the reader to understand how these experiments were performed. Figure legends need to indicate at which time point after rapamycin treatment cells were analyzed.

      Thank you for the helpful suggestion. We have now added the corresponding information to the figure legends to clarify the staining procedure and time points.

      The authors need to clarify whether mNG-Atg8 colocalization with R18 was included for dot- and ring-like structures for WT cells as shown separately in 1A but not in 1B.

      Thank you for the comment. The quantification in Figure 1B includes both dot- and ring-like structures of mNG-Atg8 colocalized with R18 in WT cells, as shown in Figure 1A. We have now clarified this point in the revised figure legend.

      Figure 1C: The figure legend needs to describe the conditions cells were treated with and when cells were analyzed after rapamycin treatment (presumably).

      Thank you for the helpful suggestion. We have now added the corresponding information to the figure legends.

      Figure 1C: The authors should combine atg15 and pep4 deletions with atg2 or atg7 as controls in which autophagic bodies are not formed.

      Thank you for the valuable suggestion. We plan to perform these experiments that combine atg15 and pep4 deletions with atg2 or atg7 as controls.

      Figure 1E and F: R18 stains more than just the ER in the cells shown. In addition to atg39 and atg40, authors should include atg11 to inhibit all forms of selective autophagy.

      Thank you very much for the insightful comment. We agree and plan to include the atg11Δ mutant to inhibit all forms of selective autophagy.

      Figure S2A and B: The figures are mislabeled. Instead of FM4-64 it should say R18. In addition to the ER, in several images R18 clearly stains the vacuole membrane (for example, Figure 2A, 30 degrees) and in others. Thus, the strong thresholding in S2 may give the reader an oversimplified view of R18 localization. This needs to be corrected.

      Thank you very much for pointing this out. We have corrected the labeling error in Figure S2A and B. Regarding the observation that R18 occasionally labels the vacuole membrane, we agree with the reviewer's comment. Based on our data, we believe that this signal likely reflects autophagosomes that have reached and fused with the vacuole, as expected in the later stages of autophagy. We have clarified this point in the text to avoid oversimplification of R18 localization (lines 169-171, 426-428).

      Figure 1G and H: In 1G, there are a number of R18-stained patches not co-labeled by GFP-ER. What are these patches and which organelles do they represent? In 1H, given the tight association of the ER (omegasome) with forming IMs, it is difficult to discern whether R18 labels the surrounding ER membrane or the IM itself. This needs to be analyzed more closely. The authors need to quantify these data similarly to the yeast data.

      Thank you for the suggestion. We plan to perform additional quantification and colocalization analysis to clarify the identity of R18-positive signals in 1G and 1H.

      Figure 4A-C: A full-length PLT-deficient variant of Atg2 has been analyzed by Dabrowski et al, JCB 2023 in vivo. This work needs to be cited and discussed. The analysis needs to include punctate Atg8 structures for WT cells to exclude effects due to expansion defects.

      Thank you for the suggestion. We have now cited and discussed the work by Dabrowski et al., JCB 2023 in the revised manuscript (lines 67-68, 312-314). In addition, we have included an analysis of punctate Atg8 structures in WT cells to address the concern regarding potential expansion defects.

      Figure 4F-H: To measure the size changes in IMs, the authors would need to perform these experiments without bleaching the mNG-Atg8 signals.

      We apologize for the lack of clarity. The method for measuring IM size has now been added to the revised manuscript. In Figure 4, we note that mNG-Atg8 fluorescence in fact recovers slowly over time. This limited recovery likely reflects both the slower turnover of Atg8 and the fact that the pre-existing Atg8 pool at the IM was partially photobleached. We have now revised the main text to clarify this point and included additional explanation (lines 326-330).

      Figure 5C: The authors need to indicate the bleached areas in the mNG-Atg8 image for easier orientation. It looks to me that the area that the authors mark as IM-ER MCS is really the IM in proximity to the ER. Thus, if lipid transfer to the IM has ceased, I would not expect recovery here. If the IM-ER MCS area includes IM and the ER to similar extent, I would expect exactly what the authors show: IM does not recover while ER quickly recovers. On average, we would observe reduced recovery as shown in 5D.

      Thank you for the helpful suggestion, and we apologize for the oversight during figure preparation. We have now clearly indicated the bleached areas in the merged image in Figure 5C for better orientation. Additionally, we have carefully re-examined the defined ER-IM MCS region and confirmed that the quantified area indeed corresponds to the contact site between the ER and the IM. We have also double-checked that the measurements shown in the figure remain correct.

      Figure 5L: Since mNG-Atg8 signal homogenously disappears from the IM, it is meaningless to measure size. How do the authors measure the size of something they cannot detect?

      Thank you for pointing this out. We agree with the reviewer's comment and have removed the panel from the revised version accordingly.

      Figure 5K: The authors need to show the whole bleached area over time for the reader to be able to see where the recovered R18 signal might be coming from. Currently, it is impossible to discern whether the signal comes from the IM or from slow recovery from neighboring ER.

      We appreciate this insightful comment. To address the concern and following the suggestion from Reviewer 2 (Major Point No.4), we have now revised the figure to include an additional measurement of fluorescence recovery in the adjacent bleached ER (Figure 5K and 5M) (lines 384-396). These results further support our reversible lipid transfer model by demonstrating that fluorescence recovery at the ER-IM MCS originates from the IM, rather than from the adjacent bleached ER, which shows slower and less efficient recovery.

      We have also added time-lapse videos to the supplementary information due to space limitations in the main figure.

      Reviewer #3 (Significance (Required)):

      The use of a lipophilic dye to monitor lipid dynamics during IM expansion or dissolution is an elegant way to probe the mechanisms of lipid transfer across ER-IM contact sites. Quantitative in vivo data is critically needed to address this fundamental question in autophagy and contact site biology. However, the study remains limited in providing direct evidence that it is indeed the lipid transfer activity of Atg2, which underlies the R18 dynamics in IMs in vivo.

      We sincerely thank the reviewer for this encouraging and thoughtful comment. We appreciate the recognition that our live-cell approach using a lipophilic dye provides a valuable framework to visualize lipid dynamics during autophagosome biogenesis. As the reviewer pointed out, quantitative in vivo evidence is critically needed in this field, and we hope our study contributes meaningfully toward that goal.

      We also fully acknowledge this limitation. While our current data offer indirect evidence for Atg2-mediated lipid transfer, we intend to strengthen this point through the experiments in our revision plan and through our parallel study (bioRxiv, 2025.05.24.655882v1), which shows that Atg2 is indeed a bridge-like LTP and that R18 transfer is lost in a strain defective in the bridge structure. Together, we hope these data support the conclusion that the lipid transfer activity of Atg2 underlies the observed R18 dynamics in vivo.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Recent work has demonstrated that the hummingbird hawkmoth, Macroglossum stellatarum, like many other flying insects, uses ventrolateral optic flow cues for flight control. However, unlike other flying insects, the same stimulus presented in the dorsal visual field elicits a directional response. Bigge et al. use behavioral flight experiments to set these two pathways in conflict in order to understand whether these two pathways (ventrolateral and dorsal) work together to direct flight and, if so, how. The authors characterize the visual environment (the amount of contrast and translational optic flow) of the hawkmoth and find that different regions of the visual field are matched to relevant visual cues in their natural environment, and that the integration of the two pathways reflects a prioritization of behavior that supports hawkmoth safety rather than of the visual cue that is most prevalent in the environment.

      Strengths:

      This study creatively utilizes previous findings that the hawkmoth partitions its visual field as a way to examine parallel processing. The behavioral assay is well-established, and the authors take the extra step of characterizing the visual ecology of the hawkmoth habitat to draw exciting conclusions about the hierarchy of each pathway as it contributes to flight control.

      Weaknesses:

      The work would be further clarified and strengthened by additional explanation included in the main text, figure legends, and methods that would permit the reader to draw their own conclusions more feasibly. It would be helpful to have all figure panels referenced in the text and referenced in order, as they are currently not. In addition, it seems that sometimes the incorrect figure panel is referenced in the text, Figure S2 is mislabeled with D-E instead of A-C and Table S1 is not referenced in the main text at all. Table S1 is extremely important for understanding the figures in the main text and eliminating acronyms here would support reader comprehension, especially as there is no legend provided for Table S1. For example, a reader that does not specialize in vision may not know that OF stands for optic flow. Further detail in figure legends would also support the reader in drawing their own conclusions. For example, dashed red lines in Figures 3 and 4 A and B are not described and the letters representing statistical significance could be further explained either in the figure legend or materials to help the reader draw their own conclusions.

      We appreciate the suggestions to improve the clarity of the manuscript. We have extensively restructured the entire manuscript. Among other changes, we now reference all figure panels in the text in the order in which they appear. To do so, we combined the optic flow and contrast measurements of our setup with the methods description of the behavioural experiments (formerly Figs. 5 and 2, respectively). This new Fig. 2 now introduces the methods of the study, while the remainder of the former Fig. 2, which presented the experiments that investigated the ventrolateral and dorsal responses in more detail, is now a separate figure (Fig. 3). This arrangement also balances the amount of information contained in each figure better.

      Reviewer #2 (Public review):

      Summary:

      Bigge and colleagues use a sophisticated free-flight setup to study visuo-motor responses elicited in different parts of the visual field in the hummingbird hawkmoth. Hawkmoths have been previously shown to rely on translational optic flow information for flight control exclusively in the ventral and lateral parts of their visual field. Dorsally presented patterns elicit a previously unknown response: instead of using dorsal patterns to maintain straight flight paths, hawkmoths more often fly in a direction aligned with the main axis of the pattern presented (Bigge et al., 2021). Here, the authors go further and put ventral/lateral and dorsal visual cues into conflict. They found that the different visuomotor pathways act in parallel, and they identified a 'hierarchy': the avoidance of dorsal patterns had the strongest weight and optic flow-based speed regulation the lowest weight.

      Strengths:

      The data are very interesting, unique, and compelling. The manuscript provides a thorough analysis of free-flight behavior in a non-model organism that is extremely interesting for comparative reasons (and on its own). These data are both difficult to obtain and very valuable to the field.

      Weaknesses:

      While the present manuscript clearly goes beyond Bigge et al, 2021, the advance could have perhaps been even stronger with a more fine-grained investigation of the visual responses in the dorsal visual field. Do hawkmoths, for example, show optomotor responses to rotational optic flow in the dorsal visual field?

      We thank the reviewer for the feedback, and the suggestions for improvement of the manuscript (our implementations are detailed below). We fully agree that this study raises several intriguing questions regarding the dorsal visual response, including how the animals perceive and respond to rotational optic flow in their dorsal visual field, particularly since rotational optic flow may be processed separately from translational optic flow.

      In our free-flight setup, it was not possible to generate rotational optic flow in a controlled manner. To explore this aspect more systematically, a tethered-flight setup would be ideal, or alternatively, a free-flight setup integrated with virtual reality. This would be a compelling direction for a follow-up study.

      Reviewer #3 (Public review):

      The central goal of this paper as I understand it is to extract the "integration hierarchy" of stimulus in the dorsal and ventrolateral visual fields. The segregation of these responses is different from what is thought to occur in bees and flies and was established in the authors' prior work. Showing how the stimuli combine and are prioritized goes beyond the authors' prior conclusions that separated the response into two visual regions. The data presented do indeed support the hierarchy reported in Figure 5 and that is a nice summary of the authors' work. The moths respond to combinations of dorsal and lateral cues in a mixed way but also seem to strongly prioritize avoiding dorsal optic flow which the authors interpret as a closed and potentially dangerous ecological context for these animals. The authors use clever combinations of stimuli to put cues into conflict to reveal the response hierarchy.

      My most significant concern is that this hierarchy of stimulus responses might be limited to the specific parameters chosen in this study. Presumably, there are parameters of these stimuli that modulate the response (spatial frequency, different amounts of optic flow, contrast, color, etc). While I agree that the hierarchy in Figure 5 is consistent for the particular stimuli given, this may not extend to other parameter combinations of the same cues. For example, as the contrast of the dorsal stimuli is reduced, the inequality may shift. This does not preclude the authors' conclusions but it does mean that they may not generalize, even within this species. For example, other cue conflict studies have quantified the responses to ranges of the parameters (e.g. frequency) and shown that one cue might be prioritized or up-weighted in one frequency band but not in others. I could imagine ecological signatures of dorsal clutter and translational positioning cues could depend on the dynamic range of the optic flow, or even having spatial-temporal frequency-dependent integration independent of net optic flow.

      We absolutely agree that, in principle, an observed integration hierarchy is only valid for the stimuli tested. Yet, we believe that we provide good evidence that our key observations are also robust for stimuli related to the ones tested:

      Most importantly, we found that both pathways act in parallel (and are not mutually exclusive, or winner-takes-all, for example), when the animals can enact the locomotion induced by the dorsal and ventrolateral pathway. We tested this with the same dorsal cue (the line switching direction), but different behavioural paradigms (centring vs unilateral avoidance), and different ventrolateral stimuli (red gratings of one spatial frequency, and 100% nominal contrast black-and-white checkerboard stimuli which comprised a range of spatial frequencies) – and found the same integration strategy.

      Certainly, if the contrast of the visual cues was reduced to the point that the dorsal or ventrolateral responses became weaker, we would expect this to be visible in the combined responses, with the respective reduction in response strength for either pathway, to the same degree as they would be reduced when stimuli were shown independently in the dorsal and ventrolateral visual field.

      To test whether the animals would show a weighting of responses when it was not possible to enact locomotion for both pathways, we felt it was important to use similar external stimuli, so that we could compare the responses and confidently interpret them in terms of integration. Indeed, how these stimuli are translated into responses in the two pathways depends on a) the spatiotemporal tuning, contrast sensitivity and exact receptive fields of the two systems, b) the geometry of the setup and stimulus coverage, and therefore the ability of the animals to enact responses to both pathways independently, and c) the integration weights.

      It would indeed be fascinating to obtain this tuning and the receptive fields, and having these, test a large array of combinations of stimuli and presentation geometries, so that one could extract integration weights for different presentation scenarios from the resulting flight responses in a future study.

      We also expanded the respective discussion section to reflect these points: l. 391-417. We also updated the former Fig. 5, now Fig. 6 to reflect this discussion.

      The second part of this concern is that there seems to be a missed opportunity to quantify the integration, especially when the optic flow magnitude is already calculated. The discussion even highlights that an advantage of the conflict paradigm is that the weights of the integration hierarchy can be compared. But these weights, which I would interpret as stimulus-response gains, are not reported. What is the ratio of moth response to optic flow in the different regions? When the moth balances responses in the dorsal and ventrolateral regions, is it a simple weighted average of the two? When it prioritizes one over the other, is the response gain unchanged? This plays into the first concern because such gain responses could strongly depend on the specific stimulus parameters rather than being constant.

      Indeed, we designed stimuli that are comparable: they are all in the visual domain, and we can calculate their external optic flow and contrast magnitudes to control for imbalances in stimulus presentation, which is important for interpreting the resulting data.

      As we discussed above, we are confident that we are observing general principles of the integration of the two parallel pathways. However, we refrained from calculating integration weights, because these might be misleading for several reasons:

      (1) In situations where the animals can enact responses to both pathways, we show that they do so at the full original magnitudes. So there are no “weights” of the hierarchy in this case.

      (2) Only when responses to both systems are not possible in parallel, do we see a hierarchy. However, combined with point (1), this hierarchy likely depends on the geometry of the moths’ environment: it will be more pronounced the less both systems can be enacted in parallel.

      (3) The hierarchy also does not affect all features of the dorsal or ventrolateral pathway equally. The hawkmoths still regulate their perpendicular distance to ventral gratings when dorsal gratings are present to the same degree as with only ventral gratings, because perpendicular distance regulation is not a feature of the dorsal response. And while the hawkmoths show a significant reduction in their position adjustment to dorsal contrast when it is in conflict with lateral gratings (Fig. 4C), they show exactly the same amount of lateral movement and speed adjustment as for dorsal gratings alone, when not combined with lateral ones (Fig. 4D and Fig. S3A). So even for one particular setup geometry and stimulus combination, there clearly is not one integration weight for all features of the responses.

      We extended the discussion section to clarify these points: "The benefit of our study system is that the same cues activate different control pathways in different regions of the visual field, so that the resulting behaviour can directly be interpreted in terms of integration weights" (l. 448-451).

      See also l. 391-417; we also updated the former Fig. 5, now Fig. 6, to reflect this discussion.

      The authors do explain the choice of specific stimuli in the context of their very nice natural scene analysis in Fig. 1 and there is an excellent discussion of the ecological context for the behaviors. However, I struggled to directly map the results from the natural scenes to the conclusions of the paper. How do they directly inform the methods and conclusions for the laboratory experiments? Most important is the discussion in the middle paragraph of page 12, which suggests a relationship with Figure 1B, but seems provocative but lacking a quantification with respect to the laboratory stimuli.

      We show that contrast cues and translational optic flow are not homogeneously distributed in the natural environments of hawkmoths. This relates directly to our laboratory findings on the responses to these stimuli in different parts of the visual field. To interpret the results of these behavioural experiments with respect to the visual stimuli, we measured translational optic flow and contrast cues in the laboratory setup. As a result, we make several predictions about the animals' use of translational optic flow and contrast cues in natural settings:

      a) Hawkmoths in the lab responded strongest to ventral optic flow, even though it was not stronger in magnitude, given our measurements, than lateral optic flow. Thus, we propose that the stronger response to ventral optic flow might be an evolutionary adaptation to the natural distribution of translational optic flow cues.

      b) In the natural habitats of hawkmoths, dorsal coverage is much less frequent than ventrolateral structures generating translational optic flow, yet the hawkmoths responded with a much higher weight to the former. Moreover, in our flight tunnel experiments, the animals responded with the same or higher weights to dorsal cues, which had a lower magnitude of translational optic flow and contrast than the same cues in the ventrolateral visual field. Thus, by combining behavioural experiments and stimulus measurements in the lab, we showed that the weighting of dorsal and ventrolateral cues did not follow their stimulus magnitude in the lab. Moreover, based on a comparison with the natural cue distributions, we suggest that the integration weights also did not evolve to match the prevalence of these cues in natural habitats.

      We integrated the measurements of natural visual scene statistics in the new Fig. 6, to relate the behavioural findings to the natural context also in the figure structure, and sequence logic of the text, as they are discussed here.

      The central conclusion of the first section of the results is that there are likely two different pathways mediating the dorsal and the ventrolateral response. This seems reasonable given the data, however, this was also the message that I got from the authors' prior paper (ref 11). There are certainly more comparisons being done here than in that paper and it is perfectly reasonable to reinforce the conclusion from that study but I think what is new about these results needs to be highlighted in this section and differentiated from prior results. Perhaps one way to help would be to be more explicit with the open hypotheses that remain from that prior paper.

      We appreciate the suggestion to highlight more clearly what the open questions that are addressed in this study are. As a result, we have entirely restructured the introduction, added sections to the discussion and fundamentally changed the graphical result summary in Fig. 6, to reflect the following new findings (and differences to the previous paper):

      The previous paper demonstrated that there are two different pathways in hummingbird hawkmoths that mediate visual flight guidance, and newly described one of them, the dorsal response. This established flight guidance in hummingbird hawkmoths as a model for the questions asked in the current study, which are very different in nature from the previous paper.  

      The main question addressed in the current study is: how do these two flight guidance pathways interact to generate consistent behaviour? The literature on parallel sensory and motor pathways guiding behaviour describes different solutions, ranging from winner-takes-all to equally mixed responses. We addressed this fundamental question using the flight guidance systems of the hummingbird hawkmoth as a model.

      This is the main question addressed in the various conflict experiments in this study, and we show that indeed, the two systems operate in parallel. As long as the animals can enact both dorsal and optic-flow responses, they do so at the original strengths of the responses. Only when this is not possible, hierarchies become visible. We carefully measured the optic flow and contrast cues generated by the different stimuli to ensure that the hierarchies we observed were not generated by imbalances of the external stimuli.

      - Does the interaction hierarchy of the two pathways follow the statistics of natural environments? We previously showed qualitatively how optic flow and contrast cues are distributed across the visual field in natural habitats of the hummingbird hawkmoth. In this study, we quantitatively analysed the natural image data, including a new analysis of contrast edges, and statistically compared the results across conditions. This quantitative analysis supported the previous qualitative assessment that the prevalence of translational optic flow was highest in the ventral and lowest in the dorsal visual field in all natural habitat types. The distribution of contrast edges across the visual field depended on habitat type much more strongly than was apparent in the qualitative analysis in the previous paper. When compared to the magnitude of the behavioural responses, and considering that the hummingbird hawkmoth is predominantly found in open and semi-open habitats, the natural distributions of optic flow and contrast edges did not align with the response hierarchy observed in our laboratory experiments. Dorsal cues elicited much stronger responses relative to ventrolateral optic flow responses than would be expected.

      To provide a more complete picture of the dorsal pathway, which is important for understanding its nature and for comparisons with other species, we conducted additional experiments specifically designed to test for response features known from the translational optic flow response, in order to compare and contrast the two systems. These experiments allowed us to show that the dorsal response is not simply a translational optic flow reduction response that creates much stronger output than the ventrolateral optic flow response. In particular, we show that the dorsal response lacked the perpendicular distance regulation of the optic flow response, while it did provide alignment with prominent contrasts (possibly to reduce the perceived translational optic flow), which is not observed in the ventrolateral optic flow response. The strong avoidance of any dorsal contrast cues, not just those inducing translational optic flow, is another feature not found in the ventrolateral pathway.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Many comparisons between visual conditions are made and it was confusing at times to know which conditions the authors were comparing. Thinking of a way to label each condition with a letter or number so that the authors could specify which conditions are specifically being compared would greatly enhance comprehension and readability.

      We appreciate this concern. To be able to refer to the individual stimulus conditions in the analysis and results description, we gave each stimulus a unique identifier (see table S1), and provided these identifiers in the respective figures and throughout the text. We hope that this makes the identification of the individual stimuli easier.

      Consider adding in descriptive words to the y-axis labels for the position graphs that would help the reader quickly understand what a positive or negative value means with respect to the visual condition.

      We have now changed the viewpoint of the example tracks in Figs. 2-5 to a virtual viewpoint from the top, rather than the view from below in which the camera recorded, which required some mental rotation to reconcile the left and right sides. Moreover, we noticed that the example track axes were labelled in mm, while the axes for the plots showing median position in the tunnel were labelled in cm. We have reconciled the units as well. This will make it easier to see the direct equivalent of the axes (as well as of positive and negative values) in the example tracks in those figures, the median positions, and the cross-index.

      There are no line numbers provided so it is a bit challenging to provide feedback on specific sentences but there are a handful of typos in the manuscript, a few examples:

      (1) Cue conflict section, first paragraph: "When both cues were presented to in combination, ..." (remove to)

      (2) The ecological relevance section, first paragraph, first sentence: "would is not to fly"

      (3) Figure S3 legend: explanation for C is labeled as B and B is not included with A

      We apologise for the missing line numbers. We added these and resolved the issues 1-3.

      Reviewer #2 (Recommendations for the authors):

      - The pictograms in Fig. 1a were at first glance not clear to me, maybe adding l, r, d, v to the first pictogram could make the figure more immediately accessible.

      We added these labels to make it more accessible.

      - I would suggest noting in the main text that the red patterns were chosen for technical reasons (see Methods), if this is correct.

      We added this information and a reference to the methods in the main text (lines 100-102).

      - "Thus, hawkmoths are currently the only insect species for which a partitioning of the visual field has been demonstrated in terms of optic-flow-based flight control [33-35]." I think that is a bit too strong and maybe it would be more interesting to connect the current data to connected data in other insects to perhaps discuss important similarities. Ref 32 for example shows that fruit flies weigh ventral translational optic flow considerably more than dorsal translational optic flow. Reichardt 1983 (Naturwissenschaften) showed that stripe fixation in large flies (a behaviour relying in part on the motion pathway) is confined to the ventral visual field, etc...

      We have changed this sentence to acknowledge partitioning in other insects and to motivate the use of our model species for this study: "While fruit flies weigh ventral translational optic flow more strongly than dorsal optic flow, the most extreme partitioning of the visual field in terms of optic-flow-based flight control has been observed in hawkmoths [33-35]." (lines 60-62)

      - I think the statistically significant group mean differences could be described in more detail, at least in Fig. 2 (to me the description was not immediately clear, in particular with the double letters).

      We added an explanation of the letter nomenclature to all respective figure legends:

      Black letters show statistically significant differences in group means or medians, depending on the normality of the test residuals (see Methods, significance level: 5%). The red letters represent statistically significant differences in group variance from pairwise Brown–Forsythe tests (significance level: 5%). Conditions with different letters were significantly different from each other. The white boxplots depict the median and 25% to 75% range, the whiskers represent the data exceeding the box by more than 1.5 interquartile ranges, and the violin plots indicate the distribution of the individual data points shown in black.

      - "When translational optic flow was presented laterally" I would use a more wordy description, since it is the hawkmoth that is controlling the optic flow and in addition to translational optic flow, there might also be rotational components, retinal expansion etc.

      We extended the description to explain that the moths were generating the optic flow percept based on stationary gratings in different orientations, by way of their flight through the tunnel. Lines 127-129

      - While it is clearly stated that the measure of the perpendicular distance from the ventral and dorsal patterns via the size of the insect as seen by the camera is indirect, I would suggest determining the measurement uncertainty of the distance estimate.

      - Connected to above - is the hawkmoth area averaged over the entire flight and is the variance across frames similar in all the stimuli conditions? Is it, in principle, conceivable that the hawkmoths' pitch (up or down) is different across conditions, e.g. with moths rising and falling more frequently in a certain condition, which could influence the area in addition to distance?

      There are a number of sources of variance in the distance estimate (which was based on the size of the moth in each video frame, after background subtraction): the size of the animal; the contrast with which the animal was filmed (which also depended on the type of pattern in the tunnel: it was lower with ventral or dorsal patterns as a background than with lateral ones); and the speed of the animal, as motion blur could affect the moth's image in the video. The latter is hard to calibrate, but the uncertainty related to animal size and pattern type could in principle be estimated. However, because we relocated between finishing the data acquisition for this study and publishing the paper, the original setup has been dismantled. We could attempt to recreate it as faithfully as possible, but would risk introducing further noise. We therefore decided not to attempt to characterise the uncertainty, so as not to give a false impression of the quantifiability of this measure. For the purpose of this study, it will have to remain a qualitative rather than a quantitative measure. If we use a similar measure again, we will make sure to quantify all sources of uncertainty that we have access to.

      The variance in area differs between conditions. Most likely, the animals vary their flight height differently for different dorsal and ventral patterns, just as they vary their lateral flight straightness with different lateral visual input. For the reasons mentioned above, we cannot disentangle the effects of variations in flight height from other sources of uncertainty relating to animal size in the video frames. We therefore averaged the extracted area across the entire flight to obtain a coarse measure of flight height. Future studies focusing specifically on the vertical component, or filming in 3D, will be required to determine the exact amount of vertical flight variation.

      - Results second paragraph, suggestion: pattern wavelength or spatial frequency instead of spatial resolution.

      - Same paragraph, suggestion: For an optimal wavelength/spatial frequency of XX

      We corrected these to spatial frequency.

      - Above Fig. 3, "this strongly suggests a different visual pathway": in my opinion it would be better to say sensory-motor/visuomotor pathway, or to define visual pathway more clearly. Could one, in principle, imagine a uniform set of local motion-sensitive neurons across the entire visual field that connect differentially to descending/motor neurons?

      We appreciate this point and have changed this and further instances in the manuscript to "visuomotor pathway".

      - If I understood correctly, you calculated the magnitude of optic flow in the different tunnel conditions based on the image of a fisheye camera moving centrally in the tunnel, equidistant from all walls. I did not understand why the magnitude of optic flow should differ between the four quadrants showing the same squarewave patterns. Apologies if I missed something, but maybe it is worth explaining this in more detail in the manuscript.

      We recognize that this point may not have been immediately clear and have therefore provided additional clarification in the Methods and Results sections (lines 106-111, 543-549). We anticipated differences in the magnitude of optic flow due to potential contrast variations arising from the way the stimuli were generated: they were mounted on the inner surfaces of different tunnel walls, while the light source was positioned above. On the dorsal wall, light from the overhead lamps passed through the red material. For laterally mounted patterns, the animals perceived mainly reflected light, as these tunnel walls were not transparent.

      A similar principle applied to the background, which consisted of a white diffuser that allowed light to pass through dorsally, but of white non-transmissive paper laterally, with 5%-contrast random checkerboard patterns. The ventral side presented a more complex scenario, as it needed to be partially transparent for the ventrally mounted camera. Consequently, the animals perceived a combination of light reflections from the red patterns and the white gauze covering the ventral tunnel side, against the much darker background of the surrounding room.

      To ensure that the observed flight responses were not artifacts of deviations in visual stimulation from an ideal homogeneous environment, we used the camera to quantify the magnitude of optic flow and contrast patterns under these real experimental conditions. This approach also allowed us to directly relate the optic flow measurements taken indoors to those recorded outdoors, as we employed the same camera and analytical procedures for both datasets.

      Reviewer #3 (Recommendations for the authors):

      In addition to the considerations above I had a few minor points:

      There are so many different directions of stimuli and response that it is quite challenging to parse the results. Can this be made a little easier for the reader?

      We appreciate this concern. To be able to refer to the individual stimulus conditions in the analysis and results description, we gave each stimulus a unique identifier (see table S1), and provided these identifiers in the respective figures and throughout the text. We hope that this makes the identification of the individual stimuli easier.

      One suggestion (only a suggestion): I found myself continuously rotating the violin plots in my head so that the lateral position axis lined up with the lateral position of the tunnel icons below. Consider if rotating the plots 90 degs would help interpretability. It was challenging to keep track of which side was side.

      We discussed this with a number of test readers and tried multiple configurations. They all have advantages and drawbacks, but the majority of testers preferred the current configuration, so we kept it. To help with the mental transformations from the example flight tracks, we now present the example flight tracks in Figs. 2-5 in the same reference frame as the figures showing median position (so positive and negative values on those axes correspond directly), and changed the view from below the tunnel to above the tunnel, as this is the more typical depiction. We hope that this enhances readability.

      Are height measurements sensitive to the roll and pitch of the animal? I suspect this is likely small but worth acknowledging.

      They are. These effects are likely small but contribute to the overall inaccuracy, which we could not quantify in this particular setup (see also our response to Reviewer 2 on that point); this is why the height measurements have to be considered a qualitative approximation rather than a quantification of flight height. We added text to acknowledge the effects of roll and pitch specifically (lines 657-658).

      The Brown-Forsythe test was reported as paired but this seems odd because the same moths were not used in each condition. Maybe the authors meant something different by "paired" than a paired statistical design?

      Indeed, the data were not paired in the sense that we could attribute individual data points to individual moths across conditions. We applied the Brown-Forsythe test in a pairwise manner, comparing the variances of the conditions pair by pair, to test whether the variance in position differed across conditions. We had phrased this misleadingly and have corrected it to "The variance in the median lateral position (in other words, the spread of the median flight position) was statistically compared between the groups using the pairwise Brown–Forsythe tests" (lines 187-188).
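      For readers unfamiliar with the procedure, a pairwise Brown–Forsythe comparison can be sketched in Python as follows. This is an illustrative sketch, not the authors' analysis code: it assumes SciPy, where `scipy.stats.levene` with `center="median"` computes the Brown–Forsythe variant, and the function name, condition labels, and input format are hypothetical. The sketch reports raw p-values; any correction for the number of pairwise comparisons, as raised by the reviewer, would be a separate step.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def pairwise_brown_forsythe(groups):
    """Run a Brown-Forsythe test on every pair of conditions.

    groups: dict mapping a condition label to a 1-D array of
    per-flight median lateral positions (hypothetical input format).
    Returns a dict mapping (label_a, label_b) -> uncorrected p-value.
    """
    results = {}
    for a, b in combinations(sorted(groups), 2):
        # levene() with center="median" is the Brown-Forsythe variant
        _, p = stats.levene(groups[a], groups[b], center="median")
        results[(a, b)] = p
    return results


# Synthetic example: two conditions with clearly different spreads
rng = np.random.default_rng(0)
demo = {
    "narrow": rng.normal(0.0, 1.0, 60),
    "wide": rng.normal(0.0, 3.0, 60),
}
res = pairwise_brown_forsythe(demo)
print(res)
```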

      There is some concern about individual moth preferences and bias due to repeated measures. I appreciate that the individual moth's identity was not likely known in most cases, but can the authors provide an approximate breakdown of how many individual moths provided the N sample trajectories?

      This is a very valid concern, and indeed one we did investigate in a previous study with this setup. We confirmed that the majority of animals (70%, 68% and 53% out of 40 hawkmoths, measured on three consecutive days) crossed the tunnel within a randomly picked window of 3h (Stöckl et al. 2019). We now state this explicitly in the methods section (lines 594-597). Thus, for the sample sizes in our study, statistically, each moth would have contributed a small number of tracks compared to the overall number of tracks sampled.

      The statistics section of the methods said that both Tukey-Kramer (post-hoc corrected means) and Kruskal-Wallis (non-parametric medians) were done. It is sometimes not clear which test was done for which figure, and where the Kruskal-Wallis test was done there does not seem to be a corrected statistical significance threshold for the many multiple comparisons (Fig. 2). It is quite possible I am just missing the details and they need to be clarified. I think there also needs to be a correction for the Brown-Forsythe tests but I don't know this method well.

      We first performed an ANOVA and, if the test residuals were not normally distributed, used a Kruskal-Wallis test instead. For the post-hoc tests of both, we used Tukey-Kramer to correct for multiple comparisons. The figure legends did indeed lack this information. We added it to clarify our statistical analysis strategy and refer to the Methods section for more details (e.g., lines 185-186). All statistical results, including the type of statistical test used, have been uploaded to the data repository as well.
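      The decision rule described here (ANOVA when the residuals look normal, otherwise Kruskal-Wallis) might look like the following minimal sketch. The Shapiro-Wilk test used as the normality check and the 0.05 threshold are assumptions for illustration, since the response does not name the normality test; the function name is hypothetical, and the Tukey-Kramer post-hoc step is omitted.

```python
import numpy as np
from scipy import stats


def omnibus_test(groups, alpha=0.05):
    """Pick one-way ANOVA or Kruskal-Wallis based on residual normality.

    groups: sequence of 1-D arrays, one per stimulus condition.
    Returns (test_name, p_value).
    """
    # One-way ANOVA residuals: each observation minus its own group mean
    residuals = np.concatenate([np.asarray(g, float) - np.mean(g) for g in groups])
    # Assumed normality check: Shapiro-Wilk on the pooled residuals
    _, p_normal = stats.shapiro(residuals)
    if p_normal >= alpha:
        return "anova", stats.f_oneway(*groups).pvalue
    return "kruskal", stats.kruskal(*groups).pvalue


# Synthetic groups: normal-shaped vs strongly right-skewed values
probs = (np.arange(1, 61) - 0.5) / 60
normal_groups = [stats.norm.ppf(probs) + shift for shift in (0.0, 0.5, 1.0)]
skewed_groups = [stats.expon.ppf(probs) + shift for shift in (0.0, 0.5, 1.0)]
print(omnibus_test(normal_groups)[0], omnibus_test(skewed_groups)[0])
```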

      The connection to stimulus reliability in the discussion seems to conflate reliability with prevalence or magnitude.

      We have rephrased the respective discussion sections to clearly separate the prevalence and magnitude of stimuli, which was measured, from an implied or hypothesized reliability (lines 510-511).

      Line numbers would be helpful for future review.

      We apologize for missing the line numbers and have added them to the revised manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1:

      Reviewer #1 (Recommendations For The Authors):

      (1) At several places in the reply to reviewers and the manuscript, when discussing the new simulations conducted, the authors mention they break the 180 trials into a train/test split of 108/108 - is this value correct? If so, how? (pg 19 of updated manuscript)  

      Thank you for pointing this out; it was not clearly explained. We have now added the explanation to the Methods section: 

      “For each iteration, we randomly selected 108 responses from the full set of 180 for training, and then independently sampled another 108 from the same full set for testing. This ensured that the same orientation could appear in both sets, consistent with the structure of the original experiment.”
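      To make this sampling scheme concrete, here is a minimal NumPy sketch. It assumes that each set is drawn without replacement within itself and independently of the other; the function name is hypothetical. Because 108 + 108 exceeds 180, any two such draws must share at least 108 + 108 - 180 = 36 responses, and on average they share 108 × 108 / 180 ≈ 65.

```python
import numpy as np


def independent_train_test_indices(n_total=180, n_train=108, n_test=108, seed=None):
    """Draw train and test index sets independently from the same pool.

    Each set is sampled without replacement, but the two draws do not
    exclude each other, so the same response can land in both sets.
    """
    rng = np.random.default_rng(seed)
    train = rng.choice(n_total, size=n_train, replace=False)
    test = rng.choice(n_total, size=n_test, replace=False)
    return train, test


train_idx, test_idx = independent_train_test_indices(seed=1)
shared = np.intersect1d(train_idx, test_idx)
print(len(shared))  # always at least 36, since 108 + 108 - 180 = 36
```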

      (2) I appreciate the authors have added the variance explained of principal components to the axes of Fig. 3, though it took me a while to notice this, and this isn't described in the figure caption at all. It would likely help readers to directly explain what the % means on each axis of Fig. 3.

      Thank you; we have now added a description to the captions of both Fig. 2 and Fig. 3:

      “The axes represent the first two principal components, with labels indicating the percent of total explained variance.”

      (3) I believe there is a typo/missing word in the new paragraph on pg 15: "neural visual WM representations in the early visual cortices are [[biased]] towards distractors" (I think the bracketed word may be omitted as a typo)

      Thank you - fixed.

  5. social-media-ethics-automation.github.io
    1. https://en.wikipedia.org/w/index.php?title=Luddite&oldid=1189255462 (visited on 2023-12-10). [u3] Ted Chiang. Will A.I. Become the New McKinsey? The New Yorker, May 2023. URL:

      This article argues that AI benefits the bourgeoisie and the corporate world more than the working class. It even draws a comparison to McKinsey to further its argument. I think this article makes a really good point, as we can see many entry-level jobs becoming more competitive or being replaced outright by AI so that corporations can cut costs, making the rich even richer.

    1. Multivariate predictive models play a crucial role in enhancing our understanding of complex biological systems and in developing innovative, replicable tools for translational medical research. However, the complexity of machine learning methods and extensive data pre-processing and feature engineering pipelines can lead to overfitting and poor generalizability. An unbiased evaluation of predictive models necessitates external validation, which involves testing the finalized model on independent data. Despite its importance, external validation is often neglected in practice due to the associated costs. Here we propose that, for maximal credibility, model discovery and external validation should be separated by the public disclosure (e.g. pre-registration) of feature processing steps and model weights. Furthermore, we introduce a novel approach to optimize the trade-off between efforts spent on training and external validation in such studies. We show on data involving more than 3000 participants from four different datasets that, for any “sample size budget”, the proposed adaptive splitting approach can successfully identify the optimal time to stop model discovery so that predictive performance is maximized without risking a low powered, and thus inconclusive, external validation. The proposed design and splitting approach (implemented in the Python package “AdaptiveSplit”) may contribute to addressing issues of replicability, effect size inflation and generalizability in predictive modeling studies.

      A version of this preprint has been published in the Open Access journal GigaScience (see paper (https://doi.org/10.1093/gigascience/giaf036), where the paper and peer reviews are published openly under a CC-BY 4.0 license.

      Original version

      Reviewer 1: Qingyu Zhao

      The manuscript discusses an interesting approach that seeks optimal data split for the pre-registration framework. The approach adaptively optimizes the balance between predictive performance of discovery set and sample size of external validation set. The approach is showcased on 4 applications, demonstrating advantage over traditional fixed data split (e.g., 80/20). I generally enjoyed reading the manuscript. I believe pre-registration is one important tool for reproducible ML analysis and the ideology behind the proposed framework (investigating the balance between discovery power and validation power) is urgently needed. My main concerns are all around Fig. 3, which represents the core quantitative analysis but lacks many details.

      1. Fig. 3 is mostly about external validation. What about training? For each n_total, which stopping rule is activated? What is the training accuracy? What does l_act look like? What is \hat{s_total}?
      2. Results section states "the proposed adaptive splitting strategy always provided equally good or better predictive performance than the fixed splitting strategies (as shown by the 95% confidence intervals on Figure 3)". I'm confused by this because the blue curve is often below other methods in accuracy (e.g., comparing with 90/10 split in ABIDE and HCP).
      3. Why does the half split have the lowest accuracy but the highest statistical power?
      4. How was the range of x-axis (n_total) selected? E.g., HCP has 1000 subjects, why was 240-380 chosen for analysis?
      5. The lowest n_total for BCW and IXI is approximately 50. If n_act starts from 10% of n_total, how is it possible to train (nested) cross-validation on 5 samples or so?

      Two other general comments are: 1. How can this be applied to retrospective data or secondary data analysis where the collection is finished? 2. Is there a guidance on the minimum sample size that is required to perform such an auto-split analysis? It is surprising that the authors think the two studies with n=35 and n=38 are good examples of training generalizable ML models. It is generally hard to believe any ML analysis can be done on such low sample sizes with thousands of rs-fMRI features. By the way, I believe n=25 in Kincses 2024 if I read it correctly.

      Reviewer 2: Lisa Crossman

      External validation of machine learning models - registered models and adaptive sample splitting, Gallitto et al. The manuscript describes a methodology and algorithm aimed at better choosing a train-test validation split of data for scikit-learn models. A Python package, adaptivesplit, was built as part of this MS as a tool for others to use. The package is intended to be used together with a suggested workflow that integrates registered models into a full design for better prospective modelling studies. Finally, the work is evaluated on four publicly available datasets of health research data, and comprehensive results are presented. There is a trade-off in the split between the amount of data used for training and the amount used for validation. Ideally, the two must be balanced so that both the training set and the validation set are representative. This manuscript is therefore very timely given the large increase in the use of AI models, and it provides important information and methodology.

      This reviewer does not have the specific expertise to provide detailed comments on the statistical rule methods.

      Main Suggested Revision: 1. The Python implementation of the "adaptivesplit" package is described as available on GitHub (Gallitto et al., n.d.). One of the major points of the paper is to provide the Python package "adaptivesplit"; however, this package does not have a clear hyperlink, is not found by simple Google searches, and appears not yet to be available. It is therefore not possible to evaluate it at present. After further Google searches, a website with a preprint of this MS can be found, https://pnilab.github.io/adaptivesplit/; however, adaptivesplit is shown there as an interactive Jupyter-type notebook example and not as Python library code. Therefore, it is not clear how available the package is for others' use. Can the authors comment on code availability?

      Minor comments: 1. Apart from the 80:20 Pareto split of train-test data, other splits are commonly used in ratios such as 75:25 (the scikit-learn default split if ratio is unspecified), and 70:30. Also the cross-validation strategy with train-test-validation split 60:20:20, yet these strategies have not been mentioned or included in the figures such as Fig 3. The splits provided in the figure and discussed are 50:50, 80:20 and 90:10 only. Could the authors discuss alternative split ratios?

    1. I think that the students’ voice is not always heard entirely, even through dialogue. I feel that by doing this journal we can make a difference with our personal experience and touch the heart of someone who is willing to stand by us. I also wanted to get the attention of other students who may be feeling the same frustration I have felt

      Rashida’s words remind me that being asked to speak is not the same as being truly heard. Even when dialogue happens, students’ insights can be filtered or dismissed by adults who hold more power. Her hope that personal experience can move someone to take action reveals a quiet kind of strength. It’s thoughtful and brave: she’s using her voice not just to describe injustice, but to change who listens and how they respond.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Hussain and collaborators aims at deciphering the microtubule-dependent ribbon formation in zebrafish hair cells. By using confocal imaging, pharmacology tools, and zebrafish mutants, the group of Katie Kindt convincingly demonstrated that ribbon, the organelle that concentrates glutamate-filled vesicles at the hair cell synapse, originates from the fusion of precursors that move along the microtubule network. This study goes hand in hand with a complementary paper (Voorn et al.) showing similar results in mouse hair cells.

      Strengths:

      This study clearly tracked the dynamics of the microtubules, and those of the microtubule-associated ribbons and demonstrated fusion ribbon events. In addition, the authors have identified the critical role of kinesin Kif1aa in the fusion events. The results are compelling and the images and movies are magnificent.

      Weaknesses:

      The lack of functional data regarding the role of Kif1aa. Although it is difficult to probe and interpret the behavior of zebrafish after nocodazole treatment, I wonder whether deletion of kif1aa in hair cells may result in a functional deficit that could be easily tested in zebrafish?

      We have examined functional deficits in kif1aa mutants in another paper that was recently accepted: David et al. 2024. https://pubmed.ncbi.nlm.nih.gov/39373584/

      In David et al., we found that in addition to a subtle role in ribbon fusion during development, Kif1aa plays a major role in enriching glutamate-filled synaptic vesicles at the presynaptic active zone of mature hair cells. In kif1aa mutants, synaptic vesicles are no longer enriched at the hair cell base, and there is a reduction in the number of synaptic vesicles associated with presynaptic ribbons. Further, we demonstrated that kif1aa mutants also have functional defects including reductions in spontaneous vesicle release (from hair cells) and evoked postsynaptic calcium responses. Behaviorally, kif1aa mutants exhibit impaired rheotaxis, indicating defects in the lateral-line system and an inability to accurately detect water flow. Because our current paper focuses on microtubule-associated ribbon movement and dynamics early in hair-cell development, we have only discussed the effects of Kif1aa directly related to ribbon dynamics during this time window. In our revision, we have referenced this recent work. Currently it is challenging to disentangle how the subtle defects in ribbon formation in kif1aa mutants contribute to the defects we observe in ribbon-synapse function.

      Added to results:

      “Recent work in our lab using this mutant has shown that Kif1aa is responsible for enriching glutamate-filled vesicles at the base of hair cells. In addition, this work demonstrated that loss of Kif1aa results in functional defects in mature hair cells, including a reduction in evoked post-synaptic calcium responses (David et al., 2024). We hypothesized that Kif1aa may also be playing an earlier role in ribbon formation.”

      Impact:

      Synaptogenesis in auditory sensory cells remains elusive. Here, this study indicates that the formation of the synaptic organelle is a dynamic process involving the fusion of presynaptic elements. This study will undoubtedly boost a new line of research aimed at identifying the specific molecular determinants that target ribbon precursors to the synapse and govern the fusion process.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors set out to resolve a long-standing mystery in the field of sensory biology - how large, presynaptic bodies called "ribbon synapses" migrate to the basolateral end of hair cells. The ribbon synapse is found in sensory hair cells and photoreceptors, and is a critical structural feature of a readily-releasable pool of glutamate that excites postsynaptic afferent neurons. For decades, we have known these structures exist, but the mechanisms that control how ribbon synapses coalesce at the bottom of hair cells are not well understood. The authors addressed this question by leveraging the highly-tractable zebrafish lateral line neuromast, which exhibits a small number of visible hair cells, easily observed in time-lapse imaging. The approach combined genetics, pharmacological manipulations, high-resolution imaging, and careful quantifications. The manuscript commences with a developmental time course of ribbon synapse development, characterizing both immature and mature ribbon bodies (defined by position in the hair cell, apical vs. basal). Next, the authors show convincing (and frankly mesmerizing) imaging data of plus end-directed microtubule trafficking toward the basal end of the hair cells, and data highlighting the directed motion of ribbon bodies. The authors then use a series of pharmacological and genetic manipulations showing the role of microtubule stability and one particular kinesin (Kif1aa) in the transport and fusion of ribbon bodies, which is presumably a prerequisite for hair cell synaptic transmission. The data suggest that microtubules and their stability are necessary for normal numbers of mature ribbons and that Kif1aa is likely required for fusion events associated with ribbon maturation. Overall, the data provide a new and interesting story on ribbon synapse dynamics.

      Strengths:

      (1) The manuscript offers comprehensive Introduction and Discussion sections that will inform generalists and specialists.

      (2) The use of Airyscan imaging in living samples to view and measure microtubule and ribbon dynamics in vivo represents a strength. With rigorous quantification and thoughtful analyses, the authors generate datasets often only obtained in cultured cells or more diminutive animal models (e.g., C. elegans).

      (3) The number of biological replicates and the statistical analyses are strong. The combination of pharmacology and genetic manipulations also represents strong rigor.

      (4) One of the most important strengths is that the manuscript and data spur on other questions - namely, do (or how do) ribbon bodies attach to Kinesin proteins? Also, and as noted in the Discussion, do hair cell activity and subsequent intracellular calcium rises facilitate ribbon transport/fusion?

      These are important strengths, and, as stated, we are currently investigating which other kinesins and adaptors transport ribbons. We have ongoing work examining how hair-cell activity impacts ribbon fusion and transport!

      Weaknesses:

      (1) Neither the data nor the Discussion address a direct or indirect link between Kinesins and ribbon bodies. Showing Kif1aa protein in proximity to the ribbon bodies would add strength.

      This is a great point. Previous immunohistochemistry work in mice demonstrated that ribbons and Kif1a colocalize in mouse hair cells (Michanski et al., 2019). Unfortunately, the antibody used in that work did not work in zebrafish. To further investigate this interaction, we also attempted to create a transgenic line expressing a fluorescently tagged Kif1aa to directly visualize its association with ribbons in vivo. So far, we have been unable to detect transient expression of Kif1aa-GFP or to establish a transgenic line using this approach. While we will continue to work towards understanding whether Kif1aa and ribbons colocalize in live hair cells, this goal is currently beyond the scope of this paper. In our revision we discuss this caveat.

      Added to discussion:

      “In addition, it will be useful to visualize these kinesins by fluorescently tagging them in live hair cells to observe whether they associate with ribbons.”

      (2) Neither the data nor the Discussion address the functional consequences of loss of Kif1aa or ribbon transport. Presumably, both manipulations would reduce afferent excitation.

      Excellent point. Please see the response above to Reviewer #1 public response weaknesses.

      (3) It is unknown whether the drug treatments or genetic manipulations are specific to hair cells, so we can't know for certain whether any phenotypic defects are secondary.

      This is correct and a caveat of our Kif1aa and drug experiments. In our recently published work, we confirmed that Kif1aa is expressed in hair cells and neurons, while kif1ab is present only in neurons. Therefore, it is likely that the ribbon formation defects in kif1aa mutants are restricted to hair cells. We added this expression information to our results:

      “ScRNA-seq in zebrafish has demonstrated widespread co-expression of kif1ab and kif1aa mRNA in the nervous system. Additionally, both scRNA-seq and fluorescent in situ hybridization have revealed that pLL hair cells exclusively express kif1aa mRNA (David et al., 2024; Lush et al., 2019; Sur et al., 2023).”

      Non-hair-cell effects are a real concern in our pharmacology experiments. To mitigate this, we performed drug treatments on 3 different timescales: long-term (overnight), short-term (4 hr) and fast (30 min). After the fast, 30-min nocodazole treatment, we observed reduced directional motion and fewer fusions. This fast drug treatment should not incur any long-term changes or developmental defects, as hair-cell development occurs over 12-16 hrs. However, we acknowledge that drug treatments could have secondary phenotypic effects or effects that are not hair-cell specific. In our revision, we discuss these issues.

      Added to discussion:

      “Another important consideration is the potential off-target effects of nocodazole. Even at non-cytotoxic doses, nocodazole toxicity may impact ribbons and synapses independently of its effects on microtubules. While this is less of a concern in the short- and medium-term experiments (30-70 min and 4 hr), long-term treatments (16 hrs) could introduce confounding effects. Additionally, nocodazole treatment is not hair cell-specific and could disrupt microtubule organization within afferent terminals as well. Thus, the reduction in ribbon-synapse formation following prolonged nocodazole treatment may result from microtubule disruption in hair cells, afferent terminals, or a combination of the two.”

      Reviewer #3 (Public Review):

      Summary:

      The manuscript uses live imaging to study the role of microtubules in the movement of ribeye aggregates in neuromast hair cells in zebrafish. The main findings are that

      (1) Ribeye aggregates, assumed to be ribbon precursors, move in a directed motion toward the active zone;

      (2) Disruption of microtubules and kif1aa increases the number of ribeye aggregates and decreases the number of mature synapses.

      The evidence for point 2 is compelling, while the evidence for point 1 is less convincing. In particular, the directed motion conclusion depends on fitting the mean squared displacement, which can be prone to error and to variance due to stochasticity that is not accounted for in the analysis. Only a small subset of the aggregates meets this criterion, and one wonders whether the focus on this subset misses the bigger picture of what is happening with the majority of spots.
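      For context, a common way to classify a track as directed is to compute its mean squared displacement (MSD) and fit the 2-D directed-diffusion model MSD(t) = 4Dt + (vt)^2, where D is a diffusion coefficient and v a drift speed. The sketch below is a generic illustration under that assumption, not the authors' pipeline, and the function names are hypothetical. Long lags are averaged over few displacement pairs, which is one source of the fitting variance mentioned above, so such fits are often restricted to short lags.

```python
import numpy as np


def mean_squared_displacement(xy, max_lag):
    """MSD of one 2-D track sampled at a fixed frame interval.

    xy: array of shape (n_frames, 2); lags are counted in frames and
    max_lag must be smaller than n_frames.
    """
    lags = np.arange(1, max_lag + 1)
    msd = np.array(
        [np.mean(np.sum((xy[lag:] - xy[:-lag]) ** 2, axis=1)) for lag in lags]
    )
    return lags, msd


def fit_directed_diffusion(lags, msd, dt=1.0):
    """Least-squares fit of MSD(t) = 4*D*t + (v*t)**2; returns (D, v)."""
    t = lags * dt
    design = np.column_stack([4 * t, t ** 2])
    (d_coef, v_sq), *_ = np.linalg.lstsq(design, msd, rcond=None)
    # Clip a negative quadratic term to zero before taking the square root
    return d_coef, np.sqrt(max(v_sq, 0.0))


# Example: a track moving at a constant 0.5 units per frame along x
track = np.column_stack([0.5 * np.arange(50), np.zeros(50)])
lags, msd = mean_squared_displacement(track, max_lag=10)
d, v = fit_directed_diffusion(lags, msd)
print(d, v)
```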

      Strengths:

      (1) The effects of Kif1aa removal and nocodazole on ribbon precursor number and size are convincing and novel.

      (2) The live imaging of Ribeye aggregate dynamics provides interesting insight into ribbon formation. The movies showing the fusion of ribeye spots are convincing, and the demonstrated effects of nocodazole and kif1aa removal on the frequency of these events are novel.

      (3) The effect of nocodazole and kif1aa removal on precursor fusion is novel and interesting.

      (4) The quality of the data is extremely high and the results are interesting.

      Weaknesses:

      (1) To image ribeye aggregates, the investigators overexpressed Ribeye-a TAGRFP under the control of a MyoVI promoter. While it is understandable why they chose to do the experiments this way, expression is not under the same transcriptional regulation as the native protein, and some caution is warranted in drawing some conclusions. For example, the reduction in the number of puncta with maturity may partially reflect the regulation of the MyoVI promoter with hair cell maturity. Similarly, it is unknown whether overexpression has the potential to saturate binding sites (for example motors), which could influence mobility.

      We agree that overexpression of transgenes under a non-endogenous promoter in transgenic lines is an important consideration. Ideally, we would do these experiments with endogenously expressed fluorescent proteins under a native promoter. However, this was not technically possible for us. The decrease in precursors is likely not due to regulation by the myo6a promoter. Although the myo6a promoter comes on early in hair cell development, the promoter only gets stronger as the hair cells mature. This would lead to a continued increase rather than a decrease in puncta numbers with development.

      Protein tags such as TagRFP always carry the caveat of impacting protein function. This is partly why we complemented our live imaging with analyses in fixed tissue without transgenes (kif1aa mutants and nocodazole/taxol treatments).

      In our revision, we did perform an immunolabel on myo6b:riba-tagRFP transgenic fish and found that Riba-tagRFP expression did not impact ribbon synapse numbers or ribbon size. This analysis argues that the transgene is expressed at a level that does not impact ribbon synapses. This data is summarized in Figure 1-S1.

      Added to the results:

      “Although this latter transgene expresses Riba-TagRFP under a non-endogenous promoter, neither the tag nor the promoter ultimately impacts cell numbers, synapse counts, or ribbon size (Figure 1-S1A-E).”

      Added to methods:

      “Tg(myo6b:ctbp2a-TagRFP)<sup>idc11Tg</sup> reliably labels mature ribbons, similar to a pan-CTBP immunolabel at 5 dpf (Figure 1-S1B). This transgenic line does not alter the number of hair cells or complete synapses per hair cell (Figure 1-S1A-D). In addition, myo6b:ctbp2a-TagRFP does not alter the size of ribbons (Figure 1-S1E).”

      (2) The examples of puncta colocalizing with microtubules look clear (Figures 1F-G), but the presentation is anecdotal. It would be better and more informative if quantified.

      We did attempt a co-localization analysis between microtubules and ribbons but did not move forward with it due to several issues:

      (1) Hair cells have an extremely crowded environment, especially since the nucleus occupies the majority of the cell. All proteins are pushed together in the small space surrounding the nucleus and ultimately, we found that co-localization analyses were not meaningful because the distances were too small.

      (2) We also attempted to segment microtubules in these images and quantify how many ribbons were associated with microtubules, but 3D microtubule segmentation was not accurate in hair cells due to highly varying filament intensities, filament dynamics and the presence of diffuse cytoplasmic tubulin signal.

      Because of these challenges, we concluded that the best evidence of ribbon-microtubule association is the visualization of ribbons and their association with microtubules over time (in our timelapses). We see that ribbons localize to microtubules in all our timelapses, including the examples shown (Movies S2-S10). The only instance of ribbon dissociation is when ribbons switch from one filament to another. We did not observe free-floating ribbons in our study.

      (3) It appears that any directed transport may be rare. Simply having an alpha > 1 is not sufficient to declare movement to be directed (motor-driven transport typically has an alpha approaching 2). Because of the randomness of a random walk and fitting errors on imperfect data, even movement driven purely by Brownian motion will yield some spread in alpha. Many of the tracks in Figure 3H look as though they might be reasonably fit by a straight line (i.e. alpha = 1).

      (4) The "directed motion" shown here does not really resemble motor-driven transport observed in other systems (axonal transport, for example) even in the subset that has been picked out as examples here. While the role of microtubules and kif1aa in synapse maturation is strong, it seems likely that this role may be something non-canonical (which would be interesting).

      Yes, it is true that directed transport of ribbon precursors is relatively rare. Only a small subset of the ribbon precursors moves directionally (α > 1, 20 %) or has a displacement distance > 1 µm (36 %) during the time windows we are imaging. The majority of the ribbons are stationary. To emphasize this, we have added bar graphs to Figure 3I,K and stated the numbers behind this result more clearly.

      “Upon quantification, 20.2 % of ribbon tracks show α > 1, indicative of directional motion, but the majority of ribbon tracks (79.8 %) show α < 1, indicating confinement on microtubules (Figure 3I, n = 10 neuromasts, 40 hair cells, and 203 tracks).

      To provide a more comprehensive analysis of precursor movement, we also examined displacement distance (Figure 3J). Here, as an additional measure of directed motion, we calculated the percent of tracks with a cumulative displacement > 1 µm. We found 35.6 % of tracks had a displacement > 1 µm (Figure 3K; n = 10 neuromasts, 40 hair cells, and 203 tracks).”
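For readers less familiar with MSD analysis, the two criteria quoted above (α > 1 and displacement > 1 µm) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' pipeline: it assumes each track is a contiguous list of (x, y, z) positions in µm sampled at a fixed frame interval `dt`, fits α as the least-squares slope of log MSD against log lag over the first `max_lag` lags (a hypothetical choice), and reads "cumulative displacement" as the net start-to-end distance. The helper names (`msd_curve`, `fit_alpha`, `classify`) are illustrative only.

```python
import math


def msd_curve(track, max_lag):
    """Time-averaged MSD of one track for lags 1..max_lag (in frames).

    track: contiguous list of (x, y, z) positions in µm, one per frame.
    """
    n = len(track)
    if not 2 <= max_lag < n:
        raise ValueError("need 2 <= max_lag < len(track)")
    curve = []
    for lag in range(1, max_lag + 1):
        # Squared displacement for every pair of frames `lag` apart.
        sq = [sum((a - b) ** 2 for a, b in zip(track[i + lag], track[i]))
              for i in range(n - lag)]
        curve.append(sum(sq) / len(sq))
    return curve


def fit_alpha(curve, dt):
    """Least-squares slope of log(MSD) vs log(tau): alpha in MSD = K * tau**alpha."""
    if any(m <= 0 for m in curve):
        raise ValueError("MSD values must be positive to take logarithms")
    xs = [math.log(lag * dt) for lag in range(1, len(curve) + 1)]
    ys = [math.log(m) for m in curve]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def net_displacement(track):
    """Straight-line distance (µm) between the first and last positions (assumed reading)."""
    return math.dist(track[0], track[-1])


def classify(track, dt, max_lag, min_disp_um=1.0):
    """Apply both criteria: alpha > 1 and net displacement > min_disp_um."""
    alpha = fit_alpha(msd_curve(track, max_lag), dt)
    return {
        "alpha": alpha,
        "directional": alpha > 1.0,
        "displaced": net_displacement(track) > min_disp_um,
    }


# Usage: a hypothetical straight track moving 0.05 µm per frame for 30 frames.
straight = [(0.05 * i, 0.0, 0.0) for i in range(30)]
result = classify(straight, dt=1.0, max_lag=7)
print(result)
```

Because α is a log-log slope, changing `dt` only shifts the x values and leaves α unchanged. A perfectly straight, constant-speed track gives α = 2, which is why, as Reviewer #3 notes, motor-driven transport tends toward 2, while an α only slightly above 1 is a weaker indication of directed motion.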

      We cannot say for certain what is happening with the stationary ribbons, but our hypothesis is that these ribbons eventually exhibit directed motion sufficient to reach the active zone. This idea is supported by the fact that we see ribbons that are stationary begin movement, and ribbons that are moving come to a stop during the acquisition of our timelapses (Movies S4 and S5). It is possible that ribbons that are stationary may not have enough motors attached, or there may be a ‘seeding’ phase where Ribeye aggregates are condensing on the ribbon.

      We also reexamined our MSD α values, as the α values we observed in hair cells were lower than those seen in canonical motor-driven transport (where α approaches 2). One reason for this difference may be the dynamic microtubule network in developing hair cells, which could affect directional ribbon movement. In our revision, we plotted the distribution of α values, which confirmed that in control hair cells the majority of α values are less than 2 (Figure 7-S1A). Interestingly, we also compared the distribution of α values between control and taxol-treated hair cells, where the microtubule network is more stable, and found that the distribution shifted towards higher α values (Figure 7-S1A). We also plotted only ‘directional’ tracks (with α > 1) and observed significantly higher α values in taxol-treated hair cells (Figure 7-S1B). This is an interesting result: although the proportion of directional tracks (with α > 1) is not significantly different between control and taxol-treated hair cells (which could be limited by the number of motor/adapter proteins), the ribbons that move directionally do so with greater velocities when the microtubules are more stable. This supports our idea that the dynamic nature of the microtubule network could explain why ribbon movement does not resemble canonical motor transport. This analysis is presented as a new figure (Figure 7-S1A-B) and is referred to in the results and the discussion.

      Results:

      “Interestingly, when we examined the distribution of α values, we observed that taxol treatment shifted the overall distribution towards higher α values (Figure 7-S1A). In addition, when we plotted only tracks with directional motion (α > 1), we found significantly higher α values in hair cells treated with taxol compared to controls (Figure 7-S1B). This indicates that in taxol-treated hair cells, where the microtubule network is stabilized, ribbons with directional motion have higher velocities.”

      Discussion:

      “Our findings indicate that ribbons and precursors show directed motion indicative of motor-mediated transport (Figure 3 and 7). While a subset of ribbons moves directionally with α values > 1, canonical motor-driven transport in other systems, such as axonal transport, can achieve even higher α values approaching 2 (Bellotti et al., 2021; Corradi et al., 2020). We suggest that relatively lower α values arise from the highly dynamic nature of microtubules in hair cells. In axons, microtubules form stable, linear tracks that allow kinesins to transport cargo with high velocity. In contrast, the microtubule network in hair cells is highly dynamic, particularly near the cell base. Within a single time frame (50-100 s), we observe continuous movement and branching of these networks. This dynamic behavior adds complexity to ribbon motion, leading to frequent stalling, filament switching, and reversals in direction. As a result, ribbon transport appears less directional than the movement of traditional motor cargoes along stable axonal filaments, resulting in lower α values compared to canonical motor-mediated transport. Notably, treatment with taxol, which stabilizes microtubules, increased α values to levels closer to those observed in canonical motor-driven transport (Figure 7-S1). This finding supports the idea that the relatively lower α values in hair cells are a consequence of a more dynamic microtubule network. Overall, this dynamic network gives rise to a slower, non-canonical mode of transport.”

      (5) The effect of acute treatment with nocodazole on microtubules in Movie 7 and Figure 6 is not obvious to me, and it is clear that whatever effect it has on microtubules is incomplete.

      When using nocodazole, we worked to optimize the concentration of the drug to minimize cytotoxicity while still being effective. While the more stable filaments at the cell apex remain largely intact after nocodazole treatment, there are almost no filaments at the hair cell base, unlike in wild-type hair cells. In addition, nocodazole-treated hair cells have more cytoplasmic YFP-tubulin signal compared to wild type. We have clarified this in our results. To better illustrate the effect of nocodazole and taxol, we have also added side-view images of hair cells expressing YFP-tubulin (Figure 4-S1F-G) that highlight cytoplasmic YFP-tubulin and long, stabilized microtubules after 3-4 hr treatment with nocodazole and taxol, respectively. In these images, we also point out microtubules in the apical region of hair cells that are very stable and do not completely destabilize with nocodazole treatment at concentrations that are tolerable to hair cells.

      “We verified the effectiveness of our in vivo pharmacological treatments using either 500 nM nocodazole or 25 µM taxol by imaging microtubule dynamics in pLL hair cells (myo6b:YFP-tubulin). After a 30-min pharmacological treatment, we used Airyscan confocal microscopy to acquire timelapses of YFP-tubulin (3 µm z-stacks, every 50-100 s for 30-70 min, Movie S8). Compared to controls, 500 nM nocodazole destabilized microtubules (presence of depolymerized YFP-tubulin in the cytosol, see arrows in Figure 4-S1F-G) and 25 µM taxol dramatically stabilized microtubules (indicated by long, rigid microtubules, see arrowheads in Figure 4-S1F,H) in pLL hair cells. We did still observe a subset of apical microtubules after nocodazole treatment, indicating that this population is particularly stable (see asterisks in Figure 4-S1F-H).”

      To further address concerns about verifying the efficacy of nocodazole and taxol treatment on microtubules, we added a quantification of our immunostaining data comparing the mean acetylated-α-tubulin intensities between control, nocodazole and taxol-treated hair cells. Our results show that nocodazole treatment reduces the mean acetylated-α-tubulin intensity in hair cells. This is included as a new figure (Figure 4-S1D-E) and this result is referred to in the text. To better illustrate the effect of nocodazole and taxol, we have also added side-view images of hair cells after overnight treatment with nocodazole and taxol (Figure 4-S1A-C).

      “After a 16-hr treatment with 250 nM nocodazole we observed a decrease in acetylated-α-tubulin label (qualitative examples: Figure 4A,C, Figure 4-S1A-B). Quantification revealed significantly less mean acetylated-α-tubulin label in hair cells after nocodazole treatment (Figure 4-S1D). Less acetylated-α-tubulin label indicates that our nocodazole treatment successfully destabilized microtubules.”

      “Qualitatively more acetylated-α-tubulin label was observed after treatment, indicating that our taxol treatment successfully stabilized microtubules (qualitative examples: Figure 4-S1A,C). Quantification revealed an overall increase in mean acetylated-α-tubulin label in hair cells after taxol treatment, but this increase did not reach significance (Figure 4-S1E).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The manuscript is fairly dense. For instance, some information is repeated (page 3: ribbon synapses form along a condensed timeline in zebrafish hair cells: 12-18 hrs; and page 5: These hair cells form 3-4 ribbon synapses in just 12-18 hrs). Perhaps the authors could condense some of the ideas? The introduction could be shortened.

      We have eliminated this repeated text in our revision. We have shortened the introduction from 1,275 to 1,038 words (including references).

      (2) The mechanosensory structure on page 5 is not defined for readers outside the field.

      Great point; we have added additional information to define this structure in the results:

      “We staged hair cells based on the development of the apical, mechanosensory hair bundle. The hair bundle is composed of actin-based stereocilia and a tubulin-based kinocilium. We used the height of the kinocilium (see schematic in Figure 1B), the tallest part of the hair bundle, to estimate the developmental stage of hair cells as described previously…”

      (3) Figure 1E is quite interesting but I'd rather show Figure S1 B/C as they provide statistics. In addition, the authors define 4 stages : early, intermediate, late, and mature for counting but provide only 3 panels for representative examples by mixing late/mature.

      We were torn about which ribbon quantification graph to show. Ultimately, we decided to keep the summary data in Figure 1E. This is primarily because the supplementary figure will be adjacent to the main figure in the eLife format, and the statistics will be easy to find and view.

      Figure 1 now provides a representative image for both late and mature hair cells.

      (4) The ribbon that jumps from one microtubule to another one is eye-catching. Can the authors provide any statistics on this (e.g. percentage)?

      Good point. In our revision, we have added quantification for these events. We observe 2.8 switching events per neuromast during our fast timelapses. This information is now in the text and is also shown in a graph in Figure 3-S1D.

      “Third, we often observed that precursors switched association between neighboring microtubules (2.8 switching events per neuromast, n = 10 neuromasts; Figure 3-S1C-D, Movie S7).”

      (5) With regard to acetyl-α-tub immunocytochemistry, I would suggest obtaining a profile of the fluorescence intensity on a horizontal plane (at the apical part and at the base).

      (6) Same issue with microtubule destruction by nocodazole. Can the authors provide fluorescence intensity measurements to convince readers of microtubule disruption for long and short-term application.

      Regarding quantification of microtubule disruption using nocodazole and taxol: we did attempt to create profiles of the acetylated tubulin or YFP-tubulin label along horizontal planes at the apex and base, but the amount of variability among cells and the angle of the cell in the images made this type of display and quantification challenging. In our revision, as stated above in our response to Reviewer #1’s public comment, we have added representative side-view images to show the disruptions to microtubules more clearly after short- and long-term drug experiments (Figure 4-S1A-C, F-H). In addition, we quantified the reduction in acetylated tubulin label after overnight treatment with nocodazole and found the signal was significantly reduced (Figure 4-S1D-E). Unfortunately, we were unable to do a similar quantification for YFP-tubulin, because its intensity varied with mounting. The following text has been added to the results:

      “Quantification revealed significantly less mean acetylated-α-tubulin label in hair cells after nocodazole treatment (Figure 4-S1D).”

      “Quantification revealed an overall increase in mean acetylated-α-tubulin label in hair cells after taxol treatment, but this increase did not reach significance (Figure 4-S1A,C,E).”

      (7) It is a bit difficult to understand that the long-term (overnight) microtubule destabilization leads to a reduction in the number of synapses (Figure 4F) whereas short-term (30 min) microtubule destabilization leads to the opposite phenotype with an increased number of ribbons (Figure 6G). Are these ribbons still synaptic in short-term experiments? What is the size of the ribbons in the short-term experiments? Alternatively, could the reduction in synapse number upon long-term application of nocodazole be a side-effect of the toxicity within the hair cell?

      Agreed; this is a bit confusing. In our revision, we have changed our analyses so that the short- and long-term experiments are more comparable: in both experiments we examined the number of ribbons and precursors per cell (apical and basal) (see the changed panels in Figure 4G, Figure 4-S2G and Figure 5G). In our live experiments, we cannot be sure that ribbons are synaptic, as we do not have a postsynaptic co-label. Also, we are unable to reliably quantify ribbon and precursor size in our live images due to variability in mounting. We have changed the text to clarify as follows:

      Results:

      “In each developing cell, we quantified the total number of Riba-TagRFP puncta (apical and basal) before and after each treatment. In our control samples, we observed on average no change in the number of Riba-TagRFP puncta per cell (Figure 6G). Interestingly, we observed that nocodazole treatment led to a significant increase in the total number of Riba-TagRFP puncta after 3-4 hrs (Figure 6G). This result is similar to our overnight nocodazole experiments in fixed samples, where we also observed an increase in the number of ribbons and precursors per hair cell. In contrast to our 3-4 hr nocodazole treatment, and similar to controls, taxol treatment did not alter the total number of Riba-TagRFP puncta over 3-4 hrs (Figure 6G). Overall, our overnight and 3-4 hr pharmacology experiments demonstrate that microtubule destabilization has a more significant impact on ribbon numbers than microtubule stabilization.”

      Discussion:

      “Ribbons and microtubules may interact during development to promote fusion, to form larger ribbons. Disrupting microtubules could interfere with this process, preventing ribbon maturation. Consistent with this, short-term (3-4 hr) and long-term (overnight) nocodazole increased ribbon and precursor numbers (Figure 6AG; Figure 4G), suggesting reduced fusion. Long-term treatment (overnight) resulted in a shift toward smaller ribbons (Figure 4H-I), and ultimately fewer complete synapses (Figure 4F).”

      Nocodazole toxicity: in response to Reviewer #2’s public comment, we have added the following text to our discussion:

      Discussion:

      “Another important consideration is the potential off-target effects of nocodazole. Even at non-cytotoxic doses, nocodazole toxicity may impact ribbons and synapses independently of its effects on microtubules. While this is less of a concern in the short- and medium-term experiments (30 min to 4 hr), long-term treatments (16 hrs) could introduce confounding effects. Additionally, nocodazole treatment is not hair cell-specific and could disrupt microtubule organization within afferent terminals as well. Thus, the reduction in ribbon-synapse formation following prolonged nocodazole treatment may result from microtubule disruption in hair cells, afferent terminals, or a combination of the two.”

      (8) Does ribbon motion depend on size or location?

      It is challenging to reliably quantify the actual area of precursors in our live samples, as there is variability in mounting and precursors are quite small. However, we did examine the location of ribbon precursors with motion in the cell (using tracks > 1 µm, as these tracks can easily be linked to cell location in Imaris). We found evidence of ribbons with tracks > 1 µm throughout the cell, both above and below the nucleus. This is now plotted in Figure 3M. We have also added the following text to the results:

      “In addition, we examined the location of precursors within the cell that exhibited displacements > 1 µm. We found that 38.9 % of these tracks were located above the nucleus, while 61.1 % were located below the nucleus (Figure 3M).”

      Although this is not an area or size measurement, this result suggests that both smaller precursors that are more apical, and larger precursors/ribbons that are more basal all show motion.

      (9) The fusion event needs to be analyzed in further detail: when one ribbon precursor fuses with another one, is there an increase in size or intensity (this should follow the law of mass conservation)? This is important to support the abstract sentence "ribbon precursors can fuse together on microtubules to form larger ribbons".

      As mentioned above, it is challenging to accurately estimate the absolute size or intensity of ribbon precursors in our live preparation. However, we did examine whether there is a relative increase in area after ribbons fuse. We have plotted the change in area (within the same samples) for the two fusion events shown in Figure 8-S1A-B. In these examples, the area of the puncta after fusion is larger than that of either of the two precursors that fuse. Although the areas are not additive, these plots do provide some evidence that fusion acts to form larger ribbons. To accompany these plots, we have added the following text to the results:

      “Although we could not accurately measure the areas of precursors before and after fusion, we observed that the relative area resulting from the fusion of two smaller precursors was greater than that of either precursor alone. This increase in area suggests that precursor fusion may serve as a mechanism for generating larger ribbons (see examples: Figure 8-S1A-B).”

      Because we were unable to provide more accurate evidence of precursor fusion resulting in larger ribbons, we have removed this statement from our abstract and lessened our claims elsewhere in the manuscript.

      (10) The title in Figure 8 is a bit confusing. If fusion events reflect ribbon precursors fusion, it is obvious it depends on ribbon precursors. I'd like to replace this title with something like "microtubules and kif1aa are required for fusion events"

      We have changed the figure title as suggested, good idea.

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 1C. The purple/magenta colors are hard to distinguish.

      We have made the magenta color much lighter in the Figure 1C to make it easier to distinguish purple and magenta.

      (2) There are places where some words are unnecessarily hyphenated. Examples: live-imaging and hair-cell in the abstract, time-course in the results.

      In our revision, we have done our best to remove unnecessary hyphens, including the ones pointed out here.

      (3) Figure 4H and elsewhere - what is "area of Ribeye puncta?" Related, I think, in the Discussion the authors refer to "ribbon volume" on line 484. But they never measured ribbon volume so this needs to be clarified.

      We have done our best to clarify what is meant by area of Ribeye puncta in the results and the methods:

      Results:

      “We also observed that the average area of individual Ribeyeb puncta (from 2D max-projected images) was significantly reduced compared to controls (Figure 4H). Further, the relative frequency of individual Ribeyeb puncta with smaller areas was higher in nocodazole-treated hair cells compared to controls (Figure 4I).”

      Methods:

      “To quantify the area of each ribbon and precursor, images were processed in FIJI with the macro ‘IJMacro_AIRYSCAN_simple3dSeg_ribbons only.ijm’ as previously described (Wong et al., 2019). Here, each Airyscan z-stack was max-projected. A threshold was applied to each image, followed by segmentation to delineate individual Ribeyeb/CTBP puncta. The watershed function was used to separate adjacent puncta. A list of 2D objects of individual ROIs (minimum size filter of 0.002 μm²) was created to measure the 2D area of each Ribeyeb/CTBP punctum.”
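The area measurement in the Methods quote follows a common max-project, threshold, label and size-filter pattern. The plain-Python sketch below illustrates that pattern under stated assumptions; it is not the FIJI macro itself. It assumes a fixed intensity threshold (the macro's thresholding method is not given here), 4-connected objects and a hypothetical `pixel_size_um`, and it omits the watershed step, so touching puncta would be counted as one object.

```python
from collections import deque


def max_project(stack):
    """Maximum-intensity projection of a z-stack given as stack[z][row][col]."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[max(plane[r][c] for plane in stack) for c in range(cols)]
            for r in range(rows)]


def puncta_areas(stack, threshold, pixel_size_um, min_area_um2=0.002):
    """2D areas (µm²) of 4-connected above-threshold objects in the max projection.

    No watershed: adjacent puncta that touch are merged into one object.
    """
    img = max_project(stack)
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    pixel_area = pixel_size_um ** 2
    areas = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or img[r0][c0] < threshold:
                continue
            # Breadth-first flood fill collects one connected object.
            seen[r0][c0] = True
            queue, count = deque([(r0, c0)]), 0
            while queue:
                r, c = queue.popleft()
                count += 1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and img[nr][nc] >= threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            area = count * pixel_area
            if area >= min_area_um2:  # minimum size filter from the Methods
                areas.append(area)
    return areas
```

With a hypothetical 0.04 µm pixel (0.0016 µm² per pixel), the 0.002 µm² filter discards single-pixel objects, so the size filter mainly removes noise rather than real puncta.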

      We did refer to ribbon volume once in the discussion, but volume is not reflected in our analyses, so we have removed this mention of volume.

      (4) More validation data showing gene/protein removal for the crispants would be helpful.

      Great suggestion. As this is a relatively new method, we have created a figure that outlines how we genotyped each individual crispant animal analyzed in our study (Figure 6-S1). In the methods, we have also added the following information:

      “fPCR fragments were run on a genetic analyzer (Applied Biosystems, 3500XL) using LIZ500 (Applied Biosystems, 4322682) as a dye standard. Analysis of this fPCR revealed an average peak height of 4740 a.u. in wild type, and an average peak height of 126 a.u. in kif1aa F0 crispants (Figure 6-S1). Any kif1aa F0 crispant without robust genomic cutting or a peak height > 500 a.u. was not included in our analyses.”

      Reviewer #3 (Recommendations For The Authors):

      Lines 208-209--should refer to the movie in the text.

      Movie S1 is now referenced here.

      It would be helpful if the authors could analyze and quantify the effect of nocodazole and taxol on microtubules (Movie 7).

      See responses above to Reviewer #1’s similar request.

      The Figure 7 caption says "500 mM" nocodazole.

      Thank you, we have changed the caption to 500 nM.

      One problem with the MSD analysis is that it is dependent upon fits of individual tracks that lead to inaccuracies in assigning diffusive, restricted, and directed motion. The authors might be able to get around these problems by looking at the ensemble averages of all the tracks and seeing how they change with the various treatments. Even if the effect is on a subset of ribeye spots, it would be reassuring to see significant effects that did not rely upon fitting.

      We are hesitant to average the MSD tracks, as not all tracks have the same number of time steps (ribbons move in and out of the z-stack during the timelapse). This makes it challenging for us to compute accurate ensemble averages across all tracks, especially over the full duration of the timelapse. This is the main reason why we added another analysis, displacements > 1 µm, as a readout of directional motion that does not rely upon fitting.
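The reviewer's ensemble-averaging suggestion, and the concern about unequal track lengths, can both be seen in a short sketch. This is a hypothetical illustration, not an analysis from the manuscript: it assumes each track is a contiguous list of (x, y, z) positions (no gaps where a ribbon leaves the z-stack) and pools the squared displacements from every track at each lag, reporting how many displacement pairs contribute.

```python
def ensemble_msd(tracks, max_lag):
    """Pooled MSD across tracks of unequal length.

    tracks: list of contiguous tracks, each a list of (x, y, z) positions.
    Returns (lag, mean squared displacement, n_pairs) for every lag in
    1..max_lag that at least one track is long enough to contribute to.
    """
    sums = [0.0] * (max_lag + 1)
    counts = [0] * (max_lag + 1)
    for track in tracks:
        # A track of n frames can only contribute to lags up to n - 1.
        for lag in range(1, min(max_lag, len(track) - 1) + 1):
            for i in range(len(track) - lag):
                sums[lag] += sum((a - b) ** 2
                                 for a, b in zip(track[i + lag], track[i]))
                counts[lag] += 1
    return [(lag, sums[lag] / counts[lag], counts[lag])
            for lag in range(1, max_lag + 1) if counts[lag] > 0]
```

The `n_pairs` column makes the concern concrete: short lags pool displacements from every track, whereas long lags are supported only by the few longest tracks, so the tail of an ensemble MSD curve can be dominated by a small and possibly unrepresentative subset.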

      The abstract states that directed movement is toward the synapse. The only real evidence for this is a statement in the results: "Of the tracks that showed directional motion, while the majority move to the cell base, we found that 21.2 % of ribbon tracks moved apically." A clearer demonstration of this would be to do the analysis of Figure 2G for the ribeye aggregates.

      It was not possible to apply the same analysis to ribbon tracks that we applied to the EB3-GFP data in Figure 2. In Figure 2, we did a 2D tracking analysis and measured the relative angles in 2D. In contrast, ribbon tracking was done in 3D in Imaris, where it was not possible to obtain angles in the same way. Further, the MSD analysis was performed outside of Imaris, making it extremely difficult to link ribbon trajectories to the 3D cellular landscape in Imaris. Instead, we examined the direction of the 3D vectors in Imaris for tracks > 1 µm and determined the direction of the motion (apical, basal or undetermined). For clarity, these data are now included as a bar graph in Figure 3L. In our results, we have clarified the results of this analysis:

      “To provide a more comprehensive analysis of precursor movement, we also examined displacement distance (Figure 3J). Here, as an additional measure of directed motion, we calculated the percent of tracks with a cumulative displacement > 1 µm. We found 35.6 % of tracks had a displacement > 1 µm (Figure 3K; n = 10 neuromasts, 40 hair cells and 203 tracks). Of the tracks with displacement > 1 µm, the majority of ribbon tracks (45.8 %) moved to the cell base, but we also found a subset of ribbon tracks (20.8 %) that moved apically (33.4 % moved in an undetermined direction) (Figure 3L).”
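The apical/basal/undetermined labelling quoted above could be approximated from each track's net 3D vector. The rule below is an assumption for illustration only: the response does not state how the Imaris vectors were assigned, so the `apical_to_basal_axis` vector and the 45° cone used to call a direction are hypothetical parameters, and the helper names are illustrative.

```python
import math


def direction(start, end, apical_to_basal_axis, cone_deg=45.0):
    """Label a track 'basal', 'apical' or 'undetermined' from its net vector.

    A track is called basal (apical) if its net displacement lies within
    cone_deg of the apical-to-basal axis (or of the opposite direction).
    """
    v = [e - s for s, e in zip(start, end)]
    norm_v = math.sqrt(sum(x * x for x in v))
    norm_a = math.sqrt(sum(x * x for x in apical_to_basal_axis))
    if norm_v == 0 or norm_a == 0:
        return "undetermined"
    cos_theta = sum(x * y for x, y in zip(v, apical_to_basal_axis)) / (norm_v * norm_a)
    limit = math.cos(math.radians(cone_deg))
    if cos_theta >= limit:
        return "basal"
    if cos_theta <= -limit:
        return "apical"
    return "undetermined"


def percentages(labels):
    """Percent of tracks per label, rounded to one decimal place."""
    if not labels:
        return {}
    n = len(labels)
    return {k: round(100.0 * labels.count(k) / n, 1)
            for k in ("basal", "apical", "undetermined")}
```

A narrower cone would move more tracks into the undetermined class, so the reported percentages depend on this threshold as much as on the tracks themselves.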

      Some more detail about the F0 crispants should be provided. In particular, what degree of cutting was observed and what was the criteria for robust cutting?

      See our response to Reviewer 2 and the newly created Figure 6-S1.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      1. General Statements [optional]

      *We would like to thank all the reviewers for their positive comments and valuable feedback. In addition, we would like to address Reviewer 1's query on novelty, which was not raised by the other two reviewers. Our study uncovered two main aspects of hypoxia biology: first, we addressed the contribution of NF-kappaB to the transcriptome changes in hypoxia; second, this revealed a previously unknown aspect, namely that NF-kappaB is required for gene repression in hypoxia. While we know a lot about gene induction in hypoxia, much less is known about the repression of genes. When cells need to conserve energy, gene repression is as important as gene induction. *


      2. Point-by-point description of the revisions



      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The work from Shakir et al. uses different cell line models to investigate the role of NF-kB in the transcriptional adaptation of cells to hypoxia, which is a relevant question. In addition, the manuscript contains a large amount of data that could be of interest and even useful for researchers in the field of hypoxia and NF-kB. However, in my opinion, there are several concerns that should be revised and additional experiments that could be included to strengthen the relevance of the work.

      We thank this reviewer for their positive comments.

      Specific issues: In Figure 1A, the authors examine which of the genes induced by hypoxia require NF-kB by RNA sequencing analysis of cells knocked down for specific NF-kB subunits and exposed to hypoxia for 24 hours. The knockdown is about 40-60% at the RNA level, but it would be helpful to show the effect of knockdown at the protein level.

      We agree with this and have added Western blot data (Sup. Figure S1F), which shows the effects of the siRNA are much more pronounced at the protein level.

      All the data regarding genes induced by hypoxia in control or NF-kB siRNA-treated cells are somewhat confusing. If I understand correctly, when the data from the three different siRNAs are crossed, only 1070 genes are upregulated and 295 are downregulated in an NF-kB-independent manner. If this is the case, I think it would be easier to use this information in Figure 2 to define the hypoxia-induced genes that are NF-kB-dependent by simply considering those induced in the control that are not in the NF-kB-independent subset (rather than repeating the integration of the data without additional explanation). If the authors do this simple analysis, are the resulting genes the same or similar? In any case, the way these numbers are obtained should be shown more clearly (i.e., a new Venn diagram showing genes up- or down-regulated in the siRNA control that are not up- or down-regulated in any of the siRNA-NF-kB treatments).

      Figure 1 shows the effects of hypoxia on gene expression in control and NF-κB subunit-depleted cells compared to normoxia control cells. Figures 1F/1G compare genes up/downregulated in hypoxia when RelA, RelB, and cRel are depleted, compared to normoxia control. Figure 1 does not display NF-κB-dependent/independent hypoxia-responsive genes, but rather the overall effect of siRNA control and siNF-κB treatments in hypoxia, compared to siRNA control in normoxia. Figure 2 then defines NF-κB-dependent and independent hypoxia-responsive genes. We actually define these exactly as the reviewer suggested and agree that we should show more clearly how these numbers are obtained. We have added the suggested Venn diagrams (Sup. Figure S2) and added extra information to the methods section (page 5 of revised manuscript). We felt it was important to show all the data upfront in Figure 1 and then integrate and focus on NF-κB-dependent hypoxia-induced genes in Figure 2.
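      The set logic behind the NF-κB-dependent/independent split can be sketched in a few lines. This is an illustration under one stated assumption, not the authors' pipeline: a gene counts as NF-κB-independent if it is still changed in hypoxia in all three Rel knockdowns, and as dependent on a given subunit if its change is lost when that subunit is depleted. The function name `split_by_nfkb_dependence` and the gene IDs are hypothetical placeholders.

```python
def split_by_nfkb_dependence(changed, control="siControl"):
    """Split hypoxia-changed genes by NF-kB dependence (hypothetical helper).

    changed maps condition name -> set of genes changed in hypoxia versus
    the normoxia siControl baseline. Assumption: "independent" means the
    gene is still changed in every Rel knockdown.
    """
    ctrl = changed[control]
    knockdowns = {name: genes for name, genes in changed.items() if name != control}
    # Still changed despite every Rel knockdown -> NF-kB-independent.
    independent = ctrl.intersection(*knockdowns.values())
    # Changed in control but lost in at least one knockdown -> NF-kB-dependent.
    dependent = ctrl - independent
    # Subunit-level view: genes whose change is lost when that subunit is depleted.
    per_subunit = {name: ctrl - genes for name, genes in knockdowns.items()}
    return independent, dependent, per_subunit


# Placeholder gene IDs, not results from the study.
hypoxia_up = {
    "siControl": {"GENE_1", "GENE_2", "GENE_3", "GENE_4", "GENE_5"},
    "siRelA": {"GENE_1", "GENE_2", "GENE_4"},
    "siRelB": {"GENE_1", "GENE_2", "GENE_3"},
    "siCRel": {"GENE_1", "GENE_2", "GENE_3", "GENE_4"},
}
independent, dependent, per_subunit = split_by_nfkb_dependence(hypoxia_up)
print(sorted(independent), sorted(dependent))
```

      Running the same function on the downregulated gene sets gives the repression side; the per-subunit sets are what a Venn diagram of the kind requested by the reviewer would display.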

      Figure 2H shows that approximately 80% of the NF-kB-dependent genes up- or down-regulated in hypoxia were identified as RelA targets, which is statistically significant compared to RelB or cRel targets. However, what is the proportion of genes identified as RelA targets in the subset of NF-kB-independent hypoxia-induced genes? And in a randomly selected set of 500-600 genes? In my opinion, this statistical analysis should be included to demonstrate a relationship between NF-kB recruitment and hypoxia-induced upregulation (expected) and downregulation (unexpected). In this context, it is surprising that HIF consensus sites are preferentially detected in the genes that are supposed to be NF-kB dependent instead of RelA consensus.

      We thank the reviewer for this question, which is really helpful. We now realize that the way we displayed the stars on the graph in Figure 2H was slightly misleading, and we have amended the graph. RelA-, RelB-, and cRel-bound genes (from the ChIP atlas) are all significantly enriched within our NF-κB-dependent hypoxia-responsive genes; there is no statistical testing between RelA-bound vs RelB-bound or cRel-bound genes. We have also performed this analysis on the NF-κB-independent hypoxia-responsive genes and see the same trend (Sup. Figure S5B). This indicates that the enrichment of Rel binding sites from the ChIP atlas is not specific to NF-κB-dependent hypoxia-responsive genes. We have moved Figure 2H to Sup. Figure S5A and amended our description of this result, including it as a limitation of the study. This showcases how DNA binding does not necessarily imply functionality.

      Figure 3 is just a confirmation by qPCR of the data obtained in the RNA-seq analysis, which in my opinion should be included as supplementary information. Moreover, both the effects of hypoxia and reversion by RelB siRNA are modest in several of the genes tested. The same is true for Figures 4 and 5 with very modest and variable results across cell types and genes.

      We appreciate this comment; we would like to keep this as a main figure for full transparency and show validation of our RNA-sequencing results.

      Figure 6 shows the effect of NF-kB knockdown on the induction of ROS in response to hypoxia. In the images provided, the effect of hypoxia is minimal in control cells, with the only clear differences shown in RelA-depleted cells.

      The quantification of the IF data (Figure 6B) shows ROS induction in hypoxia, which is reduced in Rel-depleted cells, with RelA depletion having the strongest effect. ROS generation in hypoxia, although counterintuitive, is well documented and used for important signalling events. We believe our data support the previously reported levels of ROS induction in hypoxia (reviewed in {Alva, 2024}) and, importantly, show that NF-κB depletion can at least partially reverse this.

      In 6B it is not clear what the three asterisks in the normoxia control represent (compared to the hypoxia siRNA control?). This should be clarified in the figure legend or text.

      We apologize for the lack of clarity; we have now added this information to the figure legend.

      In the Western blot of 6C, there are no differences in the levels of SOD1 after RelA depletion. Again, there is no reason not to include the NF-kB subunits in the Western blot analysis.

      We have added the Western blot analysis to this figure; we had initially omitted it to simplify the figure. Although depletion of RelA does not rescue the hypoxia-induced repression of SOD1, depletion of RelB does. Furthermore, cRel depletion shows a trend towards rescue of this effect, although this is not statistically significant (see Figure 6C-D).

      Finally, regarding Figure 7, the authors mention that "we confirmed that hypoxia led to a reduction in several proteins represented in this panel (of proteins involved in oxidative phosphorylation), such as UQCRC2 and IDH1 (Figure 7A-B)". The authors cannot say this because it is not seen in the Western blot in 7A or in the quantification shown in 7B. In my personal opinion, stating something that is not even suggested in the experiments is very negative for the credibility of the whole message.

      We really do not agree with this comment. We do see reductions in the levels of the proteins we mentioned. We have made the figure less complex given that some proteins are very abundant while others are not. We hope the changes are now clear and apparent. We have changed the quantification normalisation to reflect this as well and modified our description of the results, see Figure 7 and Sup. Figure S18.

      In conclusion, this paper contains a large amount of relevant information, but i) non-essential data should be moved to Supplementary, ii) protein levels of relevant players need to be shown in addition to RNA, iii) minimal or undetectable differences need to be considered as no-differences, and iv) a model showing what is the interpretation of the data provided is needed to better understand the message of the paper. I mean, is it p65 or RelB binding to some of these genes leading to their activation or repression, or is it RelA or RelB inducing HIF1beta leading to NF-kB-dependent gene activation by hypoxia? If this were the case, experimental evidence that NF-kB regulates a subset of hypoxia genes through HIF1beta would make the story more understandable.

      We apologise, but we do not know why the reviewer mentions HIF1beta. For gene induction, there is cooperation with the HIF system for some genes but not all. The most interesting and unexpected finding is that NF-kappaB is required for gene repression in hypoxia. We have added a new figure investigating how HDAC inhibition could reverse this repression, a mechanism known to be employed by NF-kappaB when repressing genes. We have added all the blots for NF-kB, clarified the quantification, and included other approaches, including CRISPR KO cell lines for both IKKs. We hope this is now clear.

      Reviewer #1 (Significance (Required)):

      The work presented here is interesting but does not provide a major advance over previous publications, the main message being that a subset of hypoxia-regulated genes are NF-kB dependent. However, there is no mechanistic explanation of how this regulation is achieved and several data that are not clearly connected. A more comprehensive analysis of the data and additional experimental validation would greatly enhance the significance of the work.

      We politely disagree with the reviewer. Our main finding is that NF-κB does play an important role in gene regulation in hypoxia but, unexpectedly, this occurs mostly via gene repression. While there is vast knowledge on gene induction in hypoxia, gene repression, which typically does not occur directly via HIF, is virtually unknown. A previous study had identified Rest as a transcriptional repressor {PMID: 27531581}, but this could only account for 20% of gene repression. Our findings reveal that up to 60% of genes repressed in hypoxia require NF-κB; hence this is a significant finding and a major advance over previous knowledge. Furthermore, we feel this paper is an excellent data resource for the field, as it is, to our knowledge, the first study characterising the extent to which NF-κB is required for hypoxia-induced gene changes on a transcriptome-wide scale. We have also validated this across multiple cell types and used different approaches to investigate the role of NF-κB in the hypoxia transcriptional response. We are happy that the other reviewers agree with our novel findings.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this study, the authors have interrogated the role of NF-kappaB in the cellular transcriptional response to hypoxia. While HIF is considered the master regulator of the cellular response to hypoxia, it has long been known that multiple transcription factors also play a role, both independently of HIF and through the regulation of HIF-1alpha levels. Chief amongst these is NF-kappaB, a regulator of cell death and inflammation, amongst other things. While NF-kappaB has been known to be activated in hypoxia through altered PHD activity, the impact of this on global gene expression has remained unclear, and this study addresses this important question. Of particular interest, genes downregulated in hypoxia appear to be repressed in an NF-kappaB-dependent manner. Overall, this nice study reveals an important role for NF-kappaB in the control of the global cellular transcriptional response to hypoxia.

      We thank this reviewer for their positive comments.

      Reviewer #2 (Significance (Required)):

      Some questions for the authors to consider with experiments or discussion: - One caveat of the current study that should be discussed is that, while interesting and extensive, the analysis is restricted to cancer cell lines, which have dysfunctional gene expression systems that may differ from "normal" cells.

      We thank the reviewer for these comments. This is indeed an important aspect, which we now expand on in the discussion section. We also took advantage of RNA-seq datasets for HUVECs (a non-transformed cell line) in response to hypoxia (Sup. Figure S15) and to TNF-alpha with and without RelA depletion (Sup. Figure S16). These data support our finding that in hypoxia NF-kB is important for transcriptional repression, with some contribution to gene induction, even in a non-transformed cell system.

      In the publicly available data sets analyzed, were the same hypoxic conditions used as in this study? This information should be included.

      We apologize if this was not clear: the hypoxia RNA-seq studies use the same oxygen level and time (1%, 24 hours), as stated in the legend of Figure 4A and Sup. Figure S9 and in Sup. Table S2. We have now also added this information to the main text.

      • What is known about NF-kappaB as a transcriptional repressor in other systems such as the control of cytokine or infection driven inflammation? This is briefly discussed but should be expanded. This is important as a key question in the study of hypoxia is what regulates gene repression.

      We have included this in the discussion and also analysed available data in HUVECs in response to cytokine stimulation with and without RelA depletion (Sup. Figure S16). This analysis revealed equal importance of RelA for activation and repression of genes upon TNF-alpha stimulation. Around 40% of genes require RelA for their induction or repression in response to TNF-alpha. In the discussion, we have also included other references where NF-kappaB has been found to repress genes.

      NF-kappaB has previously been shown to regulate HIF-1alpha transcription. What are the effects of NF-kappaB subunit siRNAs on basal HIF-1alpha transcription? In figure 7, it appears that NF-kappaB subunit siRNA is without effect on hypoxia-induced HIF protein expression. Could this account for some of the effects of NF-kappaB depletion on the hypoxic gene signature? This point needs to be clarified in light of the data presented.

      We have included data for HIF-1α RNA levels in HeLa cells with/without NF-κB depletion followed by 24 hours of hypoxia (Sup. Figure S20), and we see a small reduction (~10-20%). The reviewer is correct: there was not much effect of NF-κB depletion on HIF-1α protein levels following 24 hours of hypoxia in HeLa cells. Effects of NF-kappaB depletion are usually observed with shorter hypoxia exposures or when more than one subunit is depleted at the same time. We have added this as a discussion point in the revised manuscript.

      NRF-2 is a key cellular sensor of oxidative stress, in a similar way to HIF being a hypoxia sensor. The authors demonstrate using a dye that ROS are paradoxically increased in hypoxia (a more controversial finding than the authors present). It would be of interest to know if NRF-2 is induced in hypoxia as a marker of cellular oxidative stress. Similarly, it would be interesting to determine by metabolic analysis whether oxidative phosphorylation (O2 consumption) is decreased as the transcriptional signature would suggest (although the difficulty of performing metabolic analysis in hypoxia is acknowledged).

      To investigate if NRF2 is induced, we performed a western blot at 0, 1, and 24 hours of 1% oxygen, but did not see any induction of NRF2 protein levels (Sup. Figure S17A). We also overlapped our hypoxia-upregulated genes with NRF2 target genes from {PMID:24647116 and PMID: 38643749} (Sup. Figure S17B) and found limited evidence of NRF2 target genes being induced. Based on these findings, it seems that NRF2 is not induced in hypoxia, at least not at the hypoxia level/time point we have analysed. We also agree it would be ideal to measure oxygen consumption in hypoxia, but unfortunately, we do not have the technical ability to do this at present.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Strengths This manuscript attempts to integrate multiple strands of data to determine the role of NFkB in hypoxia -induced gene expression. This analysis looks at multiple NFkB subunits in multiple cell lines to convincingly demonstrate that NFkB does indeed play a central role in the regulation of hypoxia-induced gene expression. This broad approach integrates new experimental data with findings from the published literature.

      A significant amount of work has been performed both experimentally and bioinformatically to test experimental hypotheses.

      We thank this reviewer for their positive comments.

      Limitations

      The main analysis in the paper involves comparing the impact of knocking down different NFkB family members in hypoxia and comparing transcriptional responses. I am surprised that the authors did not include the impact of knockdown of the NFkB family members in normoxia too. The absence of these control experiments allows us to understand the role of NFkB in hypoxia, but does not give us information as to how many of those impacts are specific/ induced in hypoxic conditions. i.e. many of the observed effects of NFkB knockdown could be due to basal suppression of NFkB target genes that happen to be hypoxia sensitive. This finding is obviously important, but it would be nice to know how many of those genes are only / preferentially regulated by NFkB in hypoxia. This would give a much deeper insight into the role of NFkB in hypoxia induced gene expression.

      We agree this would have been ideal. For financial reasons, we limited our analysis to hypoxia samples. We have performed qPCR analysis depleting RelA, RelB and cRel under normal oxygen conditions in HeLa cells (Sup. Figure S8). We find that the majority of the validated genes in HeLa cells that require NF-κB for gene changes in hypoxia are not regulated by NF-κB under normal oxygen conditions. We have also added this limitation to our discussion section.

      The broad experimental approach while a strength of the paper in many ways also has its limitations e.g. Motif analysis revealing e.g. HIF-1a binding site enrichment in RelA and RelB-dependent DEGs is correlative observation and does not prove HIF involvement in NFkB-dependent hypoxia induced gene activation. Comparing responses with responses seen in one cell type with responses that have been described in a database comprised of many studies in a variety of different cells also has some limitations. These points can be described more fully in the discussion

      We agree these are mere correlations and hence a limitation, and we have not formally tested the involvement of HIF. We have included this in the discussion as suggested. For the HIF binding site correlation, we also compare to HIF ChIP-seq in HeLa cells exposed to 1% oxygen, albeit at 8 hours and not 24 hours (Sup. Figure S4).

      For siRNA transfections, single oligonucleotide sequences were used for RelA, RelB and cRel. This increases the potential likelihood of 'off targets' compared to pooled oligos delivered at lower concentrations. This limitation should at least be mentioned.

      We agree and have now included this as a limitation in the discussion section. We have now also included analysis using wild-type and two different IKKα/β double KO CRISPR cell lines generated in a previous publication {PMID: 35029639}. Out of the 9 genes we identified as NF-κB-dependent hypoxia-upregulated genes from HeLa cell RNA-seq and validated by qPCR that are also hypoxia-responsive in HCT116 cells (Sup. Figure S11D), 6 displayed NF-κB dependence in HCT116 cells (Sup. Figure S14). We also provide new protein data in this cell system for oxidative phosphorylation markers, which show, as with the siRNA depletion, rescue of the repression of these proteins when NF-κB is inactivated.

      RNA-seq experiments are performed on n=2 data which means relatively low statistical power. How has the statistical analysis been performed on normalised counts (corresponding to 2 n- numbers) to yield statistical significance? I am not familiar with hypergeometric tests - please justify their use here.

      We use DESeq2 for differential expression analysis and filter for effect size (|log2 fold change| > 0.58) and statistical significance (FDR …).
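      The two-part filter described here (effect size plus FDR) can be illustrated with a short sketch. DESeq2 itself reports Benjamini-Hochberg adjusted p-values as `padj`; the Python below re-implements that adjustment only to make the filtering explicit, and does not model DESeq2's independent filtering or its dispersion estimation. The FDR cutoff of 0.05 is an assumption, because the threshold is not legible in the response; the helper names and gene values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


LFC_CUTOFF = 0.58  # |log2 fold change| > 0.58, roughly a 1.5-fold change
FDR_CUTOFF = 0.05  # assumed value; not legible in the response


def filter_degs(results, lfc_cutoff=LFC_CUTOFF, fdr_cutoff=FDR_CUTOFF):
    """Keep genes passing both cutoffs; results maps gene -> (log2FC, raw p)."""
    genes = list(results)
    padj = benjamini_hochberg([results[g][1] for g in genes])
    return {g for g, q in zip(genes, padj)
            if abs(results[g][0]) > lfc_cutoff and q < fdr_cutoff}


# Placeholder values, not data from the study.
results = {
    "GENE_1": (2.0, 0.001),
    "GENE_2": (0.3, 0.0001),  # significant but below the effect-size cutoff
    "GENE_3": (-1.2, 0.002),  # downregulated genes pass on |log2FC|
    "GENE_4": (1.5, 0.2),     # large effect but not significant
}
print(sorted(filter_degs(results)))
```

      Requiring both criteria means that, with only two replicates, a gene needs a consistent and reasonably large change to be called, which limits calls driven by tiny but noisy differences.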

      The hypergeometric test (equivalent to a one-sided Fisher's exact test) is routinely used to determine whether the observed overlap between two gene lists is statistically significant compared to what would be expected by chance. It is also the statistical test of choice for popular bioinformatics tools that perform over-representation analysis (ORA) to see which gene sets/groups/pathways/ontologies are over-represented in a gene list; examples include Metascape, clusterProfiler, WebGestalt (used in this study), and gProfiler.
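      As a concrete illustration of the test described above, the sketch below computes the upper-tail hypergeometric p-value for a gene-list overlap using exact integer binomial coefficients (Python's `math.comb`). It is a generic implementation, not the code used by WebGestalt or the authors, and the numbers in the usage line are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """One-sided over-representation p-value, P(X >= k).

    N: size of the background gene universe
    K: genes in the universe carrying the annotation (e.g. ChIP-bound)
    n: size of the query gene list
    k: observed overlap between the query list and the annotated set
    Equivalent to a one-sided (greater) Fisher's exact test on the 2x2 table.
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


def fold_enrichment(k, N, K, n):
    """Observed overlap divided by the overlap expected by chance (n * K / N)."""
    return k / (n * K / N)


# Hypothetical numbers for illustration only: 20,000 background genes,
# 2,000 carrying a ChIP peak, a 500-gene query list overlapping in 80 genes.
p = hypergeom_sf(80, 20000, 2000, 500)
print(f"fold enrichment = {fold_enrichment(80, 20000, 2000, 500):.2f}, p = {p:.2e}")
```

      The choice of background universe (for example, all expressed genes rather than the whole genome) changes N and K and therefore the p-value, so it is worth stating alongside any enrichment result.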

      P14 RelB is described as having the most widespread impact of hypoxia dependent gene changes across all cell systems tested. Could this be due to a more potent silencing of RelB and / or due to particularly high/ low expression of RelB in these cells in general?

      This is an excellent point. At the RNA level, RelB depletion is slightly more efficient (Sup. Figure S1), while at the protein level silencing is highly potent with all three siRNAs (Sup. Figure S1). We looked at the RNA levels of RelA, RelB and cRel in HeLa cells under basal conditions: RelA shows the highest abundance, while RelB and cRel have similar expression levels (see below). However, RelB is very dynamic in response to hypoxia, something we have observed but have not yet published.

      P18 For western blot analysis best practise is to have 2 MW markers per blot presented

      We have added the second MW marker as suggested.

      For quantification, I suggest avoiding performing statistical analysis on semi-quantitative data unless a dynamic range of detection (with standards) has been fully established.

      We agree that this has many limitations; we have kept the quantification but moved it into the supplementary information.

      P19 There is clearly an effect of reciprocal silencing with the NFkB knockdown experiments ie. siRelA affects RelB levels in hypoxia and vice versa. The implications of this for data interpretation should be discussed.

      Indeed, it is well known that RelB and cRel are RelA targets. Less is known about RelA, as it is not a known NF-κB target. We have added this to the discussion in the revised manuscript.

      P20 The literature could be better cited in relation to RelB and hypoxia. A brief search reveals a few papers that should be mentioned/discussed: Oliver et al. 2009, Patel et al. 2017, Riedl et al. 2021.

      We have looked into these suggestions. Oliver et al. refer to hypercapnia, not hypoxia, and the other two only briefly mention RelB, without bearing on the goals of their studies. We have tried to incorporate what is currently known as much as possible.

      I suggest leaving out mention of IkBa sumoylation and supplementary figure 10. I'm not sure the data in the paper as a whole merits focus on this very specific point.

      We thank the reviewer for this suggestion and we have removed this aspect from the manuscript.

      There is a very strong reliance on mRNA and TPM data. Some additional protein data in support of key findings will enhance the manuscript.

      We have added additional protein level analysis where we could obtain antibodies, see Figures 6, 7 and Sup. Figures S17, S18, and S19 for our protein level analysis.

      A graphical abstract summarising key findings with exemplar genes highlighted will enhance the manuscript.

      We have added a model to summarise our findings as suggested.

      Both HIF and NFKB are ancient, evolutionarily conserved pathways. Can lessons be learned from evolutionary biology as to how NFkB regulation of hypoxia-induced genes occurred? Does the HIF pathway pre-date the NFkB pathway or vice versa? This approach could be valuable in supporting the findings from this study.

      We have investigated this. Unfortunately, very little data are available on hypoxia gene expression in lower organisms. However, we have added a few sentences on the evolution of NF-κB and HIF.

      Minor comments P2 please briefly explain how 5 genes give rise to 7 proteins

      We have added this to the introduction as requested.

      P2 there seems to be some recency bias in the studies cited as being associated with NFkB activation in response to hypoxia. Mention of Koong et al (1994) and Taylor et al (1999) and other early papers in the field will enhance

      We have added these as suggested.

      P3 The role of PHD enzymes in the regulation of NFkB in hypoxia can be introduced and / or discussed

      We have added a reference to this aspect as suggested.

      P8 I suggest use of proportional Venn diagrams to demonstrate the patterns more clearly

      We have added these as suggested.

      P11 To what extent might NFkB and Rest co-operate/ co-regulate gene repression in hypoxia?

      This is a good question. We have overlapped our datasets with Rest-dependent hypoxia-regulated genes identified by Cavadas et al., (Figure below), and find that these appear to act independently of each other for the most part, with very few genes co-regulated by both.

      Reviewer #3 (Significance (Required)):

      Shakir et al. present a manuscript titled 'NFkB is a central regulator of hypoxia-induced gene expression'.

      The research group are experts in both NFkB and hypoxia signaling and are the ideal group to perform these studies.

      Hypoxia and inflammation are co-incident in many physiological and pathophysiological conditions, where the microenvironment affects disease severity and patient outcome. The cross talk between inflammatory and hypoxia signaling pathways is not fully described. Thus, this manuscript takes a novel approach to an established question and concludes clearly that NFkB is a central regulator of hypoxia-induced gene expression.

      We thank the reviewer for these positive comments.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) I miss some treatment of the lack of behavioural correlate. What does it mean that memantine benefits EEG classification accuracy without improving performance? One possibility here is that there is an improvement in response latency, rather than perceptual sensitivity. Is there any hint of that in the RT results? In some sort of combined measure of RT and accuracy?

      First, we would like to thank the reviewer for their positive assessment of our work and for their extremely helpful and constructive comments that helped to significantly improve the quality of our manuscript.  

      The reviewer rightly points out that, to our surprise, we did not obtain a correlate of the effect of memantine in our behavioral data, neither in the reported accuracy data nor in the RT data. We do not report RT results as participants were instructed to respond as accurately as possible, without speed pressure. We added a paragraph in the discussion section to point to possible reasons for this surprising finding:

      “There are several possible reasons for this lack of behavioral correlate. For example, EEG decoding may be a more sensitive measure of the neural effects of memantine, in particular given that perceptual sensitivity may have been at floor (masked condition, experiment 1) or ceiling (unmasked condition, experiment 1, and experiment 2). It is also possible that the present decoding results are merely epiphenomenal, not mapping onto functional improvements (e.g., Williams et al., 2007). However, given that we found a tight link between these EEG decoding markers and behavioral performance in our previous work (Fahrenfort et al., 2017; Noorman et al., 2023), it is possible that the effect of memantine was just too subtle to show up in changes in overt behavior.”
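      The floor/ceiling argument can be made concrete with signal detection theory. The sketch below assumes perceptual sensitivity is summarized as d′ = z(hit rate) − z(false-alarm rate), with the log-linear correction (add 0.5 to the hit and false-alarm counts and 1 to each trial total) so that perfect scores stay finite. The trial counts are hypothetical and this is not the authors' analysis; it only shows that, with a fixed number of trials, a perfect score caps the measurable d′, so a further gain in underlying sensitivity cannot appear in behavior.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection sensitivity d' = z(H) - z(FA).

    Uses the log-linear correction (add 0.5 to the hit and false-alarm
    counts and 1 to each trial total) so that hit rates of 100% or
    false-alarm rates of 0% still give a finite z-score.
    """
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return z(hit_rate) - z(fa_rate)


# Hypothetical observer with 100 target-present and 100 target-absent trials.
# A perfect score gives the largest d' this trial count can express.
ceiling = d_prime(100, 0, 0, 100)
near_ceiling = d_prime(98, 2, 2, 98)
print(f"max measurable d' = {ceiling:.2f}; near-ceiling d' = {near_ceiling:.2f}")
```

      The same logic applies at floor: near chance, hit and false-alarm rates are both close to 0.5 and small neural improvements shift d′ by amounts that are hard to detect with a limited number of trials.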

      (2) An explanation is missing, about why memantine impacts the decoding of illusion but not collinearity. At a systems level, how would this work? How would NMDAR antagonism selectively impact long-range connectivity, but not lateral connectivity? Is this supported by our understanding of laminar connectivity and neurochemistry in the visual cortex?

      We have no straightforward or mechanistic explanation for this finding. In the revised discussion, we highlight this finding more clearly and include some speculative explanations:

      “The present effect of memantine was largely specific to illusion decoding, our marker of feedback processing, while collinearity decoding, our marker of lateral processing, was not (experiment 1) or only weakly (experiment 2) affected by memantine. We have no straightforward explanation for why NMDA receptor blockade would impact inter-areal feedback connections more strongly than intra-areal lateral connections, considering their strong functional interdependency and interaction in grouping and segmentation processes (Liang et al., 2017). One possibility is that this finding reflects properties of our EEG decoding markers for feedback vs. lateral processing: for example, decoding of the Kanizsa illusion may have been more sensitive to the relatively subtle effect of our pharmacological manipulation, either because overall decoding was better than for collinearity or because NMDA receptor dependent recurrent processes more strongly contribute to illusion decoding than to collinearity decoding.”

      (3) The motivating idea for the paper is that the NMDAR antagonist might disrupt the modulation of the AMPA-mediated glu signal. This is in line with the motivating logic for Self et al., 2012, where NMDAR and AMPAR efficacy in macaque V1 was manipulated via microinfusion. But this logic seems to conflict with a broader understanding of NMDA antagonism. NMDA antagonism appears to generally have the net effect of increasing glu (and ACh) in the cortex through a selective effect on inhibitory GABAergic cells (e.g. Olney, Newcomer, & Farber, 1999). Memantine, in particular, has a specific impact on extrasynaptic NMDARs (in contrast to ketamine; Milnerwood et al., 2010, Neuron), and this type of receptor is prominent in GABA cells (e.g. Yao et al., 2022, JoN). The effect of NMDA antagonists on GABAergic cells generally appears to be much stronger than the effect on glutamatergic cells (at least in the hippocampus; e.g. Grunze et al., 1996).

      This all means that it's reasonable to expect that memantine might have a benefit to visually evoked activity. This idea is raised in the GD of the paper, based on a separate literature from that I mentioned above. But all of this could be better spelled out earlier in the paper, so that the result observed in the paper can be interpreted by the reader in this broader context.

      To my mind, the challenging task is for the authors to explain why memantine causes an increase in EEG decoding, where microinfusion of an NMDA antagonist into V1 reduced the neural signal Self et al., 2012. This might be as simple as the change in drug... memantine's specific efficacy on extrasynaptic NMDA receptors might not be shared with whatever NMDA antagonist was used in Self et al. 2012. Ketamine and memantine are already known to differ in this way. 

      We addressed the reviewer’s comments in the following way. First, we bring up our (to us, surprising) result already at the end of the Introduction, pointing the reader to the explanation mentioned by the reviewer:

      “We hypothesized that disrupting the reentrant glutamate signal via blocking NMDA receptors by memantine would impair illusion and possibly collinearity decoding, as putative markers of feedback and lateral processing, but would spare the decoding of local contrast differences, our marker of feedforward processing. To foreshadow our results, memantine indeed specifically affected illusion decoding, but enhancing rather than impairing it. In the Discussion, we offer explanations for this surprising finding, including the effect of memantine on extrasynaptic NMDA receptors in GABAergic cells, which may have resulted in boosted visual activity.”

      Second, as outlined in the response to the first point by Reviewer #2, we are now clear throughout the title, abstract, and paper that memantine “improved” rather than “modulated” illusion decoding.

      Third, and most importantly, we restructured and expanded the Discussion section to include the reviewer’s proposed mechanisms and explanations for the effect. We would like to thank the reviewer for pointing us to this literature. We also discuss the results of Self et al. (2012) more extensively, specifically the distinct effects of the two NMDAR antagonists used in that study, and speculate that their effects may have been similar to those of ketamine and thus possibly opposite to those of memantine (for the feedback signal):

      “Although both drugs are known to inhibit NMDA receptors by occupying the receptor’s ion channel, thereby blocking current flow (Glasgow et al., 2017; Molina et al., 2020), the drugs have different actions at receptors other than NMDA, with ketamine acting on dopamine D2 and serotonin 5-HT2 receptors, and memantine inhibiting several subtypes of the acetylcholine (ACh) receptor as well as serotonin 5-HT3 receptors. Memantine and ketamine are also known to target different NMDA receptor subpopulations, with their inhibitory action displaying different time courses and intensity (Glasgow et al., 2017; Johnson et al., 2015). Blockade of different NMDA receptor subpopulations can produce markedly different and even opposite results. For example, Self and colleagues (2012) found overall reduced or elevated visual activity after microinfusion of two different selective NMDA receptor antagonists (2-amino-5-phosphonovalerate and ifenprodil) in macaque primary visual cortex. Although both drugs impaired the feedback-related response to figure vs. ground, similar to the effects of ketamine (Meuwese et al., 2013; van Loon et al., 2016), such opposite effects on overall activity demonstrate that the effects of NMDA antagonism strongly depend on the targeted receptor subpopulation, each with distinct functional properties.”

      Finally, we link these differences to the potential mechanism via GABAergic neurons:

      “As mentioned in the Introduction, this may be related to memantine modulating processing at other pre- or post-synaptic receptors present at NMDA-rich synapses, specifically affecting extrasynaptic NMDA receptors in GABAergic cells (Milnerwood et al., 2010; Yao et al., 2022). Memantine’s strong effect on extrasynaptic NMDA receptors in GABAergic cells leads to increases in ACh levels, which have been shown to increase firing rates and reduce firing rate variability in macaques (Herrero et al., 2008, 2013). This may represent a mechanism through which memantine (but not ketamine or the NMDA receptor antagonists used by Self and colleagues) could boost visually evoked activity.”

      (4) The paper's proposal is that the effect of memantine is mediated by an impact on the efficacy of reentrant signaling in visual cortex. But perhaps the best-known impact of NMDAR manipulation is on LTP, in the hippocampus particularly but also broadly.

      Perception and identification of the Kanizsa illusion may be sensitive to learning (e.g., Maertens & Pollmann, 2005; Gellatly, 1982; Rubin, Nakayama, & Shapley, 1997); what argues against an account of the results based on an effect on perceptual learning? Generally, the paper proposes a very specific mechanism through which the drug influences perception. This is motivated by results from Self et al. (2012), where an NMDA antagonist was infused into V1. But oral memantine will, of course, have a whole-brain effect, and some of these effects are well characterized and, on the surface, appear to be potential sources of change in illusion perception. The paper needs some treatment of the known ancillary effects of diffuse NMDAR antagonism to convince the reader that the account provided is better than the other possibilities.

      We cannot fully exclude an effect based on perceptual learning but consider this possibility highly unlikely for several reasons. First, subjects had performed more than a thousand trials in a localizer session (in experiment 2, more than two thousand) before starting the main task containing the drug manipulation. Therefore, a large part of any putative perceptual learning would already have occurred before the main experiment started. Second, the order of drug sessions in the main experiment was counterbalanced: half of the participants first performed the memantine session and then the placebo session, and the other half did the reverse. If memantine had facilitated perceptual learning during the memantine session, the benefit of that learning would have been most visible in the placebo session that followed it, so one would expect improved decoding in the placebo session rather than in the memantine session. Because we observed improved decoding in the memantine session itself, perceptual learning is likely not the main explanation for these findings. Third, perceptual learning is known to occur for several stimulus dimensions (e.g., orientation, spatial frequency or contrast). If these findings had been driven by perceptual learning, one would have expected to see learning effects for all three features, whereas the memantine effects were specific to illusion decoding. Especially in experiment 2, where all features were equally often task relevant, one would have expected perceptual learning effects on the other features as well.

      To further investigate any potential role of perceptual learning, we analyzed participants’ performance in detecting the Kanizsa illusion over the course of the experiments. To this end, we divided each experiment’s trials into four consecutive time bins, from the beginning to the end of the experiment. For the first target (T1) in experiment 1, there was no interaction between the factors bin and drug (memantine/placebo; F<sub>3,84</sub>=0.89, P=0.437; Figure S6A). For the second target (T2), we performed a repeated-measures ANOVA with the factors bin, drug, T1-T2 lag (short/long), and masks (present/absent). There was only a trend towards a bin by drug interaction (F<sub>3,84</sub>=2.57, P=0.064; Figure S6B), reflecting worse performance under memantine in the first three bins and slightly better performance in the fourth bin. The other interactions that included the bin and drug factors were not significant (all P>0.117). For the second experiment, we performed a repeated-measures ANOVA with the factors bin, drug, masks, and task-relevant feature (local contrast/collinearity/illusion). None of the interactions that included the bin and drug factors were significant (all P>0.219; Figure S6C). Taken together, memantine does not appear to affect Kanizsa illusion detection performance through perceptual learning. Finally, there was no interaction between the factors bin and task-relevant feature (F<sub>6,150</sub>=0.76, P=0.547; Figure S6D), implying that there is no perceptual learning effect specific to Kanizsa illusion detection. We included these analyses in our revised Supplement as Fig. S6.
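      As an illustration only, the binning step described above can be sketched in Python. This is a hypothetical sketch, not the analysis code behind Fig. S6: it assumes each session's trials are stored in chronological order as 1 (correct) or 0 (incorrect) detection outcomes, splits them into four consecutive, near-equal bins, and computes the proportion correct per bin for each drug condition. The function names are illustrative, and the repeated-measures ANOVA that would take these per-participant cell means as input is not reproduced.

```python
import math
from typing import Dict, List, Sequence


def split_into_bins(trials: Sequence[int], n_bins: int = 4) -> List[List[int]]:
    """Split chronologically ordered trials into n_bins consecutive chunks.

    When len(trials) is not divisible by n_bins, the first bins receive one
    extra trial each, so bin sizes differ by at most one.
    """
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    base, extra = divmod(len(trials), n_bins)
    bins, start = [], 0
    for b in range(n_bins):
        size = base + (1 if b < extra else 0)
        bins.append(list(trials[start:start + size]))
        start += size
    return bins


def accuracy_per_bin(outcomes: Sequence[int], n_bins: int = 4) -> List[float]:
    """Proportion correct (outcome == 1) in each consecutive bin; NaN for an empty bin."""
    return [sum(b) / len(b) if b else math.nan for b in split_into_bins(outcomes, n_bins)]


def bin_by_drug_table(sessions: Dict[str, Sequence[int]], n_bins: int = 4) -> Dict[str, List[float]]:
    """Bin x drug cell means for one participant (the input to a repeated-measures ANOVA)."""
    return {drug: accuracy_per_bin(outcomes, n_bins) for drug, outcomes in sessions.items()}


if __name__ == "__main__":
    # Hypothetical outcomes for a single participant (1 = illusion correctly detected).
    table = bin_by_drug_table({
        "placebo": [1, 0, 1, 1, 1, 1, 0, 1],
        "memantine": [0, 1, 1, 1, 1, 0, 1, 1],
    })
    print(table)
```

      Splitting by trial order rather than by clock time keeps the number of trials per bin balanced across sessions, which is what a bin-by-drug interaction test needs.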

      (5) The cross-decoding approach to data analysis concerns me a little. The approach adopted here is to train models on a localizer task, in this case a task where participants matched a Kanizsa figure to a target template (E1) or discriminated one of the three relevant stimulus features (E2). The resulting model was subsequently employed to classify the stimuli seen during separate tasks: an AB task in E1, and a feature discrimination task in E2. This scheme makes the localizer task very important. If models built from this task have any bias, this will taint classifier accuracy in the analysis of experimental data. My concern is that the emergence of the Kanizsa illusion in the localizer task was probably quite salient relative to changes in stimulus rotation or collinearity. If the model was better at detecting the illusion to begin with, the data pattern (where the drug manipulation impacts classification in this condition but not in other conditions) may simply reflect model insensitivity to non-illusion features.

      I am also vaguely worried by manipulations implemented in the main task that do not appear in the localizer: the use of RSVP in E1 and the manipulation of the base rate and staircasing in E2. This all starts to introduce the possibility that localizer and experimental data just don't correspond, that this generates low classification accuracy in the experimental results and ineffective classification in some conditions (i.e., when stimuli are masked; would collinearity decoding in the unmasked condition potentially differ if classification accuracy were not at floor? See Figure 3c upper, Figure 5c lower).

      What is the motivation for the use of localizer validation at all? The same hypotheses can be tested using within-experiment cross-validation, rather than validation from a model built on localizer data. The argument may be that this kind of modelling will necessarily employ a smaller dataset, but, while true, this effect can be minimized at the expense of computational cost - many-fold cross-validation will mean that the vast majority of data contributes to model building in each instance. 

      It would be compelling if results were to reproduce when classification was validated in this kind of way. This kind of analysis would fit very well into the supplementary material.

      We thank the reviewer for this excellent question. We used separate localizers for several reasons, exactly to circumvent the kind of biases in decoding that the reviewer alludes to. Below we have detailed our rationale, first focusing on our general rationale and then focusing on the decisions we made in designing the specific experiments.  

      Using a localizer task in the design of decoding analysis offers several key advantages over relying solely on k-fold cross-validation within the main task:

      (1) Feature selection independence and better generalization: A separate localizer task allows for independent feature selection, ensuring that the features used for decoding are chosen without bias from the main task data. Specifically, the use of a localizer task allows us to determine the time-windows of interest independently based on the peaks of the decoding in the localizer. This allows for a better direct comparison between the memantine and placebo conditions because we can isolate the relevant time windows outside a drug manipulation. Further, training a classifier on a localizer task and testing it on a separate experimental task assesses whether neural representations generalize across contexts, rather than simply distinguishing conditions within a single dataset. This supports claims about the robustness of the decoded information.

      (2) Increased sensitivity and interpretability: The localizer task can be designed specifically to elicit strong, reliable responses in the relevant neural patterns. This can improve the signal-to-noise ratio and make it easier to interpret the features used for decoding in the test set. We facilitate this by having many more trials in the localizer tasks (1280 in E1 and 5184 in E2) than in the separate conditions of the main task, where k-folding would have to operate on very low trial numbers (e.g., after preprocessing, the 2 (mask) × 2 (lag) design in E1 leaves fewer than 256 trials for specific comparisons). The same holds for experiment 2, which has a 2×3 design and additionally included the base-rate manipulation. Finally, we further increase the sensitivity of the model by presenting the stimuli at full contrast, without any manipulation of attention or masking, during the localizer, which allows us to extract the feature-specific EEG signals optimally.

      (3) Decoupling task-specific confounds: If decoding is performed within the main task using k-folding, there is a risk that task-related confounds (e.g., motor responses, attention shifts, drug) influence decoding performance. A localizer task allows us to separate the neural representation of interest from these task-related confounds.
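      To make the contrast between the two validation schemes concrete, the following Python sketch compares training on a localizer with k-fold cross-validation inside the main task. It is not the decoding pipeline used in the paper: it assumes a hypothetical nearest-class-mean classifier on generic feature vectors (for instance, EEG amplitudes across channels at one time point) and scores performance as the area under the ROC curve (AUC), computed as the probability that a random class-1 trial outscores a random class-0 trial. All function names and the synthetic data generator are illustrative.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Model = Tuple[Vector, Vector]  # (class-0 mean, class-1 mean)


def fit_class_means(X: Sequence[Vector], y: Sequence[int]) -> Model:
    """Hypothetical classifier: the mean feature vector of each class."""
    means = []
    for label in (0, 1):
        rows = [x for x, lab in zip(X, y) if lab == label]
        if not rows:
            raise ValueError(f"no training trials for class {label}")
        means.append([sum(col) / len(rows) for col in zip(*rows)])
    return means[0], means[1]


def decision_values(model: Model, X: Sequence[Vector]) -> List[float]:
    """Project trials onto the class-mean difference, centred between the means (higher = class 1)."""
    m0, m1 = model
    w = [b - a for a, b in zip(m0, m1)]
    mid = [(a + b) / 2 for a, b in zip(m0, m1)]
    return [sum(wi * (xi - ci) for wi, xi, ci in zip(w, x, mid)) for x in X]


def auc(scores: Sequence[float], y: Sequence[int]) -> float:
    """P(random class-1 score > random class-0 score), ties counted as 0.5."""
    pos = [s for s, lab in zip(scores, y) if lab == 1]
    neg = [s for s, lab in zip(scores, y) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def localizer_to_main(X_loc, y_loc, X_main, y_main) -> float:
    """Fit once on all localizer trials, then score every main-task trial."""
    return auc(decision_values(fit_class_means(X_loc, y_loc), X_main), y_main)


def kfold_within_main(X_main, y_main, k: int = 5, seed: int = 0) -> float:
    """Mean AUC over k folds drawn from the main-task trials only."""
    idx = list(range(len(X_main)))
    random.Random(seed).shuffle(idx)
    scores = []
    for fold in range(k):
        test = idx[fold::k]
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        y_test = [y_main[i] for i in test]
        if len(set(y_test)) < 2 or len({y_main[i] for i in train}) < 2:
            continue  # AUC and class means need both classes present
        model = fit_class_means([X_main[i] for i in train], [y_main[i] for i in train])
        scores.append(auc(decision_values(model, [X_main[i] for i in test]), y_test))
    if not scores:
        raise ValueError("no fold contained both classes")
    return sum(scores) / len(scores)


def synthetic(n: int, shift: float, rng: random.Random):
    """Toy two-class data in 3 dimensions; class 1 is shifted by `shift` on the first dimension."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(shift * label, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    rng = random.Random(1)
    X_loc, y_loc = synthetic(1280, 1.0, rng)  # large, clean "localizer" set
    X_main, y_main = synthetic(60, 1.0, rng)  # small "main-task" condition
    print("localizer -> main AUC:", round(localizer_to_main(X_loc, y_loc, X_main, y_main), 3))
    print("5-fold within main AUC:", round(kfold_within_main(X_main, y_main), 3))
```

      In the localizer scheme a single fit uses every localizer trial and every main-task trial is scored. In the k-fold scheme each fit sees only (k−1)/k of the already small per-condition main-task sample, which is the trade-off discussed above.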

      Experiment 1 

      In experiment 1, the Kanizsa illusion was always task relevant in the main experiment, in which we employed the pharmacological manipulation. To make sure that the classifiers were not biased towards Kanizsa figures from the start (which would have been the case had we used k-folding in the main task), we used a training set in which all features were equally relevant for task performance. As can be seen in Figure 1E, which plots the decoding accuracies of the localizer task, illusion decoding and rotation decoding were equally strong, whereas collinearity decoding was weaker. The Kanizsa illusion may have been quite salient in the localizer task, which we cannot determine at present, but it was at least less salient and relevant than in the main task (where it was the only task-relevant feature). Based on the localizer decoding results, one could argue that the rotation and illusion dimensions were the most salient, because decoding was highest for these dimensions. Clearly, the model was not insensitive to non-illusory features. The localizer task of experiment 2 reveals that collinearity decoding tends to be generally lower, even when that feature is task relevant.

      Experiment 2 

      In experiment 2, the localizer task and main task were also similar, with three exceptions: during the localizer task no drug was active, and neither masking nor the base-rate manipulation was employed. To make sure that the classifier was not biased towards a certain stimulus category because of the base-rate manipulation (e.g., towards the stimulus presented most often), we used a localizer task without this manipulation. As can be seen in Figure 4D, decoding of all features was highly robust, including, for example, the collinearity condition. Therefore, the low decoding that we observe in the main experiment cannot be due to poor classifier training or feature extraction in the localizer. We believe this is actually an advantage rather than a disadvantage of the current decoding protocol.

      Based on the rationale presented above we are uncomfortable performing the suggested analyses using a k-folding approach in the main task, because according to our standards the trial numbers are too low and the risk that these results are somehow influenced by task specific confounds cannot be ruled out.  

      Line 301 - 'Interestingly, in both experiments the effect of memantine... was specific to... stimuli presented without a backward mask.' This rubs a bit, given that the mask broadly disrupted classification. The absence of a memantine effect in the masked condition may simply be a product of the floor... some care is needed in the interpretation of this pattern.

      In the results section of experiment 1, we added:

      “While the interaction between masking and memantine only approached significance (P=0.068), the absence of an effect of memantine in the masked condition could reflect a floor effect, given that illusion decoding in the masked condition was not significantly better than chance.”

      While a floor effect is less likely to account for the absence of an effect in the masked condition in experiment 2, where illusion decoding in the masked condition was significantly above chance, it is still possible that decoding accuracy needed to be higher to obtain an effect of memantine. We therefore also added here:

      “For our time window-based analyses of illusion decoding, the specificity of the memantine effect to the unmasked condition was supported by a significant interaction between drug and masking (note, however, given overall much lower decoding accuracy in the masked condition, the lack of a memantine effect could reflect a floor effect).”

      In the discussion, we changed the sentence to read “…the effect of memantine on illusion decoding tended to be specific to attended, task-relevant stimuli presented without a backward mask.”

      Line 441 - What were the contraindications/exclusion parameters for the administration of memantine? 

      Thanks for spotting this. We have added the relevant exclusion criteria in the revised version of the supplement. See also below.

      – Allergy to memantine or one of the inactive ingredients of these products;

      – (History of) psychiatric treatment;

      – First-degree relative with (history of) schizophrenia or major depression;

      – (History of) clinically significant hepatic, cardiac, obstructive respiratory, renal, cerebrovascular, metabolic or pulmonary disease, including, but not limited to, fibrotic disorders;

      – Claustrophobia;

      – Regular usage of medicines (antihistamines or occasional use of paracetamol);

      – (History of) neurological disease;

      – (History of) epilepsy;

      – Abnormal hearing or (uncorrected) vision;

      – Average use of more than 15 alcoholic beverages weekly;

      – Smoking;

      – History of drug (opiate, LSD, (meth)amphetamine, cocaine, solvents, cannabis, or barbiturate) or alcohol dependence;

      – Any known other serious health problem or mental/physical stress;

      – Used psychotropic medication or recreational drugs in the 72 hours prior to each test session;

      – Used alcohol within the last 24 hours prior to each test session;

      – (History of) pheochromocytoma;

      – Narrow-angle glaucoma;

      – (History of) ulcer disease;

      – Galactose intolerance, Lapp lactase deficiency or glucose-galactose malabsorption;

      – (History of) convulsion.

      Line 587 - The localizer task used to train the classifier in E2 was collected in different sessions. Was the number of trials from separate sessions ultimately equal? The issue here is that the localizer might pick up on subtle differences in electrode placement. If the test session happens to have electrode placement that is similar to the electrode placement that existed for a majority of one condition of the localizer... this will create bias. This is likely to be minor, but machine classifiers really love this kind of minor confound.

      Indeed, the trial counts in the separate sessions for the localizer in E2 were equal. We have added that information to the methods section.  

      Experiment 1: 1280 trials were collected during the intake session.

      Experiment 2: 1728 trials were collected per session (intake session and two drug sessions), yielding 5184 trials across the three sessions.

      Reviewer #2:

      To start off, I think the reader is being a bit tricked when reading the paper. Perhaps my priors are too strong, but I assumed, just like the authors, that blocking NMDA receptors would disrupt recurrent processing, in line with previous work. However, due to the continuous use of the ambiguous word 'affected' rather than the clearer 'increased' or 'perturbed' recurrent processing, the reader is left guessing what is actually found, until they read the results and discussion and find that decoding is actually improved. This seems like a really big deal, and I strongly urge the authors to reword their title, abstract, and introduction to make clear that they hypothesized a disruption of decoding in the illusion condition but found the opposite, namely an increase in decoding. I want to encourage the authors that this is still a fascinating finding.

      We thank the reviewer for the positive assessment of our manuscript, and for many helpful comments and suggestions.  

      We changed the title, abstract, and introduction in accordance with the reviewer’s comment, highlighting that “memantine […] improves decoding” and “enhances recurrent processing” in all three sections. We also changed the heading of the corresponding results section to “Memantine selectively improves decoding of the Kanizsa illusion”.

      Apologies if I have missed it, but it is not clear to me whether participants were given the drug or placebo during the localiser task. If they are given the drug this makes me question the logic of their analysis approach. How can one study the presence of a process, if their very means of detecting that process (the localiser) was disrupted in the first place? If participants were not given a drug during the localiser task, please make that clear. I'll proceed with the rest of my comments assuming the latter is the case. But if the former, please note that I am not sure how to interpret their findings in this paper.

      Thanks for asking this; this was indeed unclear. In experiment 1, the localizer was performed in the intake session, in which no drugs were administered. In the second experiment, the localizer was performed in all three sessions with equal trial numbers. In the intake session no drugs were administered. In the other two sessions, the localizer was performed directly after pill intake, so memantine was not (or barely) active yet. We started the main task four hours after pill intake because that is the approximate peak time of memantine. Note that all three localizer tasks were averaged before using them as the training set. We have clarified this in the revised manuscript.

      The main purpose of the paper is to study recurrent processing. The extent to which this study achieves this aim depends completely on the extent to which we can interpret decoding of illusory contours as uniquely capturing recurrent processing. While I am sure illusory contours rely on recurrent processing, it does not follow that decoding of illusory contours captures recurrent processing alone. Indeed, if the drug selectively manipulates recurrent processing, it's not obvious to me why the authors find the interaction with masking in experiment 2. Recurrent processing seems to still be happening in the masked condition but is not affected by NMDA-receptor blockade there, so where does that leave us in interpreting the role of NMDA receptors in recurrent processing? If the authors cannot strengthen the claim that the effects are completely driven by effects on recurrent processing, I suggest that the paper shift its focus to claims about the encoding of illusory contours, rather than making primary claims about recurrent processing.

      We indeed used illusion decoding as a marker of recurrent processing. Clearly, such a marker based on a non-invasive and indirect method to record neural activity is not perfect. To directly and selectively manipulate recurrent processing, invasive methods and direct neural recordings would be required. However, as explained in the revised Introduction,

      “In recent work we have validated that the decoding profiles of these features of different complexities at different points in time, in combination with the associated topography, can indeed serve as EEG markers of feedforward, lateral and recurrent processes (Fahrenfort et al., 2017; Noorman et al., 2023).”  

      The timing and topography of the decoding results of the present study were consistent with our previous EEG decoding studies (Fahrenfort et al., 2017; Noorman et al., 2023). This validates the use of these EEG decoding signatures as (imperfect) markers of distinct neural processes, and we continue to use them as such. However, we expanded the discussion section to alert the reader to the indirect and imperfect nature of these EEG decoding signatures as markers of distinct neural processes: “Our approach relied on using EEG decoding of different stimulus features at different points in time, together with their topography, as markers of distinct neural processes. Although such non-invasive, indirect measures of neural activity cannot provide direct evidence for feedforward vs. recurrent processes, the timing, topography, and susceptibility to masking of the decoding signatures obtained in the present study are consistent with neurophysiology (e.g., Bosking et al., 1997; Kandel et al., 2000; Lamme & Roelfsema, 2000; Lee & Nguyen, 2001; Liang et al., 2017; Pak et al., 2020), as well as with our previous work (Fahrenfort et al., 2017; Noorman et al., 2023).” 

      The reviewer is also concerned about the lack of effect of memantine on illusion decoding in the masked condition in experiment 2. In our view, the strong effect of masking on illusion decoding (both in absolute terms, as well as when compared to its effect on local contrast decoding), provides strong support for our assumption that illusion decoding represents a marker of recurrent processing. Nevertheless, as the reviewer points out, weak but statistically significant illusion decoding was still possible in the masked condition, at least when the illusion was task-relevant. As the reviewer notes, this may reflect residual recurrent processing during masking, a conclusion consistent with the relatively high behavioral performance despite masking (d’ > 1). However, rather than invalidating the use of our EEG markers or challenging the role of NMDA-receptors in recurrent processing, this may simply reflect a floor effect. As outlined in our response to reviewer #1 (who was concerned about floor effects), in the results section of experiment 1, we added:

      “While the interaction between masking and memantine only approached significance (P=0.068), the absence of an effect of memantine in the masked condition could reflect a floor effect, given that illusion decoding in the masked condition was not significantly better than chance.”

      And for experiment 2:

      “For our time window-based analyses of illusion decoding, the specificity of the memantine effect to the unmasked condition was supported by a significant interaction between drug and masking (note, however, given overall much lower decoding accuracy in the masked condition, the lack of a memantine effect could reflect a floor effect).”

      An additional claim is being made with regard to the effects of the drug manipulation. The authors state that this effect is only present when the stimulus is 1) consciously accessed, and 2) attended. Claim 1 is not supported by experiment 1, as the masking manipulation did not interact with drug in the cluster analyses, and the analyses focusing on the peak of the timing window do not show a significant effect either. There is evidence for this claim from experiment 2, as masking interacts with the drug condition. Evidence for the second claim (about task relevance) is not presented, as there is no interaction with the task condition. A classical error seems to be made here, where interactions are not properly tested. Instead, the presence of a significant effect in one condition but not the other is taken as sufficient evidence for an interaction, which is not appropriate. I therefore urge the authors to dampen the claim about the importance of attending to the decoded features. Alternatively, I suggest the authors run their interactions of interest on the time courses and conduct the appropriate cluster-based analyses.

      We thank the reviewer for pointing out the importance of key interaction effects. Following the reviewer’s suggestion, we dampened our claims about the role of attention. For experiment 1, we changed the heading of the relevant results section from “Memantine’s effect on illusion decoding requires attention” to “The role of consciousness and attention in memantine’s effect on illusion decoding”, and we added the following in the results section:

      “Also our time window-based analyses showed a significant effect of memantine only when the illusion was both unmasked and presented outside the AB (t<sub>28</sub>=-2.76, P=0.010, BF<sub>10</sub>=4.53; Fig. 3F). Note, however, that although these post-hoc tests of the effect of memantine on illusion decoding were significant, for our time window-based analyses we did not obtain a statistically significant interaction between the AB and memantine, and the interaction between masking and memantine only approached significance (P=0.068). Thus, although these memantine effects were slightly less robust than for T1, probably due to reduced trial counts, these results point to (but do not conclusively demonstrate) a selective effect of memantine on illusion-related feedback processing that depends on the availability of attention. In addition to the lack of the interaction effect, another potential concern…”

      For experiment 2, we added the following in the results section:

      “Note that, for our time window-based analyses of illusion decoding, although the specificity of the memantine effect to the unmasked condition was supported by a significant interaction between drug and masking, we did not obtain a statistically significant interaction between memantine and task relevance. Thus, although the memantine effect was significant only when the illusion was unmasked and task-relevant, just like for the effect of temporal attention in experiment 1, these results do not conclusively demonstrate a selective effect of memantine that depends on attention (task relevance).”

      In the discussion, we toned down claims about memantine’s effects being specific to attended conditions, highlighted the “preliminary” nature of these findings, and now explicitly alert the reader to interpret these effects with caution, e.g.:

      “Although these results have to be interpreted with caution because the key interaction effects were not statistically significant, …”
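      For reference, the time-course interaction test suggested by the reviewer could take the following form. This is a hypothetical Python sketch, not an analysis reported in the manuscript: in a 2×2 within-subject design (e.g., drug × task relevance), the interaction at each time point is equivalent to a one-sample test on each participant's double difference (drug minus placebo in one condition, minus drug minus placebo in the other), and the sketch assesses it with a sign-flip, cluster-mass permutation test. The cluster-forming threshold of |t| > 2.0, the number of permutations, and all function names are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # data[subject][time point]


def double_difference(a_drug: Matrix, a_plac: Matrix, b_drug: Matrix, b_plac: Matrix) -> List[List[float]]:
    """Per subject and time point: (drug - placebo in condition A) - (drug - placebo in condition B)."""
    return [[(ad - ap) - (bd - bp) for ad, ap, bd, bp in zip(r1, r2, r3, r4)]
            for r1, r2, r3, r4 in zip(a_drug, a_plac, b_drug, b_plac)]


def t_values(data: Matrix) -> List[float]:
    """One-sample t statistic against zero at every time point (0.0 where the variance is zero)."""
    n = len(data)
    out = []
    for t in range(len(data[0])):
        xs = [row[t] for row in data]
        mean = sum(xs) / n
        var = sum((x - mean) ** 2 for x in xs) / (n - 1)
        out.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return out


def cluster_masses(ts: Sequence[float], threshold: float) -> List[float]:
    """Summed t within runs of consecutive supra-threshold time points of the same sign."""
    masses: List[float] = []
    current, sign = 0.0, 0
    for t in ts:
        s = 1 if t > threshold else (-1 if t < -threshold else 0)
        if s != 0 and s == sign:
            current += t  # extend the running cluster
        else:
            if sign != 0:
                masses.append(current)  # close the previous cluster
            current, sign = (t, s) if s != 0 else (0.0, 0)
    if sign != 0:
        masses.append(current)
    return masses


def cluster_permutation_test(diff: Matrix, threshold: float = 2.0,
                             n_perm: int = 1000, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Sign-flip cluster-mass test; returns observed cluster masses and one p-value per cluster."""
    rng = random.Random(seed)
    observed = cluster_masses(t_values(diff), threshold)
    null_max = []
    for _ in range(n_perm):
        # Under H0 (no interaction), each subject's double difference is symmetric around zero.
        signs = [rng.choice((-1, 1)) for _ in diff]
        flipped = [[x * s for x in row] for row, s in zip(diff, signs)]
        masses = cluster_masses(t_values(flipped), threshold)
        null_max.append(max((abs(m) for m in masses), default=0.0))
    pvals = [(1 + sum(m >= abs(obs) for m in null_max)) / (n_perm + 1) for obs in observed]
    return observed, pvals
```

      Comparing each observed cluster against the null distribution of the maximum cluster mass controls the family-wise error across time points, so a significant cluster here would be direct evidence for the interaction rather than an inference from separate simple effects.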

      How were the lengths of the peak-timing windows in Figure 1E established? My understanding is that these form the training-time windows for the further decoding analyses, so it is important to justify why they have different lengths and how they are determined. The same goes for the peak AUC time windows for the interaction analyses. A number of claims in the paper rely on the interactions found in these post-hoc analyses, so the 223–323 ms time window needs justification.

      Thanks for this question. The lengths of these peak-timing windows differ because the decoding of rotation is temporally very precise and short-lived, whereas the decoding of the other features lasts much longer and is more temporally variable. In fact, we followed the same procedure as in a previously published study (Noorman et al., eLife, 2025) for defining the peak timing and length of the windows. We applied the same procedure to both experiments reported in this paper, which replicated the crucial findings, thereby excluding the possibility that these findings depend in any way on the selected time windows. We have added this information to the revised version of the manuscript.

      Reviewer #3:

      First, despite its clear pattern of neural effects, there is no corresponding perceptual effect. Although the manipulation fits neatly within the conceptual framework, and there are many reasons for not finding such an effect (floor and ceiling effects, narrow perceptual tasks, etc), this does leave open the possibility that the observation is entirely epiphenomenal, and that the mechanisms being recorded here are not actually causally involved in perception per se.

      We thank the reviewer for the positive assessment of our work. The reviewer rightly points out that, to our surprise, we did not obtain a correlate of the effect of memantine in our behavioral data. We agree with the possible reasons for the absence of such an effect highlighted by the reviewer, and expanded our discussion section accordingly:

      “There are several possible reasons for this lack of behavioral correlate.  For example, EEG decoding may be a more sensitive measure of the neural effects of memantine, in particular given that perceptual sensitivity may have been at floor (masked condition, experiment 1) or ceiling (unmasked condition, experiment 1, and experiment 2). It is also possible that the present decoding results are merely epiphenomenal, not mapping onto functional improvements (e.g., Williams et al., 2007). However, given that in our previous work we found a tight link between these EEG decoding markers and behavioral performance (Fahrenfort et al., 2017; Noorman et al., 2023), it is possible that the effect of memantine in the present study was just too subtle to show up in changes in overt behavior.”

      Second, although it is clear that there is an effect on decoding in this particular condition, what that means is not entirely clear - particularly since performance improves, rather than decreases. It should be noted here that improvements in decoding performance do not necessarily need to map onto functional improvements, and we should all be careful to remain agnostic about what is driving classifier performance. Here too, the effect of memantine on decoding might be epiphenomenal - unrelated to the information carried in the neural population, but somehow changing the balance of how that is electrically aggregated on the surface of the skull. *Something* is changing, but that might be a neurochemical or electrical side-effect unrelated to actual processing (particularly since no corresponding behavioural impact is observed.)

      We would like to refer to our reply to the previous point, and we would like to add that in our previous work (Fahrenfort et al., 2017; Noorman et al., 2023) similar EEG decoding markers were often tightly linked to changes in behavioral performance. This indicates that these particular EEG decoding markers do not simply reflect some side effect unrelated to neural processing. However, as stated in the revised discussion section, “it is possible that the effect of memantine in the present study was just too subtle to show up in changes in overt behavior.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      (…) In my view, the part about NF-YA1 is less strong - although I realize this is a compelling candidate to be a regulator of cell cycle progression, the experimental approaches used to address this question fall a bit short, in particular compared to the very detailed approaches shown in the rest of the manuscript. The authors show that the transcription factor NF-YA1 regulates cell division in tobacco leaves; however, there is no experimental validation in the experimental system (nodules). All conclusions are based on a heterologous cell division system in tobacco leaves. The authors state that NF-YA1 has a nodule-specific role as a regulator of cell differentiation. I am concerned the tobacco system may not allow for adequate testing of this hypothesis.

      Reviewer #1 makes a valid point by asking to focus the manuscript more explicitly on the role of NF-YA1 as a differentiation factor in a symbiotic context. We have now addressed this formally and experimentally.

      The involvement of A-type NF-Y subunits in the transition to the early differentiation of nodule cells has been documented in model legumes through several publications that we refer to in the revised version of the discussion (lines 617/623). We fully agree that the CDEL system, because it is heterologous, only allows us to propose a parallel explanation for these observations, i.e., that the Medicago NF-YA1 subunit presumably acts in post-replicative cell-cycle regulation at the G2/M transition. Considering your recommendations and those of reviewer #2, we sought to support this conclusion by testing the impact of localized over-expression of NF-YA1 on cortical cell division and infection competence at an early stage of root colonization. The results of these experiments are now presented in the new Figure 9 and Figure 9-figure supplement 1-5 and described from line 435 to 495.

      With the fluorescent tools the authors have at hand (in particular tools to detect G2/M transition, which the authors suggest is regulated by NF-YA1), it would be interesting to test what happens to cell division if NF-YA1 is over-expressed in Medicago roots?

      To limit pleiotropic effects of an ectopic over-expression, we used the symbiosis-induced, ENOD11 promoter to increase NF-YA1 expression levels more specifically along the trajectory of infected cells. We chose to remain in continuity with the experiments performed in the CDEL system by opting for a destabilized version of the KNOLLE transcriptional reporter to detect the G2/M transition. The results obtained are presented in Figure 9B (quantification of split infected cells), in Figure 9-figure supplement 1B (ENOD11 expression profile), in Figure 9-figure supplement 3B (representative confocal images) and Figure 9-figure supplement 4D (quantification of pKNOLLE reporter signal). There, we show that mitosis remains inhibited in cells accommodating infection threads, but is completed in a higher proportion of outer cortical cells positioned on the infection trajectory, where ENOD11 gene transcription is active before their physical colonization.

      Based on NF-YA1 expression data published previously and their results in tobacco epidermal cells, the authors hypothesize that NF-YA regulates the mitotic entry of nodule primordial cells. Given that much of the manuscript deals with earlier stages of the infection, I wonder if NF-YA1 could also have a role in regulating mitotic entry in cells adjacent to the infection thread?

      The expression profile of NF-YA1 at early stages of cortical infection (Laporte et al., 2014) is indeed similar to the one of ENOD11 (as shown in Figure 9-figure supplement 1C) in wild-type Medicago roots, with corresponding transcriptional reporters being both activated in cells adjacent to the infection thread. Under our experimental conditions, additional expression of NF-YA1 (driven by the ENOD11 promoter) in these neighbouring cells did not impact their propensity to enter mitosis and to complete cell division. These results are presented in Figure 9-figure supplement 4D (quantification of pKNOLLE reporter signal) and Figure 9-figure supplement 5 (quantification of split neighbouring cells).

      Reviewer #1 (Recommendations For The Authors):

      - In the first part, images show the qualitative presence/absence of H3.1 or H3.3 histones.

      Upon closer inspection, many cells seem to have both histones. In Fig1-S1 for example (root meristem), it is evident that there are many cells with low but clearly present H3.1 content in the green channel; however, in the overlay, the green is lost and H3.3 (pink) is mainly visible. What does this mean in terms of the cell cycle? 

      We fully agree with reviewer #1 on these points. Independent of whether they have low or high proliferation potential, most cells retain histone H3.1 particularly in silent regions of the genome, while H3.3 is constitutively produced and enriched at transcriptionally active regions. When channels are overlaid, cells in an active proliferation or endoreduplication state (in G1, S or G2, depending on the size of their nuclei) will appear mainly "green" (H3.1-eGFP positive). Cells with a low proliferation potential (e.g., in the QC), G2-arrested (e.g., IT-traversed) or terminally differentiating (e.g., containing symbiosomes or arbuscules) will appear mainly "magenta" (H3.1-low, medium to high H3.3-mCherry content).

      Furthermore, all nodule images only display the overlay image, and individual fluorescence channels are not shown. Does the same masking effect happen here? It may be helpful to quantify fluorescence intensity not only in the green but also in the red channel, as done for other experiments.

      Quantifying fluorescence intensity in the mCherry channel may indeed help to highlight the likely replacement of H3.1-eGFP by H3.3-mCherry in infected cells, as described by Otero and colleagues (2016) at the onset of cellular differentiation. However, the quantification method as established (i.e., measuring the corrected total nuclear fluorescence at the equatorial plane) cannot be applied, most of the time, to infected cells' nuclei due to the overlapping presence of mCherry-producing S. meliloti in the same channel (e.g., in Figure 2B). Nevertheless, and to avoid this masking effect when the eGFP and mCherry channels are overlaid, we now present them as isolated channels in revised Figures 1-3 and associated figure supplements. As the cell-wall staining is regularly included and displayed in grayscale, we assigned the Green Fire Blue lookup table to both fluorescence channels; this table maps intensity values to a multiple-colour sequential scheme (with blue or yellow indicating low or high fluorescence levels, respectively). We hope that this will allow a better appreciation of the respective levels of H3.1- and H3.3-fusions in our confocal images.

      - Fig 1 B - it is hard to differentiate between S. meliloti-mCherry and H3.3-mCherry. Is there a way to label the different structures?

      In the revised version of Figure 1B, we used filled or empty arrowheads to point to histone H3-containing nuclei. To label rhizobia-associated structures, we used dashed lines to delineate nodule cells hosting symbiosomes and included the annotation “IT” for infection threads. We also indicated proliferating, endoreduplicating and differentiating tissues and cells using the following annotations: “CD” for cell division, “En” for endoreduplication and “TD” for terminal differentiation. All annotations are explained in the figure legend.

      - Fig 1 - supplement E and F - no statistics are shown.

      We performed non-parametric tests using the latest version of the GraphPad Prism software (version 10.4.1). Stars (Figure 1-figure supplement 1F) or different letters (Figure 1-figure supplement 1G) now indicate statistically significant differences. Results of the normality and non-parametric tests were included in the corresponding Source Data Files (Figure 1 – figure supplement 1 – source data 1 and 2). We have also updated the compact display of letters in other figures as indicated by the new software version. The raw data and the results of the statistical analyses remain unchanged and can be viewed in the corresponding source files.

      - Fig 2 A - overview and close-up image do not seem to be in the same focal plane. This is confusing because the nuclei position is different (so is the infection thread position).

      We fully agree that our former Figure may have confused reviewers #1 and #2 as well as readers. Figure 2A was designed to highlight, from the same nodule primordium, actively dividing cells of the inner cortex (optical section z 6-14) and cells of the outer cortex traversed, penetrated by or neighbouring an infection thread (optical section z 11-19). We initially wanted to show different magnification views of the same confocal image (i.e., a full view of the inner cortex and a zoomed view of the outer layers) to ensure that audiences can identify these details. In the revised version of Figure 2A, we displayed these full and zoomed views in upper and lower panels, respectively, and we removed the solid-line inset to avoid confusion.

      - Fig 1A and Fig 2E could be combined and shown at the beginning of the manuscript. Also, consider making the cell size increase more extreme, as it is important to differentiate G2 cells after H3.1 eviction and cells in G1. You have to look very closely at the graph to see the size differences.

      We have taken each of your suggestions into account. A combined version of our schematic representation with more pronounced nuclei size differences is now presented in Figure 1A.

      - Fig. 3 C is difficult to interpret. Can this be split into different panels?

      We realized that our previous choice of representation may have been confusing. Each value corresponds only to the H3.1-eGFP content measured in an infected cell relative to that of the neighbouring cell (IC / NC) within individual root samples. Therefore, we removed the green-magenta colour code and changed the legend accordingly. We hope that these slight modifications will facilitate the interpretation of the results - namely, that the relative level of H3.1 increases significantly in infected cells in the selected mutants compared to the wild-type. This mode of representation also highlights that in the mutants, there are more individual cases where the H3.1 content in an infected cell exceeds that of the neighbouring cell by more than two times. These cases would be masked if the pairs of infected cells and associated neighbours were split into different panels as in Figure 3B.

      - Line 357/359. I assume you mean ...'through the G2 phase can commit to nuclear division'.

      We have edited this sentence according to your suggestion, which now appears in line 370. 

      Reviewer #2 (Recommendations For The Authors):

      Cell cycle control during the nitrogen-fixing symbiosis is an important question but only poorly understood. This manuscript uses largely cell biological methods, which are always of the highest quality, to investigate host cell cycle progression during the early stages of nodule formation, where cortical infection threads penetrate the nodule primordium. The experiments were carefully conducted, the observations were detail oriented, and the results were thought-provoking. The study should be supported by mechanistic insights.

      (1) One thought provoked by the authors' work is that while the study was carried out at an unprecedented resolution, the relationship between control of the cell cycle and infection thread penetration remains correlative. Is this reduced replicative potential among cells in the infection thread trajectory a consequence of hosting an infection thread, or a prerequisite to do so?

      We understand and share the point of view of reviewer #2. At this stage, we believe that our data won’t enable us to fully answer the question, thus this relationship remains rather correlative. The reasons are that 1) the access to the status of cortical cells below C2 is restricted to fixed material and therefore only represents a snapshot of the situation, and 2) we are currently unable to significantly interfere with mechanisms as intertwined as cell cycle control and infection control. What we can reasonably suggest from our images is that the most favorable window of the cell cycle for cells about to be crossed by an infection thread is post-replicative, i.e., the G2 phase. Typical markers of the G2 phase were recurrently observed at the onset of physical colonization – enlarged nucleus, containing less histone H3.1 than neighbouring cells in S phase (e.g., in Figure 2A). Reaching the G2 phase could therefore be a prerequisite for infection (and associated cellular rearrangements), while prolonged arrest in this same phase is likely a consequence of transcellular passage towards a forming nodule primordium.

      More importantly, in either scenario, what is the functional significance of exiting the cell cycle or endocycle? By stating that "local control of mitotic activity could be especially important for rhizobia to timely cross the middle cortex, where sustained cellular proliferation gives rise to the nodule meristem" (Line 239), the authors seem to believe that cortical cells need to stop the cell cycle to prepare for rhizobia infection. This is certainly reasonable, but the current study provides no proof, yet. To test the functional importance of cell cycle exit, one would interfere with the G2/M transition in nodule cells and examine the effect on infection.

      We fully agree with reviewer #2 that the functional importance of a cell-cycle arrest on the infection thread trajectory remains to be demonstrated. Interfering with cell-cycle progression in a system as complex and fine-tuned as infected legume roots certainly requires the right timing – at the level of the tissue and of individual cells; the right dose; and the right molecular player(s) (i.e., bona fide activators or repressors of the G2/M transition). Using the symbiosis-specific NPL promoter, activated in the direct vicinity of cortical infection threads (Figure 9-figure supplement 1B), we tried to force infectable cells to recruit the cell division program by ectopically over-expressing the Arabidopsis CYCD3.1, “mimicking” the CDEL system. So far, this strategy has not resulted in a significant increase in the number of uninfected nodules in transgenic hairy roots - though the effect on symbiosome release remains to be investigated. Provided that a suitable promoter-cell cycle regulator combination is identified, we hope to be able to answer this question in the future.

      Given that the authors have already identified a candidate, and showed it represses cell division in the CDEL system, not testing the same gene in a more relevant context seems a lost opportunity. If one ectopically expressed NF-YA1 in hairy roots, thus repressing mitosis in general, would more cells become competent to host infection threads? This seems a straightforward experiment and readily feasible with the constructs that the authors already have. If this view is too naive, the authors should explain why such a functional investigation does not belong in this manuscript.

      Reviewer #2's point is entirely valid, and we decided to address it through additional experiments. To avoid possible side effects on development by affecting cell division in general, we placed NF-YA1 under control of the symbiosis-induced ENOD11 promoter. Based on the results obtained in the CDEL system, the pENOD11::FLAG-NF-YA1 cassette was coupled to a destabilized version of the KNOLLE transcriptional reporter to detect the G2/M transition. Competence for transcellular infection was maintained upon local NF-YA1 overexpression, the latter leading to a slight (non-significant) increase in the number of infected cells per cortical layer. These results are presented in Figure 9-figure supplement 3A-B (representative confocal images) and in Figure 9-figure supplement 4A-G.

      (1b) A related comment: on Line 183, it was stated that "The H3.1-eGFP fusion protein was also visible in cells penetrated but not fully passed by an infection thread". Presumably, the authors were talking about the cell marked by the arrowhead. But its H3.1-GFP signal looks no different from the cell immediately to its left. It is hard to say which cells are ones "preparing for intracellular infection pass through S-phase", and which ones are just "regularly dividing cortical cells forming the nodule primordium". What can be concluded is that once a cell has been fully traversed by an infection thread, its H3.1 level is low. Whether this is the cause or consequence of infection cannot be resolved simply by timing the appearance or disappearance of H3.1-GFP.

      We basically agree with comment 1b. In an unsynchronized system such as infected hairy roots, it is challenging to detect the event where a cell is penetrated, but not yet completely crossed by an infection thread. What we wanted to emphasize in Figure 2A is that host cells in the path of an infection thread re-enter the cell cycle and pass through S-phase just as their neighbours do (as pointed out by reviewer #2 in his summary). The larger nucleus with slightly lower H3.1-eGFP signal than the neighbouring cell (as indicated by the use of the Green Fire Blue lookup table) suggests that the infected cell marked by the arrowhead in Figure 2A is actually in the G2 phase. The main difference is indeed that cells allowing complete infection thread passage exit the cell cycle and largely evict H3.1 while their neighbours proceed to cell division (as exemplified by PlaCCI reporters in Figure 4C-D and the new Figure 5-figure supplement 2). Whether cell-cycle exit in G2 is a cause or a consequence of cortical infection is a question that cannot be easily answered from fixed samples, which is a limitation of our study.

      (2) The authors have convincingly demonstrated that cortical cells accommodating infection threads exit the cell cycle, inhibit cell division, and down-regulate KNOLLE expression. How do these observations reconcile with the feature called the pre-infection thread? The authors devoted one paragraph to this question in the Discussion, but this does not seem sufficient given that the pre-infection thread is a prominent concept. Is the resemblance to the cell division plane superficial, or does it reflect a co-option of the normal cytokinesis machinery for accommodating rhizobia?

      From our point of view, cortical cells forming pre-infection threads are likely in an intermediate state. PIT structures undoubtedly share many similarities with cells establishing a cell division plane. The recruitment of at least some of the players normally associated with cytokinesis has been demonstrated and is consistent with the maintenance of infectable cells in a pre-mitotic phase in Medicago, as discussed in lines 558 to 568. We nevertheless think that the arrest of the cell cycle in the G2 phase, presumably occurring in crossed cortical cells, constitutes an event of cellular differentiation and specialization in transcellular infection. 

      The following are mainly points of presentation and description: 

      (3) Line 158: I can't see "subnuclear foci" in Figure 1-figure supplement 1C-E. However, they are visible in Fig. 1C.

      We hope that presenting the eGFP and mCherry channels in separate panels and assigning them the Green Fire Blue colour scheme provides better visibility and contrast of these detailed structures. We now refer to Figure 1C in addition to Figure 1–figure supplement 1E in the main text (line 161). 

      (4) Line 160: The authors should outline a larger region containing multiple QC cells, rather than pointing to a single cell, as there are other areas in the image containing cells with the same pattern.

      We updated Figure 1-figure supplement 1E accordingly.

      (5) Fig. 1B should include single channels, since within a single plant cell, the nucleus, the infection thread, and sometimes symbiosomes all have the same color. This makes it hard to see whether the nuclei in these cells are less green, or are simply overwhelmed by the magenta color.

      To improve the readability of Figure 1B and to address suggestions from individual reviewers, we now include separate channels and have annotated the different structures labeled by mCherry.

      (6) Fig. 2A: the close-up does not match the boxed area in the left panel. Based on the labeling, it seems that the two panels are different optical sections. But why choose a different optical depth for the left panel? This can be disorienting to the reader, because one expects the close-up to be the same image, just under higher magnification.

      We fully agree that our previous choice of representation may have been confusing. As we also specified to reviewer #1, we wanted to show a full view of proliferating cells in the inner cortex and a zoomed view of infected cells in the outer layers of the same nodule primordium. In the revised version of Figure 2A, we displayed these full and zoomed views in separate panels and removed the boxed area to avoid confusion.

      (7) Figure 2-figure supplement 1B: the cell indicated by the empty arrowhead has a striking pattern of H3.1 and H3.3 distribution on condensed chromosomes. Can you comment on that?

      Reviewer #2 may be referring to the apparent enrichment of H3.3 at telomeres, previously described in Arabidopsis, while pericentromeric regions are enriched in H3.1. This distribution is indeed visible on most of the condensed chromosomes shown in Figure 2-figure supplement 1B. We included this comment in the corresponding caption.

      (8) Fig. 4: It is not very easy to distinguish M phase. Can the authors describe how each phase is supposed to look like with the reporters?

      We agree with reviewer #2 and attempted to improve Figure 4, which is now dedicated to the Arabidopsis PlaCCI reporter. ECFP, mCherry, and YFP channels were presented separately and the corresponding cell-cycle phases (in interphase and mitosis) were annotated. The Green Fire Blue lookup table was assigned to each reporter to provide the best visibility of, for example, chromosomes in early prophase. We included a schematic representation corresponding to the distribution of each reporter, using the colors of the overlaid image to facilitate its interpretation.

      (9) Line 298: what is endopolyploid? This term is used at least three times throughout the manuscript. How is it different from polyploid?

      In the manuscript, we aimed to differentiate the (poly)ploidy of an organism (reflecting the number of copies of the basic genome and inherited through the germline) from endopolyploidy produced by individual somatic cells. As reviewed by Scholes and Paige, polyploidy and endopolyploidy differ in important ways, including allelic diversity and chromosome structural differences. In the Medicago truncatula root cortex, for example, a tetraploid cell generated via endoreduplication from the diploid state would contain at most two alleles at any locus. The effects of endopolyploidy on cell size, gene expression, cell metabolism and the duration of the mitotic cell cycle are not shared among individual cells or organs, in contrast to a polyploid individual (Scholes and Paige, 2015).

      See Scholes, D. R., & Paige, K. N. (2015). Plasticity in ploidy: a generalized response to stress. Trends in Plant Science, 20(3), 165‑175. https://doi.org/10.1016/j.tplants.2014.11.007

      (10) Line 332: "chromosomes on mitotic figures" - what does this mean?

      Reviewer #2 is right to point out this redundant wording. Mitotic “figures” are recognized, by definition, based on chromosome condensation. We now use the term "mitotic chromosomes" (line 344).

      (11) Fig. 6A: could the authors consider labeling the doublets, at least some of them? I understand that this nucleus contains many doublets. However, this is the first image where one is supposed to recognize these doublets, and pointing out these features can facilitate understanding. Otherwise, a reader might think the image is comparable to nuclei with no doublets in the rest of the figure.

      Following this suggestion, five of these doublets are now labeled in Figure 7A (formerly Figure 6A).

  6. May 2025
    1. Act I is the Introduction, also known as the exposition. Here we are introduced to the “normal world.” Now, the normal world may exist in a far future on an interstellar starship, or it may be set in a suburban ranch house with a swing set in the back yard, but the audience will give us great latitude as we establish the definition of “normal.” In this act, we learn the rules that govern this world, and something about the characters that inhabit it. In the Hegelian dialectic, this is the “thesis.” Act II is the Conflict. This conflict is introduced through an “inciting incident,” an act that disrupts the normal world outlined in act I. The tension introduced during this incident grows throughout the second act. In the Hegelian dialectic, the second act is the “antithesis.” Act III is the Resolution. The conflict is resolved, and the world and the characters in it are revealed to have been changed. In the Hegelian dialectic, the third act is the “synthesis.”

      This also fits the logic of essay/article writing. I think the author forgot to mention that logical structure also plays an important role in creating tension in writing. You should write something understandable, with enough tension to attract the audience, keep them focused, and prompt them to ask questions spontaneously.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Response to Review

      Manuscript number: RC-2024-02391

      Corresponding author(s): John Varga

      Dibyendu Bhattacharyya


      1. General Statements [optional]


      Dear editor,

      We are pleased to submit a full revised version of the manuscript that addresses all the points raised by the reviewers. We have included new experiments and modified the text and figures based on the reviewers’ suggestions. We thank all the reviewers for their insightful feedback, which has significantly enhanced the quality of the manuscript. We are confident and optimistic that our improved manuscript will be accepted by the journal of our choice.

      This document should contain a few images, which were lost during processing through the manuscript submission system. For convenience, we have also included a PDF version of the response to reviewers.

      2. Point-by-point description of the revisions


      Reviewer #1

      • To reliably quantify the ciliary length in different cell types, an independent ciliary marker needs to be included for comparison and the ciliary base needs to be labeled (e.g., γ-TUBULIN). This needs to be combined with a non-biased, high-throughput analysis, e.g., CiliaQ. Response: As suggested, we compared primary cilia length measurements using antibodies against Arl13b and γ-tubulin. The comparison between healthy controls (HC) and systemic sclerosis (SSc) is presented in Supplementary Figure S1. No significant differences in primary cilia length were observed compared to our previous measurements. Cilia length was quantified using ImageJ version 1.48v (http://imagej.nih.gov/ij) with the maximum intensity projection (MIP) method and visualized through 3D reconstruction using the ImageJ 3D Viewer.

      • As mentioned in the study, TGF-β has been implicated in driving the myofibroblast transition. Does TGF-β thus stimulate ciliary signaling in the presented primary cells? The authors should provide a read-out for TGF-β signaling in the cilium (ICC for protein phosphorylation, etc.). Furthermore, canonical ciliary signaling pathways, such as Hedgehog and Wnt signaling, have been suggested to act as fibrotic drivers - does stimulation of these pathways evoke a similar effect? Response: Yes, TGF-β1 stimulates ciliary signaling in growth-arrested foreskin fibroblasts. Clement et al. (2013) showed that TGF-β1 induces p-SMAD2/3 at the ciliary base, followed by the nuclear translocation of p-SMAD2/3 after 90 minutes. To assess whether canonical ciliary signaling pathways influence primary cilia length, we treated foreskin fibroblasts with Wnt (#908-SH, R&D) and a Shh agonist (#5036-WN, R&D) at 100 ng/mL each for 24 hours. We did not observe any changes in primary cilia length under either condition. These data are shown here for reference but are not included in the manuscript.

      Clement, Christian Alexandro, et al. "TGF-β signaling is associated with endocytosis at the pocket region of the primary cilium." Cell Reports 3.6 (2013): 1806-1814.

      • Does TGF-β induce cell proliferation? If yes, this would force cilium disassembly and, thereby, reduce ciliary length, which is independent of a "shortening" mechanism proposed by the authors. Response: Yes, TGF-β induces cell proliferation in fibroblasts (Lee et al., 2013; Liu et al., 2016). However, we serum-starved the cells to arrest proliferation. In our study, we observed a small percentage of Ki67-positive cells under TGF-β treatment at 24 hours (Supplementary Figure S2C). However, cell proliferation had largely stopped after 48 hours. Proliferating cells typically display no PC or only very small puncta. In our case, we observe a significantly elongated PC structure (although shorter than that of untreated cells) under TGF-β-treated conditions. Our results show that the majority of cells are not proliferating but still display PC shortening under TGF-β treatment, suggesting that PC shortening is not due to cell division-induced PC disassembly. TGF-β-induced PC shortening has also been reported previously in another fibroblast type (Kawasaki et al., 2024).

      Kawasaki, Makiri, et al. "Primary cilia suppress the fibrotic activity of atrial fibroblasts from patients with atrial fibrillation in vitro." Scientific Reports 14.1 (2024): 12470.

      Lee, J., Choi, JH. & Joo, CK. TGF-β1 regulates cell fate during epithelial–mesenchymal transition by upregulating survivin. Cell Death Dis 4, e714 (2013). https://doi.org/10.1038/cddis.2013.244.

      Liu, Y. et al. TGF-β1 promotes scar fibroblasts proliferation and transdifferentiation via up-regulating MicroRNA-21. Sci. Rep. 6, 32231; doi: 10.1038/srep32231 (2016).

      • As PGE2 has been shown to signal through EP4 receptors in the cilium, is the restoration of primary cilia length due to ciliary signaling? Response: As suggested, we measured cilia length in PGE2-treated cells in the presence and absence of an EP4 receptor antagonist (EP4 Receptor Antagonist 1; #32722; Cayman Chemical; 500 nM). Interestingly, the EP4 receptor antagonist did not change cilia length in the PGE2 + TGF-β treatment group (Supplementary Figure S3). We believe that PGE2 acts through the EP2 receptor under our experimental conditions. Kolodsick et al. (2003) also observed that PGE2 inhibits myofibroblast differentiation via activation of EP2 receptors and elevation of cAMP levels in healthy lung fibroblasts.

      Kolodsick, Jill E., et al. "Prostaglandin E2 inhibits fibroblast to myofibroblast transition via E. prostanoid receptor 2 signaling and cyclic adenosine monophosphate elevation." American journal of respiratory cell and molecular biology 29.5 (2003): 537-544.

      • Primary cilia length is regulated by cAMP signaling in the cilium vs. the cytoplasm; does cAMP signaling play a role in this context? PGE2 is a potent stimulator of cAMP synthesis; does this underlie the rescue of primary cilia length? Response: Yes, cAMP levels are important for both myofibroblast dedifferentiation and cilia elongation. Kolodsick et al. (2003) observed that PGE2 inhibits myofibroblast differentiation via activation of EP2 receptors and elevation of cAMP levels in healthy lung fibroblasts. In a parallel set of experiments, treatment with forskolin (an adenylyl cyclase activator that raises cAMP) also reduced α-SMA protein levels by 40%. Forskolin is also known to increase PC length.

      Kolodsick, Jill E., et al. "Prostaglandin E2 inhibits fibroblast to myofibroblast transition via E. prostanoid receptor 2 signaling and cyclic adenosine monophosphate elevation." American journal of respiratory cell and molecular biology 29.5 (2003): 537-544.

      • The authors state that they wanted to investigate how αSMA affects primary cilia length. They only provide a knock-down experiment and measure ciliary length, but mechanistic insight is missing. How does loss of αSMA expression control ciliary length? Response: We measured acetylated α-tubulin levels in ACTA2 siRNA-treated cells compared with control-treated cells. Acetylated α-tubulin levels increased in ACTA2 siRNA-treated cells (Figure 4D), and TPPP3 levels were also elevated (Figure S8A). Interestingly, TPPP3 levels correlated negatively with disease severity in SSc fibroblasts (r = -0.2701, p = 0.0183), and TPPP3 expression was significantly reduced in SSc skin biopsies (Figures 6C and 6D). These results support our hypothesis that microtubule and actin polymerization counterbalance each other and have opposing effects on PC length. We agree that a much more detailed study is needed to delineate the intricate homeostasis of the actin and microtubule networks in relation to fibrosis and primary cilia length. We have mentioned this in the Discussion.

      • The authors used LiCl in their experiments, which reportedly controls Hh signaling. Coming back to my second question, is this Hh-dependent? What is the common denominator with TGF-β signaling? And how is this mechanistically connected to actin and microtubule polymerization? Response: We used a Shh inhibitor (cyclopamine hydrate, #C4116, Sigma-Aldrich) in both SSc and foreskin fibroblasts (with and without TGF-β). PC length was significantly increased and αSMA intensity was reduced in the Shh inhibitor-treated group (data not included in the manuscript).

      • How was the αSMA mean intensity determined? Response: We quantified αSMA mean intensity using ImageJ; the procedure has been added to the respective figure legend and to the Materials and Methods section under ‘Quantification of immunofluorescence’ (each point represents the mean intensity of three randomly selected high-power fields (hpf) per slide).

      • Fig. 1D: The statistical test is missing in the figure legend, and the presentation of the p-values for the left graph is confusing. Response: We have added the statistical test information to the figure legend.

      • Some graphs are presented as mean ± SD and some as mean ± SEM, but this is not correctly stated in the Materials & Methods section. Response: We have added this information to the figure legends and to the Materials & Methods section.

      • Fig. 4D&E: The statistical test is missing in the figure legend. Response: We have now added it.

      • In general, the text should be checked again for spelling mistakes, and sentences may be rewritten to improve readability. In particular, this applies to the Discussion. Response: We have checked and corrected the text.

      • Figure legends are not written consistently, and information is missing (e.g., statistical tests, see above). Response: We have carefully checked the legends and added the missing information.

      • Figures should be checked again: all text should be the same size, and the alignment of images should be improved. Response: We have checked and corrected the figures.

      Significance

      The authors present a novel connection between the regulation of primary cilia length and fibrogenesis. However, the study generally lacks mechanistic insight, in particular on how TGF-β signaling, αSMA expression, and ciliary length control are connected. The spatial organization of the proposed signaling components is also not clear: is this a ciliary signaling pathway? If so, how does it interact with cytoplasmic signaling and vice versa?

      Response: Thank you for your thoughtful and constructive feedback. We appreciate your recognition of the novelty of our study linking primary cilia length regulation to fibrogenesis. In the revised manuscript, we provide mechanistic insight: our results suggest that during the fibrotic response, higher-order actin polymerization, together with microtubule destabilization resulting from tubulin deacetylation, drives PC shortening. Conversely, PC elongation via stabilization of microtubule polymerization mitigates the fibrotic phenotype in fibrotic fibroblasts. We agree that a deeper mechanistic understanding, particularly of how TGF-β signaling, αSMA expression, and ciliary length control intersect, is essential to fully elucidate the pathway. We also acknowledge the importance of clarifying the spatial organization of the signaling components and plan to incorporate such analyses in future studies.

      Reviewer #2

      *I found the paper to be rather muddled, and its presentation made it somewhat difficult to follow. For example, the figures are disorganised (Fig 1 is a great example of this), and there were references to Sup data that appeared out of order (e.g. Sup Fig 2 appeared before Sup Fig 1 in the text).*

      Response: We have carefully revised the manuscript and reorganized the figures.

      *Images in a single figure should be the same size. Currently they are almost random and use different magnifications. Overall, the paper needs to be better organized.*

      Response: We have carefully revised the manuscript, and images within each figure are now shown at the same magnification.

      *I have some significant concerns about how the PC length data were generated. To my mind, the length may be hard to determine from the type of images shown in the paper (which may represent the best images?). Some of the images presented appear to show shorter, fatter PCs in the cells from fibrosis cases. Is this real or is it some kind of artefact? Would shorter, fatter PCs have a similar or larger surface area? What would be the consequence of this?*

      Response: Primary cilia length was measured in ImageJ 1.48v using maximum intensity projections (MIP) and visualized by 3D reconstruction with the ImageJ 3D Viewer. Each small dot represents the PC length of an individual cell, and each large dot represents the average of the small dots for one cell line.

      *I am confused as to exactly what is meant by matched healthy controls. Age, sex and ethnicity, where stated, seem to be very variable. What are CCL210 fibroblasts?*

      Response: We appreciate this comment. The reviewer is correct: the age, sex, and ethnicity of the available healthy controls are not matched, and we have corrected this in the text. CCL210 is a commercially available fibroblast cell line isolated from the lung of a healthy 20-year-old White female.

      *What does a change in PC length signify? Do short PCs drive a cellular transition, or are they a consequence of it? What would happen if you targeted PCs with a drug that influenced their length in all cell types? Is the effect on PCs fibroblast-specific?*

      Response: The significance and regulation of PC length are still actively debated and investigated, and PC length appears to signify different features in different cell types. Although these are very interesting questions, such experiments are beyond the scope of the present work.

      Minor concerns

      *Page 4, second paragraph. I think it should be clarified that it is this group who have suggested a link between PCs and myofibroblast transition.*

      Response: We agree with the reviewer and have clarified this.

      *Page 4, second paragraph. The use of the word "remarkably" is a bit subjective.*

      Response: We agree with the reviewer and have removed it.

      *Reference 27 is a paper on multiciliogenesis rather than primary ciliogenesis.*

      Response: We agree with the reviewer and have removed it.

      Figure 1, panel D: use the same vertical scale for the images.

      Response: We have replaced it with a new Figure 1.

      Significance

      Reviewer #2 (Significance (Required)):

      To my mind this is a novel paper and the data presented in it may be of interest to the cilia community as well as to the fibrosis field. This could be considered to be a significant advance and I am unaware that other groups are actively working in this area.

      Presentation of the data in the current form does not instil confidence in the work.

      Response: Thank you for recognizing the novelty and potential significance of our work. We appreciate your comments and fully acknowledge the concern regarding the presentation of the data. We have carefully revised the manuscript and reorganized the figures to improve clarity and overall presentation.

      Reviewer #3

      Major comments:

      • It needs to be demonstrated whether the fibrotic phenotypes observed are produced through a cilia-dependent mechanism. For example, are the LiCl effects on Cgn1 mediated through cilia or by other mechanisms? To address this, the authors should repeat the experiments in cells with a knockdown or knockout of ciliary proteins such as IFT20 or IFT88. The same approach should be applied to the tubacin experiments. Response: We silenced IFT88/IFT20 in foreskin fibroblasts, both in the presence and absence of TGF-β1, followed by treatment with LiCl and Tubacin. Both LiCl and Tubacin rescued cilia length and mitigated the myofibroblast phenotype even when IFT88/IFT20 was silenced (Supplementary Figure S9). These results suggest that the effects of LiCl and Tubacin are independent of IFT-mediated ciliary mechanisms. The regulation of PC length is still an enigma and highly debated. Moreover, PC length can be affected in multiple ways and is not solely dependent on IFTs (Avasthi and Marshall, 2012). One such route is direct modification of the axoneme by altering microtubule stability through its acetylation state (Avasthi and Marshall, 2012), the most likely mechanism for Tubacin. Another mode of PC length regulation is a change in actin polymerization: the remodeling of actin between contractile stress fibers and a cortical network alters conditions that are hospitable to basal body docking and maintenance at the cell surface (Avasthi and Marshall, 2012), causing PC length variation. Our results suggest that PC length functions as a sensor of the fibrotic state of the cells, as reflected by their αSMA levels.

      Avasthi, P., and W.F. Marshall. 2012. Stages of ciliogenesis and regulation of ciliary length. Differentiation. 83:S30-42.

      • The use of LiCl to increase ciliary length is complicated. What are the molecular mechanisms underlying this effect? LiCl may affect GSK-3β, which can have other, cilia-independent effects. Therefore, using ciliary KO/KD cells (IFT88 or IFT20) as controls may help assess the specificity of the proposed treatments. Response: As explained in the previous response, PC length regulation depends on multiple factors, many of which are not IFT-dependent. One such route is direct modification of the axoneme by altering microtubule stability/polymerization through its acetylation state (Avasthi and Marshall, 2012), the most likely mechanism for Tubacin. Another mode of PC length regulation is a change in actin polymerization: the remodeling of actin between contractile stress fibers and a cortical network alters conditions that are hospitable to basal body docking and maintenance at the cell surface (Avasthi and Marshall, 2012), causing PC length variation. Higher-order microtubule polymerization inhibits actin polymerization. By interrogating RNA-seq data, we found that several PC disassembly-related genes (KIF4A, KIF26A, KIF26B, KIF18A), as well as microtubule polymerization genes (TPPP, TPPP3, TUBB, TUBB2A, etc.), were differentially expressed in LiCl-treated SSc fibroblasts (Suppl. Fig. S6D). Altogether, these findings suggest that microtubule polymerization/depolymerization mechanisms may regulate PC elongation and the attenuation of fibrotic responses after either LiCl or Tubacin treatment.

      • Assessing the frequency of ciliated cells is also important, as it may be another variable that predicts fibrotic phenotypes. Or do 100% of the cultured cells have cilia under these conditions? Response: We checked carefully and found that approximately 95% of cells are ciliated under our culture conditions.

      • Have the authors evaluated whether TGF-β1 treatment induces cell cycle re-entry and proliferation under these experimental conditions? This is important to exclude ciliary resorption due to cell cycle re-entry rather than the myofibroblast activation process. Response: Yes, TGF-β induces cell proliferation in fibroblasts (Lee et al., 2013; Liu et al., 2016). However, we serum-starved the cells to arrest proliferation. We observed a small percentage of Ki67-positive cells after 24 hours of TGF-β treatment (Supplementary Figure S2C), but proliferation had largely ceased by 48 hours. Proliferating cells typically display no PC or only very small puncta. In our case, TGF-β-treated cells still displayed distinct, elongated PCs, although shorter than those of untreated cells. Our results therefore show that most cells are not proliferating yet still display PC shortening under TGF-β treatment, suggesting that PC shortening is not due to cell division-induced PC disassembly. TGF-β-induced PC shortening has also been reported previously in another fibroblast type (Kawasaki et al., 2024).

      Kawasaki, Makiri, et al. "Primary cilia suppress the fibrotic activity of atrial fibroblasts from patients with atrial fibrillation in vitro." Scientific Reports 14.1 (2024): 12470.

      Lee, J., Choi, JH. & Joo, CK. TGF-β1 regulates cell fate during epithelial–mesenchymal transition by upregulating survivin. Cell Death Dis 4, e714 (2013). https://doi.org/10.1038/cddis.2013.244.

      Liu, Y. et al. TGF-β1 promotes scar fibroblasts proliferation and transdifferentiation via up-regulating MicroRNA-21. Sci. Rep. 6, 32231; doi: 10.1038/srep32231 (2016).

      • The authors state that they focused on the genes that are affected in opposite ways (supp table 4), but TEAD2, MICALL1, and HDAC6 are not listed in that table. Response: The list in Supplementary Table S3 includes common genes defined as differentially expressed based on a fold change >1 or

      Minor comments:

      • Figure 1A,B,C should also show lower magnification images where several cells/field are visualized. Response: We have replaced it with a new Figure 1.

      • The number of patients analyzed is not clear. For example, M&M describes 5 healthy and 8 SSc, but only 3 and 4 are shown in the figure. Furthermore, for orbital fibrosis, 2 healthy vs. 2 TAO are mentioned in the figure legend, but only one of each showed. Finally, the healthy control for lung fibroblast seems to be 3 independent experiments of the CCL210 cell line; please show the three independent controls and clarify on the X-axis and in the figure legend that these are CCL210 cells. Response: A total of 5 healthy and 8 SSc skin explanted fibroblast cell lines were used, as described in the Materials and Methods. Since these are patient-derived skin fibroblasts, maintaining equal numbers in each experiment is challenging. Revised graphs for orbital fibroblasts and CCL210 have been added in the new Figures 1B and 1C.

      • For the same set of experiments, please clarify and consistently describe the conditions used to promote PC formation: 12 h serum starvation as described in M&M? Or 24 h as described in the text? Or 16 h as described in the legend of Figure 1? Or 24 h as described in Supp Figure 2? Response: We serum-starved the cells overnight, and this is also mentioned in the manuscript.

      • Please confirm in figure legends and M&M that 100 cells per group were counted. Response: We measured only 100 cells per cell line in Supplementary Figure S1B. To eliminate any confusion, we have now created a superplot for cilia analysis. Each small dot represents the PC length from an individual cell, and each large dot represents the average of the small dots for one cell line. An unpaired two-tailed t-test was performed on the small dots (mean ± SD).

      • Figure 2 should also provide lower magnification images to show several cells per field. Response: Images of TGF-β1-treated foreskin fibroblasts have been added to Supplementary Figure S2A.

      • How do you explain that the increase in primary cilia length after siACTA2 does not change COL1A1? Wouldn't it be a good approach to also check by Western blot? Response: We believe that depletion of αSMA was sufficient to increase PC length for the reasons described earlier (Avasthi and Marshall, 2012) but was not sufficient to change COL1A1 levels. We have added the Western blot to Supplementary Figure S8B.

      • Once more, Figure 5 would benefit from low-magnification images. How consistent is the effect of LiCl in the cultured cells? What is the percentage of rescued cells? Response: The effect of LiCl was consistent in almost all cells (~95%); these data have been added to Supplementary Figure S4A.

      • Figure 5, panels F and G need a better explanation in the Results text as well as in the figure legend. Response: We have now added this.

      9) Some figures/supplementary figures are incorrectly referenced in the text.

      Response: We have carefully revised the manuscript and corrected the references.

      10) Figure 6, panel A is confusing. Is it a comparison between SSc skin fibroblasts and foreskin fibroblasts? Maybe show labels on the panel.

      Response: We have updated the figure legend for Figure 6, panel A.

      11) Where is Figure 8 mentioned in the text?

      Response: Figure 8 is referred to in the Discussion section.

      12) The work will benefit from an initial paragraph in the discussion enumerating the findings and a summary of the conclusion at the end.

      Response: We agree and modified the discussion accordingly.

      13) The nintedanib experiments are not described in the results section at all.

      Response: All nintedanib experiments are now included in Figure S5C-F and are described in the Results section.
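The superplot layout introduced in the response on cell counts per group has two levels: per-cell PC lengths (small dots) and per-cell-line means (large dots). The following is a minimal Python sketch of that aggregation, not the authors' analysis code; it assumes the measurements are available as hypothetical (condition, cell line, PC length in µm) records, and the function name `superplot_summary` is illustrative.

```python
from collections import defaultdict
from statistics import mean


def superplot_summary(measurements):
    """Aggregate per-cell PC lengths into per-cell-line means.

    measurements: iterable of (condition, cell_line, pc_length_um) tuples,
    one tuple per measured cell (the "small dots" of a superplot).
    Returns {condition: {cell_line: mean_length}} (the "large dots").
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for condition, cell_line, length in measurements:
        grouped[condition][cell_line].append(length)
    # Collapse each cell line's per-cell values to a single mean
    return {
        condition: {line: mean(values) for line, values in lines.items()}
        for condition, lines in grouped.items()
    }


# Hypothetical example records (values invented for illustration only)
records = [
    ("healthy", "H1", 4.0), ("healthy", "H1", 5.0), ("healthy", "H2", 3.0),
    ("SSc", "S1", 2.0), ("SSc", "S1", 3.0),
]
summary = superplot_summary(records)
```

The per-cell values feed the small dots and the returned per-line means feed the large dots; which of the two levels a statistical test is applied to is a separate choice.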

      Significance

      Reviewer #3 (Significance (Required)): Beyond the lack of in situ ciliary expression assessment, the work is exciting, and the potential implications of treating/preventing fibrosis with small molecules to modulate ciliary length could be transformative in the field. Furthermore, there are a few HDAC6 inhibitors already in clinical trials for different tumors, which increases the significance of the work.

      Response: Thank you for your encouraging comments regarding the potential impact of our findings. We agree that the therapeutic implications of modulating ciliary length, particularly using small molecules such as HDAC6 inhibitors already in clinical trials, could be transformative in the context of fibrosis. We also acknowledge the importance of in situ assessment of ciliary expression and plan to incorporate such analyses in future studies to further strengthen our findings.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      The authors use microscopy experiments to track the gliding motion of filaments of the cyanobacteria Fluctiforma draycotensis. They find that filament motion consists of back-and-forth trajectories along a "track", interspersed with reversals of movement direction, with no clear dependence between filament speed and length. It is also observed that longer filaments can buckle and form plectonemes. A computational model is used to rationalise these findings.

      We thank the reviewer for this accurate summary of the presented work.

      Strengths:

      Much work in this field focuses on molecular mechanisms of motility; by tracking filament dynamics this work helps to connect molecular mechanisms to environmentally and industrially relevant ecological behavior such as aggregate formation.

      The observation that filaments move on tracks is interesting and potentially ecologically significant.

      The observation of rotating membrane-bound protein complexes and tubular arrangement of slime around the filament provides important clues to the mechanism of motion.

      The observation that long filaments buckle has the potential to shed light on the nature of mechanical forces in the filaments, e.g. through the study of the length dependence of buckling.

      We thank the reviewer for listing these positive aspects of the presented work.

      Weaknesses:

      The manuscript makes the interesting statement that the distribution of speed vs filament length is uniform, which would constrain the possibilities for mechanical coupling between the filaments. However, Figure 1C does not show a uniform distribution but rather an apparent lack of correlation between speed and filament length, while Figure S3 shows a dependence that is clearly increasing with filament length. Also, although it is claimed that the computational model reproduces the key features of the experiments, no data is shown for the dependence of speed on filament length in the computational model. The statement that is made about the model "all or most cells contribute to propulsive force generation, as seen from a uniform distribution of mean speed across different filament lengths", seems to be contradictory, since if each cell contributes to the force one might expect that speed would increase with filament length.

      We agree that the data show a general lack of correlation rather than a strictly uniform distribution. In the revised manuscript, we intend to collect more data from observations on glass to better understand the relation between filament length and speed.

      In considering longer filaments, one also needs to consider the increased drag created by each additional cell - in other words, overall friction will either increase or be constant as filament length increases. Therefore, if only one cell (or few cells) are generating motility forces, then adding more cells in longer filaments would decrease speed.

      Since the current data does not show any decrease in speed with increasing filament length, we stand by the argument that the data supports that all (or most) cells in a filament are involved in force generation for motility. We would revise the manuscript to make this point - and our arguments about assuming multiple / most cells in a filament contributing to motility - clear.
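The drag argument above can be made explicit with a minimal overdamped force balance. This is an illustrative sketch rather than the model used in the study: it assumes that each cell contributes a drag coefficient $\gamma$, so that total drag grows linearly with the number of cells $N$, and that $n$ of the $N$ cells each generate a propulsive force $f$.

```latex
\begin{aligned}
N\gamma\, v &= n f \quad\Longrightarrow\quad v = \frac{n}{N}\,\frac{f}{\gamma},\\
n = 1:&\quad v = \frac{f}{N\gamma} \propto \frac{1}{N} \quad \text{(speed falls with filament length)},\\
n = N:&\quad v = \frac{f}{\gamma} \quad \text{(speed independent of filament length)}.
\end{aligned}
```

Under this assumption, a mean speed that does not decrease with length is what one expects when all or most cells generate force; the contrast relies on drag increasing with $N$.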

      The computational model misses perhaps the most interesting aspect of the experimental results, which is the coupling between rotation, slime generation, and motion. While the dependence of synchronization and reversal efficiency on internal model parameters is explored (Figure 2D), these model parameters cannot be connected with biological reality. The model predictions seem somewhat simplistic: that less coupling leads to more erratic reversal, and that the number of reversals matches the expected number (which appears to be simply consistent with a filament moving backwards and forwards on a track at constant speed).

      We agree that the coupling between rotation, slime generation and motion is interesting and important when studying the specific mechanism leading to filament motion. However, we believe it is even more fundamental to consider the intercellular coordination that is needed to realise this motion. Individual filaments are a collection of independent cells. This raises the question of how they can coordinate their thrust generation in such a way that the whole filament can both move and reverse direction of motion as a single unit. With the presented model, we want to start addressing precisely this point.

      The model allows us to qualitatively understand the relation between coupling strength and reversals (erratic vs. coordinated motion of the filament). It also provides a hint about the possibility of de-coordination, which we then look for and identify in longer filaments.

      While the model’s results seem obvious in hindsight, analysing the model allows us to phrase the question of cell-to-cell coordination, which so far has not been raised when considering the inherently multi-cell process of filament motility.

      Filament buckling is not analysed in quantitative detail, which seems to be a missed opportunity to connect with the computational model, eg by predicting the length dependence of buckling.

      Please note that Figure S10 provides an analysis of filament length and number of buckling instances observed. This suggests that buckling happens only in filaments above a certain length.

      We do agree that further analyses of buckling - both experimentally and through modelling would be interesting. This study, however, focussed on cell-to-cell coupling / coordination during filament motility. We have identified the possibility of de-coordination through the use of a simple 1D model of motion, and found evidence of such de-coordination in experiments. Notice that the buckling we report does not depend on the filament hitting an external object. It is a direct result of a filament activity which, in this context, serves as evidence of cellular de-coordination.

      Now that we have observed buckling and plectoneme formation, these processes need to be analysed with additional experiments and modelling. The appropriate model for this process needs to be 3D, and should ideally include torques arising from filament rotation. Experimentally, we need to identify means of influencing filament length and motion and see if we can measure buckling frequency and position across different filament lengths. These works are ongoing and will have to be summarised in a separate, future publication.

      Reviewer #2 (Public review):

      Summary:

      The authors combined time-lapse microscopy with biophysical modeling to study the mechanisms and timescales of gliding and reversals in filamentous cyanobacterium Fluctiforma draycotensis. They observed the highly coordinated behavior of protein complexes moving in a helical fashion on cells' surfaces and along individual filaments as well as their de-coordination, which induces buckling in long filaments.

      We thank the reviewer for this accurate summary of the presented work.

      Strengths:

      The authors provided concrete experimental evidence of cellular coordination and de-coordination of motility between cells along individual filaments. The evidence is comprised of individual trajectories of filaments that glide and reverse on surfaces as well as the helical trajectories of membrane-bound protein complexes that move on individual filaments and are implicated in generating propulsive forces.

      We thank the reviewer for listing these positive aspects of the presented work.

      Limitations:

      The biophysical model is one-dimensional and thus does not capture the buckling observed in long filaments. I expect that the buckling contains useful information since it reflects the competition between bending rigidity, the speed at which cell synchronization occurs, and the strength of the propulsion forces.

      Cell-to-cell coordination is a more fundamental phenomenon than the buckling and twisting of longer filaments, in that the latter is a consequence of limits of the former. In this sense, we are focussing here on something that we think is the necessary first step to understand filament gliding. The 3D motion of filaments (bending, plectoneme formation) is fascinating and can have important consequences for collective behaviour and macroscopic structure formation. As a consequence of cellular coupling, however, it is beyond the scope of the present paper.

      Please also see our response above. We believe that the detailed analysis of buckling and plectoneme formation requires (and merits) dedicated experiments and modelling which go beyond the focus of the current study (on cellular coordination) and will constitute a separate analysis that stands on its own. We are currently working in that direction.

      Future directions:

      The study highlights the need to identify molecular and mechanical signaling pathways of cellular coordination. In analogy to the many works on the mechanisms and functions of multi-ciliary coordination, elucidating coordination in cyanobacteria may reveal a variety of dynamic strategies in different filamentous cyanobacteria.

      We thank the reviewer for highlighting this point again and seeing the value in combining molecular and dynamical approaches.

      Reviewer #3 (Public review):

      Summary:

      The authors present new observations related to the gliding motility of the multicellular filamentous cyanobacterium Fluctiforma draycotensis. The bacteria move forward by rotating about their long axis, which causes points on the cell surface to move along helical paths. As filaments glide forward, they form visible tracks. Filaments preferentially move within the tracks. The authors devise a simple model in which each cell in a filament exerts a force that either pushes forward or backwards. Mechanical interactions between cells cause neighboring cells to align the forces they exert. The model qualitatively reproduces the tendency of filaments to move in a concerted direction and reverse at the end of tracks.

      We thank the reviewer for this accurate summary of the presented work.

      Strengths:

      The observations of the helical motion of the filament are compelling. The biophysical model used to describe cell-cell coordination of locomotion is clear and reasonable. The qualitative consistency between theory and observation suggests that this model captures some essential qualities of the true system.

      The authors suggest that molecular studies should be directly coupled to the analysis and modeling of motion. I agree.

      We thank the reviewer for listing these positive aspects of the presented work and highlighting the need for combining molecular and biophysical approaches.

      Weaknesses:

      There is very little quantitative comparison between theory and experiment. It seems plausible that mechanisms other than mechano-sensing could lead to equations similar to those in the proposed model. As there is no comparison of model parameters to measurements or similar experiments, it is not certain that the mechanisms proposed here are an accurate description of reality. Rather the model appears to be a promising hypothesis.

      We agree with the referee that the model we put forward is one of several possible. We note, however, that the assumption of mechanosensing by each cell - as done in this model - results in capturing both the alignment of cells within a filament (with some flexibility) and reversal dynamics. We have explored an even more minimal 1D model, where the cell’s direction of force generation is treated as an Ising-like spin and coupled between nearest neighbours (without assuming any specific physico-chemical basis). We found that this model was not fully able to capture both phenomena. In that model, we found that alignment required high levels of coupling (which is hard to justify except for mechanical coupling) and reversals were not readily explainable (and required additional assumptions). These points led us to the current, mechanically motivated model.
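      To illustrate what an Ising-like description of force direction looks like, the following is a minimal, hypothetical sketch rather than the minimal model we actually explored. It assumes an open chain of cells whose force directions s_i = ±1 interact through a single nearest-neighbour coupling J (in units of an effective temperature) and are updated with the standard Metropolis rule; the names simulate_spin_chain and alignment, the value of J, and the update scheme are illustrative assumptions only.

```python
import math
import random


def simulate_spin_chain(n_cells, coupling, n_sweeps, seed=0):
    """Hypothetical Ising-like chain: each cell's force direction is s_i in {-1, +1}.

    Energy E = -coupling * sum_i s_i * s_{i+1} on an open chain (no periodic
    boundary), with the effective temperature fixed at 1. Spins are updated
    with the Metropolis rule: accept a flip with probability min(1, exp(-dE)).
    """
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n_cells)]
    for _ in range(n_sweeps):
        for _ in range(n_cells):
            i = rng.randrange(n_cells)
            # Local field from the nearest neighbours (open boundaries).
            field = 0
            if i > 0:
                field += spins[i - 1]
            if i < n_cells - 1:
                field += spins[i + 1]
            # Energy change if spin i is flipped.
            delta_e = 2 * coupling * spins[i] * field
            if delta_e <= 0 or rng.random() < math.exp(-delta_e):
                spins[i] = -spins[i]
    return spins


def alignment(spins):
    """Fraction of neighbouring cell pairs pushing in the same direction."""
    pairs = list(zip(spins, spins[1:]))
    return sum(1 for a, b in pairs if a == b) / len(pairs)


if __name__ == "__main__":
    for j in (0.0, 1.0, 3.0):
        chain = simulate_spin_chain(50, j, 1000, seed=1)
        print(j, round(alignment(chain), 2))
```

      For an open chain of this kind at equilibrium, each neighbouring pair is aligned with probability 1/(1 + e^{-2J}), so near-complete alignment along a long filament needs J of a few units. This is consistent with the observation above that alignment in such a model requires strong coupling.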

      The parameterisation of the current model would require measuring cellular forces. To this end, a recent study has attempted to measure some of the physical parameters in a different filamentous cyanobacterium [1], and in our revision we will re-evaluate model parameters and dynamics in light of that study. We will also attempt to directly verify the presence of mechano-sensing by obstructing the movement of filaments.

      Summary from the Reviewing Editor:

      The authors present a simple one-dimensional biophysical model to describe the gliding motion and the observed statistics of trajectory reversals. However, the model does not capture some important experimental findings, such as the buckling occurring in long filaments, and the coupling between rotation, slime generation, and motion. More effort is recommended to integrate the information gathered on these different aspects to provide a more unified understanding of filament motility. In particular, the referees suggest performing a more quantitative analysis of the buckling in long filaments. Finally, it is also recommended to discuss the results in the context of previous literature, in order to better explain their relevance. Please find below the detailed individual recommendations of the three reviewers.

      We thank the editor for this accurate summary of the presented work and for highlighting the key points raised by the reviewers. We have provided below point-by-point replies to these.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The relevance of the study organism Fluctiforma draycotensis is not clearly explained, and the results are not discussed in the context of previous literature. The motivation would be clearer if the manuscript explained why this model organism was chosen and how the results compare with those previously observed for this or other organisms.

      We have extended the introduction and discussion sections to make it clearer why we have worked with this organism and how the findings from this work relate to previous ones. In brief, Fluctiforma draycotensis is a useful organism to work with, as it not only displays significant motility but also shows intriguing collective behaviour at different scales. Previous works on gliding motility in filamentous cyanobacteria have mostly focussed on the model organism Nostoc punctiforme, which only displays motility after differentiation into hormogonia [1]. There have also been studies in a range of different filamentous species, including those of the non-monophyletic genus Phormidium, but these studies mostly looked at effects of genetic deletions on motility [2] or utilised electron microscopy to identify proteins (or surface features) involved in motility [3-5]. It must be noted that motility is also described and studied in non-filamentous cyanobacteria, but the dynamics of motion and molecular mechanisms there are different to filamentous cyanobacteria [6,7]. These previous studies are now cited / summarised in the revised introduction and discussion sections.

      The inferred tracks, probably associated with secreted slime, play a key role since it is supposed that the tracks provide the external force that keeps the filaments straight. Movie S3, in phase contrast, provides convincing evidence for the tracks, but they cannot be seen in the fluorescence images presented in the main text. Clearer evidence of them should be shown in the main text. An especially important aspect of the tracks is where they start and end, since the computational model assumes that reversal happens due to forces generated by reaching the end of a track. Therefore it seems important to comment on what produces the tracks, to check whether reversals actually happen at the end of a track, etc. Perhaps tracks could be stained with Concanavalin-A?

      To confirm that reversals happen at track ends, we have now performed an analysis on agar, where tracks are visible under phase-contrast microscopy. This analysis confirms that, on agar, reversals indeed happen at track ends. We have added this analysis, along with images clearly showing the tracks, as a new figure in the main text (see new Fig. 1).

      Further confirming the reversal at track ends, we note that filaments on circular tracks do not reverse over durations longer than the ‘expected reversal interval’ of a filament on a straight track (see details in response to Reviewer 2).

      Regarding what produces the tracks on agar, we are still analysing this using different methods and these results will be part of a future study. Fluorescent staining can be used to visualise slime tubes using TIRF microscopy, as shown in Fig. S8, however, visualising tracks on agar using low magnification microscopy has been difficult due to background fluorescence from agar.

      We would also like to clarify that the model does not incorporate any assumptions regarding the track-filament interaction, other than that the track ends behave akin to a physical boundary for the filament. The observed reversal at track ends and “what” produces the track are distinct aspects of filament motion. We do not think that the model’s assumption of filament reversal at the end of the track requires understanding of the mechanism of slime production.

      Reviewer #3 (Recommendations for the authors):

      The manuscript combines three distinct topics: (1) the difference in locomotion on glass vs agar, (2) the development of a biophysical model, and (3) the helical motion of filament. It is not clear what insight one can gain from any one of these topics about the two others. The manuscript would be strengthened by more clearly connecting these three aspects of the work. A stronger comparison of theory to observation would be very useful. Some suggestions:

      (1) The observation that it is only the longest filaments that buckle is interesting. It should be possible to predict the critical length from the biophysical model. Doing so could allow fits of some model parameters.

      (2) What model parameters change between glass and agar? Can you explain these qualitative differences in motility by changing one model parameter?

      (3) Is it possible to exert a force on one end of a filament to see if it is really mechano-sensing that couples their motion?

      We thank the reviewer for this comment and agree with them that a better connection between model and experiment should be sought. We believe that the new analyses, presented below in response to the 2nd suggestion of the reviewer, provide such a connection in the context of reversal frequency. As stated below, we think that the 1st suggestion falls outside of the scope of the current work, but should form the basis of a future study.

      Regarding suggestion (1) - addressing buckling:

      We agree with the reviewer that using a model to predict a critical buckling length would be useful. We note, however, that the presented study focussed on cell-to-cell coupling / coordination during filament motility using a 1D, beadchain model. The buckling observations served, in this context, as evidence of cellular de-coordination. Now that we have observed buckling (and plectoneme formation), these processes need to be analysed with further experiments and modelling. The appropriate model for studying buckling would have to be at least 2D (ideally 3D) and consider elastic forces and torques relating to filament bending, rotation, and twisting. Experimentally, we need to identify means of influencing filament length and motion and undertake further measurements of buckling frequency and position across different filament lengths. These investigations are ongoing and will be summarised in a separate, future publication.

      Regarding suggestion (2) - addressing differences in motility on agar vs. glass:

      We believe that the two key differences between agar and glass experiments are the occasional detachment of filaments from substrate on glass and the lack of confining tracks on glass. These differences might arise from the interactions between the filament, the slime, and the surface. As both slime and agar contain polysaccharides, the slime-agar interaction can be expected to be different from the slime-glass interaction. Additionally, in the agar experiments, the filaments are confined between the agar and a glass slide, while they are not confined on the glass, leaving them free to lift up from the glass surface. We expect these factors to alter reversal frequency between the two conditions. To explore this possibility, we have now extended the analysis of experimental data from glass and present that (see details below):

      (i) dwell times are similar between agar and glass, and

      (ii) reversal frequency distribution is different between glass and agar, and remains constant across filament length on glass.

      We were able to explore these experimental findings with new model simulations by removing the assumption of an “external bounding frame”. We then analysed how reversal frequency depends on model parameters, as detailed below.

      “The movement of the filaments on glass. We have extended our analysis of motility on glass, resulting in the following noted features. Firstly, the median speed shows a weak positive correlation with filament length on glass (see original Fig S3B vs. updated Fig. S3A). This is slightly different to agar, where we do not observe any strong correlation in either direction (see original, Fig. 1 vs. updated Fig 2). Both a positive correlation and the absence of a correlation support our original hypothesis that the propulsion force is generated by multiple cells within the filament.

      Secondly, the filaments on glass display ‘stopping’ events that are not followed by a reversal, but are instead followed by a continuation in the original direction of motion, which we term ‘stop-go’ events, in contrast to the reversals. The dwell times associated with reversals and ‘stop-go’ events are similarly distributed (see original Fig S3A vs. updated Fig S3B). Furthermore, the dwell time distributions are similar between agar and glass (compare old Fig. 1C vs. new Fig 2C and new Fig. S3B). This suggests that the reversal process is the same on both agar and glass.

      Thirdly, we find that the frequencies of both reversal and stop-go events on glass are uncorrelated with the filament length (see new Fig. S4A) and there are approximately twice as many reversals as stop-go events. In contrast, the filaments on agar reverse with a frequency that is inversely proportional to the filament length (which is in turn proportional to the track length) (see original Fig. S1). The distribution of reversal frequencies on agar is broader and flatter than the distribution on glass (see new Fig. S4B). These findings are in line with the idea that tracks on agar (which are defined by filament length) dictate reversal frequency, resulting in the strong correlations we observe between reversal frequency, track length, and filament length. On glass, filament movement is not constrained by tracks, and filaments show a characteristic reversal frequency that is independent of filament length.”

      “Model can capture movement of filaments on glass and provides hypotheses regarding constancy of reversal frequency with length. We believe the model parameters controlling cellular memory (ω<sub>max</sub>) and strength of cellular coupling (K<sub>ω</sub>) describe the internal behaviour of a filament and therefore should not change depending on the substrate. Thus, we expect the model to be able to capture movement on glass just by removal of any ‘confining tracks’, i.e. external forces, from the simulations. Indeed, we find that the model displays both stop-go and reversal events when simulated without any external force and can capture the dwell time distribution under this condition (compare new Figs. S12,S13 with S3).

      In terms of reversal frequency, however, the model shows a reduction in reversal frequency with filament length (see new Fig. S15). This is in contrast to the experimental data. We find, however, that model results also show a reduction in reversal frequency with increasing ω<sub>max</sub> and K<sub>ω</sub> (see new Fig. S14 and S15). This effect is stronger with ω<sub>max</sub>, while it quickly saturates with K<sub>ω</sub> (see new Fig. S14). Therefore, one possibility of reconciling the model and experiment results in terms of constant reversal frequency with filament length would be to assume that ω<sub>max</sub> is decreasing with filament length (see new Fig. S16). Testing this hypothesis - or adding additional mechanisms into the model - will constitute the basis of future studies.”

      Regarding suggestion (3) - role of mechanosensing:

      We have tried several experiments to evaluate mechanosensing. First, we have used a micropipette or a thin wire placed on the agar to create a physical barrier in the way of the filaments. The micropipette approach was not feasible in our current setup. The wire approach was possible to implement, but the wire caused a significant undulation / perturbation on agar. Possibly relating to this, filaments tended to continue moving alongside the wire barrier. Therefore, these experiments were inconclusive at this stage with regards to mechanosensing a physical barrier. As an alternative, we have attempted trapping gliding filaments using an optical trap with a far red laser that should not affect the physiology of the cells. This did not cause an immediate reversal in filament motion. However, this could be due to the optical trap strength being below the threshold value for mechanosensing. The force per unit length generated by filamentous cyanobacteria has been calculated via a model of self-buckling rods, giving a value of ≈1 nN/μm [8]. In comparison, the optical trap generates forces on the scale of pN. Thus, the trap force is several orders of magnitude lower than the propulsive force generated by a filament, given filament lengths ranging from ten to several hundred μm. We conclude that the lack of observed response may be due to the optical trap force being too weak.
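      The order-of-magnitude comparison above can be made explicit with a few lines of arithmetic. This sketch takes the ≈1 nN/μm force per unit length from [8] and assumes, for illustration only, a trap force of 10 pN; the trap value and the helper names are hypothetical inputs, not measurements from this study.

```python
import math

# Force per unit length for gliding filaments, taken from the self-buckling estimate in [8].
FORCE_PER_LENGTH_NN_PER_UM = 1.0
# Assumption for illustration only: an optical trap force of order 10 pN.
TRAP_FORCE_PN = 10.0


def propulsive_force_pn(length_um):
    """Total propulsive force (pN) for a filament of the given length, assuming force scales linearly with length."""
    return length_um * FORCE_PER_LENGTH_NN_PER_UM * 1000.0  # convert nN to pN


def force_gap_orders(length_um, trap_force_pn=TRAP_FORCE_PN):
    """Orders of magnitude (log10) by which the propulsive force exceeds the trap force."""
    return math.log10(propulsive_force_pn(length_um) / trap_force_pn)


if __name__ == "__main__":
    for length in (10, 100, 500):  # filament lengths in μm
        print(length, propulsive_force_pn(length), round(force_gap_orders(length), 1))
```

      Under these assumptions, even a 10 μm filament outpulls the trap by about three orders of magnitude, and the gap grows with filament length. A weaker trap (for example 1 pN) only widens it further.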

      Thus, the experiments we can perform using our current available methods and equipment are not able to prove either the presence or the absence of mechanosensing in the filament. We plan to perform further experiments in this direction, involving new and/or improved experimental setups, such as use of Atomic Force Microscopy.

      We would like to note that there is an additional observation that supports the idea of reversals being mediated by mechanosensing at the end of a track, instead of the locations of the track ends being caused by the intrinsic reversal frequency of the filament. In a few instances (N = 4), filaments on agar ended up on a circular track (see Movie S4 for an example). These filaments did not reverse over durations a few times longer than the ‘expected reversal interval’ of a filament on a straight track.

      Should $N$ following eq 7 and in eq 9 be $N_f$?

      We have corrected this typo.

      It would be useful to include references to what is known about mechanosensing in cyanobacteria.

      We agree with the reviewer and have now updated the discussion section to include this information. Mechanosensing has not yet been shown directly in any cyanobacteria, but several species have been shown to harbor genes that are predicted (by homology) to be involved in mechanosensing. In particular, analysis of cyanobacterial genomes predicts the presence of a significant number of homologues of the Escherichia coli mechanosensitive ion channels MscS and MscL [9]. We have also identified similar MscS protein sequences in F. draycotensis. These channels open when membrane tension increases, allowing the cell to protect itself from swelling and rupturing when subjected to extreme osmotic shock [10,11].

      We also note that F. draycotensis, as with other filamentous cyanobacteria, have genes associated with the type IV pili, which may be involved in the surface-based motility [1]. Type IV pili have been shown to be mechanosensitive. For example, in cells of Pseudomonas aeruginosa that ‘twitch’ on a surface using type IV pili, application of mechanical shear stress results in increased production of an intracellular signalling molecule involved in promoting biofilm production. The pilus retraction motor has been shown to be involved in this shear-sensing response [12]. Additionally, twitching P. aeruginosa cells often reverse in response to collisions with other cells. Reversal is also caused by collisions with inert glass microfibres, which suggests that the pili-based motility can be affected by a mechanical stimulus [13].

      References

      (1) D. D. Risser, Hormogonium Development and Motility in Filamentous Cyanobacteria. Appl Environ Microbiol 89, e0039223 (2023).

      (2) T. Lamparter et al., The involvement of type IV pili and the phytochrome CphA in gliding motility, lateral motility and photophobotaxis of the cyanobacterium Phormidium lacuna. PLoS One 17, e0249509 (2022).

      (3) E. Hoiczyk, Gliding motility in cyanobacteria: observations and possible explanations. Arch Microbiol 174, 11-17 (2000).

      (4) D. G. Adams, D. Ashworth, B. Nelmes, Fibrillar Array in the Cell Wall of a Gliding Filamentous Cyanobacterium. Journal of Bacteriology 181 (1999).

      (5) L. N. Halfen, R. W. Castenholz, Gliding in a blue-green alga: a possible mechanism. Nature 225, 1163-1165 (1970).

      (6) S. N. Menon, P. Varuni, F. Bunbury, D. Bhaya, G. I. Menon, Phototaxis in Cyanobacteria: From Mutants to Models of Collective Behavior. mBio 12, e0239821 (2021).

      (7) F. D. Conradi, C. W. Mullineaux, A. Wilde, The Role of the Cyanobacterial Type IV Pilus Machinery in Finding and Maintaining a Favourable Environment. Life (Basel) 10 (2020).

      (8) M. Kurjahn, A. Deka, A. Girot, L. Abbaspour, S. Klumpp, M. Lorenz, O. Bäumchen, S. Karpitschka, Quantifying gliding forces of filamentous cyanobacteria by self-buckling. eLife 12:RP87450 (2024).

      (9) S.C. Johnson, J. Veres, H. R. Malcolm, Exploring the diversity of mechanosensitive channels in bacterial genomes. Eur Biophys J 50, 25–36 (2021).

      (10) S.I. Sukharev, W.J. Sigurdson, C. Kung, F. Sachs, Energetic and spatial parameters for gating of the bacterial large conductance mechanosensitive channel, MscL. Journal of General Physiology, 113(4), 525-540 (1999).

      (11) N. Levina, S. Tötemeyer, N.R. Stokes, P. Louis, M.A. Jones, I.R. Booth, Protection of Escherichia coli cells against extreme turgor by activation of MscS and MscL mechanosensitive channels: identification of genes required for MscS activity. The EMBO Journal (1999).

      (12) V.D. Gordon, L. Wang, Bacterial mechanosensing: the force will be with you, always. Journal of Cell Science 132(7):jcs227694 (2019).

      (13) M.J. Kühn, L. Talà, Y.F. Inclan, R. Patino, X. Pierrat, I. Vos, Z. Al-Mayyah, H. Macmillan, J. Negrete Jr, J.N. Engel, A. Persat, Mechanotaxis directs Pseudomonas aeruginosa twitching motility. Proceedings of the National Academy of Sciences. 118(30):e2101759118 (2021).

    1. Author response:

      The following is the authors’ response to the original reviews:

      We sincerely thank the reviewers for their thoughtful review and feedback. We believe that our work will provide valuable insights into how MRSA evolves under bacteriophage predation and stimulate efforts to use genetic trade-offs to combat drug resistance. We have substantially revised the paper and performed several additional experiments to address the reviewers' questions and concerns.

      Summary:

      (1) Testing for genetic trade-offs in additional S. aureus strains

      We obtained 30 clinical isolates of the S. aureus USA300 strain that were isolated between 2008 and 2011 (see Table S1). We first tested the ΦStaph1N, Evo2, and ΦNM1γ6 phages against this expanded strain panel and found that Evo2 showed strong activity against all 30 strains (Table S4). We tested whether Evo2 infection could elicit trade-offs in β-lactam resistance for a subset of these strains. We found that Evo2 infection caused a ~10-100-fold reduction in their MIC against oxacillin. These data are now incorporated into panel C of the revised Figure 2.

      (2) Testing additional staphylococcal phages

      We isolated a phage, SATA8505, from the environment. Like ΦStaph1N and Evo2, SATA8505 belongs to the Kayvirus genus and infects the MRSA strains MRSA252, MW2, and LAC. Phage-resistant MRSA recovered following SATA8505 infection also showed a strong reduction in oxacillin resistance (Figure S5). Furthermore, we confirmed that resistance against ΦNM1γ6, which belongs to the Dubowvirus genus, does not elicit trade-offs in β-lactam resistance (Figure S4). Sequencing analysis of ΦNM1γ6-resistant LAC strains revealed a different mutation, in fmhC, which was not observed with the ΦStaph1N and Evo2 phages (Table 1). We have added these new data to the main text and supplemental figures and tables. Future work will focus on a comprehensive analysis of a wide range of phage families.

      (3) Testing additional antibiotics

      We also expanded our trade-off analysis to include a wider range of antibiotic classes (Table S3). Overall, the loss of resistance appears to be confined to β-lactams.

      (4) Genetic analysis of ORF141

      In order to determine the function of ORF141, which is mutated in Evo2, we attempted to clone wild-type ORF141 into a staphylococcal plasmid and perform complementation assays with Evo2. Unfortunately, obtaining plasmid-borne wild-type ORF141 has proven difficult, as all clones developed frameshifts or deletions in the open reading frame. We posit that the gene product of ORF141 is toxic to the bacteria. We are currently working on placing the gene under more stringent expression control but feel that these efforts fall outside the scope of this paper.

      (5) Testing the effect of single mutants  

      Our genomic analysis showed that phage-resistant MRSA evolved multiple mutations following phage infection, making it difficult to determine the mechanism of each mutation alone. For example, phage-resistant MW2 and LAC evolved nonsense mutations in the transcriptional regulators mgrA, arlR, and sarA. To test whether these mutations alone were sufficient to confer resistance, we obtained MRSA strains with single-gene knockouts of mgrA, arlR, and sarA and tested their ability to resist phage. We observed that deletion of mgrA in MW2 resulted in a modest reduction in phage sensitivity (Figure S7). However, we did not observe any changes in the other mutant strains. These results suggest that phage resistance in these strains is likely caused by a combination of mutations. Determining the mechanisms of these mutations is the focus of our future work.

      (6) Transcriptomics of phage-resistant MRSA strains

      To further assess the effects of the phage resistance mutations, we performed bulk RNA-seq on phage-resistant MW2 and LAC strains and compared their differential expression levels to the respective wild-type strains. We picked these strains because our genomic data showed that they had evolved mutations in known transcriptional regulators (e.g. mgrA). Our analysis shows that both strains significantly modulate their gene expression (Figure 4). Notably, both strains upregulate the cell wall-associated protein ebh, while downregulating several genes involved in quorum sensing, virulence, and secretion. We have included this new data in Figure 4 and Table S5 and added an entire section in the manuscript discussing these results and their implications.  

      (7) Co-treatment of MRSA with phage and b-lactam

      We performed checkerboard experiments on MRSA strains with phage and β-lactam gradients (Figure 6). We found that under most conditions, MRSA cells were only able to recover under low phage and β-lactam concentrations. Notably, these recovered cells were still phage resistant and β-lactam sensitive. However, under one condition where MW2 was treated with ΦStaph1N and β-lactam, we found that some recovered cells still had high levels of β-lactam resistance, showing a distinct mutational profile. We discuss these results in detail in the main text.

      Reviewer # 1:

      Strengths:

      Phage-mediated re-sensitization to antibiotics has been reported previously but the underlying mutational analyses have not been described. These studies suggest that phages and antibiotics may target similar pathways in bacteria.

      We thank Reviewer 1 for this assessment. We hope that the data provided in this work will help stimulate further inquiries into this area and help in the development of better phage-based therapies to combat MRSA.

      Weaknesses:

      One limitation is the lack of mechanistic investigations linking particular mutations to the phenotypes reported here. This limits the impact of the work.

      We acknowledge the limitations of our initial analysis. We note (and cite) that separate studies have already linked mutations in femA, mgrA, arlR, and sarA with reduced β-lactam resistance and virulence phenotypes in MRSA, but not to phage resistance. For the other mutations, we could not find literature linking them to our observed phenotypes. We analyzed the effects of single gene knockouts of mgrA, arlR, and sarA on MRSA’s phage resistance. However, as shown above, the results showed only modest effects on phage resistance, in the MW2 strain (see Figure S7 and lines 309-317). We therefore believe that mutations in single genes are not sufficient to cause the trade-offs in phage/β-lactam resistance. Because each MRSA strain evolved multiple mutations (e.g. MW2 evolved 6 or more mutations), we feel that determining the effects of all possible permutations of those mutations was beyond the scope of the paper.

      However, to bridge the mutational data with our phenotypic observations, we performed RNA-seq and compared the transcriptomes of untreated and phage-treated MRSA strains (see Figure 4, Table S5, and lines 337-391). Our results show that phage-treated MRSA strains significantly modulate their transcript levels. Indeed, some of the changes in gene expression can explain the phenotypic observations (e.g. overexpression of ebh can lead to reduced clumping). Further, the results show some unexpected patterns, such as the downregulation of quorum sensing genes or genes involved in type VII secretion.

      Another limitation of this work is the use of lab strains and a single pair of phages. However, while incorporation of clinical isolates would increase the translational relevance of this work it is unlikely to change the conclusions.

      We thank the reviewer for this suggestion. We would like to clarify that MW2, MRSA252, and LAC are pathogenic clinical isolates that were isolated between 1997 and the 2000s. However, we acknowledge that, because these 3 strains have been propagated for many generations, they might have acquired laboratory adaptations. We therefore obtained 30 USA300 clinical strains that were isolated in more recent years (~2008-2011) and tested our phages against them. We note that these clinical isolates (generously provided by Dr. Petra Levin’s lab) were preserved with minimal passaging to reduce the effects of laboratory adaptation. We found that the Evo2 phage was able to elicit oxacillin trade-offs in those strains as well (see Table S1, Table S7, Fig 2C, and lines 210 – 225).

      For the phages, we had to work with phage(s) that could infect all three MRSA strains. That is why in our initial tests, we focused on ΦStaph1N and Evo2, both members of the Kayvirus genus. Now in our revised work, we extend our analysis to ΦNM1γ6, a member of the Dubowvirus genus, that also infects the LAC strain, but not MW2 and MRSA252. We find that ΦNM1γ6 is unable to drive trade-offs in β-lactam resistance (see lines 229 – 238). Next, we analyzed the effects of SATA8505, also a member of the Kayvirus genus. Here, we observed that SATA8505 can elicit trade-offs in β-lactam resistance (see Figure S5 and lines 238 – 246). These results suggest that not all staphylococcal phages can elicit these trade-offs and call for more comprehensive analyses of different types of phages.

      Reviewer #1 (Recommendations for the authors):

      Specific questions:

      (1) The Evo2 isolate is an evolved version of phage Staph1N with more potent lytic activity. Is this reflected in more pronounced antibiotic sensitivity?

      We did not observe greater β-lactam sensitivity in Evo2-treated MRSA cells. However, we did observe that Evo2 was able to elicit these trade-offs at lower multiplicities of infection (MOI) (see lines 173 – 176 and Figure S2). Further, we did observe that Evo2 caused a greater trade-off in virulence phenotypes (hemolysis and cell agglutination) (see lines 416 – 419 and 433 – 435, and Figure 5).

      In our revisions, we also tested Evo2-treated MRSA against a wide range of antibiotics. We did not observe significant changes in MICs against those agents.   

      (2) Are there mutations in the SCCmec cassette or the MecA gene after selection against ΦStaph1N?

      We did not observe any mutations in known resistance genes SCCmec or blaZ. Furthermore, we did not see any differential expression of those genes in our transcriptomic data (see lines 344 and 346).  

      (3) The authors report that phage ΦNM1γ6 does not induce antibiotic sensitivity changes despite being effective against bacterial strain LAC. Were mutational sequencing studies performed with the resistant isolates that emerged against this strain? Can the authors hypothesize why these did not impact the virulence or resistance of LAC despite effective killing? How does this align with their models for ΦStaph1N?

      We thank the reviewer for this insightful question. In our revised manuscript, we found that ΦNM1γ6 selects for a point mutation in the fmhC gene, which is involved in cell wall maintenance (see lines 326 – 335). To our knowledge, this point mutation has not been linked to phage resistance or drug sensitivity in MRSA. Notably, this mutation was not observed with ΦStaph1N or Evo2. We therefore speculate that ΦNM1γ6 binds to a different receptor molecule on the MRSA cell wall.

      (4) If I understand correctly, the authors attribute these effects of phage predation on antibiotic sensitivity and virulence to orthogonal selection pressures. A good test of this model would be to examine the mutations that emerge in antibiotic/phage co-treatment. This should be done.

      We thank the reviewer for this suggestion. As described in the summary section above, we performed checkerboard experiments on MRSA strains with phage and β-lactam gradients (see lines 440 – 494 and Figure 6). We found that under most conditions, MRSA cells were only able to recover under low phage and β-lactam concentrations. Notably, these recovered cells were still phage resistant and β-lactam sensitive. However, under one condition, where MW2 was treated with ΦStaph1N and β-lactam, we found that some recovered cells still had high levels of β-lactam resistance and only limited phage resistance, showing a distinct mutational profile (Figure S6). Under these conditions, we think that the selective pressure exerted by ΦStaph1N is “overcome” by the selective pressure of the high oxacillin concentration, a point that we discuss in the main text.

      Reviewer #2 (Public review):

      Summary:

      The work presented in the manuscript by Tran et al. deals with bacterial evolution in the presence of bacteriophages. Here, the authors have taken three methicillin-resistant S. aureus strains that are also resistant to beta-lactams. Upon exposure to phage, these strains eventually develop beta-lactam sensitivity. Besides this, the strains also show other phenotypic changes, such as reduced binding to fibrinogen and reduced hemolysis.

      Strengths:

      The experiments carried out convincingly suggest such in vitro development of sensitivity to the antibiotics. The authors were also able to "evolve" phage in a similar fashion, showing enhanced virulence against the bacterium. Finally, the authors carry out DNA sequencing of both evolved bacteria and phage and show mutations occurring in various genes. Overall, the experiments that have been carried out are convincing.

      We thank Reviewer 2 for their positive comments.

      Weaknesses:

      Although more experiments are not needed, additional experiments could add more information. For example, the phage gene encoding the HTH motif could be introduced into the bacterial genome, and such a strain could then be assayed with wild-type phage infection to test for the suggested enhanced virulence. At least one such experiment would confirm the link between the identified mutations and their outcome.

      We thank the reviewer for this suggestion. We attempted to clone ORF141 into an expression plasmid and perform complementation experiments with Evo2 phage; however, all transformants that were isolated had premature stop codons and frameshifts in the wild-type ORF141 insert that would disrupt protein function. We therefore think that the gene product of ORF141 might be toxic to the cells. We are currently working on placing the gene under more stringent transcriptional control but feel that these efforts fall outside the scope of this paper.

      Secondly, I also feel that the authors looked for beta-lactam sensitivity and found it. I am sure that if they looked for rifampicin resistance in these strains, they would find that too. In this case, I cannot say that the evolution was directed towards beta-lactam sensitivity; this is perhaps just one trait that was observed. This is the only weakness I find in the work. Nevertheless, I find the experiments convincing enough; more experiments would only add value to the work.

      We thank the reviewer for their comments. Because both phages and β-lactams interface with the bacterial cell wall, we posited that phage resistance would reduce resistance to cell wall-targeting antibiotics. In our revisions, we have expanded our analysis to a much wider range of antibiotic classes, including rifampicin, mupirocin, and erythromycin, as well as other cell envelope-targeting agents such as daptomycin and teicoplanin. We did not observe any significant changes in the MICs of these other antibiotics (see Table S3 and lines 191-199). It therefore appears that these trade-offs are confined to β-lactams.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The authors describe a novel pattern of ncRNA processing by Pac1. Pac1 is an RNase III family member in S. pombe that has previously been shown to process pre-snoRNAs. Other RNase III family members, such as Rnt1 in S. cerevisiae and Drosha in humans, have similar roles in cleaving ncRNA precursors (including those of miRNAs, snRNAs, snoRNAs, and rRNA). All RNase III family members recognize and cleave dsRNA regions but differ in their exact sequence and structural requirements. snoRNAs can be processed from their own precursor, a polycistronic precursor, or the intron of a snoRNA host gene. After the intron is spliced out, the spliced host RNA can either encode a protein or be a non-functional by-product.

      In the current manuscript, the authors show that in S. pombe the snoRNAs snR107 and U14 are processed from a common precursor in a way that has not previously been described. snR107 is encoded within an intron and processed from the spliced-out intron, similar to a typical intron-encoded snoRNA. What is different is that, upon splicing, the host transcript can adopt a new secondary structure that requires base-pairing between exon 1 and exon 2, generating a Pac1 recognition site. Pac1 recognizes and cleaves this site, and the 3' cleavage product is further processed into U14 snoRNA. In addition, the 5' cleavage product is processed into a ncRNA named mamRNA. The experiments describing this processing are thorough and convincing, and include RNA-seq, degradome sequencing, Northern blotting, qRT-PCR, and the analysis of mutations that disrupt various secondary structures (Figures 1, 2, and 3). The authors thereby describe a previously unknown gene design in which both the exon and the intron are processed into a snoRNA. They conclude that making formation of the Pac1 binding site dependent on prior splicing ensures that both snoRNAs are produced in the correct order and amount. Some of the authors' findings are further confirmed by a different preprint (reference 19), but that preprint did not reveal the involvement of Pac1.

      While the analysis of the mamRNA/snR107/U14 precursor is convincing, as a single example its impact is uncertain. In Figure 4 and Supplementary Table 1, the authors use bioinformatic searches to identify other candidate loci in plants and animals that may be processed similarly. Each of these loci encodes a putative precursor that yields one snoRNA processed from an intron, a different snoRNA processed from an exon, and a double-stranded structure that can only form after splicing. While this is potentially interesting, it is also the least developed part of the manuscript and could be discussed and developed further, as detailed below.

      Major comments:

      1. The proposal that plant and animal pre-snoRNA clusters are processed similarly is speculative. The authors provide no evidence that these precursors are processed by an RNase III enzyme cutting at the proposed splicing-dependent structure. This should not be required for publication, but it would greatly increase interest.

      All three reviewers expressed a similar concern, and we now provide additional evidence supporting the conservation of the proposed mechanism. Specifically, we focused on the SNHG25 gene in H. sapiens, which hosts two snoRNAs—one intronic, as previously shown in Figure 4B, and one non-intronic. We substantiated our predictions through the re-analysis of multiple sequencing datasets in human cell lines, as outlined below:

      I. Analysis of CAGE-seq and nano-COP datasets indicates a single major transcription initiation site at the SNHG25 locus. Both the intronic and non-intronic snoRNAs are present within the same nascent precursor transcripts (Supplementary Figure 4D).

      II. Degradome-seq experiments in human cell lines reveal that the predicted splicing-dependent stem-loop structure within the SNHG25 gene is subject to endonucleolytic cleavage (Supplementary Figure 4D). The cleavage sites are located at the apical loop and flanking the stem, displaying a staggered symmetry characteristic of RNase III activity (Figure 4C). Importantly, the nucleotide sequences surrounding the 3' cleavage site and the 3' splice site are conserved in other vertebrates (Supplementary Figure 4D).

      III. fCLIP experiments demonstrate that DROSHA associates with the spliced SNHG25 transcript (Supplementary Figure 4D).

      Together, these analyses support the generalizability of our model beyond fission yeast. They confirm the structure of the SNHG25 gene as a single non-coding RNA precursor hosting two snoRNAs, one of which is intronic. Importantly, these findings show that the predicted stem-loop structure contains conserved elements and is subject to endonucleolytic cleavage. Human DROSHA, an RNase III enzyme, could be responsible for this processing step.

      The authors provide examples of similarly organized snoRNA clusters from human, mouse and rat, but the examples are not homologous to each other. Does this mean these snoRNA clusters are not conserved, even between mammals? Are the examples identified in Arabidopsis conserved in other plants? If there is no conservation, wouldn't that indicate that this snoRNA cluster organization offers no benefit?

      We noticed during this revision that the human SNHG25 locus is actually very well conserved in mice at the GM36220 locus, where both snoRNAs (SNORD104 and SNORA50C/GM221711) are similarly arranged. Although the murine host gene, GM36220, also contains an intron in the UCSC annotation, it is intronless in the Ensembl annotation we used to screen for mixed snoRNA clusters, which explains why it was not part of our initial list of candidates (Supplementary Table 1). Importantly, sequence elements in SNHG25, close to the splice sites and cleavage sites in exon 2, are also well conserved in mice and other vertebrates (Supplementary Figure 4D). Therefore, it is reasonable to think that the mechanism described for SNHG25 in humans may also apply in mice and other vertebrates.

      That being said, snoRNAs are highly mobile genetic elements. For example, it is well established that even between relatively closely related species (e.g., mouse and human), the positions of intronic snoRNAs within their host genes are not strictly conserved, even when both the snoRNAs and their host genes are. In the constrained drift model of snoRNA evolution (Hoeppner et al., BMC Evolutionary Biology, 2012; doi: 10.1186/1471-2148-12-183), it is proposed that snoRNAs are mobile and “may occupy any genomic location from which expression satisfies phenotype.”

      Therefore, a low level of conservation in mixed snoRNA clusters is generally expected and does not necessarily imply that it offers no benefit. Despite the limited conservation of snoRNA identity across species, mixed snoRNA clusters consistently display two recurring features: (1) non-intronic snoRNAs often follow intronic snoRNAs, and (2) the predicted secondary structure tends to span the last exon–exon junction. These enriched features support the idea that enforcing sequential processing of mixed snoRNA clusters may confer a selective advantage. We now explicitly discuss these points in the revised manuscript.

      Supplemental Figure 4 shows some evidence that the S. pombe gene organization is conserved within the Schizosaccharomyces genus, but could be enhanced further by showing what sequences/features are conserved. Presumably the U14 sequence is conserved, but snR107 is not indicated. Is it not conserved? Is the stem-loop more conserved than neighboring sequences? Are there any compensatory mutations that change the sequence but maintain the structure? Is there evidence for conservation outside the Schizosaccharomyces genus?

      We thank the reviewer for these excellent suggestions, which helped us significantly improve Supplementary Figure 4. In the revised version, we now include an additional, more evolutionarily distant species, S. japonicus, and show that the intronic snR107 is conserved across the Schizosaccharomyces genus (Supplementary Figure 4A). The distance between conserved elements (splice sites, snoRNAs, and RNA structures) varies, indicating that surrounding sequences are less conserved than these functionally constrained features.

      We also performed a detailed alignment of the sequences corresponding to the predicted RNA secondary structures. This revealed that the apical regions are less conserved than the base, particularly near the splice and cleavage sites. In these regions, we observe compensatory or base-pair-neutral mutations (e.g., U-to-C or C-to-U, which both pair with G), suggesting structural conservation through evolutionary constraint (Supplementary Figures 4B–C). These observations are now described in greater detail in the revised manuscript, along with a discussion of the specific features likely to be under selective pressure at this locus.
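      To make the distinction between compensatory and base-pair-neutral substitutions concrete, the sketch below compares each base pair of a stem in one sequence with the aligned pair in another. It assumes gap-free aligned sequences, a balanced dot-bracket structure, and standard RNA pairing rules (Watson-Crick pairs plus the G-U wobble); the function names are hypothetical, and this is not necessarily how the alignments in Supplementary Figures 4B–C were analyzed.

```python
# Illustrative sketch only: classify substitutions in an RNA stem between two aligned sequences.
# Assumes a gap-free alignment and a balanced dot-bracket structure; all names are hypothetical.

# Watson-Crick pairs plus the G-U wobble pair
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(x: str, y: str) -> bool:
    return (x, y) in ALLOWED_PAIRS


def stem_pairs(structure: str) -> list[tuple[int, int]]:
    """Return (i, j) index pairs for matched brackets in a dot-bracket string."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for k, c in enumerate(structure):
        if c == "(":
            stack.append(k)
        elif c == ")":
            pairs.append((stack.pop(), k))
    return sorted(pairs)


def classify_pair(ref: tuple[str, str], alt: tuple[str, str]) -> str:
    """Compare one base pair (5' base, 3' base) between a reference and another species."""
    if not can_pair(*alt):
        return "disrupting"
    changed = sum(r != a for r, a in zip(ref, alt))
    if changed == 0:
        return "identical"
    if changed == 2:
        return "compensatory"  # both partners changed, pairing maintained
    return "neutral"           # one partner changed, pairing maintained (e.g. U->C opposite G)


def classify_stem(ref_seq: str, alt_seq: str, structure: str) -> list[str]:
    return [
        classify_pair((ref_seq[i], ref_seq[j]), (alt_seq[i], alt_seq[j]))
        for i, j in stem_pairs(structure)
    ]


# Toy example with made-up sequences: G-U becomes G-C (neutral), G-C becomes A-U (compensatory)
print(classify_stem("GGAAACU", "GAAAAUC", "((...))"))  # prints ['neutral', 'compensatory']
```

      Counting "neutral" and "compensatory" changes separately from "disrupting" ones is one simple way to ask whether a stem is conserved as a structure even where its sequence is not.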

      Conservation outside the Schizosaccharomyces genus is less clear. As already noted in the manuscript, the S. cerevisiae locus retains synteny between snR107 and snoU14, but the polycistronic precursor encompassing both is intronless and processed by RNase III (Rnt1) between the cistrons. Similarly, in Ashbya gossypii and a few other fungal species, synteny is preserved, but no intron appears to be present in the presumed common precursor. Notably, secondary structure predictions for the A. gossypii locus (not shown) suggest the formation of a stable stem-loop encompassing the first snoRNA in a large apical loop. This could reflect a distinct mode of snoRNA maturation, possibly analogous to pri-miRNA processing, where cleavage by an RNase III enzyme contributes to both 5′ and 3′ end formation. In Candida albicans, snoU14 is annotated within an intron of a host gene, but no homolog of snR107 is annotated. Other cases either resemble one of the above scenarios or are inconclusive due to the lack of a clearly conserved snoRNA (or possibly due to incomplete annotation). Although these examples are potentially interesting, we have chosen not to elaborate on them in the manuscript in order to maintain focus and avoid speculative interpretation in the absence of stronger evidence.

      The authors suggest that snoRNAs can be processed from the exons of protein coding genes, but snoRNA processing would destroy the mRNA. Thus snoRNAs processing and mRNA function seem to be alternative outcomes that are mutually exclusive. Can the authors comment?

      In principle, we agree with the reviewer on the mutually exclusive nature of mRNA and snoRNA expression for putative snoRNAs hosted in the exons of protein-coding genes. However, we want to clarify that the specific snoRNA precursors (or hosts) examined in the manuscript (mamRNA-snoU14 in S. pombe and, in this resubmission, SNHG25 in H. sapiens) are non-coding. So although we do not exclude that our model of sequential processing through splicing and endonucleolytic cleavage could apply to coding snoRNA precursors, it is not something we want to insist on, especially given the lack of experimental evidence for these cases.

      It is possible that the use of the term "exonic snoRNA" in the first version of the manuscript led to the reviewer's impression that we explicitly meant that snoRNAs can be processed from the exons of protein-coding genes, which was not what we meant (although we do not exclude it). If that was the case, we apologize for the confusion. We have now clarified the issue (see next point).

      Minor comments:

      The term "exonic snoRNA" is confusing. Isn't any snoRNA by definition an exon?

      We agree that this term can be confusing, a sentiment that was also shared by Reviewer 3. We replaced the problematic term with "non-intronic snoRNA", "snoRNA", or "snoRNA gene located in exon", depending on the context, which convey our intended meaning less ambiguously.

      The methods section does not include how similar snoRNA clusters were identified in other species.

      We have now corrected this omission in the method section ('Identification of mixed snoRNA clusters' subsection): "To identify mixed snoRNA clusters, we downloaded the latest genome annotation from Ensembl and selected snoRNAs co-hosted within the same precursor, with at least one being intronic and at least one being non-intronic. We filtered out ambiguous cases where snoRNAs overlapped exons defined as 'retained introns', reasoning that in these cases the snoRNA is more likely to be intronic than not."
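      The cluster-selection rule quoted above can be expressed as a short filter. The sketch below assumes transcripts and snoRNAs have already been parsed from an Ensembl annotation into simple records with 1-based inclusive coordinates; the record types and function names are hypothetical, and treating any snoRNA that overlaps an exon of a "retained_intron" transcript as ambiguous is one reading of the filtering step, not the authors' code.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical records, assumed to be parsed beforehand from an Ensembl GTF (1-based, inclusive).


@dataclass(frozen=True)
class Transcript:
    transcript_id: str
    biotype: str                        # Ensembl transcript biotype, e.g. "lncRNA", "retained_intron"
    strand: str                         # "+" or "-"
    exons: tuple[tuple[int, int], ...]  # (start, end) pairs sorted by start


@dataclass(frozen=True)
class SnoRNA:
    name: str
    strand: str
    start: int
    end: int


def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def location(sno: SnoRNA, tx: Transcript) -> str | None:
    """'intronic' or 'non-intronic' if the snoRNA lies within the transcript span, else None."""
    if sno.strand != tx.strand:
        return None
    if not (tx.exons[0][0] <= sno.start and sno.end <= tx.exons[-1][1]):
        return None
    span = (sno.start, sno.end)
    return "non-intronic" if any(overlaps(span, ex) for ex in tx.exons) else "intronic"


def mixed_clusters(transcripts: list[Transcript], snornas: list[SnoRNA]) -> dict[str, dict[str, str]]:
    # Ambiguous: snoRNAs overlapping an exon of a "retained_intron" transcript on the same strand
    ambiguous = {
        s.name
        for s in snornas
        for t in transcripts
        if t.biotype == "retained_intron"
        and s.strand == t.strand
        and any(overlaps((s.start, s.end), ex) for ex in t.exons)
    }
    clusters: dict[str, dict[str, str]] = {}
    for tx in transcripts:
        if tx.biotype == "retained_intron":
            continue
        hosted = {}
        for s in snornas:
            if s.name in ambiguous:
                continue
            loc = location(s, tx)
            if loc is not None:
                hosted[s.name] = loc
        # Keep precursors hosting at least one intronic and at least one non-intronic snoRNA
        if {"intronic", "non-intronic"} <= set(hosted.values()):
            clusters[tx.transcript_id] = hosted
    return clusters


# Toy locus with made-up coordinates: two exons, one snoRNA in the intron, one in exon 2
tx = Transcript("tx1", "lncRNA", "+", ((100, 200), (301, 600)))
snos = [SnoRNA("snoA", "+", 220, 280), SnoRNA("snoB", "+", 450, 550)]
print(mixed_clusters([tx], snos))  # prints {'tx1': {'snoA': 'intronic', 'snoB': 'non-intronic'}}
```

      Adding a "retained_intron" transcript spanning both snoRNAs to the input would mark them as ambiguous and remove the toy locus from the output, mirroring the exclusion described in the methods text.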

      In the discussion the authors argue that a previously published observation that S. pombe U14 does not complement a S. cerevisiae mutation can be explained because "was promoter elements... were simply not included in the transgene sequence". However, even if promoter elements were included, the dsRNA structure of S. pombe would not be cleaved by the S. cerevisiae RNase III. I doubt that missing promoter elements are the full explanation, and the authors provide insufficient data to support this conclusion.

      We agree with the reviewer that, given the substantial divergence in substrate specificity between Pac1 and Rnt1, it is unlikely that S. pombe snoU14 would be efficiently processed from its precursor in S. cerevisiae. We did not intend to suggest otherwise, and we have now removed this part of the discussion. As the experiment reported by Samarsky et al. did not detect expression of the S. pombe snoU14 precursor (even its unprocessed form), it remains inconclusive with respect to the conservation (or lack thereof) of snoU14 processing mechanisms.

      For the record, we had originally included this discussion to point out that the lack of cryptic promoter activity (or at least none that S. cerevisiae can use) within the S. pombe snoU14 precursor supports the idea that transcription initiates solely upstream of the mamRNA precursor. However, we recognize that this argument is speculative and potentially confusing. We have therefore removed it from the revised manuscript to maintain clarity and focus.

      **Referees cross-commenting**

      I agree with the other two reviewers, but I think the thiouracil pulse labeling suggested by Reviewer 2 would take considerable work and, if snoRNA processing is very fast, might not be as conclusive as the reviewer suggests.

      We are grateful to the reviewer for this comment, which helped us complete this revision in a timely manner.

      Reviewer #1 (Significance (Required)):

      In the current manuscript the authors show that in S. pombe snoRNA snR107 and U14 are processed from a common precursor in a way that has not previously been described. The experiments describing this processing are thorough and convincing, and include RNAseq, degradome sequencing, northern blotting, qRT-PCR and the analysis of mutations that disrupt various secondary structures in figures 1, 2, and 3. The authors thereby describe a previously unknown gene design where both the exon and the intron are processed into a snoRNA.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The manuscript presents a novel mode of processing for polycistronic snoRNAs in the yeast Schizosaccharomyces pombe. The authors demonstrate that the processing sequence of a transcription unit containing U14, intronic snR107, and an overlapping non-coding mamRNA is determined by secondary structures recognized by the RNase III Pac1. Specifically, the formation of a stem structure over the mamRNA exon-exon junction enables processing of the terminal, exon-encoded U14. Consequently, U14 maturation occurs only after the mamRNA intron (containing snR107) is spliced out. This mechanism prevents the accumulation of unspliced, truncated mamRNA.

      1. The first section describing the processing steps is challenging to follow due to the unusual organization of the locus and maturation pathway. If the manuscript is intended for a broad audience, I recommend simplifying this section and presenting it in a more accessible manner. A larger diagram illustrating the transcription unit and processing intermediates would be beneficial. Additionally, introducing snR107 earlier in the text would improve clarity.

      We thank the reviewer for these excellent suggestions. In the previous version of the manuscript, we were cautious in how we introduced the locus, as snR107 and the associated intron had not yet been published. This is no longer the case, as the locus is now described in Leroy et al. (2025). Accordingly, we now introduce the complete locus at the beginning of the manuscript and have improved the corresponding diagram (new Figure 1A). We believe these changes enhance clarity and make the section more accessible to a broader audience.

      2. Evaluation of some results is difficult due to the overexposure of Northern blot signals in Figures 1 and 2. The unspliced and spliced precursors appear as a single band, making it hard to distinguish processing intermediates. Would the authors consider presenting these results similarly to Figure 3, where bands are more clearly resolved? Or presenting both overexposed and underexposed blots?

      For all blots (probes A, B, and C), we selected an exposure level that allows detection of precursor forms under wild-type (WT) conditions. This necessarily results in some overexposure of the accumulating precursors in mutant conditions, due to their broad dynamic range of accumulation. To address this, we now provide an additional supplementary "source data" file containing all uncropped blots with both low and high exposures.

      For example, a lower exposure version of the blot in new Figure 1B (included in the source data file) confirms the consistent accumulation of the spliced precursor when Pac1 activity is compromised. The unspliced precursor also shows slight accumulation in the Pac1-ts mutant, although to a much lesser extent than the spliced precursor. This observation is consistent with our qPCR results (new Figure 1C).

      Importantly, because this effect is observed in neither the Pac1-AA nor the stem-dead (SD) mutant, we interpret it as an indirect effect, possibly reflecting a mild growth defect in the Pac1-ts strain even under growth-permissive conditions. We now explicitly address this point in the revised manuscript.

      3. Additionally, I noticed a discrepancy in U14 detection: Probe B gives a strong signal for U14 in Figure 3B, whereas in Figures 1 and 2, U14 appears as faint bands. Could the authors clarify this inconsistency?

      We thank the reviewer for pointing out this discrepancy. The variation in U14 signal intensity is most likely due to technical differences in UV crosslinking efficiency during the Northern blot procedure. This step can differentially affect the membrane retention of RNA species depending on their length, as previously reported (PMID: 17405769). Because U14 is a relatively abundant snoRNA, the fainter signal observed in Figure 1 (relative to the accumulating precursor) likely reflects suboptimal crosslinking of shorter RNAs in that particular blot.

      Importantly, this technical variability does not impact the conclusions of our study, as we do not compare RNA species of different lengths directly. To increase transparency, we now provide a supplementary "source data" file that includes all uncropped blots from our Northern blot experiments. These include examples—such as the uncropped blot for Figure 1B—where U14 retention is more consistent.

      4. Furthermore, ethidium bromide (EtBr) staining of rRNA is used as a loading control, but overexposed signals from the gel may not accurately reflect RNA amounts on the membrane. This could affect the interpretation of mature RNA species' relative abundance.

      We thank the reviewer for pointing this out and have now assessed rRNA loading on the same Northern blot membranes using probes complementary to mature rRNAs. We updated new Figures 1B, 2B, 3B, S1B, and S3A accordingly.

      5. To further support the sequential processing model, the authors could use thiouracil pulse-labeling to follow the accumulation of newly transcribed RNAs and of individual species. Additionally, this could help determine whether U14 can be processed through alternative, less efficient pathways. Would the authors consider incorporating this approach?

      We thank the reviewer for this pertinent suggestion. We plan to investigate the putative alternative U14 maturation pathway in future work, and the suggested approach will be instrumental for that. However, to keep the present manuscript focused, and also to keep the revision timely (successful pulse-chase experiments are likely to take time to optimize, as also noted by the other reviewers in their cross-comments), we prefer not to perform this experiment for this revision.

      7. In the final section, the authors propose that this processing mechanism is conserved across species, identifying 12 similar genetic loci in different organisms. This is a very interesting finding. In my opinion, providing any experimental evidence would greatly strengthen this claim and the manuscript's significance. Even preliminary validation would add substantial value!

      We thank the reviewer for their enthusiasm and are glad to provide some preliminary validation for the final section of our manuscript. Specifically, we focused on the SNHG25 gene in H. sapiens, which hosts two snoRNAs: one intronic, as previously shown in Figure 4B, and one non-intronic. We substantiated our predictions through the re-analysis of multiple sequencing datasets in human cell lines, as outlined below:

      I. Analysis of CAGE-seq and nano-COP datasets indicates a single major transcription initiation site at the SNHG25 locus. Both the intronic and non-intronic snoRNAs are present within the same nascent precursor transcripts (Supplementary Figure 4D).

      II. Degradome-seq experiments in human cell lines reveal that the predicted splicing-dependent stem-loop structure within the SNHG25 gene is subject to endonucleolytic cleavage (Supplementary Figure 4D). The cleavage sites are located at the apical loop and flanking the stem, displaying a staggered symmetry characteristic of RNase III activity (Figure 4C). Importantly, the nucleotide sequences surrounding the 3' cleavage site and the 3' splice site are conserved in other vertebrates (Supplementary Figure 4D).

      III. fCLIP experiments demonstrate that DROSHA associates with the spliced SNHG25 transcript (Supplementary Figure 4D).

      Together, these analyses support the generalizability of our model beyond fission yeast. They confirm the structure of the SNHG25 gene as a single non-coding RNA precursor hosting two snoRNAs, one of which is intronic. Importantly, these findings unambiguously show that the predicted stem-loop structure is subject to endonucleolytic cleavage, and they are consistent with DROSHA, an RNase III enzyme, being responsible for this processing step.

      **Referees cross-commenting**

      The other two reviewers' comments are justified.

      Reviewer #2 (Significance (Required)):

      The authors describe an interesting novel mode of snoRNA processing from the host transcript. The results appear sound and intriguing, especially if the proposed mechanism can be confirmed across different organisms. Including such validation would significantly enhance the impact and make this work of interest to a broad audience.

      My expertise: transcription, non-coding RNAs

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Migeot et al. focuses on a new Pac1-mediated snoRNA processing pathway for intron-encoded snoRNA pairs in the yeast Schizosaccharomyces pombe. The novelty of the findings described in the MS is the report of an unusual and relatively rare genomic organization and sequential processing of a few snoRNA genes in S. pombe and other eukaryotic organisms. It appears that, in the case of snoRNA pairs hosted in the intron and exon of a pre-mRNA, respectively, the release of separate pre-snoRNAs from the host gene relies first on splicing to free the intron-encoded snoRNA, followed by endonucleolytic cleavage by RNase III (Pac1 in S. pombe) to produce the exon-encoded snoRNA. The sequential processing pathway, ensuring proper maturation of two snoRNAs, was demonstrated and argued in an elegant and clear way. The main message of the MS is straightforward, most experiments are properly conducted, and specific conclusions based on the data are justified and valid. The text is clearly written and well presented.

      But there are some shortcomings.

      1. First of all, the title of the MS and the general conclusions regarding the Pac1-mediated sequential release of snoRNA pairs hosted within the intron are definitely an overstatement. In particular, the title suggests that this genomic organization and unusual processing mode are widespread. Later in the discussion, the authors themselves admit that such mixed exonic-intronic snoRNAs are rare, although their prevalence may be underestimated due to annotation problems. Such snoRNA arrangement and processing may well be conserved, but the evidence is missing: only isolated cases were identified by bioinformatic mining, and their processing has not been assayed. This makes generalization impossible based on a single documented mamRNA/snoU14 example, no matter how carefully examined.

      We thank the reviewer for clearly articulating this concern. In response, we now provide additional evidence supporting conservation of the proposed mechanism in other species:

      • Conservation within the Schizosaccharomyces genus (Figures S4A–C) has been further analyzed, as suggested by Reviewer 1. This expanded analysis highlights conserved features—such as splice sites and cleavage sites within the predicted stem-loop structure—indicating that these elements are under selective constraint.

      • Conservation in mammals is now supported by experimental data, as detailed in our responses to point #7 of Reviewer 2 and major comment #1 of Reviewer 1. Specifically, we show that for the SNHG25 gene in H. sapiens (Figure S4D):

      (1) nascent transcription gives rise to a single non-coding RNA precursor that hosts two snoRNAs, one of which is intronic;

      (2) the predicted stem-loop structure contains conserved elements and is subject to endonucleolytic cleavage;

      (3) the RNase III enzyme DROSHA associates with the spliced SNHG25 precursor.

      Together, these analyses strengthen the evidence for the evolutionary conservation of the mechanism and support the general conclusions and title of the manuscript.

      Another interesting observation is that, similarly to intron-encoded snoRNAs in other species, there is a redundant pathway to produce mature U14 in addition to Pac1-mediated cleavage. In the case of intronic snoRNAs in S. cerevisiae, their release can be performed either by splicing/debranching or by Rnt1 cleavage, but there is also a third option: processing following transcription termination downstream of the snoRNA gene, which at the same time interferes with the expression of the host gene. Is such a scenario possible as an alternative pathway for U14? Are there any putative, or even cryptic, terminators downstream of the U14 gene? The authors did not consider or attempt to inspect this possibility.

      We thank the reviewer for this interesting and thoughtful comment. First, we would like to clarify that snoU14 is not intron-encoded; rather, it is located on the exon downstream of the intron-encoded snR107.

      Regarding the possibility of transcription termination-based processing: downstream of snoU14, we identified a non-consensus polyadenylation signal (AUUAAA) preceded by a U-rich tract, followed by three consensus polyadenylation signals (AAUAAA) within a 500-nt window. These elements likely contribute to robust and redundant transcription termination at this highly expressed locus. However, since all these sites are located downstream of snoU14, they do not provide an alternative 5′-end processing mechanism for this snoRNA; they reflect normal termination.

      If we correctly understood the reviewer’s suggestion (apologies if not), they may have been referring instead to the possibility of a cryptic or alternative polyadenylation site between snR107 and snoU14. If cleavage were to occur in this inter-snoRNA region while transcription continued past snoU14, it could, in principle, allow for alternative processing of snoU14. We have indeed considered this scenario. However, we currently do not find strong support for it: there are no identifiable polyadenylation signal motifs between the two snoRNAs, aside from a weakly conserved and questionable AAUAAU hexamer that does not appear to be used as a polyA site, at least in WT conditions (DOI: 10.4161/rna.25758). Given the lack of evidence, we chose not to explore this hypothesis further in the present manuscript, though it remains an interesting possibility for future investigation.

      I also have some concerns or comments related to the presented research, which are not major and mainly relate to data quantification, but have to be addressed.

      • In Pac1-ts and Pac1-AA strains the level of mature U14 seems upregulated compared to respective WT (Figure 1A). At the same time mature 25S and 18S rRNAs are less abundant. But there is no quantification and it is not mentioned in the text. What could be the reason for these effects?

      We thank the reviewer for this observation. As reviewer 2 also noted, ethidium bromide staining of mature rRNAs is not a reliable quantitative loading control. In response to this concern, we have now reprobed all northern blots with radiolabeled rRNA probes. These provide more accurate and consistent loading controls for our blots (new Figures 1B, 2B, 3B, S1B, S3A).

      Using these improved loading controls, it is evident that snoU14, snR107, and the unspliced precursor are all slightly upregulated in the Pac1-ts strain, although to a much lesser extent than the spliced precursor, which accumulates dramatically. We do not observe this effect in either the Pac1-AA or stem-dead (SD) mutants. We therefore interpret the modest upregulation as an indirect effect, possibly linked to the physiological state of the Pac1-ts mutant, which exhibits slower growth even at growth-permissive temperatures. We now explicitly discuss this in the revised manuscript.

      Regarding the suggestion to include quantification of the northern blot signal: we opted not to include this in the figures for the following reasons. First, the accumulation of the spliced precursor—the central focus of our analysis—is large and highly reproducible across all replicates and conditions. Second, northern blot quantification by pixel intensity remains semi-quantitative, particularly for comparisons across RNAs of highly different abundance. Finally, we support our conclusions with additional quantitative data from RT-qPCR and RNA-seq, which provide more robust measures of RNA accumulation.

      • Processing of the other snoRNA from the mamRNA/snoU14 precursor is largely overlooked in the MS. It is commented on only in the context of mutants expressing constitutive mamRNA-CS constructs (Figure 3B). Its level was checked in Pac1-ts and Pac1-AA (Supplementary Figure 1), but the authors conclude that "its expression remained largely unaffected by Pac1 inactivation", which is clearly not true. Similarly to U14, snR107 is also increased in Pac1-ts and Pac1-AA strains, at least judged "by eye", because no loading control or quantification is provided. This matter should be clarified.

      We thank the reviewer for pointing this out. We have now included appropriate loading controls for Supplementary Figure 1 to clarify the interpretation. As discussed in our response to the previous comment, we observe a general upregulation of the mamRNA locus in the Pac1-ts strain, which likely contributes to the increased levels of both snR107 and snoU14. However, because this upregulation is not observed in the Pac1-AA or stem-dead (SD) mutants, we interpret it as an indirect effect, possibly related to the altered physiological state of the Pac1-ts strain (e.g., slightly reduced growth rate even at the permissive temperature). This interpretation has now been clearly explained in the revised manuscript.

      We also identified and corrected a labeling error in the previous version of Supplementary Figure 1, where the Pac1-ts and Pac1-AA strains were inadvertently swapped. We sincerely apologize for the confusion this may have caused and have now ensured that all figure panels are correctly labeled and consistent with the text.

      Other minor comments:

      Minor points:

      1. Page 1, Abstract. The sentence "The hairpin recruits the RNase III Pac1 that cleaves and destabilizes the precursor transcript while participating in the maturation of the downstream exonic snoRNA, but only after splicing and release of the intronic snoRNA" is not entirely clear and should be simplified, maybe split into two sentences. This message is clear after reading the MS and learning the data, but not in the abstract.

      We thank the reviewer for pointing this out and have now clarified the abstract, following the suggestion to split and simplify the problematic sentence: "... the sequence surrounding an exon-exon junction within their precursor transcript folds into a hairpin after splicing of the intron. This hairpin recruits the RNase III ortholog Pac1, which participates in the maturation of the downstream snoRNA by cleaving the precursor."

      Page 1, Introduction. I am not convinced by the need to use the term "exonic snoRNA" for all snoRNAs that are not intronic. The term is misleading, as it is rather associated per se with snoRNAs encoded in mRNA exons. It has been used before in the review about snoRNAs by Michelle Scott published in RNA Biol (2024), but that does not justify its common use.

      We thank the reviewer for raising this important point. We agree that the term “exonic snoRNA” can be misleading, as it was previously used to refer specifically to snoRNAs embedded within exons of mRNA transcripts, a rare and potentially artifactual scenario, as very cautiously discussed by Michelle Scott and colleagues in their review published in RNA Biol (2024).

      In the previous version of our manuscript, we actually used “exonic snoRNA” in a broader sense to denote any snoRNA not encoded within an intron, primarily for convenience in contrasting the processing of intronic snR107 with that of non-intronic/exonic snoU14. However, we recognize that this usage is non-standard and risks confusion due to the ambiguity surrounding the term’s definition in the literature.

      In light of this, and in agreement with reviewer 1 who raised a similar concern, we have revised the manuscript to remove the term “exonic snoRNA” entirely. Depending on the context, we now refer more precisely to “non-intronic snoRNA,” “snoRNA gene located in exon,” or simply “snoRNA.”

      Supplementary Figure 3. It is difficult to assess whether the level of mature rRNAs is unchanged in the mutants based on EtBr staining and without calculations. Northern blotting should be performed and the levels properly calculated.

      As suggested, we performed northern blotting for mature 18S and 25S rRNAs, quantified the signal, and observed no significant differences (new Supplementary Figure 3).

      **Referees cross-commenting**

      I also agree that 4sU labeling may require too much work with a questionable result.

      We are grateful to the reviewer for this comment, which helped us complete this revision in a timely manner.

      Reviewer #3 (Significance (Required)):

      Strengths: 1. Novelty of the described genomic arrangement of snoRNA/ncRNA genes and their processing in a sequential and regulated manner.

      Potential conservation of this pathway across eukaryotic organisms. Well-designed and well-performed experiments followed by proper conclusions.

      Limitations: 1. Insufficient evidence to support generalization of the study results.

      Moderate overall impact of the study

      Advance: This research can be placed within publications describing specific processing pathways for various non-coding RNAs, including for example unusual chimeric species such as sno-lncRNAs. In this context, the presented results do advance the knowledge in the field by providing mechanistic evidence for a tightly controlled and coordinated maturation of selected ncRNAs.

      Audience: Basic research and specialized. The interest in this research will rather be limited to a specific field.

    1. Author response:

      The following is the authors’ response to the previous reviews

      General Response to Reviewers:

      We thank the Reviewers for their comments, which continue to substantially improve the quality and clarity of the manuscript, and therefore help us to strengthen its message while acknowledging alternative explanations.

      All three reviewers raised the concern that we have not proven that Rab3A is acting on a presynaptic mechanism to increase mEPSC amplitude after TTX treatment of mouse cortical cultures.  The reviewers’ main point is that we have not shown a lack of upregulation of postsynaptic receptors in mouse cortical cultures. We want to stress that we agree that postsynaptic receptors are upregulated after activity block in neuronal cultures.  However, the reviewers are not acknowledging that we have previously presented strong evidence at the mammalian NMJ that there is no increase in AChR after activity blockade, and therefore the requirement for Rab3A in the homeostatic increase in quantal amplitude points to a presynaptic contribution. We agree that we should restrict our firmest conclusions to the data in the current study, but in the Discussion we are proposing interpretations. We have added the following new text:

      “The impetus for our current study was two previous studies in which we examined homeostatic regulation of quantal amplitude at the NMJ.  An advantage of studying the NMJ is that synaptic ACh receptors are easily identified with fluorescently labeled alpha-bungarotoxin, which allows for very accurate quantification of postsynaptic receptor density. We were able to detect a known change due to mixing 2 colors of alpha-BTX to within 1% (Wang et al., 2005).  Using this model synapse, we showed that there was no increase in synaptic AChRs after TTX treatment, whereas miniature endplate current increased 35% (Wang et al., 2005). We further showed that the presynaptic protein Rab3A was necessary for full upregulation of mEPC amplitude (Wang et al., 2011). These data strongly suggested Rab3A contributed to homeostatic upregulation of quantal amplitude via a presynaptic mechanism.  With the current study showing that Rab3A is required for the homeostatic increase in mEPSC amplitude in cortical cultures, one interpretation is that in both situations, Rab3A is required for an increase in the presynaptic quantum.”

      The point we are making is that the current manuscript is an extension of that work, and that our interpretation of the variable upregulation of postsynaptic receptors in our mouse cortical cultures further supports the idea that there is a Rab3A-dependent presynaptic contribution to homeostatic increases in quantal amplitude.

      Public Reviews:

      Reviewer #1 (Public review):

      Koesters and colleagues investigated the role of the small GTPase Rab3A in homeostatic scaling of miniature synaptic transmission in primary mouse cortical cultures using electrophysiology and immunohistochemistry. The major finding is that TTX incubation for 48 hours does not induce an increase in the amplitude of excitatory synaptic miniature events in neuronal cortical cultures derived from Rab3A KO and Rab3A Earlybird mutant mice. NASPM application had comparable effects on mEPSC amplitude in control and after TTX, implying that Ca2+-permeable glutamate receptors are unlikely modulated during synaptic scaling. Immunohistochemical analysis revealed no significant changes in GluA2 puncta size, intensity, and integral after TTX treatment in control and Rab3A KO cultures. Finally, they provide evidence that loss of Rab3A in neurons, but not astrocytes, blocks homeostatic scaling. Based on these data, the authors propose a model in which neuronal Rab3A is required for homeostatic scaling of synaptic transmission, potentially through GluA2-independent mechanisms.

      The major finding - impaired homeostatic up-scaling after TTX treatment in Rab3A KO and Rab3 earlybird mutant neurons - is supported by data of high quality. However, the paper falls short of providing any evidence or direction regarding potential mechanisms. The data on GluA2 modulation after TTX incubation are likely statistically underpowered, and do not allow drawing solid conclusions, such as GluA2-independent mechanisms of up-scaling.

      The study should be of interest to the field because it implicates a presynaptic molecule in homeostatic scaling, which is generally thought to involve postsynaptic neurotransmitter receptor modulation. However, it remains unclear how Rab3A participates in homeostatic plasticity.

      Major (remaining) point:

      (1) Direct quantitative comparison between electrophysiology and GluA2 imaging data is complicated by many factors, such as different signal-to-noise ratios. Hence, comparing the variability of the increase in mini amplitude vs. GluA2 fluorescence area is not valid. Thus, I recommend removing the sentence "We found that the increase in postsynaptic AMPAR levels was more variable than that of mEPSC amplitudes, suggesting other factors may contribute to the homeostatic increase in synaptic strength." from the abstract.

      We have not removed the statement, but altered it to soften the conclusion. It now reads: “We found that the increase in postsynaptic AMPAR levels in wild type cultures was more variable than that of mEPSC amplitudes, which might be explained by a presynaptic contribution, but we cannot rule out variability in the measurement.”

      Similarly, the data do not directly support the conclusion of GluA2-independent mechanisms of homeostatic scaling. Statements like "We conclude that these data support the idea that there is another contributor to the TTX-induced increase in quantal size." should thus be revised or removed.

      This particular statement appears only in our previous response to reviewers. We deleted the sentences that start “The simplest explanation Rab3A regulates a presynaptic contributor….” and “Imaging of immunofluorescence more variable…”. We also deleted “our data suggest….consistently leads to an increase in mEPSC amplitude and sometimes leads to….” We added “…the lack of a robust increase in receptor levels leaves open the possibility that there is a presynaptic contributor to quantal size in mouse cortical cultures. However, the variability could arise from technical factors associated with the immunofluorescence method, and the mechanism of Rab3A-dependent plasticity could be presynaptic for the NMJ and postsynaptic for cortical neurons.”

      Reviewer #2 (Public review):

      I thank the authors for their efforts in the revision. In general, I believe the main conclusion that Rab3A is required for TTX-induced homeostatic synaptic plasticity is well-supported by the data presented, and this is an important addition to the repertoire of molecular players involved in homeostatic compensations. I also acknowledge that the authors are more cautious in making conclusions based on the current evidence, and the structure and logic have been much improved.

      The only major concern I have still falls on the interpretation of the mismatch between GluA2 cluster size and mEPSC amplitude. The authors argue that they are only trying to say that changes in the cluster size are more variable than those in the mEPSC amplitude, and they provide multiple explanations for this mismatch. It seems incongruous to state that the simplest explanation is a presynaptic factor when you have all these alternative factors that very likely have contributed to the results. Further, the authors speculate in the discussion that Rab3A does not regulate postsynaptic GluA2 but instead regulates a presynaptic contributor. Do the authors mean that, in their model, the mEPSC amplitude increases can be attributed to two factors- postsynaptic GluA2 regulation and a presynaptic contribution (which is regulated by Rab3A)? If so, and Rab3A does not affect GluA2 whatsoever, shouldn't we see GluA2 increase even in the absence of Rab3A? The data in Table 1 seems to indicate otherwise.

      The main body of this comment is addressed in the General Response to Reviewers. In addition, we deleted the text “current data, coupled with our previous findings at the mouse neuromuscular junction, support the idea that there are additional sources contributing to the homeostatic increase in quantal size.” We added new text, so the sentence now reads: “Increased receptors likely contribute to increases in mEPSC amplitudes in mouse cortical cultures, but because we do not have a significant increase in GluA2 receptors in our experiments, it is impossible to conclude that the increase is lacking in cultures from Rab3A-/- neurons.”

      I also question the way the data are presented in Figure 5. The authors first compare 3 cultures and then 5 cultures altogether, if these experiments are all aimed to answer the same research question, then they should be pooled together. Interestingly, the additional two cultures both show increases in GluA2 clusters, which makes the decrease in culture #3 even more perplexing, for which the authors comment in line 261 that this is due to other factors. Shouldn't this be an indicator that something unusual has happened in this culture?

      Data in this figure is sufficient to support that GluA2 increases are variable across cultures, which hardly adds anything new to the paper or to the field. 

      A major goal of performing the immunofluorescence measurements in the same cultures for which we had electrophysiological results was to address the common impression that the homeostatic effect itself is highly variable, as the reviewer notes in the comment “…GluA2 increases are variable across cultures…” Presumably, if GluA2 increases are the mechanism of the mEPSC amplitude increases, then variable GluA2 increases should correlate with variable mEPSC amplitude increases, but that is not what we observed. We are left with the explanation that the immunofluorescence method itself is very variable. We have added the point to the Discussion, which reads, “the variability could arise from technical factors associated with the immunofluorescence method, and the mechanism of Rab3A-dependent homeostatic plasticity could be presynaptic for the NMJ and postsynaptic for cortical neurons.”

      Finally, the implication of “Shouldn’t this be an indicator that something unusual has happened in this culture?”, if the result is not due to culture-to-culture variability in the homeostatic response itself, is that there was a technical problem with accurately measuring receptor levels. We have no reason to suspect anything was amiss in this set of coverslips (the values for controls and for TTX-treated cells were not outside the range of values in other experiments). In any of the coverslips, there may be variability in the amount of primary anti-GluA2 antibody, as this was added directly to the culture rather than prepared as a diluted solution and added to all the coverslips. But to remove this one experiment because it did not give the expected result would be to allow bias to direct our data selection.

      The authors further cite a study with comparable sample sizes, which shows a similar mismatch based on p values (Xu and Pozzo-Miller 2007), yet the effect sizes in this study actually match quite well (both ~160%). P values cannot be used to show whether two effects match, but effect sizes can. Therefore, the statement in lines 411-413 "... consistently leads to an increase in mEPSC amplitudes, and sometimes leads to an increase in synaptic GluA2 receptor cluster size" is not very convincing, and can hardly be used to support "the idea that there are additional sources contributing to the homeostatic increase in quantal size.”

      We have the same situation; our effect sizes match (19.7% increase for mEPSC amplitude; 18.1% increase for GluA2 receptor cluster size, see Table 1), but in our case, the p value for receptors does not reach statistical significance. Our point here is that there is published evidence that the variability in receptor measurements is greater than the variability in electrophysiological measurements. But we have softened this point, removing the sentences containing “…consistently leads and sometimes...” and “……additional sources contributing…”.

      I would suggest simply showing mEPSC and immunostaining data from all cultures in this experiment as additional evidence for homeostatic synaptic plasticity in WT cultures, and leave out the argument for "mismatch". The presynaptic location of Rab3A is sufficient to speculate a presynaptic regulation of this form of homeostatic compensation.

      We have removed all uses of the word “mismatch,” but feel the presentation of the 3 matched experiments, 23-24 cells (Figure 5A, D), and the additional 2 experiments for a total of 5 cultures, 48-49 cells (Figure 5C, F), is important in order to demonstrate that the lack of statistically significant receptor response is due neither to a variable homeostatic response in the mEPSC amplitudes, nor to a small number of cultures.

      Minor concerns:

      (1) Line 214, I see the authors cite literature to argue that GluA2 can form homomers and can conduct currents. While GluA2 subunits edited at the Q/R site (they are in nature) can form homomers with very low efficiency in exogenous systems such as HEK293 cells (as done in the cited studies), it's unlikely for this to happen in neurons (they can hardly traffic to synapses if possible at all).

      We were unable to identify a key reference that characterized GluA2 homomers vs. heteromers in native cortical neurons, but we have rewritten the section in the manuscript to acknowledge the low conductance of homomers:

      “…to assess whether GluA2 receptor expression, which will identify GluA2 homomers and GluA2 heteromers (the former unlikely to contribute to mEPSCs given their low conductance relative to heteromers (Swanson et al., 1997; Mansour et al., 2001)…”

      (2) Lines 221-222, the authors may have misinterpreted the results in Turrigiano 1998. This study does not show that the increase in receptors is most dramatic in the apical dendrite, in fact, this is the only region they have tested. The results in Figures 3b-c show that the effect size is independent of the distance from soma.

      Figure 3 in Turrigiano et al. shows that the increase in glutamate responsiveness is higher at the cell body than along the primary dendrite. We have revised our description to indicate that an increase in responsiveness on the primary dendrite has been demonstrated in Turrigiano et al. 1998.

      “We focused on the primary dendrite of pyramidal neurons as a way to reduce variability that might arise from being at widely ranging distances from the cell body, or, from inadvertently sampling dendritic regions arising from inhibitory neurons. In addition, it has been shown that there is a clear increase in response to glutamate in this region (Turrigiano et al., 1998).”

      “…synaptic receptors on the primary dendrite, where a clear increase in sensitivity to exogenously applied glutamate was demonstrated (see Figure 3 in (Turrigiano et al., 1998)).”

      (3) Lines 309-310 (and other places mentioning TNFa), the addition of TNFa to this experiment seems out of place. The authors have not performed any experiment to validate the presence/absence of TNFa in their system (citing only 1 study from another lab is insufficient). Although it's convincing that glia Rab3A is not required for homeostatic plasticity here, the data does not suggest Rab3A's role (or the lack of) for TNFa in this process.

      We have modified the paragraph in the Discussion that addresses the glial results, to describe more clearly the data that supported an astrocytic TNF-alpha mechanism: “TNF-alpha accumulates after activity blockade, and directly applied to neuronal cultures, can cause an increase in GluA1 receptors, providing a potential mechanism by which activity blockade leads to the homeostatic upregulation of postsynaptic receptors (Beattie et al., 2002; Stellwagen et al., 2005; Stellwagen and Malenka, 2006).”

      We have also acknowledged that we cannot rule out TNF-alpha coming from neurons in the cortical cultures: “…suggesting the possibility that neuronal Rab3A can act via a non-TNF-alpha mechanism to contribute to homeostatic regulation of quantal amplitude, although we have not ruled out a neuronal Rab3A-mediated TNF-alpha pathway in cortical cultures.”

      Reviewer #3 (Public review):

      This manuscript presents a number of interesting findings that have the potential to increase our understanding of the mechanism underlying homeostatic synaptic plasticity (HSP). The data broadly support that Rab3A plays a role in HSP, although the site and mechanism of action remain uncertain.

      The authors clearly demonstrate that Rab3A plays a role in HSP at excitatory synapses, with substantially less plasticity occurring in the Rab3A KO neurons. There is also no apparent HSP in the Earlybird Rab3A mutation, although baseline synaptic strength is already elevated. In this context, it is unclear if the plasticity is absent, already induced by this mutation, or just occluded by a ceiling effect due to the synapses already being strengthened. Occlusion may also occur in the mixed cultures when Rab3A is missing from neurons but not astrocytes. The authors do appropriately discuss these options. The authors have solid data showing that Rab3A is unlikely to be active in astrocytes. Finally, they attempt to study the linkage between changes in synaptic strength and AMPA receptor trafficking during HSP, and conclude that trafficking may not be solely responsible for the changes in synaptic strength during HSP.

      Strengths:

      This work adds another player into the mechanisms underlying an important form of synaptic plasticity. The plasticity is likely only reduced, suggesting Rab3A is only partially required and perhaps multiple mechanisms contribute. The authors speculate about some possible novel mechanisms, including whether Rab3A is active pre-synaptically to regulate quantal amplitude.

      As Rab3A is primarily known as a pre-synaptic molecule, this possibility is intriguing. However, it is based on the partial dissociation of AMPAR trafficking and synaptic response and lacks strong support. On average, they saw a similar magnitude of change in mEPSC amplitude and GluA2 cluster area and integral, but the GluA2 data were not significant due to higher variability. It is difficult to determine if this is due to biology or methodology: the imaging method involves assessing puncta pairs (GluA2/VGlut1) clearly associated with a MAP2-labeled dendrite. This is a small subset of synapses, usually fewer than 20 synapses per neuron analyzed, which would be expected to be more variable than mEPSC recordings averaged across several hundred events. However, when they reduce the number of mEPSC events to numbers similar to the imaging, the mEPSC amplitudes are still less variable than the imaging data. The reason for this remains unclear. The pool of sampled synapses is still different between the methods, and recent data have shown that synapses have variable responses during HSP. Further, there could be variability in the subunit composition of newly inserted AMPARs, and only assessing GluA2 could mask this (see below). It is intriguing that pre-synaptic changes might contribute to HSP, especially given the likely localization of Rab3A. But it remains difficult to distinguish whether the apparent difference between imaging and electrophysiology is a methodological issue rather than a biological one. Stronger data, especially positive data on changes in release, will be necessary to conclude that pre-synaptic factors are required for HSP, beyond the established changes in post-synaptic receptor trafficking.

      Regarding the concern that the lack of increase in receptors is due to a technical issue, please see General Response to Reviewers, above. We have also softened our conclusions throughout, acknowledging we cannot rule out a technical issue.

      Other questions arise from the NASPM experiments, used to justify looking at GluA2 (and not GluA1) in the immunostaining. First, there is a strong frequency effect that is unclear in origin. One would expect NASPM to merely block some fraction of the post-synaptic current, and not affect pre-synaptic release or block whole synapses. But the change in frequency seems to argue (as the authors do) that some synapses only have CP-AMPARs, while the rest of the synapses have few or none. Another possibility is that there are pre-synaptic NASPM-sensitive receptors that influence release probability. Further, the amplitude data show a strong trend towards smaller amplitude following NASPM treatment (Fig 3B). The p value for both control and TTX neurons was 0.08 - it is very difficult to argue that there is no effect. The decrease on average is larger in the TTX neurons, and some cells show a strong effect. It is possible there is some heterogeneity between neurons on whether GluA1/A2 heteromers or GluA1 homomers are added during HSP. This would impact the conclusions about the GluA2 imaging as compared to the mEPSC amplitude data.

      The key finding in Figure 3 is that NASPM did not eliminate the statistically significant increase in mEPSC amplitude after TTX treatment (Fig 3A). Whether or not NASPM-sensitive receptors contribute to mEPSC amplitude is a separate question (Fig 3B). We are open to the possibility that NASPM reduces mEPSC amplitude in both control and TTX-treated cells (p = 0.08 for both), but that does not change our conclusion that NASPM has no effect on the TTX-induced increase in mEPSC amplitude. The mechanism underlying the decrease in mEPSC frequency following NASPM is interesting, but does not alter our conclusions regarding the role of Rab3A in homeostatic synaptic plasticity of mEPSC amplitude. In addition, the Reviewer does not acknowledge Supplemental Figure 1, which shows a similar lack of correspondence between homeostatic increases in mEPSC amplitude and GluA1 receptors in two cultures where matched data were obtained. Therefore, we do not think our lack of a robust increase in receptors can be explained by our failing to look at the relevant receptor.

      To understand the role of Rab3A in HSP will require addressing two main issues:

      (1) Is Rab3A acting pre-synaptically, post-synaptically or both? The authors provide good evidence that Rab3A is acting within neurons and not astrocytes. But where it is acting (pre or post) would aid substantially in understanding its role. The general view in the field has been that HSP is regulated post-synaptically via regulation of AMPAR trafficking, and considerable evidence supports this view. More concrete support for the authors' suggestion of a pre-synaptic site of control would be helpful.

      We agree that definitive evidence for a presynaptic role of Rab3A in homeostatic plasticity of mEPSC amplitudes in mouse cortical cultures requires demonstrating that loss of Rab3A in postsynaptic neurons does not disrupt the plasticity, whereas loss in presynaptic neurons does. Without these data, we can only speculate that the Rab3A-dependence of homeostatic plasticity of quantal size in cortical neurons may be similar to that at the neuromuscular junction, where the increase in quantal size cannot be explained by receptors. We have added to the Discussion that the mechanism of Rab3A regulation of homeostatic plasticity of quantal amplitude could differ between cortical neurons and the neuromuscular junction (lines 448-450 in markup). Establishing a way to co-culture Rab3A-/- and Rab3A+/+ neurons in ratios that would allow us to record from a Rab3A-/- neuron that has mainly Rab3A+/+ inputs (or vice versa) is not impossible, but requires either transfection or transgenic expression with markers that identify the relevant genotype, and will be the subject of future experiments.

      (2) Rab3A is also found at inhibitory synapses. It would be very informative to know if HSP at inhibitory synapses is similarly affected. This is particularly relevant as at inhibitory synapses, one expects a removal of GABARs or a decrease in GABA release (i.e., the opposite of whatever is happening at excitatory synapses). If both processes are regulated by Rab3A, this might suggest a role for this protein more upstream in the signaling; an effect only at excitatory synapses would argue for a more specific role just at those synapses.

      We agree with the Reviewer that it is important to determine the generality of Rab3A function in homeostatic plasticity. Establishing the homeostatic effect on mIPSCs and then examining them in Rab3A-/- cultures is a large undertaking and will be the subject of future experiments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor (remaining) points:

      (1) The figure referenced in the first response to the reviewers (Figure 5G) does not exist.

      We meant Figure 5F, which has been corrected in the current response.

      (2) I recommend showing the data without binning (despite some overlap).

      The box plot in Origin does not allow unbinned data, but we can make the bin size so small that, for all intents and purposes, there is close to 1 sample in each bin. When we do this, the majority of data overlap in a straight vertical line. The previously described concerns related to gaps in the data, but it should be noted that these are cell means; we are not depicting the distributions of mEPSC amplitudes within a recording or across multiple recordings.

      (3) Please auto-scale all axes from 0 (e.g., Fig 1E, F).

      We have rescaled all mEPSC amplitude axes in box plots to go from 0 (Figures 1, 2 and 6).

      (4) Typo in Figure legend 3: "NASPM (20 um)" => uM

      Fixed.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 140: frequencies are reported in Hz, whereas elsewhere they are reported in sec-1. Although these are equivalent, the units should be kept consistent.

      All mEPSC frequencies have been changed to sec⁻¹, except we have left “Hz” for repetitive stimulation and filtering.

      (2) In the paragraph starting at line 163 (as well as in other places where multiple groups are compared, such as the occlusion discussion), the authors assessed whether there was a change in baseline between the WT and mutant groups using pairwise tests. This is not the right test; a two-way ANOVA, or at least a multivariate test, would be more appropriate.

      We have performed a two-way ANOVA, with genotype as one factor and treatment as the other. The p values in Figures 1 and 2 have been revised to reflect p values from the post hoc Tukey test on the specific interactions (for each particular genotype, TTX vs CON effects). The difference between the two untreated WT strains was not significant in the post hoc Tukey test, and we have revised the text. The difference between the untreated WT from the Rab3A+/Ebd colony and the untreated Rab3AEbd/Ebd mutant was still significant in the post hoc Tukey test, and this has replaced the Kruskal-Wallis test. The two-way ANOVA was also applied to the neuron-glia experiments, and p values in Figure 6 were adjusted accordingly.

      (3) Relevant to the second point under minor concerns, I suggest this sentence be removed, as reducing variability and avoiding inhibitory projections are reasons good enough to restrict the analysis to the apical dendrites.

      We have revised the description of the Turrigiano et al., 1998 finding from their Figure 3 and feel it still strengthens the justification for choosing to analyze only synapses on the apical dendrite.

      Reviewer #3 (Recommendations for the authors):

      Minor points:

      The comments on lines 256-7 could seem misleading: the NASPM results wouldn't rule out contribution of those other subunits, only non-GluA2-containing combinations of those subunits. I would suggest revising this statement. Also, NASPM does likely have an effect, just not one that changes much with TTX treatment.

      At new line 213 (markup) we have added the modifier “homomeric” to clarify our point that the lack of NASPM effect on the increase in mEPSC amplitude after TTX indicates that the increase is not due to more homomeric Ca²⁺-permeable receptors. We have always stated that NASPM reduces mEPSC amplitude, but that this occurs in both control and treated cultures.

      Strong conclusions based on a single culture (lines 314-5) seem unwarranted.

      We have softened this statement with a “suggesting that” substituted for the previous “Therefore,” but stand by our point that the mEPSC amplitude data support a homeostatic effect of TTX in Culture #3, so the lack of increase in GluA2 cluster size needs an explanation other than variability in the homeostatic effect itself.

      Saying (line 554) something is 'the only remaining possibility' also seems unwarranted.

      We have softened this statement to read, “A remaining possibility…”.

      Beattie EC, Stellwagen D, Morishita W, Bresnahan JC, Ha BK, Von Zastrow M, Beattie MS, Malenka RC (2002) Control of synaptic strength by glial TNFalpha. Science 295:2282-2285.

      Mansour M, Nagarajan N, Nehring RB, Clements JD, Rosenmund C (2001) Heteromeric AMPA receptors assemble with a preferred subunit stoichiometry and spatial arrangement. Neuron 32:841-853.

      Stellwagen D, Malenka RC (2006) Synaptic scaling mediated by glial TNF-alpha. Nature 440:1054-1059.

      Stellwagen D, Beattie EC, Seo JY, Malenka RC (2005) Differential regulation of AMPA receptor and GABA receptor trafficking by tumor necrosis factor-alpha. J Neurosci 25:3219-3228.

      Swanson GT, Kamboj SK, Cull-Candy SG (1997) Single-channel properties of recombinant AMPA receptors depend on RNA editing, splice variation, and subunit composition. J Neurosci 17:5869.

      Turrigiano GG, Leslie KR, Desai NS, Rutherford LC, Nelson SB (1998) Activity-dependent scaling of quantal amplitude in neocortical neurons. Nature 391:892-896.

      Wang X, Wang Q, Yang S, Bucan M, Rich MM, Engisch KL (2011) Impaired activity-dependent plasticity of quantal amplitude at the neuromuscular junction of Rab3A deletion and Rab3A earlybird mutant mice. J Neurosci 31:3580-3588.

      Wang X, Li Y, Engisch KL, Nakanishi ST, Dodson SE, Miller GW, Cope TC, Pinter MJ, Rich MM (2005) Activity-dependent presynaptic regulation of quantal size at the mammalian neuromuscular junction in vivo. J Neurosci 25:343-351.

    1. In short, the argument for a nonzero risk of a paperclip maximizer scenario rests on assumptions that may or may not be true, and it is reasonable to think that research can give us a better idea of whether these assumptions hold true for the kinds of AI systems that are being built or envisioned. For these reasons, we call it a ‘speculative’ risk, and examine the policy implications of this view in Part IV.

      This isn't a real objection

    2. scrutiny, and it remains to be seen how much its safety attitude will cost the company.[53] We think that these correlations are causal. Cruise’s license being revoked was a big part of the reason that it fell behind Waymo, and safety was also a factor in Uber’s self-driving failure.[54]

      [53] Jonathan Stempel. 2024. Tesla must face vehicle owners’ lawsuit over self-driving claims. Reuters (May 2024). https://www.reuters.com/legal/tesla-must-face-vehicle-owners-lawsuit-over-self-driving-claims-2024-05-15/.

      I feel like this paper might just be a series of bad analogies

    3. particularly Graphics Processing Units. Computational and cost limits continue to be relevant to new paradigms, including inference-time scaling. New slowdowns may emerge: Recent signs point to a shift away from the culture of open knowledge sharing in the industry.

      Argument: we might get bottlenecked on tech. I don't think so but idk. This isn't really a probability estimate, it's just a vague phrase. I guess the paper isn't really trying to do much realistic forecasting though

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Pakula et al. explore the impact of reactive oxygen species (ROS) on neonatal cerebellar regeneration, providing evidence that ROS activates regeneration through Nestin-expressing progenitors (NEPs). Using scRNA-seq analysis of FACS-isolated NEPs, the authors characterize injury-induced changes, including an enrichment in ROS metabolic processes within the cerebellar microenvironment. Biochemical analyses confirm a rapid increase in ROS levels following irradiation, and forced catalase expression, which reduces ROS levels, impairs external granule layer (EGL) replenishment post-injury.

      Strengths:

      Overall, the study robustly supports its main conclusion and provides valuable insights into ROS as a regenerative signal in the neonatal cerebellum.

      Comments on revisions:

      The authors have addressed most of the previous comments. However, they should clarify the following response:

      *"For reasons we have not explored, the phenotype is most prominent in these lobules, that is why they were originally chosen. We edited the following sentence (lines 578-579):

      First, we analyzed the replenishment of the EGL by BgL-NEPs in vermis lobules 3-5, since our previous work showed that these lobules have a prominent defect."*

      It has been reported that the anterior part of the cerebellum may have a lower regenerative capacity compared to the posterior lobe. To avoid potential ambiguity, the authors should clarify that "the phenotype" and "prominent defect" refer to more severe EGL depletion at an earlier stage after IR rather than a poorer regenerative outcome. Additionally, they should provide a reference to support their statement or indicate if it is based on unpublished observations.

      Our comment does not refer to a more severe EGL depletion at an earlier stage; rather, it refers to poorer regeneration of the anterior region. The irradiation approach used provides consistent cell killing of GCPs across the cerebellum. This can be seen in Fig. 1c, e, g, i of our previous publication (Wojcinski et al. (2017) Cerebellar granule cell replenishment post-injury by adaptive reprogramming of Nestin+ progenitors. Nature Neuroscience 20:1361-1370). Also, Fig. 2e, g, k, m in that paper shows that by P5 and P8, posterior lobule 8 recovers better than anterior lobules 1-5.

      Reviewer #2 (Public review):

      Summary:

      The authors have previously shown that the mouse neonatal cerebellum can regenerate damage to granule cell progenitors in the external granular layer, through reprogramming of gliogenic nestin-expressing progenitors (NEPs). The mechanisms of this reprogramming remain largely unknown. Here the authors used scRNAseq and ATACseq of purified neonatal NEPs from P1-P5 and showed that ROS signatures were transiently upregulated in gliogenic NEPs vs neurogenic NEPs 24 hours post injury (P2). To assess the role of ROS, mice transgenic for global catalase activity were used to reduce ROS. Inhibition of ROS significantly decreased gliogenic NEP reprogramming and diminished cerebellar growth post-injury. Further, inhibition of microglia across this same time period prevented one of the first steps of repair: the migration of NEPs into the external granule layer. This work is the first demonstration that the tissue microenvironment of the damaged neonatal cerebellum is a major regulator of neonatal cerebellar regeneration. Increased ROS is seen in other CNS damage models, including adults, thus there may be some shared mechanisms across age and regions, although interestingly neonatal cerebellar astrocytes do not upregulate GFAP as seen in adult CNS damage models. Another intriguing finding is that global inhibition of ROS did not alter normal cerebellar development.

      Strengths:

      This paper presents a beautiful example of using single cell data to generate biologically relevant, testable hypotheses of mechanisms driving important biological processes. The scRNAseq and ATACseq analyses are rigorously conducted and conclusive. Data are very clearly presented and easily interpreted, supporting the hypothesis next tested by reducing ROS in irradiated brains.

      Analyses of whole tissue and FAC-sorted NEPs in transgenic mice where human catalase was globally expressed in mitochondria were rigorously controlled and conclusively show that ROS upregulation was indeed decreased post injury and very clearly the regenerative response was inhibited. The authors are to be commended on the very careful analyses, which are very well presented and, again, easy to follow, with all appropriate data shown to support their conclusions.

      Weaknesses:

      The authors also present data to show that microglia are required for an early step of mobilizing gliogenic NEPs into the damaged EGL. While the data clearly show that PLX5622 administration from P0-P5 or even P0-P8 causes an immediate reduction of NEPs mobilized to the damaged EGL, there is no subsequent reduction of cerebellar growth, such that by P30 the treated and untreated irradiated cerebella are equivalent in size. There is speculation in the discussion about why this might be the case. Additional experiments and tools are required to assess mechanisms. Regardless, the data still implicate microglia in the neonatal regenerative response, and this finding remains an important advance.

      As stated previously, the suggested follow-up experiments, while relevant, are extensive and considered beyond the scope of the current paper.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Pakula et al. explore the impact of reactive oxygen species (ROS) on neonatal cerebellar regeneration, providing evidence that ROS activates regeneration through Nestin-expressing progenitors (NEPs). Using scRNA-seq analysis of FACS-isolated NEPs, the authors characterize injury-induced changes, including an enrichment in ROS metabolic processes within the cerebellar microenvironment. Biochemical analyses confirm a rapid increase in ROS levels following irradiation, and forced catalase expression, which reduces ROS levels, impairs external granule layer (EGL) replenishment post-injury.

      Strengths:

      Overall, the study robustly supports its main conclusion and provides valuable insights into ROS as a regenerative signal in the neonatal cerebellum.

      Weaknesses:

      (1) The diversity of cell types recovered from scRNA-seq libraries of sorted Nes-CFP cells is unexpected, especially the inclusion of minor types such as microglia, meninges, and ependymal cells. The authors should validate whether Nes and CFP mRNAs are enriched in the sorted cells; if not, they should discuss the potential pitfalls in sampling bias or artifacts that may have affected the dataset, impacting interpretation.

      In our previous work, we thoroughly assessed the transgene using RNA in situ hybridization for Cfp, immunofluorescent analysis for CFP and scRNA-seq analysis for Cfp transcripts (Bayin et al., Science Adv. 2021, Fig. S1-2)(1), and characterized the diversity within the NEP populations of the cerebellum. Our present scRNA-seq data also confirms that Nes transcripts are expressed in all the NEP subtypes. A feature plot for Nes expression has been added to the revised manuscript (Fig 1E), as well as a sentence explaining the results. Of note, since this data was generated from FACS-isolated CFP+ cells, the perdurance of the protein allows for the detection of immediate progeny of Nes-expressing cells, even in cells where Nes is not expressed once cells are differentiated. Finally, oligodendrocyte progenitors, perivascular cells, some rare microglia and ependymal cells have been demonstrated to express Nes in the central nervous system; therefore, detecting small groups of these cells is expected (2-4). We have added the following sentence (lines 391-394):

      “Detection of Nes mRNA confirmed that the transgene reflects endogenous Nes expression in progenitors of many lineages, and also that the perdurance of CFP protein in immediate progeny of Nes-expressing cells allowed the isolation of these cells by FACS (Figure 1E)”.

      (2) The authors should de-emphasize the claim that ROS signaling and related gene upregulation occur exclusively in gliogenic NEPs. Genes such as Cdkn1a, Phlda3, Ass1, and Bax are identified as differentially expressed in neurogenic NEPs and granule cell progenitors (GCPs), with Ass1 absent in GCPs. According to Table S4, gene ontology (GO) terms related to ROS metabolic processes are also enriched in gliogenic NEPs, neurogenic NEPs, and GCPs.

      As the reviewer requested, we have de-emphasized that ROS signaling is preferentially upregulated in gliogenic NEPs, since we agree with the reviewer that there is some evidence for similar transcriptional signatures in neurogenic NEPs and GCPs. We added the following (lines 429-531):

      “Some of the DNA damage and apoptosis related genes that were upregulated in IR gliogenic-NEPs (Cdkn1a, Phlda3, Bax) were also upregulated in the IR neurogenic-NEPs and GCPs at P2 (Supplementary Figure 2B-E).”

      And we edited the last few sentences of the section to state (lines 453-459):

      “Interestingly, we did not observe significant enrichment for GO terms associated with cellular stress response in the GCPs that survived the irradiation compared to controls, despite significant enrichment for ROS signaling related GO-terms (Table S4). Collectively, these results indicate that injury induces significant and overlapping transcriptional changes in NEPs and GCPs. The gliogenic- and neurogenic-NEP subtypes transiently upregulate stress response genes upon GCP death, and an overall increase in ROS signaling is observed in the injured cerebella.”

      (3) The authors need to justify the selection of only the anterior lobe for EGL replenishment and microglia quantification.

      We thank the reviewers for asking for this clarification. Our previous publications on regeneration of the EGL by NEPs have all involved quantification of these lobules, thus we think it is important to stay with the same lobules. For reasons we have not explored, the phenotype is most prominent in these lobules, that is why they were originally chosen. We edited the following sentence (lines 578-579):

      “First, we analyzed the replenishment of the EGL by BgL-NEPs in vermis lobules 3-5, since our previous work showed that these lobules have a prominent defect.”

      (4) Figure 1K: The figure presents linkages between genes and GO terms as a network but does not depict a gene network. The terminology should be corrected accordingly.

      We have corrected the terminology and added the following (lines 487-489):

      “Finally, linkages between the genes in differentially open regions identified by ATAC-seq and the associated GO-terms revealed an active transcriptional network involved in regulating cell death and apoptosis (Figure 1K).”

      (5) Figure 1H and S2: The x-axis appears to display raw p-values rather than log10(p.value) as indicated. The x-axis should ideally show -log10(p.adjust), beginning at zero. The current format may misleadingly suggest that the ROS GO term has the lowest p-values.

      Apologies for the mistake. The data represents raw p-values and the x-axis has been corrected.

      (6) Genes such as Ppara, Egln3, Foxo3, Jun, and Nos1ap were identified by bulk ATAC-seq based on proximity to peaks, not by scRNA-seq. Without additional expression data, caution is needed when presenting these genes as direct evidence of ROS involvement in NEPs.

      We modified the text to discuss the discrepancies between the analyses. While some of this could be due to the lower detection limits in the scRNA-seq, it also highlights that chromatin accessibility is not a direct readout for expression levels and further analysis is needed. Nevertheless, both scRNA-seq and ATAC-seq have identified similar mechanisms, and our mutant analysis confirmed our hypothesis that an increase in ROS levels underlies repair, further increasing the confidence in our analyses. Further investigation is needed to understand the downstream mechanisms. We added the following sentence (lines 478-481):

      “However, not all genes in the accessible areas were differentially expressed in the scRNA-seq data. While some of this could be due to the detection limits of scRNA-seq, further analysis is required to assess the mechanisms of how the differentially accessible chromatin affects transcription.”

      (7) The authors should annotate cell identities for the different clusters in Table S2.

      All cell types have been annotated in Table S2.

      (8) Reiterative clustering analysis reveals distinct subpopulations among gliogenic and neurogenic NEPs. Could the authors clarify the identities of these subclusters? Can we distinguish the gliogenic NEPs in the Bergmann glia layer from those in the white matter?

      Thank you for this question. As shown in our previous studies, we cannot distinguish between the gliogenic NEPs in the Bergmann glia layer and those in the white matter based on scRNA-seq, but expression of the Bergmann glia marker Gdf10 suggests that a large proportion of the cells in the Hopx+ clusters are in the Bergmann glia layer. The distinction within the major subpopulations that we characterized (Hopx-, Ascl1-expressing NEPs and GCPs) is driven by their proliferative/maturation status, as we previously observed. We have included a detailed annotation of all the clusters in Table S2, as requested, and a UMAP for Mki67 expression in Fig 1E. We have clarified this in the following sentence (lines 383-385):

      “These groups of cells were further subdivided into molecularly distinct clusters based on marker genes and their cell cycle profiles or developmental stages (Figure 1D, Table S2).”

      (9) In the Methods section, the authors mention filtering out genes with fewer than 10 counts. They should specify if these genes were used as background for enrichment analysis. Background gene selection is critical, as it influences the functional enrichment of gene sets in the list.

      As requested, the approach used has been added to the Methods section of the revised paper. Briefly, the background genes used by the goseq function are the same genes used for the probability weight function (nullp). The mm8 genome annotation was used in the nullp function, and all annotated genes were used as background genes to compute GO term enrichment. The following was added (lines 307-308):

      “The background genes used to compute the GO term enrichment include all genes with gene symbol annotations within mm8.”

      (10) Figure S1C: The authors could consider using bar plots to better illustrate cell composition differences across conditions and replicates.

      As suggested, we have included bar plots in Fig. S1D-F.

      (11) Figures 4-6: It remains unclear how the white matter microglia contribute to the recruitment of BgL-NEPs to the EGL, as the mCAT-mediated microglia loss data are all confined to the white matter.

      We have thought about the question and had initially quantified the microglia in the white matter and the rest of the lobules (excluding the EGL) separately. However, there are very few microglia outside the white matter in each section, thus it is not possible to obtain reliable statistical data on such a small population. We therefore did not include the cells in the analysis. We have added this point in the main text (line 548).

      “As a possible explanation for how white matter microglia could influence NEP behaviors, given the small size of the lobules and how the cytoarchitecture is disrupted after injury, we think it is possible that secreted factors from the white matter microglia could reach the BgL NEPs. Alternatively, there could be a relay system through an intermediate cell type closer to the microglia.” We have added these ideas to the Discussion of the revised paper (lines 735-738).

      Reviewer #2 (Public review):

      Summary:

      The authors have previously shown that the mouse neonatal cerebellum can regenerate damage to granule cell progenitors in the external granular layer, through reprogramming of gliogenic nestin-expressing progenitors (NEPs). The mechanisms of this reprogramming remain largely unknown. Here the authors used scRNAseq and ATACseq of purified neonatal NEPs from P1-P5 and showed that ROS signatures were transiently upregulated in gliogenic NEPs vs neurogenic NEPs 24 hours post injury (P2). To assess the role of ROS, mice transgenic for global catalase activity were used to reduce ROS. Inhibition of ROS significantly decreased gliogenic NEP reprogramming and diminished cerebellar growth post-injury. Further, inhibition of microglia across this same time period prevented one of the first steps of repair: the migration of NEPs into the external granule layer. This work is the first demonstration that the tissue microenvironment of the damaged neonatal cerebellum is a major regulator of neonatal cerebellar regeneration. Increased ROS is seen in other CNS damage models, including adults, thus there may be some shared mechanisms across age and regions, although interestingly neonatal cerebellar astrocytes do not upregulate GFAP as seen in adult CNS damage models. Another intriguing finding is that global inhibition of ROS did not alter normal cerebellar development.

      Strengths:

      This paper presents a beautiful example of using single cell data to generate biologically relevant, testable hypotheses of mechanisms driving important biological processes. The scRNAseq and ATACseq analyses are rigorously conducted and conclusive. Data are very clearly presented and easily interpreted, supporting the hypothesis next tested by reducing ROS in irradiated brains.

      Analyses of whole tissue and FAC-sorted NEPs in transgenic mice where human catalase was globally expressed in mitochondria were rigorously controlled and conclusively show that ROS upregulation was indeed decreased post injury and very clearly the regenerative response was inhibited. The authors are to be commended on the very careful analyses, which are very well presented and, again, easy to follow, with all appropriate data shown to support their conclusions.

      Weaknesses:

      The authors also present data to show that microglia are required for an early step of mobilizing gliogenic NEPs into the damaged EGL. While the data clearly show that PLX5622 administration from P0-P5 or even P0-P8 causes an immediate reduction of NEPs mobilized to the damaged EGL, there is no subsequent reduction of cerebellar growth, such that by P30 the treated and untreated irradiated cerebella are equivalent in size. There is speculation in the discussion about why this might be the case, but there is no explanation for why further, longer treatment was not attempted, nor were there any additional analyses of other regenerative steps in the treated animals. The data still implicate microglia in the neonatal regenerative response, but how remains uncertain.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      This is an exemplary manuscript.

      The methods and data are very well described and presented.

      I actually have very little to ask the authors except for an explanation of why PLX treatment was discontinued after P5 or P8 and what other steps of NEP reprogramming were assessed in these animals? Was NEP expansion still decreased at P8 even in the presence of PLX at this stage? Also - was there any analysis attempted combining mCAT and PLX?

      We agree with the reviewer that a follow-up study that goes into a deeper analysis of the role of microglia in GCP regeneration and any interaction with ROS signaling would be interesting. However, it would require a set of tools that we do not currently have. We did not have enough PLX5622 to perform additional experiments or extend the length of treatment. Plexxikon informed us in 2021 that they were no longer manufacturing PLX5622 because they were focusing on new analogs for in vivo use, and thus we had to use what we had left over from a completed preclinical cancer study. We nevertheless think it is important to publish our preliminary results to spark further experiments by other groups.

      References

      (1) Bayin N. S., Mizrak D., Stephen N. D., Lao Z., Sims P. A., Joyner A. L. Injury induced ASCL1 expression orchestrates a transitory cell state required for repair of the neonatal cerebellum. Sci Adv. 2021;7(50):eabj1598.

      (2) Cawsey T, Duflou J, Weickert CS, Gorrie CA. Nestin-Positive Ependymal Cells Are Increased in the Human Spinal Cord after Traumatic Central Nervous System Injury. J Neurotrauma. 2015;32(18):1393-402.

      (3) Gallo V, Armstrong RC. Developmental and growth factor-induced regulation of nestin in oligodendrocyte lineage cells. J Neurosci. 1995;15(1 Pt 1):394-406.

      (4) Huang Y, Xu Z, Xiong S, Sun F, Qin G, Hu G, et al. Repopulated microglia are solely derived from the proliferation of residual microglia after acute depletion. Nat Neurosci. 2018;21(4):530-40.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Jin et al. investigated how the bacterial DNA damage (SOS) response and its regulator protein RecA affect the development of drug resistance under short-term exposure to beta-lactam antibiotics. Canonically, the SOS response is triggered by DNA damage, which results in the induction of error-prone DNA repair mechanisms. These error-prone repair pathways can increase mutagenesis in the cell, leading to the evolution of drug resistance. Thus, inhibiting the SOS regulator RecA has been proposed as a means to delay the rise of resistance.

      In this paper, the authors deleted the RecA protein from E. coli and exposed this ∆recA strain to selective levels of the beta-lactam antibiotic ampicillin. After an 8 h treatment, they washed the antibiotic away and allowed the surviving cells to recover in regular media. They then measured the minimum inhibitory concentration (MIC) of ampicillin against these treated strains. They note that after just 8 h of treatment with ampicillin, the ∆recA strain had developed higher MICs towards ampicillin, whereas wild-type cells exhibited unchanged MICs. This MIC increase was also observed in subsequent generations of bacteria, suggesting that the phenotype is driven by a genetic change.

      The authors then used whole genome sequencing (WGS) to identify mutations that accounted for the resistance phenotype. Within resistant populations, they discovered key mutations in the promoter region of the beta-lactamase gene, ampC; in the penicillin-binding protein PBP3, which is the target of ampicillin; and in the AcrB subunit of the AcrAB-TolC efflux machinery. Importantly, mutations in the efflux machinery can affect resistance to other antibiotics, not just beta-lactams. To test this, they repeated the MIC experiments with other classes of antibiotics, including kanamycin, chloramphenicol, and rifampicin. Interestingly, they observed that the ∆recA strains pre-treated with ampicillin showed higher MICs towards all other antibiotics tested. This suggests that the mutations conferring resistance to ampicillin also increase resistance to other antibiotics.

      The authors then performed an impressive series of genetic, microscopy, and transcriptomic experiments to show that this increase in resistance is not driven by the SOS response, but by independent DNA repair and stress response pathways. Specifically, they show that deletion of recA reduces the bacterium's ability to process reactive oxygen species (ROS) and repair its DNA. These factors drive the accumulation of mutations that can confer resistance towards different classes of antibiotics. The conclusions are reasonably well supported by the data, but some aspects of the data and the model need to be clarified and extended.

      Strengths:

      A major strength of the paper is the detailed bacterial genetics and transcriptomics that the authors performed to elucidate the molecular pathways responsible for this increased resistance. They systematically deleted or inactivated genes involved in the SOS response in E. coli. They then subjected these mutants to the same MIC assays as described previously. Surprisingly, none of the other SOS gene deletions resulted in an increase in drug resistance, suggesting that the SOS response is not involved in this phenotype. This led the authors to focus on the localization of DNA PolI, which also participates in DNA damage repair. Using microscopy, they discovered that in the RecA deletion background, PolI co-localizes with the bacterial chromosome at much lower rates than in wild-type cells. This led the authors to conclude that deletion of RecA hinders PolI and DNA repair. Although the authors do not provide a mechanism, this observation is nonetheless valuable for the field and can stimulate further investigations in the future.

      In order to understand how RecA deletion affects cellular physiology, the authors performed RNA-seq on ampicillin-treated strains. Crucially, they discovered that in the RecA deletion strain, genes associated with antioxidative activity (cysJ, cysI, cysH, sodA, sufD) and base excision repair (mutH, mutY, mutM), which repairs oxidized forms of guanine, were all downregulated. The authors conclude that downregulation of these genes might result in elevated levels of reactive oxygen species in the cells, which in turn might drive the rise of resistance. Experimentally, they further demonstrated that treating the ∆recA strain with the antioxidant GSH prevents the rise of MICs. These observations will be useful for more detailed mechanistic follow-ups in the future.

      Weaknesses:

      Throughout the paper, the authors use language suggesting that ampicillin treatment of the ∆recA strain induces higher levels of mutagenesis inside the cells, leading to the rapid rise of resistance mutations. However, as the authors note, the mutants enriched by ampicillin selection can play a role in efflux and can thus change a bacterium's sensitivity to a wide range of antibiotics, a phenomenon known as cross-resistance. The current data do not make clear whether the elevated "mutagenesis" is driven by ampicillin selection or by a bona fide increase in mutation rate.

      Furthermore, on a technical level, the authors employed WGS to identify resistance mutations in the ampicillin-treated wild-type and ∆recA strains. However, the WGS methodology described in the paper is inconsistent. Notably, wild-type WGS samples were picked from non-selective plates, while ΔrecA WGS isolates were picked from selective plates with 50 μg/mL ampicillin. Such an approach biases the frequency and identity of the mutations seen in the WGS and cannot be used to support the idea that ampicillin treatment induces higher levels of mutagenesis.

      Finally, it is important to establish what the basal mutation rates of both the WT and ∆recA strains are. Currently, only the ampicillin-treated populations were reported. It is possible that the ∆recA strain has inherently higher mutagenesis than WT, with a larger subpopulation of resistant clones. Thus, ampicillin treatment might not in fact induce higher mutagenesis in ∆recA.

      Comments on revisions:

      Thank you for responding to the concerns raised previously. The manuscript overall has improved.

      We sincerely thank the reviewer for raising this important point. In our initial submission, we acknowledge that our mutation analysis was based on a limited number of replicates (n=6), which may not have been sufficient to robustly distinguish between mutation induction and selection. In response to this concern, we have substantially expanded our experimental dataset. Specifically, we redesigned the mutation rate validation experiment by increasing the number of biological replicates in each condition to 96 independent parallel cultures. This enabled us to systematically assess mutation frequency distributions under four conditions (WT, WT+ampicillin, ΔrecA, ΔrecA+ampicillin), using both maximum likelihood estimation (MLE) and distribution-based fluctuation analysis (new Figure 1F, 1G, and Figure S5).

      These expanded datasets revealed that:

      (1) Although the estimated mutation rate was significantly elevated in ΔrecA+ampicillin compared with ΔrecA alone (Fig. 1G),

      (2) The distribution of mutation frequencies in ΔrecA+ampicillin was highly skewed with evident jackpot cultures (Fig. 1F), and

      (3) The observed pattern significantly deviated from Poisson expectations, which is inconsistent with uniform mutagenesis and instead supports clonal selection from an early-arising mutational pool (Fig. S5).

      Importantly, these new results do not contradict our original conclusions but rather extend and refine them. The previous evidence for ROS-mediated mutagenesis remains valid and is supported by our GSH experiments, transcriptomic analysis of oxidative stress genes, and DNA repair pathway repression. However, the additional data now indicate that ROS-induced variants are not uniformly induced after antibiotic exposure but are instead generated stochastically under the stress-prone ΔrecA background and then selectively enriched upon ampicillin treatment.

      Taken together, we now propose a two-step model of resistance evolution in ΔrecA cells (new Figure 5):

      Step i: RecA deficiency creates a hypermutable state through impaired repair and elevated ROS, increasing the probability of resistance-conferring mutations.

      Step ii: β-lactam exposure acts as a selective bottleneck, enriching early-arising mutants that confer resistance.

      We have revised both the Results and Discussion sections to clearly articulate this complementary relationship between mutational supply and selection, and we believe this integrated model better explains the observed phenotypes and mechanistic outcomes.

      Reviewer #2 (Public review):

      This study aims to demonstrate that E. coli can acquire rapid antibiotic resistance mutations in the absence of a DNA damage response. The authors employed a modified Adaptive Laboratory Evolution (ALE) workflow to investigate this, initiating the process by diluting an overnight culture 50-fold into an ampicillin selection medium. They present evidence that a recA- strain develops ampicillin resistance mutations more rapidly than the wild-type, as indicated by the Minimum Inhibitory Concentration (MIC) and mutation frequency. Whole-genome sequencing of recA- colonies resistant to ampicillin showed predominant inactivation of genes involved in the multi-drug efflux pump system, contrasting with wild-type mutations that seem to activate the chromosomal ampC cryptic promoter. Further analysis of mutants, including a lexA3 mutant incapable of inducing the SOS response, led the authors to conclude that the rapid evolution of antibiotic resistance occurs via an SOS-independent mechanism in the absence of recA. RNA sequencing suggests that antioxidative response genes drive the rapid evolution of antibiotic resistance in the recA- strain. They assert that rapid evolution is facilitated by compromised DNA repair, transcriptional repression of antioxidative stress genes, and excessive ROS accumulation.

      Strengths:

      The experiments are well-executed and the data appear reliable. It is evident that the inactivation of recA promotes faster evolutionary responses, although the exact mechanisms driving this acceleration remain elusive and deserve further investigation.

      Weaknesses:

      Some conclusions are overstated. For instance, the conclusion regarding the LexA3 allele, indicating that rapid evolution occurs in an SOS-independent manner (line 217), contradicts the introductory statement that attributes evolution to compromised DNA repair.

      We thank the reviewer for this insightful observation, which highlights a central conceptual advance of our study. Our data indeed indicate that resistance evolution in ΔrecA occurs independently of canonical SOS induction (as shown by the lack of resistance in lexA3, dpiBA, and translesion polymerase mutants), yet is clearly associated with impaired DNA repair capacity (e.g., downregulation of polA, mutH, mutY).

      This apparent “contradiction” reflects the dual role of RecA: it functions both as the master activator of the SOS response and as a key factor in SOS-independent repair processes. Thus, the rapid resistance evolution in ΔrecA is not due to loss of SOS, but rather due to the broader suppression of DNA repair pathways that RecA coordinates, which elevates mutational load under stress (This point is discussed in further detail in our response to Reviewer 1).

      The claim made in the discussion of Figure 3 that the hindrance of DNA repair in recA- is crucial for rapid evolution is at best suggestive, not demonstrative. Additionally, the interpretation of the PolI data implies its role, yet it remains speculative.

      We appreciate this comment and would like to respectfully clarify that our conclusion regarding the role of DNA repair impairment is supported by several independent lines of mechanistic evidence.

      First, our RNA-seq analysis revealed transcriptional suppression of multiple DNA repair genes in ΔrecA cells following ampicillin treatment, including polA (DNA Pol I) and the base excision repair genes mutH, mutY, and mutM (Fig. 4K). This indicates that multiple repair pathways, including those responsible for correcting oxidative DNA lesions, are downregulated under these conditions.

      Second, we observed a significant reduction in DNA Pol I protein expression as well as reduced colocalization with chromosomal DNA in ΔrecA cells, suggesting impaired engagement of repair machinery (Fig. 3C-E). These phenotypes are not limited to transcriptional signatures but extend to functional protein localization.

      Third, and most importantly, resistance evolution was fully suppressed in ΔrecA cells upon co-treatment with glutathione (GSH), which reduces ROS levels. As GSH did not affect ampicillin killing (Fig. 4J), these findings suggest that mutagenesis, and thus the emergence of resistance, requires both ROS accumulation and the absence of efficient repair.

      Therefore, we believe these data go beyond correlation and demonstrate a mechanistic role for DNA repair impairment in driving stress-associated resistance evolution in ΔrecA. We have revised the Discussion to emphasize the strength of this evidence while avoiding overstatement.

      In the Figure 2A table, mutations in the ampC promoter are listed as leading to amino acid changes.

      We thank the reviewer for spotting this inconsistency. Indeed, the ampC promoter mutations we identified reside in non-coding regulatory regions and do not result in amino acid substitutions. We have corrected the annotation in Fig. 2A and clarified in the main text that these mutations likely affect gene expression through transcriptional regulation, rather than protein sequence alteration.

      The authors' assertion that ampicillin significantly influences persistence pathways in the wild-type strain, affecting quorum sensing, flagellar assembly, biofilm formation, and bacterial chemotaxis, lacks empirical validation.

      We thank the reviewer for pointing this out. In the original version, we acknowledged transcriptional enrichment of genes related to quorum sensing, flagellar assembly, and chemotaxis in the wild-type strain upon ampicillin treatment. However, as we did not directly assess persistence phenotypes (e.g., biofilm formation or persister levels), we agree that such functional inferences were not fully supported. We have revised the relevant statements to focus solely on transcriptomic changes and have removed language suggesting direct effects on persistence pathways.

      Figure 1G suggests that recA cells treated with ampicillin exhibit a strong mutator phenotype; however, it remains unclear if this can be linked to the mutations identified in Figure 2's sequencing analysis.

      We appreciate the reviewer’s comment. This point is discussed in further detail in our response to Reviewer 1.

      Reviewer #3 (Public review):

      In the present work, Zhang et al. investigate the involvement of the bacterial DNA damage repair SOS response in the evolution of beta-lactam resistance in Escherichia coli. Using a combination of microbiological, bacterial genetics, laboratory evolution, next-generation sequencing, and live-cell imaging approaches, the authors propose that short-term (transient) drug resistance evolution can take place in RecA-deficient cells in an SOS response-independent manner. They propose that the evolvability of drug resistance is instead driven by the oxidative stress imposed by the accumulation of reactive oxygen species and by compromised DNA repair. Overall, this is a nice study that addresses a growing and fundamental global health challenge (antimicrobial resistance).

      Strengths:

      The authors introduce new concepts to antimicrobial resistance evolution mechanisms. They show that short-term exposure to beta-lactams can induce durably fixed antimicrobial resistance mutations. They propose this is due to compromised DNA repair and oxidative stress. Antibiotic resistance evolution under transient stress is poorly studied, so the authors' work is a nice mechanistic contribution to this field.

      Weaknesses:

      The authors do not show any direct evidence of altered mutation rate or accumulated DNA damage in their model.

      We appreciate the reviewer’s comment. This point is discussed in further detail in our response to Reviewer 1.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I would like to suggest two minor changes to the text.

      (1) Re. WGS data.

      The authors write in their response: "We appreciate your concern regarding potential inconsistencies in the WGS methodology. However, we would like to clarify that the primary aim of the WGS experiment was to identify the types of mutations present in the wild type and ΔrecA strains after treatment of ampicillin, rather than to quantify or compare mutation frequencies. This purpose was explicitly stated in the manuscript."

      I think the source of my confusion stemmed from this part in the text:

      "In bacteria, resistance to most antibiotics requires the accumulation of drug resistance associated DNA mutations developed over time to provide high levels of resistance (29). To verify whether drug resistance associated DNA mutations have led to the rapid development of antibiotic resistance in recA mutant strain, we..."

      I would change the phrase "verify whether drug resistance associated DNA mutations have led to the rapid development of antibiotic resistance in recA mutant strain" to "identify the types of mutations present in the wild type and ΔrecA strains after treatment of ampicillin." This would explicitly state what the sequencing was for (ie. ID-ing mutations). The current phrase can give the impression that WGS was used to validate rapid or high mutagenesis.

      Thanks for this suggestion. We have revised this description to “In bacteria, resistance to most antibiotics requires the accumulation of drug resistance associated DNA mutations that can arise stochastically and, under stress conditions, become enriched through selection over time to confer high levels of resistance (33). Having observed a non-random and right-skewed distribution of mutation frequencies in ΔrecA isolates following ampicillin exposure, we next sought to determine whether specific resistance-conferring mutations were enriched in ΔrecA isolates following antibiotic exposure.”

      (2) Re. whether the mutations are "induced" or "pre-existing."

      The authors write:

      "We appreciate your detailed feedback on the language used to describe our data. We understand the concern regarding the use of the term "induced" in relation to beta-lactam exposure. To clarify, we employed not only beta-lactam antibiotics but also other antibiotics, such as ciprofloxacin and chloramphenicol, in our experiments (data not shown). However, we observed that beta-lactam antibiotics specifically induced the emergence of resistance or altered the MIC in our bacterial populations. If resistance had pre-existed before antibiotic exposure, we would expect other antibiotics to exhibit a similar selective effect, particularly given the potential for cross-resistance to multiple antibiotics."

      I think it is important to discuss the negative data for the other antibiotics (along with the other points made in your Reviewer response) in the main text.

      This point is discussed in further detail in our response to Reviewer 1 (Public Review).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors used a leucine/pantothenate auxotrophic strain of Mtb to screen a library of FDA-approved compounds for antimycobacterial activity and found significant antibacterial activity of the inhibitor semapimod. In addition to alterations in pathways, including amino acid and lipid metabolism and transcriptional machinery, the authors demonstrate that semapimod treatment targets leucine uptake in Mtb. The work presents an interesting connection between nutrient uptake and cell wall composition in mycobacteria.

      Strengths:

      The link between the leucine uptake pathway and PDIM is interesting but has not been characterized mechanistically. The authors discuss that PDIM presents a barrier to the uptake of nutrients and show binding of the drug to PpsB. However, it is unclear why only the leucine uptake pathway was affected.

      We observe interference with L-leucine, but not pantothenate, uptake in the mc2 6206 strain upon semapimod treatment. At present, we do not know whether PDIM presents a barrier exclusively to the uptake of L-leucine. Further studies may shed light on the mechanism(s) by which L-leucine uptake is modulated by this small molecule.

      We still do not know what PpsB actually does for amino acid uptake - is it a transporter?

      By BLI-Octet we do not find any interaction between L-leucine and PpsB. Therefore, we doubt that PpsB is a transporter of L-leucine.

      Does semapimod binding affect its activity?

      Our study suggests that semapimod treatment alters PDIM architecture, which then becomes restrictive to L-leucine. However, at present the exact mechanism is not clear. Further studies are required to thoroughly examine the effect of semapimod on Mtb PpsB activity and on PDIM composition by mass spectrometry.

      Does the auxotrophic Mtb have lower PDIM levels compared to wild-type Mtb?

      Based on the published report by Mulholland et al. and the vancomycin susceptibility phenotype in our study, both strains appear to have comparable PDIM levels.

      The authors show an interesting result in which they observed antibacterial activity of semapimod against H37Rv only in vivo and not in vitro. What do the authors think is the basis of this observation? It is possible that semapimod has an immunomodulatory effect on the host, since leucine is an essential amino acid in mice. The authors could check pro-inflammatory cytokine levels in infected mouse lungs with and without drug treatment.

      Semapimod inhibits the production of proinflammatory cytokines such as TNF-α, IL-1β, and IL-6, which would indeed help the pathogen establish chronic infection. However, a significant reduction in bacterial loads in lungs and spleen upon semapimod treatment despite inhibition of proinflammatory cytokines clearly indicates bacterial dependence on host-derived exogenous leucine during intracellular growth.

      The authors show that the semapimod-resistant auxotroph lacks PDIM. The conclusions would be further strengthened by validation with PDIM mutants, including del-ppsB Mtb and mutants in other genes of the PDIM locus, testing whether these mutants would be more susceptible (or resistant) to semapimod treatment in vivo.

      PDIM is a virulence factor and plays an important role in the intracellular survival of the TB pathogen. Mtb strains lacking PDIM are expected to show attenuated growth during infection, even without semapimod treatment. In such a case, it would be difficult to draw any conclusions about the effect of semapimod against PDIM(-) strains in vivo.

      Prolonged subculturing can introduce mutations that lead to PDIM loss, which can be overcome by supplementing with propionate (Mulholland et al., Nat Microbiol, 2024). Did the authors also supplement their cultures with propionate? It would be interesting to see what mutations arise in Semr strains with propionate supplementation along with prolonged semapimod treatment.

      Considering that extensive subculturing may result in loss of PDIM, we avoided prolonged subculturing of bacteria. As presented in Fig. 6b, the WT bacteria retain PDIM. While performing the initial drug screening, we did not anticipate such a phenotype, and hence bacteria were cultured in regular 7H9-OADS medium without propionate supplementation.

      A comprehensive future study would help examine the effect of propionate on the generation of semapimod-resistant mutants in Mtb mc2 6206.

      Weaknesses:

      I have summarized the limitations above in my comments. Overall, it would be helpful to provide more mechanistic details to study the connection between leucine uptake and PDIM.

      Reviewer #2 (Public review):

      Summary

      This important study uncovers a novel mechanism for L-leucine uptake by M. tuberculosis and shows that targeting this pathway with 'Semapimod' interferes with bacterial metabolism and virulence. These results identify the leucine uptake pathway as a potential target to design new anti-tubercular therapy.

      Strengths

      The authors took numerous approaches to prove that L-leucine uptake of M. tuberculosis is an important physiological phenomenon and may be effectively targeted by 'Semapimod'. This study utilizes a series of experiments using a broad set of tools to justify how the leucine uptake pathway of M. tuberculosis may be targeted to design new anti-tubercular therapy.

      Weaknesses

      The study does not explain how L-leucine is taken up by M. tuberculosis, leaving the mechanism unclear. Even though 'Semapimod' binds to the PpsB protein, the relevant connection between changes in PDIM and amino acid transport remains incomplete.

      While leucine uptake involves specific transporters in other bacteria, no such transport system is known in Mtb. By screening small molecule inhibitors, we came across a molecule, semapimod, which selectively kills the leucine auxotroph (mc2 6206) but not WT Mtb. To understand the mechanism underlying the differential susceptibility of the WT and auxotrophic strains to this molecule, we evaluated the effect of restoring leuCD and panCD expression on the susceptibility of the auxotrophic strain to semapimod. Interestingly, our results demonstrated that upon endogenous expression of leuCD genes, the mc2 6206 strain becomes resistant to killing by semapimod. In contrast, panCD expression had no effect on the semapimod susceptibility of mc2 6206. These findings were further substantiated by gene expression analysis of semapimod-treated mc2 6206, which shows differential regulation of a set of genes that are altered upon leucine depletion in Mtb as well as in other bacteria. Overall, these results provide the first evidence of perturbation of L-leucine uptake by semapimod treatment of the leucine auxotroph.

      To gain further mechanistic insight into the effect of semapimod on leucine uptake in Mtb, we generated a semapimod-resistant strain, which carries point mutations in four genes including ppsB. Interestingly, overexpression of wild-type ppsB, but not of the other genes, restored the susceptibility of the resistant bacteria to semapimod. Our observations that semapimod interacts with PpsB, and that the semapimod-resistant strain carries a mutation in PpsB resulting in loss of PDIM, together support the involvement of cell-wall PDIM in the regulation of L-leucine transport in Mtb.

      As mentioned above, we anticipate that semapimod treatment brings about certain modifications in PDIM that make it more restrictive to L-leucine. A comprehensive future study will be helpful to examine the effect of semapimod on Mtb physiology.

      Also, the fact that the drug does not function on WT bacteria makes it a weak candidate to consider its usefulness for a therapeutic option.

      We agree that semapimod is not an appropriate drug candidate against TB owing to its inhibitory effect on the production of proinflammatory cytokines such as TNF-α, IL-1β, and IL-6 that help the pathogen establish chronic infection. However, a significant reduction in bacterial loads in lungs and spleen upon semapimod treatment despite inhibition of proinflammatory cytokines clearly indicates bacterial dependence on host-derived exogenous leucine during intracellular growth. Therefore, targeting L-leucine uptake can be a novel therapeutic strategy against TB.

      Reviewer #3 (Public review):

      Agarwal et al identified the small molecule semapimod from a chemical screen of repurposed drugs with specific antimycobacterial activity against a leucine-dependent strain of M. tuberculosis. To better understand the mechanism of action of this repurposed anti-inflammatory drug, the authors used RNA-seq to reveal a leucine-deficient transcriptomic signature from semapimod challenge. The authors then measured a decreased intracellular concentration of leucine after semapimod challenge, suggesting that semapimod disrupts leucine uptake as the primary mechanism of action. Unexpectedly, however, resistant mutants raised against semapimod had a mutation in the polyketide synthase gene ppsB that resulted in loss of PDIM synthesis. The authors believe growth inhibition is a consequence of decreased accumulation of leucine as a result of an impaired cell wall and a disrupted, unknown leucine transporter. This study highlights the importance of branched-chain amino acids for M. tuberculosis survival, and the chemical genetic interactions between semapimod and ppsB indicate that ppsB is a conditionally essential gene in a medium depleted of leucine.

      The conclusions regarding the leucine and PDIM phenotypes are moderately supported by experimental data. The authors do not provide experimental evidence to support a specific link between leucine uptake and impaired PDIM production. Additional work is needed to support these claims and strengthen this mechanism of action.

      As mentioned above, the overall results from this study provide the first evidence of perturbation of L-leucine uptake by semapimod treatment of the leucine auxotroph. Our observations that semapimod interacts with PpsB, and that the semapimod-resistant strain carries a mutation in PpsB resulting in loss of PDIM, together support the involvement of cell-wall PDIM in the regulation of L-leucine transport in Mtb.

      As mentioned above, it appears that semapimod treatment brings about certain modifications in PDIM that make it restrictive to L-leucine. Future studies are required to gain detailed mechanistic insights into the effect of semapimod on Mtb physiology.

      Since leucine uptake and PDIM synthesis are important concepts of the manuscript, experiments would benefit from exploring other BCAAs to know if the phenotypes observed are specific to leucine, and adding additional strains to the 2D TLC experiments to provide confidence in the absence of the PDIM band.

      We thank the peer reviewer for this suggestion. We would be happy to analyse the effect of semapimod on the level of other amino acids including BCAA by mass spectrometry.

      The intriguing observation that wild-type H37Rv is resistant to semapimod but the leucine-auxotroph is sensitive should be further explored. If the authors are correct and semapimod does inhibit leucine uptake through a specific transporter or disrupted cell wall (PDIM synthesis), testing semapimod activity against the leucine-auxotroph in various concentrations of BCAAs could highlight the importance of intracellular leucine. H37Rv is still able to synthesize endogenous leucine and is able to circumvent the effect of semapimod.

      We thank the peer reviewer for this suggestion. We will explore the possibility of analysing the effect of increasing concentrations of BCAAs on mc2 6206 susceptibility to semapimod.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary

      This study assesses eRNA activity as a classifier of different subtypes of breast cancer and as a prognosis tool. The authors take advantage of previously published RNA-seq data from human breast cancer samples and assess it more deeply, considering the cancer subtype of the patient. They then apply two machine learning approaches to find which eRNAs can classify the different breast cancer subtypes. While they do not find any eRNA that helps distinguish ductal vs. lobular breast cancers, their approach helps identify eRNAs that distinguish luminal A, B, basal and Her2+ cancers. They also use motif enrichment analysis and ChIP-seq datasets to characterize the eRNA regions further. Through this analysis, they observe that those eRNAs where ER binds strongest are associated with a poor patient prognosis.

      Major comments

      • Part of the rationale for this study is the previous observation that eRNAs are less associated with the prognosis of breast cancer patients than mRNAs. The authors attribute this to the high heterogeneity between breast cancer subtypes masking the importance of eRNAs. In this study, the authors focus solely on eRNAs as a classifier of breast cancer subtypes and as a prognostic tool. They do not address whether eRNAs or mRNAs are better predictors of cancer subtype and prognosis. Since the data and tools are already in their hands, it would be important to also see a comparative analysis assessing which of the two (mRNAs or eRNAs) is the better predictor.
      • The authors compute the UMAPs in Fig. 1C using only the predictor eRNAs, so observing a separation is somewhat expected. Coming from the single-cell omics field, I would suggest taking the eRNA loci and computing a UMAP on the highly variable regions. The authors could then cluster it and assess how the cancer subtypes are structured within the data. This would give a first overview of how much segregation and structure the data contain. Including such an initial data exploration step would also strengthen the paper. If the authors have tried this, could they comment on it?
      • 'neither measures could classify any distinct eRNAs for invasive ductal vs lobular cancer samples' (Fig. S1B). Just by eye, I can see a potential enrichment of ductal samples on the left and right, while lobular samples stay in the center. Each eRNA alone may not have the power to classify the lobular vs ductal subtype. Even so, this suggests to me that a difference may exist, possibly resulting from a cooperative model of eRNA influence, and that it would need further exploration. Would a PCA also show enrichment of ductal vs lobular samples in specific parts of the plot? It may be worth exploring the PC loadings to see which eRNAs could be involved. In this regard, a more unbiased visual examination, as suggested in my previous point, could help clarify whether there is an association of certain eRNAs that cannot be captured by ML.
      • "we employed machine learning approaches on 302,951 eRNA loci identified from RNA-seq datasets from 1,095 breast cancer patient samples from previous studies" - the previous studies from which the authors take the data [11,12] highlight the presence of ~60K enhancers in the human genome and they use less than that in their analysis. Could the authors please clarify the differences in numbers with previous studies and give a reasoning? Also, from the methods section, they discard many patient samples due to low QC, so, from what I understand, the number of samples analyzed in the end is 975 and not 1,095.

      Minor comments

      • Could the authors please state the UMAP parameters in the methods? Although it could be intrinsic to the dataset, the data points are grouped in a way that suggests the granularity is too forced. Could the authors show how the UMAP behaves with more lenient parameters, or even with PCA?
      • 'Majority of the basal' -> The majority of the basal.

      Significance

      This paper is relevant to the cancer field, particularly to breast cancer research. Its significance lies in digging into breast cancer samples and taking the different existing subtypes into account to assess the contribution of eRNAs as a classifier and as a prognostic tool. The data are already available but have not been studied to this degree of detail. The paper highlights the importance of characterizing cancer samples in more depth, given their intrinsic heterogeneity, because averaging across different subtypes would mask biology. My expertise lies in gene regulation and single-cell omics, so my contribution focuses on the analysis and the extraction of biological information. The extent of the paper's specific relevance to cancer research falls beyond my expertise.

    1. Before we talk about public criticism and shaming and adults, let’s look at the role of shame in childhood. In at least some views about shame and childhood[1], shame and guilt hold different roles in childhood development [r1]: Shame is the feeling that “I am bad,” and the natural response to shame is for the individual to hide, or the community to ostracize the person. Guilt is the feeling that “This specific action I did was bad.” The natural response to feeling guilt is for the guilty person to want to repair the harm of their action. In this view [r1], a good parent might see their child doing something bad or dangerous, and tell them to stop. The child may feel shame (they might not be developmentally able to separate their identity from the momentary rejection). The parent may then comfort the child to let the child know that they are not being rejected as a person, it was just their action that was a problem. The child’s relationship with the parent is repaired, and over time the child will learn to feel guilt instead of shame and seek to repair harm instead of hide.

      I find the contrast between shame and guilt particularly illuminating, especially in the context of parenting. It made me think about how my own parents approached discipline. When I was younger and did something wrong, I recall them emphasizing what I did rather than characterizing me as a "bad kid", which corresponds to the idea of encouraging guilt over shame. That kind of response taught me to accept responsibility and correct my actions rather than feel useless. I'm curious, though, how this strategy would play out in cultures where shame is used more deliberately as a tool for enforcing social conformity.

    2. 18.1. Shame vs. Guilt in childhood development

      Before we talk about public criticism and shaming and adults, let’s look at the role of shame in childhood. In at least some views about shame and childhood[1], shame and guilt hold different roles in childhood development [r1]: Shame is the feeling that “I am bad,” and the natural response to shame is for the individual to hide, or the community to ostracize the person. Guilt is the feeling that “This specific action I did was bad.” The natural response to feeling guilt is for the guilty person to want to repair the harm of their action. In this view [r1], a good parent might see their child doing something bad or dangerous, and tell them to stop. The child may feel shame (they might not be developmentally able to separate their identity from the momentary rejection). The parent may then comfort the child to let the child know that they are not being rejected as a person, it was just their action that was a problem. The child’s relationship with the parent is repaired, and over time the child will learn to feel guilt instead of shame and seek to repair harm instead of hide.

      When parents criticize their children's bad actions but still show love and patience, their kids can learn to fix mistakes instead of feeling worthless. Moreover, I believe shame can make kids hide or feel bad about themselves, which harms their development, whereas guilt can teach them to take responsibility later in life. Hence, I think a good parenting style should focus on shaping kids' behavior rather than only blaming them, which can help children build confidence and kindness.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript the authors have done cryo-electron tomography of the manchette, a microtubule-based structure important for proper sperm head formation during spermatogenesis. They also did mass-spectrometry of the isolated structures. Vesicles, actin and their linkers to microtubules within the structure are shown.

      __We thank the reviewer for the critical reading of our manuscript; we have implemented the suggestions as detailed below, which we believe have indeed improved the manuscript.__

      Major:

      The data the conclusions are based on seem very limited and sometimes overinterpreted. For example, only one connection between actin and microtubules was observed, and this is thought to be MACF1 simply based on its presence in the MS.

      __We regret giving the impression that the data are limited. We in fact collected >100 tilt series from 3 biological replicates for the isolated manchette.__

      __In the revised version, we added data from in-situ studies showing vesicles interacting with the manchette (as requested below, new Fig. 1).__

      Specifically, for the interaction of actin with microtubule we added more examples (Revised Fig. 6) and we toned down the discussion related to the relevance of this interaction (lines 193-194, 253-255). MACF1 is mentioned only as a possible candidate in the discussion (line 254).

      Another, and larger concern, is that the authors do a structural study on something that has been purified out of the cell, a process which is extremely disruptive. Vesicles, actin and other cellular components could easily be trapped in this cytoskeletal sieve during the purification process and as such, not be bona fide manchette components. This could create both misleading proteomics and imaging. Therefore, an approach not requiring extraction such as high-pressure freezing, sectioning and room-temperature electron tomography and/or immunoEM on sections to set aside this concern is strongly recommended. As an additional bonus, it would show if the vesicles containing ATP synthase are deformed mitochondria.

      __We recognise the concern raised by the reviewer.__

      __To alleviate this concern, we added imaging data of manchettes in-situ that show vesicles, mitochondria and filaments interacting with the manchette (new Fig. 1), essentially confirming the observations made on the isolated manchette.__

      __The benefits of imaging the isolated manchette were better throughput (being able to collect more data) and higher resolution, which allowed us to unequivocally resolve the dynein/dynactin and actin filaments.__

      Minor: Line 99: "to study IMT with cryo-ET, manchettes were isolated ...(insert from which organism)..."

      __Added in line 102 in the revised version.__

      Line 102 "...demonstrating that they can be used to study IMT".. can the authors please clarify?

      This paragraph was revised (lines 131-137); we hope it is now clearer.

      Line 111 "densities face towards the MT plus-end" How can a density "face" anywhere? For this, it needs to have a defined front and back.

      Microtubule motor proteins (kinesin and dynein) are often attached to the microtubules at an angle, with dynactin and cargo on one side (the plus end). We rephrased this part and removed the word “face” in the revised version to make it clearer (lines 161-162).

      Line 137: is the "perinuclear ring" the same as the manchette?

      The perinuclear ring is the apical part of the manchette that connects it to the nucleus. We added to the revised version imaging of the perinuclear ring with observations on how it changes when the manchette elongates (new Fig. 2).

      Figure 2B: How did the authors decide not to model the electron density found between the vesicle and the MT at 3 o'clock? Are there no other proteins with a lollipop structure similar to ATP synthase, such that this density can be assigned to this protein with such certainty?

      __The densities connecting the vesicles to the microtubules shown in (now) Fig. 4D are not consistent enough to be averaged.__

      __The densities resembling ATP synthase are inside the vesicles. Nevertheless, we have decided to remove the averaging of the ATP synthases from the revised manuscript, as they are not of great importance for this manuscript. Instead, the new in-situ data clearly show mitochondria (with their characteristic double membrane and cristae) interacting with manchette microtubules (new Fig 1C).__

      Line 189: "F-actin formed organized bundles running parallel to mMTs" - this observation needs confirming in a less disrupted sample.

      __Phalloidin (an actin marker) was previously shown to stain the manchette (PMID: 36734600). As actin filaments are very thin (7 nm), they are very hard to observe in plastic-embedded EM.__

      In the in-situ data we added to the revised manuscript (new Fig 1D), we observe filaments with a diameter corresponding to actin. In addition, we added more examples of microtubules interacting with actin in the isolated manchette (new Fig. 6 E-K).

      Line 242 remove first comma sign.

      Removed.

      Line 363 "a total of 2 datasets" - is this manuscript based on only two tilt-series? Or two datasets from each of the 4 grids? In any case, this is very limited data.

      We apologise for not clearly providing information about the data size in the original manuscript. The data are based on three biological replicates (3 animals). We collected more than 100 tomograms of different regions of the manchettes. As such, we would argue that the data are not limited per se.

      Reviewer #1 (Significance (Required)):

      The article is very interesting and, if presented together with the suggested controls, would be informative both to microtubule/motor protein researchers and to those studying spermatogenesis.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The manchette appears as a shield-like structure surrounding the flagellar basal body during spermiogenesis. It consists of a number of microtubules arranged like a comb, but actin (Mochida et al. 1998 Dev. Biol. 200, 46) and myosin (Hayasaka et al. 2008 Asian J. Androl. 10, 561) have also been found, suggesting transport inside the manchette. Detailed structural information and functional insight into the manchette were still lacking. There is a hypothesis called IMT (intra-manchette transport), based on the fact that the manchette and IFT (intraflagellar transport) share common components (or homologues) and on their transition along the stages of spermiogenesis. While IMT is considered a potential explanation for the delivery of centrosomal and flagellar components, no one has witnessed IMT at the same level as IFT. IMT has never been purified, visualized in motion or imaged at high resolution. This study is the first to visualize the manchette using high-end cryo-electron tomography of isolated manchettes, addressing the structural characterization of IMT. The authors successfully visualized microtubular bundles, vesicles located between microtubules and a linker-like structure connecting the vesicles and the microtubules. On multilamellar membranes in the vesicles, they found particles and assigned them to ATPase complexes, based on an intermediate-resolution (~60 Å) structure. They further identified interesting structures, such as (1) particles on microtubules, which resemble dynein, and (2) filaments that show the symmetry of F-actin. All the molecular assignments are consistent with their proteomics of manchettes.

      __We thank the reviewer for highlighting the novelty of our study.__

      Their assignment of ATPase would be strengthened by MS data, if these prove the absence of other proteins that could form such a membrane protein complex.

      All the ATPase components were indeed found in our proteomics data. Nevertheless, we have decided to remove the averaging of the ATPase as it does not directly relate to IMT, the focus of this manuscript.

      They discussed the possible roles of various motor proteins based on their abundance (Lines 134-151, Line 200). This makes sense only with a control. The absolute abundance of a protein does not necessarily reflect its local importance or role. This reviewer would suggest quantitative proteomics of other organelles, whole cells, or other fractions obtained during manchette isolation, to demonstrate the unique abundance of KIF27 and other proteins of interest.

      We agree with the reviewer that absolute abundance does not necessarily indicate importance or a role. As such, we removed this part of the discussion from the revised manuscript.

      A single image from a tomogram, Fig.6B, is not enough to prove actin-MT interaction. A gallery and a number (how many such junctions were found from how many MTs) will be necessary.

      We agree that one example is not enough. In the new Fig. 6E-K, we provide a gallery of more examples. We have revised the text to reflect the point that these observations are still rare and more data will be needed to quantify this interaction (Lines 253-254).

      Minor points: Their manchette purification is based on Mochida et al., which showed (their Fig. 2) similarity to the in vivo structure (for example, Fig. 1 of Kierszenbaum 2001 Mol. Reprod. Dev. 59, 347). Nevertheless, since this is not a very common preparation, it would be helpful to show a wide view of the isolated manchette (low-magnification cryo-EM or ET) to demonstrate its intactness.

      We thank the reviewer for this suggestion, in the revised version, new Fig. 2 provides a cryo-EM overview of purified manchette from different developmental stages.

      Line 81: Myosin -> myosin (to be consistent with other protein names)

      Corrected.


      Reviewer #2 (Significance (Required)):

      This work is a significant step toward the understanding of manchettes. While the molecular assignment of dynein and ATPase is not fully decisive, due to limitation of resolution (this reviewer thinks the assignment of actin filament is convincing, based on its helical symmetry), their speculative model still deserves publication.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      ->Summary:

      The manchette is a temporary microtubule (MT)-based structure essential for the development of the highly polarised sperm cell. In this study, the authors employed cryo-electron tomography (cryo-ET) and proteomics to investigate the intra-manchette transport system. Cryo-EM analysis of purified rat manchette revealed a high density of MTs interspersed with actin filaments, which appeared either bundled or as single filaments. Vesicles were observed among the MTs, connected by stick-like densities that, based on their orientation relative to MT polarity, were inferred to be kinesins. Subtomogram averaging (STA) confirmed the presence of dynein motor proteins. Proteomic analysis further validated the presence of dynein and kinesins and showed the presence of actin crosslinkers that could bundle actin filaments. Proteomics data also indicated the involvement of actin-based transport mediated by myosin. Importantly, the data indicated that the intraflagellar transport (IFT) system is not part of the intra-manchette transport mechanism. The visualisation of motor proteins directly from a biological sample represents a notable technical advancement, providing new insights into the organisation of the intra-manchette transport system in developing sperm.

      We thank the reviewer for summarising the novelty of our observations.

      -> Are the key conclusions convincing?

      Below we comment on three main conclusions.

      MT and F-actin bundles are both constituents of the manchette

      While the data convincingly show that MT and F-actin are part of the manchette, one cannot conclude from this that F-actin is an integral part of the manchette. The authors would need to rephrase so that it is clear that they are speculating.

      We have rephrased our statements and replaced “integral” with “actin filaments are associated”. Of note, previous studies, including phalloidin staining (PMID: 36734600, PMID: 9698455, PMID: 18478159), suggested that actin is part of the manchette; here we visualised the actin at high resolution.

      The transport system employs different transport machinery on these MTs

      Proteomics data indicate the presence of multiple motor proteins in the manchette, and cryo-EM data corroborate this by revealing morphologically distinct densities associated with the MTs. However, the nature of only one of these MT-associated densities has been confirmed, namely dynein, as identified through STA. The presence of kinesin or myosin in the EM data cannot be confirmed from the cryo-ET density alone. It is therefore unclear whether these proteins are actively involved in cargo transport, since the proteomics data alone cannot support this. In summary, we recommend that the authors rephrase this conclusion and avoid using the term "employ".

      We agree that our cryo-ET data only confirmed the motor protein dynein. As such, we removed the term “employ”, rephrased our claims regarding active transport and changed the title accordingly.

      Dynein-mediated transport (Lines 225-227)

      The data show that dynein is present in the manchette. However, whether it plays an active role in transport cannot be determined from the cryo-ET data provided in the manuscript, because they do not clearly show a dynein-dynactin complex attached to cargo. The proteomics data do not reveal attachment to cargo either, as no adaptor proteins that link dynein-dynactin to its cargo have been shown.

      Several cargo adaptor proteins were found in our proteomics data, but we agree that cryo-ET and proteomics alone cannot prove active transport. As such, we toned down the discussion about active transport (lines 212-220).

      -> Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      F-actin

      • In the abstract, the authors state that F-actin provides tracks for transport and also has structural and mechanical roles. However, the manuscript does not include experiments demonstrating a mechanical role. The authors appear to base this statement on literature showing that actin bundles play a mechanical role in other model systems. We suggest that the authors clarify that the proposed mechanical role is speculative and add references where appropriate.

      __We removed the claim about the mechanical role of actin from the abstract and rephrased this in the discussion to suggest this role for F-actin (lines 242-243).__

      • Lines 15, 92, 180 and 255: The statement "Filamentous actin is an integral part of the manchette" is misleading. While the authors show that F-actin is present in their purified manchette structures, whether it is integral has not been tested. The authors should rephrase the sentence.

      We removed the word integral.

      • To support the claim that F-actin plays a role in transport within the manchette, the authors present only one instance where an unidentified density is attached to an actin filament. This is insufficient evidence to claim that it is myosin actively transporting cargo. Although the proteomics data show the presence of myosin, we suggest the authors exercise more caution with this claim.

      We agree that our data do not demonstrate active transport; as such, we removed that claim. We mention the possibility of cargo transport in the discussion (lines 250-255).

      • The authors mention the presence of F-actin bundles but do not show direct crosslinking between the F-actin filaments. They could in principle just be closely packed F-actin filaments that are not necessarily linked, so the term "bundle" should be used more cautiously.

      We do not assume that a bundle means that the F-actin filaments are crosslinked. A bundle simply indicates the presence of multiple F-actin filaments together. We rephrased it to call them actin clusters.

      Observations of dynein

      • Relating to Figure 2B: From the provided image, it is not clear whether the density corresponds to a dynein complex, as it does not exhibit the characteristic morphological features of dynein or dynactin molecules.

      We indeed do not claim that the densities in this figure are dynein or dynactin. __We revised this paragraph and hope that it is now clearer (lines 135-137).__

      • Lines 171-172 and Figure 4: It is well established that dynein is a dimer and should always possess two motor domains. The authors have incorrectly assumed they observed single motor heads, except possibly in Figure 4A (marked by an arrow). In all other instances, the dynein complexes show two motor domains in proximity, but these have not been segmented accurately. Furthermore, the "cargos" shown in grey are more likely to represent dynein tails or the dynactin molecule, based on comparisons with in vitro structures of these complexes (see references 1-3).

      We thank the reviewer for this correction. We improved the annotations in the figure and revised the text to clarify that we identified dimers of dynein motor heads (lines 140-144). We further added a projection of a dynein dynactin complex to compare to the observation on the manchette (new Fig. 5E). We further changed claims on the presence of protein cargo to the presence of dynein/dynactin that allows cargo tethering based on the presence of cargo adaptors in the proteomics data.

      • Lines 21, 173, and 233 mention cargos, but as noted above, it seems to be parts of the dynein complex the authors are referring to.

      This was corrected as mentioned above.

      • Panel 4B appears to show a dynein-dynactin complex, but whether there is a cargo is unclear, and if there is, it should be labelled accordingly. To assess whether any cargo is bound to the dynein-dynactin complex, a larger crop of the panel would be helpful. In summary, we recommend that the authors revisit their segmentations in Figures 2B and 4, revise their text based on these observations, and quantify the data (as suggested in the next section).

      We thank the reviewers for sharing their expertise on dynein-dynactin complexes. We have revised the text as detailed above and excluded the assignment of any cargo, as we cannot (even from larger panels) see a clear association of cargo. We have made clear that we only refer to dynein-dynactin as capable of linking cargo, based on the cargo adaptors present in our proteomics data. We have removed claims of active transport by dynein.

      Dynein versus kinesin-based transport

      The calculation presented in lines 147-151 does not account for the fact that both the dynein-dynactin complex and kinesin proteins require cargo adaptors to transport cargo. Additionally, the authors overlook the possibility that multiple motors could be attached to a single cargo. If the authors did not observe this, they should explicitly say so to support their argument. In short, the calculations are based on an incorrect premise, which makes the comparison inaccurate. The authors may have identified dynein-dynactin or kinesin cargo adaptors in their proteomics data that could be used for such a comparison. Unless they have, we believe they lack sufficient data to accurately estimate the "active transport ratio" between dynein and kinesin.

      Even though we detect cargo adaptors in our proteomics data, we agree that calculating relative transport based only on proteomics can be inaccurate; as such, we removed the absolute quantification and the comparison between dynein- and kinesin-based IMT.

      • Would additional experiments be essential to support the claims of the paper?

      F-actin distance and length distribution

      • To support the claim that F-actin is bundled (line 189), could the authors provide the distance between each F-actin filament and its neighbours? Additionally, could they compare the average distance to the length of the actin crosslinkers found in their proteomics data, or to the distances between crosslinked F-actin observed in other studies?

      We measured distances between the actin filaments and added a plot to new Fig 6.

      • While showing that F-actin is important for the manchette would require cellular experiments, authors could provide quantification of how frequently these actin structures are observed in comparison to MTs to support their claims that these actin filaments could be important for the manchette structure.

      We agree that claims on the role and function of actin in the manchette require cellular experiments that are beyond the scope of this study. Absolute quantification of the ratio between MTs and actin from cryo-ET is very hard and would be inaccurate, as the manchette cannot be imaged as a whole due to its size and thickness. The ratio we have is based on the relative abundance provided by the proteomics (Fig. 5F).

      • In line 193, the authors claim that the F-actin in bundles appears too short for transport. Could they provide length distributions for these filaments? This might provide further support to their claim that individual F-actin filaments can serve as transport tracks (line 266).

      __In addition to the limitation mentioned in the previous point, length measurements from high-magnification imaging would likely be inaccurate, as in most cases the actin filaments are longer than the captured field of view. Nevertheless, we removed the claim about the actin being too short for transport.__

      • Could the authors also quantify the abundance of individual F-actin filaments observed, compared to MTs and F-actin bundles, to support the idea that they could play a role in transport?

      As explained above, absolute quantification of the ratio between MTs and actin is not feasible from cryo-ET data, which cannot capture the whole manchette at a resolution high enough to resolve actin.

      • In the discussion, the authors mention "interactions between F-actin singlets and mMTs" (line 269), yet they report observing only one instance of this interaction (lines 210 and 211). Given the limited data, they should refer to this as a single interaction in the discussion. The scarcity of data raises questions about how representative this event truly is.

      We agree that one example is not enough. In the new Fig. 6E-K, we provide a gallery of more examples as also requested by reviewers 1 and 2. We have also revised the text to reflect the point that these observations are still rare (Lines 190-194).

      Quantifications for judging representativeness

      The authors should quantify how often they observed vesicles with a stick-like connection to MTs (lines 106-107). This would strengthen the interpretation of the density, since currently only one example is shown in the manuscript (Figure 4A). If possible, they could show how many of these are facing towards the MT plus end.

      __As mentioned in the text (lines 135-137), the linkers connecting vesicles to MTs were irregular, so we could not interpret them further. This is in contrast to dynein, which was easily recognisable but not associated with vesicles.__

      Dynein quantifications

      • The authors are encouraged to quantify how many dynein molecules per micron of MT they observe and how often these are angled with their MT-binding domain towards the minus end.

      As the manchette is large and highly dense any quantification will likely be biased towards parts of the manchette that are easier to image, for example the periphery. As such we do not think quantifying the dynein density will yield meaningful insight.

      • Could the authors quantify how many dynein densities they found to be attached to a (vesicle) cargo, if any (line 175)? They could show these observations in a supplementary figure.

      We did not observe any connection between a vesicle and dynein motors; we have edited this sentence to make this clearer.

      • For densities that match the size and location of dynein but lack clear dynein morphology (as seen in Figure 2B), could the authors quantify how many are oriented towards the MT minus end?

      In many cases the connection lacked a clear dynein morphology, and without a clear morphology it is impossible to determine whether these densities are oriented towards the minus end.

      Artefacts due to purification: The authors should discuss whether the purification could affect the visualization of manchette components, for example the MT and actin structure or the abundance/structure of the motor protein complexes (bound to cargo or isolated).

      We followed a previously published protocol that was shown to preserve the overall integrity of the manchette. Nevertheless, some connections between the manchette and other cellular organelles are expected to be lost. To address this point, we added in situ data (new Fig 1) showing the manchette in intact spermatids interacting with vesicles and mitochondria, as well as overviews of manchettes (new Fig 2), and revised the text accordingly.

      • Are the experiments adequately replicated and statistical analysis adequate? The cryo-ET data presented in the manuscript is collected using two separate sample preparations. Along with the quantifications of the different observations suggested above which will help the reader assess how abundant and representative these observations are, the authors could further strengthen their claims by acquiring data from a third sample preparation and then analysing how consistent their observations are between different purifications. This however could be time consuming so it is not a major requirement but recommended if possible within a short time frame.

      We regret not stating our dataset size explicitly; it has now been added to the revised version. In essence, the data are based on three biological replicates (3 animals). We collected more than 100 tomograms of different regions of the manchettes. The revised version provides more observations (new Fig 1, 2, 4B-C and 6E-K).

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      Most of the comments deal with either modifying the text or analysing the data already presented, so the revision could be done within 1-3 months.


      Minor comments: - Specific experimental issues that are easily addressable. 1) Could the authors state how many tilt series were collected for each dataset/independent sample preparation? We recommend that they upload their raw data or tomograms to EMPIAR.

      We added this information to the Materials and Methods.

      2) It is not clear to me if the same sample was used for cryo-ET and proteomics. Could the authors clarify how comparable the sample preparation for the cryo-ET and proteomics data is or if the same sample was used for both. If there is a discrepancy between these preparations, they would need to discuss how this can affect comparing observations from cryo-ET and mass spectrometry. Ideally both samples should be the same.

      After sample preparation, the manchettes were directly frozen on grids, and the remainder of each sample was used for proteomics. Consequently, EM and MS data were acquired from the same samples. We clarified this in the text (lines 327-328).

      • Are prior studies referenced appropriately? We recommend including additional references to support the claim that F-actin has a mechanical role (line 242). Could the authors compare their proteomics data to other mass spectrometry studies conducted on the Manchette (for example, see reference 4)?

      We added the comparison, but it is important to point out that in reference 4 the manchettes were isolated from mouse testes.

      • Are the text and figures clear and accurate? Text: We do not see the necessity of specifying the microtubules (MTs) in the data as "manchette MTs" or "mMTs" rather than simply "MTs". However, we recommend that the authors use either "MT" or "mMT" consistently throughout the manuscript.

      We now use only "MTs" throughout.

      The authors appear to refer to both dynein-1 (cytoplasmic dynein) and dynein-2 (IFT dynein). To avoid confusion, it is important that the authors clearly specify which dynein they are referring to throughout the text. This is particularly relevant as the study aims to demonstrate that IFT is not part of the manchette transport system.

      • Introduction: In the third paragraph (lines 59-75), the authors should specify that they are referring to dynein-2, which is distinct from cytoplasmic dynein discussed in the previous paragraph (lines 44-58).

      We now specify the respective dyneins in the text (lines 66, 140-141 and 145).

      • Figure 4D: The authors could fit a dynein-1 motor domain instead of a dynein-2 into the density to stay consistent with the fact that the density belongs to cytoplasmic dynein-1.

      __We changed the figure and fitted a cytoplasmic dynein-1 structure (PDB 5NVU) instead.__

      Figures: • Figure 2B: The legend mentions a large linker complex; however, this may correspond to two or three separate densities.

      We have addressed this and changed the wording.

      • Figure 4: please revisit the segmentation of this whole figure based on previous comments.

      __We revised as suggested.__

      • Figures 1, 2, 4, 5, and 6: It would be helpful to state in the legends that the tomograms are denoised. There are stripe-like densities visible in the images (e.g., in the vesicle in Figure 2B). Do these artefacts also appear in the raw data?

      As stated in the Methods section, tomograms were generally denoised with cryoCARE for visualisation purposes. The “stripe-like densities” are artefacts of the gold fiducials used for tomogram alignment and also appear in the raw data (before denoising).

      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions? We suggest revising the paragraph title "Dynein-mediated cargo along the manchette" (line 165) to "Dynein-mediated cargo transport along the manchette".

      __We have changed this in the revised version.__

      We recommend that the authors provide additional evidence to support the interpretation that the observed EM densities correspond to motor proteins. Specifically: • Include scale bars or reference lines indicating the known dimensions of motor proteins, based on previous data, to demonstrate that the observed densities match the expected size.

      The dynein structure is provided for reference. We also added the cytoplasmic dynein-dynactin complex as a reference (Fig 5E).

      • Make direct comparisons to existing EM data and highlight morphological similarities.

      We have added a comparison to existing data (Fig 5E).

      In the discussion (lines 249-254), the authors could speculate on alternative roles for the IFT components in the manchette, particularly if they are not part of the IFT trains. We also suggest rephrasing the claim in line 266 to make it more speculative in tone.

      __We have addressed this in the revised version (lines 221-230).__

      Finally, a schematic overview of the manchette ultrastructure in a spermatid would greatly aid the reader in understanding the material presented.

      We now include a graphical abstract and overviews of isolated manchettes on cryo-EM grids.

      References: 1. Chowdhury, S., Ketcham, S., Schroer, T. et al. Structural organization of the dynein-dynactin complex bound to microtubules. Nat Struct Mol Biol 22, 345-347 (2015). https://doi.org/10.1038/nsmb.2996

      2. Grotjahn, D.A., Chowdhury, S., Xu, Y. et al. Cryo-electron tomography reveals that dynactin recruits a team of dyneins for processive motility. Nat Struct Mol Biol 25, 203-207 (2018). https://doi.org/10.1038/s41594-018-0027-7

      3. Chaaban, S., Carter, A.P. Structure of dynein-dynactin on microtubules shows tandem adaptor binding. Nature 610, 212-216 (2022). https://doi.org/10.1038/s41586-022-05186-y

      4. Hu, W., Zhang, R., Xu, H. et al. CAMSAP1 role in orchestrating structure and dynamics of manchette microtubule minus-ends impacts male fertility during spermiogenesis. Proc Natl Acad Sci USA 120 (45), e2313787120 (2023). https://doi.org/10.1073/pnas.2313787120

      Reviewer #3 (Significance (Required)):

      This study employs cryo-electron tomography (cryo-ET) and proteomics to elucidate the architecture of the manchette. It advances our understanding of the components involved in intracellular transport within the manchette and introduces the following technical and conceptual innovations:

      a) Technical Advances: The authors have visualized the manchette at high resolution using cryo-ET. They optimized a purification pipeline capable of retaining, at least partially, the transport machinery of the manchette. Notably, they observed dynein and putative kinesin motors attached to microtubules, a significant achievement that, to our knowledge, has not been reported previously.

      b) Conceptual Advances: This study provides novel insights into spermatogenesis. The findings suggest that intraflagellar transport (IFT) is unlikely to play a role at this stage of sperm development while shedding light on alternative transport systems. Importantly, the authors demonstrate that actin filaments organize in two distinct ways: clustering parallel to microtubules or forming single filaments.

      This work is likely to be of considerable interest to researchers in sperm development and structural biology. Additionally, it may appeal to scientists studying motor proteins and the cytoskeleton.

      We thank the reviewers for appreciating the significance and novelty of our study.

      The reviewers possess extensive expertise in in situ cryo-electron tomography and single-particle microscopy, including work on dynein-based complexes. Collectively, they have significant experience in the field of cytoskeleton-based transport.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work studies representations in a network with one recurrent layer and one output layer that needs to path-integrate so that its position can be accurately decoded from its output. To formalise this problem, the authors define a cost function consisting of the decoding error and a regularisation term. They specify a decoding procedure that at a given time averages the output unit center locations, weighted by the activity of the unit at that time. The network is initialised without position information, and only receives a velocity signal (and a context signal to index the environment) at each timestep, so to achieve low decoding error it needs to infer its position and keep it updated with respect to its velocity by path integration.
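      The decoding rule summarised above can be sketched in a few lines. This is only an illustrative sketch under our own assumptions, not the authors' code: each output unit is assumed to already have an (x, y) center, activities are assumed non-negative, and the function name `decode_position` and the `eps` guard are hypothetical choices made for this example.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def decode_position(activities: Sequence[float],
                    centers: Sequence[Point],
                    eps: float = 1e-12) -> Point:
    """Decode position as the activity-weighted mean of unit centers.

    activities[i] is the (assumed non-negative) activity of output unit i at
    a single timestep, and centers[i] is that unit's (x, y) center location.
    """
    if len(activities) != len(centers):
        raise ValueError("need exactly one center per output unit")
    # Guard against division by zero when every unit is silent.
    total = max(sum(activities), eps)
    x = sum(a * cx for a, (cx, _) in zip(activities, centers)) / total
    y = sum(a * cy for a, (_, cy) in zip(activities, centers)) / total
    return (x, y)


# Two units centered at (0, 0) and (1, 0): the more active unit pulls
# the estimate towards its own center.
print(decode_position([1.0, 3.0], [(0.0, 0.0), (1.0, 0.0)]))  # (0.75, 0.0)
```

      Because the estimate is a convex combination of the centers, it always lies inside the region spanned by them, which is one way to see why this readout favours localised, place-like output units.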

      The authors take the trained network and let it explore a series of environments with different geometries while collecting unit activities to probe learned representations. They find localised responses in the output units (resembling place fields) and border responses in the recurrent units. Across environments, the output units show global remapping and the recurrent units show rate remapping. Stretching the environment generally produces stretched responses in output and recurrent units. Ratemaps remain stable within environments and stabilise after noise injection. Low-dimensional projections of the recurrent population activity form environment-specific clusters that reflect the environment's geometry, which suggests independent rather than generalised representations. Finally, the authors discover that the centers of the output unit ratemaps cluster together on a triangular lattice (like the receptive fields of a single grid cell), and find significant clustering of place cell centers in empirical data as well.
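      Ratemaps of the kind referred to here are commonly computed by binning visited positions and averaging a unit's activity within each bin. A minimal sketch, assuming a rectangular arena with known bounds, equal-width bins and NaN for unvisited bins; the helper names and the bin count are hypothetical, and the authors' exact binning or smoothing may differ:

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _bin_index(value: float, lo: float, hi: float, n: int) -> int:
    """Map a coordinate to one of n equal-width bins, clamping to the edges."""
    idx = int((value - lo) / (hi - lo) * n)
    return min(max(idx, 0), n - 1)


def ratemap(positions: Sequence[Point], activity: Sequence[float],
            bounds: Tuple[float, float, float, float],
            bins: int = 2) -> List[List[float]]:
    """Mean activity of one unit in each spatial bin; unvisited bins are NaN.

    bounds = (xmin, xmax, ymin, ymax); rows index y and columns index x.
    """
    xmin, xmax, ymin, ymax = bounds
    sums = [[0.0] * bins for _ in range(bins)]
    counts = [[0] * bins for _ in range(bins)]
    for (x, y), a in zip(positions, activity):
        i = _bin_index(y, ymin, ymax, bins)
        j = _bin_index(x, xmin, xmax, bins)
        sums[i][j] += a
        counts[i][j] += 1
    return [[sums[i][j] / counts[i][j] if counts[i][j] else math.nan
             for j in range(bins)] for i in range(bins)]


m = ratemap([(0.25, 0.25), (0.25, 0.25), (0.75, 0.75)], [1.0, 3.0, 5.0],
            bounds=(0.0, 1.0, 0.0, 1.0), bins=2)
print(m)  # [[2.0, nan], [nan, 5.0]]
```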

      The model setup and simulations are clearly described, and are an interesting exploration of the consequences of a particular set of training requirements - here: path integration and decodability. But it is not obvious to what extent the modelling choices are a realistic reflection of how the brain solves navigation. Therefore it is not clear whether the results generalize beyond the specifics of the setup here.

      Strengths:

      The authors introduce a very minimal set of model requirements, assumptions, and constraints. In that sense, the model can function as a useful 'baseline', that shows how spatial representations and remapping properties can emerge from the requirement of path integration and decodability alone. Moreover, the authors use the same formalism to relate their setup to existing spatial navigation models, which is informative.

      The global remapping that the authors show is convincing and well-supported by their analyses. The geometric manipulations and the resulting stretching of place responses, without additional training, are interesting. They seem to suggest that the recurrent network may scale the velocity input by the environment dimensions so that the exact same path integrator-output mappings remain valid (but maybe there are other mechanisms too that achieve the same).

      The clustering of place cell peaks on a triangular lattice is intriguing, given there is no grid cell input. It could have something to do with the fact that a triangular lattice provides optimal coverage of 2d space? The included comparison with empirical data is valuable, although the authors only show significant clustering - there is no analysis of its grid-like regularity.

      First of all, we would like to thank the reviewer for their comprehensive feedback, and their insightful comments. Importantly, as you point out, our goal with this model was to build a minimal model of place cell representations, where representations were encouraged to be place-like, but free to vary in tuning and firing locations. By doing so, we could explore what upstream representations facilitate place-like representations, and even remapping (as it turned out) with minimal assumptions. However, we agree that our task does not capture some of the nuances of real-world navigation, such as sensory observations, which could be useful extensions in future work. Then again, the simplicity of our setup makes it easier to interpret the model, and makes it all the more surprising that it learns many behaviors exhibited by real world place cells.

      As to the distribution of phases, we also agree that a hexagonal arrangement likely reflects some optimal configuration for decoding of location.

      We also agree that the symmetry within the experimental data is important; we have revised our analyses of the experimental phase distributions and included an analysis of the ensemble grid score to quantify any hexagonal symmetry within the data.
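      For reference, the grid score as commonly defined for single-cell ratemaps compares the rotational self-similarity of a spatial autocorrelogram at the hexagonal angles with that at the intermediate angles. It is given here only as a hedged sketch of the standard definition; the ensemble variant used in the revision may be computed on a map of place-field centers rather than a single ratemap, and details such as the annulus mask around the autocorrelogram center may differ.

```latex
g = \min\left(r_{60^\circ},\, r_{120^\circ}\right)
  - \max\left(r_{30^\circ},\, r_{90^\circ},\, r_{150^\circ}\right),
\qquad
r_{\theta} = \operatorname{corr}\left(A,\; R_{\theta} A\right)
```

      Here A is the spatial autocorrelogram, R_θ A is A rotated by θ, and corr denotes the Pearson correlation; larger positive values of g indicate stronger six-fold (hexagonal) rotational symmetry.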

      Weaknesses:

      The navigation problem that needs to be solved by the model is a bit of an odd one. Without any initial position information, the network needs to figure out where it is, and then path-integrate with respect to a velocity signal. As the authors remark in Methods 4.2, without additional input, the only way to infer location is from border interactions. It is like navigating in absolute darkness. Therefore, it seems likely that the salient wall representations found in the recurrent units are just a consequence of the specific navigation task here; it is unclear if the same would apply in natural navigation. In natural navigation, there are many more sensory cues that help inferring location, most importantly vision, but also smell and whiskers/touch (which provides a more direct wall interaction; here, wall interactions are indirect by constraining velocity vectors). There is a similar but weaker concern about whether the (place cell like) localised firing fields of the output units are a direct consequence of the decoding procedure that only considers activity center locations.

      Thank you for raising this point; we absolutely agree that the navigation task is somewhat niche. However, this was a conscious decision, to minimize any possible confounding from alternative input sources, such as observations. In part, this experimental design was inspired by the suggestion that grid cells support navigation/path integration in open-field environments with minimal sensory input (as they could conceivably do so with no external input). This also pertains to your other point, that boundary interactions are necessary for navigation. In our model, using boundaries is one solution, but there is another, conceivably better, way around this problem: to path integrate in an egocentric frame, starting from the initial position. Since the locations of place fields are inferred only after a trajectory has been traversed, the network is free to create a new or shifted representation every time, independently of the arena. In this case, one might have expected generalized solutions, such as grid cells, to emerge. That this is not the case seems to suggest that grid cells may not be optimal for pure path integration, or at the very least are hard to learn (although they may still play a part, as alluded to by the place field locations). We have tried to make these points more evident in the revised manuscript.
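      To make the two ingredients mentioned here concrete, the sketch below dead-reckons positions by summing velocity * dt from an origin at (0, 0), i.e. an egocentric frame anchored at the starting position, and then estimates a unit's field center as its activity-weighted mean position over the traversed trajectory. Both the center estimator and the helper names `integrate_path` and `unit_center` are our own illustrative assumptions; the manuscript's exact procedure may differ.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def integrate_path(velocities: Sequence[Point], dt: float = 1.0,
                   start: Point = (0.0, 0.0)) -> List[Point]:
    """Dead-reckon positions by accumulating velocity * dt from `start`."""
    x, y = start
    positions = []
    for vx, vy in velocities:
        x += vx * dt
        y += vy * dt
        positions.append((x, y))
    return positions


def unit_center(activity: Sequence[float], positions: Sequence[Point]) -> Point:
    """Field center of one unit: its activity-weighted mean visited position."""
    total = sum(activity)
    if total <= 0:
        raise ValueError("unit was never active on this trajectory")
    cx = sum(a * px for a, (px, _) in zip(activity, positions)) / total
    cy = sum(a * py for a, (_, py) in zip(activity, positions)) / total
    return (cx, cy)


path = integrate_path([(1.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
print(path)                                # [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0)]
print(unit_center([0.0, 1.0, 1.0], path))  # (2.0, 0.5)
```

      Because the centers are only computed after the trajectory is known, nothing in this estimator ties them to fixed arena coordinates, which is the freedom referred to above.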

      As for the point that the decoding may lead to place-like representations, this is a fair point. Indeed, we chose this form of decoding, inspired by the localized firing of place cells, in the hope that it would encourage minimally constrained, place-like solutions. However, compared to other works (Sorscher and Xu) that hand-tune the functional form of their place cells, our decoding (although biased towards centralized tuning curves) leaves many properties free to vary, such as the position of the place cell centers, their tuning width, whether or not they show center-surround activity, and how they tune to different environments/rooms. This allows us to study several features of the place cell system, such as remapping and field formation. We have revised the model description to make this clearer.

      The conclusion that 'contexts are attractive' (heading of section 2) is not well-supported. The authors show 'attractor-like behaviour' within a single context, but there could be alternative explanations for the recovery of stable ratemaps after noise injection. For example, the noise injection could scramble the network's currently inferred position, so that it would need to re-infer its position from boundary interactions along the trajectory. In that case the stabilisation would be driven by the input, not just internal attractor dynamics. Moreover, the authors show that different contexts occupy different regions in the space of low-dimensional projections of recurrent activity, but not that these regions are attractive.

      We agree that boundary interactions could facilitate the convergence of representations after noise injection. We did try to moderate this claim with the wording “attractor-like”, but we agree that boundaries could confound this result. We have therefore performed a modified noise injection experiment, in which we let the network run for an extended period of time before injecting noise, without a velocity signal (see Appendix Velocity Ablation in the revised text). Notably, representations converge to their pre-scrambled state after noise injection, even without a velocity signal. However, place-like representations do not converge for all noise levels in this case, possibly indicating that boundary interactions also serve an error-correcting function. Thank you for pointing this out.

      As for the attractiveness of contexts, we agree that more analyses were required to demonstrate this. We have therefore conducted a supplementary analysis where we run the trained network with a mismatch in context/geometry, and demonstrate that the context signal fixes the representation, up to geometric distortions.

      The authors report empirical data that shows clustering of place cell centers like they find for their output units. They report that 'there appears to be a tendency for the clusters to arrange in hexagonal fashion, similar to our computational findings'. They only quantify the clustering, but not the arrangement. Moreover, in Figure 7e they only plot data from a single animal, then plot all other animals in the supplementary. Does the analysis of Fig 7f include all animals, or just the one for which the data is plotted in 7e? If so, why that animal? As Appendix C mentions that the ratemap for the plotted animal 'has a hexagonal resemblance' whereas others have 'no clear pattern in their center arrangements', it feels like cherry-picking to only analyse one animal without further justification.

      Thank you for pointing this out; we agree that this is not sufficiently explained and explored in the current version. We have therefore conducted a grid score analysis of the experimental place center distributions, to uncover possible hexagonal symmetries. We chose this particular animal partly because it featured the largest number of included cells and also showed the most striking phase distribution; all distributions were included in the supplementary material. Originally, this was only intended as a preliminary analysis suggesting non-uniformity in experimental place field distributions, but we realize that these distributions may all provide interesting insight into the distributional properties of place cells.

      We have explained these choices in the revised text, and expanded analyses on all animals to showcase these results more clearly.

      Reviewer #2 (Public Review):

      Summary:

      The authors proposed a neural network model to explore the spatial representations of the hippocampal CA1 and entorhinal cortex (EC) and the remapping of these representations when multiple environments are learned. The model consists of a recurrent network and output units (a decoder) mimicking the EC and CA1, respectively. The major results of this study are: the EC network generates cells with their receptive fields tuned to a border of the arena; the decoder develops neuron clusters arranged in a hexagonal lattice. Thus, the model accounts for entorhinal border cells and CA1 place cells. The authors also suggested the remapping of place cells occurs between different environments through state transitions corresponding to unstable dynamical modes in the recurrent network.

      Strengths:

      The authors found a spatial arrangement of receptive fields similar to their model's prediction in experimental data recorded from CA1. Thus, the model proposes a plausible mechanism to generate hippocampal spatial representations without relying on grid cells. This result is consistent with the observation that grid cells are unnecessary to generate CA1 place cells.

      The suggestion about the remapping mechanism shows an interesting theoretical possibility.

      We thank the reviewer for their kind feedback.

      Weaknesses:

      The explicit mechanisms of generating border cells and place cells and those underlying remapping were not clarified at a satisfactory level.

      The model cannot generate entorhinal grid cells. Therefore, how the proposed model is integrated into the entire picture of the hippocampal mechanism of memory processing remains elusive.

      We appreciate this point, and hope to clarify: From a purely architectural perspective, place-like representations are generated by linear combinations of recurrent unit representations, which, after training, appear border-like. During remapping, the network is simply evaluated/run in different geometries/contexts, which, it turns out, causes the network to exhibit different representations, likely as solutions to optimally encoding position in the different environments. We have attempted to revise the text to make some of these interpretations more clear. We have also conducted a supplementary analysis to demonstrate how representations are determined by the context signal directly, which helps to explain how recurrent and output units form their representations.

      We also agree that our model does not capture the full complexity of the hippocampal formation. However, we would argue that its simplicity (focusing on a single cell type and a pure path integration task) makes it a useful baseline for studying the role of place cells during spatial navigation. The fact that our model captures a range of place cell behaviors (field formation, remapping and geometric deformation) without grid cells also points to several interesting possibilities, for example that grid cells may not be strictly necessary for place cell formation and remapping, or that border cells may account for many of the peculiar behaviors of place cells. However, we wholeheartedly agree that including e.g. sensory information and memory storage/retrieval tasks would be a very interesting extension of our model to more naturalistic tasks and settings. In fact, our framework could easily accommodate this, e.g. by decoding contexts/observations/memories from the network state, alongside location.

      Reviewer #3 (Public Review):

      Summary:

      The authors used recurrent neural network modelling of spatial navigation tasks to investigate border and place cell behaviour during remapping phenomena.

      Strengths:

      The neural network training seemed for the most part (see comments later) well-performed, and the analyses used to make the points were thorough.

      The paper and ideas were well explained.

      Figure 4 contained some interesting and strong evidence for map-like generalisation as environmental geometry was warped.

      Figure 7 was striking, and potentially very interesting.

      It was impressive that the RNN path-integration error stayed low for so long (Fig A1), given that normally networks that only work with dead-reckoning have errors that compound. I would have loved to know how the network was doing this, given that borders did not provide sensory input to the network. I could not think of many other plausible explanations... It would be even more impressive if it was preserved when the network was slightly noisy.

      Thank you for your insightful comments! Regarding the low path integration error, there is a slight statistical signal from the boundaries, as trajectories tend to turn away from arena boundaries. However, we agree that studying path integration performance in the presence of noise would be a very interesting future development.

      Weaknesses:

      I felt that the stated neuroscience interpretations were not well supported by the presented evidence, for a few reasons I'll now detail.

      First, I was unconvinced by the interpretation of the reported recurrent cells as border cells. An equally likely hypothesis seemed to be that they were position cells that are linearly encoding the x and y position, which, when your environment only contains external linear boundaries, look the same. As in figure 4, in environments with internal boundaries the cells do not encode them; they encode (x,y) position. Further, if I'm not misunderstanding, there is, throughout, a confusing case of broken symmetry. The cells appear to code not for any random linear direction, but for either the x or y axis (i.e. there are x cells and y cells). These look like border cells in environments in which the boundaries are external only, and align with the axes (like square and rectangular ones), but the same also appears to be true in the rotationally symmetric circular environment, which strikes me as very odd. I can't think of a good reason why the cells in circular environments should care about the particular choice of (x,y) axes... unless the choice of position encoding scheme is leaking influence throughout. A good test of these would be differently oriented (45 degree rotated square) or more geometrically complicated (two diamonds connected) environments in which the difference between a pure (x,y) code and a border code is more obvious.

      Thank you for pointing this out. This is an excellent point, which we agree could be addressed more rigorously. Note that there is no position encoding in our model; the initial state of the network is a vector of zeros, and the network must infer its location from boundary interactions and context information alone. So there is no way for positional information to leak through to the recurrent layer directly. However, one possible reason for the observed symmetry breaking is the fact that the velocity input signal is aligned with the cardinal directions. To investigate this, we trained a new model in which input velocities are rotated 45 degrees relative to the horizontal, as you suggest. The results, shown and discussed in appendix E (Learned recurrent representations align with environment boundaries), indicate that representations are tuned to environment boundaries and not to the cardinal directions, which we hope addresses this point.
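      The control described here amounts to expressing the velocity input in a frame rotated by 45 degrees. A minimal sketch, assuming the rotation is a fixed counter-clockwise 2-D rotation applied to every (vx, vy) input vector; the function name is hypothetical and the direction of rotation is our choice:

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, float]


def rotate_velocities(velocities: Sequence[Vector],
                      degrees: float = 45.0) -> List[Vector]:
    """Rotate each 2-D velocity vector counter-clockwise by `degrees`."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    # Standard 2-D rotation matrix [[c, -s], [s, c]] applied to (vx, vy).
    return [(c * vx - s * vy, s * vx + c * vy) for vx, vy in velocities]


rotated = rotate_velocities([(1.0, 0.0), (0.0, 1.0)])
print([(round(vx, 3), round(vy, 3)) for vx, vy in rotated])
# [(0.707, 0.707), (-0.707, 0.707)]
```

      Because the rotation preserves speed, only the axes along which velocity is expressed change: tuning that followed the input axes would then appear diagonal in a square arena, whereas tuning to the walls should stay aligned with them.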

      Next, the decoding mechanism used seems to have forced the representation to learn place cells (no other cell type is going to be usefully decodable?). That is, in itself, not a problem. It just changes the interpretation of the results. To be a normative interpretation for place cells you need to show some evidence that this decoding mechanism is relevant for the brain, since this seems to be where they are coming from in this model. Instead, this is a model with place cells built into it, which can then be used for studying things like remapping, which is a reasonable stance.

      This is a great point, and we agree. We do write that we use this decoding to encourage minimally constrained place-like representations (to study their properties), but we have revised the text to make this more evident.

      However, the remapping results were also puzzling. The authors present convincing evidence that the recurrent units effectively form 6 different maps of the 6 different environments (e.g. the sparsity of the code, or fig 6a), with the place cells remapping between environments. Yet, as the authors point out, in neural data the finding is that some cells generalise their co-firing patterns across environments (e.g. grid cells, border cells), while place cells remap, making it unclear what correspondence to make between the authors network and the brain. There are existing normative models that capture both entorhinal's consistent and hippocampus' less consistent neural remapping behaviour (Whittington et al. and probably others), what have we then learnt from this exercise?

      Thanks for raising this point! We agree that this finding is surprising, but we hold that it actually shows something quite important: that border-type units are sufficient to create place-like representations, and that the resulting model learns several of the behaviors associated with place cells and remapping (including global remapping and field stretching). In other words, a single cell type known to exist upstream of place cells is sufficient to explain a surprising range of phenomena, demonstrating that other cell types are not strictly necessary. However, we agree that understanding why the boundary-type units sometimes rate remap, and whether that can be true for some border-type cells in the brain (either directly, or through gating mechanisms), would be an important future development. Related to this point, we also expanded upon the influence of the context signal on representation selection (appendix F).

      Concerning the relationship to other models, we would argue that the simplicity of our model is one of its core strengths, making it possible to disentangle what different cell types are doing. While other models, including TEM, are highly important for understanding how different cell types and brain regions interact to solve complex problems, we believe there is a need for minimal, understandable models that allow us to investigate what each cell type is doing, and this is where we believe our work is important. As an example, our model not only highlights the sufficiency of boundary-type cells as generators of place cells; its lack of grid cells also suggests that grid cells may not be strictly necessary for open-field/sensory-deprived navigation, contrary to what is often claimed.

      One striking result was figure 7, the hexagonal arrangement of place cell centres. I had one question that I couldn't find the answer to in the paper, which would change my interpretation. Are the place cell centres within a single cluster of points in figure 7a, for example, from one cell across the 100 trajectories, or from many? If each cluster belongs to a different place cell then the interpretation seems like some kind of optimal packing/coding of 2D space by a set of place cells, an interesting prediction. If multiple place cells fall within a single cluster then that's a very puzzling suggestion about the grouping of place cells into these discrete clusters. From figure 7c I guess that the former is the likely interpretation, from the fact that clusters appear to maintain the same colour, and are unlikely to be co-remapping place cells, but I would like to know for sure!

      This is a good point, and you are correct: one cluster tends to correspond to one unit. To make this clearer, we have revised Fig. 7 so that each decoded center is shaded by unit identity. And yes, this seems to be in line with some form of optimal packing/encoding of space!

      I felt that the neural data analysis was unconvincing. Most notably, the statistical effect was found in only one of seven animals. Random noise is likely to pass statistical tests 1 in 20 times (at 0.05 p value), so could this have been something similar? Further, the data was compared to a null model in which place cell fields were randomly distributed. The authors claim place cell fields have two properties that the random model doesn't: (1) clustering to edges (as experimentally reported) and (2), much more provocatively, a hexagonal lattice arrangement. The test seems to conflate the two; I think that nearby ball radii could be overrepresented, as in figure 7f, due to either effect. I would have liked to see a computation of the statistic for a null model in which place cells were random but with a bias towards the boundaries of the environment that matches the observed changing density, to distinguish these two hypotheses.

      Thanks for raising this point. We agree that we were not clear enough in our original manuscript, where we analyzed data from only one animal to showcase a preliminary case of non-uniform phases. To address this, we have performed the same analyses for all animals and included a longer discussion of these results (in the supplementary material). We have also moderated the discussion of Ripley’s H to encompass only non-uniformity, and added a grid score analysis to showcase possible rotational symmetries in the data. We hope this conveys our findings more clearly.

      Some smaller weaknesses:

      - Had the models trained to convergence? From the loss plot it seemed like not, and when including regularisers, recent work (grokking phenomena, e.g. Nanda et al. 2023) has shown the importance of letting the regulariser minimise completely to see the resulting effect. Otherwise you are interpreting representations that are likely still being learnt, a dangerous business.

      Longer training time did not seem to affect representations. However, due to the long trajectories and statefulness involved, training was time-intensive and could become unstable for very long training. We therefore stopped training at the indicated time.

      - Since RNNs are nonlinear, it seems that eigenvalues larger than 1 don't necessarily imply instability?

      This is a good point; for a nonlinear network, eigenvalues larger than 1 do not by themselves imply instability. We have updated the text to reflect this.
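      To make the reviewer's point concrete, here is a toy example (a hypothetical two-unit network, not the model in the manuscript). The recurrent matrix has spectral radius 2, so the linear iteration diverges, while the same matrix inside a bias-free ReLU recurrence stays bounded: the rectification silences the unit along the expanding (negative) eigendirection.

```python
import numpy as np

def run_rnn(W, h0, steps, nonlinearity):
    """Iterate h_{t+1} = f(W h_t) with no input and no bias."""
    h = np.asarray(h0, dtype=float)
    for _ in range(steps):
        h = nonlinearity(W @ h)
    return h

relu = lambda x: np.maximum(x, 0.0)
identity = lambda x: x

# Eigenvalues -2 and 0.5: spectral radius 2 > 1.
W = np.array([[-2.0, 0.0],
              [0.0, 0.5]])
h0 = np.array([1.0, 1.0])

# Linear dynamics: the first unit grows as (-2)^t and diverges.
linear_final = run_rnn(W, h0, steps=20, nonlinearity=identity)
# ReLU dynamics: the first unit's pre-activation is -2 * h <= 0 for h >= 0,
# so it is clamped to zero after one step; the second unit decays as 0.5^t.
relu_final = run_rnn(W, h0, steps=20, nonlinearity=relu)

print(linear_final[0], relu_final)
```

      In this toy case, whether activity stays bounded depends on which eigendirections the rectified dynamics can actually reach, not on the eigenvalues alone.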

      - Why do you not include a bias in the networks? ReLU networks without bias are not universal function approximators, so it is a real change in architecture that doesn't seem to have any positives?

      We found that biases tended to be detrimental to training, possibly because of the identity initialization used (see e.g. Le et al. 2015), and that training improved when biases were fixed to zero.

      - The claim that this work provided a mathematical formalism of the intuitive idea of a cognitive map seems strange, given that upwards of 10 of the works this paper cites also mathematically formalise a cognitive map into a similar integration loss for a neural network.

      We agree that other works also provide ways of formalizing this concept. However, our goal in doing so was to elucidate common features across these seemingly disparate models. We also found that the concept of a learned and target map made it easier to come up with novel models, such as one wherein place cells are constructed to match a grid cell label.

      Aim Achieved? Impact/Utility/Context of Work

      Given the listed weaknesses, I think this was a thorough exploration of how this network with these losses is able to path-integrate its position and remap. This is useful; it is good to know how another neural network with slightly different constraints learns to perform these behaviours. That said, I do not think the link to neuroscience was convincing, and as such, it has not achieved its stated aim of explaining these phenomena in biology. The mechanism for remapping in the entorhinal module seemed fundamentally different to the brain's, instead using completely disjoint maps; the recurrent cell types described seemed to match no described cell type (no bad thing in itself, but it does limit the permissible neuroscience claims) either in tuning or remapping properties, with a potentially worrying link between an arbitrary encoding choice and the responses; and the striking place cell prediction was unconvincingly matched by neural data. Further, this is a busy field in which many remapping results have been shown before by similar models, limiting the impact of this work. For example, George et al. and Whittington et al. show remapping of place cells across environments; Whittington et al. study remapping of entorhinal codes; and Rajkumar Vasudeva et al. 2022 show similar place cell stretching results under environmental shifts. As such, this paper's contribution is muddied significantly.

      Thank you for this perspective; we agree that all of these are important works that arrive at complementary findings. We hold that the importance of our paper lies in its minimal nature and its focus on place cells, via a purpose-built decoding scheme that enables place-like representations. In doing so, we can point to possibly underexplored relationships between cell types, in particular place cells and border cells, while challenging the necessity of other cell types (i.e. grid cells) for open-field navigation. In addition, our work points to a novel connection between grid cells, place cells and even border cells, by way of the hexagonal arrangement of place unit centers. However, we agree that expanding our model to include more biologically plausible architectures and constraints would make for a very interesting extension in the future.

      Thank you again for your time and insightful comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Even after reading Methods 5.3, I found it hard to understand how the ratemap population vectors that produce Fig 3e and Fig 5 are calculated. It's unclear to me how there can be a ratemap at a single timestep, because calculating a ratemap involves averaging the activity in each location, which would take a whole trajectory and not a single timestep. But I think I've understood from Methods 5.1 that instead the ratemap is calculated by running multiple 'simultaneous' trajectories, so that there are many visited locations at each timestep. That's a bit confusing because as far as I know it's not a common way to calculate ratemaps in rodent experiments (probably because it would be hard to repeat the same task 500 times, while the representations remain the same), so it might be worth explaining more in Methods 5.3.

      We understand the confusion, and have attempted to make this clearer in the revised manuscript. We did indeed create ratemaps over many trajectories for time-dependent plots, for the reasons you mentioned. We also agree that this would be difficult to do experimentally, but found it an interesting way to observe the convergence of representations in our simulated scenario.
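      For readers unfamiliar with this construction, here is a minimal sketch of a single-timestep ratemap. It assumes that positions and unit activity are stored as arrays over many simultaneous trajectories in a square arena; the function name, array layout, arena size and bin count are hypothetical and not taken from Methods 5.1 or 5.3. At timestep t, each trajectory contributes one location, and each unit's activity is averaged over all trajectories whose location falls in the same spatial bin.

```python
import numpy as np

def ratemap_at_timestep(positions, activity, t, n_bins=16, arena_size=1.0):
    """Population ratemap at timestep t, pooled across simultaneous trajectories.

    positions: array (n_trajectories, n_timesteps, 2), coordinates in [0, arena_size)
    activity:  array (n_trajectories, n_timesteps, n_units)
    Returns an array (n_units, n_bins, n_bins); unvisited bins are NaN.
    """
    pos_t = positions[:, t, :]  # (n_traj, 2): one location per trajectory
    act_t = activity[:, t, :]   # (n_traj, n_units)
    # Spatial bin of each trajectory's position at time t, flattened to 1D.
    idx = np.clip((pos_t / arena_size * n_bins).astype(int), 0, n_bins - 1)
    flat = idx[:, 0] * n_bins + idx[:, 1]
    n_cells = n_bins * n_bins
    counts = np.bincount(flat, minlength=n_cells)
    sums = np.zeros((act_t.shape[1], n_cells))
    for u in range(act_t.shape[1]):
        # Sum of unit u's activity over trajectories landing in each bin.
        sums[u] = np.bincount(flat, weights=act_t[:, u], minlength=n_cells)
    with np.errstate(invalid="ignore", divide="ignore"):
        maps = sums / counts  # mean activity per bin
    maps[:, counts == 0] = np.nan
    return maps.reshape(act_t.shape[1], n_bins, n_bins)

# Usage with random data: 500 trajectories, 100 timesteps, 3 units.
rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 1.0, size=(500, 100, 2))
activity = rng.random((500, 100, 3))
maps = ratemap_at_timestep(positions, activity, t=50)
print(maps.shape)  # (3, 16, 16)
```

      Pooling over trajectories rather than over time is what allows a ratemap to be defined at every timestep, at the cost of needing many repetitions with the same network state.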

      Fig 3b-d shows multiple analyses to support output unit global remapping, but no analysis to support the claim that recurrent units remap by rate changes. The examples in Fig 3ai look pretty convincing, but it would be useful to also have a more quantitative result.

      We agree; we had only used ratemaps to show that units turn off/become silent. We have therefore added an explicit analysis showcasing rate remapping in recurrent units (see Appendix G; Recurrent units rate remap).

      Reviewer #2 (Recommendations For The Authors):

      Some parts of the current manuscript are hard to follow. Particularly, the model description is not transparent enough. See below for the details.

      Major comments:

      (1) Mathematical models should be explained more explicitly and carefully. I had to guess or desperately search for the definitions of parameters. For instance, define the loss function L in eq.(1). Though I can assume L represents the least square error (in A.8), I could not find the definition in Model & Objective. N should also be defined explicitly in equation (3). Is this the number of output cells?

      Thank you for pointing this out; we have revised the text to make it clearer.

      (2) In Fig. 1d, how were the velocity and context inputs given to individual neurons in the network? The information may be described in the Methods, but I could not identify it.

      This was described in the Methods section (Neural Network Architecture and Training), but we realize that our notation was confusing when compared with Fig. 1d. We have therefore changed the notation, and it should hopefully be clearer now. Thanks for pointing out this discrepancy.

      (3) It took me a while to understand equations (3) and (4) (for instance, t is not defined here). The manuscript would be easier to read if equations (5) and (6) were explained in the main text rather than on page 18 (indeed, these equations are just copies of equations 3 and 4). Otherwise, the authors may replace equations (3) and (4) with verbal explanations similar to the figure legend for Fig. 1b.

      (4) Is there any experimental evidence for uniformly strong EC-to-CA1 projections assumed in the non-trainable decoder? This point should be briefly mentioned.

      Thank you for raising this point. The decoding from EC (the RNN) to CA1 (the output layer) consists of a trainable weight matrix, and may thus be non-uniform in magnitude. The non-trainable decoding acts on the resulting “CA1” representation only. We hope that improvements to the model description also make this more evident.

      (5) The explanation of Fig. 3 in the main text is difficult to follow because subpanels are explained in separate paragraphs, some of which are very short, as short as just a few lines.

      This presentation style makes it difficult to follow the logical relationships between the subpanels. This writing style is used throughout the manuscript but is not common in neuroscience.

      Thanks for pointing this out; we have revised the text accordingly.

      (6) Why do field centers cluster near boundaries? No underlying mechanisms are discussed in the manuscript.

      This is a good point, and we have added a note on this: the clustering likely reflects the border tuning of upstream units.

      (7) In Fig. 4, the authors presented how cognitive maps may vary when the shape and size of open arenas are modified. The results would be more interesting if the authors explained the remapping mechanism. For instance, on page 8, the authors mentioned that output units exhibit global remapping between contexts, whereas recurrent units mainly rate remap.

      Why do such representational differences emerge?

      We agree! Thanks for raising this point. We have therefore expanded upon this discussion in section 2.4.

      (8) In the first paragraph of page 10, the authors stated ".. some output units display distinct field doubling (see both Fig. 4c), bottom right, and Fig. 4d), middle row)". I could not understand how Fig. 4d, middle row supports the argument. Similarly, they stated "..some output units reflect their main boundary input (with greater activity near one boundary)." I can neither understand what the authors mean to say nor which figures support the statement. Please clarify.

      This is a good point; a panel identifier was missing, and we have updated the text to refer to the correct “magnification”. Thanks!

      (9) The underlying mechanism of generating the hexagonal representation of output cells remains unclear. The decoder network uses a non-trainable decoding scheme based on localized firing patterns of output units. To what extent does the hexagonal representation depend on the particular decoding scheme? Similarly, how does the emergence of the hexagonal representation rely on the border representation in the upstream recurrent network? Showing several snapshots of the two place representations during learning may answer these questions.

      This is an interesting point, and we have added some discussion on this matter. In particular, we speculate that the arrangement may be an optimal configuration for position reconstruction, which is demanded by the task and is thus highly likely to depend on the decoding scheme. We have not found a conclusive way to determine the explicit dependence of the hexagonal arrangement on the choice of decoding scheme; this would seem to require comparison with other schemes, which in our framework would mean changing the fundamental operation of the model. We leave this as inspiration for future work. We have also added further discussion of the relationship between place units, border units, and remapping in our model. As for exploring different training snapshots, the model is randomly initialized, which suggests that earlier training steps should tend to reveal unorganized/uninformative phase arrangements, as phases are learned as a way of optimizing position reconstruction. However, we do call for more analysis of experimental data to determine whether this is true in animals, which would strongly support this observation. We also hope that our work inspires other models studying the formation and remapping of place cells, which could serve as a starting point for answering this question in the future.

      (10) Figure 7 requires a title including the word "hexagonal" to make it easier to find the results demonstrating the hexagonal representations. In addition, please clarify which networks, p or g, gave the results shown here.

      We agree, and have added it!

      Minor comments:

      (11) In many paragraphs, conclusions appear near their ends. Stating the conclusion at the beginning of each paragraph whenever possible will improve the readability.

      We have made several rewrites to the manuscript, and hope this improves readability.

      (12) Figure A4 is important as it shows evidence of the CA1 spatial representation predicted by the model. However, I could not find where the figure is cited in the manuscript. The authors can consider showing this figure in the main text.

      We agree, and we have added more references to the experimental data analyses in the main text, as well as expanded this analysis.

      (13) The main text cites figures in the following format: "... rate mapping of Fig. 3a), i), boundary ...." The parentheses make reading difficult.

      We have removed the excessive use of double parentheses; thanks for letting us know.

      (14) It would be nice if the authors briefly explained the concept of Ripley's H function on page 14.

      Yes, we have added a brief descriptor.
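      As a brief illustration of the statistic (a minimal sketch, not the analysis code used for the manuscript): Ripley's K counts, for each radius r, the pairs of points lying within distance r of each other, normalized by point density. Its variance-stabilized form L(r) = sqrt(K(r)/π) equals r under complete spatial randomness, so H(r) = L(r) − r is positive where points are clustered at scale r and negative where they are dispersed. The version below omits edge correction, which is an assumption worth revisiting given that field centers cluster near boundaries.

```python
import numpy as np

def ripley_h(points, radii, area):
    """Ripley's H(r) = L(r) - r for a 2D point pattern, without edge correction.

    K(r) = area / (n (n - 1)) * #{ordered pairs i != j with d_ij <= r}
    L(r) = sqrt(K(r) / pi)
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Pairwise Euclidean distances between all points.
    d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
    pair_d = d[~np.eye(n, dtype=bool)]  # ordered pairs with i != j
    radii = np.asarray(radii, dtype=float)
    # Number of pairs within each radius.
    counts = (pair_d[None, :] <= radii[:, None]).sum(axis=1)
    k = area * counts / (n * (n - 1))
    return np.sqrt(k / np.pi) - radii

# Usage: H(r) for 50 uniformly random points in a unit square.
rng = np.random.default_rng(1)
h = ripley_h(rng.uniform(size=(50, 2)), radii=np.linspace(0.05, 0.5, 10), area=1.0)
print(np.round(h, 3))
```

      To separate boundary clustering from lattice structure, as Reviewer #1 suggests, the observed H(r) could be compared with H(r) computed on simulated field centers drawn from a boundary-biased density rather than a uniform one.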

    1. Author response:

      The following is the authors’ response to the previous reviews

      Review 1:

      Weaknesses:

      The weaknesses of the study also stem from the methodological approach, particularly the use of whole-brain calcium imaging as a measure of brain activity. While epilepsy and seizures involve network interactions, they typically do not originate across the entire brain simultaneously. Seizures often begin in specific regions or even within specific populations of neurons within those regions. Therefore, a whole-brain approach, especially with calcium imaging and its inherent limitations, may not fully capture the localized nature of seizure initiation and propagation, potentially limiting the understanding of Galanin's role in epilepsy.

      We agree with the reviewers that the whole-brain imaging approach is both a strength and a weakness. This manuscript and our previously published paper (Hotz et al., 2022) indeed show that seizures have an initiation point and spread throughout the brain, interestingly affecting the telencephalon last. Localized seizure initiation was outside the scope of this manuscript; however, here too we would have to rely on imaging techniques. Using cell-type-specific drivers for specific neuronal subpopulations is an interesting approach, but outside the scope of this study. Another interesting approach would be a more detailed analysis of glia in the context of epilepsy.

      Furthermore, Galanin's effects may vary across different brain areas, likely influenced by the predominant receptor types expressed in those regions. Additionally, the use of PTZ as a "stressor" is questionable since PTZ induces seizures rather than conventional stress. Referring to seizures induced by PTZ as "stress" might be a misinterpretation intended to fit the proposed model of stress regulation by receptors other than Galanin receptor 1 (GalR1).

      We also agree that a more regional approach, once more reliable information on the expression domains and respective roles of the different galanin receptors is available, is an important future research direction.

      The description of the EAAT2 mutants is missing crucial details. EAAT2 plays a significant role in the uptake of glutamate from the synaptic cleft, thereby regulating excitatory neurotransmission and preventing excitotoxicity. The authors suggest that in EAAT2 knockout (KO) mice galanin expression is upregulated 15-fold compared to wild-type (WT) mice, which could be interpreted as galanin playing a role in the hypoactivity observed in these animals.

      However, the study does not explore the misregulation of other genes that could be contributing to the observed phenotype. For instance, if AMPA receptors are significantly downregulated, or if there are alterations in other genes critical for brain activity, these changes could be more important than the upregulation of galanin. The lack of wider gene expression analysis leaves open the possibility that the observed hypoactivity could be due to factors other than, or in addition to, galanin upregulation.

      We are in the process of preparing a manuscript describing a more detailed gene expression study of this and a chemically induced seizure model. Surprisingly, we did not observe strong effects on glutamate receptor-related genes. This does not preclude additional factors, e.g. other neuropeptides, from playing a role; indeed, we deem it likely that they do.

      Moreover, the observation that in double KO mice for both EAAT2 and galanin there was little difference in seizure susceptibility compared to EAAT2 KO mice alone further supports the idea that galanin upregulation might not be the reason for the observed phenotype. This indicates that other regulatory mechanisms or gene expression changes might play a more pivotal role in the manifestation of hypoactivity in EAAT2 mutants.

      Yes, we agree that galanin is likely not the only player. This warrants further investigations.

      These methodological shortcomings and conceptual inconsistencies undermine the perceived strengths of the study, and hinder understanding of Galanin's role in epilepsy and stress regulation.

      Review 2:

      Previous concerns about sex or developmental biological variables were addressed, as their model's seizure phenotype emerges rapidly and long prior to the establishment of zebrafish sexual maturity. However, in the course of re-review, some additional concerns (below) were detected that, if addressed, could further improve the manuscript. These concerns relate to how seizures were defined from the measurement of fluorescent calcium imaging data. Overall, this study is important and convincing, and carries clear value for understanding the multifaceted functions that neuronal galanin can perform under homeostatic and disease conditions.

      We are pleased that we could dispel the initial concerns.

      Additional Concerns:

      - The authors have validated their ability to measure behavioral seizures quantitatively in their 2022 Glia paper but the information provided on defining behavioral seizures was limited. The definition of behavioral seizure activity is not expanded upon in this paper, but could provide detail about how the behavioral seizures relate to a seizure detected via calcium imaging.

      In this paper we indeed do not address behavioral seizures but focus completely on neuronal seizures as defined in the Materials and Methods section (“seizures were defined as calcium fluctuations reaching at least 100% of ΔF/F0 in the whole brain.”). Epileptic seizures in zebrafish, whether evoked pharmacologically or resulting from genetic mutations, produce stereotyped locomotor behavior, as described in multiple publications (e.g. Baraban et al., 2005, Berghmans et al., 2007, Baxendale et al., 2012 and references therein).

      - Related to the previous point, for the calcium imaging, the difference between an increase in fluorescence that the authors think reflects increased neuronal activity and the fluorescence that corresponds to seizures is not very clear. This detail is necessary because exactly when the term "seizure" describes a degree of increased activity can be difficult to distinguish objectively.

      In our Materials and Methods section, we describe our working definition of a seizure. Seizures are readily distinguished from increased activity because they are synchronized across the whole brain.
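      The working definition quoted above can be sketched as follows. The sketch assumes a single whole-brain fluorescence trace per frame and a baseline F0 taken as the median over a user-chosen baseline window; the baseline choice and the function name are hypothetical, and the manuscript's Methods define the actual F0 estimate. Because the trace is averaged over the whole brain, local increases in activity are diluted by the average and are unlikely to reach 100% ΔF/F0, whereas widespread synchronized events can.

```python
import numpy as np

def detect_seizures(trace, baseline_frames, threshold=1.0):
    """Return (start, end) frame pairs where whole-brain dF/F0 >= threshold.

    trace: 1D array of whole-brain mean fluorescence per frame.
    baseline_frames: slice or index array used to estimate F0 (an assumption).
    threshold=1.0 corresponds to a 100% dF/F0 increase. End frames are exclusive.
    """
    trace = np.asarray(trace, dtype=float)
    f0 = np.median(trace[baseline_frames])
    dff = (trace - f0) / f0
    above = dff >= threshold
    # Rising and falling edges of the above-threshold mask.
    padded = np.concatenate(([False], above, [False]))
    edges = np.flatnonzero(np.diff(padded.astype(int)))
    return [(int(s), int(e)) for s, e in zip(edges[::2], edges[1::2])]

# Usage: a toy trace with two supra-threshold events.
events = detect_seizures([1, 1, 1, 2.5, 3, 1, 1, 2.1, 1], baseline_frames=slice(0, 3))
print(events)  # [(3, 5), (7, 8)]
```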

      - The supplementary movies that were added were very useful, but raised some questions. For example, what brain regions were pulsating? What areas seemed to constantly exhibit strong fluorescence and was this an artifact? It seemed that sometimes there was background fluorescence in the body. Perhaps an anatomical diagram could be provided for the readers. In addition, there were some movies with much greater fluorescence changes - are these the seizures? These are some reasons for our request for clarified definitions of the term "seizure".

      The “pulsating” (or “flickering”) brain activity is spontaneous neuronal activity. Some areas may appear to be more active, probably due to a denser packing of neurons and intrinsically higher spontaneous neuronal activity. However, since we only use normalized data, this does not affect our measurements.

      - While it is not critical to change, I will again note the possible confusion that the use of the word "sedative" in this context may cause. However, I do understand this is a stylistic choice.

      - Supplementary Figure 1B: the N values along the x-axis appear to have been duplicated and the duplications are offset and overlapping with one another by mistake.

      Thank you for pointing this out. We have corrected the figure accordingly.

      Review 3:

      (1) Although the relationship between galanin and brain activity during interictal or seizure-free periods was clear, the revised manuscript still lacks mechanistic insight into the role of galanin during seizure-like activity induced by PTZ.

      We agree that the mechanistic role of galanin still needs to be defined. The role is more complex than we expected, mainly due to its negative feedback properties. A complete mechanistic understanding will require a number of additional studies and is unfortunately outside the scope of this manuscript.

      (2) The revised manuscript continues to heavily rely on calcium imaging of different mutant lines. Confirmation of knockouts has been provided with immunostaining in a new supplementary figure. Additional methods could strengthen the data, translational relevance, and interpretation (e.g., acute pharmacology using galanin agonists or antagonists, brain or cell recordings, biochemistry, etc).

      Cell recordings and biochemistry are challenging in the small larval zebrafish brain. We deem the genetic manipulations that we describe to be more informative than pharmacological experiments, owing to specificity issues.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank all the reviewers for their time and valuable feedback, which helped us improve our manuscript. Based on the comments, we have made several critical changes to the revised manuscript.

      (1) We have changed our threshold for detecting freezing epochs from 1 cm/s to 0 cm/s in this revised manuscript. This change allows us to capture periods when animals are completely still on the treadmill, better matching the "true freezing" behavior seen in freely moving set-ups. We have added a new supplementary video (Supplementary Video 2) that better demonstrates the freezing response we observe. All results and figures in the revised manuscript reflect this updated threshold (Figures 2-6, Supplementary Figures 1-6, Tables 1-6). Our main findings remain robust, demonstrating that freezing serves as a reliable conditioned response in our paradigms, comparable to freely moving animals. Specifically, freezing behavior increased reliably in the fear-conditioned environment following CFC across all paradigms. We have also added data from a no-shock control group (Supplementary Figure 2) which, when compared to the conditioned group, shows that freezing responses in the conditioned group result from fear conditioning rather than immobility. We do observe other avoidance behaviors unique to our treadmill-based task, such as hesitation, backward movement, and slow crawls. These conditioned behaviors are captured through a separate metric: the time taken to complete a lap.

      (2) As suggested by the reviewers, we have separately analyzed fear discrimination and extinction dynamics across recall days (Supplementary Figures 2, 5 and 6, Tables 1-6). To assess fear discrimination, we use within-group comparisons to evaluate how well animals differentiate between the two VRs across days. For extinction, we use within-VR comparisons to examine freezing dynamics over time. Freezing across recall days is compared to baseline freezing (pre-conditioning) using a Linear Mixed Effects model (Tables 1-6), with recall days as fixed effects and mouse as a random effect, using baseline freezing as the reference.

      (3) We have expanded the behavioral dataset in Paradigm 1 to investigate the effect of shock amplitude on the conditioned fear response (Supplementary Figure 2 C-E). Consistent with findings in freely moving animals, our data show that increasing shock intensity from 0.6 mA to 1.0 mA leads to stronger freezing. For the revised manuscript, we specifically increased the sample size in the 0.6 mA group (n = 8) in Paradigm 1, as this intensity is used in Paradigm 3. These additional data demonstrate that combining a lower shock amplitude with shorter inter-shock intervals and retaining the tail-coat during recall can enhance freezing, suggesting that these parameters help compensate for lower shock intensity.

      (4) We have increased the sample size of the imaging dataset (now n = 8, Figures 7-8).
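      The change in threshold described in point (1) can be illustrated with a minimal sketch. It assumes that treadmill speed is sampled per frame in cm/s and that freezing is scored as the fraction of frames spent in freezing epochs; the function name and the optional minimum-epoch length `min_frames` are hypothetical and are not parameters from the manuscript.

```python
import numpy as np

def freezing_fraction(speed_cm_s, threshold=0.0, min_frames=1):
    """Fraction of frames spent in freezing epochs.

    A frame is 'still' if speed <= threshold (0 cm/s = completely still).
    Only runs of at least min_frames consecutive still frames count as freezing
    (min_frames is a hypothetical parameter, not taken from the manuscript).
    """
    still = np.asarray(speed_cm_s, dtype=float) <= threshold
    # Start and (exclusive) end indices of each run of still frames.
    padded = np.concatenate(([False], still, [False]))
    edges = np.flatnonzero(np.diff(padded.astype(int)))
    starts, ends = edges[::2], edges[1::2]
    frozen = sum(int(e - s) for s, e in zip(starts, ends) if e - s >= min_frames)
    return frozen / len(still)

speed = [0, 0, 0, 0.5, 0, 2, 0, 0]
print(freezing_fraction(speed, threshold=0.0))  # 0.75
print(freezing_fraction(speed, threshold=1.0))  # 0.875
```

      With `threshold=0.0`, only frames in which the animal is completely still count, whereas `threshold=1.0` also counts slow movement such as the crawls mentioned in point (1). A strict 0 cm/s criterion presumes that the treadmill encoder reports exactly zero when the animal is still.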

      Finally, we acknowledge that many aspects of this paradigm still require optimization. The head-fixed CFC paradigm is in its early stages compared to the decades of research dedicated to understanding fear learning parameters in freely moving CFC paradigms. While there are numerous parameters that could be tested (both those identified through our own discussions and those raised by the reviewers), it is not feasible for a single lab to conduct a full evaluation of all the possible factors that could influence CFC in the head-fixed preparation. A key limitation is that our approach requires robust navigation behavior in the VR without rewards, which requires weeks of training per mouse. It also necessitates larger sample sizes at the outset, as not all animals will meet the behavioral criteria required for CFC. Another important consideration is scalability. Unlike freely moving CFC paradigms, which allow parallel testing of many animals with minimal pre-training, the VR-CFC setup requires several weeks of behavioral training and involves a more complex integration of hardware and software to accurately track behavior in virtual space. The number of VR rigs that can be operated simultaneously in a single lab is often limited, making high-throughput testing more challenging. These factors mean that testing a single parameter in a group of animals takes approximately 3–4 months. Despite these constraints, we are committed to continuing to refine this paradigm over time. With this manuscript, our main aim was to provide a detailed framework, initial parameters, and evidence for conditioned behavior in the head-fixed preparation. By doing so, we hope to facilitate the adoption of this paradigm by researchers interested in studying the neural correlates of learning and memory using multiphoton imaging and stimulation techniques. This approach enables investigations that are not possible in freely moving animals, while the presence of freezing as a conditioned response allows for direct comparisons to the extensive body of work done in freely moving paradigms. Moving forward, we anticipate that optimizing this paradigm and identifying the key parameters that drive learning will be a collaborative, community-led effort.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors set out to develop a contextual fear learning (CFC) paradigm in head-fixed mice that would produce freezing as the conditioned response. Typically, lick suppression is the conditioned response in such designs, but this (1) introduces a potential confounding influence of reward learning on neural assessments of aversion learning and (2) does not easily allow comparison of head-fixed studies with extensive previous work in freely moving animals, which use freezing as the primary conditioned response.

      The first part of this study is a report on the development and outcomes of 3 variations of the CFC paradigm in a virtual reality environment. The fundamental design is strong, with head-fixed mice required to run down a linear virtual track to obtain a water reward. Once trained, the water reward is no longer necessary and mice will navigate virtual reality environments. There are rigorous performance criteria to ensure that mice that make it to the experimental stage show very low levels of inactivity prior to fear conditioning. These criteria do result in only 40% of the mice making it to the experimental stage, but high rates of activity in the VR environment are crucial for detecting learning-related freezing. It is possible that further adjustments to the procedure could improve attrition rates.

      We acknowledge that further adjustments to the procedure could improve attrition rates, and we will continue to work on improving the paradigm.

      Paradigm versions 1 and 2 vary the familiarity of the control context while paradigm versions 2 and 3 vary the inter-shock interval. Paradigm version 1 is the most promising, showing the greatest increase in conditioned freezing (~40%) and good discrimination between contexts (delta ~15-20%). Paradigm version 2 showed no clear evidence of learning: average freezing at recall day 1 was not different from pre-shock freezing. First-lap freezing showed a difference, but this single-lap effect is not useful for many of the neural circuit questions this paradigm is meant to address. Also, the claim that mice extinguished first-lap freezing after 1 day is weak. Extinction is determined here by the loss of context discrimination, but this was not strong to begin with. First-lap freezing does not appear to be different between Recall Days 1 and 2, but this analysis was not done.

      This is an important point. Following reviewer suggestions, we have replotted our figures for all paradigms to show within-VR freezing (see Supplementary Figures 2, 5 and 6) as the appropriate method for quantifying fear extinction across days. Using an LME model (Tables 16), we quantify freezing during recall days against baseline freezing levels measured before fear conditioning within each VR. In Paradigm 2, while some fear discrimination persists across days, extinction does occur rapidly. After the first lap in the CFC VR, we observed no significant differences in freezing compared to the baseline. These results are shown in the revised Supplementary Figure 5, and the revised text is in lines 393-399.

      Paradigm version 3 has some promise, but the magnitude of the context discrimination is modest (~10% difference in freezing). Thus, further optimization of the VR CFC will be needed to achieve robust learning and extinction. This could include factors not thoroughly tested in this study, including context pre-exposure timing and duration and shock intensity and frequency.

      We acknowledge that many aspects of this paradigm still need optimization, as virtual reality CFC is in its early stages, and we have not explored all of the parameter space. We describe above the reasoning for this. However, for this revised version of the paper we have added new behavioral data (Supplementary Figure 2 C-E) showing that increasing shock intensities from 0.6 mA to 1 mA enhances freezing, both in the first lap and on average. There are of course many other parameters that are likely important, like the ones pointed out here by the reviewer, but exploring the entire parameter space will take many years and will likely require many labs. The purpose of this paper is to show that VR-CFC fundamentally works and is a starting point on which the field can build. We have now pointed out in the introduction (lines 54-58) and discussion (lines 730-737, 810-814) that there remains significant scope for improving this paradigm and optimizing parameters in the future.

      The second part of the study is a validation of the head-fixed CFC VR protocol through the demonstration that fear conditioning leads to the remapping of dorsal CA1 place fields, similar to that observed in freely moving subjects. The results support this aim and largely replicate previous findings in freely moving subjects. One notable difference from previous work is that VR CFC led to the remapping of the control environment, not just the conditioning context. The authors present several possible explanations for this lack of specificity to the shock context, underscoring the need for further refinement of the CFC protocol before it can be widely applied. While this experiment examined place cell remapping after fear conditioning, it did not attempt to link neural activity to the learned association or freezing behavior.

      This is an interesting observation. We think that the remapping observed in the control context likely occurred due to the absence of reward in a previously rewarded environment. Our prior work has demonstrated that removal of reward causes increased remapping (Krishnan et al., 2022, Krishnan and Sheffield, 2023). In other words, the continued presence of reward within an environment stabilizes CA1 place fields. The Moita et al. (2004) paper, which showed remapping only in the fear conditioned context and not in the control context, provided rats with food pellets throughout the experimental session in both the control and conditioned contexts, likely to increase the exploration necessary for identifying place cells. The presence of reward in the Moita et al. experiment could explain the minimal remapping observed in their control context compared to our control context, which lacked reward. Another possibility could lie in the differences in the intervals between place cell activity recordings in our study and that of Moita et al. While Moita et al. separated their recordings by just one hour, our recordings were separated by a full day, with a sleep period in between. The absence of sleep and the shorter time interval between conditioning and retrieval sessions in their study could explain the minimal remapping observed by Moita et al. compared to our findings. We have now addressed this discrepancy explicitly in lines 596-606.

      Although we agree with the reviewer that it would be informative to perform analysis of how neural activity correlates with freezing responses, we think this warrants its own stand-alone manuscript as the neural dynamics and methods to appropriately analyze them are complicated. We are in the midst of analyzing this data further and will present these findings in a separate publication.

      In summary, this is an important study that sets the initial parameters and neuronal validation needed to establish a head-fixed CFC paradigm that produces freezing behaviors. In the discussion, the authors note the limitations of this study, suggest the next steps in refinement, and point to several future directions using this protocol to significantly advance our understanding of the neural circuits of threat-related learning and behavior.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Krishnan et al devised three paradigms to perform contextual fear conditioning in head-fixed mice. Each of the paradigms relied on head-fixed mice running on a treadmill through virtual reality arenas. The authors tested the validity of three versions of the paradigms by using various parameters. As described below, I think there are several issues with the way the paradigms are designed and how the data are interpreted. Moreover, as Paradigm 3 was published previously in a study by the same group, it is unclear to me what this manuscript offers beyond the validations of parameters used for the previous publication. Below, I list my concerns point-by-point, which I believe need to be addressed to strengthen the manuscript.

      Major comments

      (1) In the analysis using the LME model (Tables 1 and 2), I am left wondering why the mice had increased freezing across recall days as well as increased generalization (increased freezing to the familiar context, where shock was never delivered). Wouldn't the authors expect freezing to decrease across recall days, since repeated exposure to the shock context should drive some extinction? This is complicated by the analysis showing that freezing was increased only on retrieval day 1 when analyzing data from the first lap only. Since reward (e.g., motivation to run) is removed during the conditioning and retrieval tests, I wonder if what the authors are observing is related to decreased motivation to perform the task (mice will just sit, immobile, not necessarily freezing per se). I think that these aspects need to be teased out.

      This is an important point and we agree teasing out a lack of motivation versus fearful freezing would be useful. To address the possibility that reduced motivation to run without reward could contribute to the observed freezing behavior, we have now included a no-shock control group in the revised manuscript (n = 7; Supplementary Figure 2A-B, H–I). These control mice experienced the same protocol, including the wearing of a tail coat, but did not receive any shocks. We observed no increases in freezing across days in these controls, confirming that the increased freezing in the Familiar context of our experimental group stems from fear conditioning rather than the removal of reward from a previously rewarded context. If reduced motivation from reward removal were the primary driver, similar freezing patterns would have emerged in the no-shock controls. We have added lines 248-261 in the revised manuscript, discussing this point, and we thank the reviewer for motivating us to do this experiment and analysis.

      That said, the precise mechanisms underlying the fear generalization observed in the nonconditioned context—particularly its emergence during later recall days—remain unclear. Studies in freely moving animals have shown that fear memories initially specific to the conditioned context can become generalized with repeated exposures, which may be occurring here (Biedenkapp & Rudy, 2007; Wiltgen & Silva, 2007). Alternatively, it is possible that the combination of fear conditioning and the removal of expected reward contributes to a delayed generalization effect. This may reflect a limitation of our approach, which relies on reward to motivate initial training. As noted by another reviewer, we have now addressed this potential drawback of reward-based training in the discussion (see lines 809-817). Clearly, unique factors specific to the head-fixed VR paradigm may contribute to this phenomenon. Understanding the mechanisms underlying fear generalization in the head-fixed VR CFC paradigm will be a valuable direction for future research.

      (2) Related to point 1, the authors actually point out that these changes could be due to the loss of the water reward. So, in line 304, is it appropriate to call this freezing? I think it will be very important for the authors to exactly define and delineate what they consider as freezing in this task, versus mice just simply sitting around, immobile, and taking a break from performing the task when they realize there is no reward at the end.

      As noted in point 1 above, we have added a no-shock control group (n = 7; Supplementary Figure 2A-B, H–I) to determine whether the observed freezing was driven by fear conditioning or by reduced motivation to run in the absence of reward. The absence of increased freezing in these controls supports the interpretation that the behavior in the conditioned group is fear-related. In future studies, incorporating additional physiological measures, such as heart rate monitoring, could further help distinguish fear-related freezing from other forms of immobility.

      (3) In the second paradigm, mice are exposed to both novel and (at the time before conditioning) neutral environments just before fear conditioning. There is a big chance that the mice are 'linking' the memories (Cai et al 2016) of the two contexts such that there is no difference in freezing in the shock context compared to the neutral context, which is what the authors observe (Lines 333-335). The experiment should be repeated such that exposure to the contexts does not occur on the conditioning day.

      This is an interesting idea. However, if memory linking were driving the observed freezing patterns, we would expect to see similarly reduced fear discrimination across all three paradigms, as mice experience both contexts sequentially in each case. Instead, this effect appears to be specific to Paradigm 2, suggesting it may be due to other factors. We agree it would be informative to eliminate pre-conditioning exposure to both environments, both to assess whether this improves fear discrimination and to help clarify the potential contribution of memory linking. This is something we plan to do in future studies that are beyond the scope of this initial paper on VR-CFC.

      (4) On lines 360-361, the authors conclude that extinction happens rapidly, within the first lap of the VR trial. To my understanding, that would mean that extinction would happen within the first 5-10 seconds of the test (according to Figure S1E). That seems far too fast for extinction to occur, as this never occurs in freely behaving mice this quickly.

      We agree with the reviewer that extinction in Paradigm 2 appears to occur relatively rapidly.

      However, the average time to complete the first lap in the fear-conditioned context in Paradigm 2 is 25.68 ± 5.55 seconds (as stated in line 384), indicating that extinction occurs within approximately the first 30 seconds of context exposure—not within 5–10 seconds. This is specific to Paradigm 2 and does not happen in either of the other paradigms, as shown in Supplementary Figure 4. For clarification, Figure S1E pertains to baseline running in Paradigm 1 and does not apply to Paradigm 2.

      As the reviewer points out, even at 30 seconds, extinction seems to be happening more quickly in Paradigm 2 than seen in freely moving setups. This may be due to a key structural difference in our setup. The VR-CFC task is organized into discrete trials, with mice being teleported back to the start after reaching the end of the virtual track. Completing a full lap without receiving a shock could serve as a clear signal that the threat is no longer present within the environment, as the completion of a lap means that the animals have surveyed all locations within the environment. This structure could accelerate extinction compared to freely moving setups, where animals take longer to explore their complete environment due to the lack of discrete trials. Although this is true for all our paradigms, the accelerated extinction seen in Paradigm 2 versus Paradigms 1 and 3 may be driven by other factors. As noted by the reviewers, other task parameters, such as context pre-exposure timing, shock intensity, and conditioning duration, are likely to play a role in shaping extinction dynamics. These factors warrant further investigation, and we plan to explore them in future studies to better understand the conditions influencing extinction in the VR-CFC paradigm.

      (5) Throughout the different paradigms, the authors are using different shock intensities. This can lead to differences in fear memory encoding as well as in levels of fear memory generalization. I don't think that comparisons can be made across the different paradigms as too many variables (including shock intensity - 0.5/0.6mA can be very different from 1.0 mA) are different. How can the authors pinpoint which works best? Indeed, they find Paradigm 3 'works' better than Paradigm 2 because mice discriminate better between the neutral and shock contexts. This can definitely be driven by decreased generalization from using a 0.6mA shock in Paradigm 3 compared to 1.0 mA shock in Paradigm 2.

      The reviewer brings up important points here. We have now added new data evaluating 0.6 mA shocks in Paradigm 1 (Supplementary Figure 2A–E, n=8). These data show that 1.0 mA shocks produced stronger conditioned responses and greater fear discrimination compared to 0.6 mA. Our goal in Paradigm 3 was to begin with a lower shock intensity and assess whether additional modifications—specifically the shorter ISI and retention of the tail-coat during recall—could enhance fear conditioning. Surprisingly, despite the weaker shock intensity, Paradigm 3 resulted in improved discrimination and freezing behavior relative to Paradigm 2. We have now clarified this point in the manuscript (lines 466-470), and we interpret this outcome as evidence that the shorter ISIs and contextual cue continuity (tail-coat) likely play a more significant role in enhancing learning and recall. However, as noted in the text (lines 511-514), further testing is needed to determine the individual contributions of each parameter to successful VR-CFC. Fully optimizing the parameter settings will take additional time and resources, and we aim to continually refine the parameter space in the future, as has been done over the years for freely moving animals.

      (6) There are some differences in the calcium imaging dataset compared to other studies, and the authors should perform additional testing to determine why. This will be integral to validating their head-fixed paradigm(s) and showing they are useful for modeling circuit dynamics/behaviors observed in freely behaving mice. Moreover, the sample size (number of mice) seems low.

      The one notable difference between our imaging study and that done in freely moving animals is that we observed remapping of place cells in the control context. In contrast, Moita et al. (2004) reported more stable place fields in the control context. A key distinction is that their study included rewards in the control context, which may have contributed to the spatial stability. We now discuss this difference in the manuscript (lines 599-605).

      It should be noted that there are many key distinctions among paradigms that study neural activity during fear conditioning in freely moving animals. These include varying exposure times to environments (1–6 days), the time interval between neural activity recordings, and the use of food rewards during the experiment stages in freely moving animals to encourage exploration for place cell identification. Although freely moving paradigms that investigate fear conditioning and place cells are heterogeneous, we were encouraged by the replication of several key findings. This validates VR-based CFC as a viable tool for neural circuit investigations. While future work will include more thorough analyses, our current findings demonstrate the paradigm's effectiveness for modeling circuit dynamics and behavior. We have now expanded our dataset to include four additional mice, which further corroborate these original findings.

      (7) It appears that the authors have already published a paper using Paradigm 3 (Ratigan et al 2023). If they already found a paradigm that is published and works, it is unclear to me what the current manuscript offers beyond that initial manuscript.

      The reviewer is correct that we have published a paper using Paradigm 3. However, this manuscript goes beyond that one and provides a much more comprehensive description and fundamental analysis of the behavior and experimental parameters regarding VR-CFC, allowing the research community to adapt our paradigm reproducibly. While Ratigan et al. (2023) offered only a minimal description of behavior and included just Paradigm 3, we present two additional paradigms along with neuronal validation using hippocampal place cells. We have now explicitly stated this in the introduction (lines 50-55).

      (8) As written, the manuscript is really difficult to follow because of the averages and standard errors reported throughout the text. This reporting was also heterogeneous: sometimes values were reported and other times they were not. Cleaning up this reporting throughout the paper would greatly improve the flow of the text and the qualitative description of the results.

      We completely agree with this point and have now cleaned up the text, leaving details only in a few places we felt were important.

      Reviewer #3 (Public review):

      Summary:

      Krishnan et al. present a novel contextual fear conditioning (CFC) paradigm using a virtual reality (VR) apparatus to evaluate whether conditioned context-induced freezing can be elicited in head-fixed mice. By combining this approach with two-photon imaging, the authors aim to provide high-resolution insights into the neural mechanisms underlying learning, memory, and fear. Their experiments demonstrate that head-fixed mice can discriminate between threat and non-threat contexts, exhibit fear-related behavior in VR, and show context-dependent variability during extinction. Supplemental analyses further explore alternative behaviors and the influence of experimental parameters, while hippocampal neuron remapping is tracked throughout the experiments, showcasing the paradigm's potential for studying memory formation and extinction processes.

      Strengths:

      Methodological Innovation: The integration of a VR-based CFC paradigm with real-time two-photon imaging offers a powerful, high-resolution tool for investigating the neural circuits underlying fear, learning, and memory.

      Versatility and Utility: The paradigm provides a controlled and reproducible environment for studying contextual fear learning, addressing challenges associated with freely moving paradigms.

      Potential for Broader Applications: By demonstrating hippocampal neuron remapping during fear learning and extinction, the study highlights the paradigm's utility for exploring memory dynamics, providing a strong foundation for future studies in behavioral neuroscience.

      Comprehensive Data Presentation: The inclusion of supplemental figures and behavioral analyses (e.g., licking behaviors and variability in extinction) strengthens the manuscript by addressing additional dimensions of the experimental outcomes.

      Weaknesses:

      Characterization of Freezing Behavior: The evidence supporting freezing behavior as the primary defensive response in VR is unclear. Supplementary videos suggest the observed behaviors may include avoidance-like actions (e.g., backing away or stopping locomotion) rather than true freezing. Additional physiological measurements, such as EMG or heart rate, are necessary to substantiate the claim that freezing is elicited in the paradigm.

      To strengthen our claim that freezing is a conditioned response in this task, we have taken three key steps:

      (1) We adjusted our freezing detection threshold from 1 cm/s to near 0 cm/s to capture only periods where the animal is virtually motionless on the treadmill. We validated this approach in Figure 2, particularly in the zoomed-in track position trace in Figure 2A, which clearly shows that the identified freezing epochs correspond to no change in track position. All analyses and figures have been updated to reflect this more stringent threshold.

      (2) We have added a no-shock control group in the revised manuscript (n = 7; Supplementary Figure 2A-B, H–I) where mice experienced the same protocol, including wearing a tail-coat, but received no shocks. These mice showed no increases in freezing behavior, which further demonstrates that the increased freezing we observe is a result of fear conditioning.

      (3) We have added a new supplementary video (Supplementary Video 2) that better illustrates the freezing behavior in our task.

      That said, we fully agree with the reviewer that freezing is not the only defensive response observed. Other behaviors unique to our treadmill-based paradigm, such as hesitation, backward movement, and slowing down, also emerge. We chose to focus on freezing in this manuscript to align with convention in freely moving fear conditioning studies and to facilitate direct comparisons. We agree that additional physiological measurements (e.g., EMG or heart rate) would provide further validation and could help distinguish between different forms of defensive responses. We view this as an important future direction and plan to incorporate such measures in upcoming studies. We highlight this in the results section (lines 175-179, 262-268) and in the discussion (lines 739-750).

      Analysis of Extinction: Extinction dynamics are only analyzed through between-group comparisons within each Recall day, without addressing within-group changes in behavior across days. Statistical comparisons within groups would provide a more robust demonstration of extinction processes.

      This is an important distinction and we have now added figures (Supplementary Figures 2H-I, 5C-D, 6C-D) showing within-VR behavior across Recall days, along with statistical comparisons and a description of the extinction process based on these results.

      Low Sample Sizes: Paradigm 1 includes conditions with very low sample sizes (N=1-3), limiting the reliability of statistical comparisons regarding the effects of shock number and intensity.

      Increasing sample sizes or excluding data from mice that do not match the conditions used in Paradigms 2 and 3 would improve the rigor of the analysis.

      While we included all conditions in Figure 2 for completeness, we have separated these conditions in Supplementary Figure 2 to ensure clarity. This allows researchers interested in this paradigm to see the approximate range of conditioned responses observed across different parameters. When comparing Paradigm 1 with Paradigms 2 and 3, we have only used data from the 1 mA, 6 shocks condition.

      Potential Confound of Water Reward: The authors critique the use of reward in conjunction with fear conditioning in prior studies but do not fully address the potential confound introduced by using water reward during the training phase in their own paradigm.

      We agree this is a point that needs discussion. We have now noted the limitation of using water rewards during training in the discussion section, particularly its effect on the animal’s motivation in the long term and on place cell activity (lines 814-820).

      Recommendations for the authors

      Reviewer #1 (Recommendations for the authors):

      I suggest changing "3 paradigms" to "3 versions of a CFC paradigm," as the paradigm is fundamentally the same, but parameters were adjusted towards finding an optimal protocol.

      We have changed this phrasing where applicable.

      Figure S2: There appear to be different sets of shock parameters for different mice, most with an n of 1 or 2. This is not reliable for making a decision for optimal shock parameters and should not be discussed in that way until a full-powered comparison is completed. Also, the N adds up to 19, yet only 18 are described as being included in the study.

      We thank the reviewer for this important point. We agree that the current study is not powered to definitively identify optimal parameter settings. We have been careful not to interpret it in that way in the text. Rather, we adopted a commonly used starting point from the freely moving literature—1 mA with six shocks—as our initial condition (lines 196-199). To provide context for others interested in pursuing this work, we have presented a range of conditioned responses from different parameter combinations to illustrate potential variability. In most cases, these data are intended for illustrative purposes only and are not meant to support firm conclusions. We agree that a systematic and fully powered investigation of each parameter would be highly valuable, and we plan to pursue this in future work (and hope other labs contribute to this goal, too), much like the iterative optimizations performed in freely moving paradigms over time.

      We thank the reviewer for catching the sample size discrepancy and have now corrected it.

      The number of animals for the no-shock condition should be included.

      Thank you. We have now included this.

      A possible explanation for the lower fear and poorer discrimination in versions 2 and 3 could be that 10 min pre-exposure to the CFC context on day -1 led to latent inhibition. Shorter (or eliminated) pre-exposure may improve outcomes.

      We agree that the exposure time is a parameter that we should explore. We have highlighted this in the discussion (lines 729-736) as a parameter that is worth testing in the future.

      For analysis of extinction, it is best to establish this within condition - is freezing to the CFC context significantly reduced compared with initial recall and similar to pre-training freezing? By using discrimination as your index of extinction, increases in control context freezing/inactivity can eliminate context discrimination without the conditioned response of freezing actually undergoing extinction.

      This is a good point, and we have now included analysis and conclusions based on a within-VR comparison for the analysis of fear extinction (Supplementary Figures 2H-I, 5C-D, 6C-D).

      Reviewer #3 (Recommendations for the authors):

      Clarification of Treadmill Shape: The manuscript describes the treadmill as "spherical" throughout. However, based on representative images and videos, the treadmill appears cylindrical. This discrepancy should be clarified to ensure consistency between the text and visuals.

      The reviewer is correct that the treadmill is cylindrical, and this was an error on our part. We have corrected it throughout.

      Figure and Legend Labeling: To improve clarity, all figures and their legends should be explicitly labeled with the corresponding paradigm (1, 2, or 3) to facilitate interpretation.

      We have now added a label on all figures that clarifies which Paradigm the figures are referring to. We have also explicitly added this to the figure legends.

      Objective Language: Subjective language, such as "since we wanted animals to" (Line 850), should be revised to reflect an objective tone (e.g., "to allow animals to"). Similarly, phrases like "We believe" (Line 896) should be avoided to maintain an unbiased presentation.

      We have removed subjective language from our text.

      Placement of Future Directions: Speculations on future experimental plans, such as the use of sex as a biological variable (Lines 895-903), should be included in the Discussion section rather than the Methods. Additionally, remarks about the responsiveness of female mice to tail shocks should be moved to the main text for proper contextualization.

      We have moved these lines as suggested by the reviewer.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Guo and colleagues used a cell rounding assay to screen a library of compounds for inhibition of TcdB, an important toxin produced by Clostridioides difficile. Caffeic acid and derivatives were identified as promising leads, and caffeic acid phenethyl ester (CAPE) was further investigated.

      Strengths:

      Considering the high morbidity rate associated with C. difficile infections (CDI), this manuscript presents valuable research in the investigation of novel therapeutics to combat this pressing issue. Given the rising antibiotic resistance in CDI, the significance of this work is particularly noteworthy. The authors employed a robust set of methods and confirmatory tests, which strengthen the validity of the findings. The explanations provided are clear, and the scientific rationale behind the results is well-articulated. The manuscript is extremely well written and organized. There is a clear flow in the description of the experiments performed. Also, the authors have investigated the effects of CAPE on TcdB in careful detail, and reported compelling evidence that this is a meaningful and potentially useful metabolite for further studies.

      Weaknesses:

      The authors have made some changes in the revised version. However, many of the changes were superficial, and some concerns still need to be addressed. Important details are still missing from the description of some experiments. Authors should carefully revise the manuscript to ascertain that all details that could affect interpretation of their results are presented clearly. For instance, authors still need to include details of how the metabolomics analyses were performed. Just stating that samples were "frozen for metabolomics analyses" is not enough. Was this mass-spec or NMR-based metabolomics? Assuming it was mass-spec, what kind? How was metabolite identity assigned, etc.? These are important details, which need to be included. Even in cases where additional information was included, the authors did not discuss how the specific way in which certain experiments were performed could affect interpretation of their results. One example is the potential for compound carryover in their experiments. Another important one is the fact that CAPE affects bacterial growth and sporulation. Therefore, it is critical that authors acknowledge that they cannot discard the possibility that other factors besides compound interactions with the toxin are involved in their phenotypes. As stated previously, authors should also be careful when drawing conclusions from the analysis of microbiota composition data, and changes to the manuscript should be made to reflect this. Ascribing causality to correlational relationships is a recurring issue in the microbiome field. Again, I suggest authors carefully revise the manuscript and tone down some statements about the impact of CAPE treatment on the gut microbiota.

      Thanks for your constructive suggestion. We have carefully revised the manuscript according to your suggestions.

      Reviewer #2 (Public review):

      I appreciate the authors' responses to my original review. This is a comprehensive analysis of CAPE on C. difficile activity. It seems like this compound affects all aspects of C. difficile, which could make it effective during infection but also make it difficult to understand the mechanism. Even considering the authors' responses, I think it is critical for the authors to revise the conclusions regarding the infection model. There is some protection from disease by CAPE, but some parameters are not substantially changed. For instance, weight loss is not significantly different in the C. difficile only group versus the C. difficile + CAPE group. Histology analysis still shows a substantial amount of pathology in the C. difficile + CAPE group. This should be discussed more thoroughly using precise language.

      Thanks for your constructive suggestion. We have carefully revised the manuscript according to your suggestions.

      Reviewer #3 (Public review):

      Summary:

      The study is well written, and the results are solid and well demonstrated. It shows a field that can be explored for the treatment of CDI.

      Strengths:

      Results are really good, and CAPE appears to be a good and promising alternative for treating CDI.

      Weaknesses:

      Some references are too old or missing.

      Comments on revisions:

      I have read your study after comments made by all referees, and I noticed that all questions and suggestions addressed to the authors were answered and well explained. Some of the minor and major issues related to the article were also solved. I am satisfied with all the effort given by the authors to improve their manuscript.

      Thanks again for your review.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The legend of Figure 3SB is incorrect. It should read "Growth curves of C. difficile BAA-1870 in the presence of varying concentrations of CAPE (0-64 µg/mL)". Also, there is something wrong with the symbols in this figure. I suspect what is happening is that the symbols for the concentrations of 32 and 64 µg/mL are superimposing, but this is a problem because the lower line looks like a closed circle, which is supposed to represent the condition where no CAPE was added. The authors should change the symbols to allow clear distinction between each of the conditions.

      Thanks for your constructive suggestion. We have modified the panel and figure legend in Figure 3SB. The growth curves at 32 μg/mL and 64 μg/mL are nearly identical, which makes it challenging to differentiate between the corresponding data points on the graph. To enhance clarity, we have used distinct colors to help distinguish these closely overlapping lines as effectively as possible.

      Since the authors observed a significant effect of CAPE on both bacterial growth and spore production, their discussion and conclusions need to reflect the fact that the effects observed can no longer be attributed solely to toxin inhibition.

      Thanks for your comments. We have modified the corresponding description according to your suggestions.

      In lines 43-45, authors state that "CAPE treatment of C. difficile-challenged mice induces a remarkable increase in the diversity and composition of the gut microbiota (e.g., Bacteroides spp.)". It is still unclear to this reviewer why Bacteroides is mentioned in parentheses. Does this mean that there was an increase in the abundance of Bacteroides? If that is the case, this needs to be stated more clearly.

      Thanks for your comments. Treatment with CAPE indeed significantly increased the abundance of Bacteroides spp. in the gut microbiota (Figure 7H-J). However, to avoid ambiguity in the abstract, we have chosen to delete the specific mention of Bacteroides spp. within the parentheses.

      The modifications made to lines 132-135 still do not address my concern. Authors stated in the manuscript that "compounds that were not bound to TcdB were removed". But how was this done? This needs to be clearly explained in the manuscript. In the response to reviewers document, authors state that this was done through centrifugation. But given that the goal here is to separate excess small molecule from a protein target, just stating that centrifugation was used is not enough. Did the authors use ultracentrifugation? What were the conditions employed? This is critical so that the reader can assess the degree of compound carryover that may have occurred. Also, authors need to clearly acknowledge the caveats of their experimental design by stating that they cannot rule out the contribution of compound carryover to their results.

      Thanks for your comments. We used centrifugal ultrafiltration to remove the unbound small-molecule compounds. Because TcdB has a large molecular weight (approximately 270 kDa), we selected an ultrafiltration membrane with a 100 kDa molecular weight cutoff. Centrifugation was performed at 4000 × g for 5 min to remove the compounds that did not bind to TcdB. We have added the relevant methods and discussed their potential impact in the respective sections of the manuscript.

      In line 142, authors added the molar concentration of caffeic acid, as requested. Although this helps, it is even more important that molar concentrations are added every time a compound concentration is mentioned. For instance, just 2 lines down there is another mention of a compound concentration. It would be informative if authors also added molar concentrations here and throughout the manuscript.

      Thanks for your comments. In our initial experimental design, we used μg/mL as the concentration unit. However, when the concentrations used in the dilution series are converted to μM, some values are not neat, whole numbers. For instance, 32 μg/mL of caffeic acid phenethyl ester corresponds to 112.55 μM, which appears somewhat irregular when expressed in this manner.
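      For reference, the μg/mL-to-μM conversion discussed above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a CAPE molar mass of 284.31 g/mol (C17H16O4), a value not stated in the response, and the helper name and the two-fold series below 32 μg/mL are hypothetical (only the 0-64 μg/mL range and the 32 and 64 μg/mL points are named in the text).

```python
# Minimal sketch: convert a mass concentration (ug/mL) to a molar concentration (uM).
# ASSUMPTION: molar mass of CAPE (C17H16O4) taken as 284.31 g/mol; not given in the source.
CAPE_MOLAR_MASS_G_PER_MOL = 284.31


def ug_per_ml_to_um(conc_ug_per_ml: float, molar_mass_g_per_mol: float) -> float:
    """Hypothetical helper: ug/mL -> umol/L (uM).

    1 ug/mL equals 1 mg/L; dividing mg/L by g/mol gives mmol/L,
    and multiplying by 1000 converts mmol/L to umol/L.
    """
    return conc_ug_per_ml / molar_mass_g_per_mol * 1000.0


if __name__ == "__main__":
    # Illustrative two-fold series spanning the 0-64 ug/mL range (intermediate steps assumed)
    for conc in (0, 2, 4, 8, 16, 32, 64):
        um = ug_per_ml_to_um(conc, CAPE_MOLAR_MASS_G_PER_MOL)
        print(f"{conc:>3} ug/mL CAPE ~ {um:.2f} uM")
```

      With this assumed molar mass, 32 μg/mL works out to about 112.55 μM, consistent with the value quoted in the response; reporting both units side by side is one way to keep the original μg/mL design while giving readers molar values.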

      Line 277. For the sake of clarity, I would strongly suggest that authors use the term "control mice" instead of "model mice".

      Thanks for your comments. We have modified “model mice” to “control mice” throughout the manuscript.

      In line 302, the word taxa should not be capitalized. I capitalized it in my original comments simply to draw attention to it.

      Thanks for your comments. We have modified this word.

      In the section starting in line 318, authors still need to include details of how the metabolomics analyses were performed. Just stating that samples were "frozen for metabolomics analyses" is not enough. Was this mass-spec or NMR-based metabolomics. Assuming it was mass-spec, what kind? How was metabolite identity assigned? Etc, etc. These are important details, which need to be included.

      Thanks for your comments. We have added details of the metabolomics methods to the corresponding section.

      In line 338, the authors misunderstood my original comment. This sentence should read "...the final product of purine degradation, were markedly decreased in mice after...".

      Thanks for your comments. We have modified this sentence.

      Panels of figure 3 are still incorrectly labeled. The secondary structure predictions are shown in A and C, not A and B as is currently stated in the legend.

      Thanks for your comments. We have modified the figure legend in Figure 3.

      About Figure 5C, I thank the authors for the clarification, but this explanation should be included in the figure legend.

      Thanks for your comments. We have added the relevant information to the figure legend.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility, and clarity

      The work by Pinon et al. describes the generation of a microvascular model to study Neisseria meningitidis interactions with blood vessels. The model uses a novel and relatively high-throughput fabrication method that allows full control over the geometry of the vessels. The model is well characterized. The authors then study different aspects of Neisseria-endothelial interactions and benchmark the bacterial infection model against the best disease model available, a human skin xenograft mouse model, which is one of the great strengths of the paper. The authors show that Neisseria binds to the 3D model in a similar geometry as in the animal xenograft model, induces an increase in permeability shortly after bacterial perfusion, and induces endothelial cytoskeleton rearrangements. Finally, the authors show neutrophil recruitment to bacterial microcolonies and phagocytosis of Neisseria. The article is overall well written and is a great advancement in the bioengineering and sepsis infection field, and I only have a few major and some minor comments.

      Major comments:

      Infection-on-chip. I would recommend that the authors change the terminology of "infection on chip" to better reflect their work. The term is vague and it decreases novelty, as there are multiple infection-on-chip models that recapitulate other infections (recently reviewed in https://doi.org/10.1038/s41564-024-01645-6), including Ebola, SARS-CoV-2, Plasmodium and Candida. Maybe the term "sepsis on chip" would be more specific and better exemplify the work and novelty. Also, I would suggest that the authors carefully check the text for where they use VoC or the current term IoC, as at present they are sometimes used interchangeably, with VoC occasionally used in bacteria-perfused experiments.

      We thank Reviewer #1 for this suggestion. Indeed, we have chosen to replace the term "Infection-on-Chip" by "infected Vessel-on-chip" to avoid any confusion in the title and the text. Also, we have removed all the terms "IoC" which referred to "Infection-on-Chip" and replaced with "VoC" for "Vessel-on-Chip". We think these terms will improve the clarity of the main text.

      Fig 3 and Supplementary 3: Permeability. The authors suggest that early (3 h) infection with Neisseria does not increase vascular permeability in the animal model, contrary to their findings in the 3D in vitro model. However, they show a non-significant increase in permeability to 70 kDa Dextran in the early infection of the animal xenograft. This suggests that if the experiment had been done with a lower molecular weight tracer, significant increases in permeability could have been detected. I would suggest doing this experiment, which could capture early events in vascular disruption.

      Comparing permeability under healthy and infected conditions using Dextran smaller than 70 kDa is challenging. Previous research [1] has shown that molecules below 70 kDa already diffuse freely in healthy tissue. Given this high baseline diffusion, we believe that no significant difference would be observed before and after N. meningitidis infection, and these experiments were not carried out. As discussed in the manuscript, bacteria-induced permeability in mice occurs at later time points, 16 h post infection, as shown previously [2], and this difference between the xenograft model and the chip likely reflects the absence in the chip of various cell types present in the tissue parenchyma.

      The authors show the formation of an actin honeycomb structure beneath the bacterial microcolonies. This only occurred in 65% of the microcolonies. Is this result similar in in vitro 2D endothelial cultures, in static conditions and under flow? Also, the group has shown in the past positive staining of other cytoskeletal proteins, such as ezrin in the ERM complex. Does this also occur in the 3D system?

      We thank Reviewer #1 for this suggestion.
      - Following this recommendation, we imaged monolayers of endothelial cells in the flat regions of the chip (the two lateral channels) using the same microscopy conditions (i.e., Obj. 40X N.A. 1.05) that were used to detect honeycomb structures in the 3D vessels in vitro. More than 56% of infected cells present these honeycomb structures in 2D, which is 13% less than in 3D, a difference that is not significant given the distributions of both populations. Thus, we conclude that under both in vitro conditions, 2D and 3D, the proportion of infected cells exhibiting cortical plaques is similar. We have added the graph and the confocal images in Figure S4B and lines 418-419 of the revised manuscript.
      - We recently performed ezrin staining in the chip and imaged both the 3D and 2D regions. Although ezrin staining was visible in 3D (Fig. 1 of this response), it was not as obvious as other markers under these infected conditions, and we did not include it in the main text. Interpretation of this result is not straightforward since, for instance, the substrate of the cells is different, and it would require further studies on the behaviour of ERM proteins in these different contexts.

      One of the most novel aspects of the manuscript is the use of a relatively quick photoablation system. I would suggest that the authors add a more extensive description of the protocol in the methods. Could this technique be applied in other laboratories? If this is a major limitation, it should be listed in the discussion.

      Following the Reviewer's comment, we introduced more detailed explanations regarding the photoablation:
      - L157-163 (Results): "Briefly, the chosen design is digitalized into a list of positions to ablate. A pulsed UV-LASER beam is injected into the microscope and shaped to cover the back aperture of the objective. The laser is then focused on each position that needs ablation. After introducing endothelial cells (HUVEC) in the carved regions,.."
      - L512-516 (Discussion): "The speed capabilities drastically improve with the pulsing repetition rate. Given that our laser source emits pulses at 10kHz, as compared to other photoablation lasers with repetitions around 100 Hz, our solution could potentially gain a factor of 100. Also,..."
      - L1082-1087 (Materials and Methods): "…, and imported in a python code. The control of the various elements is embedded and checked for this specific set of hardware. The code is available upon request."

      Adding these three paragraphs gives more details on how photoablation works thus improving the manuscript.

      Minor comments:

      Supplementary Fig 2. The reference to subpanels H and I is swapped.

      The references to subpanels H and I have been corrected in the revised version.

      Line 203: I would suggest deleting this sentence. Although a strength of the submitted paper is the direct comparison of the VoC model with the animal model to better replicate Neisseria infection, a direct comparison with animal permeability is not needed in all vascular engineering papers, as vascular permeability measurements in animals have been well established in the past.

      The sentence "While previously developed VoC platforms aimed at replicating physiological permeability properties, they often lack direct comparisons with in vivo values." has been removed from the revised text.

      Fig 3: Bacteria binding experiments. I would suggest adding more methodological information in the main results text to ensure a correct interpretation of the experiment. First, it would be better to report wall shear stress rather than flow rate in the main text, as flow rate depends on the geometry of the vessel being used. Second, for how long was Neisseria perfused in the binding experiment used to quantify colony doubling or elongation? As per figure 1C, I would guess than 100 min, but it would be better if this information is directly given to the readers.

      We thank Reviewer #1 for these two suggestions that will improve the text clarity (e.g., L316). (i) We now express the flow rate in terms of shear stress. (ii) We have also normalized the quantification of the colony doubling time to the first time point at which a single bacterium is attached to the vessel wall. Thus, early-adhering bacteria are represented by a longer curve and late-adhering bacteria by a shorter curve. In total, the experiment lasted 3 hours (modifications appear in L318 and L321-326).
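      To illustrate why the reviewer asks for wall shear stress rather than flow rate, the sketch below uses the textbook Poiseuille relation for steady, fully developed laminar flow in a straight circular tube, τ_w = 4μQ/(πR³). This is a simplified assumption: the photoablated vessels are not necessarily straight cylinders, and the authors used COMSOL simulations for their actual geometries. The viscosity and vessel radii are hypothetical; the 1 μl/min flow rate is taken from the 0.7-1 μl/min range mentioned for the neutrophil perfusion.

```python
import math

# 1 uL = 1e-9 m^3, so 1 uL/min = 1e-9 / 60 m^3/s
UL_PER_MIN_TO_M3_PER_S = 1e-9 / 60.0


def poiseuille_wall_shear_stress(flow_rate_m3_s: float, radius_m: float, viscosity_pa_s: float) -> float:
    """Wall shear stress (Pa) for steady laminar flow in a straight circular tube:
    tau_w = 4 * mu * Q / (pi * R**3). Simplifying assumption, not the authors' model."""
    return 4.0 * viscosity_pa_s * flow_rate_m3_s / (math.pi * radius_m ** 3)


if __name__ == "__main__":
    mu = 1e-3  # ASSUMED medium viscosity (Pa*s), roughly that of water
    q = 1.0 * UL_PER_MIN_TO_M3_PER_S  # 1 uL/min
    for radius_um in (25, 50, 100):  # HYPOTHETICAL vessel radii
        tau = poiseuille_wall_shear_stress(q, radius_um * 1e-6, mu)
        # 1 Pa = 10 dyn/cm^2
        print(f"R = {radius_um} um -> tau_w = {tau:.3f} Pa ({tau * 10:.2f} dyn/cm^2)")
```

      Because τ_w scales with 1/R³, the same flow rate gives an eight-fold higher wall shear stress when the radius is halved, which is why a flow rate alone does not tell the reader what the endothelium experiences.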

      Fig 4: The honeycomb structure is not visible in the 3D rendering of panel D. I would recommend showing the actin staining in the absence of Neisseria staining as well.

      Following this suggestion, a zoom of the 3D rendering of the cortical plaque without the colony has been added to Figure 4 of the revised manuscript.

      Line 421: E-selectin is referred to as CD62E in this sentence. I would suggest using the same terminology everywhere.

      We have replaced the "CD62E" term with "E-selectin" to improve clarity.

      Line 508: "This difference is most likely associated with the presence of other cell types in the in vivo tissues and the onset of intravascular coagulation". Do the authors refer to the presence of perivascular cells, pericytes or fibroblasts? If so, it would be good to mention them, as well as that future iterations of the model could include these cell types.

      By "other cell types", we refer to pericytes [3], fibroblasts [4], and perivascular macrophages [5], which surround endothelial cells and contribute to vessel stability. The main text was modified to include this information (Lines 548 and 555-570), and their potential roles during infection are discussed.

      Discussion: The discussion covers very well the advantages of the model over in vitro 2D endothelial models and the animal xenograft but fails to include limitations. This would include the choice of HUVEC cells, an umbilical vein cell line to study microcirculation, the lack of perivascular cells or limitations on the fabrication technique regarding application in other labs (if any).

      We thank Reviewer #1 for this suggestion. Indeed, our manuscript did not sufficiently discuss its limitations, and adding them to the text will help improve it:
      - The perspectives of our model include introducing perivascular cells surrounding the vessel and fibroblasts into the collagen gel, as discussed previously and added to the discussion (L555-570).
      - We chose HUVEC cells to recapitulate the characteristics of venules, which show key features such as the overexpression of CD62E and the adhesion of neutrophils during inflammation. Using microvascular endothelial cells originating from different tissues would be very interesting. This possibility is now mentioned in the discussion, lines 567-568.
      - Photoablation is a homemade fabrication technique that can be implemented in any lab equipped with an epifluorescence microscope. This method is described in more detail in the revised manuscript (L1085-1087).

      Line 576: The authors state that the model could be applied to other systemic infections but failed to mention that some infections have already been modelled in 3D bioengineered vascular models (examples found in https://doi.org/10.1038/s41564-024-01645-6). This includes a capillary photoablated vascular model to study malaria (DOI: 10.1126/sciadv.aay724).

      These two important references have been introduced in the main text (L84, 647, 648).

      Line 1213: Are the 6M neutrophils in 10 μl perfused under flow? Also, I would suggest rewriting this sentence in the following line: "After, the flow has been then added to the system at 0.7-1 μl/min."

      We now specified that neutrophils are circulated in the chip under flow conditions, lines 1321-1322.

      Significance

      The manuscript is comprehensive, complete and represents the first bioengineered model of sepsis. One of its major strengths is the careful characterization and benchmarking against the animal xenograft model. Its main limitations are the brief description of the photoablation methodology and the need for more clarity in the description of the bacteria perfusion experiments, given their complexity. The manuscript will be of interest to the general infection community, and to the tissue engineering community if more details on fabrication methods are included. My expertise is in infection bioengineered models.

      Reviewer #2

      Evidence, reproducibility, and clarity

      Summary

      The authors develop a Vessel-on-Chip model, which has geometrical and physical properties similar to the murine vessels used in the study of systemic infections. The vessel was created via highly controllable laser photoablation in a collagen matrix, subsequent seeding of human endothelial cells and flow perfusion to induce mechanical cues. This vessel could be infected with Neisseria meningitidis, as a model of systemic infection. In this model, microcolony formation and dynamics, and effects on the host were very similar to those described for the human skin xenograft mouse, which is the current gold standard for these studies, and were consistent with observations made in patients. The model could also recapitulate the neutrophil response upon N. meningitidis systemic infection.

      Major comments:

      I have no major comments. The claims and the conclusions are supported by the data, the methods are properly presented and the data are analyzed adequately. Furthermore, I would like to propose an optional experiment that could improve the manuscript. In the discussion it is stated that the vascular geometry might contribute to bacterial colonization in areas of lower velocity. It would be interesting to recapitulate this experimentally. It is of course optional, but it would be of great interest, since this is something that can only be proven in the organ-on-chip (where flow speed can be tuned) and not as much in animal models. Besides, it would increase impact, demonstrating the superiority of the chip in this area rather than proving it to be equal to current models.

      We have conducted additional experiments on infection in different vascular geometries and have added these results to Figures 3 and S3 and lines 288-305. We compared shear stress levels determined by COMSOL simulation with experimentally determined bacterial adhesion sites. Under the conditions used, the range of shear stress generated by the tested geometries does not appear to change the efficiency of bacterial adhesion. These results are consistent with a previous study from our group showing that, in this range of shear stresses, the effect on adhesion is limited [6]. Furthermore, qualitative observations in the animal model indicate that bacteria do not have an obvious preference in terms of binding site.

      Minor comments:

      I have a series of suggestions which, in my opinion, would improve the discussion. They are further elaborated in the following section, in the context of the limitations.

      • How can the vessels be recapitulated in the context of a specific organ or tissue? If the pathogen is often found in the luminal space of other organs after disseminating from the blood, how can this process be recapitulated with this model, if at all?

      • For reasons that are not fully understood, postmortem histological studies reveal bacteria only inside blood vessels but rarely, if ever, in the organ parenchyma. The presence of intravascular bacteria could nevertheless alter cells in the tissue parenchyma. The notable exception is the brain, where bacteria exit the vascular lumen to access the cerebrospinal fluid. The chip we describe is fully adapted to develop a blood-brain barrier model and more specific organ environments. This implies the addition of more cell types in the hydrogel. A paragraph on this topic has been added (Lines 548 and 552-570).

      • Similarly, could other immune responses related to systemic infection be recapitulated? The authors could discuss the potential of including other immune cells that might be found in the interstitial space, for example.

      • This important discussion point has been added to the manuscript (L623-636). As suggested by Reviewer #2, other immune cells respond to N. meningitidis and can be explored using our model. For instance, macrophages and dendritic cells are activated upon N. meningitidis infection, eliminate the bacteria through phagocytosis, and produce pro-inflammatory cytokines and chemokines, potentially activating lymphocytes [7]. Although complex, such an immune response would be interesting to study in our model, as skin-xenograft mice are deprived of B and T lymphocytes to ensure acceptance of human skin grafts.

      • A minor correction: in line 467 it should probably be "aspects" instead of "aspect", and the authors could consider rephrasing that sentence slightly for increased clarity.

      • We have corrected the sentence with "we demonstrated that our VoC strongly replicates key aspects of the in vivo human skin xenograft mouse model, the gold standard for studying meningococcal disease under physiological conditions." in lines 499-503.

      Strengths and limitations

      The most important strength of this manuscript is the technology they developed to build this model, which is impressive and very innovative. The Vessel-on-Chip can be tuned to acquire complex shapes and, according to the authors, the process has been optimized to produce models very quickly. This is a great advancement compared with the technologies used to produce other equivalent models. This model proves to be equivalent to the most advanced model used to date, but allows microscopy to be performed with higher resolution and greater ease, which can in turn allow more complex and precise image-based analysis. However, the authors do not seem to present any new mechanistic insights obtained using this model. All the findings obtained in the infection-on-chip demonstrate that the model is equivalent to the human skin xenograft mouse model and can offer superior resolution for microscopy. However, the advantages of the model do not seem to be exploited to obtain more insights into the pathogenicity mechanisms of N. meningitidis, host-pathogen interactions or potential applications in the discovery of treatments. For example, experiments to elucidate the role of certain N. meningitidis genes in infection could enrich the manuscript and prove the superiority of the model. However, I understand these experiments are time-consuming and out of the scope of the current manuscript. In addition, the model lacks the multicellularity that characterizes other similar models. The authors mention that the pathogen can be found in the luminal space of several organs; however, this luminal space has not been recapitulated in the model. Even though this would be a new project, it would be interesting for the authors to speculate about the possibilities of combining this model with other organ models. The inclusion of circulating neutrophils is a great asset; however, it would also be interesting to speculate about how to recapitulate other immune responses related to systemic infection.

      We thank Reviewer #2 for his/her comment on the strengths and limitations of our work. The difficulty is that our study opens many future research directions and applications, and we hope that the work serves as the basis for many future studies, but only a limited set of experiments can be addressed in a single manuscript.
      - Experiments investigating the role of N. meningitidis genes require significant optimization of the system. Multiplexing is a potential avenue for future development, which would allow the testing of many mutants. The fast photoablation approach is particularly amenable to such adaptation.
      - Cells and bacteria inside the chambers could be isolated and analyzed at the transcriptomic level or by flow cytometry. This would imply optimizing a protocol for collecting cells from the device, via collagenase digestion for instance. This type of approach would also benefit from multiplexing to increase the number of cells.
      - As mentioned above, the revised manuscript discusses the multicellular capabilities of our model, including the integration of additional immune cells and potential connections to other organ systems. We believe that these approaches are feasible and valuable for studying various aspects of N. meningitidis infection.

      Advance

      The most important advance of this manuscript is technical: the development of a model that proves to be equivalent to the most complex model used to date to study meningococcal systemic infections. The human skin xenograft mouse model requires complex surgical techniques and has the practical and ethical limitations associated with the use of animals. In contrast, the Infection-on-chip model is completely in vitro, can be produced quickly, and allows precise tuning of the vessel's geometry and higher-resolution microscopy. Both models were comparable in terms of the hallmarks defining the disease, suggesting that the presented model can be an effective replacement for animal use in this area.

      Other vessel-on-chip models can recapitulate an endothelial barrier in a tube-like morphology, but do not recapitulate other complex geometries, that are more physiologically relevant and could impact infection (in addition to other non-infectious diseases). However, in the manuscript it is not clear whether the different morphologies are necessary to study or recapitulate N. meningitidis infection, or if the tubular morphologies achieved in other similar models would suffice.

      We thank Reviewer #2 for his/her comment, also raised by Reviewer #1. To answer this question, we have now infected vessel-on-chips of different geometries to dissect the impact of flow distribution on N. meningitidis infection (Figures 3 and S3, explained in lines 288-307). In this range of shear stress, we show that bacterial infection is not strongly affected by geometry-induced shear stress variation. These results are consistent with observations in flow chambers and with qualitative observations of human cases and in the xenograft model [6].

      Audience

      This manuscript might be of interest to a specialized audience focusing on the development of microphysiological models. The technology presented here can be of great interest to researchers whose main area of interest is the endothelium and the blood vessels, for example, researchers studying systemic infections, atherosclerosis, angiogenesis, etc. Thus, the tool presented (vessel-on-chip) can have great applications for a broad audience. However, even if the method might be faster and easier to use than other equivalent methods, it could still be difficult to implement in another laboratory, especially one lacking expertise in bioengineering. Therefore, the method could be of more interest to laboratories with expertise in bioengineering looking to expand or optimize their toolbox. Alternatively, this paper presents itself as an opportunity to begin collaborations, since the model could be used to test other pathogens or conditions.

      Field of expertise: Infection biology, organ-on-chip, fungal pathogens.

      I lack the expertise to evaluate the image-based analysis.

      References:

      1. Gyohei Egawa, Satoshi Nakamizo, Yohei Natsuaki, Hiromi Doi, Yoshiki Miyachi, and Kenji Kabashima. Intravital analysis of vascular permeability in mice using two-photon microscopy. Scientific Reports, 3(1):1932, Jun 2013. ISSN 2045-2322. doi: 10.1038/srep01932.

      2. Valeria Manriquez, Pierre Nivoit, Tomas Urbina, Hebert Echenique-Rivera, Keira Melican, Marie-Paule Fernandez-Gerlinger, Patricia Flamant, Taliah Schmitt, Patrick Bruneval, Dorian Obino, and Guillaume Duménil. Colonization of dermal arterioles by neisseria meningitidis provides a safe haven from neutrophils. Nature Communications, 12(1):4547, Jul 2021. ISSN 2041-1723. doi: 10.1038/s41467-021-24797-z.

      3. Mats Hellström, Holger Gerhardt, Mattias Kalén, Xuri Li, Ulf Eriksson, Hartwig Wolburg, and Christer Betsholtz. Lack of pericytes leads to endothelial hyperplasia and abnormal vascular morphogenesis. Journal of Cell Biology, 153(3):543–554, Apr 2001. ISSN 0021-9525. doi: 10.1083/jcb.153.3.543.

      4. Arsheen M. Rajan, Roger C. Ma, Katrinka M. Kocha, Dan J. Zhang, and Peng Huang. Dual function of perivascular fibroblasts in vascular stabilization in zebrafish. PLOS Genetics, 16(10):1–31, Oct 2020. doi: 10.1371/journal.pgen.1008800.

      5. Huanhuan He, Julia J. Mack, Esra Güç, Carmen M. Warren, Mario Leonardo Squadrito, Witold W. Kilarski, Caroline Baer, Ryan D. Freshman, Austin I. McDonald, Safiyyah Ziyad, Melody A. Swartz, Michele De Palma, and M. Luisa Iruela-Arispe. Perivascular macrophages limit permeability. Arteriosclerosis, Thrombosis, and Vascular Biology, 36(11):2203–2212, 2016. doi: 10.1161/ATVBAHA.116.307592.

      6. Emilie Mairey, Auguste Genovesio, Emmanuel Donnadieu, Christine Bernard, Francis Jaubert, Elisabeth Pinard, Jacques Seylaz, Jean-Christophe Olivo-Marin, Xavier Nassif, and Guillaume Dumenil. Cerebral microcirculation shear stress levels determine Neisseria meningitidis attachment sites along the blood–brain barrier. Journal of Experimental Medicine, 203(8):1939–1950, Jul 2006. ISSN 0022-1007. doi: 10.1084/jem.20060482.

      7. Riya Joshi and Sunil D. Saroj. Survival and evasion of Neisseria meningitidis from macrophages. Medicine in Microecology, 17:100087, 2023. ISSN 2590-0978. doi: 10.1016/j.medmic.2023.100087.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public review):

      (1) “It is likely that metabolism changes ex vivo vs in vivo, and therefore stable isotope tracing experiments in the explants may not reflect in vivo metabolism.”

      We agree with the reviewer that metabolic changes may differ ex vivo versus in vivo. We now state: “Lastly, an important caveat to our study is that metabolism changes ex vivo versus in vivo, and thus, in the future, in vivo studies can be performed to assess metabolic changes.” (lines 591-593).

      (2) “The retina at P0 is composed of both progenitors and differentiated cells. It is not clear if the results of the RNA-seq and metabolic analysis reflect changes in the metabolism of progenitors, or of mature cells, or changes in cell type composition rather than direct metabolic changes in a specific cell type.”

      We have clarified that the metabolic changes may be in RPCs or in other retinal cell types on lines 149-152: “Since these measurements were performed in bulk, and the ratio of RPCs to differentiated cells declines as development proceeds, it is not clear whether glycolytic activity is temporally regulated within RPCs or in other retinal cell types.”

      However, since we mined a single-cell (sc) RNA-seq dataset, we are able to attribute gene expression specifically to RPCs (Figure 1).

      (3) “The biochemical links between elevated glycolysis and pH and beta-catenin stability are unclear. White et al found that higher pH decreased beta-catenin stability (JCB 217: 3965) in contrast to the results here. Oginuma et al found that inhibition of glycolysis or beta-catenin acetylation does not affect beta-catenin stability (Nature 584:98), again in contrast to these results. Another paper showed that acidification inhibits Wnt signaling by promoting the expression of a transcriptional repressor and not via beta-catenin stability (Cell Discovery 4:37). There are also additional papers showing increased pH can promote cell proliferation via other mechanisms (e.g. Nat Metab 2:1212). It is possible that there is organ-specificity in these signaling pathways however some clarification of these divergent results is warranted.”

      We have added the information and references brought up by the reviewer in our discussion (lines 529-549 and 570-574). We have also suggested future experiments to further analyse our system in line with the studies now referenced (lines 580-589).

      (4) The gene expression analysis is not completely convincing. E.g. the expression of additional glycolytic genes should be shown in Figure 1. It is not clear why Hk1 and Pgk1 are specifically shown, and conclusions about changes in glycolysis are difficult to draw from the expression of these two genes. The increase in glycolytic gene expression in the Pten-deficient retina is generally small.

      We have expanded the list of glycolytic genes analysed, in modified Figure 1B, and expanded the description of these results on lines 156-166.

      (5) Is it possible that glycolytic inhibition with 2DG slows down the development and production of most newly differentiated cells rather than specifically affecting photoreceptor differentiation?

      We added a comment to this effect to the discussion: “It is possible that glycolytic inhibition with 2DG slows down the development and production of most newly differentiated cells rather than specifically affecting photoreceptor differentiation, which we could assess in the future.” (lines 600-603).

      (6) “Likewise the result that an increase in pH from 7.4 to 8.0 is sufficient to increase proliferation implies that pH regulation may have instructive roles in setting the tempo of retinal development and embryonic cell proliferation. Similarly, the results show that acetate supplementation increases proliferation (I think this result should be moved to the main figures).”

      We have added the acetate data to main Figure 7E.

      We have added a supplemental data table (Figure 2–Data supplement 1) that was inadvertently omitted from our previous submission.

      Reviewer #2 (Recommendations for the authors):

      Major points

      (1) Assuming that increased glycolysis gets RPCs to exit from the proliferative stage earlier, the total number of retinal cells, notably that of the rod photoreceptors, should be reduced since the pool of proliferating cells is depleted earlier. Is that really the case for a mature retina? To address this question, the authors should perform quantifications of photoreceptors at a stage where most developmental cell death has concluded (i.e. at P14 or later; Young, J. Comp. Neurol. 229:362-373, 1984) and check whether or not there are more or less photoreceptors present.

      We have previously quantified the numbers of each cell type in Pten RPC-cKO retinas, and as suggested by the reviewer, there are fewer rod photoreceptors at P7 (Tachibana et al. 2016. J Neurosci 36 (36) 9454-9471) and P21 (Hanna et al. 2025. IOVS. Mar 3;66(3):45). We have edited the following sentence: “Using cellular birthdating, we previously showed that Pten-cKO RPCs are hyperproliferative and differentiate on an accelerated schedule between E12.5 and E18.5, yet fewer rod photoreceptors are ultimately present in P7 (Tachibana et al., 2016) and P21 (Hanna et al., 2025) retinas, suggestive of a developmental defect.” (lines 184-187).

      (2) Figure 1B, 1H: On what data are these two figures based? The plots suggest that a high-density time series of gene expression and rod photoreceptor birth was performed, yet it is not clear where and how this was done. The authors should provide the data, plot individual data points, and, if applicable perform a statistical analysis to support their idea that glycolytic gene expression (as a surrogate for glycolysis) overlaps in time with rod photoreceptor birth (Figure 1B) and that in Pten KO the glycolytic gene expression is shifted forward in time (Figure 1H). If the data required to construct these plots (min. 5 data points, min 3 repeats each) does not exist or cannot be generated (e.g. from reanalysis of previously published datasets), then these graphs should be removed.

      We have removed the previous Figure 1B and Figure 1H.

      (3) Figure 2E: Which PKM isozyme was analyzed here? Does the genetic analysis allow us to distinguish between PKM1 and PKM2? Since PKM governs the key rate-limiting step of glycolysis but was not significantly upregulated, does this not contradict the authors' main hypothesis? If PKM at some point was inhibited (see also below comment to Figure 5) one would expect an accumulation of glycolytic intermediates, including phosphoenolpyruvate. Was such an effect observed?

      The data in Figure 2E are bulk RNA-seq data. Since the PKM1 and PKM2 isozymes arise from alternative splicing of a single Pkm gene, the RNA-sequencing data cannot distinguish between them. Specifically, we used an Illumina NextSeq 500 to sequence 75 bp single-end reads, which capture transcripts from both the alternatively spliced Pkm1 and Pkm2 mRNAs, as these share a common 3′ end. We added a statement to this effect: “However, since we employed 75 bp single-end sequencing, we could not distinguish between alternatively spliced Pkm1 and Pkm2 mRNAs.” (lines 215-216).

      We have not performed metabolic analyses of glycolytic intermediates, but we have proposed such a strategy as an important avenue of investigation for future studies in the Discussion: “Lastly, an important caveat to our study is that metabolism changes ex vivo versus in vivo, and thus, in the future, in vivo studies can be performed to assess metabolic changes.” (lines 591-593).

      (4) Figure 3 and materials & methods: For the retinal explant cultures, was the RPE included in the cultured explants? If so, how can the authors distinguish drug effects on neuroretina and RPE? If the RPE was not included, then the authors should discuss how the missing RPE - neuroretina interaction could have influenced their results.

      We removed the RPE from the retinal explants, as indicated in the Methods section. The RPE is a metabolic hub that transports nutrients to the retina, so in its absence there is no immediate source of energy, such as glucose, for the retina. However, the medium (DMEM) contains 25 mM glucose to replace the RPE as an energy source, and we now show that RPCs express GLUT1, which allows glucose uptake (see new Figure 3A).

      We added the following sentence: “P0 explants were mounted on Nucleopore membranes and cultured on top of retinal explant media, providing a source of nutrients, growth factors and glucose.” (lines 241-243).

      (5) Figure 3: It seems rather odd that, if glycolysis was so important for retinal proliferation, differentiation, and metabolism in general, the inhibition of glycolysis with 2DG should not produce a strong degeneration. However, since 2DG competes with glucose, and must be used at nearly equimolar concentration to block glycolysis in a meaningful way, it is possible that the 2DG concentration used simply was not high enough to substantially inhibit glycolysis. Since the inhibitory effect of 2DG depends on the glucose concentration, the authors should measure and provide the concentration of glucose in the explant culture medium. This value should be given either in results or materials and methods.

      We recently published a manuscript showing that 2DG treatments at the same concentrations employed in this study are effective at reducing lactate production in the developing retina in vivo, which is the expected effect of reduced glycolysis (Hanna et al. 2025. IOVS). However, in this study, we did not observe an impact on cell survival.

      We do not agree that it is necessary to measure glucose in the media since the anti-proliferative effect of 2DG is well known, and we are working in the effective range established by multiple groups. We have clarified that we are in the effective range by adding the following sentences: “2DG is typically used in the range of 5-10 mM in cell culture studies and in general, has anti-proliferative effects. To test whether 2DG treatment was in the effective range, explants were exposed to BrdU, which is incorporated into S-phase cells, for 30 minutes prior to harvesting. 2DG treatment resulted in a dose-dependent inhibition of RPC proliferation as evidenced by a reduction in BrdU+ cells (Figure 3D), indicating that our treatment was in the effective range.” (lines 246-251).

      (6) Figure 3F: The authors use immunostaining for cleaved, activated caspase-3 to assess the amount of apoptotic cell death. However, there are many different possible mechanisms for neuronal cells to die, the majority of which are caspase-independent. To assess the amount of cell death occurring, the authors should perform a TUNEL assay (which labels apoptotic and non-apoptotic forms of cell death; Grasl-Kraupp et al., Hepatology 21:1465-8, 1995), quantify the numbers of TUNEL-positive cells in the retina, and compare this to the numbers of cells positive for activated caspase-3.

      We agree with the reviewer that cells can die by mechanisms other than apoptosis, and that TUNEL would detect dying cells undergoing, for example, apoptosis or necrosis. However, our data with cleaved caspase-3, an executioner protease of apoptosis, provide clear evidence of cell death in our different conditions. Since this manuscript is not focused on cell death pathways, we have not performed the additional TUNEL assay.

      (7) Figure 4F and 4I: At post-natal day P7 the rod outer segments (OSs) only just start to grow out and the characteristic, rhodopsin-filled disk stacks are not yet formed. To test whether the PFKFB3 gain-of-function or the Pten KO has a marked effect on OS formation and length, the authors should perform the same tests on older, more mature retina at a time when rod OS show their characteristic disk structures (e.g. somewhere between P14 to P30). The same applies to the 2DG inhibition on the Pten KO retina.

      The precocious differentiation of rod outer segments observed in P7 Pten-cKO retinas does not persist in adulthood, and instead reflects a developmental acceleration. Indeed, we found that in Pten cKO retinas at 3-, 6- and 12-months of age, rod and cone photoreceptors degenerate, and cone outer segments are shorter (Hanna et al., 2025; Tachibana et al., 2016). These data demonstrate that Pten is required to support rod and cone survival.

      (8) Figure 5: Lowering media pH is a rather coarse and untargeted intervention that will have multiple metabolic consequences independent of PKM2. It is thus hardly possible to attribute the effects of pH manipulation to any specific enzyme. To assess this and possibly confirm the results obtained with low pH, the authors should perform a targeted inhibition experiment, for instance using Shikonin (Chen et al., Oncogene 30:4297-306, 2011), to selectively inhibit PKM2. If the retinal explant cultures contained the RPE, an additional question would be how the changes in RPE would alter lactate flux and metabolization between RPE and neuroretina (see also question 4 above).

      We have reframed the rationale for the pH manipulation experiments, highlighting the importance of pH in cell fate specification, and indicating that the aggregation of PKM2 is only one possible effect of lower pH.

      We wrote: “Given that altered glycolysis influences intracellular pH, which in turn controls cell fate decisions, we set out to assess the impact of manipulating pH on cell fate selection in the retina. One of the expected impacts of lowering pH was the aggregation of PKM2, a rate-limiting enzyme for glycolysis, which aggregates in reversible, inactive amyloids (Cereghetti et al., 2024).” (lines 362-366). 

      We have also added a discussion point: “Whether pH manipulations also impact the stability of other retinal proteins, such as PKM2, can be further investigated in the future using specific PKM2 inhibitors, such as Shikonin (Chen et al., 2011).” (lines 545-547).

      (9) Figure 5G: As for Figure 3F, the authors should perform TUNEL assays to assess the number of cells dying independent of caspase-3.

      Please see response to point 6.

      (10) Figure 7E: In the figure legend "K" should read "E". From the figure and the legend, it is not clear to which cell type this diagram should refer. This must be specified. Importantly, the insulin-dependent glucose-transporter 4 (GLUT4) highlighted in Figure 7E, while expressed on inner retinal vasculature endothelial cells, is not expressed in retinal neurons. What GLUTs exactly are expressed in what retinal neurons may still be to some extent contentious (cf. Chen et al., eLife, https://doi.org/10.7554/eLife.91141.3 ; and reviewer comments therein), yet RPE cells clearly express GLUT1, photoreceptors likely express GLUT3, Müller glia cells may express GLUT1, while horizontal cells likely express GLUT2 (Yang et al., J Neurochem. 160:283-296, 2022).

      We have removed this summary schematic for simplicity.

      (11) Materials and methods: The retinal explant culture system must be described in more detail. Important questions concern the use of medium and serum for which the providers, order numbers, and batch/lot numbers (whichever is applicable) must be given. The glucose concentration in the medium (including the serum content) should be measured. A key concern is whether the explants were cultivated submerged into the medium - this would prevent sufficient oxygenation and drive metabolism towards glycolysis (i.e. the Pasteur effect) - or whether they were cultivated on top of the liquid medium, at the interface between air and liquid (i.e. a situation that would favor OXPHOS).

      We have added further detail to the methods section for the explant assay (lines 686-689). We cultured the retinal explants on membranes on top of the media, which is the standard methodology in the field and in our laboratory (Cantrup et al., 2012; Tachibana et al., 2016; Touahri et al., 2024). Typically, RPCs undergo aerobic glycolysis, meaning that even in the presence of oxygen, they still prefer glycolysis over OXPHOS. We showed that 2DG treatment blocks RPC proliferation, indicating that RPCs indeed favor glycolysis in our assay system.

      (12) A point the authors may want to discuss additionally is the potential relevance of their data for the pathogenesis of human diseases, especially early developmental defects such as they occur in oxygen-induced retinopathy of prematurity.

      We would like to thank the reviewer for their valuable comment. Given that retinopathy of prematurity (ROP) is primarily vascular in nature, and we have not investigated vascular defects in this study, we have elected not to add a discussion of ROP to our manuscript.

      Minor points

      (1) Please add a label indicating the ages of the retina to images showing the entire retina (i.e. "P7"; e.g. in Figures 1F, 3, 4D, 5, etc.).

      Figure 1:

      1D: E18.5 indicated at the bottom of the two panels

      1F: P0 indicated at the bottom of the two panels

      Figure 3C-H: P0 explant stage and days of culture indicated

      Figure 4D: E12.5 BrdU and P7 harvest date indicated

      Figure 5C-H: P0 explant stage and days of culture indicated

      Figure 7A-E: P0 explant stage and days of culture indicated

      (2) The term Ctnnb1 should be introduced also in the abstract.

      We now state in the abstract that Ctnnb1 encodes β-catenin.

      (3) Line 249: "...remaining..." should probably read "...remained...".

      Changed (now line 260).

      (4) Line 381: The sentence "...correlating with the propensity of some RPCs to continue to proliferate while others to differentiate.", should probably be rewritten to something like "...correlating with the propensity of some RPCs to continue to proliferate while others differentiate.".

      We have corrected this sentence.

      (5) The structure of the discussion might benefit from the introduction of subheadings.

      We have introduced subheadings.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1H shows the kinetics of rod photoreceptor production as accelerated, but does not represent the fact that fewer rods are ultimately produced, which appears to be the case from the data. If so, the Pten cKO curve should probably be lower than WT to reflect that difference.

      We have removed this graph (as per Reviewer #2, point 2).

      (2) KEGG analysis also showed that the HIF-1 signaling pathway is altered in the Pten cKO retina. What is the significance of that, and is it related to metabolic dysregulation? It has been shown that lactate can promote vessel growth, which initiates at birth in the mouse retina.

      We have added some information on HIF-1 to the Discussion: “The increased glycolytic gene expression in Pten-cKO retinas is likely tied to the increased expression of hypoxia-inducible factor 1-alpha (Hif1a), a known target of mTOR signaling that transcriptionally activates Slc2a1 (GLUT1) and glycolytic genes (Hanna et al., 2022). Indeed, mTOR signaling is hyperactive in Pten-cKO retinas (Cantrup et al., 2012; Tachibana et al., 2016; Tachibana et al., 2018; Touahri et al., 2024), and likewise, in Tsc1-cKO retinas, which also increase glycolysis via HIF-1A (Lim et al., 2021).” (lines 489-494).

      Cantrup, R., Dixit, R., Palmesino, E., Bonfield, S., Shaker, T., Tachibana, N., Zinyk, D., Dalesman, S., Yamakawa, K., Stell, W. K., Wong, R. O., Reese, B. E., Kania, A., Sauve, Y., & Schuurmans, C. (2012). Cell-type specific roles for PTEN in establishing a functional retinal architecture. PLoS One, 7(3), e32795. https://doi.org/10.1371/journal.pone.0032795

      Cereghetti, G., Kissling, V. M., Koch, L. M., Arm, A., Schmidt, C. C., Thüringer, Y., Zamboni, N., Afanasyev, P., Linsenmeier, M., Eichmann, C., Kroschwald, S., Zhou, J., Cao, Y., Pfizenmaier, D. M., Wiegand, T., Cadalbert, R., Gupta, G., Boehringer, D., Knowles, T. P. J., Mezzenga, R., Arosio, P., Riek, R., & Peter, M. (2024). An evolutionarily conserved mechanism controls reversible amyloids of pyruvate kinase via pH-sensing regions. Dev Cell. https://doi.org/10.1016/j.devcel.2024.04.018

      Chen, J., Xie, J., Jiang, Z., Wang, B., Wang, Y., & Hu, X. (2011). Shikonin and its analogs inhibit cancer cell glycolysis by targeting tumor pyruvate kinase-M2. Oncogene, 30(42), 4297-4306. https://doi.org/10.1038/onc.2011.137

      Hanna, J., Touahri, Y., Pak, A., David, L. A., van Oosten, E., Dixit, R., Vecchio, L. M., Mehta, D. N., Minamisono, R., Aubert, I., & Schuurmans, C. (2025). Pten Loss Triggers Progressive Photoreceptor Degeneration in an mTORC1-Independent Manner. Invest Ophthalmol Vis Sci, 66(3), 45. https://doi.org/10.1167/iovs.66.3.45

      Tachibana, N., Cantrup, R., Dixit, R., Touahri, Y., Kaushik, G., Zinyk, D., Daftarian, N., Biernaskie, J., McFarlane, S., & Schuurmans, C. (2016). Pten Regulates Retinal Amacrine Cell Number by Modulating Akt, Tgfbeta, and Erk Signaling. J Neurosci, 36(36), 9454-9471. https://doi.org/10.1523/JNEUROSCI.0936-16.2016

      Touahri, Y., Hanna, J., Tachibana, N., Okawa, S., Liu, H., David, L. A., Olender, T., Vasan, L., Pak, A., Mehta, D. N., Chinchalongporn, V., Balakrishnan, A., Cantrup, R., Dixit, R., Mattar, P., Saleh, F., Ilnytskyy, Y., Murshed, M., Mains, P. E., Kovalchuk, I., Lefebvre, J. L., Leong, H. S., Cayouette, M., Wang, C., Sol, A. D., Brand, M., Reese, B. E., & Schuurmans, C. (2024). Pten regulates endocytic trafficking of cell adhesion and Wnt signaling molecules to pattern the retina. Cell Rep, 43(4), 114005. https://doi.org/10.1016/j.celrep.2024.114005

    1. (cis-normativity, or the assumption that all people have a gender identity that is consistent with the sex they were assigned at birth) that has been built into the scanner, through the combination of user interface (UI) design, scanning technology, binary-gendered body-shape data constructs, and risk detection algorithms, as well as the socialization, training, and experience of the TSA agents

      I agree with what this part is saying: the design excludes specific groups. But thinking about it from the designer's side, it's hard to include every user group, and determining whether someone's gender identity is consistent with the sex they were assigned at birth might be hard to manage. Still, I agree that design should try to be as inclusive as possible; although in reality it may be hard to design for everyone, we should still try to reach that goal.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewer for their constructive comments and the fair and interesting discussion between reviewers.

      Reviewer #1

      We are delighted to read that the reviewer finds the manuscript “very clear and of immediate impact […] and ready for publication” regarding this aspect. We have toned down the conclusion, proposing rather than concluding that “the incapacitation of Cmg2[KO] intestinal stem cells to function properly […] is due to their inability to transduce Wnt signals”.

      We have addressed the 3 points that were raised as well as the minor comments.

      Point #1

      The mouse mutant is just described as 'KO', referring to the previous work by the authors. The cited work simply states that this is a zygotic deletion of exon 3, which somehow leads to a decrease in protein abundance that is almost total in the lung but not so clear in the uterus. Exon 3 happens to be 72 bp long [https://www.ncbi.nlm.nih.gov/nuccore/NM_133738], so its deletion (assuming there are no cryptic splicing sites used) leads to an internal in-frame deletion of 24 amino acids. So, at best, this 'KO' is not a null, but a hypomorphic allele of context-dependent strength.

      Unfortunately, neither the previous work nor this paper (unless I have missed it!) provides information about the expression levels of Cmg2 in the intestine of KO mice, nor about which cell types usually express it (see below). I think that using anti-Cmg2 in WB on intestinal homogenates and in immunohistofluorescence with ISC markers on intestinal sections of wild-type and mutant mice would be necessary to set the stage for the rest of the work.

      We now provide an explanation and characterization of the Cmg2KO mice. Exon 3 indeed only encodes a short 24 amino acid sequence. This exon however encodes a β-strand that is central to the vWA domain of CMG2, and therefore critical for the folding of this domain. As now shown in Fig. S1c, CMG2Δexon3 is produced in cells but cleared by the ER-associated degradation pathway; it is therefore only detectable in cells treated with the proteasome inhibitor MG132, at a slightly lower molecular weight than the full-length protein. This is consistent with, and was inspired by, the fact that multiple Hyaline Fibromatosis missense mutations that map to the vWA domain lead to defective folding of CMG2, further illustrating that this domain is very vulnerable to modifications. In Fig. S1c, we moreover now show immunoprecipitation of Cmg2 from colonic tissue of wild-type (WT) and knockout (KO) mice, which confirms the absence of Cmg2 protein in Cmg2KO samples.

      Point #2

      Connected to the previous point, the expression pattern of Cmg2 in the intestine is not described. Maybe this is already established in the literature, but the authors do not refer to the data. This is important when considering that the previous work of the authors suggests that Cmg2 might contribute to Wnt signalling transduction through physical, cis interactions with the Wnt co-receptor LRP6. Therefore, one would expect that Cmg2 would be cell-autonomously required in the intestinal stem cells.

      The expression pattern of Cmg2 in the gut has not been characterized and is indeed essential to understanding its function. To address this gap, we have now added a figure (Fig. 1) providing data from publicly available RNA-seq datasets and from our RNAscope experiments on Cmg2WT mice. Of note, we have unfortunately never managed to detect Cmg2 protein by immunohistochemistry of mouse tissue with any of the available antibodies, whether commercial or generated in the lab.

      In the RESULTS section we now mention:

      To investigate Cmg2 expression in the gut, we first analyzed publicly available spatial and scRNA-seq datasets to identify which cell types express Cmg2 across different gut regions. Spatial transcriptomic data from the mouse small intestine and colon revealed that Cmg2 is broadly expressed throughout the gut, including in the muscular, crypt, and epithelial layers (Fig. 1A–C). To validate these findings, we performed RNAscope in situ hybridization targeting Cmg2 in the duodenum and colon of wild-type mice. The expression pattern observed was consistent with the spatial transcriptomics data (Fig. 1D–E). We then analyzed scRNA-seq data from the same dataset to assess cell-type-specific expression in the mouse colon. Cmg2 was detected at varying levels across multiple cell types, including enterocytes and intestinal stem cells, as well as mesenchymal cells, notably fibroblasts.

      Of note for the reviewer, not mentioned in the manuscript, this wide-spread distribution of Cmg2 across the different cell types is not true for all organs. We have recently investigated the expression of Cmg2 in muscle and found that it is almost exclusively expressed in fibroblasts (so-called fibro-adipocyte progenitors) and very little in any other muscle cells, in particular fibers.

      Interestingly also, as now mentioned in the manuscript and shown in Fig. S1, the ANTXR1 protein, which is highly homologous to Cmg2 at the protein level and shares its function as an anthrax toxin receptor, displayed a much more restricted expression pattern, being confined primarily to fibroblasts and mural cells, and notably absent from epithelial cells. This differential expression highlights a potentially unique and epithelial-specific role for Cmg2 in maintaining intestinal homeostasis.

      Point #3

      The authors establish that the regenerating crypts of Cmg2[KO] mice are unable to transduce Wnt signalling, but it is not clear whether this situation is provoked by the DSS-induced injury or existed all along. Can Cmg2[KO] intestinal stem cells transduce Wnt signalling before the DSS challenge? If they could, it might suggest that the 'context-dependence' of the Cmg2 role in Wnt signalling is contextual not only because of the tissue, but because of the history of the tissue or its present structure. It would also suggest that Cmg2 mutant mice, unless reared in a germ-free facility for life, would eventually lose intestinal homeostasis, and maybe suggest the level of intervention/monitoring that HFS patients would require. It might also provide an explanation in case Cmg2 was not expressed in ISCs: if the state of the tissue was as important as the presence of the protein, then the effect on Wnt transduction could be indirect and therefore it might not be required cell-autonomously.

      We agree that understanding whether Cmg2KO intestinal stem cells are intrinsically unable to transduce Wnt signals, or whether this defect is contextually induced following injury (such as DSS treatment), is a critical point.

      As a first line of evidence, we show that under homeostatic conditions, Wnt signaling appears largely intact in Cmg2KO crypts, with levels of ß-catenin and expression levels of canonical Wnt target genes (e.g., Axin2, Lgr5) comparable to those observed in WT animals (Figs. S1j-l and S3d-e). This indicates that Cmg2 is not essential for basal Wnt signaling under steady-state conditions.

      These findings thus support the idea that the requirement for Cmg2 in Wnt signal transduction is context-dependent—not only at the tissue level but also temporally, being specifically required during regenerative processes or in altered microenvironments such as during inflammation or epithelial damage. This context-dependence may reflect changes in the composition or accessibility of Wnt ligands, receptors, or matrix components during repair, where Cmg2 could play a scaffolding or stabilizing role.

      These aspects are now discussed in the text.

      I think points 1 and 2 are absolutely fundamental in a reverse genetics investigation. Point 3 would be nice to know but the outcome would not change the tenet of the paper. I believe that the work needed to deal with these points can be performed on archival material. I do not think the mechanism proposed can be taken from 'plausible' to 'proven' without proposing substantial additional investigation, so I will not suggest any of it, as it could well be another paper.

      We have addressed points 1 and 2, and provided evidence and discussion for Point 3.

      __Minor points__

      1- Figure 1 legend says "In (c), results are mean {plus minus} SEM" - this seems applicable to (d) as (c) does not show error whiskers.

      We thank the reviewer for picking up this error. We modified the legend to read: “In (c), results are median” and “In (d, f and g), results are mean ± SEM.”

      2- Figure 1 legend says "(d) Body weight loss, (f) the aspect of the feces and presence of occult blood were monitored and used for the (e) DAI. Results are mean {plus minus} SEM. Each dot represents the mean of n = 12 mice per genotype". This part looks like it has suffered some rearrangement of words. The first instance of (f) should be (e), I guess, and I am not sure what "(e) DAI" means. And for (e), "mean {plus minus} SEM" does not seem applicable. This needs some light revision.

      The legend was clarified as follows: “(d) Body weight loss and (e) the aspect of the feces and presence of occult blood were monitored and used to evaluate the Disease Activity Index (DAI) in (f).”

      3 - Figure 1H legend does not say which statistical test was made in the survival experiment in (h) - presumably log-rank? A further comment on the survival statistics: euthanised animals should not be counted towards true mortality when that is what is recorded as an 'event'. They should be right-censored. However, in this case, reaching the euthanasia criterion is just as good an indicator of health as mortality itself. So, simply by changing the Y axis from 'survival' to 'event-free survival' (or something to that effect), where 'events' are either death or reaching the euthanasia criterion, leaves the analysis as it is, and authors do not need to clarify that figure 1H shows "apparent mortality", as it is straightforward "complication-free survival" (just not entirely orthogonal to weight loss).

      The Y axis was changed from 'survival' to “percentage of mice not reaching the euthanasia criterion”.

      4 - Some density measurements are made unnecessarily on arbitrary units (per field of view) - this should be simple to report in absolute measures (i.e. area of tissue screened or, better still, length of epithelium screened).

      Because the area of tissue can vary substantially between damaged, regenerating and undamaged tissue, we report the length of epithelium screened, as suggested: “per 800 µm of tissue screened” in Fig. S1c and Fig. 2b.

      5 - Figure 2E should read "percent involvement"

      This has been corrected.

      6 - Figure 2J should read "lipocalin..."

      This has been corrected.

      7 - In section "CMG2 Is Dispensable for YAP/TAZ-Mediated Reprogramming to Fetal-Like Stem Cells", the authors write "We measured the mRNA levels of two additional YAP target genes, Cyr61 and CTGF...". I presume the "additional" is because Ly6a is also a target of YAP/TAZ, but if the reader does not know, it is puzzling. I would suggest making this link explicit.

      We added: “In addition to the fetal-like stem cell marker Ly6a, which is a YAP/TAZ target gene, we measured the mRNA levels of two other YAP target genes, Cyr61 and CTGF.”

      8 - In Figures S2, 3 and S3, I think that the measures expressed as "% of homeostatic X in WT" really mean "% of average homeostatic X in WT". This should be made clear somewhere.

      We added “Dotted line represents the average homeostatic levels of Cmg2 WT” to the figure legends.

      9 - In panel C, the nature of the data is not entirely clear. First, the corresponding part of the legend says "Representative images of n=4 mice per genotype" which I presume should refer to panel B. Then, the graph plots 4 data points, which suggests that they correspond to 4 mice - but how many fields of view? Also, the violin plot outline is not described - I presume it captures all the data points from the coarse-grained pixel analysis, but it should be clarified.

      The legend was modified as suggested: “(c) Results are presented as violin plots of the Ly6a mean intensity of all data points from the coarse-grain analysis. Each symbol represents the mean per mouse, n = 4 mice per condition. Results are mean ± SEM. Dotted line represents the average homeostatic levels of Cmg2WT. P values obtained by two-tailed unpaired t test.”

      10 - In Figure 3H and 3I, I would suggest to add the 7+3 timepoint where the data come from.

      Unfortunately, we do not understand the reviewer's suggestion, as these panels already show the 7+3 time point.

      11 - In section "CMG2 Is Critical for Restoring the Lgr5+ Intestinal Stem Cell Pool", the authors say "...The mRNA levels of ... LRP6, β-catenin (Fig. S3a-b), and Wnt ligands (Wnt5a, 5b, and 2b) were comparable between the colons of Cmg2WT and Cmg2KO mice (Fig. S3c)..." without clarifying in which context - one needs to read the figure legend to realise this is "timepoint 7+3". I suggest to add "in the recovery phase" or "in regenerating colons" or something shorter, just to guide the reader.

      We added: “Initially, we quantified the expression of key molecular components involved in Wnt signaling in the mouse colon 3 days after DSS withdrawal using qPCR.”

      12 - Like with the previous point, it is not clear when the immunohistofluorescence of B-catenin is made - not even in the legend, as far as I could see. The only hint is that authors say "the nuclei of cells in the atrophic crypts of Cmg2KO..." with 'atrophic' probably indicating again the 7+3 timepoint.

      We have changed the text and now mention “Next, we analyzed β-catenin activation in the colon of Cmg2WT and Cmg2KO mice during the recovery phase.”

      13 - A typo in the discussion: tunning for tuning.

      This has been corrected.

      14 - In the discussion, the authors talk about the 'CMG2' protein (all caps - formatting convention for human proteins) but before they were referring to 'Cmg2' (formatting convention for mouse proteins). That is fine but some of the statements where "CMG2" is used clearly refer to observations made in the mouse.

      We now use Cmg2 whenever referring to the mouse protein.

      15 - Typos in methods: "antigen retrieval by treating [with] Proteinase K"; "Image acquisition and analyze [analysis]"; "All details regarding code used for immunofluorescence analysis”.

      This has been corrected.

      __Reviewer #2__

      We are very pleased to read that the reviewer found the study “overall well designed, meticulously carried out, and with clear and convincing results that are most reasonably and thoughtfully interpreted”.

      For this reader, one additional thought comes to mind. If I understand the field correctly it would be informative to know with greater confidence where - in what cell type, epithelial or mesenchymal - the CMG2-LRP6-WNT interaction occurs.

      This point was also raised by Reviewer 1, and we have now added a new Figure 1 that describes Cmg2 expression in the gut, based both on publicly available RNA-seq datasets and on our RNAscope experiments on Cmg2WT mice. Of note, we have unfortunately never managed to detect Cmg2 protein expression by immunohistochemistry of mouse tissue with any of the available antibodies, whether commercial or generated in the lab.

      After injury the CMG2-KO mouse epithelium exhibits defective WNT signal transduction - as evidenced by failure of b-catenin to translocate into the nucleus. At first glance, this result is a disconnect with the paper by van Rijin that claims the defect in Hyaline Fibromatosis Syndrome cannot be due to loss of CMG2 expression/function in the barrier epithelial cell - a claim based on the mostly normal phenotypes of human CMG2 KO duodenal organoids. But the human organoids studied in the van Rijin paper, like all others, are established and cultured in very high WNT conditions, perhaps obscuring the lack of the CMG2-LRP6-WNT interaction. And in fact, the phenotypes of these human CMG2-KO duodenoids were not entirely normal - the CMG2-KO stem-like organoids (even when cultured in high WNT/R-spondin conditions) developed abnormal intercellular blisters consistent with a defect in epithelial structure/function - of unknown cause and not investigated.

      We thank the reviewer for raising this point, and we fully agree. We now specify in the text that the human CMG2-KO duodenoids showed blisters, indeed consistent with a defect in epithelial structure/function, and that they were grown in high-Wnt media, which likely obscures the CMG2 requirement.

      I think it would be informative to prepare colon organoids (and duodenoids) from WT and CMG2-KO mice to quantify their WNT dependency during establishment and maintenance of the stem-like (and WNT-dependent) state. If CMG2 acts within the epithelial cell to affect WNT signaling (regardless of WNT source), organoids prepared from colons of CMG2-KO mice would require more WNT in culture media to establish and maintain the stem cell proliferative state - when compared to organoids prepared from WT mice. This can be quantified (and confirmed molecularly by transgene expression if successful). Enhanced dependency on high concentrations of exogenous WNT would be evidence for a primary defect in WNT-(LRP2)-CMG2 signal transduction localized to the epithelial barrier cell - thus addressing the apparent discrepancy with the van Rijin paper - and for my part, advancing the field. And the discovery of a defect in the epithelium itself for WNT signal transduction would implicate a biologically most plausible mechanism for development of protein losing enteropathy.

      By no means do I consider these experiments to be required for publication (especially if considered to be incremental or already defined - WNT-CMG2 is not my field of research). This study already makes a meaningful contribution to the field as I state above. But in the absence of new experimentation, the issue should probably be discussed in greater depth.

      We are currently optimizing conditions to grow colon organoids from WT and Cmg2KO mice, testing different Wnt concentrations in the various media to identify those that best mimic regeneration conditions. This is indeed a study in itself. We have, however, included a discussion of this point in the manuscript, as suggested.

      __Reviewer #3:__

      We thank the reviewer for her/his insightful comments.

      The premise is that the causative germline mutated gene, CMG2/ANTRX2, may have a functional role in colonic epithelium in addition to controlling the ECM composition. There is little background information but one study has shown no primary defect in epithelial organoids grown from patients with the syndrome. This leads the authors to wonder if non-homeostatic conditions might reveal a functional role for the gene in regeneration.

      Reviewer 2 commented on the fact that “human organoids studied in the van Rijin paper, like all others, are established and cultured in very high WNT conditions, perhaps obscuring the lack of the CMG2-LRP6-WNT interaction. And in fact, the phenotypes of these human CMG2-KO duodenoids were not entirely normal - the CMG2-KO stem-like organoids (even when cultured in high WNT/R-spondin conditions) developed abnormal intercellular blisters consistent with a defect in epithelial structure/function - of unknown cause and not investigated”.

      We have now added a discussion on this point in the manuscript.

      The authors' approach to test the hypothesis is to use a mouse germline knockout model and to induce colitis and regeneration by the established protocol of introducing dextran sodium sulfate (DSS) into the drinking water for five days. In brief there is no phenotype apparent in the untreated knockout (KO) but these animals show a more severe response to DSS that requires them to be killed by 10 days after the start of treatment. This effect following phenotypic characterisation of the colonic epithelium is interpreted as showing the CMG2 is a Wnt modifier required for the restoration of the intestinal stem cell population in the final stages of repair.

      The experiment and analysis seem reasonably well executed - although a few specific comments follow below. The narrative is simple and easy to understand. However, there are significant caveats that cast doubts on the interpretation made that loss of CMG2 impairs the transition of colonic epithelial cells from a fetal like state to adult ISCs.

      First there is only a single approach and single type of experiment performed. There is a lack of independent validation of the phenotype and how it is mediated.

      We do not fully understand what type of independent validation of the phenotype the reviewer would have liked to see. Is it the induction of intestinal damage using a stress other than DSS?

      The DSS dose in this kind of experiment is often determined empirically in individual units. Here the 3% used is within published range but at upper end. The control animals show a typical response with symptoms of colitis worsening for 2-3 days after the removal of DSS and then recovery commonly over another 5-7 days. Here the CMG2 KO mice fail to recover and are killed by 9 or 10 days. The authors attempt to exploit the time course by identifying normal initial (7days) and defective late (10days) repair phases in KO animals when compared to controls. It is from this comparison that conclusions are drawn. However, the alternative interpretation might be that the epithelium of KO animals is so badly damaged, and indeed non-existent (from viewing Fig2a), that it is incapable of mounting any other response other than death and that the profiling shown is of an epithelium in extremis. The repair capability and dynamics of the KO would have been better tested under more moderate DSS challenge, if this experiment had been regarded as a pilot rather than as definitive.

      The choice of 3% DSS was in fact based on a pilot experiment. As now shown in Fig. S4, we tested different concentrations and found that 3% DSS was the lowest concentration that reliably induced the full spectrum of colitis-associated symptoms, including significant body weight loss, diarrhea, rectal bleeding (summarized in the Disease Activity Index), as well as macroscopic signs such as colon shortening and spleen enlargement. Based on these criteria, we selected 3% DSS for the study described in the manuscript.

      In this model, WT mice showed a typical progression: body weight stabilized rapidly after DSS withdrawal, with resolution of diarrhea and rectal bleeding. Histological analysis at day 9 revealed signs of epithelial regeneration, including hypertrophic crypts and increased epithelial proliferation.

      In contrast, Cmg2KO mice failed to initiate this recovery phase. Clinical signs such as weight loss, diarrhea, and bleeding persisted after DSS withdrawal, ultimately necessitating euthanasia at day 9–10 due to humane endpoint criteria. Unfortunately, this prevented us from exploring later timepoints to determine whether regeneration was delayed or completely abrogated in the absence of Cmg2.

      Regarding the severity of epithelial damage, as raised by Reviewer 1, we now provide detailed histological scoring in the supplementary data. This analysis shows that the severity of inflammation and crypt damage was similar between WT and KO animals, as were inflammatory markers such as Lipocalin-2. The key difference lies in the extent of tissue involvement. While the lesions in WT mice were more localized, Cmg2KO mice displayed widespread and diffuse damage with no sign of regeneration as shown by the absence of hypertrophic crypts and a marked reduction in both epithelial coverage and proliferative cells. Importantly, at day 7, the percentage of epithelial and proliferating cells was comparable between genotypes, further supporting the idea that Cmg2KO mice failed to initiate this recovery phase and present a defective repair response.

      The animals used were young (8 weeks) and lacked any obvious defect in collagen deposition. Does this change with treatment? Even if not, is it possible that there is a defect in peristalsis or transit time of gut contents, resulting in longer dwell times and higher effective dose of DSS to the KO epithelium?

      Collagen deposition, particularly of collagen VI, is known to increase in response to intestinal injury and plays a critical role in promoting tissue repair following DSS-induced damage (Molon et al., PMID: 37272555). As suggested, we investigated whether Cmg2KO mice exhibit abnormal collagen VI accumulation following DSS treatment.

      Our results show that, consistent with published data, WT mice exhibit a marked increase in collagen VI expression during the acute phase of colitis, with levels returning toward baseline following DSS withdrawal. A similar expression pattern was observed in Cmg2KO mice, with no significant differences in Col6a1 mRNA levels between WT and KO animals throughout the entire time course of the experiment. This observation was further confirmed at the protein level by western blot and immunohistochemistry analyses, suggesting that the impaired regenerative capacity observed in Cmg2KO mice is independent of Collagen VI.

      Regarding the possibility of altered peristalsis or intestinal transit time contributing to increased DSS exposure in KO mice, this is indeed a possibility. Although we did not directly measure gut motility in this study, we did not observe any signs of intestinal obstruction or fecal retention in Cmg2KO mice. Indeed, during the experiment, animals were single-caged for 30 min to collect feces, and no difference in the amount of feces collected was observed between WT and KO mice, arguing against a substantial difference in transit time (see figure below). The possibility of altered peristalsis and these observations are now mentioned in the Discussion.

      Is CMG2 RNA and protein expressed in the colonic epithelium? It is not indicated or tested in the submitted manuscript. This reviewer struggled to find evidence, notably it did not seem to be referenced in the organoid paper they reference in introduction (ref 13).

      This very valid point was also raised by Reviewers 1 and 2. The expression pattern of Cmg2 in the gut has indeed not been characterized and is essential to understanding its function. To address this gap, we added a figure (Fig. 1) providing data from publicly available RNA-seq datasets and from our RNAscope experiments on Cmg2WT mice. Of note, we unfortunately have never managed to detect Cmg2 protein expression by immunohistochemistry of mouse tissue with any of the antibodies available, commercial or generated in the lab.

      __Specific comments:__

      Figure 3 c-e and associated text are confusing. In c the Y scale seems inappropriate to show percentages up to 15,000%.

      In this graph, values are normalized to the average homeostatic level of WT mice, which is set to 100%.

      In d and e the use of percentages may be correct. However, it is claimed in the text that Cyr61 and CTGF are upregulated in the KO. That is not what the plots appear to show, as they compare to WT untreated cells, in which case the KO have not downregulated these genes in the way the controls have.

      As clarified in the text, under regenerative conditions, a transient activation of YAP signaling is crucial to induce a fetal-like reversion of intestinal stem cells. However, in a subsequent phase, the downregulation of YAP and the reactivation of Wnt signaling are necessary to complete intestinal regeneration. Several studies have highlighted a strong interplay between the Wnt and YAP pathways, suggesting that their coordinated regulation is essential for effective gut repair. Nevertheless, the precise mechanisms governing this interaction remain incompletely understood.

      In our model, this critical transition (YAP downregulation and Wnt reactivation) appears to be impaired. Loss of Cmg2 may either directly hinder Wnt reactivation or lead to sustained YAP signaling, which in turn suppresses activation of the Wnt pathway. Further studies using in vivo and organoid models will be necessary to understand the mechanistic role of Cmg2 in this regulatory process.

      The description of the figure has been clarified as follows: “both of which were significantly upregulated in the injured colons of Cmg2KO mice compared to DSS-injured Cmg2WT mice.”

      __Referees cross-commenting__

      Rev2 Points 1 and 2 made by Referee 1 (and point 4 of Referee 3) appear most reasonable, and if not already done should be.

      We have indeed addressed these 2 points.

      I also noted the more severe morphology of DSS damaged epithelium shown in Fig 2a noted by Referee 3 - and this I agree is a confounding factor. […] For my part, the concern is understandable but likely not operating in a confounding way. And the evidence for the reprogramming of the damaged epithelium into "fetal-like stem cells" (the 1st step in restitution of lost stem cells) occurs in both WT and KO mice - and these data are strong. For this reader, the block convincingly shows up in the KO mouse at the WNT-dependent step.

      The representative image has been updated, and a transverse section has been added to better illustrate that, although both epithelium and crypt structures can be present, the epithelial morphology differs significantly. Indeed, the regenerating epithelium of Cmg2WT mice displays a thick epithelial layer with well-polarized epithelial cells, whereas in Cmg2KO mice the epithelium appears atrophic, characterized by a thinner epithelial layer and elongated epithelial cells.

      __Rev 3__

      This reviewer remains sceptical. I agree the authors performed the experiment well to confirm that DSS dosing was as equivalent as possible across the study. But DSS acts to induce colitis because it is concentrated in the colonic lumen as water is absorbed. Also ECM responses and remodelling are a central part of colitis models. And my concern is that the actual exposure in the KO group is influenced by a transit of faeces/DSS that is secondary to the known action of CMG2 on collagen deposition. The consequence of this would be a protracted damage phase in which a restoration of adult stem cells would not be expected, leading to epithelial failure.

      However, we differ. I might propose that the authors are asked to investigate and confirm expression of CMG2 in the epithelium and to repeat the analysis of collagen levels they performed on untreated CMG2 KO mice on colons from CMG2 KO mice having received DSS to see if these differ from controls.

      This has now been done.

      __Rev 1__

      Both reviewer #2 and reviewer #3 make relevant points, from the point of view of extracting as much biological knowledge as we can from the observations reported in the manuscript.

      Reviewer #2 suggestion to use Cmg2[KO] organoids to investigate the dependence of Wnt transduction on Cmg2 is the type of experiments I refrained to propose. However, I think the "skeleton" of the mechanism is there and is reasonably solid. Fleshing it out may well be another paper.

      I agree with Reviewer #3 objections to the timing and severity of the DSS damage. However, I am not sure how much they invalidate the main tenet of the paper:

      • DSS may affect Cmg2[KO] more severely, but the overall disease score is comparable during the DSS treatment. If this severity was enough to be the main driver of the phenotype, it should have left a mark in the Histological and Disease activity scores. In this regard, I think it would be helpful if the authors provided an expanded version of Figure 2A with examples of the different levels of "Crypt damage" scored, and the proportions for each. This could be in the supplementary material and would balance the impressions induced by a single image.

      As suggested, we included a detailed breakdown of the histological score, including the crypt damage score, in Supplementary Fig. 3i, showing no significant differences in crypt damage between Cmg2WT and Cmg2KO mice.

      • If DSS affected the recovery, this would also be compatible with having a more severe histological phenotype (which is not shown overall, just in Fig 2A) because one would also expect the tissue to attempt regeneration during the 7 days of DSS treatment.

      This is an interesting point, and we now allude to this aspect in the manuscript.

      • The only objection that I find difficult to argue is the effective duration of the treatment. If indeed peristalsis is affected, it may be that during the 'recovery' phase there is still DSS in the intestine. This could be perhaps verified using a DS detection assay (e.g. https://arxiv.org/pdf/1703.08663) on the intestinal contents or the faeces of the mice during the 3-day recovery period.

      We attempted to purchase Heparin Red to perform this assay, but unfortunately the reagent was never delivered. We now also mention the following in the Discussion:

      One could envision that Cmg2KO mice have a defect in peristalsis resulting in longer dwell times and possibly higher effective dose of DSS to the KO epithelium. We however did not observe any signs of intestinal obstruction or fecal retention in Cmg2KO mice. Animals were single-caged for 30 min to collect feces. We did not observe any difference in amounts collected from WT and KO mice, arguing against a substantial difference in transit time of gut contents. Moreover, if DSS affected the recovery, one would have expected a more severe histological phenotype in the colon of Cmg2KO since the tissue likely already attempts regeneration during the 7 days of DSS treatment. But this was not the case. Therefore, while we cannot formally rule out the presence of residual DSS in Cmg2KO mice during the DSS withdrawal phase, there is currently no indication that this was the case.

      I think of what the aim of scholarly publication is, with this paper, and I find myself going back to a statement of the authors' discussion - that this work suggests that infants risking death may be offered (compassionate, I guess) IBD treatment. What does this hinge upon? I think, on the basic observation that diarrhoea (in the mouse model) is not intrinsic but caused by an inflammation-promoting insult. Is this substantiated? I think it is. Could we learn more biology from this disease model, about Wnt and about how ECM affects tissue regeneration? Certainly. Can this learning wait? I believe it can.

      We thank the reviewer for this statement.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In this work, Bracq and colleagues provide clear evidence that the persistent diarrhoea seen in a mouse model of Hyaline Fibromatosis Syndrome is related to the inability of their intestinal epithelium to properly regenerate. This is very clear and of immediate impact. This aspect of the paper, I think, is ready for publication, and would merit immediate dissemination on its own. It is great that the manuscript is in bioRxiv already.

      I am not so thoroughly convinced about the mechanism that the authors propose to explain the inability of Cmg2[KO] intestinal stem cells to function properly. The authors propose that it is due to their inability to transduce Wnt signals, and while this is plausible, I think there are a few things that the paper should contain before this can be proposed firmly:

      Point #1

      The mouse mutant is just described as 'KO', referring to the previous work by the authors. The cited work simply states that this is a zygotic deletion of exon 3, which somehow leads to a decrease in protein abundance that is almost total in the lung but not so clear in the uterus. Exon 3 happens to be 72 bp long [https://www.ncbi.nlm.nih.gov/nuccore/NM_133738], so its deletion (assuming there are no cryptic splicing sites used) leads to an internal in-frame deletion of 24 amino acids. So, at best, this 'KO' is not a null, but a hypomorphic allele of context-dependent strength. Unfortunately, neither the previous work nor this paper (unless I have missed it!) provides information about the expression levels of Cmg2 in the intestine of KO mice - nor which cell types usually express it (see below). I think that anti-Cmg2 western blotting on intestine homogenates and immunohistofluorescence with ISC markers on intestine sections of wild-type and mutant mice would be necessary to set the stage for the rest of the work.

      Point #2

      Connected to the previous point, the expression pattern of Cmg2 in the intestine is not described. Maybe this is already established in the literature, but the authors do not refer to the data. This is important when considering that the previous work of the authors suggests that Cmg2 might contribute to Wnt signalling transduction through physical, cis interactions with the Wnt co-receptor LRP6. Therefore, one would expect that Cmg2 would be cell-autonomously required in the intestinal stem cells.

      Point #3

      The authors establish that the regenerating crypts of Cmg2[KO] mice are unable to transduce Wnt signalling, but it is not clear whether this situation is provoked by the DSS-induced injury or existed all along. Can Cmg2[KO] intestinal stem cells transduce Wnt signalling before the DSS challenge? If they could, it might suggest that the 'context-dependence' of the Cmg2 role in Wnt signalling is contextual not only because of the tissue, but because of the history of the tissue or its present structure. It would also suggest that Cmg2 mutant mice, unless reared in a germ-free facility for life, would eventually lose intestinal homeostasis, and maybe suggest the level of intervention/monitoring that HFS patients would require. It might also provide an explanation in case Cmg2 was not expressed in ISCs - if the state of the tissue was as important as the presence of the protein, then the effect on Wnt transduction could be indirect and therefore it might not be required cell-autonomously.

      I think points 1 and 2 are absolutely fundamental in a reverse genetics investigation. Point 3 would be nice to know but the outcome would not change the tenet of the paper. I believe that the work needed to deal with these points can be performed on archival material. I do not think the mechanism proposed can be taken from 'plausible' to 'proven' without proposing substantial additional investigation, so I will not suggest any of it, as it could well be another paper.

      A few minor points picked along the way:

      1. Figure 1 legend says "In (c), results are mean {plus minus} SEM" - this seems applicable to (d) as (c) does not show error whiskers.
      2. Figure 1 legend says "(d) Body weight loss, (f) the aspect of the feces and presence of occult blood were monitored and used for the (e) DAI. Results are mean {plus minus} SEM. Each dot represents the mean of n = 12 mice per genotype". This part looks like it has suffered some rearrangement of words. The first instance of (f) should be (e), I guess, and I am not sure what "(e) DAI" means. And for (e), "mean {plus minus} SEM" does not seem applicable. This needs some light revision.
      3. Figure 1H legend does not say which statistical test was made in the survival experiment in (h) - presumably log-rank? A further comment on the survival statistics: euthanised animals should not be counted towards true mortality when that is what is recorded as an 'event'. They should be right-censored. However, in this case, reaching the euthanasia criterion is just as good an indicator of health as mortality itself. So, simply by changing the Y axis from 'survival' to 'event-free survival' (or something to that effect), where 'events' are either death or reaching the euthanasia criterion, leaves the analysis as it is, and authors do not need to clarify that figure 1H shows "apparent mortality", as it is straightforward "complication-free survival" (just not entirely orthogonal to weight loss).
      4. Some density measurements are made unnecessarily on arbitrary units (per field of view) - this should be simple to report in absolute measures (i.e. area of tissue screened or, better still, length of epithelium screened).
      5. Figure 2E should read "percent involvement"
      6. Figure 2J should read "lipocalin..."
      7. In section "CMG2 Is Dispensable for YAP/TAZ-Mediated Reprogramming to Fetal-Like Stem Cells", the authors write "We measured the mRNA levels of two additional YAP target genes, Cyr61 and CTGF...". I presume the "additional" is because Ly6a is also a target of YAP/TAZ, but if the reader does not know this, it is puzzling. I would suggest making this link explicit.
      8. In Figures S2, 3 and S3, I think that the measures expressed as "% of homeostatic X in WT" really mean "% of average homeostatic X in WT". This should be made clear somewhere.
      9. In panel C, the nature of the data is not entirely clear. First, the corresponding part of the legend says "Representative images of n=4 mice per genotype" which I presume should refer to panel B. Then, the graph plots 4 data points, which suggests that they correspond to 4 mice - but how many fields of view? Also, the violin plot outline is not described - I presume it captures all the data points from the coarse-grained pixel analysis, but it should be clarified.
      10. In Figures 3H and 3I, I would suggest indicating the 7+3 timepoint from which the data come.
      11. In section "CMG2 Is Critical for Restoring the Lgr5+ Intestinal Stem Cell Pool", the authors say "...The mRNA levels of ... LRP6, β-catenin (Fig. S3a-b), and Wnt ligands (Wnt5a, 5b, and 2b) were comparable between the colons of Cmg2WT and Cmg2KO mice (Fig. S3c)..." without clarifying in which context - one needs to read the figure legend to realise this is "timepoint 7+3". I suggest adding "in the recovery phase" or "in regenerating colons" or something shorter, just to guide the reader.
      12. As with the previous point, it is not clear when the immunohistofluorescence of β-catenin was performed - not even in the legend, as far as I could see. The only hint is that the authors say "the nuclei of cells in the atrophic crypts of Cmg2KO..." with 'atrophic' probably indicating again the 7+3 timepoint.
      13. A typo in the discussion: tunning for tuning.
      14. In the discussion, the authors talk about the 'CMG2' protein (all caps - formatting convention for human proteins) but before they were referring to 'Cmg2' (formatting convention for mouse proteins). That is fine but some of the statements where "CMG2" is used clearly refer to observations made in the mouse.
      15. Typos in methods: "antigen retrieval by treating [with] Proteinase K"; "Image acquisition and analyze [analysis]"; "All details regarding code[s] used for immunofluorescence analysis"
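      To illustrate the 'event-free survival' framing suggested in minor point 3, the sketch below computes a Kaplan-Meier curve in which both death and reaching the euthanasia criterion count as events, while animals still event-free at the end of follow-up are right-censored. It is a minimal, hypothetical Python example (the function name and the six-animal dataset are illustrative, not taken from the manuscript); comparing genotypes would additionally need a log-rank test, which is not implemented here.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier event-free survival curve.

    events[i] is True if animal i died OR reached the euthanasia criterion
    at times[i]; False means it was right-censored (still event-free when
    follow-up ended). Returns (time, survival) at time 0 and each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = [(0.0, 1.0)]
    i = 0
    while i < len(data):
        t = data[i][0]
        events_at_t = 0
        exits_at_t = 0  # events plus censorings at this time point
        while i < len(data) and data[i][0] == t:
            events_at_t += int(data[i][1])
            exits_at_t += 1
            i += 1
        if events_at_t > 0:
            # Product-limit step: multiply by the conditional probability
            # of remaining event-free at time t.
            survival *= 1.0 - events_at_t / n_at_risk
            curve.append((t, survival))
        # Animals with an event or censored at t leave the risk set afterwards.
        n_at_risk -= exits_at_t
    return curve


# Hypothetical follow-up times (days) and event flags for six animals.
days = [5, 7, 7, 9, 10, 10]
had_event = [True, True, False, True, False, False]
curve = kaplan_meier(days, had_event)
```

      Because censored animals only shrink the risk set and never lower the curve, relabelling euthanasia as an event (rather than censoring it) is exactly what turns 'survival' into 'event-free survival'.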

      Referees cross-commenting

      *This section contains comments from ALL the reviewers*

      Rev2

      Points 1 and 2 made by Referee 1 (and point 4 of Referee 3) appear most reasonable, and if not already done should be.

      I also noted the more severe morphology of the DSS-damaged epithelium shown in Fig 2a, as pointed out by Referee 3, and I agree this is a confounding factor. But overall, multiple lines of evidence were assembled to show that the KO and WT mice suffered DSS-induced colitis with equal severity, and with closely equal severity of damage to the intestinal epithelium (though the image in Fig 2a is disturbing). For my part, the concern is understandable but likely not operating in a confounding way. And the evidence that the damaged epithelium is reprogrammed into "fetal-like stem cells" (the first step in the restitution of lost stem cells) in both WT and KO mice is strong. For this reader, the block in the KO mouse convincingly occurs at the Wnt-dependent step.

      Rev 3

      This reviewer remains sceptical. I agree the authors performed the experiment well to confirm that DSS dosing was as equivalent as possible across the study. But DSS acts to induce colitis because it is concentrated in the colonic lumen as water is absorbed. Also, ECM responses and remodelling are a central part of colitis models. My concern is that the actual exposure in the KO group is influenced by the transit of faeces/DSS, secondary to the known action of CMG2 on collagen deposition. The consequence would be a protracted damage phase in which restoration of adult stem cells would not be expected, leading to epithelial failure.

      However, we differ. I might propose that the authors are asked to investigate and confirm expression of CMG2 in the epithelium and to repeat the analysis of collagen levels they performed on untreated CMG2 KO mice on colons from CMG2 KO mice having received DSS to see if these differ from controls.

      Rev 1

      Both reviewer #2 and reviewer #3 make relevant points, from the point of view of extracting as much biological knowledge as we can from the observations reported in the manuscript.

      Reviewer #2's suggestion to use Cmg2[KO] organoids to investigate the dependence of Wnt transduction on Cmg2 is the type of experiment I refrained from proposing. However, I think the "skeleton" of the mechanism is there and is reasonably solid. Fleshing it out may well be another paper.

      I agree with Reviewer #3's objections to the timing and severity of the DSS damage. However, I am not sure how much they invalidate the main tenet of the paper:

      • DSS may affect Cmg2[KO] mice more severely, but the overall disease score is comparable during the DSS treatment. If this severity were enough to be the main driver of the phenotype, it should have left a mark in the histological and disease activity scores. In this regard, I think it would be helpful if the authors provided an expanded version of Figure 2A with examples of the different levels of "Crypt damage" scored, and the proportions for each. This could be in the supplementary material and would balance the impressions induced by a single image.

      • If DSS affected the recovery, this would also be compatible with a more severe histological phenotype (which is not shown overall, just in Fig 2A), because one would also expect the tissue to attempt regeneration during the 7 days of DSS treatment.

      • The only objection that I find difficult to argue against is the effective duration of the treatment. If peristalsis is indeed affected, DSS may still be present in the intestine during the 'recovery' phase. This could perhaps be verified using a DS detection assay (e.g. https://arxiv.org/pdf/1703.08663) on the intestinal contents or the faeces of the mice during the 3-day recovery period.

      When I think of what the aim of scholarly publication is with this paper, I find myself going back to a statement in the authors' discussion: that this work suggests that infants at risk of death may be offered (compassionate, I guess) IBD treatment. What does this hinge upon? I think, on the basic observation that diarrhoea (in the mouse model) is not intrinsic but caused by an inflammation-promoting insult. Is this substantiated? I think it is. Could we learn more biology from this disease model, about Wnt and about how ECM affects tissue regeneration? Certainly. Can this learning wait? I believe it can.

      Significance

      In this work, Bracq and colleagues provide clear evidence that the persistent diarrhoea seen in a mouse model of Hyaline Fibromatosis Syndrome is related to the inability of the intestinal epithelium to regenerate properly. This is very clear and of immediate impact. For instance, the authors themselves point to the possibility of applying treatments for Inflammatory Bowel Disease to HFS patients. While what happens in a mouse model is not necessarily the same as in human patients, the fact that persistent diarrhoea is a life-threatening symptom in HFS makes this proposal very plausible, at least for compassionate use of the therapies and until their efficacy is disproven. This work addresses a clear knowledge gap and an unmet medical need.

      I find that the work shows clearly that HFS mouse model subjects have normal intestinal function until challenged with a standard chemically-induced colitis. Then, the histological and health deterioration of the HFS mouse model is clear in comparison with normal mice, which can regenerate appropriately. This is shown with a multiplicity of orthogonal techniques spanning molecular, histological and organismal, which are standard and very well reported in the paper.

      The authors propose a specific cellular and molecular mechanism to explain the incapacity of the intestinal epithelium in the mouse model of HFS to regenerate. According to this mechanism, the protein Cmg2, whose mutation causes HFS in humans, would be necessary for intestinal stem cells to transduce the signal of Wnt ligands and therefore support their behaviour as regenerative cells. This mechanism is plausible, but more basic and advanced work would be needed to take it as proven.

      This work would be of interest to the clinical, biomedical, and basic research communities interested in rare diseases, the gastrointestinal system, collagen and extracellular matrix, and Wnt signalling.

      My general expertise is in developmental and stem cell biology using reverse genetics, transgenesis and immunohistological and molecular methods of data production, and lineage tracing, digital imaging and bioinformatic analytical methods; I work with Drosophila melanogaster and its adult gastrointestinal system.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1:

      (1) The initial high accumulation by all cells followed by the emergence of a sub-population that has reduced its intracellular levels of tachyplesin is a key observation and I agree with the authors' conclusion that this suggests an induced response to the AMP is important in facilitating the bimodal distribution. However, I think the conclusion that upregulated efflux is driving the reduction in signal in the "low accumulator" subpopulation is not fully supported. Steady-state amounts of intracellular fluorescent AMP are determined by the relative rates of influx and efflux and a decrease could be caused by decreasing influx (while efflux remained unchanged), increasing efflux (while influx remained unchanged), or both decreasing influx and increasing efflux. Given the transcriptomic data suggest possible changes in the expression of enzymes that could affect outer membrane permeability and outer membrane vesicle formation as well as efflux, it seems very possible that changes to both influx and efflux are important. The "efflux inhibitors" shown to block the formation of the low accumulator subpopulation have highly pleiotropic or incompletely characterised mechanisms of action so they also do not exclusively support a hypothesis of increased efflux.
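      A toy one-compartment model makes the reviewer's steady-state argument concrete, and also shows why a washout design (no extracellular compound) can separate the two possibilities. The sketch assumes first-order influx and loss with hypothetical rate constants k_in and k_out; it lumps efflux together with any other loss process (for example proteolysis) into k_out, so in this model washout alone cannot distinguish efflux from degradation.

```python
import math


def steady_state(k_in: float, k_out: float, c_out: float) -> float:
    """Steady state of dI/dt = k_in*c_out - k_out*I, i.e. I = k_in*c_out/k_out."""
    return k_in * c_out / k_out


def after_washout(i0: float, k_out: float, t: float) -> float:
    """Intracellular level t time units after extracellular compound is removed.

    With c_out = 0 the influx term vanishes, so dI/dt = -k_out*I and
    I(t) = i0*exp(-k_out*t); k_in plays no role.
    """
    return i0 * math.exp(-k_out * t)


# Hypothetical rate constants in arbitrary units (not fitted to any data).
baseline = steady_state(k_in=1.0, k_out=0.1, c_out=1.0)
halved_influx = steady_state(k_in=0.5, k_out=0.1, c_out=1.0)
doubled_efflux = steady_state(k_in=1.0, k_out=0.2, c_out=1.0)
# Halving influx and doubling loss give the same steady state (half of baseline),
# so a lower steady-state level alone cannot tell the two apart.
```

      Once extracellular compound is removed, only the loss term can lower the intracellular level, which is the logic of the washout-based efflux assays described in the response.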

      We agree with the reviewer that the emergence of low accumulators after 30 min in the presence of extracellular tachyplesin-NBD (Figure 4A) could be due to either decreased influx while efflux remained unchanged, increased efflux while influx remained unchanged, or both decreasing influx and increasing efflux. Increased proteolytic activity or increased secretion of OMVs could also play a role.

      We have now acknowledged that “Reduced intracellular accumulation of tachyplesin-NBD in the presence of extracellular tachyplesin-NBD could be due to decreased drug influx, increased drug efflux, increased proteolytic activity or increased secretion of OMVs.” (lines 313-315).

      However, the emergence of low accumulators after 60 min in the absence of extracellular tachyplesin-NBD in our efflux assays (Figure 4C) cannot be due to decreased influx while efflux remained unchanged, because no extracellular tachyplesin-NBD was present. We acknowledge that in our original manuscript we did not explicitly state that the efflux assays reported in Figure 4C-D were performed in the absence of tachyplesin-NBD in the extracellular environment. We have now clarified this point in our manuscript, added illustrations in Figure 4A and 4C-D, and carried out efflux assays using ethidium bromide (EtBr) to further support our conclusions about the primary role played by efflux in reducing tachyplesin accumulation in low accumulators. We have added the following paragraphs to our revised manuscript:

      “Next, we performed efflux assays using ethidium bromide (EtBr) by adapting a previously described protocol [62]. Briefly, we preloaded stationary phase E. coli with EtBr by incubating cells at a concentration of 254 µM EtBr in M9 medium for 90 min. Cells were then pelleted and resuspended in M9 to remove extracellular EtBr. Single-cell EtBr fluorescence was measured at regular time points in the absence of extracellular EtBr using flow cytometry. This analysis revealed a progressive homogeneous decrease of EtBr fluorescence due to efflux from all cells within the stationary phase E. coli population (Figure S13A). In contrast, when we performed efflux assays by preloading cells with tachyplesin-NBD (46 μg mL<sup>-1</sup> or 18.2 μM), followed by pelleting and resuspension in M9 to remove extracellular tachyplesin-NBD, we observed a heterogeneous decrease in tachyplesin-NBD fluorescence in the absence of extracellular tachyplesin-NBD: a subpopulation retained high tachyplesin-NBD fluorescence, i.e. high accumulators; whereas another subpopulation displayed decreased tachyplesin-NBD fluorescence, 60 min after the removal of extracellular tachyplesin-NBD (Figure 4B). Since these assays were performed in the absence of extracellular tachyplesin-NBD, decreased tachyplesin-NBD fluorescence could not be ascribed to decreased drug influx or increased secretion of OMVs in low accumulators, but could be due to either enhanced efflux or proteolytic activity in low accumulators.

      Next, we repeated efflux assays using EtBr in the presence of 46 μg mL<sup>-1</sup> (or 20.3 µM) extracellular tachyplesin-1. We observed a heterogeneous decrease of EtBr fluorescence with a subpopulation retaining high EtBr fluorescence (i.e. high tachyplesin accumulators) and another population displaying reduced EtBr fluorescence (i.e. low tachyplesin accumulators, Figure S14B) when extracellular tachyplesin-1 was present. Moreover, we repeated tachyplesin-NBD efflux assays in the presence of M9 containing 50 μg mL<sup>-1</sup> (244 μM) carbonyl cyanide m-chlorophenyl hydrazone (CCCP), an ionophore that disrupts the proton motive force (PMF) and is commonly employed to abolish efflux and found that all cells retained tachyplesin-NBD fluorescence (Figure S15B). However, it is important to note that CCCP does not only abolish efflux but also other respiration-associated and energy-driven processes [63].

      Taken together, our data demonstrate that in the absence of extracellular tachyplesin, stationary phase E. coli homogeneously efflux EtBr, whereas only low accumulators are capable of performing efflux of intracellular tachyplesin after initial tachyplesin accumulation. In the presence of extracellular tachyplesin, only low accumulators can perform efflux of both intracellular tachyplesin and intracellular EtBr. However, it is also conceivable that besides enhanced efflux, low accumulators employ proteolytic activity, OMV secretion, and variations to their bacterial membrane to hinder further uptake and intracellular accumulation of tachyplesin in the presence of extracellular tachyplesin.”

      These amendments can be found on lines 316-350 and in the new Figure S13 and Figure 4. We have also carried out more tachyplesin-NBD accumulation assays using single and double gene-deletion mutants lacking efflux components, please see Response 3 to reviewer 2 and the data reported in Figure 4B.
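      The paired mass and molar concentrations quoted in the added paragraphs can be cross-checked with a simple unit conversion. The sketch below uses the standard molecular weight of CCCP (C9H5ClN4, about 204.62 g/mol); for the two peptides it does not assume a molecular weight, but only back-calculates the one implied by each stated µg/mL and µM pair. Function names are illustrative.

```python
def ug_per_ml_to_uM(conc_ug_per_ml: float, mw_g_per_mol: float) -> float:
    """Convert µg/mL to µM: 1 µg/mL = 1 mg/L, and (mg/L) / (g/mol) = mmol/L."""
    return conc_ug_per_ml / mw_g_per_mol * 1000.0


def implied_mw(conc_ug_per_ml: float, conc_uM: float) -> float:
    """Molecular weight (g/mol) implied by a stated µg/mL and µM pair."""
    return conc_ug_per_ml / conc_uM * 1000.0


CCCP_MW = 204.62  # g/mol, formula C9H5ClN4

cccp_uM = ug_per_ml_to_uM(50, CCCP_MW)  # about 244 µM, matching the stated value
nbd_mw = implied_mw(46, 18.2)           # about 2527 g/mol implied for tachyplesin-NBD
tp1_mw = implied_mw(46, 20.3)           # about 2266 g/mol implied for tachyplesin-1
```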

      (2) A conclusion of the transcriptomic analysis is that the lower accumulating subpopulation was exhibiting "a less translationally and metabolically active state" based on less upregulation of a cluster of genes including those involved in transcription and translation. This conclusion seems to borrow from well-described relationships referred to as bacterial growth laws in which the expression of genes involved in ribosome production and translation is directly related to the bacterial growth (and metabolic) rate. However, the assumptions that allow the formulation of the bacterial growth laws (balanced, steady state, exponential growth) do not hold in growth arrest. A non-growing cell could express no genes at all or could express ribosomal genes at a very low level, or efflux pumps at a high level. The distribution of transcripts among the functional classes of genes does not reveal anything about metabolic rates within the context of growth arrest - it only allows insight into metabolic rates when the constraint of exponential growth can be assumed. Efflux pumps can be highly metabolically costly; for example, Tn-Seq experiments have repeatedly shown that mutants for efflux pump gene transcriptional repressors have strong fitness disadvantages in energy-limited conditions. There are no data presented here to disprove a hypothesis that the low accumulators have high metabolic rates but allocate all of their metabolic resources to fortifying their outer membranes and upregulating efflux. This could be an important distinction for understanding the vulnerabilities of this subpopulation. Metabolic rates can be more directly estimated for single cells using respiratory dyes or pulsed metabolic labelling, for example, and these data could allow deeper insight into the metabolic rates of the two subpopulations. 
My main recommendation for additional experiments to strengthen the conclusions of the paper would be to attempt to directly measure metabolic or translational activity in the high- and low-accumulating populations. I do not think that the transcriptomic data are sufficient to draw conclusions about this but it would be interesting to directly measure activity. Otherwise, it might be reasonable to simply soften the language describing the two populations as having different activity levels. They do seem to have different transcriptional profiles, and this is already an interesting observation.

      We agree with the reviewer that it might be misleading to draw conclusions on bacterial metabolic states solely based on transcriptomic data. We have therefore removed the statement “low accumulators displayed a less translationally and metabolically active state”. We have instead stated the following: “Our transcriptomics analysis showed that low tachyplesin accumulators downregulated protein synthesis, energy production, and gene expression processes compared to high accumulators”. Moreover, we have employed the membrane-permeable redox-sensitive dye C<sub>12</sub>-resazurin, which is reduced to the fluorescent C<sub>12</sub>-resorufin in metabolically active cells, to obtain a more direct estimate of the metabolic state of low and high accumulators of tachyplesin. We have added the following paragraph reporting our new data:

      “Our transcriptomics analysis also showed that low tachyplesin accumulators downregulated protein synthesis, energy production, and gene expression compared to high accumulators. To gain further insight on the metabolic state of low tachyplesin accumulators, we employed the membrane-permeable redox-sensitive dye, resazurin, which is reduced to the highly fluorescent resorufin in metabolically active cells. We first treated stationary phase E. coli with 46 μg mL<sup>-1</sup> (18.2 μM) tachyplesin-NBD for 60 min, then washed the cells, and then incubated them in 1 μM resazurin for 15 min and measured single-cell fluorescence of resorufin and tachyplesin-NBD simultaneously via flow cytometry. We found that low tachyplesin-NBD accumulators also displayed low fluorescence of resorufin, whereas high tachyplesin-NBD accumulators also displayed high fluorescence of resorufin (Figure S16), suggesting lower metabolic activity in low tachyplesin-NBD accumulators.”

      These amendments can be found on lines 398-408 and in Figure S16.

      (3) The observation that adding nutrients to the stationary phase cultures pushes most of the cells to the "high accumulator" state is presented as support of the hypothesis that the high accumulator state is a higher metabolism/higher translational activity state. However, it is important to note that adding nutrients will cause most or all of the cells in the population to start to grow, thus re-entering the familiar regime in which bacterial growth laws apply. This is evident in the slightly larger cell sizes seen in the nutrient-amended condition. In contrast to stationary phase cells, growing cells largely do not exhibit the bimodal distribution, and they are much more sensitive to tachyplesin, as demonstrated clearly in the supplement. Growing cells are not necessarily the same as the high-accumulating subpopulation of non-growing cells.

      Following the reviewer’s suggestion, we are no longer using the nutrient supplementation data to support the hypothesis that high accumulators possess higher metabolism or translational activity.

      The nutrient supplementation data is now only used to investigate whether tachyplesin-NBD accumulation and efficacy can be increased, and not to show that high tachyplesin-NBD accumulators are more metabolically or translationally active.

      Furthermore, our previous statement “Our data suggests that such slower-growing subpopulations might display lower antibiotic accumulation and thus enhanced survival to antibiotic treatment.” has now been removed from the discussion.

      (4) It might also be worth adding some additional context around the potential to employ efflux inhibitors as therapeutics. It is very clear that obtaining sufficient antimicrobial drug accumulation within Gram-negative bacteria is a substantial barrier to effective treatments, and large concerted efforts to find and develop therapeutic efflux pump inhibitors have been undertaken repeatedly over the last 25 years. Sufficiently selective inhibitors of bacterial efflux pumps with appropriate drug-like properties have been challenging to find and none have entered clinical trials. Multiple psychoactive drugs have been shown to impact efflux in bacteria, but usually at concentrations in the 10-100 µM range (as here). Meanwhile, the Ki values for their human targets are usually in the sub- to low-nanomolar range. The authors rightly note that the concentration of sertraline they have used is higher than that achieved in patients, but the difference is many orders of magnitude, and it might be worth expanding a bit on the substantial challenge of finding efflux inhibitors that would be specific and non-toxic enough to be used therapeutically. Many advances in structural biology, molecular dynamics, and medicinal chemistry may make the quest for therapeutic efflux inhibitors more fruitful than it has been in the past, but it is likely to remain a substantial challenge.

      We agree with this comment and we have now added the following statement:

      “This limitation underscores the broader challenge of identifying EPIs that are both effective and minimally toxic within clinically achievable concentrations, while also meeting key therapeutic criteria such as broad-spectrum efficacy against diverse efflux pumps, high specificity for bacterial targets, and non-inducers of AMR [117]. However, advances in biochemical, computational, and structural methodologies hold the potential to guide rational drug design, making the search for effective EPIs more promising [118]. Therefore, more investigation should be carried out to further optimise the use of sertraline or other EPIs in combination with tachyplesin and other AMPs.”

      This amendment can be found on lines 535-542.

      (5) My second recommendation is that the transcriptomic data should be made available in full and in a format that is easier for other researchers to explore. The raw data should also be uploaded to a sequence repository, such as the NCBI Geo database or the EMBL ENA. The most useful format for sharing transcriptomic data is a table (such as an excel spreadsheet) of transcripts per million counts for each gene for each sample. This allows other researchers to do their own analyses and compare expression levels to observations from other datasets. When only fold change data are supplied, data cannot be compared to other datasets at all, because they are relative to levels in an untreated control which are not known. The cluster analysis is one way of gaining insight into biological function revealed by transcriptional profile, but it can hide interesting additional complexities. For example, rpoS is named as one of the transcription-associated genes that are higher in the high accumulator subpopulation and evidence of generally increased activity. But RpoS is the stress sigma factor that drives much lower levels of expression generally than the housekeeping sigma factor RpoD, even though it recognises many of the same promoters (and some additional stress-specific promoters). Therefore, increased RpoS occupancy of RNAP would be expected to result in overall lower levels of transcription. However, it is also true that the transcript level for the rpoS gene is a particularly poor indicator of expression - rpoS is largely post-transcriptionally regulated. More generally, annotations are always evolving and key functional insights related to each gene might change in the future, so the results are a more durable resource if they are presented in a less analysed form as well as showing the analysis steps. It can also be important to know which genes were robustly expressed but did not change, versus genes that were not detected.

      Sequencing data associated with this study have now been uploaded and linked under NCBI BioProject accession number PRJNA1096674 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1096674).

      We have added this link to the methods under subheading “Accession Numbers” on lines 858-860. Additionally, transcripts per million counts for each gene for each sample have been added to the Figure 3 - Source Data file as requested by the reviewer.
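      For readers less familiar with the transcripts-per-million normalisation requested by the reviewer, the sketch below shows the standard two-step calculation for a single sample. It is illustrative only and is not the authors' pipeline; gene names, counts and lengths are hypothetical, and in practice quantification tools often supply effective transcript lengths rather than raw gene lengths.

```python
from typing import Dict


def transcripts_per_million(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Convert raw read counts for one sample to TPM.

    1. Divide each gene's count by its length in kilobases (reads per kilobase).
    2. Divide each value by the sample's summed reads-per-kilobase and
       multiply by one million, so every sample's TPM values sum to 1e6.
    """
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    total_rpk = sum(rpk.values())
    return {gene: value / total_rpk * 1e6 for gene, value in rpk.items()}


# Hypothetical three-gene sample (gene names and numbers are illustrative).
sample_counts = {"geneA": 100, "geneB": 200, "geneC": 300}
gene_lengths = {"geneA": 1000, "geneB": 2000, "geneC": 500}
tpm = transcripts_per_million(sample_counts, gene_lengths)
```

      Because each sample sums to the same total, per-gene TPM values can be compared across samples and with external datasets, which fold changes relative to an unreported control cannot offer.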

      (6) In the introduction, the susceptibility of AMP efficacy to resistance mechanisms is discussed:

      "However, compared to small molecule antimicrobials, AMP resistance genes typically confer smaller increases in resistance, with polymyxin-B being a notable exception 7, 8. Moreover, mobile resistance genes against AMPs are relatively rare, and horizontal acquisition of AMP resistance is hindered by phylogenetic barriers owing to functional incompatibility with the new host bacteria9, again with plasmid-transmitted polymyxin resistance being a notable exception."

      It seems worth pointing out that polymyxins are the only AMPs that can reasonably be compared with small-molecule antibiotics in terms of resistance acquisition, since they are the only AMPs that have been widely used as drugs and have therefore had similar chances to select for resistance among diverse global microbial populations.

      We have now clarified that we are referring to laboratory evolutionary analyses of resistance towards small molecule antibiotics and AMPs (Spohn et al., 2019) and that polymyxins are the only AMPs that have been used in antibiotic treatment to date.

      We have added the following statement to address this point:

      “Bacteria have developed genetic resistance to AMPs, including proteolysis by proteases, modifications in membrane charge and fluidity to reduce affinity, and extrusion by AMP transporters. However, compared to small molecule antimicrobials, AMP resistance genes typically confer smaller increases in resistance in experimental evolution analyses, with polymyxin-B and CAP18 being notable exceptions [8]. Moreover, mobile resistance genes against AMPs are relatively rare and horizontal acquisition of AMP resistance is hindered by phylogenetic barriers owing to functional incompatibility with the new host bacteria [9]. Plasmid-transmitted polymyxin resistance constitutes a notable exception [10], possibly because polymyxins are the only AMPs that have been in clinical use to date [9].”

      This amendment can be found on lines 57-65.

      (7) In the description of Figure 4, " tachyplesin monotherapy" is mentioned. It is not really appropriate to describe the treatment of a planktonic culture of bacteria in a test tube as a therapy since there is no host that is benefitting.

      We have now replaced “tachyplesin monotherapy” with “tachyplesin treatment”.

      (8) In the discussion, it is stated that " tachyplesin accumulates intracellularly only in bacteria that do not survive tachyplesin exposure" but this is clearly not true. All bacteria accumulate tachyplesin intracellularly initially, but if the bacteria are non-growing during the exposure, some of them are able to reduce their intracellular levels. The fraction of survivors is roughly correlated with the fraction of bacteria that do not maintain high intracellular levels of tachyplesin and that do not stain with propidium iodide, but for any given cell it seems that there is no clear point at which a high intracellular level of tachyplesin means that it will definitely not survive.

      We have now clarified this statement as follows: “We show that after an initial homogeneous tachyplesin accumulation within a stationary phase E. coli population, tachyplesin is retained intracellularly by bacteria that do not survive tachyplesin exposure, whereas tachyplesin is retained only in the membrane of bacteria that survive tachyplesin exposure.”

      This amendment can be found on lines 443-446.

      (9) Also in the discussion: " Our data suggests that such slower-growing subpopulations might display lower antibiotic accumulation and thus enchanced [sic] survival to antibiotic treatment." This does not really relate to the results here because the bimodal distributions were primarily studied in the absence of growth. In the LB/exponential growth situations where the population was growing but a very small subpopulation of low accumulators was observed, no measurements were made to indicate subpopulation growth rates.

      We have now removed this statement from the manuscript.

      (10) In discussion, L-Ara4N appears to be referred to as both positively charged and negatively charged; this should be clarified.

      We have now clarified that L-Ara4N is positively charged.

      This amendment can be found on line 496.

      (11) Discussion of TF analysis seems to overstate what is supported by the evidence. The correlation of up- and downregulated genes with previously described TF regulons (probably measured in very different conditions) does not really demonstrate TF activity. This could be measured directly with additional experiments but in the absence of those experiments claims about detecting TF activity should probably be avoided. The attempts to directly demonstrate the importance of those transcription factors to the observed accumulation activity were not successful.

      We have now removed from the discussion the previous paragraph related to the TF analysis. We have also modified the results section reporting the TF analysis as follows: “Next, we sought to infer transcription factor (TF) activities via differential expression of their known regulatory targets [61]. A total of 126 TFs were inferred to exhibit differential activity between low and high accumulators (Data Set S4). Among the top ten TFs displaying higher inferred activity in low accumulators compared to high accumulators, four regulate transport systems, i.e. Nac, EvgA, Cra, and NtrC (Figure S12). However, further experiments should be carried out to directly measure the activity of these TFs.”

      Finally, we have also moved the TFs’ data from Figure 3 to Figure S12 in the Supplementary information.

      These amendments can be found on lines 288-293.

      (12) When discussing the possibility of nutrient supplementation versus efflux inhibition as a potential therapeutic strategy, it could be noted that nutrient supplementation cannot be done in many infection contexts. The host immune system and host/bacterial cell density control nutrient access.

      We have now added the following statement: “Moreover, nutrient supplementation as a therapeutic strategy may not be viable in many infection contexts, as host density and the immune system often regulate access to nutrients [3]”.

      These amendments can be found on lines 553-555.

      Reviewer 2:

      (1) Some questions regarding the mechanism remain. One shortcoming of the setup of the transcriptomics experiment is that the tachyplesin-NBD probe itself has antibiotic efficacy and induces phenotypes (and eventually cell death) in the ‘high accumulator’ cells. This makes it challenging to interpret whether any differences seen between the two groups are causative for the observed accumulation pattern or if they are a consequence of differential accumulation and downstream phenotypic effects.

      We agree with the reviewer and we have now acknowledged that “tachyplesin-NBD has antibiotic efficacy (see Figure 2) and has an impact on the E. coli transcriptome (Figure 3). Therefore, we cannot conclude whether the transcriptomic differences reported between low and high accumulators of tachyplesin-NBD are causative for the distinct accumulation patterns or if they are a consequence of differential accumulation and downstream phenotypic effects.”

      These amendments can be found on lines 283-287.

      (2) It would be relevant to test and report the MIC of sertraline for the strain tested, particularly since in Figure 4G an initial reduction in CFUs is observed for sertraline treatment, which suggests the existence of biological effects in addition to efflux inhibition.

      We have now measured the MIC of sertraline against E. coli BW25113 and found it to be 128 μg mL<sup>-1</sup> (418 µM). This value is more than four times higher than the sertraline concentration employed in our study, i.e. 30 μg mL<sup>-1</sup> (98 μM).

      These amendments can be found on lines 389-391 and data has been added to Figure 4 – Source Data.

      (3) The role of efflux systems is further supported by the finding that efflux pump inhibitors sensitize E. coli to tachyplesin and prevent the occurrence of the tolerant ‘low accumulator’ subpopulations. In principle, this is a great way of validating the role of efflux pumps, but the limited selectivity of these inhibitors (CCCP is an uncoupling agent, and for sertraline direct antimicrobial effects on E. coli have been reported by Bohnert et al.) leaves some ambiguity as to whether the synergistic effect is truly mediated via efflux pump inhibition. To strengthen the mechanistic angle of the work analysis of tachyplesin-NBD accumulation in mutants of the identified efflux components would be interesting.

      We have now performed tachyplesin-NBD accumulation assays using 28 single and 4 double E. coli BW25113 gene-deletion mutants of efflux components and transcription factors regulating efflux. While for the majority of the mutants we recorded bimodal distributions of tachyplesin-NBD accumulation similar to the distribution recorded for the E. coli BW25113 parental strain (Figure 4B and Figure S13), we found unimodal distributions of tachyplesin-NBD accumulation consisting only of high accumulators for both the ΔqseB and ΔqseB ΔqseC mutants, as well as reduced numbers of low accumulators for the ΔacrA ΔtolC mutant (Figure 4B). Considering that the AcrAB-TolC tripartite RND efflux system is known to confer genetic resistance against AMPs like protamine and polymyxin-B [29,30] and that the quorum sensing regulators qseBC might control the expression of acrA [64], these data further corroborate the hypothesis that low accumulators can efflux tachyplesin and survive treatment with this AMP.

      These amendments can be found on lines 351-361, in the new Figure 4B and in the new Figure S14.

      Moreover, we have carried out additional efflux assays with both ethidium bromide and tachyplesin-NBD to further demonstrate the role of efflux in the reduced accumulation of tachyplesin, and we have acknowledged that other mechanisms (i.e., reduced influx, increased protease activity or increased secretion of OMVs) could play an important role; please see Response 1 to Reviewer 1.

      (4) The authors imply that protease could contribute to the low accumulator mechanism. Proteases could certainly cleave and thus inactivate AMPs/tachyplesin, but would this effect really lead to a reduction in fluorescence levels since the fluorophore itself would not be affected by proteolytic cleavage?

      We agree with the reviewer that nitrobenzoxadiazole (NBD) might not be cleaved by proteases that inactivate tachyplesin and other AMPs. Therefore, inactivation of tachyplesin by proteases might not affect cellular fluorescence levels unless efflux of NBD is possible following the cleavage of tachyplesin-NBD. We have therefore removed the statement “Conversely, should efflux or proteolytic activities by proteases underpin the functioning of low accumulators, we should observe high initial tachyplesin-NBD fluorescence in the intracellular space of low accumulators followed by a decrease in fluorescence due to efflux or proteolytic degradation.” We have now stated the following: “Low accumulators displayed an upregulation of peptidases and proteases compared to high accumulators, suggesting a potential mechanism for degrading tachyplesin (Table S1 and Data Set S3).”

      These amendments can be found on lines 280-282.

      (5) To facilitate comparison with other literature (e.g. papers on sertraline) it would be helpful to state compound concentrations also as molar concentrations.

      We have now added the molar concentrations alongside all instances where concentrations are stated in μg mL<sup>-1</sup>.
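      The molar values added throughout follow from the standard mass-to-molar conversion, so they can be checked directly against the stated μg mL<sup>-1</sup> values. The sketch below is illustrative only and assumes free-base molecular weights for sertraline (306.23 g/mol), CCCP (204.62 g/mol) and verapamil (454.61 g/mol); these molecular weights and the helper name `ug_per_ml_to_um` are our assumptions and are not stated in the response.

```python
"""Check mass-to-molar conversions for compounds quoted in the response (illustrative sketch)."""

# Assumed free-base molecular weights in g/mol; not taken from the response itself.
MOLECULAR_WEIGHTS = {
    "sertraline": 306.23,  # C17H17Cl2N
    "CCCP": 204.62,        # C9H5ClN4
    "verapamil": 454.61,   # C27H38N2O4
}


def ug_per_ml_to_um(conc_ug_per_ml: float, mw_g_per_mol: float) -> float:
    """Convert a mass concentration in μg/mL to a molar concentration in μM.

    1 μg/mL equals 1 mg/L; dividing by the molecular weight (g/mol, i.e. mg/mmol)
    gives mmol/L (mM), and multiplying by 1000 converts mM to μM.
    """
    return conc_ug_per_ml / mw_g_per_mol * 1000.0


if __name__ == "__main__":
    # (compound, concentration in μg/mL) pairs as quoted in the response
    quoted = [
        ("sertraline", 30.0),   # working concentration
        ("sertraline", 128.0),  # measured MIC
        ("CCCP", 50.0),
        ("verapamil", 50.0),
    ]
    for name, ug_ml in quoted:
        um = ug_per_ml_to_um(ug_ml, MOLECULAR_WEIGHTS[name])
        print(f"{name}: {ug_ml:g} μg/mL ≈ {um:.0f} μM")

    # Fold difference between the sertraline MIC and the working concentration
    print(f"MIC / working concentration = {128.0 / 30.0:.2f}")
```

      Under these assumed molecular weights, the sketch reproduces the rounded values quoted in the text (98, 418, 244 and 110 μM), and the sertraline MIC is about 4.3 times the working concentration. Using a salt form (e.g. a hydrochloride) instead of the free base would lower the molar values, so the form assumed matters for the comparison.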

      (6) The authors tested a series of efflux pump inhibitors and found that CCCP and sertraline prevented the generation of the low accumulator subpopulation, whereas other inhibitors did not. An overview and discussion of the known molecular targets and mode of action of the different selected inhibitors could reveal additional insights into the molecular mechanism underlying the synergy with tachyplesin.

      We have now added the molecular targets and modes of action of the different inhibitors, where known: “Moreover, we repeated tachyplesin-NBD efflux assays in the presence of M9 containing 50 μg mL<sup>-1</sup> (244 μM) carbonyl cyanide m-chlorophenyl hydrazone (CCCP), an ionophore that disrupts the proton motive force (PMF) and is commonly employed to abolish efflux and found that all cells retained tachyplesin-NBD fluorescence (Figure S15B). However, it is important to note that CCCP does not only abolish efflux but also other respiration-associated and energy-driven processes [63].” and “Interestingly, M9 containing 30 µg mL<sup>-1</sup> (98 μM) sertraline (Figure 4D and S15C), an antidepressant which inhibits efflux activity of RND pumps, potentially through direct binding to efflux pumps [65] and decreasing the PMF [66], or 50 µg mL<sup>-1</sup> (110 μM) verapamil (Figure S15D), a calcium channel blocker that inhibits MATE transporters [67] by a generally accepted mechanism of PMF generation interference [68,69], was able to prevent the emergence of low accumulators. Furthermore, tachyplesin-NBD cotreatment with sertraline simultaneously increased tachyplesin-NBD accumulation and PI fluorescence levels in individual cells (Figure 4E and F, p-value < 0.0001 and 0.05, respectively). The use of berberine, a natural isoquinoline alkaloid that inhibits MFS transporters [70] and RND pumps [71], potentially by inhibiting conformational changes required for efflux activity [70], and baicalein, a natural flavonoid compound that inhibits ABC [72] and MFS [73,74] transporters, potentially through PMF dissipation [75], prevented the formation of a bimodal distribution of tachyplesin accumulation, however displayed reduction in fluorescence of the whole population (Figure S15E and F). 
Phenylalanine-arginine beta-naphthylamide (PAbN), a synthetic peptidomimetic compound that inhibits RND pumps [76] through competitive inhibition [77], reserpine, an indole alkaloid that inhibits ABC and MFS transporters, and RND pumps [78], by altering the generation of the PMF [69], and 1-(1-naphthylmethyl)piperazine (NMP), a synthetic piperazine derivative that inhibits RND pumps [79], through non-competitive inhibition [80], did not prevent the emergence of low accumulators (Figure S15G-I).”

      These amendments can be found on lines 337-342 and 367-385.

      (7) Page 8. The term ‘medium accumulators’ for a 1:1 mix of low and high accumulators is misleading.

      We have now replaced the term “medium accumulators” with “a 1:1 (v/v) mixture of low and high accumulators”.

      These amendments to the description can be found on lines 238-239.

      (8) Figure 3. It may be more appropriate to rephrase the title of the figure to ‘biological processes associated with low tachyplesin accumulation’ (rather than ‘facilitate accumulation’). The same applies to the section title on page 8.

      We have amended the title of Figure 3 as requested by the reviewer.

      (9) The fact that the low accumulation phenotype depends on the growth media and conditions and can be prevented by nutrients is highly relevant. I would encourage the authors to consider showing the corresponding data in the main manuscript rather than in the SI.

      We have created a new Figure 5, displaying the impact of the nutritional environment and bacterial growth phase on both tachyplesin-NBD accumulation and efficacy.

      (10) In the discussion the authors state ‘Heterogeneous expression of efflux pumps within isogenic bacterial populations has been reported 29,32,33,67-69. However, recent reports have suggested that efflux is not the primary mechanism of antimicrobial resistance within stationary-phase bacteria 31,70.’ In light of the authors’ findings that the response to tachyplesin is induced by exposure and is not pre-selected, could they speculate on why this specific response can be induced in stationary, but not exponential cells? Could there be a combination of pre-existing traits and induced responses at play? Could e.g. the reduced growth rate/metabolism in these cells render these cells less susceptible to the intracellular effects of tachyplesin and slow down the antibiotic efficacy, giving the cells enough time to mount additional protective responses that then lead to the low accumulation phenotype?

      We have now acknowledged that it is conceivable that other pre-existing traits of low accumulators also contribute to reduced tachyplesin accumulation. For example, reduced protein synthesis, energy production and gene expression in low accumulators could slow down tachyplesin efficacy, giving low accumulators more time to mount efflux as an additional protective response.

      “As our accumulation assay did not require the prior selection for phenotypic variants, we have demonstrated that low accumulators emerge subsequent to the initial high accumulation of tachyplesin-NBD, suggesting enhanced efflux as an induced response. However, it is conceivable that other pre-existing traits of low accumulators also contribute to reduced tachyplesin accumulation. For example, reduced protein synthesis, energy production, and gene expression in low accumulators could slow down tachyplesin efficacy, giving low accumulators more time to mount efflux as an additional protective response.”

      This amendment can be found on lines 482-489.

      (11) In the abstract: Is it true that low accumulators ‘sequester’ the drug in their membrane? In my understanding ‘sequestering’ would imply that low accumulators would bind higher levels of tachyplesin-NBD in their membrane compared to high accumulators (and thereby preventing it from entering the cells). According to Figure 1 J, K, it rather seems that the fluorescent signal around the membrane is also stronger in high accumulators.

      We have now removed the sentence “low accumulators sequester the drug in their membrane” from the abstract. We have instead stated: “These phenotypic variants display enhanced efflux activity to limit intracellular peptide accumulation.”

      These amendments can be found on lines 34-35.

      Reviewer 3:

      (1) The authors' claims about high efflux being the main mechanism of survival are unconvincing, given the current data. There can be several alternative hypotheses that could explain their results, such as lower binding of the AMP, lower rate of internalization, metabolic inactivity, etc. It is unclear how efflux can be important for survival against a peptide that the authors claim binds externally to the cell. The addition of efflux assays would be beneficial for clear interpretations. Given the current data, the authors' claims about efflux being the major mechanism in this resistance are unconvincing (in my humble opinion). Some direct evidence is necessary to confirm the involvement of efflux. The data with CCCP in Figure 4C can only indicate accumulation, not efflux. The authors are encouraged to perform direct efflux assays using known methods (e.g., PMIDs 20606071, 30981730, etc.). Figure 4A: The data does not support the broad claims about efflux. First, if the peptide is accumulated on the outside of the outer membrane, how will efflux help in survival? The dynamics shown in 4A may be due to lower binding, lower entry, or lower efflux. These mechanisms are not dissected here. Second, the heterogeneity can be preexisting or a result of the response to this stress. Either way, whether active efflux or dynamic transcriptomic changes are responsible for these patterns is not clear. Direct efflux assays are crucial to conclude that efflux is a major factor here.

      This important comment is similar in scope to the first comment of reviewer 1 and is partly due to the fact that we had not clearly explained the efflux assays reported in Figure 4 of the original manuscript. We kindly refer this reviewer to our extensive response 1 to reviewer 1 and the corresponding amendments on lines 316-350 and in the new Figure S13 and Figure 4 (reported in the response 1 to reviewer 1 above), where we have now fully addressed the concerns of this reviewer and of reviewer 1, and performed new experiments following their important suggestions and the methods described in PMID 20606071, suggested by this reviewer.

      (2) The fluorescent imaging experiments can be conducted in the presence of externally added proteases, such as proteinase K, which has multiple cleavage sites on tachyplesin. This would ensure that all the external peptides (both free and bound) are removed. If the signal is still present, it can be concluded that the peptide is present internally. If the peptide is primarily external, the authors need to explain how efflux could help with externally bound peptides. Figure 1J-K: How are the authors sure about the location of the intensity? The peptide can be inside or outside and still give the same signal. To prove that the peptide is inside or outside, a proteolytic cleavage experiment is necessary (proteinase K, Arg-C proteinase, clostripain, etc.).

      We thank the reviewer for this important suggestion.

      We have now performed experiments where stationary phase E. coli was incubated in 46 μg mL<sup>-1</sup> (18.2 μM) tachyplesin-NBD in M9 for 60 min. Next, cells were pelleted and washed to remove extracellular tachyplesin-NBD and then incubated in either M9 or 20 μg mL<sup>-1</sup> (0.7 μM) proteinase K in M9 for 120 min. We found that the fluorescence of low accumulators decreased over time in the presence of proteinase K; in contrast, the fluorescence of high accumulators did not decrease over time in the presence of proteinase K. These data therefore suggest that tachyplesin-NBD is present only on the cell membrane of low accumulators and both on the membrane and intracellularly in high accumulators.

      Moreover, confocal microscopy using tachyplesin-NBD along with the membrane dye FM™ 4-64FX further confirmed that tachyplesin-NBD is present only on the cell membrane of low accumulators and both on the membrane and intracellularly in high accumulators.

      These amendments can be found on lines 173-179, lines 188-192 and in the new Figures S4 and S6.

      (3) Further genetic experiments are necessary to test whether efflux genes are involved at all. The genetic data presented by the authors in Figure S11 is crucial and should be further extended. The problem with fitting this data to the current hypothesis is as follows: If specific efflux pumps are involved in the resistance mechanism, then single deletions would cause some changes to the resistance phenotype, and the data in Figure S11 would look different. If there is redundancy (as is the case in many efflux phenotypes), the authors may consider performing double deletions on the major RND regulators (for example, evgA and marA). Additionally, the deletion of pump components such as TolC (one of the few OM components) and adaptors (such as acrA/D) might also provide insights. If the peptide is present in the periplasm, then deletions involving outer components would become important.

      This important comment is similar in scope to the third comment of reviewer 2. We have now performed tachyplesin-NBD accumulation assays using 28 single and 4 double E. coli BW25113 gene-deletion mutants of efflux components and transcription factors regulating efflux. While for the majority of the mutants we recorded bimodal distributions of tachyplesin-NBD accumulation similar to the distribution recorded for the E. coli BW25113 parental strain (Figure 4B and Figure S13), we found unimodal distributions of tachyplesin-NBD accumulation consisting only of high accumulators for both the ΔqseB and ΔqseB ΔqseC mutants, as well as reduced numbers of low accumulators for the ΔacrA ΔtolC mutant.

      These amendments can be found on lines 351-361, in the new Figure 4B and in the new Figure S14, please also see our response to comment 3 of reviewer 2.

      (4) Line numbers would have been really helpful. Please mention the size of the peptide (length and spatial) for readers.

      We have now added line numbers to the revised manuscript. The length and molecular weight of tachyplesin-1 have now been added on line 75.

      (5) Figure S4 is unclear. How were the low accumulators collected? What prompted the low-temperature experiment? The conclusion that it accumulates at the outer membrane is unjustified. Where is the data for high accumulators?

      We have now corrected the results section to state that tachyplesin-NBD accumulates on the cell membranes, rather than at the outer membrane of E. coli cells.

      These amendments can be found on lines 178 and 190.

      We would like to clarify that in Figure S4 we compare the distribution of tachyplesin-NBD single-cell fluorescence at low temperature versus 37 °C across the whole stationary phase E. coli population; we did not collect low accumulators only.

      The low-temperature experiment was prompted by a previous publication (Zhou Y et al. 2015: doi: 10.1021/ac504880r. Epub 2015 Mar 24. PMID: 25753586) showing that non-specific adherence of antimicrobials to the bacterial surface occurs at low temperatures, whereas passive and active transport of antimicrobials across the membrane is significantly diminished. Additionally, previous reports suggest that low temperatures inhibit post-binding peptide-lipid interactions, but not the primary binding step (PMID: 16569868; PMCID: PMC1426969; PMID: 3891625; PMCID: PMC262080).

      Therefore, the low-temperature experiment was performed to quantify the fluorescence of cells due to non-specific binding. This quantification allowed us to deduce that the fluorescence of high accumulators above the non-specific binding level (measured in the low-temperature experiment for the whole stationary phase E. coli population) is the result of intracellular tachyplesin-NBD accumulation. In contrast, the comparable fluorescence levels between all the cells in the low-temperature experiment and the low accumulator subpopulation at 37 °C suggest that tachyplesin-NBD is predominantly accumulated on the cell membranes of low accumulators instead of intracellularly.

      Please also see our response to comment 2 above for further evidence supporting that tachyplesin-NBD accumulates only on the cell membranes of low accumulators and both on the cell membranes and intracellularly in high accumulators.

      (6) Figure S5: Describe the microfluidic setup briefly. Why did the distribution pattern change (compared to Figure 1A)? Now, there are more high accumulators. Does the peptide get equally distributed between daughter cells?

      We have now added a brief description of the microfluidic setup on lines 182-184.

      The difference in the abundance of low and high accumulators between the microfluidics and flow cytometry measurements is likely due to differences in cell density, i.e. a few cells per channel versus millions of cells in a tube. A second major difference is that tachyplesin-NBD is continuously supplied in the microfluidic device for the entire duration of the experiment; therefore, the extracellular concentration of tachyplesin-NBD does not decrease over time. In contrast, tachyplesin-NBD is added to the tube only at the beginning of the experiment; therefore, the extracellular concentration of tachyplesin-NBD likely decreases over time as it is accumulated by the bacteria. The relative abundance of low and high accumulators changes with the extracellular concentration of tachyplesin-NBD, as shown in Figure 1A.

      We have added a sentence to acknowledge this discrepancy on lines 186-187.

      No cell division was observed in stationary phase E. coli in the absence of nutrients in any of the microfluidic assays. Therefore, we cannot comment on the distribution of tachyplesin-NBD across daughter cells.

      (7) How did the authors conclude this: "tachyplesin accumulation on the bacterial membrane may not be sufficient for bacterial eradication"? It is completely unclear to this reviewer.

      We presented this hypothesis at the end of the section “Tachyplesin accumulates primarily in the membranes of low accumulators” as a link to the following section “Tachyplesin accumulation on the bacterial membranes is insufficient for bacterial eradication” where we test this hypothesis. For clarity, we have now moved this sentence to the beginning of the section “Tachyplesin accumulation on the bacterial membranes is insufficient for bacterial eradication”.

      (8) What is meant by membrane accumulation? Outside, inside, periplasm? Where? Figure 2H conclusions are unjustified. Bacterial killing with many antibiotics is associated with membrane damage, which is an aftereffect of direct antibiotic action. How can the authors state that "low accumulators primarily accumulate tachyplesin-NBD on the bacterial membrane, maintaining an intact membrane, strongly contributing to the survival of the bacterial population"? This reviewer could not find justifications for the claims about the location of the accumulation or cells actively maintaining an intact membrane. Also, PI staining reports damage to both membranes.

      Based on the experiments that we have carried out following this reviewer’s suggestions (please see response 2 above), it is likely that tachyplesin-NBD is present only on the bacterial surface of low accumulators, i.e. in or on their outer membrane, considering that their fluorescence decreases during treatment with proteinase K. However, to take a more conservative approach, we now write “on the cell membranes” throughout the manuscript, meaning either the outer or the inner membrane.

      We have also rephrased the statement reported by the reviewer as follows:

      “Taken together with PI staining data indicating membrane damage caused by high tachyplesin accumulation, these data demonstrate that low accumulators, which primarily accumulate tachyplesin-NBD on the bacterial membranes, maintain membrane integrity and strongly contribute to the survival of the bacterial population in response to tachyplesin treatment.”

      These amendments can be found on lines 228-232.

      (9) Figure 3: The findings about cluster 2 and cluster 4 genes do not correlate logically. If the cells are in a metabolically low active state, how are the cells getting enough energy for active efflux and membrane transport? This scenario is possible, but the authors must confirm the metabolic activity by measuring respiration rates. Also, metabolically less-active cells may import a lower number of peptides to begin with. That also may contribute to cell survival. Additionally, lowered metabolism is a known strategy of antibiotic survival that is distinctly different from efflux-mediated survival.

      Following this reviewer’s comment and comment 2 of reviewer 1, we have now carried out further experiments to estimate the metabolic activity of low and high accumulators. Please see our response to comment 2 of reviewer 1 above.

      (10) Figure S10: How did the authors test their hypothesis that cardiolipin is involved in the binding of the peptide to the membrane? The transcriptome data does not confirm it. Genetic experiments are necessary to confirm this claim.

      We would like to clarify that we have not set out to test the hypothesis that cardiolipin is involved in the binding of tachyplesin-NBD. We have only stated that cardiolipin could bind tachyplesin due to its negative charge. We have now cited two previous studies that suggest that tachyplesin has an increased affinity for lipids mixtures containing either cardiolipin (Edwards et al. ACS Inf Dis 2017) or PG lipids (Matsuzaki et al. BBA 1991), i.e. the main constituents of cardiolipins.

      These amendments can be found on lines 264-267.

      (11) Figure 4B-F: There are several controls missing. For Sertraline treatment, the authors must test that the metabolic profile, transcriptomic changes, or import of the peptide are not responsible for enhanced survival. CCCP will not only abolish efflux but also many other respiration-associated or all other energy-driven processes.

      Figure 4D presents data acquired in efflux assays in the absence of extracellular tachyplesin-NBD. Therefore, altered tachyplesin-NBD import cannot contribute to the lack of formation of the low accumulator subpopulation.

      We have now acknowledged that it is conceivable that increased tachyplesin efficacy is due to metabolic and transcriptomic changes induced by sertraline.

      These amendments can be found on lines 396-397.

      We have also acknowledged that CCCP does not only abolish efflux but also other respiration-associated and energy-driven processes.

      These amendments can be found on lines 341-342.