  1. Jan 2025
    1. Reviewer #1 (Public review):

      This study investigates alterations in the autophagic-lysosomal pathway in the Q175 HD knock-in model crossed with the TRGL autophagy reporter mouse. The findings provide valuable insights into autophagy dynamics in HD and the potential therapeutic benefits of modulating this pathway. The study suggests that autophagy stimulation may offer therapeutic benefits in the early stages of HD progression, with mTOR inhibition showing promise in ameliorating lysosomal pathology and reducing mutant huntingtin accumulation.

      However, the data raise concerns regarding the strength of the evidence. The observed changes in autophagic markers, such as autolysosome and lysosome numbers, are relatively modest, and the Western blot results do not fully match the quantitative results. These discrepancies highlight the need for further validation and more pronounced effects to strengthen the conclusions. While the study suggests the potential of autophagy regulation as a long-term therapeutic strategy, additional experiments and more reliable data are necessary to confirm the broader applicability of the TRGL/Q175 mouse model.

      Furthermore, the 2004 publication by Ravikumar et al. demonstrated that inhibition of mTOR by rapamycin or the rapamycin ester CCI-779 induces autophagy and reduces the toxicity of polyglutamine expansions in fly and mouse models of Huntington's disease. mTOR is a key regulator of autophagy, and its inhibition has been explored as a therapeutic strategy for various neurodegenerative diseases, including HD. Studies suggest that inhibiting mTOR enhances autophagy, leading to the clearance of mHTT aggregates. Given that dysfunction of the autophagic-lysosomal pathway and lysosomal function in HD is already well-established, and that mTOR inhibition as a therapeutic approach for HD is also known, this study does not present entirely novel findings.

      Major Concerns:

      (1) In Figure 3A1 and A2, delayed and/or deficient acidification of AL causes deficits in the reformation of LY to replenish the LY pool. However, in Figure S2D, there is no difference in AL formation or substrate degradation, as shown by the Western blotting results for CTSD and CTSB. How can these discrepancies be explained?

      (2) The results demonstrate that in the brain sections of 17-month-old TRGL/Q175 mice, there was an increase in the number of acidic autolysosomes (AL), including poorly acidified autolysosomes (pa-AL), alongside a decrease in lysosome (LY) numbers. These AL/pa-AL changes were not significant in 2-month-old or 7-month-old TRGL/Q175 mice, where only a reduction in lysosome numbers was observed. This indicates that these changes, representing damage to the autophagy-lysosome pathway (ALP), manifest only at later stages of the disease. Considering that the ALP is affected predominantly in the advanced stages of the disease (e.g., at 17 months), why were 6-month-old TRGL/Q175 mice selected for oral mTORi INK treatment, and why was the treatment duration restricted to just 3 weeks?

      (3) Is the extent of motor dysfunction in TRGL/Q175 mice comparable to that in Q175 mice? Does the administration of mTORi INK improve these symptoms?

      (4) Why is eGFP expression not visible in Fig. 6A in TRGL-Veh mice? Additionally, why do normal (non-poly-Q) mice have fewer lysosomes (LY) than TRGL/Q175-INK mice? IHC results also show that CTSD levels are lower in TRGL mice compared to TRGL/Q175-INK mice. Does this suggest lysosome dysfunction in TRGL-Veh mice?

      (5) In Figure 5A, the phosphorylation of ATG14 (S29) shows minimal differences in Western blotting, which appears inconsistent with the quantitative results. A similar issue is observed in the quantification of Endo-LC3.

      (6) In Figure S2A and Figure S2B, 17-month-old TRGL/Q175 mice show a decrease in p-p70S6K and the p-ULK1/ULK1 ratio, but no changes are observed in autophagy-related markers. Do these results indicate only a slight change in autophagy at this stage in TRGL/Q175 mice? Since the mTOR pathway regulates multiple cellular mechanisms, could mTOR also influence other processes? Is it possible that additional mechanisms are involved?

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors have explored the beneficial effect of autophagy upregulation in the context of HD pathology in a disease stage-specific manner. The authors observed a functional autophagy-lysosomal pathway (ALP) and its machinery at the early stage in the HD mouse model, whereas impairment of the ALP was documented at later stages of disease progression. The authors then took advantage of the operational ALP at the early stage of HD pathology to upregulate the ALP and autophagy flux by inhibiting mTORC1 in vivo, which ultimately reversed multiple ALP-related abnormalities and phenotypes. Therefore, this manuscript is a promising effort to shed light on therapeutic interventions with which HD pathology might be treated at the patient level in the future.

      Strengths:

      The study has shown the alteration of ALP in the HD mouse model in a very detailed manner. Such stage-dependent in vivo study will be informative and has not been done before. Also, this research provides possible therapeutic interventions for patients in the future.

      Weaknesses:

      Some constructive comments and suggestions to better reflect the key aspects and concepts in the manuscript:

      (1) The authors have observed lysosome number alteration in a temporally regulated disease stage-specific manner. In this scenario investigation of regulation, localization, and level of TFEB, the transcription factor required for lysosome biogenesis, would be interesting and informative.

      (2) For the general scientific community, clearer definitions of the abbreviations would be useful. For example, on line 97, page 4, the full form of AP should be given. Also, 'metabolized via autophagy' could be replaced by 'degraded via autophagy'.

      (3) The nuclear vs cytosolic localization of HTT aggregates shown in Figure 2, are very interesting. The increase in cytosolic HTT aggregate formation at 10 months compared to 6 months probably suggests spatio-temporal regulation of aggregate formation. The authors could comment in a more elaborate manner, on the reason and impact of this kind of regulation of aggregate formation in the context of HD pathology.

      (4) In this manuscript, the authors have convincingly shown that mTOR inhibition is inducing autophagy in the HD mouse model in vivo. On the other hand, mTOR inhibition would also reduce overall cellular protein translation. This aspect of mTOR inhibition can also potentially contribute to the alleviation of disease phenotype and disease symptoms by reducing protein overload in HD pathology. The authors' comments regarding this aspect would be appreciated.

      (5) The authors have shown nuclear inclusion formation and aggregation of mHTT and also commented on its potential removal with the UPS system (proteasomal degradation) in vivo. As there is also a reciprocal relationship present between autophagy and proteasomal machineries, upon upregulation of autophagy machinery by mTOR inhibition proteasomal activity may decrease. How nuclear proteasomal activity increases to tackle nuclear mHTT IBs, would be interesting to understand in the context of HD pathology. Comments from the authors in this aspect would clarify the role of multiple degradation pathways in handling mutant HTT protein in HD pathology.

      (6) For the treatment of neurodegenerative disorders, taking temporal regulation into consideration is extremely important, as it will determine the success rate of treatments in patients. The authors have clearly discussed this scenario in the manuscript. However, in most patients with neurodegenerative disorders, symptoms manifest late. In that case, it will be complicated to initiate an early treatment regime in HD patients. It would be impactful if the authors could comment on and discuss the practicality of an early treatment regime for therapeutic purposes.

    1. eLife Assessment

      This valuable study on Pseudomonas subverting host immunity identifies a new immune evasion strategy. There is solid evidence for the cleavage of VgrG2B by Caspase 11 and the generation of fragments that inhibit activity of the NLRP3 inflammasome. This work should be of interest to immunologists and microbiologists.

    2. Reviewer #2 (Public review):

      Summary:

      In their manuscript, Qian and colleagues identified a novel mechanism by which Pseudomonas controls inflammatory responses upon inflammasome activation. They identified a caspase-11 substrate (VgrG2b) which, upon cleavage, binds and inhibits NLRP3 to reduce the production of pro-inflammatory cytokines. This is a unique mechanism that allows for the tailoring of the innate immune response upon bacterial recognition.

      Strengths:

      The authors present here a novel conceptual framework in host-pathogen interactions. Their work is supported by a range of approaches (biochemical, cellular immunology, microbiology, animal models) and their conclusions are supported by multiple independent lines of evidence. The work is likely to have an important impact on the innate immunity and host-pathogen interactions fields and may guide the development of novel inhibitors.

      Weaknesses:

      Although the work is quite exhaustive, a few of the authors' conclusions are not fully supported (e.g., caspase-11 directly cleaving VgrG2b, the unique affinity of VgrG2b-C for NLRP3) and would require complementary approaches to validate their findings fully. This is minimal.

      Comments on revisions:

      I commend the authors' effort to address my comments. They have addressed all my concerns.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      In the manuscript entitled "A VgrG2b fragment cleaved by caspase-11/4 promotes Pseudomonas aeruginosa infection through suppressing the NLRP3 inflammasome", Qian et al. found an activation of the non-canonical inflammasome, but not the downstream NLRP3 inflammasome, during the infection of macrophage by P. aeruginosa, which is in sharp contrast to that by E. coli (Figure 1). In realizing that the suppression of the NLRP3 inflammasome is Caspase-11 dependent, the authors performed a screening among P. aeruginosa proteins and identified VgrG2b being a major substrate of Caspase-11 (Figure 2). Next, the authors mapped the cleavage site on VgrG2b to D883, and demonstrated that cleavage of VgrG2b by Caspase-11 is essential for the suppression of the NLRP3 inflammasome (Figure 3). Furthermore, they found that a binding between the C-terminal fragment of the cleaved VgrG2b and NLRP3 existed (Figure 4), which was then proved to block the association of NLRP3 with NEK7 (Figure 5). Finally, the authors demonstrated that blocking of VgrG2b cleavage, by either mutation of the D883 or administration of a designed peptide, effectively improved the survival rate of the P. aeruginosa-infected mice (Figure 6). This is a well-designed and executed study, with the results clearly presented and stated.

      We are deeply grateful for your recognition and positive comments on our article. Thank you for your effort and dedication in reviewing our manuscript. We are honored to have the opportunity to receive feedback from professional reviewers like you.

      Reviewer #2 (Public review):

      Summary:

      In their manuscript, Qian and colleagues identified a novel mechanism by which Pseudomonas controls inflammatory responses upon inflammasome activation. They identified a caspase-11 substrate (VgrG2b) which, upon cleavage, binds and inhibits NLRP3 to reduce the production of pro-inflammatory cytokines. This is a unique mechanism that allows for the tailoring of the innate immune response upon bacterial recognition.

      Strengths:

      The authors are presenting here a novel conceptual framework in host-pathogen interactions. Their work is supported by a range of approaches (biochemical, cellular immunology, microbiology, animal models), and their conclusions are supported by multiple independent evidences. The work is likely to have an important impact on the innate immunity field and host-pathogen interactions field and may guide the development of novel inhibitors.

      Weaknesses:

      Although quite exhaustive, a few of the authors' conclusions are not fully supported (e.g., caspase-11 directly cleaving VgrG2b, the unique affinity of VgrG2b-C for NLRP3) and would require complementary approaches to validate their findings fully. This is minimal.

      We sincerely appreciate your professional review and kind appraisal on our article. These comments are really valuable and helpful for improving our manuscript. According to your suggestions, we have made some modifications and added some supplemental data to make our results more convincing. The detailed responses are listed point-by-point below.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      I really enjoyed reading your manuscript and believe this is an important conceptual advance for the innate immunity field. Your conclusions are in general well-supported, you used a range of methodologies and the quality of the presentation of the results is excellent. I have a few comments here that I hope will contribute to improving an already great piece of work:

      Elements to be improved:

      Lines 109-110: the authors claim that the release of mitochondrial DNA is required for NLRP3 activation. I would support this with a reference, as I believe this may not be fully agreed on in the field. Cleavage of GSDMD by caspase-4/11 is required, however. A few groups have shown the requirement for K+ efflux in this context (Broz, Brough, Schroder labs).

      It is a very good suggestion. Indeed, there is still controversy over this issue, and we have revised our text to make our manuscript more neutral. We have also cited these important references to help readers understand where the controversy lies.

      I disagree that OMV + Pseudomonas is a natural way to simulate infection. I would argue it is even quite artificial. Pseudomonas alone should be sufficient to generate OMVs without the addition of extra OMVs.

      This is a good point. Before infecting BMDM cells with PAO1 strains, we washed the bacteria with PBS at least three times to exclude interference from the contents of the LB medium. Moreover, in our experimental system, the time for co-incubation between bacteria and host cells is very limited. During this time, the amount of OMVs secreted by the bacteria may not reach the level needed to activate inflammasomes, and this concentration is also relatively low compared to the OMV concentration secreted by bacteria under physiological conditions. Therefore, we added extra OMVs to simulate the chronic infection condition in a short time.

      The authors co-express the caspase with VgrG2b and assume the cleavage is direct. However, the study lacks experiments with recombinant proteases (commercially available), which would strengthen the conclusions regarding the ability of caspase-4/11 to directly cleave the protein. Based on the recognised sequence (DXXD), I believe caspase-4/11 is not directly responsible for this. These caspases were shown to cleave caspase-3/7, which can cleave such sequence (DXXX). As caspase-4 can cleave caspase-3/7 in their lysates, I would recommend testing this hypothesis to further strengthen the authors' conclusions.

      These are very good points. As shown in Fig. 3F, we used recombinant VgrG2b and caspase-11 p22/p10 to demonstrate direct cleavage by caspase-11. To exclude the effect of caspase-3/7, we treated cells with caspase-3/7 inhibitors and found that caspase-3/7 are not the executors of VgrG2b cleavage (new Fig. S3E, F).

      The affinity between caspase-11 and VgrG2b-C is puzzling, as one would normally expect the caspase and its substrates to quickly dissociate. Does VgrG2b-C impact the activity of caspase-4/11 upon cleavage? Can VgrG2b-C also interact with p20/p10 caspase-1? I believe the authors only tried the full-length version of caspase-1 in the supplement.

      These are very good questions. We agree that enzymes and substrates normally have only transient interactions, which are not easy to capture. However, we used the mutant caspase-11(C254A), which cannot cleave its substrates, so that the binding of VgrG2b or VgrG2b-C to caspase-11(C254A) could be detected. This mutation is frequently used in immunoprecipitation (Wang K, Cell, 2020). We tested the impact of VgrG2b-C on the enzymatic activity of caspase-4/11 and showed that VgrG2b-C did not affect the cleavage of GSDMD by caspase-11 (Fig. 5C). We also tested caspase-1 p20/p10 and found that it did not interact with VgrG2b-C (new Fig. S4G).

      Can more details be provided about the generation of recombinant caspase-11, VgrG2b-C, and other recombinant proteins tested?

      Thanks for your suggestion, we have revised our description in the new version.

      The authors assumed that VgrG2b-C does not impact other inflammasomes (such as NLRC4) based on their X-gal assay. I would also confirm this with a functional assay (e.g., transfection of flagellin in macrophages).

      This is a good suggestion. We have tested the impact of VgrG2b-C on NLRC4 inflammasome and found that VgrG2b-C does not affect NLRC4 activation with the transfection of flagellin (new Fig. S5K).

      Often, representative experiments are shown. For ELISA, cell death assays, and other quantitative experiments, pooling the data would be appropriate. Appropriate statistical analysis should be conducted on the pooled data as well.

      Thanks for your suggestions. In the revised manuscript, we pooled the data of three independent experiments for our analysis of ELISA and cell death assays. We also added descriptions of statistical analysis in our revised text.

      VgrG2b has been suggested to be a metalloprotease (PMID: 31577948). Is its protease activity required for the phenomenon observed?

      This is a very good question. The active region of metalloprotease VgrG2b-C is aa932-941, especially the core sequence of HEXXH. Structure data also confirms that H935, E936, H939, E983 play key roles in the coordination with Zn ions (Sana TG, mBio, 2015; Wood TE, Cell reports, 2019). In our study, the cleavage of VgrG2b by caspase-4/11 depends on the recognition of tetrapeptide sequence in aa880-883. We added data showing that the cleavage of VgrG2b and the inhibition of NLRP3 inflammasome were not affected by VgrG2b enzymatic activity (new Fig. S4I-K).

      What is the affinity of VgrG2b-C for NLRP3? Is it higher than NEK7? A quantitative experiment would be required to claim this.

      This is a great point. In the revised manuscript, we added quantitative data showing that VgrG2b-C has a higher affinity for NLRP3 than NEK7 does (326 nM vs. 681 nM).

      The Material and Method section is a bit light and would benefit from adding more information (e.g. cell density, microscopy details, number of cells imaged, etc).

      Thanks for your suggestion. We have added more details to the Material and Method section in the revised manuscript.

    1. eLife Assessment

      This study provides valuable insights into the lesser-known effects of the sodium-potassium pump on how nerve cells process signals, particularly in highly active cells like those of weakly electric fish. The authors use a detailed mathematical model to show how the pump can shift a cell's normal firing patterns and disrupt the coordination of signals when inputs change quickly. The computational methods used to establish the claims in this work are solid and can be used as a starting point for further studies, yet the conclusions would be strengthened with experimental evidence or testable predictions regarding some of the proposed mechanisms across different cell types.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aim to explore the effects of the electrogenic sodium-potassium pump (Na+/K+-ATPase) on the computational properties of highly active spiking neurons, using the weakly-electric fish electrocyte as a model system. Their work highlights how the pump's electrogenicity, while essential for maintaining ionic gradients, introduces challenges in neuronal firing stability and signal processing, especially in cells that fire at high rates. The study identifies compensatory mechanisms that cells might use to counteract these effects, and speculates on the role of voltage dependence in the pump's behavior, suggesting that Na+/K+-ATPase could be a factor in neuronal dysfunctions and diseases.

      Strengths:

      (1) The study explores a less-examined aspect of neural dynamics: the effects of Na+/K+-ATPase electrogenicity. It offers a new perspective by highlighting the pump's role not only in ion homeostasis but also in its potential influence on neural computation.

      (2) The mathematical modeling used is a significant strength, providing a clear and controlled framework to explore the effects of the Na+/K+-ATPase on spiking cells. This approach allows for the systematic testing of different conditions and behaviors that might be difficult to observe directly in biological experiments.

      (3) The study proposes several interesting compensatory mechanisms, such as sodium leak channels and extracellular potassium buffering, which provide useful theoretical frameworks for understanding how neurons maintain firing rate control despite the pump's effects.

      Weaknesses:

      (1) While the modeling approach provides valuable insights, the lack of experimental data to validate the model's predictions weakens the overall conclusions.

      (2) The proposed compensatory mechanisms are discussed primarily in theoretical terms without providing quantitative estimates of their impact on the neuron's metabolic cost or other physiological parameters.

    3. Reviewer #2 (Public review):

      Summary:

      The paper 'The electrogenicity of the Na+/K+-ATPase poses challenges for computation in highly active spiking cells' by Weerdmeester, Schleimer, and Schreiber uses computational models to present the biological constraints under which electrocytes (specialized, highly active cells that facilitate electro-sensing in weakly electric fish) may operate. The authors suggest potential solutions these cells could employ to circumvent these constraints.

      Electrocytes are highly active, spiking at more than 300 Hz for sustained periods (minutes to hours), and such activity is possible due to an influx of sodium ions into, and an efflux of potassium ions from, these cells during each spike. This ion imbalance must be restored after each spike, which in electrocytes, as in many other biological cells, is facilitated by the Na-K pumps at the expense of biological energy, i.e., ATP molecules. For each ATP molecule the pump uses, three positively charged sodium ions from the intracellular space are exchanged for two positively charged potassium ions from the extracellular volume. This creates a net efflux of positive charge into the extracellular space, resulting in hyperpolarized potentials for the cell over time. This does not pose an issue in most cells, since their firing rates are much lower and other compensatory mechanisms and pumps can effectively restore the ion imbalances. In electrocytes of weakly electric fish, however, which operate under very different circumstances, the firing rate is exceptionally high. On top of this, these cells are also involved in critical communication and survival behaviors, which makes their reliable functioning essential.

      In a computational model, the authors test four increasingly complex solutions to the problem of counteracting the hyperpolarized states that occur due to continuous NaK pump action to sustain baseline activity. First, they propose a well-matched Na leak channel that operates in conjunction with the NaK pump, counteracting the hyperpolarizing states naturally. Additionally, their model shows that when such an orchestrated Na leak current is not included, quick changes in the firing rates could have unexpected side effects. Secondly, they study the implications for this cell in the context of chirps - a means of communication between individual fish. Here, an upstream pacemaker neuron entrains the electrocyte to spike; when this drive ceases, the result is a so-called chirp - a brief pause in the sustained activity of the electrocytes. In their model, the authors show that it is necessary to include the extracellular potassium buffer to have a reliable chirp signal. Thirdly, they test another means of communication, in which a sudden increase in the firing rate of the electrocyte is followed by a decay to the baseline. For this to occur reliably, they emphasize that a strong synaptic connection between the pacemaker neuron and the electrocyte is warranted. Finally, since these cells are energy-intensive, they hypothesize that electrocytes may have energy-efficient action potentials, for which their NaK pumps may be sensitive to the membrane voltages and perform course correction rapidly.
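
      The 3 Na+ : 2 K+ stoichiometry described in this summary can be turned into a back-of-the-envelope estimate of the net outward pump current. The following is only a minimal sketch and not the authors' model: the function names and the example sodium load per spike are hypothetical, and it assumes that all sodium entering during spikes is extruded by the pump alone.

```python
# Minimal sketch (not the authors' model): net current generated by the
# Na+/K+-ATPase from its 3 Na+ out : 2 K+ in stoichiometry per ATP.
# Function names and the example numbers below are illustrative assumptions.

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs (exact SI value)
NA_OUT_PER_ATP = 3                   # sodium ions extruded per ATP
K_IN_PER_ATP = 2                     # potassium ions imported per ATP


def atp_rate_for_sodium_load(na_ions_per_spike, firing_rate_hz):
    """ATP molecules per second needed to extrude the sodium entering
    through spikes, assuming the pump is the only extrusion route."""
    return na_ions_per_spike * firing_rate_hz / NA_OUT_PER_ATP


def pump_net_current(atp_per_second):
    """Net outward (hyperpolarizing) current in amperes: one positive
    charge leaves the cell per pump cycle (3 out minus 2 in)."""
    net_charges_per_cycle = NA_OUT_PER_ATP - K_IN_PER_ATP
    return atp_per_second * net_charges_per_cycle * ELEMENTARY_CHARGE


if __name__ == "__main__":
    # Hypothetical sodium load of 3e6 ions per spike at 300 Hz.
    rate = atp_rate_for_sodium_load(3e6, 300.0)
    print(f"ATP turnover: {rate:.3g} per second")
    print(f"Net pump current: {pump_net_current(rate):.3g} A")
```

      Under these assumptions the net pump current scales linearly with firing rate, which is one way to see why the hyperpolarizing effect matters most in cells that fire at several hundred hertz.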

      Strengths:

      The authors extend an existing electrocyte model (Joos et al., 2018) based on the classical Hodgkin and Huxley conductance-based models of Na and K currents to include the dynamics of the NaK pump. The authors estimate the pump's properties based on reasonable assumptions related to the leak potential. Their proposed solutions are valid and may be employed by weakly electric fish. The authors explore theoretical solutions that compound and suggest that all these solutions must be simultaneously active for the survival and behavior of the fish. This work provides a good starting point for exploring and testing in in vivo experiments which of these proposed solutions the fish use and their relative importance.

      Weaknesses:

      The modeling work makes assumptions and simplifications that should be listed explicitly. For example, it assumes only potassium ions constitute the leak current, which may not be true as other ions (chloride and calcium) may also cross the cell membrane. This implies that the leak channels' reversal potential may differ from that of potassium. Additionally, the spikes are composed of sodium and potassium currents only and no other ion type (no calcium). Further, these ion channels are static and do not undergo any post-translational modifications. For instance, a sodium-dependent potassium pump could fine-tune the potassium leak currents and modulate the spike amplitude (Markham et al., 2013).

      This model considers only NaK pumps. In many cell types, several other ion pumps/exchangers/symporters are simultaneously present and actively participate in restoring the ion gradients. It may be true that only NaK pumps are expressed in the weakly electric fish Eigenmannia virescens. This limits the generalizability of the results to other cell types. While this does not invalidate the results of the present study, biological processes may find many other solutions to address the non-electroneutral nature of the NaK pump. For example, each spike could include a small calcium ion influx that could be buffered or extracted via a sodium-calcium exchanger.

      Finally, including testable hypotheses for these computational models would strengthen this work.

    4. Author response:

      We thank the reviewers for their concise and detailed summaries, and appreciate the constructive feedback on the article’s strengths and weaknesses. In response, we plan to strengthen our work in a revised version by presenting the model assumptions for the electrocyte more explicitly and further elaborate on the generalisability of the results to other cell types with different ion channels including calcium and chloride.

      Experimental work is beyond the scope of our modelling-based study. However, we would like our work to serve as a framework for future experimental studies into the role of the electrogenic pump current (and its possible compensatory currents) in disease, and its role in evolution of highly specialised excitable cells (such as electrocytes).

    1. eLife Assessment

      This well-designed study provides important findings concerning the way the brain encodes prediction about self-generated sensory inputs. The authors report that neurons in auditory cortex respond to mismatches in locomotion-driven auditory feedback and that those responses can be enhanced by concurrent mismatches in visual inputs. While there remain alternative explanations for some of the data, these findings provide convincing support for the role of predictive processing in cortical function by indicating that sensorimotor prediction errors in one modality influence the computation of prediction errors in another modality.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript presents a short report investigating mismatch responses in the auditory cortex, following previous studies focused on visual cortex. By correlating mouse locomotion speed with acoustic feedback levels, the authors demonstrate excitatory responses in a subset of neurons to halts in expected acoustic feedback. They show a lack of responses to mismatch in the visual modality. A subset of neurons show enhanced mismatch responses when both auditory and visual modalities are coupled to the animal's locomotion.

      While the study is well-designed and addresses a timely question, several concerns exist regarding the quantification of animal behavior, potential alternative explanations for recorded signals, correlation between excitatory responses and animal velocity, discrepancies in reported values, and clarity regarding the identity of certain neurons.

      Strengths:

      (1) Well-designed study addressing a timely question in the field.

      (2) Successful transition from previous work focused on visual cortex to auditory cortex, demonstrating generic principles in mismatch responses.

      (3) Correlation between mouse locomotion speed and acoustic feedback levels provides evidence for a prediction signal in the auditory cortex.

      (4) Coupling of visual and auditory feedback shows putative multimodal integration in auditory cortex.

      Weaknesses:

      (1) Unclear correlation between excitatory responses and animal velocity during halts, particularly in closed-loop versus playback conditions.

      (2) Ambiguity regarding the identity of the [AM+VM] MM neurons.

    3. Reviewer #2 (Public review):

      Using multimodal closed-loop behavior and activity monitoring in the neocortex, Solyga and Keller show that the auditory cortex computes the deviation of current sensory input from expectations. Interestingly, in addition, mismatch responses within the auditory stream are non-linearly influenced by concurrent sensorimotor error computations in the visual pathway. These results suggest that non-hierarchical interactions (lateral relational cross-talk) must be considered when analyzing cortical models based on predictive processing. In my opinion, this is a fundamental study that addresses the question of hierarchical vs. non-hierarchical interactions across neocortical areas. Overall, I find the experiments elegantly designed, and the results robust, providing compelling evidence for non-hierarchical interactions across neocortical areas, and more specifically for the exchange of sensorimotor prediction error signals across modalities. The authors thoroughly addressed the concerns raised. In my opinion, this has substantially strengthened the manuscript, enabling much clearer interpretation of the results reported.

    4. Reviewer #3 (Public review):

      This study explores sensory prediction errors in sensory cortex. It focuses on the question of how these signals are shaped by non-hierarchical interactions, specifically multimodal signals arising from same-level cortical areas. The authors used 2-photon imaging of mouse auditory cortex in head-fixed mice that were presented with sounds and/or visual stimuli while moving on a ball. First, responses to pure tones, visual stimuli and movement onset were characterized. The authors then made the running speed of the mouse predictive of sound intensity and/or visual flow (closed loop). Mismatches were created through the interruption of sound and/or visual flow for 1 second, disrupting the expected sensory signal. As a control, sensory stimuli recorded during the closed-loop phase were presented again decoupled from the movement (open loop). The authors suggest that auditory responses to the unpredicted interruption of the sound, which affected neither running speed nor pupil size, reflect mismatch responses. That these mismatch responses were enhanced when the visual flow was congruently interrupted indicates a cross-modal influence of prediction error signals.

      This study's strengths are the relevance of the question and the design of the experiment. The authors are experts in the techniques used. Responses to the interruption of the sound are similar in quality, if not quantity, in the predictive and the control situation, yet the contribution of sound offset sensitivity to the observed mismatch responses is not discussed.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      I am satisfied with all clarifications and additional analyses performed by the authors. 

      The only concern I have is about changes in running after [AM+VM] mismatches. 

      The authors reported that they "found no evidence of a change in running speed or pupil diameter following [AM + VM] mismatch (Figures S5A)" (line 197). 

      Nevertheless, it seems that there is a clear increase in running speed for the [AM+VM] condition (S5A). Could this be more specifically quantified? I am concerned that part of the [AM+VM] could stem from this change in running behavior. Could one factor out the running contribution? 

      Please excuse this; it was unintentionally omitted. We have added the quantification to Table S1 and included the results of the significance test in Figures S2A, S4A, and S5A. The increase in running speed upon MM presentation (0.5 – 1 s), compared to the baseline running speed in the time window preceding MM presentation (-0.5 – 0 s), was not significant in any of the tested conditions.
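
      As an illustration only, the window comparison described above can be sketched as follows. This is not the authors' analysis code, and the statistical test they used is not stated in this response; the paired sign-flip permutation test, the function names, and the trial data layout are all assumptions made for the sketch. Only the two time windows (-0.5 to 0 s and 0.5 to 1 s relative to mismatch onset) come from the text.

```python
import random
from statistics import mean


def window_mean(times, speeds, start, stop):
    """Mean speed over samples with start <= t < stop (s, relative to mismatch onset)."""
    values = [s for t, s in zip(times, speeds) if start <= t < stop]
    if not values:
        raise ValueError("no samples fall inside the requested window")
    return mean(values)


def paired_sign_flip_test(diffs, n_perm=10000, seed=0):
    """Two-sided sign-flip permutation p-value for the hypothesis mean(diffs) == 0.

    Assumption: this stands in for whatever paired test the authors applied.
    """
    rng = random.Random(seed)
    observed = abs(mean(diffs))
    hits = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def running_speed_change(trials, baseline=(-0.5, 0.0), response=(0.5, 1.0)):
    """trials: list of (times, speeds) pairs, one pair per mismatch event.

    Returns the mean per-trial change in speed (response minus baseline)
    and the permutation p-value for that change.
    """
    diffs = [
        window_mean(times, speeds, *response) - window_mean(times, speeds, *baseline)
        for times, speeds in trials
    ]
    return mean(diffs), paired_sign_flip_test(diffs)
```

      With few mismatch events per condition, a permutation test of this kind has limited resolution, so a non-significant result is weak evidence for the absence of a running response rather than proof of it.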

      In the process of adding the statistics, we noticed an unfortunate inconsistency in our figures that relates to Figure S5A. The data shown in all other Figures is aligned to the onset of audiomotor mismatch. In Figure S5A, however, the data were aligned to the onset of the visuomotor mismatch. As there is a differential delay in the closed loop coupling of auditory and visual feedback of approximately 170 ms (as described in the methods), visuomotor mismatch onset is slightly before audiomotor mismatch onset. We have corrected this now in the manuscript but have done the statistical analysis for both old and new versions of the figure. In neither case do we find evidence of a running speed response.

      The authors thoroughly addressed the concerns raised. In my opinion, this has substantially strengthened the manuscript, enabling much clearer interpretation of the results reported. I commend the authors for the response to review. Overall, I find the experiments elegantly designed, and the results robust, providing compelling evidence for non-hierarchical interactions across neocortical areas and more specifically for the exchange of sensorimotor prediction error signals across modalities. 

      We are happy to hear this!

      Reviewer #2:

      The incorporation of the analysis of the animal's running speed and the pupil size upon sound interruption improves the interpretation of the data. The authors can now conclude that responses to the mismatch are not due to behavioral effects. 

      The issue of the relationship between mismatch responses and offset responses remains uncommented. The auditory system is sensitive to transitions, including transitions to silence. See the work of the Linden or the Barkat labs (including the work of the first author of this manuscript) on offset responses, and also that of the Mesgarani lab (Khalighinejad et al., 2019) on responses to transitions 'to clean' (Figure 1c) in human auditory cortex. Offset responses, as the first author knows well, are modulated by intensity and stimulus length (after adaptation?). That responses to the interruption of the sound are similar in quality, if not quantity, in the closed and open loop conditions suggests that offset responses might modulate the mismatch response. A mismatch response that reflects a break in predictability would presumably be less modulated by the exact details of the sensory input than an offset response. Therefore, what is the relationship between the mismatch response and the mean sound amplitude prior to the sound interruption (for example during the preceding 1 second)? And between the mismatch response and the mean firing rate over the same period? 

      Finally, how do visual stimuli modulate sound responses in the absence of a mismatch? Is the multimodal response potentiation specific to a mismatch?

      There are probably two points important to clarify before answering the question – just to make sure there is no semantic misunderstanding. 

      (1) In the jargon of predictive processing, a prediction error is a deviation from a predictable relationship. This can be sensorimotor coupling (as in audio- and visuomotor mismatch), stimulus history (as in oddball, or sound offset responses), surround sensory input (as in endstopping response and center-surround effects in visual processing), etc. A sound offset perceived by an animal in an open loop condition is thus a negative prediction error based on stimulus history (this assumes the animal has no way to predict the time of offset – as is the case in our experiments). We are primarily interested in our work here in characterizing negative prediction errors that result from motor-related predictions – hence the comparison we use is unpredictable sound offset in closed-loop coupling vs. unpredictable sound offset in open-loop coupling. The first is a mixture of an audiomotor prediction error and a stimulus history prediction error. The second is just a stimulus history prediction error. Thus, we compare the two types of responses to isolate the component that can only be attributed to audiomotor prediction errors. 

      (2) Audiomotor mismatch responses can of course be explained in a large variety of ways. For example, one could consider a sound offset a sensory stimulus. One could further assume that locomotion increases sensory responses. If so, one could explain audiomotor mismatch responses as a locomotion related gain of a sensory offset response. However, we need to further postulate that this locomotion related gain is stimulus specific, as for sound onset responses there is no detectable difference between locomotion and sitting. Thus, we are left with a model that explains audiomotor mismatch responses as a “stimulus specific locomotion gain of sensory responses”. This is correct – it is just not very satisfying, has no computational basis, and makes no useful predictions (see e.g. https://pubmed.ncbi.nlm.nih.gov/36821437/ for an extended treatise of exactly this point for visuomotor mismatch responses).

      That responses to the interruption of the sound are similar in quality, if not quantity, in the closed and open loop conditions suggests that offset responses might modulate the mismatch response.

      Conceptually both a “sound offset” and an “audiomotor mismatch” are negative prediction errors. Could one describe the effect we see as an audiomotor mismatch modulating a sound offset? Certainly. But if the reviewer means modulate in the sense of neuromodulatory – we are not aware of a neuromodulatory response that would be fast enough or strong enough to have these effects (we have looked into ACh, NA, and Ser; unpublished, no MM response). Alternatively, they could simply add linearly (as predictive processing would predict). Given that AM mismatch responses are likely computed in auditory cortex, we see no reason to speculate that anything more complicated is happening than a linear summation of different prediction error responses. 

      A mismatch response that reflects a break in predictability would presumably be less modulated by the exact details of the sensory input than an offset response. Therefore, what is the relationship between the mismatch response and the mean sound amplitude prior to the sound interruption (for example during the preceding 1 second)? And between the mismatch response and the mean firing rate over the same period? 

      The reviewer’s intuition here – that mismatch responses have a lower resolution than what one thinks of as sensory responses (or sound offset responses) – is probably not warranted. Experiments that quantify the resolution of mismatch responses are relatively data intense – and to the best of our knowledge this has only been done once in the visual system for visuomotor mismatch responses (Zmarz and Keller, 2016). Here we found that visuomotor mismatch responses exhibited matched spatial (in visual space) resolution to that of visual responses. 

      Regarding the suggested analyses: In a closed loop session, the sound amplitude preceding the mismatch is directly related to the running speed of the mouse. In visual cortex, the amplitude of visuomotor mismatch responses linearly scales with running speed (and consequently visual flow speed) prior to the mismatch – as predicted by predictive processing. See e.g. figure 4B in (Zmarz and Keller, 2016). We have tried this analysis for audiomotor mismatches in the previous round of reviews, but we fear we do not have sufficient data to address this question properly. If we look at how mismatch responses change as a function of locomotion speed (sound amplitude) across the entire population of neurons, we have no evidence of a systematic change (and the effects are highly variable as a function of speed bins we choose). However, just looking at the most audiomotor mismatch responsive neurons, we find a trend for increased responses with increasing running speed (Author response image 1). We analyzed the top 5% of cells that showed the strongest response to mismatch (MM) and divided the MM trials into three groups based on running speed: slow (10-20 cm/s), middle (20-30 cm/s), and fast (>30 cm/s). Given the fact that we have on average 14 mismatch events in total per neuron, the analysis when split by running speed is under-powered.  

      Author response image 1.

      The average response of strongest AM MM responders to AM mismatches as a function of running speed (data are from 51 cells, 11 fields of view, 6 mice).
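
      For readers who want to reproduce the grouping by running speed described above, a minimal sketch follows. It is not the authors' code: the function names and the input format (one pair of running speed and response amplitude per mismatch trial) are hypothetical. The bin edges are taken from the text (slow 10-20 cm/s, middle 20-30 cm/s, fast >30 cm/s); treating each bin as half-open, so that a speed of exactly 30 cm/s counts as fast, is an assumption.

```python
from statistics import mean

# Bin edges in cm/s, from the author response; half-open intervals are an assumption.
SPEED_BINS = {
    "slow": (10.0, 20.0),
    "middle": (20.0, 30.0),
    "fast": (30.0, float("inf")),
}


def bin_label(speed):
    """Return the speed-bin name for a trial, or None if the speed is below 10 cm/s."""
    for label, (low, high) in SPEED_BINS.items():
        if low <= speed < high:
            return label
    return None


def mean_response_by_speed(trials):
    """trials: iterable of (running_speed_cm_s, response_amplitude) pairs.

    Returns {bin name: (mean response or None if the bin is empty, trial count)}.
    """
    grouped = {label: [] for label in SPEED_BINS}
    for speed, response in trials:
        label = bin_label(speed)
        if label is not None:
            grouped[label].append(response)
    return {
        label: (mean(values) if values else None, len(values))
        for label, values in grouped.items()
    }
```

      Returning the trial count alongside each mean makes the power problem noted by the authors visible: with roughly 14 mismatch events per neuron, each bin holds only a handful of trials.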

      Regarding the relationship between mismatch response and firing rate prior to mismatch, we are not sure we understand the intuition. Does the reviewer mean, the average firing rate of the mismatch neuron? Or the population mean? The first is likely uninterpretable as it is bound to be confounded by regression to the mean type artefacts. But in either case, we would have no prediction of what to expect.

    1. eLife Assessment

      Giamundo et al. present fundamental data with new insights into the role of Ezrin, a major membrane-actin linker that assembles signaling complexes, in the spatial regulation of EGF signaling mediators. The use of multiple state-of-the-art microscopy techniques, multiple cell lines and inhibitors, together with in vivo models convincingly supports the majority of their conclusions. The findings are helpful for understanding EGF/mTOR signal transduction and support a critical role for the scaffolding protein Ezrin in the upstream regulation of EGFR/AKT activity, TSC subcellular localization and mTORC1 signaling. These findings contribute substantially to understanding how endo-lysosomal signaling is regulated, alterations of which are implicated in many human diseases.

    2. Reviewer #2 (Public review):

      Summary:

      The authors begin with the stated goal of gaining insight into the known repression of autophagy by Ezrin, a major membrane-actin linker that assembles signaling complexes on membranes. RNA and protein expression analysis is consistent with upregulation of lysosomal proteins in Ezrin-deficient MEFs, which the authors confirm by immunostaining and western blotting for lysosomal markers. Expression analysis also implicates EGF signaling as being altered downstream of Ezrin loss, and the authors demonstrate that Ezrin promotes relocalization of EGFR from the plasma membrane to endosomes. Ezrin loss reduces downstream MAPK and Akt signaling, and represses mTORC1 signaling by promoting lysosomal localization of the TSC complex. An Ezrin mutant Medaka fish line is then generated to test its role in retinal cells, which are known to be sensitive to changes in autophagy regulation. Phenotypes in this model appear generally consistent with observations made in cultured cells, though milder overall.

      Strengths:

      Data on the impact of Ezrin-loss on relocalization of EGFR from the plasma membrane are extensive, and thoroughly demonstrate that Ezrin is required for EGFR internalization in response to EGF.

      A new Ezrin-deficient in vivo model (Medaka fish) is generated.

      Strong data demonstrating that Ezrin loss suppresses Akt signaling and mTORC1 signaling by promoting TSC complex localization to the lysosome.

      Weaknesses:

      The authors have addressed all concerns.

    3. Reviewer #3 (Public review):

      Summary:

      In this study, the authors have attempted to demonstrate a critical role for the cytoskeletal scaffold protein Ezrin, in the upstream regulation of EGFR/AKT/MTOR signaling. They show that in the absence of Ezrin, ligand-induced EGFR trafficking and activation at the endosomes is perturbed, with decreased endosomal recruitment of the TSC complex, and a corresponding decrease in AKT/MTOR signaling.

      Strengths:

      The authors have used a combination of novel imaging techniques, as well as conventional proteomic and biochemical assays to substantiate their findings. The findings expand our understanding of the upstream regulators of the EGFR/AKT MTOR signaling and lysosomal biogenesis, appear to be conserved in multiple species, and may have important implications for the pathogenesis and treatment of diseases involving endo-lysosomal function, such as diabetes and cancer, as well as neuro-degenerative diseases like macular degeneration. Furthermore, pharmacological targeting of Ezrin could potentially be utilized in diseases with defective TFEB/TFE3 functions like LSDs. While a majority of the findings appear to support the hypotheses, there are substantial gaps in the findings that could be better addressed. Since Ezrin appears to directly regulate MTOR activity, the effects of Ezrin KO on MTOR-regulated, TFEB/TFE3-driven lysosomal function should be explored more thoroughly. Similarly, a more convincing analysis of autophagic flux should be carried out. Additionally, many immunoblots lack key controls (Control IgG in co-IPs) and many others merit repetition to either improve upon the quality of the existing data, validate the findings using orthogonal approaches or to provide a more rigorous quantitative assessment of the findings, as highlighted in the recommendation for authors.

      Comments on revisions:

      The authors have satisfactorily addressed most of the concerns raised in the prior version, and have significantly improved upon the overall findings in the revised version.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: 

      The authors demonstrate that, while the loss of Ezrin increases lysosomal biogenesis and function, its presence is required for the specific endocytosis of EGFR. Upon further investigation, the authors reveal that Ezrin is a crucial intermediary protein that links EGFR to AKT, leading to the phosphorylation and inhibition of TSC. TSC is a critical negative regulator of the mTORC1 complex, which is dysregulated in various diseases, making their findings a valuable addition to multiple fields of study. Their cell signaling findings are translatable to an in vivo Medaka fish model and suggest that Ezrin may play a crucial role in retinal degeneration.

      Strengths: 

      Giamundo, Intartaglia, et al. utilized unbiased proteomic and transcriptomic screens in Ezrin KO cells to investigate the mechanistic function of Ezrin in lysosome and cell signaling pathways. The authors' findings are consistent with past literature demonstrating Ezrin's role in the EGFR and mTORC1 signaling pathways. They used several cell lines, small molecule inhibitors, and cellular and in vivo knockout models to validate signaling changes through biochemical and microscopy assays. Their use of multiple advanced microscopy techniques is also impressive.

      We are grateful to the Editor and the Reviewers for their important and constructive comments, which helped us to improve our manuscript. We have now carried out new experiments and analyses to further support our findings.

      Weaknesses: 

      While the authors demonstrated activation of TSC1 (lysosomal accumulation) and inactivation of Akt (decreased phosphorylation in TSC1), as well as decreased mTORC1 signaling in Ezrin knockout cells, direct experiments showing the rescue of mTORC1 activity by AKT and TSC1 mutants are required to confirm the linear signaling pathway and establish Ezrin as a mediator of EGFR-AKT-TSC1-mTORC1 signaling. Although the authors presented representative images from advanced microscopy techniques to support their claims, there is insufficient quantification of these experiments. Additionally, several immunoblots in the manuscript lack vital loading controls, such as input lanes for immunoprecipitations and loading controls for western blots.

      We wish to thank the Reviewer for his/her important and constructive comments on our manuscript and for considering that our study provides new information for understanding the mechanism regulating the TSC/mTORC1 pathway. We have now extensively revised the manuscript according to his/her suggestions. Indeed, to expand on the evidence demonstrating Ezrin as a mediator of EGFR-AKT-TSC1-mTORC1 signaling, the revised manuscript includes quantification of all advanced microscopy images, rescue experiments demonstrating the role of Ezrin in the AKT/TSC/mTORC1 molecular network, and controls for WBs and immunoprecipitations.

      Reviewer #2 (Public Review):

      Summary: 

      The authors begin with the stated goal of gaining insight into the known repression of autophagy by Ezrin, a major membrane-actin linker that assembles signaling complexes on membranes. RNA and protein expression analysis is consistent with upregulation of lysosomal proteins in Ezrin-deficient MEFs, which the authors confirm by immunostaining and western blotting for lysosomal markers. Expression analysis also implicates EGF signaling as being altered downstream of Ezrin loss, and the authors demonstrate that Ezrin promotes relocalization of EGFR from the plasma membrane to endosomes. Ezrin loss impacts downstream MAPK/Akt/mTORC1 signaling, although the mechanistic links remain unclear. An Ezrin mutant Medaka fish line was then generated to test Ezrin's role in retinal cells, which are known to be sensitive to changes in autophagy regulation. Phenotypes in this model appear generally consistent with observations made in cultured cells, though mild overall. 

      Strengths: 

      Data on the impact of Ezrin-loss on relocalization of EGFR from the plasma membrane are extensive, and thoroughly demonstrate that Ezrin is required for EGFR internalization in response to EGF. 

      A new Ezrin-deficient in vivo model (Medaka fish) is generated.

      Strong data demonstrates that Ezrin loss suppresses Akt signaling. Ezrin loss also clearly suppresses mTORC1 signaling in cell culture, although examination of mTORC1 activity is notably missing in Ezrin-deficient fish. 

      We thank the Reviewer for the recognition of our study and apologize for the insufficient evidence reported in the previous version of the manuscript. As requested by the Reviewer, we considerably expanded the number of experiments to support the EZRIN/EGFR/TSC molecular network in regulating the autophagy pathway in the revised manuscript. Furthermore, following the Reviewer’s comment we have expanded the interpretation of our findings in the “Discussion” section. We hope the new version of our manuscript will address the Reviewer’s concerns.

      Weaknesses: 

      LC3 is used as a readout of autophagy, however the lipidated/unlipidated LC3 ratio generally does not appear to change, thus there does not appear to be evidence that Ezrin loss is affecting autophagy in this study. 

      We certainly agree with the Reviewer on the importance of this issue and apologize for the lack of clarity. Ezrin is an already widely characterized protein participating in the autophagy pathway. Several studies, including our previous studies, demonstrated that both silencing and pharmacological inhibition of Ezrin may promote autophagy by promoting activation of TFEB, in part through the TRPML1-calcineurin signaling pathway (Naso et al 2020; Intartaglia et al 2022; Lou et al 2024). However, how Ezrin controls autophagy is still not fully understood. As suggested by the Reviewer, to reinforce our data, we have now fixed this inaccuracy by better elucidating this aspect in the revised manuscript. Accordingly, we have monitored the autophagic flux and LC3 expression level following the guidelines for the use and interpretation of assays for monitoring autophagy (4th edition) by Klionsky et al. 2021. The data presented in the new Figure supplement 1 now better support the notion that depletion of Ezrin increases autophagic flux. We hope the new version of our manuscript will address the Reviewer’s concerns.

      The conclusion is drawn that Ezrin loss suppresses EGF signaling, however this is complicated by a strong increase in phosphorylation of the p38 MAPK substrate MK2. Without additional characterization of MAPK and Erk signaling, the effect of Ezrin loss remains unclear.  Causative conclusions between effects on MAPK, Akt, and mTORC1 signaling are frequently drawn, but the data only demonstrate correlations. For example, many signaling pathways can activate mTORC1 including MAPK/Erk, thus reduced mTORC1 activity upon Ezrin-loss cannot currently be attributed to reduced Akt signaling. Similarly, other kinases can phosphorylate TSC2 at the sites examined here, so the conclusion cannot be drawn that Ezrin-loss causes a reduction in Akt-mediated TSC2 phosphorylation.

      We agree with the Reviewer that this is an interesting and important question. However, we respectfully disagree with the Reviewer and feel that addressing this point by additional studies on both MAPK and ERK pathways, as the Reviewer suggests, is outside the scope of this manuscript. We therefore prefer to address these questions in future studies. However, following the Reviewer’s comment we have expanded the interpretation of our findings in the “Discussion” section. We hope the new version of our manuscript will address the Reviewer’s concerns.

      In Figure 7, the conclusion cannot be drawn that retinal degeneration results from aberrant EGFR signaling.

      We certainly agree with the Reviewer on the importance of this issue. We have now fixed this inaccuracy by adding TUNEL staining, which showed retinal degeneration in Ezrin KO medaka fish. The results of these assays are described in the Results section and documented in revised Figure 7, panel H.

      It is unclear why TSC1 is highlighted in the title, as there does not appear to be any specific regulation of TSC1 here. 

      We modified the title accordingly.

      In Figure 1 the conclusion is drawn that there is an increase in lysosome number with Ezrin KO, however it does not appear that the current analysis can distinguish an increased number from increased lysosome size or activity. Similarly, conclusions about increased lysosome "biogenesis" could instead reflect decreased turnover.

      Following this Reviewer’s observation, we changed the text according to his/her suggestion.

      Immunoprecipitation data for a role for Ezrin as a signaling scaffold appear minimal and seem to lack important controls.

      We apologize for these inaccuracies. We have now carried out new experiments to further support our findings. Moreover, all blots were replaced with better-exposed images. The controls are shown in the revised Figures.

      In Figure 3A it seems difficult to conclude that EGFR dimerization is reduced since the whole blot, including the background between lanes, is lighter on that side.

      We have now fixed this inaccuracy. The blots in revised Figure 3, panel A, were replaced with better-exposed images and quantified.

      In Figure 6C specificity controls for the TSC1 and TSC2 antibodies are not included but seem necessary since their localization patterns appear very different from each other in WT cells.

      We apologize for the confusion. We have now corrected this mistake and revised all panels in Figure 6C (now Figure 6D) for consistency between figures and text. Concerning the specificity of the TSC1 and TSC2 antibodies and staining, the antibody labelling showed the usual pattern for TSC in cells, as reported in Menon et al. 2014. We would like to point out that the antibodies are the same as those used in Menon et al. 2014, and that our data are based not only on TSC1 and TSC2 staining but also on a considerable number of in vivo and in vitro experiments in which many different markers were used across several complementary approaches (i.e. immunofluorescence, western blot analysis, omics, etc.).

      Menon S, Dibble CC, Talbott G, Hoxhaj G, Valvezan AJ, Takahashi H, Cantley LC, Manning BD. Spatial control of the TSC complex integrates insulin and nutrient regulation of mTORC1 at the lysosome. Cell. 2014 Feb 13;156(4):771-85.

      In Figure 7 the signaling effects in Ezrin-deficient fish are mild compared to cultured cells, and effects on mTORC1 are not examined. Further data on the retinal cell phenotypes would strengthen the conclusions.

      We thank the Reviewer for his/her comment. We have now fixed this inaccuracy in the revised manuscript. We added the analysis of p4EBP1 (S65), an mTORC1 substrate, in Figure 7, panel D. 

      In Figure 7F there appears to be more EGFR throughout the cell, so it is difficult to conclude that more EGFR at the PM in Ezrin-/- fish means reduced internalization. 

      We agree with the Reviewer that this is an important question, and it helped us to improve the quality of the data presented. As correctly noted by the Reviewer, the EGFR protein level is increased due to EZRIN deletion. This is evident in Figure 7, panel F, in line with both proteomic analysis and in vitro experiments (Figure 2I; Figure 3E; Figure 5C). We also agree that the increase in EGFR protein level could strengthen the immunofluorescence background. Therefore, to better represent EGFR membrane translocation on flat-mount RPE from the medaka lines, we added a highlighting box showing it in both the WT and KO medaka lines in the revised Figure 7, panel F.

      Reviewer #3 (Public Review): 

      Summary: 

      In this study, the authors have attempted to demonstrate a critical role for the cytoskeletal scaffold protein Ezrin, in the upstream regulation of EGFR/AKT/MTOR signaling. They show that in the absence of Ezrin, ligand-induced EGFR trafficking and activation at the endosomes is perturbed, with decreased endosomal recruitment of the TSC complex, and a corresponding decrease in AKT/MTOR signaling. 

      Strengths: 

      The authors have used a combination of novel imaging techniques, as well as conventional proteomic and biochemical assays to substantiate their findings. The findings expand our understanding of the upstream regulators of the EGFR/AKT MTOR signaling and lysosomal biogenesis, appear to be conserved in multiple species, and may have important implications for the pathogenesis and treatment of diseases involving endo-lysosomal function, such as diabetes and cancer, as well as neuro-degenerative diseases like macular degeneration. Furthermore, pharmacological targeting of Ezrin could potentially be utilized in diseases with defective TFEB/TFE3 functions like LSDs. While a majority of the findings appear to support the hypotheses, there are substantial gaps in the findings that could be better addressed. Since Ezrin appears to directly regulate MTOR activity, the effects of Ezrin KO on MTOR-regulated, TFEB/TFE3-driven lysosomal function should be explored more thoroughly. Similarly, a more convincing analysis of autophagic flux should be carried out. Additionally, many immunoblots lack key controls (Control IgG in co-IPs) and many others merit repetition to either improve upon the quality of the existing data, validate the findings using orthogonal approaches, or provide a more rigorous quantitative assessment of the findings, as highlighted in the recommendation for authors. 

      We thank the Reviewer for the recognition of our study and apologize for the previous inaccuracies. We also greatly appreciate the Reviewer's efforts, support and help in improving our manuscript. As requested by the Reviewer, we considerably expanded the number of experiments to support the EZRIN/EGFR/AKT network in controlling the mTORC1 pathway in the revised manuscript. We hope the new version of our manuscript will address the Reviewer’s concerns.

      Reviewer #1 (Recommendations for The Authors):

      Major comments: 

      (1) While the authors show that, in the absence of Ezrin, TSC accumulates on the lysosome and suppresses mTORC1 signaling, they should perform additional genetic experiments to strengthen their conclusions. Can they knockout or knockdown TSC1/2 in Ezrin-deficient cells to rescue mTORC1 activity? Can they mutate the lysosomal localization signal on TSC1 (TSC1Q149E/R204E/K238E) in Ezrin-deficient cells to rescue mTORC1 activity? Does constitutively active AKT (myr-AKT or AKT-E40K) restore mTORC1 activity in Ezrin-deficient cells? 

      We agree with the Reviewer that this is an important concern, and it helped us to improve the quality of the data presented. We now provide in the revised version of Figure supplement 4F the results of pharmacological inhibition of Ezrin in MEF-TSC2 KO cells. In line with our findings, the lack of TSC2 is able to rescue mTORC1 signaling in the absence of Ezrin activity. Thus, these data strongly support that Ezrin is required for the mTORC1 pathway via TSC complex targeting.

      (2) In the absence of Ezrin, TSC1 constitutively localizes on the lysosome and suppresses mTORC1. Does this suppression hold in the presence of other mTORC1-activating signals (i.e., amino acids, insulin, oxygen)? 

      Following the reviewer’s suggestion we now provide this information in the revised Figure 6C, in which we show that stimulation with insulin does not exert its activating effect on mTORC1 signaling (i.e. phosphorylation of pP70 S6 - pT389). These new data, together with the experiments on MEF TSC2 KO cells, clearly support the model by which Ezrin works as a scaffold protein connecting AKT signaling to the TSC complex. The lack of Ezrin induces a disconnection between AKT and the TSC complex, which is translocated to lysosomes and is insensitive to inhibition by AKT signaling.

      (3) In Figure 3A, the authors showed EGFR dimerization through a western blot of a crosslinking assay. However, the western blot data are unclear and do not strongly support their statement. Additionally, the authors mentioned that the dimerization is confirmed by immunofluorescence analysis, but this statement should be revised since the imaging analysis only indirectly shows the copresence of EZR and EGFR, not necessarily the dimerized EGFR. The authors should perform additional experiments to strengthen their claim or tone down their statements in the text and model figure. 

      We certainly agree with the Reviewer on the importance of this issue and have now fixed this inaccuracy in the revised manuscript. The crosslinking blots were replaced with better-exposed images in the revised Figure 3A. Moreover, we quantified the signals to support our conclusion.

      (4) It is interesting that Ezrin binds EGFR, AKT, and TSC as a scaffolding protein. To define the mechanisms by which Ezrin interacts with AKT, EGFR, and TSC, can the authors perform domain analyses to determine which regions of Ezrin are required for its binding with AKT, EGFR, and TSC in mediating EGFR-AKT-TSC-mTORC1 signaling? 

      We thank the Reviewer for this comment, which improves our manuscript. Conducting domain analysis in the lab would be ideal, but it would be a lengthy undertaking and is likely to raise several technical and experimental issues. In silico approaches provide a helpful alternative for generating initial hypotheses about domain-domain interactions, though they should be seen as a starting point rather than a complete solution. Recent advances in fold prediction suggest that AlphaFold3 could be used to predict dimer formation and, consequently, domain-domain interactions. However, such an approach is challenging in this case because some of the proteins considered are transmembrane, and all are prone to form multimeric complexes with multiple partners, making them poor candidates for reliable fold predictions. In fact, the predicted dimers are poorly supported, and AlphaFold3 lacks confidence in the relative positioning of the interactors, limiting its interpretability. Alternatively, database mining and machine-learning methods, such as HINT, Domine, and PPIDomainMiner, provide more robust evidence. Indeed, these tools consistently identify a strong interaction between Ezrin's central FERM domain and EGFR's PK domain, now shown in Figure supplement 2C and Figure supplement 3C-H. These findings generate valuable hypotheses, but experimental validation is still necessary, and we prefer to leave it for future studies.

      Minor Comments: 

      (1) There are several immunoblots that did not have adequate controls:

      - In Figure 2D, an input lane should be shown for each of the cell lysates to demonstrate the presence of other proteins in the cell lysate used for the IP.

      We have now fixed this inaccuracy in the revised manuscript.

      - Figure 3A does not have a loading control. Also, immunoblot quality should be significantly improved.

      We have now fixed this inaccuracy in the revised manuscript.

      - The HER2 western blot in Figure 5C does not accurately represent the data shown in the quantification graph.

      We have now fixed this inaccuracy by replacing HER2 western blot in the revised Figure 5C.

      - In Figure 6A, the authors should include an input as a control for the IP. To further support their claim in the model figure, can the authors also probe the IP lysate for Ezrin and Tsc2? If all are indeed in a complex together, they should be present. 

      Following the Reviewer’s observation, we have added the input as a control for the IP in the revised Figure 6A. Moreover, we have included the immunoprecipitation data for the EZRIN and TSC2 interaction (Figure 6A).

      - Phosphorylation sites across figures should be uniformly annotated for consistency and ease of understanding, e.g., pTSC2(S939), pS6K1(T389), and pAKT(S473).

      We have now fixed this inaccuracy in the revised text.

      (2) There are several microscopy data that lack adequate quantification. For instance, Figures 2E, 2F, 3C, 4A, 5A, and 6F only show very few cells as representative images, which is not sufficient to support their claims. 

      We thank the Reviewer for this comment, which improves our manuscript. Accordingly, we have added adequate quantification and statistical analysis in the revised Figures.

      (3) Some suggestions to improve the readability of the manuscript: 

      -  In the abstract (line 32): "Loss of Ezrin was deficient in TSC repression by EGF and culminated in translocation of TSC to lysosomes triggering suppression of mTORC1 signaling." The wording is somewhat confusing, please change such as "Loss of Ezrin was not sufficient to repress TSC by EGF and culminated..." or "Loss of Ezrin blunted EGF-induced TSC suppression and culminated..." 

      We apologize for the lack of clarity and now we have fixed this inaccuracy by better elucidating this aspect in the revised manuscript.

      -  Figure 3D has a typo in the western blot labeling. Please change Citosol to Cytosol. 

      We have now fixed this inaccuracy in the revised text.

      -  Line 291: "Moreover, TSC2 resulted activated and AKT/mTOR signaling..." The wording is confusing. 

      We have now fixed this inaccuracy in the revised text. The text now reads: “Moreover, we found that TSC2 was dephosphorylated in response to light in the retina, when inactive Ezrin (Naso et al., 2020) and EGFR are weakly expressed (Figure supplement 6C) as a consequence of a decrease of the AKT/mTORC1 signaling…”

      -  The model in Figure 8 indicates that upon EGF stimulation, the activated Ezrin interacts with EGFR, causing its dissociation from actin filaments and leading to its endosome incorporation. However, the authors did not provide supporting data for this claim. Can the authors either cite literature or provide data for this? Otherwise, the model should be edited to remove actin filaments in the model. 

      We have now fixed this inaccuracy by removing actin filaments in the revised model.

      Reviewer #2 (Recommendations For The Authors):

      The data and written text seem to deal entirely with mTORC1, rather than mTORC2, thus it seems "mTOR" should be changed to "mTORC1" throughout. 

      We have now fixed this inaccuracy in the revised manuscript.

      For clarification, the TSC protein complex should be referred to as the "TSC complex", whereas "TSC" generally refers to the tumor syndrome Tuberous Sclerosis Complex.

      We have now fixed this inaccuracy in the revised manuscript.

      Quantification of colocalization would be helpful in all the panels where it is currently missing.

      We thank the Reviewer for this comment, which improves our manuscript. Accordingly, we have added adequate quantification of colocalization for each immunofluorescence in the revised Figures.

      Line 84 typo "thorough" should be "through" 

      We have now fixed this inaccuracy in the revised manuscript.

      Line 178 - typo 

      We have now fixed this inaccuracy in the revised manuscript.

      Line 209 - typo 

      We have now fixed this inaccuracy in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors): 

      Fig. 1 The data showing an increase in lysosomal biogenesis suggests an increase in transcriptional activity. This should be confirmed by one or more of the following: 1) Increased TFEB/TFE3 nuclear localization following EZR loss, 2) Increased CLEAR promoter luciferase activity assays, 3) Increased expression of multiple CLEAR transcripts (https://www.science.org/doi/10.1126/science.1174447) or 4) Increased TFEB/ TFE3/ CLEAR gene signatures by RNA seq. Similarly, data showing increased autophagic flux should be confirmed in the presence of chloroquine or bafilomycin. 

      We agree with the Reviewer that this is an important concern, and addressing it helped us to improve the quality of the data presented. It is well established that nuclear translocation is a major mechanism regulating TFEB activity. We have now carried out new experiments demonstrating that depletion of Ezrin induces TFEB nuclear translocation in Ezrin -/- cells. These findings are in line with our previous data, in which pharmacological inhibition and silencing of Ezrin induced the same cellular phenotype. We also apologize for the confusion regarding autophagic flux: we had already carried out experiments with Bafilomycin to confirm its increase. The autophagic flux blots were therefore replaced with better-exposed images in the revised Figure supplement 1H, and the text was modified to emphasize these findings.

      Fig 2D, the lanes with EZR -/- cells expressing the EZR mutants should be repeated on the same gel as the first 2 lanes (with the WT and EZR -/- cells) 

      We thank the Reviewer for this comment, which improves our manuscript. To avoid any confusion in describing these results, we have now modified Figure 2D, providing the controls described in our responses to Reviewers #1 and #2. We hope the new version of our data will address the Reviewer’s concerns.

      Fig 2F- The presence of reduced EGFR in intracellular compartments in Ezrin KO/ -/- cells should be quantified, and shown for a 2nd EZR null cell line as well (Ezrin null MEFs) 

      We have added EGFR quantification in Figure 2F. We have also carried out new experiments demonstrating that EGFR is localized on the cytoplasmic membrane in MEF Ezrin KO cells (Figure supplement 2H).

      Fig 2G, did the authors test the effects of EZR depletion on basal and EGF stimulated EGFR autophosphorylation on Y1068 and Y1045 as well as downstream activation of p42/44 ERK MAPK?  Those should be tested in the HeLa system as well as the MEFs cells with EZR KO. 

      Following the Reviewer’s request, we have now added western blot data for EGFR autophosphorylation on Y1068 and for p42/44 ERK MAPK in Figure 5C. Moreover, we have added western blot data for p42/44 ERK MAPK in MEF cells in Figure supplement 2F. However, we cannot provide data for EGFR autophosphorylation on Y1068 in MEF cells, because the antibody did not work on proteins from these cells.

      Also, why would HER3 levels be expected to decrease? There seems to be minimal change in HER3 expression. Also, the significance of increased MK2 phosphorylation should be further elaborated. 

      The Reviewer raised justified concerns about HER3 and MK2. We have now discussed these aspects in the “Results” section.

      Fig 3A- Crosslinking of EGFR is not very apparent in this blot. The crosslinking blots should be repeated 3 times and quantified. 

      We certainly agree with the Reviewer on the importance of this issue and have now fixed this inaccuracy in the revised manuscript. The crosslinking blots were replaced with better-exposed images in the revised Figure 3A. Moreover, we quantified the signals to support our conclusion.

      Fig 3D- How were membrane endosomes isolated? This should be stated in the methods. Membrane/ Cytosol and Endosome fractionation showing EGFR levels should be shown in Ezrin null MEFs as well, and membrane expression should be further substantiated with surface biotinylation for cell surface EGFR. 

      We now report more information about the method used for membrane endosome isolation in the Materials and Methods section. Following the Reviewer’s request, we also show that EGFR was not localized on endosomes upon EGF stimulation in Ezrin null MEFs. These data are reported in the new revised Figure supplement 2G. Moreover, we have now carried out new experiments demonstrating the membrane localization of EGFR in MEF Ezrin KO cells. These findings are shown in Figure supplement 2H.

      Fig 5C: Similar to 2G, EGFR autophosphorylation on Y1068 and Y1045 should also be measured, as well as downstream activation of p42/44 ERK MAPK? 

      Following the Reviewer’s request, we have now carried out new experiments to assess the EGFR autophosphorylation on Y1068 and Y1045, as well as downstream activation of p42/44 ERK MAPK.  We added these new data in the revised Figure 5C, accordingly. 

      Fig 5D: Similar to 3D, Membrane/ Cytosol and Endosome fractionation showing EGFR levels should be shown in Ezrin null MEFs as well, and further substantiated with surface biotinylation for cell surface EGFR. 

      Following the Reviewer’s request, we show that EGFR was not localized on endosomes upon EGF stimulation (Figure supplement 2G).

      Supplement 2E: The blots show lower expression of EGFR and higher MAPK activation in EZR KO cells, contradicting the data in the other cells. 

      We apologize for the confusion. The error occurred during the preparation of Figure supplement 2E, which showed an image from an earlier, non-final version of the figure. We have now replaced it with the correct WB panel.

      Supplement 2F: The authors should repeat the NSC668394 experiment using: 1) multiple doses, 2) In both the Ezrin KO and null cell lines 3) and repeat 3X to quantify differences in total EGFR. 

      We respectfully disagree with the Reviewer and feel that additional dose-response studies of NSC668394, as the Reviewer suggests, are outside the scope of this manuscript. However, we would like to point out that we have already conducted extensive studies on the dose-response effects of NSC668394 administration in vitro (Patent: WO2020070333A1).

      Moreover, we apologize for not having provided enough information about the number of independent biological replicates for the WB analyses. To fill this gap, we have expanded the Materials and Methods section accordingly.

      Patent: WO2020070333A1 - Ezrin inhibitors and uses thereof

      Fig 6A: The IP experiments should be repeated with Control IgG 

      We have now fixed this inaccuracy in the revised manuscript.

      Typos: 

      (1) Figure 3D: Citosol 

      We have now fixed this inaccuracy in the revised manuscript.

      (2) Line 216-217: "increased EGFR protein 217 levels on purified membranes and endosomes (Figure 3D and E)" - That should be decreased EGFR on endosomes in accordance with Figure 3D (lower panels) 

      We have now fixed this inaccuracy in the revised manuscript.

      (3) Abstract: "Consistently, Medaka fish deficient for Ezrin exhibit defective endo-lysosomal pathway" 

      We have now fixed this inaccuracy in the revised manuscript.

    1. eLife Assessment

      This study builds on previous findings showing modular organisation of primate visual cortical areas by presenting important results about the cortical processing of colour, disparity and naturalistic textures in the human visual cortex at the spatial scale of cortical layers and columns using state-of-the-art high-resolution fMRI methods at ultra-high magnetic field strength (7 T). Solid evidence supports an interesting layer-specific informational connectivity analysis to infer information flow across early visual areas for processing disparity and color signals. While the question of how the modularity of representation relates to cortical hierarchical processing is interesting, the finding that texture does not map onto the previously established columnar architecture in V2 is suggestive. The successful application of high-resolution fMRI methods to study the functional organization along cortical columns and layers is relevant to a broad readership interested in general neuroscience.

    2. Reviewer #1 (Public review):

      Summary:

      This study examines the cortical modular functional organization of visual texture in comparison with that of color and disparity. While color, disparity, and orientation have been shown to exhibit clear functional organizations within the thin, thick, and thick/pale stripes of V2, whether the feature of texture is also organized within V2 is unknown. Using ultrahigh field 7T fMRI in humans viewing color-, disparity-, and texture-specific visual stimuli, the authors find that, unlike color and disparity, texture does not exhibit stripe-specific organization in V2. Moreover, using laminar imaging methods and calculations of informational connectivity, they find V2 color and disparity stripes exhibit the expected feedforward and feedback relationships with V1 & V4, and with V1 & V3ab, respectively. In contrast, texture activation, found predominantly in the deep layers of V2, is driven preferentially by feedback from V4. Based on these findings, the authors suggest that texture is a visual feature computed in higher-order areas and not generated by local intra-V2 computation.

      Strengths:

      This study poses an interesting and fundamental question regarding the relationship between functional modularity and hierarchical origin of computed properties. This question is thus highly significant and deserves study. The methodology is appropriate for the question and the areal and laminar resolution achieved across 10 subjects is commendable. The combination of high-resolution functional imaging and informational connectivity analysis introduces a useful way for examining feedforward and feedback relationships in mesoscale imaging data.

      Comments on latest version:

      The authors have responded adequately to my comments. The lack of texture organization in V2 is now strengthened by the apparently more clustered texture response in V4 (Fig. S9). The paired results in V2 and V4 make the study stronger. The authors may suggest that texture response, while present at the neural level, may not emerge as a primary organizational cue in V2, based on this texture stimulus paradigm. The negative results should still be presented cautiously. The connectivity inferences are interesting but should also be stated cautiously, as there are multiple assumptions. Overall, this study makes a contribution to emerging views about texture processing in the early visual pathways.

    3. Reviewer #2 (Public review):

      This study investigates the cortical circuitry at the mesoscopic level of cortical columns in the human secondary visual cortex (V2) using high-resolution fMRI at ultra-high field strength (7T). The findings confirm the columnar organization of color-selective thin and disparity-selective thick stripes, a result previously demonstrated and replicated in human fMRI research. However, this study adds a novel layer of analysis by examining cortical depth, providing insights into feedforward and feedback connections to and from V2. Furthermore, examining texture selectivity in V2 showed no evidence of a columnar structure when compared to color- and disparity-selective activation clusters. Interestingly, texture selectivity in V2 was most pronounced in deeper cortical layers, with significant feedback connectivity from V4. The authors conclude that local columnar circuitry plays a crucial role in color and disparity processing within V2, while texture selectivity is driven by feedback modulation. This research underscores the potential of high-resolution human fMRI to explore the local circuitry of the cortex at the mesoscopic scale.

      However, I still have a few comments that I would like to be addressed:

      (1) In lines 401-403, the authors state that differential BOLD responses can significantly enhance the laminar specificity. Differential contrasts indeed have the potential to reduce macrovascular contributions that are unspecific to both experimental conditions, which was already discussed in the literature (e.g., Yacoub et al., 2008, High-field fMRI unveils orientation columns in humans). This might be especially true for the pial vasculature that drains a larger surface area of the cortex, e.g., multiple columns, which is probably the key factor that enables cortical column mapping using differential BOLD contrasts despite the relatively large spatial point spread function of the BOLD response. However, this may differ for laminar analyses, where neuronal and vascular responses from intracortical and pial veins might be harder to disentangle. It would, therefore, be advisable to tone down this statement somewhat since it could imply that laminar specificity can be readily achieved with GE-BOLD, while this remains an active area of research. This is not to say that the present results are incorrect, but the broader implications of this statement should be cautiously framed.

      (2) Looking at Figure 3, one might also argue (excluding responses from V4) that statistically significant differences in selectivity are only observed where the cortical profiles generally show higher response levels. Could this be simply due to varying signal-to-noise ratios (SNR) achieved by different contrasts (color, disparity, texture)?

      (3) In lines 480-484, the authors state that twenty blocks for each stimulus condition should be sufficient to investigate within-subject effects. It would be helpful if they could elaborate on the basis for this claim. High-resolution fMRI is typically limited by low temporal signal-to-noise ratio (tSNR), and extensive averaging is often required to achieve sufficient signal. Clarifying the rationale behind this assertion would strengthen the argument.

    4. Reviewer #3 (Public review):

      Summary:

      Ai et al. studied texture, color and disparity selectivity in human visual cortex at the mesoscale level using high-resolution fMRI. They reproduced earlier monkey and human studies showing interdigitated color-selective and disparity-selective sub-compartments within area V2, likely corresponding to thin and thick stripes, respectively. At least with the stimuli used, no clear evidence for texture-selective mesoscale activations was observed in area V2. The most interesting and novel part of this study focused on cortical-depth-dependent connectivity analyses across areas. The data suggest feedback and feedforward functional connectivity between V1 and V3A for disparity signals and feedback from V4 to the deep layers of V2 for textures.

      Strengths:

      High-resolution fMRI and highly interesting layer-specific informational connectivity analyses.

      Weaknesses:

      The authors tend to overclaim their results. There are too few data to make conclusive inferences.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Review:

      Reviewer #1:

      (1) To support the finding that texture is not represented in a modular fashion, additional possibilities must be considered. These include (a) the effectiveness and specificity of the texture stimulus and control stimuli, (b) further analysis of possible structure in images that may have been missed, and (c) limitations of imaging resolution.

      Thank you for your comments. To address your concerns, we have conducted a new 3T fMRI experiment to demonstrate the effectiveness and specificity of our stimuli, performed further analyses to investigate possible structure of texture-selective activation, and discussed the limitations of imaging resolution.

      (a) To demonstrate the effectiveness and specificity of our stimuli, we conducted a new 3T fMRI experiment in five participants using an experimental design and texture families similar to those in Freeman (2013). Six texture stimuli in the 7T experiment were also included. To assess the effectiveness of each stimulus type, different texture families and their corresponding noise patterns were presented in separate blocks for 24 seconds, at a high presentation rate of 5 frames per second. In Figure S7, all texture families showed significantly stronger activation in V2 compared to their corresponding noise patterns, even for those that ‘appeared’ to have residual texture (e.g., the third texture family). These results demonstrate that our texture vs. noise stimuli were effective in producing texture-selective activations in area V2. Compared to the 7T results, the 3T data showed a notable increase in texture-selective activations in V2, likely due to increased stimulus presentation speed (1.25 vs. 5 frames/second). Future studies should use stimuli with faster presentation speed to validate our results in the 7T experiment.

      (b) Thank you for pointing out the possible structures of texture-selective activations in the peripheral visual field (Figure S1). In further analyses, we also found stronger texture selectivity in more peripheral visual fields (Figure 2D), and there were weak but significant correlations in the texture-noise activation patterns in a split-half analysis (Author response image 2). Although this is not strong evidence for a columnar organization of naturalistic textures, it suggests that a modular organization may exist in the peripheral visual field.

      (c) Although our fMRI result at 1-mm isotropic resolution did not show strong evidence for modular processing of naturalistic texture in V2 stripe columns, this does not exclude the possibility that smaller modules exist beyond the current fMRI resolution. We have discussed this possibility in the revised manuscript.

      We hope this response clarifies our findings, and we have revised the conclusions in the manuscript accordingly.
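      The split-half analysis mentioned in (b) can be illustrated with a minimal sketch. It assumes that responses are available per run and per vertex, that runs are split into odd and even halves, and that reliability is measured as a Pearson correlation between the two texture-minus-noise patterns. The function names, the odd/even split, and the data layout are hypothetical and are not taken from the manuscript.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def split_half_reliability(texture_runs, noise_runs):
    """Correlate texture-minus-noise patterns from two halves of the runs.

    texture_runs, noise_runs: one list per run, each holding one response
    per vertex (hypothetical layout, assumed for illustration only).
    """
    def mean_pattern(runs):
        # Average across runs, vertex by vertex.
        return [sum(values) / len(values) for values in zip(*runs)]

    def contrast(run_indices):
        tex = mean_pattern([texture_runs[i] for i in run_indices])
        noi = mean_pattern([noise_runs[i] for i in run_indices])
        return [t - n for t, n in zip(tex, noi)]

    n_runs = len(texture_runs)
    half_a = contrast(range(0, n_runs, 2))  # runs 0, 2, 4, ...
    half_b = contrast(range(1, n_runs, 2))  # runs 1, 3, 5, ...
    return pearson_r(half_a, half_b)
```

      A correlation near zero would indicate that the texture-noise pattern does not replicate across halves, whereas a reliably positive value suggests a stable spatial pattern, even if it is weak.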

      (2) More in-depth analysis of subject data is needed. The apparent structure in the texture images in peripheral fields of some subjects calls for more detailed analysis, e.g., the relationship to eccentricity and the need for a 'modularity index' to quantify the degree of modularity. A possible relationship to eccentricity should also be considered.

      Based on your recommendations, we have performed further analysis and found interesting results regarding the modularity index in relation to eccentricity. As shown in Figure 2D, the texture-selectivity index increased with eccentricity. This may suggest a higher possibility of modular organization for texture representation in the peripheral compared to the central visual field. We have updated our results in Figure 2C and discussed this possibility in the revised manuscript.

      (3) Given what is known as a modular organization in V4 and V3 (e.g. for color, orientation, curvature), did images reveal these organizations? If so, connectivity analysis would be improved based on such ROIs. This would further strengthen the hierarchical scheme.

      Following your recommendations, we have conducted further analysis to investigate the potential modular organizations in V4 and V3ab. In Figure S9, the vertices most responsive to color, disparity and texture are shown for a representative subject. Indeed, texture-selective patches can be found in both V4 and V3ab, along with the color- and disparity-selective patches. We agree with you that there should be pathway-specific connectivity among functional modules of the same type. In the informational connectivity analyses, we already used highly informative voxels chosen by feature selection, which should mainly represent information from the modular organizations in these higher visual areas.

      Reviewer #2:

      (1) In lines 162-163, it is stated that no clear columnar organization exists for naturalistic texture processing in V2. In my opinion, this should be rephrased. As far as I understand, Figure 2B refers to the analysis used to support the conclusion. The left and middle bar plots only show a circular analysis since ROIs were based on the color and disparity contrast used to define thin and thick stripes. The interesting graph is the right plot, which shows no statistically significant overlap of texture processing with thin, thick, and pale stripe ROIs. It should be pointed out that this analysis does not dismiss a columnar organization per se but instead only supports the conclusion of no coincidence with the CO-stripe architecture.

      Thank you for your suggestions. Reviewer #1 also raised a similar concern. We agree that there may be a smaller functional module of textures in area V2 at a finer spatial scale than our fMRI resolution. We have rephrased our conclusions to be more precise.

      (2) In Figure 3, cortical depth-dependent analyses are presented for color, disparity, and texture processing. I acknowledge that the authors took care of venous effects by excluding outlier voxels. However, the GE-BOLD signal at high magnetic fields is still biased to extravascular contributions from around larger veins. Therefore, the highest color selectivity in superficial layers might also result from the bias to draining veins and might not be of neuronal origin. Furthermore, it is interesting that cortical profiles with the highest selectivity in superficial layers show overall higher selectivity across cortical depth. Could the missing increase toward the pial surface in other profiles result from the ROI definition or overall smaller signal changes (effect size) of selected voxels? At least, a more careful interpretation and discussion would be helpful for the reader.

      We agree with you that there will be residual venous effects even after removing voxels containing large veins. However, calculating the selectivity index largely removed the superficial bias (Figure 3). In the revised manuscript, we discussed the limitations of cortical depth-dependent analysis using GE-BOLD fMRI.

      In Line 397-403: “Due to the limitations of the T2*w GE-BOLD signal in its sensitivity to large draining veins (Fracasso et al., 2021; Parkes et al., 2005; Uludag & Havlicek, 2021), the original BOLD responses were strongly biased towards the superficial depth in our data (Figure S8). Compared to GE-BOLD, VASO-CBV and SE-BOLD fMRI techniques have higher spatial specificity but much lower sensitivity (Huber et al., 2019). As shown in a recent study (Qian et al., 2024), using differential BOLD responses in a continuous stimulus design can significantly enhance the laminar specificity of the feature selectivity measures in our results (Figure 3).”

      It is unlikely that the strongest color selectivity index in the superficial depth is a result of stronger signal change or larger effect size in this condition. As shown by the original BOLD responses in Figure S8, all stimulus conditions produced robust activations that were strongly biased toward the superficial depth. High texture selectivity was also found in V4 and V3ab across cortical depth, with a flat laminar profile.

      (3) I was slightly surprised that no retinotopy data was acquired. The ROI definition in the manuscript was based on a retinotopy atlas plus manual stripe segmentation of single columns. Both steps have disadvantages because they neglect individual differences and are based on subjective assessment. A few points might be worth discussing: (1) In lines 467-468, the authors state that V2 was defined based on the extent of stripes. This classical definition of area V2 was questioned by a recent publication (Nasr et al., 2016, J Neurosci, 36, 1841-1857), which showed that stripes might extend into V3. Could this have been a problem in the present analysis, e.g., in the connectivity analysis? (2) The manual segmentation depends on the chosen threshold value, which is inevitably arbitrary. Which value was used?

      A previous study showed that the retinotopic atlas of early visual areas (V1-V3) aligned very well across participants on the standard surface after surface-based registration by anatomical landmarks (Benson et al., 2018). Thus, the group-averaged atlas should be accurate in defining the boundaries of early visual areas. To directly demonstrate the accuracy of this method, retinotopic data were acquired in five participants in a 3T fMRI experiment. A phase-encoded method was used to define the boundaries of early visual areas (black lines in Author response image 1), which were highly consistent with the Benson atlas.

      Although a few feature-selective stripes may extend into V3, these stripe patterns were mainly represented in V2. Thus, the signal contribution from V3 is likely to be small and should not affect the pattern of results. The activation map threshold for manual segmentation was abs(T)>2. We have clarified this in the revised methods.

      Author response image 1.

      Retinotopic ROIs defined by the Benson atlas (left) and the polar angle map (right) of the representative subject. Black lines denote the boundaries of early visual areas based on the retinotopic map from the subject.

      Benson, N. C., Jamison, K. W., Arcaro, M. J., Vu, A. T., Glasser, M. F., Coalson, T. S., Van Essen, D. C., Yacoub, E., Ugurbil, K., Winawer, J., & Kay, K. (2018). The Human Connectome Project 7 Tesla retinotopy dataset: Description and population receptive field analysis. J Vis, 18(13), 23. https://doi.org/10.1167/18.13.23

      (4) The use of 1-mm isotropic voxels is relatively coarse for cortical depth-dependent analyses, especially in the early visual cortex, which is highly convoluted and has a small cortical thickness. For example, most layer-fMRI studies use a voxel size of around isotropic 0.8 mm, which has half the voxel volume of 1 mm isotropic voxels. With increasing voxel volume, partial volume effects become more pronounced. For example, partial volume with CSF might confound the analysis by introducing pulsatility effects.

      We agree that a 1-mm isotropic voxel is much larger in volume than a 0.8-mm isotropic voxel, but the difference in resolution along the cortical depth is small. Previous studies have also shown that fMRI at 1-mm isotropic resolution is capable of resolving cortical depth-dependent signals (Roefs et al., 2024; Shao et al., 2021). We have discussed these issues about fMRI resolution in the revised manuscript.

      In Line 403-408: “Compared to the submillimeter voxels, as used in most laminar fMRI studies, our fMRI resolution at 1-mm isotropic voxel may have a stronger partial volume effect in the cortical depth-dependent analysis. However, consistent with our results, previous studies have also shown that 7T fMRI at 1-mm isotropic resolution can resolve cortical depth-dependent signals in human visual cortex (Roefs et al., 2024; Shao et al., 2021).”
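
The volume-versus-depth point in this exchange can be checked with simple arithmetic. The sketch below is only illustrative: the 2.0 mm cortical thickness is an assumed nominal value, not a figure reported by the authors or the reviewer, and the function names are hypothetical.

```python
def voxel_volume_mm3(edge_mm: float) -> float:
    """Volume of an isotropic voxel with the given edge length, in mm^3."""
    return edge_mm ** 3


def samples_across_depth(thickness_mm: float, edge_mm: float) -> float:
    """Number of voxel edges that fit across the cortical thickness."""
    return thickness_mm / edge_mm


# Assumption for illustration only: a nominal cortical thickness of 2.0 mm.
ASSUMED_THICKNESS_MM = 2.0

volume_ratio = voxel_volume_mm3(0.8) / voxel_volume_mm3(1.0)
depth_samples_08 = samples_across_depth(ASSUMED_THICKNESS_MM, 0.8)
depth_samples_10 = samples_across_depth(ASSUMED_THICKNESS_MM, 1.0)

print(f"volume ratio (0.8 mm vs 1 mm): {volume_ratio:.3f}")
print(f"voxel edges across depth at 0.8 mm: {depth_samples_08:.2f}")
print(f"voxel edges across depth at 1.0 mm: {depth_samples_10:.2f}")
```

Under this assumption the voxel volume roughly halves at 0.8 mm, while the number of voxel edges spanning the cortex changes only from 2 to 2.5. Both the reviewer's and the authors' statements are therefore arithmetically compatible.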

      Shao, X., Guo, F., Shou, Q., Wang, K., Jann, K., Yan, L., Toga, A. W., Zhang, P., & Wang, D. J. J. (2021). Laminar perfusion imaging with zoomed arterial spin labeling at 7 Tesla. NeuroImage, 245, 118724. https://doi.org/10.1016/j.neuroimage.2021.118724

      Roefs, E. C., Schellekens, W., Báez-Yáñez, M. G., Bhogal, A. A., Groen, I. I., van Osch, M. J., ... & Petridou, N. (2024). The Contribution of the Vascular Architecture and Cerebrovascular Reactivity to the BOLD signal Formation across Cortical Depth. Imaging Neuroscience, 2, 1–19.

      (5) The SVM analysis included a feature selection step stated in lines 531-533. Although this step is reasonable for the training of a machine learning classifier, it would be interesting to know if the authors think this step could have reintroduced some bias to draining vein contributions.

      We excluded vertices with extremely large signal change and their corresponding voxels in the gray matter when defining ROIs. The same number of voxels were selected from each cortical depth for the SVM analysis, thus there was no bias in the number of voxels from the superficial layers susceptible to large draining veins.
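
A minimal sketch of depth-balanced voxel selection follows. The response states only that the same number of voxels was taken from each cortical depth. Ranking voxels by a per-voxel score, the function name, and the toy values are assumptions made for illustration.

```python
def select_equal_per_depth(scores, depth_labels, k):
    """Return the indices of the k highest-scoring voxels in each depth bin.

    scores       : per-voxel informativeness (hypothetical ranking criterion)
    depth_labels : per-voxel depth bin, e.g. "deep", "middle", "superficial"
    k            : number of voxels to keep per depth bin
    """
    selected = {}
    for depth in sorted(set(depth_labels)):
        # Collect voxels belonging to this depth bin.
        idx = [i for i, d in enumerate(depth_labels) if d == depth]
        # Rank them from most to least informative and keep the top k.
        idx.sort(key=lambda i: scores[i], reverse=True)
        selected[depth] = idx[:k]
    return selected


# Toy data: six voxels, two per depth bin.
toy_scores = [0.1, 0.9, 0.5, 0.3, 0.8, 0.2]
toy_depths = ["deep", "deep", "middle", "middle", "superficial", "superficial"]
balanced = select_equal_per_depth(toy_scores, toy_depths, k=1)
print(balanced)
```

Because every depth bin contributes exactly `k` voxels, the classifier input cannot be dominated by the count of superficial voxels. It does not by itself remove amplitude differences between depths.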

      Reviewer #3:

      The authors tend to overclaim their results.

      Re: Thank you for your comments. We added more control analyses to strengthen our findings and provided a more appropriate discussion of the results.

      Recommendations for the authors:

      Reviewer #1:

      (1) Controls: There is a bit more complexity than is expressed in the introduction. The authors hypothesize that the emergence of computational features such as texture may be reflected in specialized columns. That is, if texture is generated in V2, there may be texture columns (perhaps in the pale stripes of V2); but if generated at a higher level, then no texture columns would be needed. This is a very interesting and fundamental hypothesis. While there may be merit to this hypothesis, the demonstration that color and disparity are modular but not texture falls short of making a compelling argument. At a minimum, the finding that texture is not organized in V2 requires additional controls. (a) To boost the texture signal, additional texture stimuli or a sequence of multiple texture stimuli per trial could be considered. (b) Unfortunately, the comparison noise pattern also seems to contain texture; perhaps a less textured control could be designed. (c) It also appears that some of the texture images in Supplementary Figure S1 contain possible structure, e.g. in more peripheral visual fields. (d) Is it possible that the current imaging resolution is not sufficient for revealing texture domains? (e) Note that 'texture' may be a property that defines surfaces and not contours. Thus, while texture may have orientation content, its function may be associated with the surface processing pathways. A control stimulus might contain oriented elements of a texture stimulus that do not elicit texture percept; such a control might activate pale and/or thick stripes (both of which contain orientation domains), while the texture percept stimulus may activate surface-related bands in V4.

      Thank you for your suggestions. They are extremely helpful in improving our manuscript. For the controls you mentioned in (a-d), we discussed them in the public review that we also attached below.

      (a) and (b): To demonstrate the effectiveness and specificity of our stimuli, we conducted a new 3T fMRI experiment in five participants using an experimental design and texture families similar to those in Freeman (2013). All texture stimuli in the 7T experiment were also included. To assess the effectiveness of each stimulus type, different texture families and their corresponding noise patterns were presented in separate blocks for 24 seconds, at a high presentation rate of 5 frames per second. In Figure S7, all texture families showed significantly stronger activation in V2 compared to their corresponding noise patterns, even for those that ‘appeared’ to have residual texture (e.g., the third texture family). These results suggest that our texture stimuli were effective in producing texture-selective activations in area V2 compared to the noise control. Compared to the 7T results, the 3T data showed a notable increase in texture-selective activations in V2, likely due to the increased stimulus presentation speed (1.25 vs. 5 frames/second). Weak texture activations might preclude the detection of columnar representations in the 7T experiment.

      (c) Thank you for pointing out the possible structures of texture-selective activations in the peripheral visual field (Figure S1). In further analyses, we also found stronger texture selectivity in more peripheral visual fields (Figure 2D), and there were weak but significant correlations in the texture-noise activation patterns during split-half analysis (Author response image 2). Although this is not strong evidence for columnar organization of naturalistic textures, it suggests that such organization may exist in the peripheral visual field.

      (d) Although our fMRI result at 1-mm isotropic resolution did not show strong evidence for modular processing of naturalistic texture in V2 stripe columns, this does not exclude the possibility that smaller modules exist beyond the current fMRI resolution. We have discussed these limitations in the revised manuscript.

      We fully agree with your explanation in (e). It fits our data very well. Both texture and control stimuli strongly activated the CO-stripes (Figure 2 and Figure 2D), while modular organizations for texture were found in V4 and V3ab (Figure S9). We have discussed this explanation in the revised manuscript.

      In Line 371-374: “Consistently, our pilot results also revealed modular organizations for textures in V4 and V3ab (Figure S9). These texture-selective organizations may be related to surface representations in these higher order visual areas (Wang et al., 2024).”

      (2) Overly simple description of FF, FB circuitry. The classic anatomical definition of feedforward is output from a 'lower' area, in most cases predominantly arising from superficial layers and projecting to middle layers of a 'higher area' (Felleman and Van Essen 1991). This description holds for V1-to-V2, V2-to-V3, and V2-to-V4. [Note there are also feedforward projections from central 5 degrees of V1-to-V4 (cf. Ungerleider) as well as V3-to-V4.] The definition of feedback can be more varied but is generally considered from cells in superficial and deep layers of 'higher' areas projecting to superficial and deep layers of 'lower' areas. Feedback inputs to V1 heavily innervate Layer 1 and superficial Layer 2, as well as the deep layers. Note that feedback connections from V2 to V1, similar to that from V1 to V2, are functionally specific, i.e. thin-to-blob and pale/thick-to interblob (Federer...Angelucci 2021, Hu...Roe 2022). Thus, current views are moving away from the dogma that feedback is diffuse. Recognition that feedback may be modular introduces new ideas about analysis.

      Thanks for your detailed recommendations. We have expanded the discussion of circuit models of functional connectivity in the introduction. Our model and experiments primarily aim to investigate how higher-level areas provide feedback to area V2. While we acknowledge that feedback may indeed be functionally specific, our methodology has certain advantages: it ensures signal stability and avoids the double-dipping issue. In the functional connectivity analysis, we performed feature selection to use the most informative voxels. These voxels with high feature selectivity should already be included in the modular organizations of early visual areas. Identifying functionally specific feedback connections between modular areas will be important work for future research. We have added a discussion of this topic in the revised manuscript.

      In Line 136-138: “Only major connections were shown here. There are also other connections, such as V1 interblobs projecting to thick stripes (Federer et al., 2021; Hu & Roe, 2022; Sincich and Horton, 2005).”

      (3) Imaging superficial layers: Although removal of the top layer of cortical voxels (top 5% of voxels) is a common method for dealing with surface vascular artifact contribution to BOLD signal, it likely removes a portion of the Layer 1&2 feedback signals. Is this why the authors define feedback and deep layer to deep layer? If so, both superficial and deep-layer data in Figure 4 should be explicitly explained and discussed.

      Thank you for pointing this out. We would like to clarify the surface-based method for removing vascular artifacts. The vertices influenced by large pial veins were first defined on the cortical surface, and then voxels were removed from the entire columns corresponding to these vertices to avoid sampling bias along the cortical depth. Thus, there should be complete data from all cortical depths for the remaining columns. We defined the feedback connectivity from deep layers to deep layers because it represents strong feedback connections according to the literature (Markov et al., 2014; Ullman, 1995) and also avoids confounding by feedforward signals from superficial layers.

      Markov, N. T., Vezoli, J., Chameau, P., Falchier, A., Quilodran, R., Huissoud, C., Lamy, C., Misery, P., Giroud, P., Ullman, S., Barone, P., Dehay, C., Knoblauch, K., & Kennedy, H. (2014). Anatomy of hierarchy: feedforward and feedback pathways in macaque visual cortex. The Journal of comparative neurology, 522(1), 225–259. https://doi.org/10.1002/cne.23458

      Ullman S. (1995). Sequence seeking and counter streams: a computational model for bidirectional information flow in the visual cortex. Cerebral cortex, 5(1), 1–11. https://doi.org/10.1093/cercor/5.1.1

      (4) More detail on other subjects in Figure S1. Ten subjects conducted visual fixation and used a bite bar. Imaging data are illustrated in detail from one subject and the remaining subjects are depicted in graphs and in Supplemental Figure S1. Please provide arrowheads in each image to help guide the reader. Some kind of summary or index of modularity would also be helpful.

      Thanks for your suggestions. There are arrowheads in each image in our original manuscript and we have revised Figure S1 for better illustration. Additionally, we have added a table summarizing the number of stripes to provide a clearer overview.

      (5) How are ROIs in V3ab and V4 defined? V2 ROIs were defined (thin, thick, and pale stripe), but V3ab and V4 averaged across the whole area. Why not use the most activated "domains" from V3ab and V4? How does this influence connectivity analysis?

      Thank you for your question. We defined V4 and V3ab on the cortical surface using a retinotopic atlas (Benson 2018), which has been shown to be quite accurate in defining ROIs for the early visual areas. Since all ‘domains’ showed robust BOLD activation to our stimuli, we used voxels from the entire ROI in the depth-dependent analysis. In the functional connectivity analysis, we used the most informative voxels by feature selection, which should already be included in the feature domains.

      Minor:

      English language editing is needed.

      Thank you for your feedback. We have carefully revised the manuscript for clarity and readability.

      Line 31 "its" should be "their".

      Thank you. We have corrected "its" to "their".

      Replace 'representative subject' with 'subject'.

      We have replaced "representative subject" with "subject" in the manuscript.

      Replace 'naturalistic texture' with 'texture'.

      Thank you for your suggestion. The textures used in our experiment were generated based on the algorithm by Portilla and Simoncelli (2000), and the term "naturalistic texture" was used to be consistent with literature. The textures used in our study are different from traditional artificial textures, as they contain higher-order statistical dependencies. Following your recommendations, we have replaced ‘naturalistic texture’ with ‘texture’ in some places in the main text to improve readability.

      Typo: Line 126, Fig 2B should be 1B.

      Thank you. We have corrected "Fig 2B" to "Fig 1B" in Line 128.

      Fig. 2A: point out where are texture domains in anterior V2.

      The texture-selective activations in anterior V2 (corresponds to peripheral visual field) have been highlighted by arrowheads.

      Fig 2B, 3 legend: Round symbols are for each subject?

      Yes, the round symbols in Figures 2B and 3 represent data for individual participants. We have revised the legend for clarity.

      Fig. 3: Disparity and texture values do not look different across depth (except may the V2 texture values).

      Although the differences in feature selectivity across cortical depths are small, they are highly consistent across participants. We have provided a figure showing the original BOLD responses in the revised manuscript (Figure S8). Data from individual subjects are also available at the Open Science Framework (OSF, https://doi.org/10.17605/OSF.IO/KSXT8; see ‘rawBetaValues.mat’ in the data directory).

      Line 57-59 The statement is not strictly accurate. V1 also has color, orientation, and motion representations.

      Thank you for your feedback. Our statement was intended to convey that M and P information from the geniculate input are transformed into representations of color, orientation, disparity, and motion in the primary visual cortex. We have clarified this point in the revised manuscript.

      In Line 58-60: “In the primary visual cortex (V1), the M and P information from the geniculate input are transformed into higher-level visual representations, such as motion, disparity, color, orientation, etc. (Tootell & Nasr, 2017).”

      Fig. 1B V1 interblobs also project to thick stripes (Sincich and Horton).

      Thank you for the additional information. We appreciate your input. Our figure is intended as a simplified schematic and does not fully represent all the connections. We have discussed this reference in the revised manuscript.

      In Line 136-138: “Only major connections were shown here. There are also other connections, such as V1 interblobs projecting to thick stripes (Federer et al., 2021; Hu & Roe, 2022; Sincich and Horton, 2005).”

      Line 207 "suggesting that both local and feedforward connections are involved in processing color information in area V2." Logic? English?

      Thank you for pointing this out. The superficial layers are involved in local intracortical processing through lateral connections and also send output to higher order visual areas along the feedforward pathway. Thus, the strongest color selectivity in the superficial depth of V2 supports the view that color information was processed in local neural circuits in area V2 and transmitted to higher order areas along the feedforward pathway. We have revised the manuscript for clarity.

      In Line 241-245: “According to the hierarchical model, the strongest color selectivity in the superficial cortical depth is consistent with the fact that color blobs locate in the superficial layers of V1 (Figure 1B, Felleman & Van Essen, 1991; Hubel & Livingstone, 1987; Nassi & Callaway, 2009). The strongest color selectivity in superficial V2 suggests that both local and feedforward connections are involved in processing color information (Figure 1C).”

      Line 254 "Laminar". Please use "cortical depth" or explicitly state that 'laminar' refers to superficial, middle, and deep as defined by cortical depth.

      Thank you for your suggestion. We have clarified the term "laminar" in the manuscript as referring to superficial, middle, and deep layers as defined by cortical depth.

      In Line 96-99: “To better understand the mesoscale functional organizations and neural circuits of information processing in area V2, the present study investigated laminar (or cortical depth-dependent) and columnar response profiles for color, disparity, and naturalistic texture in human V2 using 7T fMRI at 1-mm isotropic resolution.”

      Fig. S5 Please add a unit of isoluminance.

      Thank you for your suggestion. Supplementary Figure S10A and S10B illustrate the blue-matched luminance levels in RGB index. In our isoluminance experiment, blue was set as the reference color (RGB [0 0 255]) to measure the red and gray isoluminance.

      Line 448-449 To make this rationale clearer, refer to:

      Wang J, Nasr S, Roe AW, Polimeni JR. 2022. Critical factors in achieving fine‐scale functional MRI: Removing sources of inadvertent spatial smoothing. Human Brain Mapping. 43:3311-3331.

      Thank you for your suggestion. We have added this reference to better support the rationale of data analysis.

      Reviewer #2:

      (1) Line 126 should refer to Figure 1B.

      Thank you. We have corrected the reference in the revised manuscript as Figure 1B.

      (2) Even if only one naturalistic texture session was acquired per participant, it might be interesting to see the within-session repeatability by, e.g., splitting the texture runs into two halves.

      Thank you for your suggestion. We performed a split-half correlation analysis for participants who completed 10 runs in the naturalistic texture session. The result from one representative subject is shown in the figure below (for other participants, r = 0.38, 0.38, 0.24, and 0.23, respectively).

      Author response image 2.

      Split-half correlations for the texture-selective activation maps in a representative subject (S01) in V2.
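
A split-half correlation of this kind can be sketched as follows. The response does not say how runs were assigned to halves, so the odd/even split, the function names, and the toy maps are assumptions for illustration and not the authors' pipeline.

```python
from math import sqrt


def mean_map(run_maps):
    """Average a list of per-run activation maps vertex by vertex."""
    n_runs = len(run_maps)
    return [sum(values) / n_runs for values in zip(*run_maps)]


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half_r(run_maps):
    """Correlate the mean map of even-indexed runs with that of odd-indexed runs.

    The odd/even assignment is an assumption; a first-half/second-half
    split would use run_maps[:n // 2] and run_maps[n // 2:] instead.
    """
    half_a = mean_map(run_maps[0::2])
    half_b = mean_map(run_maps[1::2])
    return pearson_r(half_a, half_b)


# Toy example: four runs, five vertices each (texture-minus-noise contrast).
toy_runs = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
]
toy_r = split_half_r(toy_runs)
print(f"split-half r = {toy_r:.2f}")
```

In the toy data both halves average to the same map, so the correlation is 1. With real data the value reflects how reproducible the spatial pattern is across independent halves of the session.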

      (3) Unfortunately, Figure S2 only shows the stripe ROIs but not V3ab or V4 ROIs. Including another figure that shows all ROIs in more detail would be interesting.

      Thank you for your suggestion. We have included a figure showing the ROIs for V4 and V3ab (the black dotted lines in Figure S9).

      (4) It would be helpful for the reader to have a more detailed discussion about methodological limitations, including the unspecificity of the GE-BOLD signal (Engel et al., 1997, Cereb Cortex, 7, 181-192; Parkes et al., 2005, MRM, 54, 1465-1472; Fracasso et al., 2021, Prog Neurobiol, 202, 102187) and the used voxel sizes.

      Thank you for your suggestion. We have added a more detailed discussion about the methodological limitations, including the unspecificity of the GE-BOLD signal and the voxel sizes used.

      In Line 397-408: “Due to the limitations of the T2*w GE-BOLD signal in its sensitivity to large draining veins (Fracasso et al., 2021; Parkes et al., 2005; Uludag & Havlicek, 2021), the original BOLD responses were strongly biased towards the superficial depth in our data (Figure S8). Compared to GE-BOLD, VASO-CBV and SE-BOLD fMRI techniques have higher spatial specificity but much lower sensitivity (Huber et al., 2019). As shown in a recent study (Qian et al., 2024), using differential BOLD responses in a continuous stimulus design can significantly enhance the laminar specificity of the feature selectivity measures in our results (Figure 3). Compared to the submillimeter voxels, as used in most laminar fMRI studies, our fMRI resolution at 1-mm isotropic voxel may have a stronger partial volume effect in the cortical depth-dependent analysis. However, consistent with our results, previous studies have also shown that 7T fMRI at 1-mm isotropic resolution can resolve cortical depth-dependent signals in human visual cortex (Roefs et al., 2024; Shao et al., 2021).”

      (5) If I understand correctly, different numbers of runs/sessions were acquired for different subjects. It would be good to discuss if this could have impacted the results, e.g., different effect sizes could have biased the manual ROI definition.

      Thank you for your suggestion. Although there were differences in the number of runs/sessions acquired for different subjects, there were at least four runs of data for each experiment, which should be enough to examine the within-subject effect. We have discussed this point in the revised manuscript.

      In Line 481-484: “Although the number of runs were not equal across participants, there were at least four runs (twenty blocks for each stimulus condition) of data in each experiment, which should be sufficient to investigate within-subject effects.”

      (6) It would be good to add the software used for layer definition. Was it Laynii?

      We have provided more details in the revised methods.

      In Line 523-526: “An equi-volume method was used to calculate the relative cortical depth of each voxel to the white matter and pial surface (0: white matter surface, 1: pial surface, Supplementary Figure S11A), using mripy (https://github.com/herrlich10/mripy).”

      (7) It would be interesting to see (at least for one subject) the contrasts of color-selective thin stripes and disparity-selective thick stripes from single sessions to demonstrate the repeatability of measurements.

      Thank you for your suggestion. We have shown the test-retest reliability of the response pattern of color-selective thin stripes and disparity-selective thick stripes in a representative subject in Figure S5.

      (8) By any chance, do the authors also have resting-state data from the same subjects? It would be interesting to see the connectivity analysis between stripes and V3ab, V4 with resting-state data.

      Thank you for your suggestion. Unfortunately, we do not have resting-state data from the same subjects at this time. We agree with you that layer-specific connectivity analysis with resting-state data is very interesting and worth investigating in future studies.

      Reviewer #3:

      (1) For investigating information flow across areas, the authors rely on layer-specific informational connectivity analyses, which is an exciting approach. Covariation in decoding accuracy for a specific dependent variable between the superficial layers of a lower area and the middle layer of a higher area is taken as evidence for feedforward connectivity, whereas FB was defined as the connection between the two deep layers. Yet this method is not assumption-free. For example, the canonical idea (Figure 1C) of FF terminals exclusively arriving in layer 4 and FB terminals exclusively terminating in supra- or infragranular layers is not entirely correct. This is not even the case for area V1 - see for example Kathy Rockland's exquisite tractography studies, showing that even single axons have branches terminating in different layers. Also, feedback signals not only arrive in the deep layers of a lower area. Although these informational connectivity analyses can be suggestive of information flow, this reviewer doubts it can be considered as conclusive evidence. Therefore, the authors should drastically tone down their language in this respect, throughout the text. They present suggestive, not conclusive evidence. To obtain truly conclusive evidence, one likely has to perform laminar electrophysiological recordings simultaneously across multiple areas and infer the directionality of information flow using, for example, Granger causality.

      Thank you for pointing out this important issue. In our response to a previous question (Reviewer #1, the 2nd comment), we have discussed other possible connections in addition to the canonical feedforward and feedback pathways. In the revised manuscript, the conclusion has been toned down to properly reflect our findings. However, we would also like to emphasize that our conclusion about laminar circuits was supported by converging lines of evidence. For example, in addition to the depth-dependent connectivity results, the role of feedback circuit in processing texture information was also supported by greater selectivity in V4 than V2, and the strongest deep layer selectivity in V2 (Figure 3C).

      (2) In the same realm, how reproducible are the information connectivity results? In the first part of the study, the authors performed a split-half analysis. This should be also done for Figure 4.

      Thank you for your suggestion. We have performed a split-half analysis for the informational connectivity results. As shown in Author response image 3, the results for the color experiment were robust and reproducible, while the disparity and texture connectivity results were less consistent between the two halves. The results from the second half (Author response image 3, below) are more consistent with the original findings (Figure 4). Overall, the pattern of results was qualitatively similar between the two halves. The inconsistency may be due to the fact that some participants had only four runs of data, which could make the split-half analysis less reliable.

      Author response image 3.

      Split-half analysis of informational connectivity.

      (3) Most of the other layer-specific claims (not the ones about the flow of information) are based on indices. It is unclear which ROIs contributed to these indices. Was it the entire extent of V1, V2, ...? Or only the visually-driven voxels within these areas? How exactly were the voxels selected? For V2, it would make sense to calculate the selectivity indices independently for the disparity and color-selective (putative) thick and (putative) thin stripe compartments, respectively. Adding voxels of non-selective compartments (e.g. putative thick stripe voxels for calculating the color-index; or adding putative thin-stripe voxels for calculating the disparity index), will only add noise.

      In the revised manuscript, we have clarified that we selected the entire ROI in the depth-dependent analysis. Since our study does not have an independent functional localizer, using the entire ROI avoids the problem of double dipping. The processing of visual features is not confined solely to specific stripes. We have also provided a more comprehensive explanation of this issue in the discussion section.

      In Line 541-544: “For the cortical depth-dependent analyses in Figure 3, we used all voxels in the retinotopic ROI. Pooling all voxels in the ROI avoids the problem of double-dipping and also increases the signal-to-noise ratio of ROI-averaged BOLD responses.”

      (4) It is apparent from Figure 3, that the indices are largely (though not exclusively) driven by 2 subjects. Therefore, this reviewer wishes to see the raw data in addition to a table for calculating the color, disparity, and texture selectivity indices, along with the number of voxels that contributed to it.

      Thank you for your suggestion. We have provided a figure showing the original BOLD responses (Figure S8). Data from individual subjects are also available at the Open Science Framework (OSF, https://doi.org/10.17605/OSF.IO/KSXT8; see ‘rawBetaValues.mat’ in the data directory).

      Minor:

      (1) I typically find inferences about 'layer fMRI' vastly overstated. We all know that fMRI does not (yet) provide laminar-specific resolution, i.e., whereby meaningful differences in fMRI signals can be extracted from all 6 individual layers of neocortex, without partial volume effects, or without taking into account pre- and postsynaptic contributions of neurons to the fMRI signal (the cell bodies may very well lie in different layers than the dendritic trees etc.), or without taking into account the vascular anatomy, etc. The authors should use the term cortical depth-dependent fMRI throughout the text, as they do in the abstract and intro.

      Thank you for pointing out this important issue. We have now defined the meaning of layer or laminar as “cortical depth-dependent” in the introduction, to be consistent with the terminology in most published papers on this topic.

      (2) 1st sentence abstract: I disagree with this statement. The parallel streams in intermediate-level areas are probably equally well studied as the geniculostriate pathway, already starting with the seminal work of Hubel, Livingstone, and more recently by Angelucci and co-workers who looked in detail at the anatomical and functional interactions across sub-compartments of V1 and V2.

      Thank you for your feedback. In the revised manuscript, we have removed the term "much" from the first sentence of the abstract. Although there have been seminal studies of V2 sub-compartments in monkeys, only a few fMRI studies investigated this issue in humans.

      (3) The authors show inter-session correlations for color and disparity. This reviewer would like to see test-retest images since the explained variance is not terribly good. Also, show the correlation values for the inter-session texture beta values.

      Thank you for your suggestion. We have performed the test-retest reliability analysis of texture-selective patterns in response to a previous question (Reviewer #2, the 2nd comment, Author response image 2).

      (4) The stripe definitions are threshold dependent. Please clarify whether the reported results are threshold-independent.

      Thank you for your question. To address your concern, we defined the stripe ROIs using different thresholds, and the results remained consistent. Specifically, we ranked the voxels in manually defined stripe ROIs by the color-disparity response. We then defined the lowest 10% as the thick stripe voxels, the highest 10% as thin stripe voxels, and the middle 10% as pale stripe voxels. Additionally, we adjusted the thresholds to 20% and 30% to define the three stripes (with 30% being the least strict threshold). Feature selectivities at different thresholds are shown in Figure S6 (from left to right: 10%, 20%, 30%). Notably, in all threshold conditions, there was no significant difference in texture selectivity across different stripes.
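
The percentile-based stripe definition described in this response can be sketched as below. Two details are assumptions not stated in the response: the "middle" band is centered on the median rank, and the band size is the rounded fraction of the voxel count. The function name and toy values are hypothetical.

```python
def define_stripes(color_minus_disparity, fraction=0.10):
    """Assign voxels to thick, pale, and thin stripes by response rank.

    color_minus_disparity : per-voxel color-minus-disparity response
    fraction              : share of voxels per stripe (0.10, 0.20, or 0.30)
    Returns a dict of voxel-index lists. The lowest-ranked voxels are
    labeled thick (disparity-preferring), the highest thin
    (color-preferring), and a band centered on the median rank pale.
    """
    n = len(color_minus_disparity)
    k = max(1, int(round(n * fraction)))
    # Voxel indices ordered from most disparity-selective to most color-selective.
    order = sorted(range(n), key=lambda i: color_minus_disparity[i])
    start = (n - k) // 2  # left edge of the band centered on the median rank
    return {
        "thick": order[:k],
        "pale": order[start:start + k],
        "thin": order[n - k:],
    }


# Toy data: 20 voxels whose responses increase with index.
toy_response = [float(i) for i in range(20)]
stripes_10 = define_stripes(toy_response, fraction=0.10)
stripes_30 = define_stripes(toy_response, fraction=0.30)
print(stripes_10)
```

With fractions up to 30% the three bands stay disjoint, so a voxel is never counted in more than one stripe type. Repeating the downstream analysis across fractions is then a direct test of threshold dependence.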

      (5) How were the visual areas defined?

      In the revised manuscript, we have provided a detailed description of the methods.

      In Lines 531-535: “ROIs were defined on the inflated cortical surface. Surface ROIs for V1, V2, V3ab, and V4 were defined based on the polar angle atlas from the 7T retinotopic dataset of Human Connectome Project (Benson et al., 2014, 2018). Moreover, the boundary of V2 was edited manually based on columnar patterns. All ROIs were constrained to regions where mean activation across all stimulus conditions exceeded 0.”

      (6) "According to the hierarchical model in Figure 1B and 1C, the strongest color selectivity in the superficial cortical depth is consistent with the fact that color blobs mainly locate in the superficial layers of V1, suggesting that both local and feedforward connections are involved in processing color information in area V2." But color-selective activation within V2 could be also consistent with feedback from other areas (some of which were not covered in the present experiments) -the more since most parts of the brain were not covered (i.e. a slab of 4 cm was covered)?

      Thank you for reminding us about this issue. We have discussed the possibility that feedback contributes to the superficial bias of color selectivity in area V2.

    1. eLife Assessment

      This valuable study investigated the role of PLECTIN, a cytoskeletal crosslinker protein, in hepatocellular carcinoma development and progression. Using a liver-specific Plectin knockout mouse model, the authors showed solid evidence that PLECTIN is critical for hepatocarcinogenesis, since inhibition of PLECTIN suppressed tumor formation in multiple models. They also show that PLECTIN is key for HCC invasion and metastasis. They show a correlation between PLECTIN inhibition and attenuated FAK, MAPK/ERK, and PI3K/AKT signaling.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigated the role of PLECTIN, a cytoskeletal crosslinker protein, in liver cancer formation and progression. Using the liver-specific Plectin knockout mouse model, the authors convincingly showed that PLECTIN is critical for hepatocarcinogenesis, as functional inhibition of PLECTIN suppressed tumor formation in several models. They also provided evidence to show that inhibition of PLECTIN inhibited HCC cell invasion and reduced metastatic outgrowth in the lung. Mechanistically, they suggested that PLECTIN inhibition attenuated FAK, MAPK/ERK, and PI3K/AKT signaling.

      Strengths:

      The authors generated a liver-specific Plectin knockout mouse model. By using DEN and sgP53/MYC models, the authors convincingly demonstrated an oncogenic role of PLECTIN in HCC development. Plecstatin-1 (PST), a plectin inhibitor, showed promising efficacy in inhibiting HCC growth, which provides a basis for potentially treating HCC using PST.

      The MRI images for tracking tumor growth in animal models were compelling. The high-quality confocal images and related quantifications convincingly showed the impact of plectin functional inhibition on contractility and adhesions in HCC cells.

      Comments on latest version:

      My concerns have been largely addressed. The authors did a good job in addressing the questions and clarifying the inconsistent results. I have two comments:

      (1) The current data still cannot support the conclusion that plectin inactivation attenuates HCC oncogenic potential through the FAK, Erk1/2, and PI3K/Akt axis, unless the authors can reactivate these signaling pathways to restore HCC oncogenic potential in plectin-inactivated cells. It might be more appropriate to claim that plectin inactivation suppresses FAK, Erk1/2, and PI3K/Akt oncogenic signaling.

      (2) I think it would be beneficial to include the H&E and HNF4α staining from lung tissue of mice inoculated with WT Huh7 cells indicated in the rebuttal letter.

    3. Reviewer #2 (Public review):

      Summary:

      Plectin is a cytolinker that associates with cytoskeletal and intercellular junction proteins and is essential for epithelial integrity and cell migration. Previous reports showed that PLEC regulates tumor growth and metastasis in different cancers. In this manuscript, the authors describe PLEC as a target in the initiation and growth of HCC. They show that inhibiting PLEC reduced tumorigenesis in different in vitro and in vivo HCC models, including a xenograft model, a DEN model, an oncogene-induced HCC model and a lung metastasis model. PST, a purported Plectin inhibitor, had similar effects, suggesting that PLEC inhibition could be a tumor prevention or treatment strategy. Mechanistically, the authors show that inhibiting PLEC results in a disorganized cytoskeleton, deficiency in cell migration, and changes in cancer-relevant signaling pathways. This study demonstrates the importance of understanding the mechanobiology of HCC for the development of new treatment strategies.

      Strengths:

      (1) This study used a variety of in vivo models to explore the role of Plectin in HCC formation and metastasis, which extend beyond the cell line-based studies reported in prior research.

      (2) Blocking PLEC disrupts pathways that promote tumors and cell migration, thus preventing tumor progression.

      (3) Overall, the anti-cancer phenotype is promising, strengthening the important role of PLEC and related factors in tumor growth and metastasis.

      Weaknesses:

      (1) There are limited novel mechanistic insights, as the effects of inhibiting PLEC on the cytoskeleton, cell migration and related signaling pathways have previously been reported.

      (2) The results associated with PST should be interpreted with caution. Although it is reported as an inhibitor of PLECTIN, and the phenotypes and pathways affected are similar to the knock-out, additional research is needed to establish whether it will be safe and specific in treating or preventing HCC.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Outla Z et al. describe the analysis of Plectin in HCC pathogenesis. Specifically, they found that elevated Plectin levels in liver tumors correlated with poor prognosis for HCC patients. Mechanistically, they showed that Plectin-dependent disruption of cytoskeletal networks leads to the attenuation of oncogenic FAK, MAPK/Erk, and PI3K/AKT signals. Finally, the authors showed that the Plectin inhibitor plecstatin-1 (PST) is well-tolerated and capable of overcoming therapy resistance in HCC.

      Strengths:

      The studies of Plectin are not entirely novel (Pubmed: 36613521). Nevertheless, the current manuscript provides a much more detailed mechanistic study and the results have translational implications. Additional strengths include convincing cell biology data, such as the demonstration that Plectin regulates cytoskeletal networks and HCC migration/invasion.

      Comments on latest version:

      The authors have addressed my comments.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Point-by-point responses to the reviewers' comments:

      All three reviewers found our analysis of focal adhesion-associated oncogenic pathways (Figs 3 and S3) to be inconsistent (Reviewer 1), not convincing/consistent (Reviewer 2, #2), and too variable and not well supported (Reviewer 3, #2). This was probably the basis for the eLife assessment, which stated: “However, the study is incomplete because the downstream molecular activities of PLECTIN that mediate the cancer phenotypes were not fully evaluated.” We agree with the reviewers that the degree of attenuation of the FAK, MAPK/Erk, and PI3K/AKT signaling pathways differs depending on the cell line used (Huh7 and SNU-475) and the mode of inactivation (CRISPR/Cas9-generated plectin KO, functional KO (∆IFBD), and organoruthenium-based inhibitor plecstatin-1). However, we do not share the reviewers' skepticism about the unconvincing nature of the data presented.

      Several previous studies have shown that plectin inactivation invariably leads to dysregulation of cell adhesions and associated signaling pathways in various cell systems. The molecular mechanisms driving these changes are not fully understood, but the most convincingly supported scenarios are uncoupling of keratin filaments (hemidesmosomes; (Koster et al., 2004)) and vimentin filaments (focal adhesions; (Burgstaller et al., 2010; Gregor et al., 2014)) from adhesion sites in conjunction with altered actomyosin contractility (Osmanagic-Myers et al., 2015; Prechova et al., 2022; Wang et al., 2020). This results in altered morphometry (Wang et al., 2020), dynamics (Gregor et al., 2014), and adhesion strength (Bonakdar et al., 2015) of adhesions. These changes are accompanied by reduced mechanotransduction capacity and attenuation of downstream signaling such as FAK, Src, Erk1/2, and p38 in dermal fibroblasts (Gregor et al., 2014); decrease in pFAK, pSrc, and pPI3K levels in prostate cancer cells (Wenta et al., 2022); increase in pErk and pSrc in keratinocytes (Osmanagic-Myers et al., 2006); decrease in pERK1/2 in HCC cells (Xu et al., 2022) and head and neck squamous carcinoma cells (Katada et al., 2012).  

      Consistent with these published findings, we show that upon plectin inactivation, the HCC cell line SNU-475 exhibits aberrant cytoskeletal organization (vimentin and actin; Figs 4A-D, S4A-F), altered number, topography and morphometry of focal adhesions (Figs 4A, E-G, S4H,I), and ineffective transmission of traction forces (Fig 4H,I). Similar, although not quantified, phenotypes are present in Huh7 with inactivated plectin (data not shown). It is worth noting that even robust cytoskeletal (e.g. #ventral stress fibers, Fig 4A,D and vimentin architecture, Fig S4A-C) and focal adhesion (%central FA, Fig 4A,E) phenotypes differ significantly between different modes of plectin inactivation and would certainly do so if compared between cell lines. These phenotypes are heterogeneous but not inconsistent. Interestingly, both SNU-475 and Huh7 plectin-inactivated cells show similar functional consequences such as a prominent decrease in migration speed (Fig 5B). This suggests that while specific aspects of cytoarchitecture are differentially affected in different cell lines, the functional consequences of plectin inactivation are shared between HCC cell lines.

      It is therefore not surprising that the activation status of downstream effectors, resulting from different degrees of cytoskeletal and focal adhesion reconfiguration, is not identical (or even comparable) between cell lines and treatment conditions. Furthermore, we compare highly epithelial (keratin- and almost no vimentin-expressing) Huh7 cells with highly dedifferentiated (low keratin- and high vimentin-expressing) SNU-475 cells, which differ significantly in their cytoskeleton, adhesions, and signaling networks. Alternative approaches to plectin inactivation are not expected to result in the same degree of dysregulation of specific signaling pathways. Effects of adaptation (CRISPR/Cas9-generated KOs and ∆IFBDs), engagement of different binding domains (CRISPR/Cas9-generated ∆IFBDs), and pleiotropic modes of action (plecstatin-1) are expected.

      In our study, we provide the reader with an unprecedentedly complex comparison of adhesion-associated signaling between WT and plectin-inactivated HCC cell lines. First, we compared the proteomes of WT, KO and PST-treated WT SNU-475 cells using MS-based shotgun proteomics and phosphoproteomics (Fig 3A-C). Second, we extensively and quantitatively immunoblotted the major molecular denominators of MS-identified dysregulated pathways (such as “FAK signaling”, “ILK signaling”, and “Integrin signaling”) with the following results. Data (shown in Figs 3D and S3C) are expressed as a percentage of untreated WT, with downregulated values highlighted in red:

      Author response table 1.

      In addition, we show dysregulated expression (mostly downregulation) of the focal adhesion constituents ITGβ1 and αv, talin, vinculin, and paxillin, which nicely complements the fewer and larger focal adhesions in plectin-inactivated HCC cells. In light of these results, we believe that our statement that “Although these alterations were not found systematically in both cell lines and conditions (reflecting thus presumably their distinct differentiation grade and plectin inactivation efficacy), collectively these data confirmed plectin-dependent adhesome remodeling together with attenuation of oncogenic FAK, MAPK/Erk, and PI3K/Akt pathways upon plectin inactivation” (see pages 8-9) is fully supported. Furthermore, in support of the results of the MS-based (phospho)proteomic and immunoblot analyses, we show a strong correlation between plectin expression and the signatures of “Integrin pathway” (R<sup>2</sup>=0.15, p= 2x10<sup>-45</sup>), “FAK pathway” (R<sup>2</sup>=0.11, p= 2x10<sup>-34</sup>), “PI3K Akt/mTOR signaling” (R<sup>2</sup>=0.06, p= 2x10<sup>-20</sup>) or “Erk pathway” (R<sup>2</sup>=0.10, p= 6x10<sup>-30</sup>) in HCC samples from 1268 patients (Fig S7-2C and S7-3).

      In conclusion, we show that plectin is required for proper/physiological adhesion-associated signaling pathways in HCC cells. The HCC adhesome and associated pathways are dysregulated upon plectin inactivation and we show context-dependent varying degrees of attenuation of the FAK, MAPK/Erk, and PI3K/Akt pathways. In our view, presenting context-dependent variability in expression/activation of pathway molecular denominators is a trade-off for our intention to address this aspect of plectin inactivation in the complexity of different cell lines, tissues, and modes of inactivation. We prefer this complex approach to presenting “more convincing” black-and-white data assessed in a single cell line (Qi et al., 2022) or upon plectin inactivation by a single approach (compare with otherwise excellent studies such as (Xu et al., 2022) or (Buckup et al., 2021)). In fact, unlike the reviewers, we consider this complexity (and the resulting heterogeneity of the data) to be a strength rather than a weakness of our study.

      Reviewer 1:

      (1) The authors suggest that plectin controls oncogenic FAK, MAPK/Erk, and PI3K/Akt signaling in HCC cells, representing the mechanisms by which plectin promotes HCC formation and progression. However, the effect of plectin inactivation on these signaling was inconsistent in Huh7 and SNU-475 cells (Figure 3D), despite similar cell growth inhibition in both cell lines (Figure 2G). For example, pAKT and pERK were only reduced by plectin inhibition in SNU-475 cells but not in Huh7 cells.

      We agree with the reviewer that plectin inactivation yields varying degrees of attenuation of the FAK, MAPK/Erk, and PI3K/Akt pathways depending on the cell type (Huh7 vs SNU-475 cells) and mode of plectin inactivation (CRISPR/Cas9-generated plectin KO vs functional KO (∆IFBD) vs organoruthenium-based inhibitor plecstatin-1). This context-dependent heterogeneity in the expression/activation of molecular denominators of signaling pathways reflects different degrees of cytoskeletal (e.g. #ventral stress fibers, Fig 4A,D and vimentin architecture, Fig S4A-C) and focal adhesion (e.g. %central FA, Fig 4A,E) phenotypes under different conditions. We expect that functional consequences (such as reduced migration and anchorage-independent proliferation) arise from a combination of changes in individual pathways. The sum of often subtle changes will result in comparable effects not only on cell growth, but also on migration or transmission of traction forces. For a more detailed comment, please see our response to all Reviewers on the first three pages of this letter.

      We believe that our data show that both pAkt and pErk are attenuated upon plectin inactivation in both Huh7 and SNU-475 cells. The following data (shown in Figs 3D and S3C) are expressed as a percentage of untreated WT, with downregulated values highlighted in red:

      Author response table 2.

      (2) In addition, pFAK was not changed by plectin inhibition in both cells, and the ratio of pFAK/FAK was increased in both cells.

      We agree with the reviewer that pFAK/FAK levels are either comparable or slightly higher upon plectin inactivation. However, we believe that our data convincingly show that FAK expression is downregulated in both Huh7 and SNU-475 cells. In our opinion, this results in an overall attenuation of FAK signaling (see percentage for Normalized pFAK × Normalized FAK), which is expectedly more pronounced in migratory SNU-475 cells. The following data (shown in Figs 3D and S3C) are expressed as a percentage of untreated WT, with downregulated values highlighted in red:

      Author response table 3.

      Given these results, we feel that our statement that “inhibition of plectin attenuates FAK signaling” (pages 8-9) is well supported.

      (3) Thus, it is hard to convince me that plectin promotes HCC formation and progression by regulating these signalings.

      Previous studies have shown that dysregulation of cell adhesions and attenuation of adhesion-associated FAK, MAPK/Erk, and PI3K/Akt signaling has inhibitory effects on HCC formation and progression. We show that plectin is required for the proper/physiological functioning of adhesion-associated signaling pathways in selected HCC cells. The HCC adhesome and associated pathways are dysregulated upon plectin inactivation and we show context-dependent varying degrees of attenuation of the FAK, MAPK/Erk, and PI3K/Akt pathways. We support these conclusions by providing the reader with proteomic and phosphoproteomic comparisons of adhesion-associated signaling between WT and plectin-inactivated HCC cell lines (Figs 3B,C and S3A,B). We further validate our findings by extensive and quantitative immunoblotting analysis (Figs 3D and S3C). In addition, we show a strong correlation between plectin expression and the signatures of “Integrin pathway” (R<sup>2</sup>=0.15, p= 2x10<sup>-45</sup>), “FAK pathway” (R<sup>2</sup>=0.11, p= 2x10<sup>-34</sup>), “PI3K Akt/mTOR signaling” (R<sup>2</sup>=0.06, p= 2x10<sup>-20</sup>) or “Erk pathway” (R<sup>2</sup>=0.10, p= 6x10<sup>-30</sup>) in HCC samples from 1268 patients (Fig S7E).

      Our data and conclusions are fully consistent with previously published studies in HCC cells. For instance, even a mild decrease in FAK levels leads to a significant reduction in colony size (see effects of KD (Gnani et al., 2017), effects of FAK inhibitor and sorafenib in xenografts (Romito et al., 2021), or effects of inhibitors in soft agars and xenografts (Wang et al., 2016)). Similar effects were observed upon partial Akt inhibition (compare with Akt inhibitors in soft agars (Cuconati et al., 2013; Liu et al., 2020)). Of course, we cannot rule out synergistic plectin-dependent effects mediated via adhesion-independent mechanisms. Identifying these mechanisms and distinguishing the contributions of the various consequences of cytoskeletal dysregulation to the phenotypes described in this manuscript would be experimentally challenging, and we feel that these studies go beyond the scope of our current study.

      As we feel that the adhesion-independent mechanisms were not sufficiently discussed in the original manuscript, we have removed the original sentence “Given the well-established oncogenic activation of these pathways in human cancer(33), our study identifies a new set of potential therapeutic targets.” (page 15) from the Discussion and added the following text: “However, it is conceivable that dysregulated cytoskeletal crosstalk could affect HCC through multiple mechanisms independent from FA-associated signaling. Indeed, we and others (Jirouskova et al., 2018; Xu et al., 2022) have shown that upon plectin inactivation, liver cells acquire epithelial characteristics that promote increased intercellular cohesion and reduced migration. Further studies will be required to identify and investigate synergistic adhesion-independent effects of plectin inactivation on HCC growth and metastasis.” (page 15). See also our response to Reviewer 2, #4 and Reviewer 3, #3 and #4.

      (4) The authors claimed that Plectin inactivation inhibits HCC invasion and metastasis using in vitro and in vivo models. However, the results from in vivo models were not as compelling as the in vitro data. The lung colonization assay is not an ideal in vivo model for studying HCC metastasis and invasion, especially when Plectin inhibition suppresses HCC cell growth and survival. Using an orthotopic model that can metastasize into the lung or spleen could be much more convincing for an essential claim.

      We agree with the reviewer that the orthotopic in vivo model would be an ideal setting to address HCC metastasis experimentally. There are several published models of HCC extrahepatic metastasis, including an orthotopic model of lung metastasis (Fan et al., 2012; Voisin et al., 2024; You et al., 2016), but to our knowledge, none of these orthotopic models are commonly used in the field. In contrast, the administration of tumor cells via the tail vein of mice is a standard, well-established approach of first choice for modelling lung metastasis in a variety of tumor types (e.g. (Hiratsuka et al., 2011; Jakab et al., 2024; Lu et al., 2020)), including HCC (Jin et al., 2017; Lu et al., 2020; Tao et al., 2015; Zhao et al., 2020). 

      Furthermore, we do not believe that the use of an orthotopic model would provide a comparable advantage in terms of plectin-mediated effects on metastatic growth compared to tail vein delivery of tumor cells. Importantly, the lung colonization model used in our study allows for the injection of a defined number of HCC cells into the bloodstream, thus eliminating the effect of the primary tumor size on the number of metastasizing cells. To distinguish between effects of plectin inhibition on HCC cell growth/survival and dissemination, we carefully evaluated both the number and volume of lung metastases (Figs 6I and S6C-F). The observed reduction in the number of metastases (Figs 6I and S6D) reflects the initiation/early phase of metastasis formation, which is strongly influenced by the adhesion, migration, and invasion properties of the HCC cells and corresponds well with the phenotypes described after plectin inactivation in vitro (Figs 4H,I; 5; 6A-E; S5; and S6A,B). The reduction in the volume of metastases (Figs 6I and S6E) reflects the effects of plectin inhibition on HCC cell growth and metastatic outgrowth and corresponds well with the in vitro data shown in Figs 2G,H and S2F,G.

      (5) Also, in Figure 6H, histology images of lungs from this experiment need to be shown to understand plectin's effect on metastasis better.

      We are grateful to the reviewer for bringing our attention to the lung colonization assay results presented. The description of the experiments in the text of the original manuscript was incorrect. The animals monitored by in vivo bioluminescence imaging (shown in Fig 6H) are the same as the mice from which cleared whole lung lobes were analyzed by lattice light sheet fluorescence microscopy (shown in Fig. 6I). The corrected description is now provided in the revised manuscript as follows: “To identify early phase of metastasis formation, we next monitored the HCC cell retention in the lungs using in vivo bioluminescence imaging (Fig. 6H). This experimental cohort was expanded for WT-injected mice which were administered PST…” (page 11).

      Therefore, lungs from all animals shown in Fig 6H,I were CUBIC-cleared and analyzed by lattice light sheet fluorescence microscopy. As requested by Reviewer 2, Recommendation #1, we provide in the revised manuscript (Fig S6F) “whole slide scan results for all the groups”, which could help “to understand plectin's effect on metastasis better”. To address the reviewer's concern, we also post-processed cleared and visualized lungs for hematoxylin staining and immunolabeled them for HNF4α. A representative image is shown as panel A in Author response image 1. Post-processing of CUBIC-cleared and immunolabeled lung lobes resulted in partial tissue destruction and some samples were lost. In addition, as the entire experimental setup was designed for the early phase of metastasis formation, only small Huh7 foci were formed (compared to the larger metastases that developed within 13 weeks after inoculation, shown in panel B). As the IHC for HNF4α provides significantly lower sensitivity compared to the immunofluorescence images provided in the manuscript, we were only able to identify a few HNF4α-positive foci. Overall, we consider our immunofluorescence images to be qualitatively and quantitatively superior to IHC sections. However, if the reviewer or the editor considers it beneficial, we are prepared to show our current data as a part of the manuscript.

      Author response image 1.

      (A) HNF4α staining of lung tissue after CUBIC clearing from mice inoculated with WT Huh7 cells, taken at the BLI timepoint at which a positive signal in the chest area was first detected. This timepoint was then selected for the comparison of the initial stages of lung colonization. (B) H&E and HNF4α staining of lung tissue from mice inoculated with WT Huh7 cells in the survival experiment. Scale bars, 50 µm.

      (6) Figure 6G, it is unclear how many mice were used for this experiment. Did these mice die due to the tumor burdens in the lungs?

      The number of animals is given in the legend to Fig 6G (page 34; N = 14 (WT), 13 (KO)). Large Huh7 metastases were identified in the lungs of animals that could be analyzed post-mortem by IHC (see panel B in the figure above). No large metastases were found in other organs examined, such as the liver, kidney and brain. It is therefore highly likely that these mice died as a result of the tumor burden in the lungs. A similar conclusion was drawn from the results of the lung colonization model in the previous studies (Jin et al., 2017; Zhao et al., 2020).

      (7) The whole paper used inhibition strategies to understand the function of plectin. However, the expression of plectin in Huh7 cells is low (Figure 1D). It might be more appropriate to overexpress plectin in this cell line or others with low plectin expression to examine the effect on HCC cell growth and migration.

      For this study, we selected two model HCC cell lines: Huh7 and SNU-475. Our intention was to investigate the role of plectin in “well-differentiated” (Huh7) and “poorly differentiated” (SNU-475) HCC cells, thus covering early and advanced stages of HCC development (as categorized before (Boyault et al., 2007; Yuzugullu et al., 2009a); see also our description and rationale on page 6). As anticipated, less migratory “epithelial-like” Huh7 cells are characterized by relatively high E-cadherin, low vimentin, and low plectin expression levels (Fig 1D). In contrast, migratory “mesenchymal-like” SNU-475 cells are characterized by relatively low E-cadherin, high vimentin, and high plectin expression levels (Fig 1D). Therefore, the majority of analyses were performed in both relatively low plectin-expressing Huh7 and high plectin-expressing SNU-475 cells. It is noteworthy that inactivation of plectin had similar (although less pronounced) inhibitory effects on growth and migration in both Huh7 and SNU-475 cells.

      We agree with the reviewer that “It might be more appropriate to overexpress plectin in this cell line or others with low plectin expression to examine the effect on HCC cell growth and migration”. In fact, we have received similar suggestions since we started publishing our studies on plectin. Two reasons preclude successful overexpression experiments. First, there are about 14 known isoforms of plectin (Prechova et al., 2023). Although previous studies have analyzed the phenotypic rescue potential of some plectin isoforms using transient transfection (e.g. (Burgstaller et al., 2010; Osmanagic-Myers et al., 2015; Prechova et al., 2022)), the isoform variability precludes rescue/overexpression experiments if the causative isoform is not known. Second, plectin is a giant cytoskeletal crosslinker protein of more than 4,500 amino acids with binding sites for intermediate filaments, F-actin, and microtubules. Overexpression of the approximately 500 kDa crosslinker invariably leads to the collapse of cytoskeletal networks in every cell type we have tested so far. See also our response to Reviewer 3, #2.

      Reviewer 2:

      (1) The annotation of mouse numbers is confusing. In Figures 2A B D E F, it should be the same experiment, but the N numbers in A are 6 and 5. In E and F they are 8 and 3. Similarly, in Figure 2H, in the tumor size curve, the N values are 4,4,5,6. In the table, N values are 8,8,10,11 (the authors showed 8,7,8,7 tumors that formed in the picture). 

      We are grateful to the reviewer for bringing our attention to the inconsistency in the number of animals in DEN-induced hepatocarcinogenesis. Results from two independent cohorts are presented in the manuscript. The first cohort was used for MRI screening (Fig 2A-C) and at the second screening timepoint of 44 weeks, approximately 75% of animals died during anesthesia. Therefore, the second cohort of Ple<sup>ΔAlb</sup> and Ple<sup>fl/fl</sup> mice was used for macroscopic confirmation and histology (Figs 2D-F and S2A). We agree with the reviewer that the original presentation of the data may be misleading; therefore, we have rephrased the sentence describing macroscopic confirmation and histology (Figs 2D-F and S2A) as follows: “Decreased tumor burden in the second cohort of Ple<sup>ΔAlb</sup> mice was confirmed macroscopically…” (page 7).

      For the experiments shown in Fig 2H, mice were injected in both hind flanks. We have added this information to the figure legend along with the correct number of tumors.

      (2) In Figure 3D and Figure S3C, the changes in most of the proteins/phosphorylation sites are not convincing/consistent. These data are not essential for the conclusion of the paper and WB is semi-quantitative. Maybe including more plots of the proteins from proteomic data could strengthen their detailed conclusions about the link between Plectin and the FAK, MAPK/Erk, PI3K/Akt pathways as shown in 3E.

      We agree with the reviewer that plectin inactivation yields varying degrees of attenuation of the FAK, MAPK/Erk, and PI3K/Akt pathways depending on the cell type (Huh7 vs SNU-475 cells) and mode of plectin inactivation (CRISPR/Cas9-generated plectin KO vs functional KO (∆IFBD) vs organoruthenium-based inhibitor plecstatin-1). This context-dependent heterogeneity in the expression/activation of pathway molecular denominators reflects different degrees of cytoskeletal (e.g. #ventral stress fibers, Fig 4A,D and vimentin architecture, Fig S4A-C) and focal adhesion (e.g. %central FA, Fig 4A,E) phenotypes under different conditions. See also the detailed response to all reviewers (on the first three pages of this letter) and the responses to Reviewer 1, #1 and #2, Reviewer 3, #4.

      Our immunoblot analysis is based on NIR fluorescent secondary antibodies which were detected and quantified using an Odyssey imaging system (LI-COR Biosciences). This approach allows a wider linear detection range than chemiluminescence without a signal loss and is considered to provide quantitative immunoblot detection (Mathews et al., 2009; Pillai-Kastoori et al., 2020) (see also manufacturer's website: https://www.licor.com/bio/applications/quantitative-western-blots/).

      Following the reviewer's recommendation, we have carefully reviewed our proteomic and phosphoproteomic data. There are no further MS-based data (other than those already presented in the manuscript) to support the association of plectin with the FAK, MAPK/Erk, PI3K/Akt pathways.

      (3) Figure S7A and B, The pictures do not show any tumor, which is different from Figure 7A and B (and from the quantification in S7A lower right). Is it just because male mice were used in Figure 7 and female mice were used in Figure S7? Is there literature supporting the sex difference for the Myc-sgP53 model?

      As indicated in the Figure legends and in the corresponding text in the Results section (page 12), Fig 7A,B shows Myc;sgTp53-driven hepatocarcinogenesis in male mice, whereas Fig S7C,D shows results from the female cohort. In general, HDTVI-induced HCC onset and progression differ considerably between individual experiments, and it is therefore crucial to compare data within an experimental cohort (as we have done for Ple<sup>ΔAlb</sup> and Ple<sup>fl/fl</sup> mice). Nevertheless, we cannot exclude the influence of sexual dimorphism on the results presented. The existence of sexual dimorphism in liver cancer is supported by a substantial body of evidence derived from various studies (e.g. (Bigsby and Caperell-Grant, 2011; Bray et al., 2024)). To date, no reports have specifically addressed sexual dimorphism in Myc;sgTp53 HDTVI-induced liver cancer. This is likely due to the fact that the vast majority of studies using this model have only presented data for one sex. However, a study using an HDTVI-administered combination of c-MET and mutated beta-catenin oncogenes to induce HCC in mice observed elevated levels of alpha-fetoprotein (AFP) in males when compared to females (Bernal et al., 2024). The study suggests that estrogen may have a protective effect in female mice, as ovariectomized females had AFP levels comparable to those observed in males. Our data suggest that female hormones may have a similar effect in the Myc;sgTp53 HDTVI-induced liver cancer model.

      (4) Figure 2F, S2A, Ple<sup>ΔAlb</sup> mice more frequently formed larger tumors, as reflected by overall tumor size increase. The interpretation of the authors is "possibly implying reduced migration or increased cohesion of plectin-depleted cells". It is quite arbitrary to make this suggestion in the absence of substantial data or literature to support this theory.

      We agree with the reviewer that our statement “Notably, Ple<sup>ΔAlb</sup> mice more frequently formed larger tumors, as reflected by overall tumor size increase (Fig. 2F; Figure 2—figure supplement 1A), possibly implying reduced migration or increased cohesion of plectin-depleted cells(25).” (page 7) is rather speculative. As we did not address the formation of larger tumors in Ple<sup>ΔAlb</sup> mice further in the current study, we wanted to provide the readers with some, even speculative, hypotheses. In support of our hypothesis, we cite our own publication (#26; Jirouskova et al., J Hepatol., 2018), where we show that plectin inactivation in Ple<sup>ΔAlb</sup> livers results in upregulation of the epithelial marker E-cadherin. Previous studies have shown that a similar increase in E-cadherin expression levels reflects mesenchymal-to-epithelial transition (e.g. (Adhikary et al., 2014; Auersperg et al., 1999; Wendt et al., 2011)) and is often associated with reduced cancer cell migration/invasion. This is consistent with our finding that “migrating plectin-disabled SNU-475 cells exhibited more cohesive, epithelial-like features while progressing collectively. By contrast, WT SNU-475 leader cells were more polarized and found to migrate into scratch areas more frequently than their plectin-deficient counterparts (Figure 5—figure supplement 1B). Consistent with this observation, individually seeded SNU-475 cells less frequently assumed a polarized, mesenchymal-like shape upon plectin inactivation in both 2D and 3D environments (Fig. 5C). Moreover, plectin-inactivated SNU-475 cells exhibited a decrease in N-cadherin and vimentin levels when compared to WT counterparts (Figure 5—figure supplement 1C).” (page 10).

      In conclusion, we have shown that plectin-deficient hepatocytes express higher levels of E-cadherin and hepatocyte-derived SNU-475 cells express less N-cadherin and vimentin. In addition, we show that SNU-475 cells exhibited more cohesive, epithelial-like features in scratch-wound experiments. To address the reviewer's concern and to further support our statement about the increased cohesiveness of plectin-deficient HCC cells we have included the citation of the recent study #27 (Xu et al., 2022). Using the MHCC97H and MHCC97L HCC cell lines, this study shows that plectin downregulation “inhibits HCC cell migration and epithelial mesenchymal transformation”, which is fully consistent with our hypothesis. To mitigate the impression of an unsubstantiated statement, we also discuss adhesion-independent plectin-mediated mechanisms in the revised Discussion section as follows: “However, it is conceivable that dysregulated cytoskeletal crosstalk could affect HCC through multiple mechanisms independent from FA-associated signaling. Indeed, we and others (Jirouskova et al., 2018; Xu et al., 2022) have shown that upon plectin inactivation, liver cells acquire epithelial characteristics that promote increased intercellular cohesion and reduced migration. Further studies will be required to identify and investigate synergistic adhesion-independent effects of plectin inactivation on HCC growth and metastasis.” (page 15).

      (5) Mutation or KO PLEC has been shown to cause severe diseases in humans and mice, including skin blistering, muscular dystrophy, and progressive familial intrahepatic cholestasis. Please elaborate on the potential side effects of targeting Plectin to treat HCC.

      Indeed, mutation or ablation of plectin has been implicated in many diseases (collectively known as plectinopathies). These multisystem disorders include an autosomal dominant form of epidermolysis bullosa simplex (EBS), limb-girdle muscular dystrophy, aplasia cutis congenita, and an autosomal recessive form of EBS that may be associated with muscular dystrophy, pyloric atresia, and/or congenital myasthenic syndrome. Several mutations have also been associated with cardiomyopathy and malignant arrhythmias. Progressive familial intrahepatic cholestasis has also been reported. In genetic mouse models, loss of plectin leads to skin fragility, extensive intestinal lesions, instability of the biliary epithelium, and progressive muscle wasting (for more details see (Vahidnezhad et al., 2022)). 

      It is therefore important to evaluate potential side effects; in this respect, plectin inactivation presents challenges comparable to those of other anti-HCC targets. For instance, sorafenib, the most widely used chemotherapy in recent decades, targets numerous serine/threonine and tyrosine kinases (RAF1, BRAF, VEGFR 1, 2, 3, PDGFR, KIT, FLT3, FGFR1, and RET) that are critical for proper non-pathological functions (Strumberg et al., 2007; Wilhelm et al., 2006; Wilhelm et al., 2004). The combinatorial therapy of atezolizumab and bevacizumab also targets PD-L1 in conjunction with VEGF, which plays an essential role in bone formation (Gerber et al., 1999), hematopoiesis (Ferrara et al., 1996), or wound healing (Chintalgattu et al., 2003). To give readers a comprehensive account of the pathological consequences of plectin inactivation, we included two additional citations (Prechova et al., 2023; Vahidnezhad et al., 2022) and rephrased the Introduction section as follows: “…multiple reports have linked plectin with tumor malignancy(12) and other pathologies (Prechova et al., 2023; Vahidnezhad et al., 2022), mechanistic insights…” (page 4-5).

      Reviewer 3:

      (1) The rationale for using Huh7 cells in the manuscript is not well explained as it has the lowest Plectin expression levels.

      For this study, we selected two model HCC cell lines - Huh7 and SNU-475. Our intention was to address the role of plectin in “well-differentiated” (Huh7) and “poorly differentiated” (SNU-475) HCC cells, thus including early and advanced stages of HCC development (as categorized before (Boyault et al., 2007; Yuzugullu et al., 2009b); see also our description and reasoning on page 6). The Huh7 cell line is also a well-established and widely used model suitable for both in vitro and in vivo settings (e.g. (Du et al., 2024; Fu et al., 2018; Si et al., 2023; Zheng et al., 2018)).

      As anticipated, less migratory “epithelial-like” Huh7 cells are characterized by relatively high E-cadherin, low vimentin, and low plectin expression levels (Fig 1D). In contrast, migratory “mesenchymal-like” SNU-475 cells are characterized by relatively low E-cadherin, high vimentin, and high plectin expression levels (Fig 1D). Therefore, the majority of analyses were performed in both relatively low plectin-expressing Huh7 and high plectin-expressing SNU-475 cells. It is noteworthy that inactivation of plectin had similar (although less pronounced) inhibitory effects on the phenotypes in both Huh7 and SNU-475 cells. We believe that these findings highlight the importance of plectin in HCC growth and metastasis, as plectin inactivation has inhibitory effects on both early (low plectin) and advanced (high plectin) stages of HCC.

      (2) The KO cell experiments should be supplemented with overexpression experiments.

      We agree with the reviewer that it would be helpful to complement our plectin inactivation experiments by overexpressing plectin in the HCC cell lines used in this study. In fact, we have received similar suggestions since we started to publish our studies on plectin. Two reasons preclude successful overexpression experiments. First, there are about 14 known isoforms of plectin (Prechova et al., 2023). Although previous studies have analyzed the phenotypic rescue potential of some plectin isoforms using transient transfection (e.g. (Burgstaller et al., 2010; Osmanagic-Myers et al., 2015; Prechova et al., 2022)), the isoform variability precludes rescue/overexpression experiments if the causative isoform is not known. Second, plectin is a giant cytoskeletal crosslinker protein of more than 4,500 amino acids with binding sites for intermediate filaments, F-actin, and microtubules. Overexpression of the approximately 500 kDa-large crosslinker invariably leads to the collapse of cytoskeletal networks in every cell type we have tested so far. See also our response to Reviewer 1, #7.

      (3) There is significant concern that while ablation of Ple led to reduced tumor number, these mice had larger tumors. The data indicate that Plectin may have distinct roles in HCC initiation versus progression. The data are not well explained and do not fully support that Plectin promotes hepatocarcinogenesis.

      In the DEN-induced HCC model, MRI screening revealed fewer tumors, and tumor volume was also reduced at 32 and 44 weeks post-induction (Fig 2A-C). The larger tumors that formed in Ple<sup>ΔAlb</sup> compared to Ple<sup>fl/fl</sup> livers (Figs 2F and S2A) refer only to a subset of macroscopic tumors visually identified at necropsy. Larger Ple<sup>ΔAlb</sup> tumors were not observed in the Myc;sgTp53 HDTVI-induced HCC model (data not shown). In contrast, plectin deficiency reduced the size of xenografts formed in NSG mice (Fig 2H), and agar colonies grown from Huh7 and SNU-475 cells with inactivated plectin were also smaller (Fig S2F). In all in vivo and in vitro approaches presented in the manuscript, plectin inactivation reduced the number of colonies/xenografts/tumors. As hepatocarcinogenesis is a multistep process including initiation, promotion, and progression (Pitot, 2001), we feel confident in concluding that plectin inactivation inhibits hepatocarcinogenesis and we consider this conclusion to be fully supported by the data presented in the manuscript.

      However, we agree with the reviewer that larger macroscopic Ple<sup>ΔAlb</sup> tumors in the DEN-induced HCC model are intriguing. As we do not see similar effects (or even trends) in other approaches used in this study, we cannot exclude the contribution of the plectin-deficient environment in Ple<sup>ΔAlb</sup> livers during long-term (44 weeks) tumor formation and growth. In our previous study (Jirouskova et al., 2018), we showed that plectin deficiency in Ple<sup>ΔAlb</sup> livers leads to biliary tree malformations, collapse of bile ducts and ductules, and mild ductular reaction. We could speculate that Ple<sup>ΔAlb</sup> livers suffer from continuous bile leakage into the parenchyma, which would exacerbate all models of long-term pathology.

      As we did not address the formation of larger tumors in Ple<sup>ΔAlb</sup> mice further in the current study, we offered the reader the hypothesis that large tumors could “…possibly implying reduced migration or increased cohesion of plectin-depleted cells(25).” In support of our hypothesis, we cite our own publication (#26; Jirouskova et al., J Hepatol., 2018), where we show that plectin inactivation in Ple<sup>ΔAlb</sup> livers results in upregulation of the epithelial marker E-cadherin. Previous studies have shown that a similar increase in E-cadherin expression levels reflects mesenchymal-to-epithelial transition (e.g. (Adhikary et al., 2014; Auersperg et al., 1999; Wendt et al., 2011)) and is often associated with reduced cancer cell migration/invasion. This is consistent with our finding that “migrating plectin-disabled SNU-475 cells exhibited more cohesive, epithelial-like features while progressing collectively. By contrast, WT SNU-475 leader cells were more polarized and found to migrate into scratch areas more frequently than their plectin-deficient counterparts (Figure 5—figure supplement 1B). Consistent with this observation, individually seeded SNU-475 cells less frequently assumed a polarized, mesenchymal-like shape upon plectin inactivation in both 2D and 3D environments (Fig. 5C). Moreover, plectin-inactivated SNU-475 cells exhibited a decrease in N-cadherin and vimentin levels when compared to WT counterparts (Figure 5—figure supplement 1C).” (page 10).

      In conclusion, we have shown that plectin-deficient hepatocytes express higher levels of E-cadherin and hepatocyte-derived SNU-475 cells express less N-cadherin and vimentin. In addition, we show that SNU-475 cells exhibited more cohesive, epithelial-like features in scratch-wound experiments. To address the reviewer's concern and to further support our claim of increased cohesiveness of plectin-deficient HCC cells we included the citation of the recent study(27). Using the MHCC97H and MHCC97L HCC cell lines, this study shows that plectin downregulation “inhibits HCC cell migration and epithelial mesenchymal transformation” and is therefore fully consistent with our hypothesis. To mitigate the impression of an unsubstantiated statement, we also discuss adhesion-independent plectin-mediated mechanisms in the revised Discussion section as follows: “However, it is conceivable that dysregulated cytoskeletal crosstalk could affect HCC through multiple mechanisms independent from FA-associated signaling. Indeed, we and others (Jirouskova et al., 2018; Xu et al., 2022) have shown that upon plectin inactivation, liver cells acquire epithelial characteristics that promote increased intercellular cohesion and reduced migration. Further studies will be required to identify and investigate synergistic adhesion-independent effects of plectin inactivation on HCC growth and metastasis.” (page 15).

      (4) Figure 3 showed that Plectin does not regulate p-FAK/FAK expression. Therefore, the statement that Plectin regulates the FAK pathway is not valid. Furthermore, there are too many variables in terms of p-AKT and p-ERK expression, making the conclusion not well supported.

      We agree with the reviewer that pFAK/FAK levels are either comparable or slightly higher upon plectin inactivation. However, we believe that our data convincingly show that FAK expression is downregulated in both Huh7 and SNU-475 cells. In our opinion, this results in an overall attenuation of FAK signaling (see percentage for Normalized pFAK × Normalized FAK), which is, as expected, more pronounced in migratory SNU-475 cells. The following data (shown in Figs 3D and S3C) are expressed as a percentage of untreated WT, with downregulated values highlighted in red:

      Author response table 4.

      Given these results, we believe that our statement that “inhibition of plectin attenuates FAK signaling” (pages 8-9) is well supported.
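
      The combined readout referred to above can be illustrated with a short calculation. The sketch below assumes that "Normalized pFAK" denotes the pFAK/FAK ratio and "Normalized FAK" the total FAK level, each expressed as a percentage of untreated WT. This reading, the function name, and the input values are illustrative assumptions and are not data from the manuscript.

```python
def fak_signaling_index(pfak_over_fak_pct, fak_pct):
    """Multiply two WT-normalized readouts (both in % of untreated WT).

    Assumed interpretation: the product approximates the total amount of
    phosphorylated FAK relative to WT.
    """
    return pfak_over_fak_pct * fak_pct / 100.0


# Hypothetical values, chosen only for illustration.
ratio_pct = 110.0  # pFAK/FAK slightly higher than in WT
total_pct = 60.0   # total FAK reduced relative to WT
index_pct = fak_signaling_index(ratio_pct, total_pct)
print(f"Combined index: {index_pct:.1f}% of WT")
```

      Under these assumed inputs the product falls below 100% even though the pFAK/FAK ratio is slightly elevated, which is the pattern the response describes.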

      We believe that our data show that both pAkt and pErk are attenuated upon plectin inactivation in both Huh7 and SNU-475 cells. The following data (presented in Figs 3D and S3C) are shown as a percentage of untreated WT, with downregulated values highlighted in red:

      Author response table 5.

      We agree with the reviewer that plectin inactivation yields varying degrees of attenuation of the FAK, MAPK/Erk, and PI3K/Akt pathways depending on the cell type (Huh7 vs SNU-475 cells) and mode of plectin inactivation (CRISPR/Cas9-generated plectin KO vs functional KO (∆IFBD) vs organoruthenium-based inhibitor plecstatin-1). This context-dependent heterogeneity in the expression/activation of pathway molecular denominators reflects different degrees of cytoskeletal (e.g. #ventral stress fibers, Fig 4A,D and vimentin architecture, Fig S4A-C) and focal adhesion (e.g. %central FA, Fig 4A,E) phenotypes under different conditions. See also the detailed response to all Reviewers (on the first three pages of this letter) and the responses to Reviewer 1, #1 and #2 and Reviewer 2, #4.

      (5) The studies of plecstatin-1 in HCC should be expanded to a panel of human HCC cells with various Plectin expression levels in terms of cell growth and cell migration. The IC50 values should be determined and correlated with Plectin expression.

      Following the reviewer's suggestion, we have included graphs showing IC50 values for Huh7 (low plectin) and SNU-475 (high plectin) cells as Fig S2E. As expected, the IC50 values are higher for SNU-475 cells. Corresponding parts of the Figure legends have been changed. We refer to new data in the Results section as follows: “If not stated otherwise, we applied PST in the final concentration of 8 µM, which corresponds to the 25% of IC50 for Huh7 cells (Figure 2—figure supplement 1E).” (page 7). We also provide details of the IC50 determination in the revised Supplement Materials and methods section (pages 5-6).
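
      The relationship between the working concentration and the IC50 can be sketched as follows. The dose-response values below are hypothetical, and log-linear interpolation is an assumed, simplified stand-in for the IC50 determination described in the Supplementary Materials and methods. Only the 25% factor and the 8 µM working concentration come from the text; the hypothetical data were chosen so that the interpolated IC50 equals the 32 µM implied by that statement.

```python
import math


def interpolate_ic50(doses_um, viability_pct):
    """Estimate the dose giving 50% viability by log-linear interpolation.

    doses_um must be strictly increasing and positive; viability_pct is
    expressed as a percentage of the untreated control.
    """
    points = list(zip(doses_um, viability_pct))
    for (d_lo, v_lo), (d_hi, v_hi) in zip(points, points[1:]):
        # Find the pair of neighbouring doses that brackets 50% viability.
        if v_lo >= 50.0 >= v_hi and v_lo != v_hi:
            frac = (v_lo - 50.0) / (v_lo - v_hi)
            log_lo, log_hi = math.log10(d_lo), math.log10(d_hi)
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    raise ValueError("50% viability is not bracketed by the data")


# Hypothetical dose-response data (µM, % viability), for illustration only.
doses = [4.0, 8.0, 16.0, 64.0]
viability = [95.0, 85.0, 70.0, 30.0]

ic50_um = interpolate_ic50(doses, viability)
working_um = 0.25 * ic50_um  # 25% of IC50, as stated in the text
print(f"IC50 = {ic50_um:.1f} µM, working concentration = {working_um:.1f} µM")
```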

      (6) One of the major issues is the mechanistic studies focusing on Plectin regulating HCC migration/metastasis, whereas the in vivo mouse studies focus on HCC formation (Figures 3 and 7). These are distinct processes and should not be mixed.

      In our study, we investigated the role of plectin in the development and dissemination of HCC. Using DEN- and Myc;sgTp53 HDTVI-induced HCC models (Figs 2A-F, S2A, 7A-C, and S7A-D), we show the effects of plectin inactivation on HCC formation in vivo. These studies are complemented by xenografts (Figs 2H and S2G) and in vitro colony formation assay (Figs 2G and S2F). Using an in vivo lung colonization assay (Figs 6G-I and S6C-F), we show the effects of plectin inactivation on the metastatic potential of HCC cells. In complementary in vitro studies, we show how plectin deficiency affects migration (Figs 5 and S5) and invasion (Figs 6A-E and S6A,B). 

      Our mechanistic studies show that plectin inactivation leads to dysregulation of cytoskeletal networks, adhesions, and adhesion-associated signaling. We believe that we have provided substantial experimental data suggesting that the proposed mechanisms play a role in plectin-mediated inhibition of both HCC development and dissemination. Of course, we cannot rule out additional, adhesion-independent mechanisms for HCC formation. To clarify this, we have revised the Discussion section as follows: “However, it is conceivable that dysregulated cytoskeletal crosstalk could affect HCC through multiple mechanisms independent from FA-associated signaling. Indeed, we and others (Jirouskova et al., 2018; Xu et al., 2022) have shown that upon plectin inactivation, liver cells acquire epithelial characteristics that promote increased intercellular cohesion and reduced migration. Further studies will be required to identify and investigate synergistic adhesion-independent effects of plectin inactivation on HCC growth and metastasis.” (page 15).

      (7) Figure 7B showed that Ple KO mice were treated with PST, but the data are not presented in the manuscript. Tumor cell proliferation and apoptosis rates should be analyzed as well.

      We do not show any effects of PST in Ple<sup>ΔAlb</sup> mice. As stated in the Fig 7B legend: “Myc;sgTp53 HCC was induced in Ple<sup>fl/fl</sup>, Ple<sup>ΔAlb</sup>, and PST-treated Ple<sup>fl/fl</sup> (Ple<sup>fl/fl</sup>+PST) male mice as in (A). Shown are representative images of Ple<sup>fl/fl</sup>, Ple<sup>ΔAlb</sup>, and Ple<sup>fl/fl</sup>+PST livers from mice with fully developed multifocal HCC sacrificed 6 weeks post-induction.”.

      Following the reviewer's recommendation, we include the analysis of proliferation and apoptosis rates as revised Fig S7A,B. Please note that no differences in apoptosis and proliferation rates were found between experimental conditions. Due to the additional data, the original Fig S7 – 1 has been split into revised Fig S7 – 1 and Fig S7 – 2.

      (8) The status of FAK, AKT, and ERK pathway activation was not analyzed in mouse liver samples. In Figure 7D, most of the adjusted p-values are not significant.

      We are aware that the majority of FDR-corrected p-values shown in Fig 7D are not significant. In fact, we deliberated with our colleagues from the laboratory of Prof. Samuel Meier-Menches (Department of Analytical Chemistry, University of Vienna), who conducted all the proteomic studies presented in this manuscript, on whether to present such "weak" data. Following a lengthy discussion, we decided to include them despite anticipating criticism from the reviewers. The rationale for including these data is that, despite the lack of statistical significance, the findings are consistent with those of MS/immunoblot analyses of HCC cells (Figs 3 and S3) and patient data (Figs 7E, S7-2). The lack of statistical significance observed in the presented data is a consequence of the limited number of animals included in the Ple<sup>fl/fl</sup>, Ple<sup>ΔAlb</sup>, and PST-treated Ple<sup>fl/fl</sup> cohorts, which has resulted in a high degree of variability in the MS results. We agree with the reviewer that the inclusion of immunoblot analysis would provide further support for our conclusions. However, we do not have any remaining liver tissue that could be analyzed.
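
      To illustrate why nominally significant p-values can lose significance after FDR correction, the following sketch applies the Benjamini-Hochberg procedure to a small set of hypothetical p-values. The response does not state which FDR procedure was used for Fig 7D, so the choice of Benjamini-Hochberg here is an assumption, and the input values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values, for illustration only.
raw = [0.004, 0.03, 0.04, 0.20]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"raw p = {p:.3f}  adjusted p = {q:.3f}")
```

      In this made-up example, the raw values 0.03 and 0.04 fall below 0.05, but their adjusted values exceed it; with few samples and many tested proteins, this effect becomes more pronounced.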

      (9) There is no evidence to support that PST is capable of overcoming therapy resistance in HCC. For example, no comparison with the current standard care was provided in the preclinical studies.

      We are grateful to the reviewer for bringing our attention to the incorrect statement in the Abstract: “…we show that plectin inhibitor plecstatin-1 (PST) is well-tolerated and capable of overcoming therapy resistance in HCC”. To address the reviewer's concern, we rephrased the Abstract as follows: “…we show that plectin inhibitor plecstatin-1 (PST) is well-tolerated and potently inhibits HCC progression”.

      Recommendations for the authors: 

      Reviewer 2 (Recommendations for the authors):

      (1) In Figures 6I and S6C, it would be better to show the whole slide scan result for all the groups.

      Following the reviewer's recommendation, we include the whole slide scan result for all the groups as revised Fig S6F.

      (2) In Figures S7C and D, what do the highlighted/colored dots represent? They are not mentioned in the figure legend or the results.

      Following the reviewer's recommendation, we include the explanation in the revised Figure legends (page 30).

      (3) In Figure 2H, the experiment schedule showed "6w Huh7 t.v.i.", but should it be subcutaneous injection?

      We are grateful to the reviewer for bringing our attention to the incorrect description of the experiment. The schematic has been corrected. We have also noticed an error in the table summarizing the number of tumors formed (N) and have corrected the values for the WT+PST and KO conditions.

      (4) Supplemental Materials and Methods, Xenograft tumorigenesis, Error: 2.5×10<sup>6</sup> Huh7 cells in 250 ml PBS mice were administered subcutaneously in the left and right hind flanks. It probably should be "250 µl".

      We are grateful to the reviewer for bringing our attention to the incorrect description of the experiment. The corresponding part of the Materials and Methods section has been corrected (page 2).

      (5) In Figure legend Supplementary Figure 6 C,D,E : "Representative magnified images from lung lobes with GFP-positive WT, KO, and WT+PST SNU-475 nodules". There is no picture for the WT+PST SNU-475 group.

      We are grateful to the reviewer for bringing our attention to the incorrect description of the experiment. The corresponding part of the Figure legend (“WT+PST SNU-475”) has been deleted (page 27).

      (6) In the Figure legend for Figure 6H, "Representative BLI images of WT, KO, and PST-treated WT (WT+PST) SNU-475 cells-bearing mice are shown". Should it be Huh7, not SNU-475?

      We are grateful to the reviewer for bringing our attention to the incorrect description of the experiment. The description of the cell line has been corrected (page 34).

      (7) The statement that current therapies rely on multikinase inhibitors is no longer correct.

      We are grateful to the reviewer for bringing our attention to the incorrect statement. To address the reviewer's concern, we rephrased the original part of the Discussion section: “Current therapies for HCC rely on multikinase inhibitors (such as sorafenib) that provide only moderate survival benefit(60,61) due to primary resistance and the plasticity of signaling networks(62)” as follows: “Current systemic therapies for advanced HCC rely on a combination of multikinase inhibitor (such as sorafenib) or anti-VEGF /VEGF inhibitor (such as bevacizumab) treatment with immunotherapy(59). Multikinase inhibitors provide only moderate survival benefit(60,61) due to primary resistance and the plasticity of signaling networks(62), and only a subset of patients benefits from addition of immunotherapy in HCC treatment(63)” (page 15).

      References

      Adhikary, A., S. Chakraborty, M. Mazumdar, S. Ghosh, S. Mukherjee, A. Manna, S. Mohanty, K.K. Nakka, S. Joshi, A. De, S. Chattopadhyay, G. Sa, and T. Das. 2014. Inhibition of epithelial to mesenchymal transition by E-cadherin up-regulation via repression of slug transcription and inhibition of E-cadherin degradation: dual role of scaffold/matrix attachment region-binding protein 1 (SMAR1) in breast cancer cells. The Journal of biological chemistry. 289:25431-25444.

      Auersperg, N., J. Pan, B.D. Grove, T. Peterson, J. Fisher, S. Maines-Bandiera, A. Somasiri, and C.D. Roskelley. 1999. E-cadherin induces mesenchymal-to-epithelial transition in human ovarian surface epithelium. Proc Natl Acad Sci U S A. 96:6249-6254.

      Bernal, A., M. McLaughlin, A. Tiwari, F. Cigarroa, and L. Sun. 2024. Abstract 772: Investigation of gender disparity in liver tumor formation using a hydrodynamic tail vein injection mouse model. Cancer Research. 84:772-772.

      Bigsby, R.M., and A. Caperell-Grant. 2011. The role for estrogen receptor-alpha and prolactin receptor in sex-dependent DEN-induced liver tumorigenesis. Carcinogenesis. 32:1162-1166.

      Bonakdar, N., A. Schilling, M. Sporrer, P. Lennert, A. Mainka, L. Winter, G. Walko, G. Wiche, B. Fabry, and W.H. Goldmann. 2015. Determining the mechanical properties of plectin in mouse myoblasts and keratinocytes. Exp Cell Res. 331:331-337.

      Boyault, S., D.S. Rickman, A. de Reynies, C. Balabaud, S. Rebouissou, E. Jeannot, A. Herault, J. Saric, J. Belghiti, D. Franco, P. Bioulac-Sage, P. Laurent-Puig, and J. Zucman-Rossi. 2007. Transcriptome classification of HCC is related to gene alterations and to new therapeutic targets. Hepatology. 45:42-52.

      Bray, F., M. Laversanne, H. Sung, J. Ferlay, R.L. Siegel, I. Soerjomataram, and A. Jemal. 2024. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 74:229-263.

      Buckup, M., M.A. Rice, E.C. Hsu, F. Garcia-Marques, S. Liu, M. Aslan, A. Bermudez, J. Huang, S.J. Pitteri, and T. Stoyanova. 2021. Plectin is a regulator of prostate cancer growth and metastasis. Oncogene. 40:663-676.

      Burgstaller, G., M. Gregor, L. Winter, and G. Wiche. 2010. Keeping the vimentin network under control: cell-matrix adhesion-associated plectin 1f affects cell shape and polarity of fibroblasts. Mol Biol Cell. 21:3362-3375.

      Chintalgattu, V., D.M. Nair, and L.C. Katwa. 2003. Cardiac myofibroblasts: a novel source of vascular endothelial growth factor (VEGF) and its receptors Flt-1 and KDR. J Mol Cell Cardiol. 35:277-286.

      Cuconati, A., C. Mills, C. Goddard, X. Zhang, W. Yu, H. Guo, X. Xu, and T.M. Block. 2013. Suppression of AKT anti-apoptotic signaling by a novel drug candidate results in growth arrest and apoptosis of hepatocellular carcinoma cells. PLoS One. 8:e54595.

      Du, Y.Q., B. Yuan, Y.X. Ye, F.L. Zhou, H. Liu, J.J. Huang, and Y.F. Wei. 2024. Plumbagin Regulates Snail to Inhibit Hepatocellular Carcinoma Epithelial-Mesenchymal Transition in vivo and in vitro. J Hepatocell Carcinoma. 11:565-580.

      Fan, Z.C., J. Yan, G.D. Liu, X.Y. Tan, X.F. Weng, W.Z. Wu, J. Zhou, and X.B. Wei. 2012. Real-time monitoring of rare circulating hepatocellular carcinoma cells in an orthotopic model by in vivo flow cytometry assesses resection on metastasis. Cancer Res. 72:2683-2691.

      Ferrara, N., K. Carver-Moore, H. Chen, M. Dowd, L. Lu, K.S. O'Shea, L. Powell-Braxton, K.J. Hillan, and M.W. Moore. 1996. Heterozygous embryonic lethality induced by targeted inactivation of the VEGF gene. Nature. 380:439-442.

      Fu, Q., Q. Zhang, Y. Lou, J. Yang, G. Nie, Q. Chen, Y. Chen, J. Zhang, J. Wang, T. Wei, H. Qin, X. Dang, X. Bai, and T. Liang. 2018. Primary tumor-derived exosomes facilitate metastasis by regulating adhesion of circulating tumor cells via SMAD3 in liver cancer. Oncogene. 37:6105-6118.

      Gerber, H.P., T.H. Vu, A.M. Ryan, J. Kowalski, Z. Werb, and N. Ferrara. 1999. VEGF couples hypertrophic cartilage remodeling, ossification and angiogenesis during endochondral bone formation. Nat Med. 5:623-628.

      Gnani, D., I. Romito, S. Artuso, M. Chierici, C. De Stefanis, N. Panera, A. Crudele, S. Ceccarelli, E. Carcarino, V. D'Oria, M. Porru, E. Giorda, K. Ferrari, L. Miele, E. Villa, C. Balsano, D. Pasini, C. Furlanello, F. Locatelli, V. Nobili, R. Rota, C. Leonetti, and A. Alisi. 2017. Focal adhesion kinase depletion reduces human hepatocellular carcinoma growth by repressing enhancer of zeste homolog 2. Cell Death Differ. 24:889-902.

      Gregor, M., S. Osmanagic-Myers, G. Burgstaller, M. Wolfram, I. Fischer, G. Walko, G.P. Resch, A. Jorgl, H. Herrmann, and G. Wiche. 2014. Mechanosensing through focal adhesion-anchored intermediate filaments. FASEB J. 28:715-729.

      Hiratsuka, S., S. Goel, W.S. Kamoun, Y. Maru, D. Fukumura, D.G. Duda, and R.K. Jain. 2011. Endothelial focal adhesion kinase mediates cancer cell homing to discrete regions of the lungs via E-selectin up-regulation. Proc Natl Acad Sci U S A. 108:3725-3730.

      Jakab, M., K.H. Lee, A. Uvarovskii, S. Ovchinnikova, S.R. Kulkarni, S. Jakab, T. Rostalski, C. Spegg, S. Anders, and H.G. Augustin. 2024. Lung endothelium exploits susceptible tumor cell states to instruct metastatic latency. Nat Cancer. 5:716-730.

      Jin, H., C. Wang, G. Jin, H. Ruan, D. Gu, L. Wei, H. Wang, N. Wang, E. Arunachalam, Y. Zhang, X. Deng, C. Yang, Y. Xiong, H. Feng, M. Yao, J. Fang, J. Gu, W. Cong, and W. Qin. 2017. Regulator of Calcineurin 1 Gene Isoform 4, Down-regulated in Hepatocellular Carcinoma, Prevents Proliferation, Migration, and Invasive Activity of Cancer Cells and Metastasis of Orthotopic Tumors by Inhibiting Nuclear Translocation of NFAT1. Gastroenterology. 153:799-811 e733.

      Jirouskova, M., K. Nepomucka, G. Oyman-Eyrilmez, A. Kalendova, H. Havelkova, L. Sarnova, K. Chalupsky, B. Schuster, O. Benada, P. Miksatkova, M. Kuchar, O. Fabian, R. Sedlacek, G. Wiche, and M. Gregor. 2018. Plectin controls biliary tree architecture and stability in cholestasis. J Hepatol. 68:1006-1017.

      Katada, K., T. Tomonaga, M. Satoh, K. Matsushita, Y. Tonoike, Y. Kodera, T. Hanazawa, F. Nomura, and Y. Okamoto. 2012. Plectin promotes migration and invasion of cancer cells and is a novel prognostic marker for head and neck squamous cell carcinoma. J Proteomics. 75:1803-1815.

      Koster, J., S. van Wilpe, I. Kuikman, S.H. Litjens, and A. Sonnenberg. 2004. Role of binding of plectin to the integrin beta4 subunit in the assembly of hemidesmosomes. Mol Biol Cell. 15:1211-1223.

      Liu, H., Q. Chen, D. Lu, X. Pang, S. Yin, K. Wang, R. Wang, S. Yang, Y. Zhang, Y. Qiu, T. Wang, and H. Yu. 2020. HTBPI, an active phenanthroindolizidine alkaloid, inhibits liver tumorigenesis by targeting Akt. FASEB J. 34:12255-12268.

      Lu, H.H., S.Y. Lin, R.R. Weng, Y.H. Juan, Y.W. Chen, H.H. Hou, Z.C. Hung, G.A. Oswita, Y.J. Huang, S.Y. Guu, K.H. Khoo, J.Y. Shih, C.J. Yu, and H.C. Tsai. 2020. Fucosyltransferase 4 shapes oncogenic glycoproteome to drive metastasis of lung adenocarcinoma. EBioMedicine. 57:102846.

      Mathews, S.T., E.P. Plaisance, and T. Kim. 2009. Imaging systems for westerns: chemiluminescence vs. infrared detection. Methods in molecular biology (Clifton, N.J.). 536:499-513.

      Osmanagic-Myers, S., M. Gregor, G. Walko, G. Burgstaller, S. Reipert, and G. Wiche. 2006. Plectincontrolled keratin cytoarchitecture affects MAP kinases involved in cellular stress response and migration. J Cell Biol. 174:557-568.

      Osmanagic-Myers, S., S. Rus, M. Wolfram, D. Brunner, W.H. Goldmann, N. Bonakdar, I. Fischer, S. Reipert, A. Zuzuarregui, G. Walko, and G. Wiche. 2015. Plectin reinforces vascular integrity by mediating crosstalk between the vimentin and the actin networks. J Cell Sci. 128:4138-4150.

      Pillai-Kastoori, L., A.R. Schutz-Geschwender, and J.A. Harford. 2020. A systematic approach to quantitative Western blot analysis. Analytical biochemistry. 593:113608.

      Pitot, H.C. 2001. Pathways of progression in hepatocarcinogenesis. Lancet (London, England). 358:859860.

      Prechova, M., Z. Adamova, A.L. Schweizer, M. Maninova, A. Bauer, D. Kah, S.M. Meier-Menches, G. Wiche, B. Fabry, and M. Gregor. 2022. Plectin-mediated cytoskeletal crosstalk controls cell tension and cohesion in epithelial sheets. J Cell Biol. 221.

      Prechova, M., K. Korelova, and M. Gregor. 2023. Plectin. Curr Biol. 33:R128-R130.

      Qi, L., T. Knifley, M. Chen, and K.L. O'Connor. 2022. Integrin alpha6beta4 requires plectin and vimentin for adhesion complex distribution and invasive growth. J Cell Sci. 135.

      Romito, I., M. Porru, M.R. Braghini, L. Pompili, N. Panera, A. Crudele, D. Gnani, C. De Stefanis, M. Scarsella, S. Pomella, S. Levi Mortera, E. de Billy, A.L. Conti, V. Marzano, L. Putignani, M. Vinciguerra, C. Balsano, A. Pastore, R. Rota, M. Tartaglia, C. Leonetti, and A. Alisi. 2021. Focal adhesion kinase inhibitor TAE226 combined with Sorafenib slows down hepatocellular carcinoma by multiple epigenetic effects. J Exp Clin Cancer Res. 40:364.

      Si, T., L. Huang, T. Liang, P. Huang, H. Zhang, M. Zhang, and X. Zhou. 2023. Ruangan Lidan decoction inhibits the growth and metastasis of liver cancer by downregulating miR-9-5p and upregulating PDK4. Cancer Biol Ther. 24:2246198.

      Strumberg, D., J.W. Clark, A. Awada, M.J. Moore, H. Richly, A. Hendlisz, H.W. Hirte, J.P. Eder, H.J. Lenz, and B. Schwartz. 2007. Safety, pharmacokinetics, and preliminary antitumor activity of sorafenib: a review of four phase I trials in patients with advanced refractory solid tumors. Oncologist. 12:426-437.

      Tao, Q.F., S.X. Yuan, F. Yang, S. Yang, Y. Yang, J.H. Yuan, Z.G. Wang, Q.G. Xu, K.Y. Lin, J. Cai, J. Yu, W.L. Huang, X.L. Teng, C.C. Zhou, F. Wang, S.H. Sun, and W.P. Zhou. 2015. Aldolase B inhibits metastasis through Ten-Eleven Translocation 1 and serves as a prognostic biomarker in hepatocellular carcinoma. Mol Cancer. 14:170.

      Vahidnezhad, H., L. Youssefian, N. Harvey, A.R. Tavasoli, A.H. Saeidian, S. Sotoudeh, A. Varghaei, H. Mahmoudi, P. Mansouri, N. Mozafari, O. Zargari, S. Zeinali, and J. Uitto. 2022. Mutation update: The spectra of PLEC sequence variants and related plectinopathies. Human mutation. 43:17061731.

      Voisin, L., M. Lapouge, M.K. Saba-El-Leil, M. Gombos, J. Javary, V.Q. Trinh, and S. Meloche. 2024. Syngeneic mouse model of YES-driven metastatic and proliferative hepatocellular carcinoma. Dis Model Mech. 17.

      Wang, D.D., Y. Chen, Z.B. Chen, F.J. Yan, X.Y. Dai, M.D. Ying, J. Cao, J. Ma, P.H. Luo, Y.X. Han, Y. Peng, Y.H. Sun, H. Zhang, Q.J. He, B. Yang, and H. Zhu. 2016. CT-707, a Novel FAK Inhibitor, Synergizes with Cabozantinib to Suppress Hepatocellular Carcinoma by Blocking Cabozantinib-Induced FAK Activation. Mol Cancer Ther. 15:2916-2925.

      Wang, W., A. Zuidema, L. Te Molder, L. Nahidiazar, L. Hoekman, T. Schmidt, S. Coppola, and A. Sonnenberg. 2020. Hemidesmosomes modulate force generation via focal adhesions. J Cell Biol. 219.

      Wendt, M.K., M.A. Taylor, B.J. Schiemann, and W.P. Schiemann. 2011. Down-regulation of epithelial cadherin is required to initiate metastatic outgrowth of breast cancer. Mol Biol Cell. 22:24232435.

      Wenta, T., A. Schmidt, Q. Zhang, R. Devarajan, P. Singh, X. Yang, A. Ahtikoski, M. Vaarala, G.H. Wei, and A. Manninen. 2022. Disassembly of alpha6beta4-mediated hemidesmosomal adhesions promotes tumorigenesis in PTEN-negative prostate cancer by targeting plectin to focal adhesions. Oncogene. 41:3804-3820.

      Wilhelm, S., C. Carter, M. Lynch, T. Lowinger, J. Dumas, R.A. Smith, B. Schwartz, R. Simantov, and S. Kelley. 2006. Discovery and development of sorafenib: a multikinase inhibitor for treating cancer. Nat Rev Drug Discov. 5:835-844.

      Wilhelm, S.M., C. Carter, L. Tang, D. Wilkie, A. McNabola, H. Rong, C. Chen, X. Zhang, P. Vincent, M. McHugh, Y. Cao, J. Shujath, S. Gawlak, D. Eveleigh, B. Rowley, L. Liu, L. Adnane, M. Lynch, D. Auclair, I. Taylor, R. Gedrich, A. Voznesensky, B. Riedl, L.E. Post, G. Bollag, and P.A. Trail. 2004. BAY 43-9006 exhibits broad spectrum oral antitumor activity and targets the RAF/MEK/ERK pathway and receptor tyrosine kinases involved in tumor progression and angiogenesis. Cancer Res. 64:7099-7109.

      Xu, R., S. He, D. Ma, R. Liang, Q. Luo, and G. Song. 2022. Plectin Downregulation Inhibits Migration and Suppresses Epithelial Mesenchymal Transformation of Hepatocellular Carcinoma Cells via ERK1/2 Signaling. Int J Mol Sci. 24.

      You, A., M. Cao, Z. Guo, B. Zuo, J. Gao, H. Zhou, H. Li, Y. Cui, F. Fang, W. Zhang, T. Song, Q. Li, X. Zhu, H. Yin, H. Sun, and T. Zhang. 2016. Metformin sensitizes sorafenib to inhibit postoperative recurrence and metastasis of hepatocellular carcinoma in orthotopic mouse models. J Hematol Oncol. 9:20.

      Yuzugullu, H., K. Benhaj, N. Ozturk, S. Senturk, E. Celik, A. Toylu, N. Tasdemir, M. Yilmaz, E. Erdal, K.C. Akcali, N. Atabey, and M. Ozturk. 2009a. Canonical Wnt signaling is antagonized by noncanonical Wnt5a in hepatocellular carcinoma cells. Molecular Cancer. 8:90.

      Yuzugullu, H., K. Benhaj, N. Ozturk, S. Senturk, E. Celik, A. Toylu, N. Tasdemir, M. Yilmaz, E. Erdal, K.C. Akcali, N. Atabey, and M. Ozturk. 2009b. Canonical Wnt signaling is antagonized by noncanonical Wnt5a in hepatocellular carcinoma cells. Mol Cancer. 8:90.

      Zhao, J., Y. Hou, C. Yin, J. Hu, T. Gao, X. Huang, X. Zhang, J. Xing, J. An, S. Wan, and J. Li. 2020. Upregulation of histamine receptor H1 promotes tumor progression and contributes to poor prognosis in hepatocellular carcinoma. Oncogene. 39:1724-1738.

      Zheng, H., Y. Yang, C. Ye, P.P. Li, Z.G. Wang, H. Xing, H. Ren, and W.P. Zhou. 2018. Lamp2 inhibits epithelial-mesenchymal transition by suppressing Snail expression in HCC. Oncotarget. 9:3024030252.

    1. eLife Assessment

      This valuable study provides in-vivo evidence that CCR4 regulates the early inflammatory response during atherosclerotic plaque formation. The authors propose that altered T-cell response plays a role in this process, shedding light on mechanisms that may be of interest to medical biologists, biochemists, cell biologists, and immunologists. The work is currently considered incomplete pending textual changes and the inclusion of proper controls.

    2. Reviewer #2 (Public review):

      Summary:

      Tanaka et al. investigated the role of CCR4 in early atherosclerosis, focusing on the immune modulation elicited by this chemokine receptor under hypercholesterolemia. The study found that Ccr4 deficiency led to qualitative changes in atherosclerotic plaques, characterized by an increased inflammatory phenotype. The authors further analyzed the CD4 T cell immune response in para-aortic lymph nodes and atherosclerotic aorta, showing an increase mainly in Th1 cells and the Th1/Treg ratio in Ccr4-/-Apoe-/- mice compared to Apoe-/- mice. They then focused on Tregs, demonstrating that Ccr4 deficiency impaired their immunosuppressive function in in vitro assays. The authors also state that Ccr4-deficient Tregs had, as expected, impaired migration to the atherosclerotic aorta. Adoptive cell transfer of Ccr4-/- Tregs to Apoe-/- mice mimicked early atherosclerosis development in Ccr4-/-Apoe-/- mice. Therefore, this work shows that CCR4 plays an important role in early atherosclerosis but not in advanced stages.

      Strengths:

      Several in vivo and in vitro approaches were used to address the role of CCR4 in early atherosclerosis. Particularly, through the adoptive cell transfer of CCR4+ or CCR4- Tregs, the authors aimed to directly demonstrate the role of CCR4 in Tregs' protection against early atherosclerosis.

      Weaknesses:

      Flow cytometry experiments are not well controlled. Dead cells and doublets were not excluded from analysis.

      Clinical relevance is unclear.

    3. Reviewer #3 (Public review):

      Summary:

      Tanaka and colleagues addressed the role of the C-C chemokine receptor 4 (CCR4) in early atherosclerotic plaque development using ApoE-deficient mice on a standard chow diet as a model. Because several CD4+ T cell subsets express CCR4, they examined whether CCR4 deficiency alters the immune response mediated by CD4+ T cells. By histological analysis of aortic lesions, they demonstrated that the absence of CCR4 promoted the development of early atherosclerosis, with heightened inflammation linked to increased macrophages and pro-inflammatory CD4+ T cells, along with reduced collagen content. Flow cytometry and mRNA expression analysis for identifying CD4+ T cell subsets showed that CCR4 deficiency promoted higher proliferation of pro-inflammatory effector CD4+ T cells in peripheral lymphoid tissues and accumulation of Th1 cells in the atherosclerotic lesions. Interestingly, the increased pro-inflammatory CD4+ T cell response occurred despite the expansion of CD4+ Foxp3+ regulatory T cells (Tregs), found in higher numbers in lymphoid tissues of CCR4-deficient mice, suggesting that CCR4 deficiency interfered with the regulatory actions of Tregs. In addition, CCR4 deficiency induced an augmented Th1/Treg ratio in the aortic lesions. The CCR4-mediated mechanisms underlying the control of early inflammation and atherosclerosis development were not completely elucidated. In vitro studies suggest that CCR4 expression in Tregs plays a role in controlling DC activation and, in turn, the extent of CD4+ T cell activation and proliferation. Dependence on CCR4 expression for Treg migration to the atherosclerotic aorta was not proved. The findings contrast with earlier studies in a murine model of advanced atherosclerosis, where CCR4 deficiency did not alter the development of the aortic lesions. The authors included a thoughtful discussion about hypothetical mechanisms explaining these contrasting results, including putative differences in the role played by the CCL17/CCL22-CCR4 axis along the stages of atherosclerosis development in this murine model.

      Major strengths:

      • Demonstration of CCR4 deficiency's impact on early atherosclerosis. CCR4 deficiency effects on early atherosclerosis development in the Apoe-/- mouse model were demonstrated by a quantitative analysis of the lesion area, inflammatory cell content and the expression profile of several pro- and anti-inflammatory markers.

      • Analysis of the CD4+ T cell response in various lymphoid tissues (peripheral and para-aortic lymph nodes and spleen) and the atherosclerotic aorta during the early phase of atherosclerosis in the Apoe-/- mouse model. This analysis, combining flow cytometry and mRNA expression, showed that CCR4 deficiency enhanced CD4+ T cell activation, favouring the amplification of the typical biased Th1-mediated inflammatory response observed in the lymphoid tissues of hypercholesterolemic mice.

      • Treg transfer experiments. Transfer of Tregs from Apoe-/- or Ccr4-/- Apoe-/- mice to Apoe-/- mice under a standard chow diet was useful for addressing the relevance of CCR4 expression on Tregs for the atheroprotective effect of this regulatory T cell subset during early atherosclerosis.

      Major weaknesses:

      • The effect of CCR4 deficiency on the Th1/Th17 balance was not evaluated. Although the role of Th17 cells in atherosclerosis remains controversial, RORγt+ cells constituted, on average, more than 10% of the effector CD45+CD3+CD4+ T lymphocytes in the aorta of Apoe-/- mice (Fig 4H). Changes in the Th1/Th17 balance in lymphoid tissues and aortic lesions may influence the type and functional properties of inflammatory cells recruited to the atherosclerotic aorta.

      • Lack of in vivo evidence for Treg suppressive effects on DC activation. The proposed CCR4 requirement for the Treg suppressive activity on DC activation is supported by in vitro co-culture assays, in which CCR4-deficiency partially reverted Treg regulatory actions. Higher expression of CD86, a DC activation marker, was found in spleen DCs from Ccr4-/- Apoe-/- mice compared to Apoe-/- mice (Supplementary Fig 5), which would be worth commenting on and discussing.

      • Methodological limitations. Controls in flow cytometry analysis were suboptimal (neither viability nor doublets were checked), which may have introduced artefacts, especially when measuring less-represented cell populations within complex samples. In addition, assessing Treg migration to the aorta in atherosclerotic mice faced methodological limitations that hindered statistical comparisons between Tregs from Apoe-/- and Ccr4-/- Apoe-/- mice, leading to inconclusive results. The dependence on CCR4 expression for Treg migration to the atherosclerotic aorta was not established.

      • Treg transfer experiments did not allow the detection of a reduction in the aortic lesion area by transferred CCR4-expressing Tregs (comparison between saline and Apoe-/- Tregs groups). Using Apoe-/- mice as recipients, the CCR4-dependent protective effect of Tregs was mostly evidenced by analysis of aortic inflammation, which was valuable. When using Ccr4-/- Apoe-/- mice as recipients, analysis of aortic inflammation was not mentioned.

      Study limitations:

      This investigation has some limitations. Current tools for single-cell characterization have revealed the phenotypic heterogeneity and dynamics of aortic leukocytes, including T cells, which are among the principal aortic leukocytes found in mouse and human atherosclerotic lesions (doi: 10.1161/CIRCRESAHA.117.312513). The flow cytometry analysis applied in this study cannot distinguish the generation of particular phenotypes within CD4+ T cell subsets, including putative phenotypes of non-suppressive T cells expressing low levels of Foxp3, as appears to occur in other chronic inflammatory disorders (doi: 10.1038/nm.3432; doi: 10.1172/JCI79014). Limitations due to the use of a complete CCR4 knockout mouse and putative differences in CCR4-mediated mechanisms along atherosclerosis stages and in human atherosclerosis were commented on by the authors in the discussion.

      Global Impact:

      This work opens the way for a deeper analysis of the contribution of CCR4 and its ligands to the activation and differentiation of CD4+ T lymphocytes during atherosclerosis development, with these lymphocytes being fundamental players in the generation of pro-atherogenic and anti-atherogenic immune responses. Differences in the mechanisms mediated by the CCL17/CCL22-CCR4 axis between early and advanced atherosclerosis highlight the complex landscape to examine and validate in human samples and the need for deeper knowledge to identify genuine and safe targets capable of promoting protective anti-atherogenic immune responses.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Response to the Reviewer #1 (Public review):

      We greatly appreciate the reviewer’s high evaluation of our paper and helpful comments. As expected, we revealed that the CCL17/CCL22–CCR4 axes play an important role in guiding Tregs to the atherosclerotic aorta. Interestingly, we also demonstrated that these axes are critical for Treg-dependent regulation of proinflammatory T cell responses in lymphoid tissues and atherosclerotic aortas, which is a previously unrecognized role for CCR4 in regulating inflammatory immune responses. However, the role of the CCL17/CCL22–CCR4 axes in regulating inflammatory immune responses and atherosclerosis has not been fully elucidated and further investigation is needed.

      Response to the Reviewer #2 (Public review):

      We greatly appreciate the reviewer’s high evaluation of our paper and helpful comments and suggestions. We isolated CD4<sup>+</sup>CD25<sup>+</sup> T cells and used them as Tregs in several experiments. As the reviewer pointed out, we realize that the CD4<sup>+</sup>CD25<sup>+</sup> T cell population contains some activated effector T cells. However, given the high expression levels of the most reliable Treg marker, Foxp3, in isolated CD4<sup>+</sup>CD25<sup>+</sup> T cells as determined by flow cytometry, we believe that our method for separating Tregs is acceptable.

      Regarding the role of Th17 cells in atherosclerosis, conflicting results have been reported. Therefore, it is unclear whether augmented Th17 cell immune responses contribute to accelerated atherosclerosis in Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice.

      As the reviewer pointed out, it is important to consider the clinical relevance of our findings. We analyzed a public database to determine whether Ccr4 single nucleotide polymorphisms correlate with a higher incidence of atherosclerotic cardiovascular disease. However, no evidence supporting the clinical relevance of our findings was found.

      Response to the Reviewer #3 (Public review):

      We greatly appreciate the reviewer’s high evaluation of our paper and helpful comments and suggestions. In accordance with the reviewer’s suggestion, we described the detailed methods and carefully performed data analysis regarding flow cytometry, which would strengthen the conclusion of this study.

      We understand the importance of the reviewer’s point that CCR4 deficiency does not shift the Th1 cell/Treg balance toward Th1 cell responses in all lymphoid tissues. CCR4 deficiency promoted the accumulation of Th1 cells but did not affect the accumulation of Tregs in the atherosclerotic aorta, which shifted the Th1 cell/Treg balance toward Th1 cell responses. The frequencies of both Tregs and Th1 cells in peripheral lymphoid tissues were increased by CCR4 deficiency, while these CCR4-deficient Tregs exhibited impaired suppressive function. Given this, we speculate that CCR4 deficiency may shift the Th1 cell/Treg balance toward Th1 cell responses in peripheral lymphoid tissues. However, it is difficult to show this clearly. We revised the manuscript accordingly.

      Although the reviewer pointed out the possibility that modulation of the Th1 cell/Th17 cell balance might be responsible for the changes in aortic inflammatory cells in Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice, the role of Th17 cells in atherosclerosis remains controversial. However, we cannot completely exclude the possibility that modulation of the Th17 response is involved in accelerated atherosclerosis in Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice.

      As a limitation of this study, the phenotypic heterogeneity and dynamics of aortic leukocytes could not be revealed by flow cytometric analysis. Single-cell proteomic and transcriptomic approaches would provide additional important information on various aortic cells including immune cells and vascular cells.

      Reviewer #1 (Recommendations for the authors):

      Issue (1) Ideally, CCR4 could be deleted on Foxp3+ cells and some staining on double-positive Rorg+Foxp3+ cells done. On the other hand, whole gene expression analysis of infiltrating Foxp3+ and effector cells could also be helpful. More challenging, it would be important to see whether or not those CCR4-specific Tregs could regulate infiltrating effector cells.

      As the reviewer suggested, single-cell proteomic and transcriptomic approaches would be helpful to reveal the phenotypic heterogeneity and dynamics of aortic leukocytes including Tregs. Also, the use of conditional knockout mice would reveal the precise role of CCR4-expressing Tregs in regulating aortic immune cell infiltration and atherosclerosis.

      Reviewer #2 (Recommendations for the authors):

      Minor Suggestions:

      Issue (1) In supplementary Figure 1, CCR4 expression would be better represented by dot plots rather than histograms.

      We revised Supplementary Figure 1A through 1C.

      Issue (2) The reduction in CD103 expression shown in Figure 2E at 8 weeks should be discussed.

      In Figure 2E, we found that the expression of CD103 in peripheral LN Tregs was slightly lower in 8-week-old Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice than in age-matched Apoe<sup>-/-</sup> mice, while there was no difference in its expression levels between 18-week-old Apoe<sup>-/-</sup> and Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice. In addition, there was no significant difference in the mRNA expression of this molecule in splenic Tregs between 8-week-old Apoe<sup>-/-</sup> and Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice. Based on the minor effect of CCR4 deficiency on CD103 expression in Tregs, reduced CD103 expression in Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice does not seem to be an important change.

      Issue (3) The increased expression of CD86 by DCs should be discussed.

      The upregulated CD86 expression on DCs in Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice might be explained by the data on a Treg-DC coculture experiment showing the impaired cell–cell contacts between CCR4-deficient Tregs and DCs. On the other hand, the expression of another important costimulatory molecule CD80 on DCs was not altered in these mice, which is not consistent with the data on the above coculture experiment. The reason why only CD86 expression on DCs was upregulated in Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice remains unclear.

      Issue (4) In Figures 5F-H, using larger dots would enhance visibility.

      We revised the graphs in Figure 5F-H.

      Issue (5) In Figure 5I, since the data is normalized, a one-sample t-test is more appropriate.

      In accordance with the reviewer’s suggestion, we reconsidered the data analysis. Because the absolute number of Kaede-expressing Tregs accumulated in the aorta differed dramatically among experiments, we were concerned that a statistical analysis of the combined data from multiple experiments might lead to a wrong conclusion. We have decided to show representative data from 3 independent experiments in Figure 5I.
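      For illustration only, the one-sample t-test raised in this issue can be sketched as follows. This is a minimal sketch, assuming each experiment yields one value normalized so that the control group equals 1; the function names and the example values are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def one_sample_t_test(values, reference=1.0, steps=20000):
    """Two-sided one-sample t-test of mean(values) against a fixed reference.

    Returns (t statistic, degrees of freedom, two-sided p-value).
    `steps` must be even because Simpson's rule is used for the integral.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all observations are identical; t is undefined")
    df = n - 1
    t = (mean(values) - reference) / (sd / math.sqrt(n))

    # Integrate the t density from 0 to |t| with Simpson's rule.
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3

    # The two tails hold whatever probability lies outside [-|t|, |t|].
    p_value = max(0.0, 1.0 - 2.0 * half_area)
    return t, df, p_value


# Hypothetical normalized values from three experiments (control = 1).
example = [0.5, 0.6, 0.7]
t_stat, dof, p = one_sample_t_test(example, reference=1.0)
print(f"t = {t_stat:.3f}, df = {dof}, p = {p:.4f}")
```

      The test compares the mean of the normalized values against the fixed reference of 1 instead of against a control group whose values are all 1 by construction and therefore have no variance.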

      Issue (6) On page 11, line 256, the text mentions IL4 and IL10 being detected by cytokine array; however, the figures do not show these cytokines.

      We are afraid that the reviewer might have misunderstood the data. The cytokine levels of IL-4 and IL-10 could not be detected by cytokine array analysis. Accordingly, we carefully revised the text in the manuscript.

      Issue (7) On page 14, lines 326-330, the text should be revised for clarity.

      We revised the text in the manuscript.

      Issue (8) Several data are marked as "not shown"; some of this information is relevant and should be included in the supplementary figures.

      We showed the data on CCL17 and CCL22 expression in peripheral LNs in Supplementary Figure 2.

      Major Suggestions:

      Issue (1) FoxP3 expression should be evaluated post-isolation of CD4<sup>+</sup>CD25<sup>+</sup> T cells, and FoxP3- CD4<sup>+</sup>CD25<sup>+</sup> T cells should be characterized. Tregs could be more effectively isolated using FoxP3eGFP mice.

      After isolation of CD4<sup>+</sup>CD25<sup>+</sup> T cells (the purity was >95%), we examined Foxp3 expression by flow cytometry and found that most of these cells express Foxp3 (Supplementary Figure 10). Therefore, CD4<sup>+</sup>CD25<sup>+</sup> T cells without Foxp3 expression, which are considered contaminated effector T cells, are minor cells and would not substantially affect the results. Nonetheless, the use of Foxp3-eGFP mice would enable us to isolate Tregs more accurately.

      Issue (2) In Figure 3, it would be interesting to evaluate whether there are RORgt+Tbet+ (IL17+IFNg+) cells. These cells would be pathogenic, whereas RORgt+CD73+ cells would be non-pathogenic.

      We analyzed CD4<sup>+</sup> T cells producing both IL-17 and IFN-γ in the peripheral lymphoid tissues of Apoe<sup>-/-</sup> and Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice. We found that this cell population was quite rare and that there was no significant difference in its proportion between the 2 groups, suggesting that this cell population makes at most a minor contribution to the atherosclerosis phenotype.

      Author response image 1.

      Issue (3) Different time points after adoptive cell transfer should be evaluated to confirm reduced migration to the atherosclerotic aorta.

      It would be interesting to evaluate Treg migration to the atherosclerotic aorta at different time points after Treg transfer. However, it seems difficult to accurately evaluate the migration of Tregs at later time points because they would proliferate in the aorta.

      Issue (4) The authors could evaluate whether Ccr4 SNPs correlate with an increased risk of atherosclerosis.

      As the reviewer pointed out, it is important to consider the clinical relevance of our findings. However, there is no evidence supporting that Ccr4 single nucleotide polymorphisms correlate with a higher incidence of atherosclerotic cardiovascular disease.

      Issue (5) The authors could evaluate if the transfer of Apoe<sup>-/-</sup> Tregs rescues early atherosclerosis development in Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice.

      To confirm whether transfer of CCR4-intact Tregs rescues the development of early atherosclerotic lesions in Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice, we injected Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice with saline or Tregs from Apoe<sup>-/-</sup> or Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice and analyzed the aortic root atherosclerotic lesions of recipient Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice. However, we found no significant difference in the aortic sinus plaque area among the 3 groups. We described this result in the results section and included the data in Supplementary Figure 8.

      Reviewer #3 (Recommendations for the authors):

      Analysis of TCD4<sup>+</sup> cell populations in different tissues:

      Issue (1) The description of flow cytometry analysis is incomplete and requires clarification. Please detail the use of controls to ensure correct analysis, including the following: i) cell viability; ii) staining controls to define positive and negative cells; iii) the gating strategy used to identify cell populations in each lymphoid tissue and aorta (please provide them as supplementary figures).

      As we thought that most of the prepared cells would be viable, we did not check their viability. Based on our previous work where various immune cells including Tregs, effector memory T cells, and helper T cell subsets were clearly detected, in this study we performed flow cytometric analysis of these immune cells without preparing negative controls stained with isotype control antibodies. The gating strategy of flow cytometric analysis of various immune cells in peripheral lymphoid tissues was reported in our previous report (J Am Heart Assoc 2024; 13: e031639). We provided the gating strategy of flow cytometric analysis of helper T cells and Tregs in the aorta in Supplementary Figure 9.

      Issue (2) The phenotype/differentiation markers used for analysing T CD4<sup>+</sup> cell subsets differ between lymphoid tissues and aortic lesions; might this influence results? If so, please comment on that.

      As the number of aortic T cells was quite small compared with that in peripheral lymphoid tissues, it seemed difficult to precisely detect aortic T cells including various helper T cell subsets and Tregs by intracellular cytokine staining. Therefore, we decided to analyze these cells by evaluating transcription factors specific for helper T cell subsets. The difference in the markers used for analyzing T cell subsets would not considerably influence the results.

      Issue (3) Considering my observations about the effect of CCR4 deficiency on the T CD4<sup>+</sup> differentiation profile in different tissues, I suggest comparing Th1/Treg and Th17/Treg ratios in all examined tissues. The modulation of the Th17/Th1 balance could shape inflammation.

      The Th1 cell/Treg balance is shifted toward Th1 cell responses in the atherosclerotic aorta of Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice, while this balance would not be altered in the peripheral lymphoid tissues. It remains unclear whether CCR4 deficiency affects the Th17 cell/Treg ratio. We do not think that it is important to investigate the effect of CCR4 deficiency on the balance of Th17 cell/Treg or Th17 cell/Th1 cell because the role of Th17 cell responses in atherosclerosis remains controversial.

      Issue (4) Cell numbers of recovered Treg from para-aortic lymphoid nodes and aortic tissues might not allow Treg functional assays. Analysis by flow cytometry of biomarkers of Treg activation state would be more informative than by quantifying mRNA expression levels. In particular, TGFβ analysis at the mRNA level does not provide much more information about the suppressive activity of Treg, and even at the protein level, the recognition of the active form of this cytokine is required. Analysis of PD1 (for exhausted cell phenotype) and Treg apoptosis along the stages of atherosclerosis could also yield useful information.

      We performed flow cytometric analysis of activation markers CTLA-4 and CD103, cell exhaustion marker PD1, and apoptosis in Tregs in the para-aortic LNs of Apoe<sup>-/-</sup> or Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice, and found no major differences in the expression levels of these molecules or the proportion of apoptotic cells between the 2 groups. We showed these data below.

      Author response image 2.

      Unfortunately, we failed to evaluate the activity of TGF-β in Tregs because an appropriate experimental method for precisely detecting its active form was unavailable.

      Issue (5) Regarding the interpretation of the results, I recommend being precise when drawing conclusions to avoid misunderstanding. A shift in the CD4<sup>+</sup> T cell response in lymphoid tissues might be interpreted as a modulation of the T cell differentiation process, which strongly depends on signals derived from DCs, which were not the focus of this study.

      There are two possible mechanisms for the altered CD4<sup>+</sup> T cell responses in peripheral lymphoid tissues, which include the modulation of their differentiation and proliferation processes. These processes are substantially regulated by DCs whose function could be favorably modulated by CCR4-expressing Tregs as described in the manuscript. Therefore, we think that the interactions between Tregs and DCs are crucial for shifting the CD4<sup>+</sup> T cell responses in peripheral lymphoid tissues, though it remains unclear which process plays a major role in regulating CD4<sup>+</sup> T cell polarization.

      Suppression studies:

      Issue (1) In vitro assays. According to the methodology suppression studies were performed using Treg collected from peripheral lymphoid nodes and spleen, but it is unclear whether these cells were analysed separately or as a pool (this was not clarified in the legend of Figure 5 either). Besides, be precise about which cells were used as antigen-presenting cells in the Treg suppression assay.

      In the in vitro Treg suppression assay, we used Tregs purified from peripheral lymph nodes and spleen as a pool, and we used splenocytes as antigen-presenting cells. We revised the manuscript accordingly.

      Issue (2) Obtaining CD4<sup>+</sup>CD25<sup>+</sup> and CD4<sup>+</sup>CD25<sup>-</sup>. The control of the purity and viability of cell preparations from CCR4 deficient and CCR4 sufficient Apoe<sup>-/-</sup> mice should be included as a supplementary material; these purified cells were used in in vitro suppressive assays and in vivo cell transfer experiments, being relevant information to guarantee results. Since this control was performed by flow cytometry, I wonder whether Foxp3 levels were also checked.

      We included the data on the purity and viability of CD4<sup>+</sup>CD25<sup>+</sup> Tregs and CD4<sup>+</sup>CD25<sup>-</sup> T cells from Apoe<sup>-/-</sup> or Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice in Supplementary Figure 10. After the isolation of CD4<sup>+</sup>CD25<sup>+</sup> T cells, we examined Foxp3 expression by flow cytometry and found that most of these cells express Foxp3.

      Issue (3) For in vitro assays, IL-2, IL-10, and TGFβ measurement in culture supernatants could confirm and provide more information about Treg function.

      As both CD4<sup>+</sup>CD25<sup>+</sup> Tregs and CD4<sup>+</sup>CD25<sup>-</sup> T cells would produce various cytokines in the in vitro Treg suppression assay, it is difficult to determine which cells mainly produce the above cytokines. Therefore, measurement of these cytokines would not provide more information about Treg function.

      Issue (4) It would be interesting to assess whether CCR4-mediated DC-Treg interaction is as important for regulating Th1 activation as it is for Th17 and Th2 activation; this likely requires using different settings to favour each activation profile.

      Based on our findings, we speculate that CCR4 may play an important role in regulating not only Th1 cell responses but also Th2 and Th17 cell responses by maintaining the interactions between Tregs and DCs. However, it may not be meaningful to investigate the effect of CCR4 deficiency on these T cell responses because the roles of Th2 and Th17 cell responses in atherosclerosis remain controversial.

      Issue (5) The authors showed that the presence of Treg decreased CD80 and CD86 surface levels in DCs in vitro, remarking a lower capacity of Treg derived from CCR4-deficient mice (Figure 5B). However, the fact that CD86 on splenic CD11c+MHC-II+ DCs in 8-week-old Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice was significantly higher than in Apoe<sup>-/-</sup> was underestimated (Supplementary Figure 4). This data needs reconsideration as it might indicate an in vivo more permissive activation state of DCs in Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice than in Apoe<sup>-/-</sup> mice, explaining the augmented effector T cell response observed in these mice (Figure 2).

      Our finding of the upregulated CD86 expression on DCs in Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice could be explained by the data on a Treg-DC coculture experiment showing the impaired ability of CCR4-deficient Tregs to downregulate CD80 and CD86 expression on DCs. As the reviewer pointed out, our data may indicate a more permissive activation state of DCs and subsequent augmentation of effector T cell responses in Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice, which may be derived from impaired Treg suppressive function.

      Assays for chemokine levels and influence on T cell activation and traffic:

      Issue (1) Considering the findings described by Döring et al. (reference 24 in the paper), monitoring CCL22, CCL17, and CCL3 levels in the aorta and lymph nodes along atherosclerosis development would help in understanding when and how CCL17/CCL22-CCR4 might influence T cell activation and traffic. I wonder whether these chemokines were assayed by qPCR in lymphoid nodes and aorta from CCR4-deficient and sufficient Apoe<sup>-/-</sup> mice. The authors report that CCR8 (capable also of binding CCL17) was unaltered by CCR4 deficiency in splenic and para-aortic lymph node Tregs from 8- and 18-week-old mice, respectively (Supplementary Figure 5 and 6), although a trend towards a higher level was observed for splenic Treg. It would be informative to evaluate CCR8 Treg levels along with atherosclerosis progress.

      As the mRNA expression levels of chemokines do not necessarily reflect their protein expression levels, we did not analyze the mRNA expression of Ccl17 or Ccl22 by quantitative reverse transcription PCR. Instead, we evaluated the protein expression of CCL17 and CCL22 not only in the aorta but also in the peripheral lymph nodes of 18-week-old wild-type, Apoe<sup>-/-</sup>, and Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice by immunohistochemistry. We found no marked differences in their expression levels in peripheral lymph nodes among these mice and included the data in Supplementary Figure 2.

      As we focused on the role of the CCL17/CCL22–CCR4 axes in atherosclerosis, we did not examine the expression of CCL3, which is not directly related to these axes. The evaluation of the CCR8+ Treg proportion is beyond the scope of this study, though we are interested in how CCR4 deficiency changes this population during atherosclerotic lesion development.

      Issue (2) According to IFNγ and IL-17 expressing TCD4<sup>+</sup> subclasses, Th1 and Th17 cell subset levels increase in the spleen (Figure 3B-D) and para-aortic lymphoid nodes (Figure 4E) in CCR4 absence. A comparison of the CCR4 dependence for the migration of Th17 and Th1 cell subsets to the aorta was not performed in this atherosclerosis model; this study could help to understand the mechanisms associated with the aortic inflammation development.

      To evaluate the migration of Th1 or Th17 cells to the aorta, we need to specifically isolate them from the peripheral lymphoid tissues of Apoe<sup>-/-</sup> or Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice and adoptively transfer them into recipient Apoe<sup>-/-</sup> mice. However, it is impossible to isolate live Th1 or Th17 cells because specific cell surface markers that enable us to separate these cells are unavailable.

      Issue (3) The numbers of Kaede Treg cells detected in the aorta were extremely low in both Apoe<sup>-/-</sup> and Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice (Figure 5I), opening results to question. Besides, the flow cytometry assay used for determining Kaede Treg cells in tissues was not well described. How were cell viability and formation of doublets examined to avoid artefacts? The gating strategy used to ensure a confident analysis of Kaede Tregs, particularly in the aorta, should be included as supplementary material.

      The extremely low number of Kaede-expressing Tregs that migrated to the aorta of Apoe<sup>-/-</sup> and Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice may be due to the small number of transferred Tregs. Alternatively, Tregs may rarely migrate to the aorta under hypercholesterolemic conditions. We did not check the viability or doublets of Kaede-expressing Tregs because we thought that such experimental procedures would not considerably affect the results. We provided the gating strategy of flow cytometric analysis of Kaede-expressing Tregs in peripheral lymphoid tissues and aortas in Supplementary Figure 11.

      Other comments:

      Issue (1) As an alternative for statistical data analysis from independent experiments, two-way ANOVA with Tukey's post hoc (for data normally distributed) or the Mack-Skillings exact test with Conover's post hoc multiple comparison test (for a two-way layout in non-parametric conditions) could improve analysis.

      We performed statistical analysis in Figure 5A according to the reviewer’s suggestion.

      Issue (2) For future work, employing recombinant pseudo-receptor proteins capable of neutralizing chemokines (doi: 10.1016/j.jhep.2021.08.029) might help as an alternative to complete knockout mice.

      We thank the reviewer for giving us the information on an interesting approach as an alternative to CCR4-deficient mice.

    1. eLife Assessment

      This important study investigates how signals from the nervous system can influence the response to different food sources. To demonstrate the role of specific neuronal and intestinal regulators in sensing food quality and modulating digestion, the authors present evidence through a combination of genetic screening, RNA-seq analysis, and functional studies. While the findings shed light on an adaptive strategy to integrate food perception with physiological responses, the evidence presented varies between convincing and incomplete, and additional experiments are needed to more fully support their central hypothesis.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Liu et al have tried to dissect the neural and molecular mechanisms that C. elegans use to avoid digestion of harmful bacterial food. Liu et al show that C. elegans use the ON-OFF state of AWC olfactory neurons to regulate the digestion of harmful gram-positive bacteria S. saprophyticus (SS). The authors show that when C. elegans are fed on SS food, AWC neurons switch to OFF fate which prevents digestion of S. saprophyticus and this helps C. elegans avoid these harmful bacteria. Using genetic and transcriptional analysis as well as making use of previously published findings, Liu et al implicate the p38 MAPK pathway (in particular, NSY-1, the C. elegans homolog of MAPKKK ASK1) and insulin signaling in this process.

      Strengths:

      The authors have used multiple approaches to test the hypothesis that they present in this manuscript.

      Weaknesses:

      Overall, I am not convinced that the authors have provided sufficient evidence to support the various components of their hypothesis. While they present data that loosely align with their hypothesis, they fail to consider alternative explanations and do not use rigorous approaches to strengthen their overall hypothesis. The selective picking of genes from the RNA sequencing data and forcing the data to fit the proposed hypothesis based on previously published findings, without exploring other approaches, indicates a lack of thoroughness and rigor. These critical shortcomings significantly diminish enthusiasm for the manuscript in its totality. In my opinion, this is the biggest weakness in this manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      Using C. elegans as a model, the authors present an interesting story demonstrating a new regulatory connection between olfactory neurons and the digestive system. Mechanistically, they identified key factors (NSY-1, STR-130 et al.) in neurons, as well as critical 'signaling factors' (INS-23, DAF-2) that bridge different cells/tissues to execute the digestive shutdown induced by poor-quality food (Staphylococcus saprophyticus, SS).

      Strengths:

      The conclusions of this manuscript are mostly well supported by the experimental results shown.

      Weaknesses:

      Several issues could be addressed and clarified to strengthen their conclusions.

      (1) The word "olfactory" should be carefully used and checked in this manuscript. Although AWCs are classic olfactory neurons in C. elegans, no data in this manuscript supports the idea that olfactory signals from SS drive the responses in the digestive system. To validate that it is truly olfaction, the authors may want to check the responses of worms (e.g. AWC, digestive shutdown, INS-23 expression) to odors from SS.

      (2) In line 113, what does "once the digestive system is activated" mean? The authors need to provide a clearer statement about 'digestive activation' and 'digestive shutdown'.

      (3) No control data on OP50. This would affect the conclusions generated from Figures 2A, 2B, 2D, 3B, 3C, 3G, 4D-G, 5D-E, 6B-D.

      (4) Do the authors know which factors are released from AWC neurons to drive the digestive shutdown?

    4. Reviewer #3 (Public review):

      Summary:

      The study explores a molecular mechanism by which C. elegans detects low-quality food through neuron-digestive crosstalk, offering new insights into food quality control systems. Liu and colleagues demonstrated that NSY-1, expressed in AWC neurons, is a key regulator for sensing Staphylococcus saprophyticus (SS), inducing avoidance behavior and shutting down the digestive system via intestinal BCF-1. They further revealed that INS-23, an insulin peptide, interacts with the DAF-2 receptor in the gut to modulate SS digestion. The study uncovers a food quality control system connecting neural and intestinal responses, enabling C. elegans to adapt to environmental challenges.

      Strengths:

      The study employs a genetic screening approach to identify nsy-1 as a critical regulator in detecting food quality and initiating adaptive responses in C. elegans. The use of RNA-seq analysis is particularly noteworthy, as it reveals distinct regulatory pathways involved in food sensing (Figure 4) and digestion of Staphylococcus saprophyticus (Figure 5). The strategic application of both positive and negative data mining enhances the depth of analysis. Importantly, the discovery that C. elegans halts digestion in response to harmful food and employs avoidance behavior highlights a physiological adaptation mechanism.

      Weaknesses:

      Major points:

      (1) While NSY-1 positively regulates str-130 expression in AWC neurons and is critical for SS avoidance and survival, the authors should examine whether similar phenotypes are observed in str-130 mutants.

      (2) NSY-1 promotes the AWC-OFF state through str-130, inhibiting SS digestion. The authors should investigate whether STR-130 in AWC neurons regulates bcf-1 expression levels in the intestine.

      (3) The current results rely on str-2 expression levels to indicate the AWC state. Ablating AWC neurons and testing the effects on digestion would provide stronger evidence for their role in digestive regulation.

      (4) The claim that NSY-1 inhibits INS-23 and that INS-23 interacts with DAF-2 to regulate bcf-1 expression (Lines 339-340) requires further validation. Neuron-specific disruption of INS-23 and gut-specific rescue of DAF-2 should be tested.

      (5) Figure Reference Errors: Lines 296-297 mention Figure 6E, which does not exist in the main text. This appears to refer to Figure 5E, which has not been described.

    1. eLife Assessment

      This important study examines the effects of acute social stress on brain function, focusing on dynamic shifts in large-scale networks such as the salience and default mode networks. It highlights a robust association between stress-induced changes in salience network activation and stress reactivity in daily life, although evidence linking brain function changes following acute stress to real-life stress is incomplete. The findings are significant for stress biology research and could influence future studies on stress responses.

    2. Reviewer #1 (Public review):

      Summary:

      In their paper, Tutunji et al aim to investigate the dynamic effects of stress on activity of different brain networks (salience network, executive network, and default mode network). Crucially they differentiate between rapid (<1 h) and late (>1 h) effects of stress. Lastly, they connect acute changes in brain activity with inter-individual differences in stress reactivity in real-life assessed using EMA.

      They first show the expected dynamics in stress-induced brain activity with a transient increase in salience network activity and a decrease in default mode network activity although in contrast to expectations, this did not disappear in the late phase. Notably, the increase in salience network activity was associated with a 'resilience index' derived from EMA that captures whether an individual responds with more or less reduction in positive affect than expected based on the number of above average stress events.

      Linking acute stress to long-term affective stress reactivity is a crucial step to better understand how adaptive or maladaptive stress responses play out in the long term and how they might be related to mental health problems.

      Strengths:

      The link of the acute stress response to stress reactivity in daily life is highly relevant and a major strength of the paper. Moreover, the design of the EMA component assessing a week with low stress and one with high stress (exam week) in all participants and thus including a naturalistic manipulation enables a quantification of stress reactivity that captures 'real life'.

      The authors do not only quantify the magnitude of the acute stress response but take into account an early as well as late response to disentangle the dynamic nature of the stress response. In that way, it is possible to establish which parts of the stress response are relevant for the affective response.

      In addition to reporting changes in network activation, the authors also report behavioral outcomes of the tasks which is crucial to evaluate the meaning and relevance of the neural outcomes.

      Weaknesses:

      Although the authors assess multiple physiological outcomes to the stress task, only the cortisol response is analyzed with regard to its association with the stress-induced changes in network activity. Considering that it is mainly the salience network that shows an increase and this in the early phase that is characterized by the noradrenaline and not so much the cortisol response, an association with a marker of the NA response would be interesting.

      To evaluate the association of the acute stress response with stress reactivity in real life more conclusively it would be interesting to see whether and how the affective response to the acute stress is related to stress reactivity in real life.

      In the introduction, the authors hypothesize that all networks show distinct activation patterns during the stress response and expect all of them to be associated with the stress reactivity during EMA. However, no correction for multiple comparisons across the many tests (each network at two phases) is reported.
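
      The point about multiple comparisons can be illustrated with a minimal sketch of a Holm-Bonferroni correction applied across three networks at two phases (six tests). The p-values are hypothetical placeholders, not values from the paper, and Holm is only one of several reasonable corrections the authors could choose.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted p-values.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values, one per network-by-phase test (illustrative only).
tests = {
    ("SN", "early"): 0.004,
    ("SN", "late"): 0.20,
    ("ECN", "early"): 0.03,
    ("ECN", "late"): 0.60,
    ("DMN", "early"): 0.012,
    ("DMN", "late"): 0.045,
}

adjusted = dict(zip(tests, holm_adjust(list(tests.values()))))
for key, p_adj in adjusted.items():
    label = "significant" if p_adj < 0.05 else "n.s."
    print(key, round(p_adj, 3), label)
```

      With these placeholder values, four tests fall below 0.05 before correction but only one does afterwards, which is why the absence of a reported correction matters for interpreting the network-by-phase associations.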

      All stress-induced changes in activity are assessed by using other tasks since it is not trivial to measure changes in activation of specific regions without comparing different conditions of a task. Nonetheless, with the chosen approach it is not completely clear whether stress only modulates brain responses to other tasks or changes activation within those networks independently of any other tasks. Moreover, one of the tasks did not elicit the expected activation contrast and it is unclear whether this affects stress-effects.

      Some of the less central results that are discussed in the paper such as the association of the real-life stress reactivity measure with neuroticism, the sex-effect of the cortisol response or the mediation and moderation models of the stress-induced changes in network activity and performance in the tasks seem slightly overinterpreted considering that they are either not quite significant or not hypothesized and thus it is not clear why for example once a mediation and in another outcome a moderation model was chosen.

    3. Reviewer #2 (Public review):

      Summary:

      This study aimed to investigate changes in neural responses over time after acute stress and their association with real-life stress. To this end, functional MRI data was collected from 3 tasks (Oddball, 2-back, Associative retrieval) early and late following stress and control conditions. Emotional ratings during a stressful week before an exam and a non-stressful week without an exam were used to index real-world stress. In total, data from 70 individuals were used for the analyses in the paper. Results showed increased oddball-related activation early after stress whereas activation to the associative retrieval was reduced across early and late trials following stress compared with control. Brain activation during the oddball task after stress contrasted against control correlated with the index used to measure stress in the real-world. This is a very ambitious study and the finding that stress has opposite effects on the oddball and the associative retrieval tasks is new. However, I am not convinced that brain responses are correlated with real-world stress from the results presented in the paper. I also have several other concerns listed below.

      Strengths:

      The study uses a unique design based on hypotheses firmly grounded in theories of stress-related brain function. Large amounts of data are collected for all of the 70 participants included in the analyses and the hypotheses tested using paired tests have strong statistical power. Data collection methods are sound, aiming to reduce stress induced by being in the scanner environment for the first time and to reduce variation in cortisol due to circadian rhythm.

      Weaknesses:

      An important argument in the paper is that neural responses associated with stress in the lab correspond to stress in real life. This conclusion is based on a single correlation analysis. This is weak evidence because the correlation is based on 70 individuals and may be driven by outliers. In fact, the correlation between the difference in stress-related SN activation (Stress-Control) and real life stress residual is likely to be driven by outliers. In fig 5b, there are 3 persons with SN values of around 2, which is twice as much as the fourth highest value. There is also 1 person with a Real life stress residual of -3 or -4, which is three to four times as much as the person with the second lowest value. These 4 outliers should be removed before calculating the correlation coefficient. Also, no power analysis is presented in the paper showing what effect size is needed for significant results given a sample size of 70.
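
      The power question raised above can be made concrete with a rough sketch. It estimates the smallest correlation detectable with a given power at a given sample size, using the Fisher z approximation. The choices of alpha = 0.05 (two-sided) and 80% power are assumptions made for illustration, not values taken from the paper.

```python
from math import sqrt, tanh
from statistics import NormalDist


def min_detectable_r(n, alpha=0.05, power=0.80):
    """Smallest |r| detectable at the given power (two-sided, Fisher z approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    # Under the Fisher transform, atanh(r) has standard error 1 / sqrt(n - 3).
    return tanh((z_alpha + z_power) / sqrt(n - 3))


r_needed = min_detectable_r(70)
print(f"n = 70: minimum detectable |r| is about {r_needed:.2f}")
```

      Under these assumptions, a sample of 70 is adequately powered only for correlations of roughly 0.33 or larger, so a weaker true association, or one carried by a handful of extreme points, would not be reliably detected.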

      It is not clear why the activation maps from the tasks performed in the scanner are referred to as the SN, ECN, and DMN. They are discussed as if they were resting state networks. They are however not resting state networks because they are the results of contrasting two task conditions to each other and not the results from correlating BOLD time-series data from different regions within subjects. Even though masks corresponding to SN, ECN, and DMN are used to calculate means of all voxels, I think these contrasts should be referred to as the tasks that were used to evoke them. It becomes misleading to call them networks which usually refers to nodes and edges in fMRI studies. The first scan was a resting state scan, but these data are not presented in the paper.

      Introduction<br /> In the introduction it is said that there are genomically driven effects of cortisol 1 to 2 hours after stress. This is repeated in the discussion: "[the late stress phase] is thought to be dominated by genomically driven effects of glucocorticoids". (There is no reference to this statement however.) This idea, that gene expression should only be regulated by corticosteroids following stress seems unrealistic. The increase in cortisol was only around 60% from baseline in the current study which seems to be similar to other studies. This means that the baseline cortisol level is far from zero. Therefore, effects of cortisol on gene expression must occur all the time and be tightly regulated by circadian clocks. To propose that genomically driven effects of cortisol only exist 1 to 2 hours following stress is therefore too simplistic.

      In the last paragraph, it says that n=83. However, the final sample consists of 70 people. Correct this number.

      Methods<br /> The EMA data analysis is difficult to understand. Why are the residuals used instead of means for example? I could not understand how the residual values used in the analysis should be interpreted from the way this section was written. Therefore, I cannot judge whether the index is valid or reliable. Using mean values is more common than using residuals when investigating individual differences in stress responses. The use of residuals needs justification and clarification. The results from an analysis using mean values should also be reported.
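
      As a point of reference for the residual question, the sketch below shows one way a residual-based index could be computed: regress the change in positive affect on the number of stress events across participants, then take each participant's deviation from the fitted line. This is an assumed reading of the approach, not the authors' confirmed procedure, and all values and variable names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical per-participant values (illustrative only):
# number of stress events, and change in positive affect (stress week minus control week).
stress_events = [2, 4, 6, 8, 10]
affect_change = [-0.5, -1.5, -1.0, -2.5, -2.0]

slope, intercept = fit_line(stress_events, affect_change)

# Residual > 0: smaller drop in positive affect than expected for that stress load.
# Residual < 0: larger drop than expected.
residuals = [y - (intercept + slope * x) for x, y in zip(stress_events, affect_change)]
for events, res in zip(stress_events, residuals):
    print(events, round(res, 2))
```

      Unlike a simple mean, such a residual is adjusted for how many stress events each person reported, which is presumably the motivation for using it. Reporting both would let readers see whether the conclusions depend on that adjustment.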

      How was AUCi calculated? What software was used to calculate AUCi?
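
      For context, a widely used definition of AUCi (area under the curve with respect to increase) is the trapezoidal area under the samples minus the area under the baseline value. The sketch below implements that definition; whether the authors used it is not stated, and the sample times and cortisol values are hypothetical.

```python
def auc_ground(values, times):
    """Trapezoidal area under the curve with respect to ground (zero)."""
    return sum(
        (values[i] + values[i + 1]) / 2 * (times[i + 1] - times[i])
        for i in range(len(values) - 1)
    )


def auc_increase(values, times):
    """Area with respect to increase: AUC relative to the first (baseline) sample."""
    return auc_ground(values, times) - values[0] * (times[-1] - times[0])


# Hypothetical cortisol samples (nmol/L) at minutes relative to stress onset.
times = [0, 15, 30, 60]
cortisol = [10.0, 14.0, 16.0, 12.0]

print("AUCg:", auc_ground(cortisol, times))
print("AUCi:", auc_increase(cortisol, times))
```

      Because AUCi depends on which sample is treated as baseline and on the spacing of the samples, stating the formula and the software used would make the cortisol analysis reproducible.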

      How was the mediation analysis performed? The only information I found was: "We additionally ran separate models with an interaction term modelled for neural activity in the targeted ROI's to examine the relationship between task performance and neural responses, with random slopes and intercepts also modelled for ROI activity." This is not how mediation analyses are done conventionally. It is common to use structural equation modelling or a series of regression analyses. What is meant by separate models? Was a reduced model compared to a full model with an interaction term? In this case, this is not a mediation analysis. I think the term moderation is better to describe this analysis.

    4. Reviewer #3 (Public review):

      This is a very interesting study that aims to examine the effect of stress induction across about two hours on physiological, behavioral, and neural measures in several brain areas. This aim is of importance for the study of stress response and recovery and their neural bases. There are several strengths to the design, including a within-subject design, adequate sample size, and multiple levels of assessment (including lab-based and real-life), and the authors should really be commended for that. The results indicate an acute cortisol response following stress induction, although HR data show that the manipulation may have been effective only among those who did the stress scan first. Behaviorally, stress induction resulted in effects on one of the tasks. Neurally, temporal changes in response were observed in what is referred to as SN and DMN networks, and associations with real-life stress were evident for SN during early stress response. Together, evidence emerged for some temporal changes in stress response on neural function and its associations with behavior and real-life stress response as indicated by self-report EMA.

      These findings, both positive and null, provide important insight to the field, and the authors should be praised for that. At the same time, it is important to emphasize that some aspects or findings complicate interpretation and limit the extent of inference, that many places in the manuscript could benefit from clarification, and that more discussion should be given to the null findings.

      All in all, given the importance of the questions and the strengths of the design, this study could provide a major contribution to future research. But, to accurately and optimally guide research, it is important to accurately describe and interpret both what was tested and found, and what was not found. Some more specific points are noted below, where improvements could be made to facilitate extraction of insight by the reader, and thus increase the impact of the study on the field.

    1. eLife Assessment

      This study uses state-of-the-art methods to label endogenous dopamine receptors in a subset of Drosophila mushroom body neuronal types. The authors report that Dop1R1 and Dop2R receptors, which have opposing effects on intracellular cAMP, are present in axon termini of Kenyon cells, as well as those of two classes of dopaminergic neurons that innervate the mushroom body, indicative of autocrine modulation by dopaminergic neurons. Additional experiments showing opposing effects of starvation on Dop1R1 and Dop2R levels in mushroom body neurons are consistent with a role for dopamine receptor levels increasing the efficiency of learned food-odour associations in starved flies. Supported by solid data, this is an important contribution to the field.

    2. Reviewer #1 (Public review):

      Summary:

      This is an important and interesting study that uses the split-GFP approach. Localization of receptors and correlating them to function is important in understanding the circuit basis of behavior.

      Strengths:

      The split-GFP approach allows visualization of subcellular enrichment of dopamine receptors in the plasma membrane of GAL4-expressing neurons, allowing for a high level of specificity.

      The authors resolve the presynaptic localization of Dop1R1 and Dop2R in "giant" Drosophila neurons differentiated from cytokinesis-arrested neuroblasts in culture, as it is not clear in the lobes and calyx.

      The starvation-induced opposite responses of dopamine receptor expression in the PPL1 and PAM DANs provide key insights into models of appetitive learning.<br /> The starvation-induced increase in D2R allows for increased negative feedback, which the authors test in D2R knockout flies, where appetitive memory is diminished.<br /> This dual autoreceptor system is an attractive model for how the amplitude and kinetics of dopamine release can be fine-tuned and controlled depending on the cellular function. The paper presents a good methodology for doing so and a good system in which the dynamics of dopamine release can be tested at the level of behavior.

      Weaknesses:

      Key weaknesses have been resolved: 

      1) Receptor expression is consistent between times of day, and the authors picked two time points. The authors mention that the states of animals that could affect LI (e.g. feeding state and anesthesia for sorting, see methods) were kept constant. These data and discussion are helpful. <br /> 2) The giant fiber system is argued to be a great model, and the authors have added additional references. I am not deeply familiar with these references or the giant fiber system, so I am not completely clear, but the argument seems reasonable. <br /> 3) The revised manuscript shows data in the γ KCs (Figure 4C, Figure 5 - figure supplement 1) in addition to α/β KCs, so it appears there is consistency between lobes. <br /> 4) The new data for Dop1R1 and Dop2R in MBON-γ1pedc>αβ help with thinking about dopamine receptor co-localization. It would be a herculean task to do this for all the regions, but the result still leaves room for different scenarios.

      The paper's discussion has been expanded to account for different possibilities, which will help readers get a more complete picture. I appreciate the review efforts and detailed response to reviewer comments.

    3. Reviewer #2 (Public review):

      Summary:

      Hiramatsu et al. investigated how cognate neurotransmitter receptors with antagonizing downstream effects localize within neurons when co-expressed. They focus on mapping the dopaminergic Dop1R1 and Dop2R receptors, corresponding to the mammalian D1- and D2-like dopamine receptors, which have opposing effects on intracellular cAMP levels, in neurons of the Drosophila mushroom body (MB). To visualize specific receptors in single neuron types within the crowded MB neuropil, the authors use existing dopamine receptor alleles tagged with 7 copies of split GFP to target the reconstitution of GFP tags specifically in the neurons of interest, providing a readout of receptor localization.

      The authors demonstrate that both Dop1R1 and Dop2R are enriched, to differing degrees, in the axonal compartments of Kenyon cells cholinergic presynaptic inputs and in different dopamine neurons (DANs) that project axons to the MB. Co-localization studies of dopamine receptors with the presynaptic marker Brp suggest that Dop1R1, and to a greater extent Dop2R, localize near release sites. This pattern in DANs suggests Dop1R1 and Dop2R serve as dual-feedback autoreceptors. Finally, they provide evidence that the balance of Dop1R1 and Dop2R in the axons of two different DAN populations is differentially modulated by starvation, which plays a role in regulating appetitive behaviors.

      In their revised manuscript, Hiramatsu et al. revisited the localization and functional integrity of Dop1R1 and Dop2R within the Drosophila mushroom body. This revision strengthens their claims with new high-resolution imaging data and additional behavioral assays, supporting the functional integrity of 7X split GFP-tagged receptors and their distinct localizations within neural circuits.

      The revised manuscript by Hiramatsu et al. demonstrates substantial improvements in experimental design and data presentation, effectively addressing concerns raised during the initial review. The addition of advanced imaging techniques and behavioral data confirms the functionality of tagged receptors, while providing deeper insights into their spatial and functional dynamics within neural circuits modulating responses to environmental changes like starvation. This study makes an important contribution to neuroscience, enhancing our understanding of dopamine receptor distribution in circuits underlying learning and memory.

      Strengths:

      The authors use reconstitution of GFP fluorescence of split GFP tags integrated at the endogenous locus of dopamine receptors, providing a precise readout of receptor localization. This method preserves endogenous transcriptional and post-transcriptional regulation, a critical feature for protein localization studies.

      The choice of the Drosophila mushroom body as a model system is excellent, as it is well-studied, its connectome is carefully reconstructed, and its role in behaviors and associative memory enables linking receptor localization patterns to circuit function and behavior. This approach allows the authors to demonstrate that antagonizing dopamine receptors can act as autoreceptors within the axonal compartments of MB-innervating DANs. Moreover, they show that starvation differentially modulates the balance of these receptors in distinct DANs, highlighting the role of this regulation in circuit function and behavior.

      The incorporation of higher-resolution Airyscan microscopy and functional assays in the revision provides evidence that tagged receptors retain functionality and predominantly localize at presynaptic sites within Kenyon cells and DANs. These findings support the dual autoreceptor feedback model proposed.

      Weaknesses:

      While the revision significantly strengthens the manuscript, the absence of specific antibodies against these receptors remains a limitation. This is understandable given the challenges of generating antibodies against such proteins. However, the use of more direct validation methods, such as specific antibodies (if available), and employing higher-resolution techniques like expansion microscopy, could further validate and enhance the robustness of the findings.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study uses state-of-the-art methods to label endogenous dopamine receptors in a subset of Drosophila mushroom body neuronal types. The authors report that DopR1 and Dop2R receptors, which have opposing effects on intracellular cAMP, are present in axon termini of Kenyon cells, as well as those of two classes of dopaminergic neurons that innervate the mushroom body, indicative of autocrine modulation by dopaminergic neurons. Additional experiments showing opposing effects of starvation on DopR1 and DopR2 levels in mushroom body neurons are consistent with a role for dopamine receptor levels increasing the efficiency of learned food-odour associations in starved flies. Supported by solid data, this is a valuable contribution to the field.

      We thank the editors for the assessment, but request that “DopR2” be changed to “Dop2R”. The dopamine receptors in Drosophila have confusing names, but the ones we characterized in this study are called Dop1R1 (according to FlyBase; aka DopR1, dDA1, Dumb) and Dop2R (ibid; aka Dd2R). DopR2 is the name of a different dopamine receptor.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is an important and interesting study that uses the split-GFP approach. Localization of receptors and correlating them to function is important in understanding the circuit basis of behavior.

      Strengths:

      The split-GFP approach allows visualization of subcellular enrichment of dopamine receptors in the plasma membrane of GAL4-expressing neurons allowing for a high level of specificity.

      The authors resolve the presynaptic localization of DopR1 and Dop2R, in "giant" Drosophila neurons differentiated from cytokinesis-arrested neuroblasts in culture as it is not clear in the lobes and calyx.

      Starvation-induced opposite responses of dopamine receptor expression in the PPL1 and PAM DANs provide key insights into models of appetitive learning.

      Starvation-induced increase in D2R allows for increased negative feedback that the authors test in D2R knockout flies where appetitive memory is diminished.

      This dual autoreceptor system is an attractive model for how amplitude and kinetics of dopamine release can be fine-tuned and controlled depending on the cellular function and this paper presents a good methodology to do it and a good system where the dynamics of dopamine release can be tested at the level of behavior.

      Weaknesses:

      LI measurements of Kenyon cells and lobes indicate that Dop2R was approximately twice as enriched in the lobe as the average density across the whole neuron, while the lobe enrichment of Dop1R1 was about 1.5 times the average. Are these levels consistent across different times of the day and states of the animal? How were these conditions controlled, and how sensitive is receptor expression to the time of day of dissection, staining, etc.?

      To answer this question, we repeated the experiment in two replicates at different times of day and confirmed that the receptor localization was consistent (Figure 3 – figure supplement 1); LI measurements showed that Dop2R is enriched more in the lobe and less in the calyx compared to Dop1R1 (Figure 3D). The states of animals that could affect LI (e.g. feeding state and anesthesia for sorting, see methods) were kept constant. 

      The authors assume without discussion as to why and how presynaptic enrichment of these receptors is similar in giant neurons and MB.

      In the revision, we added a short summary to recapitulate that the giant neurons exhibit many characteristics of mature neurons (Lines #152-156): "Importantly, these giant neurons exhibit characteristics of mature neurons, including firing patterns (Wu et al., 1990; Yao & Wu, 2001; Zhao & Wu, 1997) and acetylcholine release (Yao et al., 2000), both of which are regulated by cAMP and CaMKII signaling (Yao et al., 2000; Yao & Wu, 2001; Zhao & Wu, 1997)." In addition, we found punctate Brp accumulations localized to the axon terminals of the giant neurons (former Figure 4D and 4E). Therefore, the giant neuron serves as an excellent model to study the presynaptic localization of dopamine receptors in isolated large cells.

      Figures 1-3 show the extensive expression of receptors in alpha and beta lobes, while Figure 5 focusses on PAM and localization in γ and β' projections of PAM, leading to the conclusion that presynaptic dopamine neurons express these and have feedback regulation. Consistency between lobes or discussion of these differences is important to consider.

      In the revised manuscript, we show data in the γ KCs (Figure 4C, Figure 5 - figure supplement 1) in addition to α/β KCs, and demonstrate the consistent synaptic localization of Dop1R1 and Dop2R as in α/β KCs (Figure 4B and 5A). 

      Receptor expression in any learning-related MBONs is not discussed, and it would be intriguing as how receptors are organized in those cells. Given that these PAMs input to both KCs and MBONs these will have to work in some coordination.

      The subcellular localization of dopamine receptors in MBONs indeed provides important insights into the site of dopaminergic signaling in these neurons (Takemura et al., 2017; Pavlowsky et al., 2018; Pribbenow et al., 2022). Therefore, we added new data for Dop1R1 and Dop2R in MBON-γ1pedc>αβ (Figure 6). Interestingly, these receptors are localized to the dendritic projection in the γ1 compartment as well as presynaptic boutons (Figure 6).

      Although authors use the D2R enhancement post starvation to show that knocking down receptors eliminated appetitive memory, the knocking out is affecting multiple neurons within this circuit including PAMs and KCs. How does that account for the observed effect? Are those not important for appetitive learning? 

      In the appetitive memory experiment (Figure 9C), we knocked down Dop2R only in the select neurons of the PPL1 cluster, and this manipulation does not directly affect Dop2R expression in PAMs and KCs.

      Starvation-induced enhancement of Dop2R expression in the PPL1 neurons (Figure 8F) would attenuate their outputs and therefore disinhibit expression of appetitive memory in starved flies (Krashes et al., 2009). Consistently, Dop2R knock-down in PPL1 impaired appetitive memory in starved flies (Figure 9C). We revised the corresponding text to make this point clearer (Lines #224-227).

      The evidence for fine-tuning is completely based on receptor expression and one behavioral outcome which could result from many possibilities. It is not clear if this fine-tuning and presynaptic feedback regulation-based dopamine release is a clear possibility. Alternate hypotheses and outcomes could be considered in the model as it is not completely substantiated by data at least as presented.

      The reviewer’s concern is valid, and the presynaptic dopamine tuning by autoreceptors may need more experimental support. We therefore additionally discussed another possibility (Lines #289-291): “Alternatively, these presynaptic receptors could potentially receive extrasynaptic dopamine released from other DANs. Therefore, the autoreceptor functions need to be experimentally clarified by manipulating the receptor expression in DANs.”

      Reviewer #2 (Public Review):

      Summary:

      Hiramatsu et al. investigated how cognate neurotransmitter receptors with antagonizing downstream effects localize within neurons when co-expressed. They focus on mapping the localization of the dopaminergic Dop1R1 and Dop2R receptors, which correspond to the mammalian D1- and D2-like dopamine receptors, which have opposing effects on intracellular cAMP levels, in neurons of the Drosophila mushroom body (MB). To visualize specific receptors in single neuron types within the crowded MB neuropil, the authors use existing dopamine receptor alleles tagged with 7 copies of split GFP to target reconstitution of GFP tags only in the neurons of interest as a read-out of receptor localization. The authors show that both Dop1R1 and Dop2R, with differing degrees, are enriched in axonal compartments of both the Kenyon Cells cholinergic presynaptic inputs and in different dopamine neurons (DANs), which project axons to the MB. Co-localization studies of dopamine receptors with the presynaptic marker Brp suggest that Dop1R1 and, to a larger extent Dop2R, localize in the proximity of release sites. This localization pattern in DANs suggests that Dop1R1 and Dop2R work in dual-feedback regulation as autoreceptors. Finally, they provide evidence that the balance of Dop1R1 and Dop2R in the axons of two different DAN populations is differentially modulated by starvation and that this regulation plays a role in regulating appetitive behaviors.

      Strengths:

      The authors use reconstitution of GFP fluorescence of split GFP tags knocked into the endogenous locus at the C-terminus of the dopamine receptors as a readout of dopamine receptor localization. This elegant approach preserves the endogenous transcriptional and post-transcriptional regulation of the receptor, which is essential for studies of protein localization.

      The study focuses on mapping the localization of dopamine receptors in neurons of the mushroom body. This is an excellent choice of system to address the question posed in this study, as the neurons are well-studied, and their connections are carefully reconstructed in the mushroom body connectome. Furthermore, the role of this circuit in different behaviors and associative memory permits the linking of patterns of receptor localization to circuit function and resulting behavior. Because of these features, the authors can provide evidence that two antagonizing dopamine receptors can act as autoreceptors within the axonal compartment of MB innervating DANs. The differential regulation of the balance of the two receptors under starvation in two distinct DAN innervations provides evidence of the role that regulation of this balance can play in circuit function and behavioral output.

      Weaknesses:

      The approach of using endogenously tagged alleles to study localization is a strength of this study, but the authors do not provide sufficient evidence that the insertion of 7 copies of split GFP at the C terminus of the dopamine receptors does not interfere with the endogenous localization pattern or function. Both sets of tagged alleles (1X Venus and 7X split GFP tagged) were previously reported (Kondo et al., 2020), but only the 1X Venus tagged alleles were further functionally validated in assays of olfactory appetitive memory. Despite the smaller size of the 7X split-GFP array tag knocked into the same location as the 1X Venus tag, the reconstitution of 7 copies of GFP at the C terminus of the dopamine receptor might substantially increase the molecular bulk at this site, potentially impeding the function of the receptor more significantly than the smaller, single Venus tag. The data presented by Kondo et al. (2020) are insufficient to conclude that the two alleles are equivalent.

      In the revision, we validated the function of these engineered receptors by a new set of olfactory learning experiments. Both these receptors in KCs were shown to be required for aversive memory (Kim et al., 2007, Scholz-Kornehl et al., 2016). As in the anatomical experiments, we induced GFP<sub>1-10</sub> expression in KCs of the flies homozygous for 7xGFP<sub>11</sub>-tagged receptors using MB-Switch and 3 days of RU486 feeding. We confirmed that STM performance of these flies was not significantly different from the control (Figure 2 – figure supplement 1). Thus, these fusion receptors are functional.

      The authors' conclusion that the receptors localize to presynaptic sites is weak. The analysis of the colocalization of the active zone marker Brp whole-brain staining with dopamine receptors labeled in specific neurons is insufficient to conclude that the receptors are localized at presynaptic sites. Given the highly crowded neuropil environment, the data cannot differentiate between the receptor localization postsynaptic to a dopamine release site or at a presynaptic site within the same neuron. The known distribution of presynaptic sites within the neurons analyzed in the study provides evidence that the receptors are enriched in axonal compartments, but co-labeling of presynaptic sites and receptors in the same neuron or super-resolution methods are needed to provide evidence of receptor localization at active zones.  The data presented in Figures 5K-5L provides compelling evidence that the receptors localize to neuronal varicosities in DANs where the receptors could play a role as autoreceptors.

      Given the highly crowded environment of the mushroom body neuropil, the analysis of dopamine receptor localization in Kenyon cells is not conclusive. The data is sufficient to conclude that the receptors are preferentially localizing to the axonal compartment of Kenyon cells, but co-localization with brain-wide Brp active zone immunostaining is not sufficient to determine if the receptor localizes juxtaposed to dopaminergic release sites, in proximity of release sites in Kenyon cells, or both.

      To better resolve the microcircuits of KCs, we triple-labeled the plasma membrane and DAR::rGFP in KCs, and Brp, and examined their localizations with high-resolution imaging with Airyscan. This strategy revealed the receptor clusters associated with Brp accumulation within KCs (Figure 4). To further verify the association of DARs and active zones within KCs, we co-expressed Brp<sup>short</sup>::mStraw and GFP<sub>1-10</sub> and confirmed their colocalization (Figure 5A), suggesting presynaptic localization of DARs in KCs. With these additional characterizations, we now discuss the significance of receptors at the presynaptic sites of KCs.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This is an important and interesting study that uses the split-GFP approach. Localization of receptors and correlating them to function is important in understanding the circuit basis of behavior.

      For Figure 1, the authors show PAM, PPL1 neurons, and the ellipsoid body as a validation of their tools (Dop1R1-T2A-GAL4 and Dop2R-T2A-GAL4) and the idea that these receptors are colocalized. However, it appears that the technique was applied to the whole brain, so it would be great to see the whole brain to understand how much labelling is specific and how much is stochastic. Methods could include how dissection conditions were controlled and how sensitive receptor expression is to the time of day of dissection, staining, etc.

      The expression patterns of the receptor T2A-GAL4 lines (Figure 1A and 1B) are consistent in the multiple whole brains (Kondo et al., 2020, Author response image 1).

      Author response image 1.

      The significance of the expression of these two receptors in an active zone is not clearly discussed and presynaptic localization is not elaborated on. Would something like expansion microscopy be useful in resolving this? It would be important to discuss that as giant neurons in culture don't replicate many aspects of the MB system.

      In the revised manuscript, we elaborated on the discussion of the function of the two antagonizing receptors at the AZ (Lines #226-275).

      Does MB-GeneSwitch > GFP1-1 reliably express in gamma lobes? Most of the figures show alpha/beta lobes.

      Yes. MB-GeneSwitch is also expressed in γ KCs, but weakly. 12 hours of RU486 feeding, which we did in the previous experiments, was insufficient to induce GFP reconstitution in the γ KCs. By extending the time of transgene induction, we visualized expression of Dop1R1 and Dop2R more clearly in γ KCs. Their localization is similar to that in the α/β KCs (Figure 4C, Figure 5 - figure supplement 1).

      Figure 6, y-axis says protein level. At first, I thought it was related to starvation so maybe authors can be more specific as the protein level doesn't indicate any aspect of starvation.

      We appreciate this comment, and the labels on the y-axis have now been changed to “rGFP levels” (Figure 8C and 8F, Figure 8 - figure supplement 1B, 1D and 1F).

      Reviewer #2 (Recommendations For The Authors):

      Title:

      The title of the manuscript focuses on the tagging of the receptors and their synaptic enrichment.

      Given that the alleles used in the study were generated in a previously published study (Kondo et al, 2020), which describes the receptor tagging and that the data currently provided is insufficient to conclude that the receptors are localizing to synapses, the title should be changed to reflect the focus on localizing antagonistic cognate neurotransmitter receptors in the same neuron and their putative role as autoreceptors in DANs.

      Following this advice, we removed the methodology from the title and revised it to “Synaptic enrichment and dynamic regulation of the two opposing dopamine receptors within the same neurons”.

      Minor issues with text and figures:

      Figure 1

      A conclusion from Figure 1 is that the two receptors are co-expressed in Kenyon cells. Please provide panels equivalent to the ones shown in D-G, with Kenyon cells cell bodies, or mark these cells in the existing panels, if present. Line 111 refers to panel 1D as the Kenyon cells panel, which is currently a PAM panel.

      We added images for coexpression of these receptors in the cell bodies of KCs (Figure 1 - figure supplement 1) and revised the text accordingly (Lines #89-90).

      Given that most of the study centers on visualizing receptor localization, it would benefit the reader to include labels in Figure 1 that help understand that these panels reflect expression patterns rather than receptor localization. For instance, rCD2::GFP could be indicated in the Dop1R1-LexA panels.

      As suggested, labels were added to indicate the UAS and lexAop markers (Figure 1D, 1E, 1G-1I and Figure 1 – figure supplement 1).

      Given that panels D-E focus on the cell bodies of the neurons, it could be beneficial for the reader to present the ellipsoid body neurons using a similar view that only shows the cell bodies. Similarly, one could just show the glial cell bodies.

      We now show the cell bodies of ring neurons (Figure 1G) and ensheathing glia (Figure 1I).

      For panel 1E, please indicate the subset of PPL1 neurons that both expressed Dop1R1 and Dop2R, as indicated in the text, as it is currently unclear from the image.

      Dop1R1-T2A-LexA was barely detected in all PPL1 (Figure 1E). We corrected the confusing text (Lines #95-96).

      Figure 2

      The cartoon of the cell-type-specific labeling should show that the tag is 7XFP-11 and the UAS component FP-10, as the current cartoon leads the reader to conclude that the receptors are tagged with a single copy of split GFP. The detail that the receptors are tagged with 7 copies of split GFP is only provided through the genotype of the allele in the resource table. This design aspect should be made clear in the figure and the text when describing the allele and approach used to tag receptors in specific neuron types.

      We have now added the construct design to the scheme (Figure 2A) and revised the corresponding text (Lines #101-103).

      Panel A. The arrow representing the endogenous promoter in the yellow gene representation should be placed at the beginning of the coding sequence. Currently, the different colors of what I assume are coding (yellow) and non-coding (white) transcript regions are not described in the legend.  I would omit these or represent them in the same color as thinner boxes if the authors want to emphasize that the tag is inserted at the C terminus within the endogenous locus.

      The color scheme was revised to be more consistent and intuitive (Figure 2A).

      Figure 3

      Labels of the calyx and MB lobes would benefit readers not as familiar with the system used in the study. In addition, it would be beneficial to the reader to indicate in panel A the location of the compartments analyzed in panel H (e.g., peduncle, α3).

      Figure 3A was amended to clearly indicate the analyzed MB compartments.

      Adding frontal and sagittal to panels B-E, as in Figure 2, would help the reader interpret the data. 

      In Figure 3B, “Frontal” and “Sagittal” were indicated.

      Panel F-G. A scale bar should be provided for the data shown in the insets. Could the author comment on the localization of Dop1R1 in KCs? The data in the current panel suggests that only a subset of KCs express high levels of receptors in their axons, as a portion of the membrane is devoid of receptor signals. This would be in line with differential dopamine receptor expression in subsets of Kenyon cells, as shown in Kondo et al., 2020, which is currently not commented on in the paper. 

      We confirmed that the majority of the KCs express both Dop1R1 and Dop2R genes (Figure 1 - figure supplement 1). LIs should be compared within the same cells rather than the differences of protein levels between cell types as they also reflect the GAL4 expression levels. 

      Panel H. Some P values are shown as n.s. (p> 0.05). Other non-significant p values in this panel and in other figures throughout the paper are instead reported (e.g. peduncle P=0.164). For consistency, please report the values as n.s. as indicated in the methods for all non-significant tests in this panel and throughout the manuscript.

      We now present the new dataset, and the graph represents the appropriate statistical results (Figure 3D; see the methods section for details).

      The methods of labeling the receptors through the expression of the GeneSwitch-controlled GFP1-10 in Kenyon cells induced by RU486 are not provided in the methods. Please provide a description of this as referenced in the figure legend and the genotypes used in the analysis shown in the panels.

      The method of RU486 feeding has been added. We apologize for the missing method.

      Figure 4

      Please provide scale bars for the inset in panels A-B.

      Scale bars were added to all confocal images.

      The current analysis cannot distinguish between postsynaptic and presynaptic dopamine receptors in KCs, and the figure title should reflect this.

      We now present new data on dopamine receptors in KCs and clearly distinguish Brp clusters of the KCs from those of other cell types (Figure 4, Figure 5).

      The reader could benefit from additional details on the use of the giant neuron model, as it is not commonly used, and it is not clear how it relates to interpreting the localization of dopaminergic receptors within Kenyon cells. The use of the Venus-tagged receptor variant should be introduced in the text, as using a different allele currently lacks context. Figures 4F-4J show that the receptor is localizing throughout the neuron. Quantifying the fraction of receptor signal colocalizing with Brp could aid in interpreting the data. However, it would still not be clear how to interpret this data in the context of understanding the localization of the receptors in neurons within fly brain circuits. In the absence of additional data, the data provided in Figure 4 is inconclusive and could be omitted, keeping the focus of the study on the analysis of the two receptors in DANs. Co-expressing a presynaptic marker in Kenyon cells (e.g., by expressing Brp::SNAP) in conjunction with rGFP-labeled receptor would provide additional evidence of the relationship between release sites in Kenyon cells and tagged dopamine receptors in these same cells and could add evidence in support of the current conclusion.

      Following the advice, we added a short summary to recapitulate that the giant neurons exhibit many characteristics of mature neurons (Lines #152-156): "Importantly, these giant neurons exhibit characteristics of mature neurons, including firing patterns (Wu et al., 1990; Yao & Wu, 2001; Zhao & Wu, 1997) and acetylcholine release (Yao et al., 2000), both of which are regulated by cAMP and CaMKII signaling (Yao et al., 2000; Yao & Wu, 2001; Zhao & Wu, 1997)." Therefore, the giant neuron serves as an excellent model to study the presynaptic localization in large cells in isolation.

      To clarify that Brp clusters and dopamine receptors show polarized localization rather than "localizing throughout the neuron", we now show less magnified data (Figure 5C). It clearly demonstrates punctate Brp accumulations localized to the axon terminals of the giant neurons (former Figure 4D and 4E). This is the same membrane segment where Dop1R1 and Dop2R are localized (Figure 5C). Therefore, the association of Brp clusters and the dopamine receptors in the isolated giant neurons suggests that the subcellular localization in the brain neurons is independent of the circuit context.

      As the giant neurons do not form intermingled circuits, Venus-tagged receptors are sufficient for this experiment and genetically simpler.

      Following the suggestion to clarify the AZ association of the receptors in KCs, we coexpressed Brp<sup>short</sup>::mStraw and GFP<sub>1-10</sub> in KCs and confirmed their colocalization (Figure 5A).

      Figure 6

      The data and analysis show that starvation induces changes in the α3 compartment in PPL1 neurons only, while the data provided shows no significant change for PPL1 neurons innervating other MB compartments. This should be clearly stated in lines 174-175, as it is implied that there is a difference in the analysis for compartments other than α3. Panel L of Figure 6 - supplement 1 shows no significant change for all three compartments analyzed and should be indicated as n.s. in all instances, as stated in the methods. 

      We revised the text to clarify that the starvation-induced differences of Dop2R expression were not significant (Lines #217-219). The reason to highlight the α3 compartment is that both Dop1R1 and Dop2R are coexpressed in this PPL1 neuron (Figure 8D).

      Additional minor comments:

      There are a few typos and errors throughout the manuscript. The text should be carefully proofread to correct these. Here are the ones that came to my attention:

      Please reference all figure panels in the text. For instance, Figure 3A is not mentioned and should be revised in line 112 as Figure 3A-E.

      Lines 103-104. The sentence "LI was visualized as the color of the membrane signals" is unclear and should be revised. 

      Figure 4 legend - dendritic claws should likely be B and C and not B and E.

      Lines 147 - Incorrect figure panels, should be 5C-L or 5D-E.

      Line 241 - DNAs should be DANs.

      Methods - please define what the abbreviation CS stands for.

      We really appreciate the reviewer's careful reading. All of these have been corrected.

    1. eLife Assessment

      Wang et al. presented visual (dot) motion and/or the sound of a walking person and found solid evidence that EEG activity tracks the step rhythm, as well as the gait (2-step cycle) rhythm, with some demonstration that the gait rhythm is tracked superadditively (power for A+V condition is higher than the sum of the A-only and V-only condition). The valuable findings will be of wide interest to those examining biological motion perception and oscillatory processes more broadly.

    2. Reviewer #1 (Public review):

      Shen et al. conducted three experiments to study the cortical tracking of the natural rhythms involved in biological motion (BM), and whether these involve audiovisual integration (AVI). They presented participants with visual (dot) motion and/or the sound of a walking person. They found that EEG activity tracks the step rhythm, as well as the gait (2-step cycle) rhythm. The gait rhythm specifically is tracked superadditively (power for A+V condition is higher than the sum of the A-only and V-only condition, Experiments 1a/b), which is independent of the specific step frequency (Experiment 1b). Furthermore, audiovisual integration during tracking of gait was specific to BM, as it was absent (that is, the audiovisual congruency effect) when the walking dot motion was vertically inverted (Experiment 2). Finally, the study shows that an individual's autistic traits are negatively correlated with the BM-AVI congruency effect.
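
      The superadditivity criterion described above (power in the A+V condition exceeding the sum of A-only and V-only power) can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: the function name `superadditivity_index` and all power values below are hypothetical and serve only to show the comparison being made.

```python
from statistics import mean


def superadditivity_index(power_av, power_a, power_v):
    """Per-participant AV power minus the sum of A-only and V-only power.

    Positive values mean the audiovisual response exceeds the sum of the
    unimodal responses for that participant.
    """
    if not (len(power_av) == len(power_a) == len(power_v)):
        raise ValueError("each condition needs one value per participant")
    return [av - (a + v) for av, a, v in zip(power_av, power_a, power_v)]


# Hypothetical spectral power at the gait frequency (arbitrary units),
# one value per participant; these are NOT data from the study.
power_av = [2.0, 1.5, 1.75]
power_a = [0.5, 0.5, 0.25]
power_v = [1.0, 0.5, 0.5]

index = superadditivity_index(power_av, power_a, power_v)
mean_index = mean(index)  # a positive mean is consistent with superadditivity
```

      In practice the per-participant index would be tested against zero with an appropriate statistical test; the sketch only makes the arithmetic of the comparison explicit.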

    3. Reviewer #2 (Public review):

      The authors evaluate spectral changes in electroencephalography (EEG) data as a function of the congruency of audio and visual information associated with biological motion (BM) or non-biological motion. The results show supra-additive power gains in the neural response to gait dynamics, with trials in which audio and visual information was presented simultaneously producing higher average amplitude than the combined average power for auditory and visual conditions alone. Further analyses suggest that such supra-additivity is specific to BM and emerges from temporoparietal areas. The authors also find that the BM-specific supra-additivity is negatively correlated with autism traits.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Shen et al. conducted three experiments to study the cortical tracking of the natural rhythms involved in biological motion (BM), and whether these involve audiovisual integration (AVI). They presented participants with visual (dot) motion and/or the sound of a walking person. They found that EEG activity tracks the step rhythm, as well as the gait (2-step cycle) rhythm. The gait rhythm specifically is tracked superadditively (power for A+V condition is higher than the sum of the A-only and V-only condition, Experiments 1a/b), which is independent of the specific step frequency (Experiment 1b). Furthermore, audiovisual integration during tracking of gait was specific to BM, as it was absent (that is, the audiovisual congruency effect) when the walking dot motion was vertically inverted (Experiment 2). Finally, the study shows that an individual's autistic traits are negatively correlated with the BM-AVI congruency effect.

      Strengths:

      The three experiments are well designed and the various conditions are well controlled. The rationale of the study is clear, and the manuscript is pleasant to read. The analysis choices are easy to follow, and mostly appropriate.

      Weaknesses:

      On revision, the authors are careful not to overinterpret an analysis where the statistical test is not independent from the data (channel) selection criterion.

      Thanks for the suggestion and we have done this according to your recommendations below.

      Reviewer #1 (Recommendations for the authors):

      Re: the double-dipping concern: I appreciate the revision. Just to clarify: my concern rests with the selection of *electrodes* based on the interaction test for the 1Hz condition. The 2Hz condition analogous test yields no significant electrodes. You perform subsequent tests (t-tests and 3-way interaction) on the data averaged across the electrodes that were significant for the 1Hz condition. Therefore, these tests will be biased to find a pattern reflecting an interaction at 1Hz, while no similar bias exists for an effect at 2Hz. Therefore, there is a bias to observe a 3-way interaction, and simple effects compatible with a 2-way interaction only for 1Hz, not for 2Hz (which is exactly what you found). There is no good statistical alternative here, I appreciate that, but the bias exists nonetheless. I think the wording is improved in this revision, and the evidence is convincing even in light of this bias.

      We are grateful for your thoughtful comments on the analytical methods. We appreciate your concerns regarding the potential bias of examining the 3-way interaction based on electrodes yielding a 2-way interaction effect. To address this issue, we have conducted a bias-free analysis based on electrodes across the whole brain. The results showed a similar pattern of 3-way interaction as previously reported (p = 0.051), suggesting that the previous findings might not be caused by electrode selection. Given that the main results of Experiment 2 were not based on whole-brain analysis, we did not include this analysis in the main text, and we have removed the three-way interaction results based on selected electrodes from the manuscript to reduce potential concerns. It is also noteworthy that, when performing analyses based on channels independent of the interaction effect at 1 Hz (i.e., significant congruency effects in the upright and inverted conditions, respectively, at 2 Hz), we obtained results similar to those reported in the main text (i.e., non-significant interaction and correlation at 2 Hz). These results were presented in the supplementary file in previous versions and mentioned in the correlation part of the Results section (see Fig. S2). Once again, we sincerely appreciate your careful review of our research. We hope the abovementioned points adequately address your concern.
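
      The selection bias discussed in this exchange can be illustrated with a small simulation. The sketch below is a generic demonstration of circular channel selection on pure noise, not a reconstruction of the authors' analysis: the numbers of subjects, channels, and selected channels, and all function names, are assumptions chosen for illustration.

```python
import random
import statistics


def simulate_selection_bias(n_subjects=20, n_channels=64, n_selected=5, seed=0):
    """Compare an effect estimated on the selection data vs. independent data.

    All data are drawn from a null distribution (true effect = 0 everywhere),
    so any non-zero estimate reflects noise or selection bias.
    """
    rng = random.Random(seed)

    def draw_dataset():
        # One row per subject, one column per channel, standard normal noise.
        return [[rng.gauss(0.0, 1.0) for _ in range(n_channels)]
                for _ in range(n_subjects)]

    def channel_means(dataset):
        return [statistics.fmean([row[c] for row in dataset])
                for c in range(n_channels)]

    # Step 1: pick the channels with the largest mean "effect".
    selection_means = channel_means(draw_dataset())
    selected = sorted(range(n_channels),
                      key=lambda c: selection_means[c],
                      reverse=True)[:n_selected]

    # Step 2a: circular estimate, reusing the data that drove the selection.
    circular = statistics.fmean([selection_means[c] for c in selected])

    # Step 2b: unbiased estimate, same channels but independent data.
    independent_means = channel_means(draw_dataset())
    independent = statistics.fmean([independent_means[c] for c in selected])

    return circular, independent


circular_estimate, independent_estimate = simulate_selection_bias()
```

      Under these assumptions the circular estimate is clearly positive even though no effect exists, while the estimate from independent data stays near zero. This is the sense in which tests on electrodes selected for a 1 Hz interaction are biased toward showing that interaction.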

      Reviewer #2 (Public review):

      Summary:

      The authors evaluate spectral changes in electroencephalography (EEG) data as a function of the congruency of audio and visual information associated with biological motion (BM) or non-biological motion. The results show supra-additive power gains in the neural response to gait dynamics, with trials in which audio and visual information was presented simultaneously producing higher average amplitude than the combined average power for auditory and visual conditions alone. Further analyses suggest that such supra-additivity is specific to BM and emerges from temporoparietal areas. The authors also find that the BM-specific supra-additivity is negatively correlated with autism traits.

      Strengths:

      The manuscript is well-written, with a concise and clear writing style. The visual presentation is largely clear. The study involves multiple experiments with different participant groups. Each experiment involves specific considered changes to the experimental paradigm that both replicate the previous experiment's finding yet extend it in a relevant manner.

      In the first revisions of the paper, the manuscript better relays the results and anticipates analyses, and this version adequately resolves some concerns I had about analysis details. In a further revision, it is clarified better how the results relate to the various competing hypotheses on how biological motion is processed.

      Weaknesses:

      Still, it is my view that the findings of the study are basic neural correlate results that offer only minimal constraint towards the question of how the brain realizes the integration of multisensory information in the service of biological motion perception, and the data do not address the causal relevance of observed neural effects towards behavior and cognition. The presence of an inversion effect suggests that the supra-additivity is related to cognition, but that leaves open whether any detected neural pattern is actually consequential for multi-sensory integration (i.e., correlation is not causation). In other words, the fact that frequency-specific neural responses to the [audio & visual] condition are stronger than those to [audio] and [visual] combined does not mean this has implications for behavioral performance. While the correlation to autism traits could suggest some relation to behavior and is interesting in its own right, this correlation is a highly indirect way of assessing behavioral relevance. It would be helpful to test the relevance of supra-additive cortical tracking on a behavioral task directly related to the processing of biological motion to justify the claim that inputs are being integrated in the service of behavior. Under either framework, cortical tracking or entrainment, the causal relevance of neural findings toward cognition is lacking.

      Overall, I believe this study finds neural correlates of biological motion that offer some constraint toward mechanism, and it is possible that the effects are behaviorally relevant, but based on the current task and associated analyses this has not been shown (or could not have been, given the paradigm).

      Reviewer #2 (Recommendations for the authors):

      Thank you for your revisions; I have updated the Strengths section, and reworded the weaknesses section. I now concede that the neural effects observed offer some constraint towards what the neural mechanisms for AV integration for BM are, whereas in my previous review, I said too strongly that these results do not offer any information about mechanism.

      Thank you again for your insightful thoughts and comments on our research. They have contributed greatly to enhancing the discussion of the article and provided valuable inspiration for future exploration of causal mechanisms.

    1. eLife Assessment

      These studies make a fundamental contribution to our understanding of axon-guidance mechanisms, focusing on the role of UNC-6/Netrin in the long-range growth and targeting of axons. Using state-of-the-art genetics and in vivo imaging, the authors provide compelling support for the finding that UNC-6/Netrin can act via both chemotaxis and haptotaxis. This work will be of interest to a wide variety of cell and developmental biologists and neuroscientists.

    2. Reviewer #1 (Public review):

      Summary:

      This paper investigates the mechanism of axon growth directed by the conserved guidance cue UNC-6/Netrin. Experiments were designed to distinguish between alternative models in which UNC-6/Netrin functions as either a short range (haptotactic) cue or a diffusible (chemotactic) signal that steers axons to their final destinations. In each case, axonal growth cones execute ventrally directed outgrowth toward a proximal source of UNC-6/Netrin. This work concludes that UNC-6/Netrin functions as both a haptotactic and chemotactic cue to polarize the UNC-40/DCC receptor on the growth cone membrane facing the direction of growth. Ventrally directed axons initially contact a minor longitudinal nerve tract (vSLNC) at which UNC-6/Netrin appears to be concentrated before proceeding in the direction of the ventral nerve cord (VNC) from which UNC-6/Netrin is secreted. Time lapse imaging revealed that growth cones appear to pause at the vSLNC before actively extending ventrally directed filopodia that eventually contact the VNC. Growth cone contacts with the vSLNC were unstable in unc-6 mutants but were restored by expression of a membrane tethered UNC-6 in vSLNC neurons. In addition, expression of membrane tethered UNC-6/Netrin in the VNC was not sufficient to rescue initial ventral outgrowth in an unc-6 mutant. Finally, dual expression of membrane tethered UNC-6/Netrin in both vSLNC and VNC partially rescued the unc-6 mutant axon guidance defect, thus suggesting that diffusible UNC-6 is also required. This work is important because it potentially resolves the controversial question of how UNC-6/Netrin directs axon guidance by proposing a model in which both of the competing mechanisms, e.g., haptotaxis vs chemotaxis, are successively employed. The impact of this work is bolstered by its use of powerful imaging and genetic methods to test models of UNC-6/Netrin function in vivo thereby obviating potential artifacts arising from in vitro analysis.

      Strengths:

      A strength of this approach is the adoption of the model organism C. elegans to exploit its ready accessibility to live cell imaging and powerful methods for genetic analysis.

      Weaknesses:

      In the revised version of this manuscript, the authors have redressed the weaknesses highlighted in my review of the original paper.

    3. Reviewer #2 (Public review):

      Nichols et al studied the role of axon guidance molecules and their receptors and how these work as long-range and/or local cues, using in-vivo time-lapse imaging in C. elegans. They found that the Netrin axon guidance system works in different modes when acting as a long-range (chemotaxis) cue vs local cue (haptotaxis). As an initial context, they take advantage of the postembryonic-born neuron, PDE, to understand how its axon grows and then is guided into its target. They found that this process occurs in various discrete steps, during which the growth cone migrates and pauses at specific structures, such as the vSLNC. The role of the UNC-6/Netrin and UNC-40/DCC axon guidance ligand-receptor pair was then looked at in terms of its requirement for (1) initial axon outgrowth direction, (2) stabilization at the intermediate target, (3) directional branching from the sublateral region or (4) ventral growth from intermediate target to the VNC. They found that each step is disrupted in the unc-6/Netrin and unc-40/DCC mutants and observed how the localization of these proteins changed during the process of axon guidance in wild type and mutant contexts. These observations were further supported by analysis of a mutant important for the regulation of Netrin signaling, the E3 ubiquitin ligase madd-2/Trim9/Trim67. Remarkably, the authors identified that this mutant affected axonal adhesion and stabilization, but not directional growth. Using membrane-tethered UNC-6 to specific localities, they then found this to be a consequence of the availability of UNC-6 at specific localities within the axon growth path. Altogether, this data and in-vivo analysis provide compelling evidence of the mechanistic foundation of Netrin-mediated axon guidance and how it works step by step.

      The conclusions are well-supported, with both imaging and quantification of each step of axon guidance and localization of UNC-6 and UNC-40. Using a different type of neuron to validate their findings further supports their conclusions and strengthens their model. They also probe the role of the axon guidance ligand-receptor pair SLT-1/Slit and SAX-3/ROBO in this process and find it to work in parallel to UNC-6. This work sets up the stage for future analysis of other axon guidance molecules or regulators using time-lapse in-vivo imaging to better understand their role as long-range and/or local cues.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript from Nichols, Lee, and Shen tackles an important question of how unc6/netrin promotes axon guidance: i.e. haptotaxis vs chemotaxis. This has recently been a large topic of investigation and discussion in the axon guidance field. Using live cell imaging of unc6/netrin and unc40/DCC in several neurons that extend axons ventrally during development, as well as TM localized mutants of Unc6, they suggest that unc6 promotes first haptotaxis of the emerging growth cone followed by chemotaxis of the growth cone. This is timely, as a recent preprint from the Lundquist group, using a similar strategy to make only a TM anchored unc6 similarly found that this could rescue only the haptotaxis like growth of the PDE neuron, but not the second phase of growth. However, their conclusions were quite different based on the overexpression of unc6 everywhere rescuing the second phase, and thus they conclude that a gradient is not present.

      Strengths:

      As this has been quite a controversy in both the invertebrate and vertebrate fields, one strength of this paper is that they use an unc6-neon green to demonstrate unc6 localization. Further, they provide localisation of the transmembrane tether version of netrin, showing its restriction to nerve cords.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This paper investigates the mechanism of axon growth directed by the conserved guidance cue UNC-6/Netrin. Experiments were designed to distinguish between alternative models in which UNC-6/Netrin functions as either a short-range (haptotactic) cue or a diffusible (chemotactic) signal that steers axons to their final destinations. In each case, axonal growth cones execute ventrally directed outgrowth toward a proximal source of UNC-6/Netrin. This work concludes that UNC-6/Netrin functions as both a haptotactic and chemotactic cue to polarize the UNC-40/DCC receptor on the growth cone membrane facing the direction of growth. Ventrally directed axons initially contact a minor longitudinal nerve tract (vSLNC) at which UNC-6/Netrin appears to be concentrated before proceeding in the direction of the ventral nerve cord (VNC) from which UNC-6/Netrin is secreted. Time-lapse imaging revealed that growth cones appear to pause at the vSLNC before actively extending ventrally directed filopodia that eventually contact the VNC. Growth cone contacts with the vSLNC were unstable in unc-6 mutants but were restored by the expression of a membrane-tethered UNC-6 in vSLNC neurons. In addition, the expression of membrane-tethered UNC-6/Netrin in the VNC was not sufficient to rescue initial ventral outgrowth in an unc-6 mutant. Finally, dual expression of membrane-tethered UNC-6/Netrin in both vSLNC and VNC partially rescued the unc-6 mutant axon guidance defect, thus suggesting that diffusible UNC-6 is also required. This work is important because it potentially resolves the controversial question of how UNC-6/Netrin directs axon guidance by proposing a model in which both of the competing mechanisms, e.g., haptotaxis vs chemotaxis, are successively employed. The impact of this work is bolstered by its use of powerful imaging and genetic methods to test models of UNC-6/Netrin function in vivo thereby obviating potential artifacts arising from in vitro analysis.

      Strengths:

      A strength of this approach is the adoption of the model organism C. elegans to exploit its ready accessibility to live cell imaging and powerful methods for genetic analysis.

      Weaknesses:

      A membrane-tethered version of UNC-6/Netrin was constructed to test its haptotactic role, but its neuron-specific expression and membrane localization are not directly determined although this should be technically feasible. Time-lapse imaging is a key strength of multiple experiments but only one movie is provided for readers to review.

      Thank you for your comments. We have now used SNAP labeling to directly visualize the localization of membrane tethered UNC-6 and confirmed UNC-6 is only detectable on the sublateral and ventral nerve cords (Figure S3A). These data have been added to the manuscript on page 15, lines 342-347. We have also provided a representative movie for each imaged genotype (Videos S2-10).

      Reviewer #2 (Public Review):

      Nichols et al studied the role of axon guidance molecules and their receptors and how these work as long-range and/or local cues, using in-vivo time-lapse imaging in C. elegans. They found that the Netrin axon guidance system works in different modes when acting as a long-range (chemotaxis) cue vs local cue (haptotaxis). As an initial context, they take advantage of the postembryonic-born neuron, PDE, to understand how its axon grows and then is guided into its target. They found that this process occurs in various discrete steps, during which the growth cone migrates and pauses at specific structures, such as the vSLNC. The role of the UNC-6/Netrin and UNC-40/DCC axon guidance ligand-receptor pair was then looked at in terms of its requirement for

      (1) initial axon outgrowth direction

      (2) stabilization at the intermediate target

      (3) directional branching from the sublateral region or

      (4) ventral growth from the intermediate target to the VNC.

      They found that each step is disrupted in the unc-6/Netrin and unc-40/DCC mutants and observed how the localization of these proteins changed during the process of axon guidance in wild-type and mutant contexts. These observations were further supported by analysis of a mutant important for the regulation of Netrin signaling, the E3 ubiquitin ligase madd-2/Trim9/Trim67. Remarkably, the authors identified that this mutant affected axonal adhesion and stabilization, but not directional growth. Using membrane-tethered UNC-6 to specific localities, they then found this to be a consequence of the availability of UNC-6 at specific localities within the axon growth path. Altogether, this data and in-vivo analysis provide compelling evidence of the mechanistic foundation of Netrin-mediated axon guidance and how it works step by step.

      The conclusions are well-supported, with both imaging and quantification of each step of axon guidance and localization of UNC-6 and UNC-40. Using a different type of neuron to validate their findings further supports their conclusions and strengthens their model. It's not yet known whether this model holds true for other ligand-receptor pairs, but the current work sets the stage for future analysis of other axon guidance molecules using time-lapse in-vivo imaging. There are still two outstanding questions that are important to address to support the authors' model and conclusions.

      (1) The results of UNC-6-TM expression at different locations are clear and support the conclusions, but the authors need to consider that there is no diffusible UNC-6 available. What would happen if UNC-6 were tethered to the membrane in an otherwise completely 'normal' UNC-6 gradient? Does axon guidance proceed normally, or does the axon get stuck at the site of the membrane-tethered UNC-6 and fail to continue growing properly? This is an important control (expression of the UNC-6-TM at the vSLNC or VNC in the wild-type background) that would help clarify this question and give better insight into the separability of both axon guidance steps and the ability to manipulate these.

      Thank you for your comments. We expressed UNC-6<sup>TM</sup> at vSLNC and VNC in wild-type animals and examined adult morphology of both HSN and PDE in the control conditions you suggested. These data are available in Tables 1 and 2 with no statistical differences compared to wild-type animals. In addition, we provide still images of developing PDE axons near the vSLNC (Figure S3D) to confirm that this axon guidance step is intact when UNC-6<sup>TM</sup> is overexpressed in specific regions. Together, these data suggest that the TM rescue constructs do not interfere with endogenous axon guidance pathways. We have added these results to the manuscript on page 15, lines 347-349.

      (2) Axon guidance systems do not work in a vacuum and are generally competing against each other. For example, the SLT-1/Slit and SAX-3/ROBO axon guidance ligand-receptor pair is also required for axon guidance in PDE and other post-embryonic neurons. It would be interesting to test mutants for these genes with the membrane-tethered UNC-6 to determine if the different steps of axon guidance are disrupted and if so, to what degree these are disrupted.

      Thank you for this suggestion. We have performed time-lapse imaging on slt-1 mutants and unc-6; slt-1 double mutants. These data are available in a new figure, Figure 3. Indeed, we found that slt-1 mutants showed abnormal direction of axon emergence and stabilization at the VNC but normal stabilization at the vSLNC and axonal branching (Fig. 3). These data can be found in the manuscript on pages 11-12, lines 248-269.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript from Nichols, Lee, and Shen tackles an important question of how unc6/netrin promotes axon guidance: i.e. haptotaxis vs chemotaxis. This has recently been a large topic of investigation and discussion in the axon guidance field. Using live cell imaging of unc6/netrin and unc40/DCC in several neurons that extend axons ventrally during development, as well as TM localized mutants of Unc6, they suggest that unc6 promotes first haptotaxis of the emerging growth cone followed by chemotaxis of the growth cone. This is timely, as a recent preprint from the Lundquist group, using a similar strategy to make only a TM anchored unc6 similarly found that this could rescue only the haptotaxis-like growth of the PDE neuron, but not the second phase of growth. However, their conclusions were quite different based on the overexpression of unc6 everywhere rescuing the second phase, and thus they conclude that a gradient is not present.

      Strengths:

      As this has been quite a controversy in both the invertebrate and vertebrate fields, one strength of this paper is that they use an unc6-neon green to demonstrate unc6 localization, and show a gradient of localization.

      Weaknesses:

      This is important, although it could be strengthened by first showing a more zoomed-out image of unc6 in the animal, and second demonstrating the localization of the transmembrane anchored unc6 mutants, to help define what may be the "diffusible Unc6".

      Thank you for your comments. We have performed both of these experiments. In Figure 6A, we provide a zoomed out image of PDE growth cone interacting with UNC-6::mNG prior to reaching the vSLNC. Notably, we do not observe an obvious gradient that extends into this more dorsal region of the animal. We have also shown the membrane localization of UNC-6<sup>TM</sup> through SNAP labeling in Figure S3A. These data have been added to the manuscript on page 15, lines 342-347.

      I have two additional suggestions for experiments or analysis. First, the authors should clarify the phenotype of ventral emergence of the growth cone. The manuscript images suggest that, no matter the mutant, there is ventral emergence of the growth cone followed by later defects, yet the authors claim ventral emergence defects with the UNC6 tethered mutants, and there is no comparison of rose plots. This is confusing and needs to be addressed.

      Thank you for your comment. We have now included images (i.e. slt-1(eh15) and unc-6(ev400); slt-1(eh15) genotypes in Figure 3) and movies showing misoriented axon emergence. We have also provided an additional quantification that allows for statistical comparison of emergence angle across genotypes. This quantification takes the sine function of the angle to quantify the relative emergence trajectory across the dorsal-ventral axis. A value of 1 indicates 90° dorsal emergence, and -1 indicates 90° ventral emergence. Statistical comparisons across genotypes demonstrate that axons in both unc-6 and slt-1 mutants are misoriented relative to wild-type axons. These comparisons can be found in Figures S1B, 3C, S2B, S3C.
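
      The sine-based quantification described in this response can be sketched in a few lines. This is a minimal illustration, assuming the emergence angle is measured in degrees from the anterior-posterior axis with dorsal at +90°; that angle convention and the function name are assumptions, as the response does not specify them.

```python
import math


def dorsoventral_index(emergence_angle_deg: float) -> float:
    """Project an axon emergence angle onto the dorsal-ventral axis.

    Assumed convention: 0 deg or 180 deg points along the anterior-posterior
    axis, +90 deg is dorsal, and -90 deg (equivalently 270 deg) is ventral.
    Returns a value in [-1, 1]: 1 is fully dorsal, -1 is fully ventral.
    """
    return math.sin(math.radians(emergence_angle_deg))


# Hypothetical emergence angles, in degrees, for a handful of axons.
example_angles = [90.0, -90.0, 0.0, -45.0]
example_indices = [dorsoventral_index(a) for a in example_angles]
```

      Collapsing each angle to a single signed value in this way is what allows standard statistical comparisons across genotypes, at the cost of discarding the anterior-posterior component that a rose plot would show.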

      Second, I have concerns that the analysis of unc40 polarization may be misleading in some cases when there appears to indeed be accumulation in the growth cone, but since the only analysis shown is relative to the rest of the cell, that can be lost.

      Thank you for sharing your concerns about the UNC-40 polarization quantifications. We have separately compared the value of the integrated density of UNC-40::GFP in each cellular domain (vSLNC-contacting area and the dorsal soma) between genotypes. While we did not include these comparisons in the original manuscript, we have now included them in the revised manuscript. Overall, these data support our conclusions that UNC-40 mispolarization occurs across the entire cell (Fig. S1F,G; S2E-H; S3E,F).

    1. eLife Assessment

      This important study offers convincing evidence that fmo-4 plays essential roles in established lifespan interventions and downstream of its paralog fmo-2. The work is of substantial benefit for our understanding of this enzyme family, underscoring their importance in longevity and stress resistance. The study also suggests a connection between fmo-4 and dysregulation of calcium signalling, with conclusions and interpretations based on solid genetic methodology and evidence.

    2. Reviewer #1 (Public review):

      Summary:

      This interesting and well-written article by Tuckowski et al. summarizes work connecting the flavin-containing monooxygenase FMO-4 with increased lifespan through a mechanism involving calcium signaling in the nematode Caenorhabditis elegans.

      The authors have previously studied another fmo in worms, FMO-2, prompting them to look at additional members of this family of proteins. They show that fmo-4 is upregulated in dietary restricted worms and necessary for the increased lifespan of these animals as well as of rsks-1 (S6 kinase) knockdown animals. They then show that overexpression of fmo-4 is sufficient to significantly increase lifespan, as well as healthspan and paraquat resistance. Further, they demonstrate that overexpression of fmo-4 solely in the hypodermis of the animal recapitulates the entire effect of fmo-4 OE.

      In terms of interactions between fmo-2 and fmo-4 they show that fmo-4 is necessary for the previously reported effects of fmo-2 on lifespan, while the effects of fmo-4 do not depend on fmo-2.

      Next the authors use RNASeq to compare fmo-4 OE animals to wild type. Their analyses suggested the possibility that FMO-4 was modulating calcium signaling, and through additional experiments specifically identified the calcium signaling genes crt-1, itr-1, and mcu-1 as important fmo-4 interactors in this context. As previously published work has shown that loss of the worm transcription factor atf-6 can extend lifespan through crt-1, itr-1 and mcu-1, the authors asked about interactions between fmo-4 and atf-6. They showed that fmo-4 is necessary for both lifespan extension and increased paraquat resistance upon RNAi knockdown of atf-6.

      Overall this clearly written manuscript summarizes interesting and novel findings of great interest in the biology of aging, and suggests promising avenues for future work in this area.

      Strengths:

      This paper contains a large number of careful, well executed and analysed experiments in support of its existing conclusions, and which also point toward significant future directions for this work. In addition it is clear and very well written.

      Weaknesses:

      Within the scope of the current work there are no major weaknesses. That said, the authors themselves note pressing questions beyond the scope of this study that remain unanswered. For instance, the mechanistic nature of the interactions between FMO-4 and the other players in this story, for example in terms of direct protein-protein interactions, is not at all understood yet. Further, powerful tools such as GCaMP expressing animals will enable a much more detailed understanding of what exactly is happening to calcium levels, and where and when it is happening, in these animals.

    3. Reviewer #2 (Public review):

      Summary:

      Members of a conserved family of flavin-containing monooxygenases (FMOs) play key roles in lifespan extension induced by diet restriction and hypoxia. In C. elegans, fmo-2 has received the majority of attention, but there are multiple fmo genes in both worms and mammals, and how overlapping or distinct the functional roles of these paralogs are remains unclear. Here Tuckowski et al. identify that a new family member, fmo-4, is also a positive modulator of lifespan. Based on differential requirements of fmo-2 and fmo-4 in stress resistance and lifespan extension paradigms, however, the authors conclude that fmo-4 acts through mechanisms that are distinct from fmo-2. Ultimately, the authors place fmo-4 genetically within a pathway involving atf-6, calreticulin, the IP3 receptor, and mitochondrial calcium uniporter, which was previously shown to link ER calcium homeostasis to mitochondrial homeostasis and longevity. The authors thus achieve their overarching aim to reveal that different FMO family members regulate stress resistance and lifespan through distinct mechanisms. Furthermore, because the known enzymatic activity of FMOs involves oxygenating xenobiotic and endogenous metabolites, these findings highlight a potential new link between redox/metabolic homeostasis and ER-mitochondrial calcium signaling.

      Strengths:

      The authors demonstrate links between multiple conserved life-extending signaling pathways and fmo-4, expanding both the significance and mechanistic diversity of FMO-family genes in aging and stress biology.

      The authors use genetics to discover an interesting and unanticipated new link between FMOs and calcium pathways known to regulate lifespan.

      The genetic epistasis patterns for lifespan and stress resistance phenotypes are generally clean and compelling.

      Weaknesses:

      The authors achieve a necessary and valuable first step with regard to linking FMO-4 to calcium homeostasis, but the mechanisms involved remain preliminary at this stage. Specifically, the genetic interactions between fmo-4 and conserved mediators of calcium transport and signaling are convincing, but a putative molecular mechanism by which the activity of FMO-4 would alter subcellular calcium transport remains unclear and potentially indirect. The authors effectively highlight this gap as a key pursuit for subsequent studies.

      The authors have shown that carbachol and EDTA produce the expected effects on a cytosolic calcium reporter in neurons, supporting the utility of the chemical approach in general, but validating that carbachol, EDTA and fmo-4 itself have an impact on calcium in the tissues and subcellular compartments relevant to the lifespan phenotypes would still be valuable in supporting the overall model. Notably, however, the hypodermal-specific role of FMO-4 suggests potential cell non-autonomous regulation of lifespan, such that this pathway may ultimately involve complex inter-cellular signaling that would necessitate substantially more time and effort.

      Employing mutants and more sophisticated genetic tools for modulating calcium transport or signaling (in addition to RNAi) would strengthen key conclusions and/or help to elucidate tissue- or age-specific aspects of the proposed mechanism.

    4. Reviewer #3 (Public review):

      Summary:

      The authors assessed the potential involvement of fmo-4 in a diverse set of longevity interventions, showing that this gene is required for DR and S6 kinase knockdown related lifespan extension. Using comprehensive epistasis experiments they find this gene to be a required downstream player in the longevity and stress resistance provided by fmo-2 overexpression. They further showed that fmo-4 ubiquitous overexpression is sufficient to provide longevity and paraquat (mitochondrial) stress resistance, and that overexpression specifically in the hypodermis is sufficient to recapitulate most of these effects.

      Interestingly, they find that fmo-4 overexpression sensitizes worms to thapsigargin during development, an effect that they link with a potential dysregulation in calcium signalling. They go on to show that fmo-4 expression is sensitive to drugs that both increase or decrease calcium levels, and these drugs differentially affect lifespan of fmo-4 mutants compared to wild-type worms. Similarly, knockdown of genes involved in calcium binding and signalling also differentially affect lifespan and paraquat resistance of fmo-4 mutants.

      Finally, they suggest that atf-6 limits the expression of fmo-4, and that fmo-4 is also acting downstream of benefits produced by atf-6 knockdown.

      Strengths:

      • comprehensive lifespan experiments: clear placement of fmo-4 within established longevity interventions.
      • clear distinction in functions and epistatic interactions between fmo-2 and fmo-4, which lays a strong foundation for a longevity pathway regulated by this enzyme family.

      Weaknesses:

      • no obvious transcriptomic evidence supporting a link between fmo-4 and calcium signalling: either for knockout worms or fmo-4 overexpressing strains.
      • no direct measures of alterations in calcium flux, signalling or binding that strongly support a connection with fmo-4.
      • no measures of mitochondrial morphology or activity that strongly support a connection with fmo-4.
      • lack of a complete model that places fmo-4 function downstream of DR and mTOR signalling (first Results section), fmo-2 (second Results section) and at the same time explains connection with calcium signalling.

      Comments on revisions:

      The authors have addressed and fixed all the private comments we had made. In terms of the public comments, I think nothing has changed in terms of strengths and weaknesses. They have multiple independent results (drugs, RNAi and transcriptomics) that suggest a connection between fmo-4 and calcium regulation, but there is no strong evidence for what this connection is. The work still lacks direct measures of calcium, ER or mitochondrial function in relation to fmo-4 (which they acknowledge in the discussion). The first four sections strongly place fmo-4 within established longevity interventions, but their model doesn't explain how calcium regulation would fit into these.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1:

      Comment 1: Within the scope of the current work there are no major weaknesses. That said, the authors themselves note pressing questions beyond the scope of this study that remain unanswered. For instance, the mechanistic nature of the interactions between FMO-4 and the other players in this story, for example in terms of direct protein-protein interactions, is not at all understood yet.

      We thank the reviewer for the positive review, and fully agree and acknowledge that there are unanswered questions for future studies that are beyond the scope of this manuscript.

      Reviewer 2:

      Comment 1: The effects of carbachol and EDTA on intracellular calcium levels are inferred, especially in the tissues where fmo-4 is acting. Validating that these agents and fmo-4 itself have an impact on calcium in relevant subcellular compartments is important to support conclusions on how fmo-4 regulates and responds to calcium.

      We thank the reviewer for this important suggestion. We agree that carbachol and EDTA can be broad agents and validating that they are altering calcium levels is very useful. While this is technically challenging, we attempted to address this by using neuronally expressed GCaMP7f calcium indicator worms and measuring their GFP fluorescence upon exposure to carbachol and EDTA. Assessing both short-term and long-term exposure to these agents, we were able to show that carbachol increases GFP fluorescence, indicating an increase in calcium levels, and EDTA decreases GFP fluorescence, indicating a decrease in calcium levels. Unfortunately, because FMO-4 is not neuronally expressed, we were not able to test the effects of FMO-4 on calcium in this strain, which would require hypodermal expression and possibly short-term modification of fmo-4 expression to test. We have made sure to temper our language about the indirect measures we used.

      Comment 2: Experiments are generally reliant on RNAi. While in most cases experiments reveal positive results, indicating RNAi efficacy, key conclusions could be strengthened with the incorporation of mutants.

      We appreciate and value this suggestion and agree that mutants could be helpful to strengthen our conclusions. We address this caveat in the discussion of the revised manuscript. We explain that we were concerned about knocking out key calcium regulating genes like itr-1 and mcu-1 that either already result in some level of sickness in the worms when knocked down (itr-1) or could lead to confounding metabolic changes if knocked out. We do find that our RNAi lifespan results are robust and reproducible, but we also understand and recognize the caveats that come with using RNAi knockdown instead of full deletion mutants.

      Reviewer 3:

      Comment 1: no obvious transcriptomic evidence supporting a link between fmo-4 and calcium signaling: either for knockout worms or fmo-4 overexpressing strains.

      We thank the reviewer for this feedback. While there is some transcriptomic evidence, we agree that it is not overwhelming evidence. We do think that this evidence, combined with the phenotype observed under thapsigargin (i.e., significant reduction in worm size and significant delay or prevention of development), in addition to the genetic connections to calcium regulation, provides additional compelling evidence that FMO-4 interacts with calcium signaling.

      Comment 2: no direct measures of alterations in calcium flux, signalling or binding that strongly support a connection with fmo-4.

      As described in reviewer 2 comment 1, we have successfully used GCaMP7f worms to assess calcium flux upon exposure to carbachol and EDTA. This approach confirmed the changes in calcium expected from these compounds. Unfortunately, because FMO-4 is not neuronally expressed, we were not able to test the effects of FMO-4 on calcium in this strain, which would require hypodermal expression and possibly short-term modification of fmo-4 expression to test. We have made sure to temper our language about the indirect measures we used.

      Comment 3: no measures of mitochondrial morphology or activity that strongly support a connection with fmo-4.

      This is a great point, and something we are currently working on to include for a future manuscript. 

      Comment 4: lack of a complete model that places fmo-4 function downstream of DR and mTOR signalling (first Results section), fmo-2 (second Results section) and at the same time explains connection with calcium signalling.

      We thank the reviewer for this helpful feedback. We have included a more complete working model in our revision.

      Recommendations for the authors:

      Reviewer 1:

      Comment 1: "We utilized fmo-4 (ok294) knockout (KO) animals on five conditions reported to extend lifespan in C. elegans." Here I believe "fmo-4 (ok294)" should be "fmo-4(ok294)". (No space).

      We thank the reviewer for this helpful revision. We have made this change as suggested.

      Comment 2: "Wild-type (WT) worms on DR experience a ~35% lifespan extension compared to fed WT worms, but when fmo-4 is knocked out this extension is reduced to ~10% and this interaction is significant by cox regression (p-value < 4.50e-6)." Here "cox regression" should be "Cox regression".

      We have made this change as suggested.

      Comment 3: "Having established this role, we continued lifespan analyses of fmo-4 KO worms exposed to RNAi knockdown of the S6-kinase gene rsks-1 (mTOR signaling), the von hippel lindau gene vhl-1 (hypoxic signaling), the insulin receptor daf-2 (insulin-like signaling), and the cytochrome c reductase gene cyc-1 (mitochondrial electron transport chain, cytochrome c reductase) (Fig 1C-F)." Here "von hippel lindau" should be "Von Hippel-Lindau".

      We have made this change as suggested.

      Comment 4: In three instances in the caption of Figure 5, the "4" in fmo-4 is not italicized when it should be.

      We have made this change as suggested.

      Comment 5: In two instances in the caption of Figure 7, the "4" in fmo-4 is not italicized when it should be, and in one instance in the caption of Figure 7, the "6" in atf-6 is not italicized when it should be.

      We have made this change as suggested.

      Comment 6: "Supplemental Data 3 provides the results of the Log-rank test and Cox regression analysis, which were run in Rstudio." Here Rstudio should be RStudio.

      We have made this change as suggested.

      Comment 7: In the references, within article titles italicization (e.g. of Caenorhabditis elegans) is frequently missing. While this is often an artifact introduced by reference management software, it should be corrected in the final manuscript.

      We thank the reviewer for all the helpful revision suggestions. We have made sure all the references are properly italicized where necessary.

      Reviewer 2:

      Comment 1: While FMO-4 is clearly placed in the ER calcium pathway genetically, the molecular mechanism by which FMO-4 would alter ER calcium is unclear. Notably, Tuckowski et al. highlight this gap in the discussion as well.

      We thank the reviewer for identifying this important caveat. We hope to address the molecular mechanism by which FMO-4 alters ER calcium in upcoming projects.

      Comment 2: Determining whether overexpression of catalytically dead FMO-4 or introduction of an inactivating point mutant into the endogenous locus phenocopy FMO-4 OE and KO animals would help distinguish between mechanisms involving protein-protein interactions or downstream metabolic regulation.

      We thank the reviewer for this valuable suggestion. This is an experiment we are hoping to do in the near future to better understand molecular mechanisms and protein-protein interactions.

      Reviewer 3:

      Comment 1: When measuring the effect of thapsigargin on development of fmo-4 mutants it would be great to use a developmental assay rather than quantifying normalized worm area. Also please add scale bars to Figure 3G and 4H, it seems that fmo-4 overexpression decreases worm size even in control conditions, clarify if this is the case.

      We thank the reviewer for this feedback. In addition to quantifying normalized worm area in Figure 3G-I, we have added a developmental assay (Figure 3J) that shows the development time of wild-type worms on DMSO or thapsigargin as well as the fmo-4 OE worms on DMSO or thapsigargin. These data validate that the fmo-4 OE worm development is either delayed significantly or even prevented when the worms are treated with thapsigargin.

      We have added scale bars to Figure 3G and 4H as suggested.

      We also appreciate the reviewer’s observation of the fmo-4 overexpression worms appearing smaller than wild-type worms in control conditions. We looked through the replicates and found that just one replicate showed a significant decrease in worm size, as observed in our unrevised manuscript. We repeated this experiment twice more to gather more data and determined that the fmo-4 overexpression worms were ultimately not significantly different in size compared to wild-type worms. We have included the new images and quantifications in Figure 3G-I and Figure 4H-J in the revised manuscript.

      Comment 2: correct or replace Supplementary Table 2, which is not showing a DAVID analysis as the title and text would suggest. We should see biological/molecular processes, effect sizes, p-values, ...

      We thank the reviewer for identifying this issue. We have added more detail to Supplementary Table 2 so that it is clearer what is being shown in each tab.

      Comment 3: clarify the data presented in Supplementary Data 2 because it does not clearly explain what is shown

      This is a great point, and we have added more detail to Supplementary Data 2 to make sure the data are more clearly explained in each tab.

      Comment 4: in Figure 5B the fluorescent images do not seem to reflect the quantification in panel 5C.

      Thank you for this feedback. We re-analyzed our data to make sure the proper fluorescent images are included with their matching quantifications in Figure 5B-C.

      Comment 5: where is Supplementary Data 3?

      We thank the reviewer for noticing this. Supplementary Data 3 was accidentally missing from the first submission, and has now been added.

      Comment 6: conceptually the last results section (regarding atf-6) does not add much to the story, I would consider removing these results

      We appreciate this feedback. We have decided to keep Figure 7 because we think it helps to validate fmo-4’s role in calcium movement from the ER. While we show genetic interactions between fmo-4 and key genes involved in calcium regulation (crt-1, itr-1, and mcu-1), we think that showing how fmo-4 also interacts with atf-6, a known regulator of calcium homeostasis, strengthens and supports the genetic mechanisms of fmo-4 proposed in this manuscript.

      Comment 7: the model proposed in Figure 7E is not convincingly supported by the results:
      • the arrows connecting atf-6, fmo-4 and crt-1 (calreticulin) suggest that fmo-4 is downstream of atf-6 and upstream of crt-1: Berkowitz 2020 showed that atf-6 knockdown downregulates calreticulin, so unless the authors show that this downregulation is mediated directly by fmo-4, the more likely explanation is that atf-6 knockdown affects calcium levels which in turn induces fmo-4 expression.

      We thank the reviewer for this helpful feedback. We have addressed this by updating our proposed model. We used a solid arrow leading from the reduction of atf-6 to induction of fmo-4, as this is supported by our data in Figure 7A-B. We then used dashed arrows between fmo-4 and crt-1 as well as between atf-6 and crt-1 to indicate that more data is needed to clarify this part of the pathway.

      Comment 8: Avoid pointing at a mitochondrial connection in the title as the only evidence supporting this interaction comes from the mcu-1 RNAi epistasis.

      We appreciate the reviewer’s suggestion. We added another piece of evidence suggesting an interaction between fmo-4 and the mitochondria to Supplementary Figure 7G-H. Here we show that while fmo-4 OE worms are resistant to paraquat stress, knocking down vdac-1 (a calcium regulator located in the outer mitochondrial membrane) abrogates this effect. We have kept mitochondria in our title but have made sure to temper our language in the main text to avoid pointing to a strong mitochondrial connection, since we have only two pieces of evidence connecting fmo-4 to the mitochondria.

    1. eLife Assessment

      This important study substantially advances our understanding of the circadian clock in Antarctic krill, a key species in the Southern Ocean ecosystem. Through logistically challenging shipboard experiments conducted across seasons, the authors provide compelling evidence for their conclusions. The study will be of broad interest to marine biologists and ecologists.

    2. Reviewer #1 (Public review):

      Hüppe and colleagues had already developed an apparatus and an analytical approach to capture swimming activity rhythms in krill. In a previous manuscript they explained the system, and here they employ it to show a circadian clock, supplemented by exogenous light, produces an activity pattern consistent with "twilight" diel vertical migration (DVM; a peak at sunset, a midnight sink, and a peak in the latter half of the night).

      They used light:dark (LD) followed by dark:dark (DD) photoperiods at two times of the year to confirm the circadian clock, coupled with DD experiments at four times of year to show rhythmicity occurs throughout the year along with DVM in the wild population. The individual activity data show variability in the rhythmic response, which is expected. However, their results showed rhythmicity was sustained in DD throughout the year, although the amplitude decayed quickly. The interpretation of a weak clock is reasonable, and they provide a convincing justification for the adaptive nature of such a clock in a species that has a wide distributional range and experiences various photic environments. These data also show that exogenous light increases the activity response and can explain the morning activity bouts, with the circadian clock explaining the evening and late-night bouts. This acknowledgement that vertical migration can be driven by multiple proximate mechanisms is important.

      The work is rigorously done, and the interpretations are sound. I see no major weaknesses in the manuscript. Because a considerable amount of processing is required to extract and interpret the rhythmic signals (see Methods and previous AMAZE paper), it is informative to have the individual activity plots of krill as a gut check on the group data.

      The manuscript will be useful to the field as it provides an elegant example of looking for biological rhythms in a marine planktonic organism and disentangling the exogenous response from the endogenous one. Furthermore, as high latitude environments change, understanding how important organisms like krill have the potential to respond will become increasingly important. This work provides a solid behavioral dataset to complement the earlier molecular data suggestive of a circadian clock in this species.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript provides experimental evidence on circadian behavioural cycles in Antarctic krill. The krill were obtained directly from krill fishing vessels and the experiments were carried out on board using an advanced incubation device capable of recording activity levels over a number of days. A number of different experiments were carried out where krill were first exposed to simulated light:dark (L:D) regimes for some days followed by continuous darkness (DD). These were carried out on krill collected during late autumn and late summer. A further set of experiments was performed on krill across three different seasons (summer, autumn, winter), where incubations were all DD conditions. Activity was measured as the frequency by which an infrared beam close to the top of the incubation tube was broken over unit time. Results showed that patterns of increased and decreased activity that appeared synchronised to the LD cycle persisted during the DD period. This was interpreted as evidence of the operation of an internal (endogenous) clock. The amplitude of the behavioural cycles decreased with time in DD, which further suggests that this clock is relatively weak. The authors argued that the existence of a weak endogenous clock is an adaptation to life at high latitudes since allowing the clock to be modulated by external (exogenous) factors is an advantage when there is a high degree of seasonality. This hypothesis is further supported by seasonal DD experiments which showed that the periodicity of high and low activity levels differed between seasons.

      Strengths

      Although there have been a lot of field observations of various circadian-type behaviour in Antarctic krill, relatively few experimental studies have been published considering this behaviour in terms of circadian patterns of activity. Krill are not a model organism and obtaining them and incubating them in suitable conditions are both difficult undertakings. Furthermore, there is a need to consider what their natural circadian rhythms are without the overinfluence of laboratory-induced artefacts. For this reason alone, the setup of the present study is ideal to consider this aspect of krill biology. Furthermore, the equipment developed for measuring levels of activity is well-designed and likely to minimise artefacts.

      Weaknesses

      I have little criticism of the rationale for carrying out this work, nor of the experimental design. Nevertheless, the manuscript would benefit from a clearer explanation of the experimental design, particularly aimed at readers not familiar with research into circadian rhythms. Furthermore, I have a more fundamental question about the relationship between levels of activity and DVM on which I will expand below. Finally, it was unclear how the observational results made here related to the molecular aspects considered in the Discussion.

      (1) Explanation of experimental design - I acknowledge that the format of this particular journal insists that the Results are the first section that follows the Introduction. This nevertheless presents a problem for the reader since many of the concepts and terms that would generally be in the Methods are yet to be explained to the reader. Hence, right from the start of the Results section, the reader is thrown into the detail of what happened during the LD-DD experiments without being fully aware of why this type of experiment was carried out in the first place. Even after reading the Methods, further explanation would have been helpful. Circadian cycle type research of this sort often entrains organisms to certain light cycles and then takes the light away to see if the cycle continues in complete darkness, but this critical piece of knowledge does not come until much later (e.g. lines 369-372) leaving the reader guessing until this point why the authors took the approach they did. I would suggest the following (1) that more effort is made in the Introduction to explain the exact LD/DD protocols adopted (2) that a schematic figure is placed early on in the manuscript where the protocol is explained including some logical flow charts of e.g. if behavioural cycle continues in DD then internal clock exists versus if cycle does not continue in DD, the exogenous cues dominate - followed by - major decrease in cyclic amplitude = weak clock versus minor decrease = strong clock and so on

      (2) Activity vs kinesis - in this study, we are shown data that (i) krill have a circadian cycle - incubation experiments; (ii) that krill swarms display DVM in this region - echosounder data (although see my later point). My question here is regarding the relationship between what is being measured by the incubation experiments and the in situ swarm behaviour observations. The incubation experiments are essentially measuring the propensity of krill to swim upwards since they log the number of times an individual (or group) breaks a beam towards the top of the incubation tube. I argue that krill may be still highly active in the rest of the tube but just do not swim close to the surface, so this approach may not be a good measure of "activity". Otherwise, I suggest a more accurate term for what is being measured is the level of "upward kinesis". As the authors themselves note, krill are negatively buoyant and must always be active to remain pelagic. What changes over the day-night cycle is whether they decide to expend that activity on swimming upwards, downwards or remaining at the same depth. Explaining the pattern as upward kinesis then also explains why swarms move upwards during the night. Just being more active at night may not necessarily result in them swimming upwards.

      (3) Molecular relevance - Although I am interested in molecular clock aspects behind these circadian rhythms, it was not made clear how the results of the present study allow any further insight into this. In lines 282 to 284, the findings of the study by Biscontin et al (2017) are discussed with regard to how TIM protein is degraded by light via the clock photoreceptor CRYPTOCHROME 1. This element of the Discussion would be a lot more relevant if the results of the present study were considered in terms of whether they supported or refuted this or any other molecular clock model. As it stands, this paragraph is purely background knowledge and a candidate for deletion in the interest of shortening the Discussion.
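
      The logical flow suggested under point (1) above can be written out as a minimal sketch. It is illustrative only and is not taken from the manuscript: the function name, the amplitude ratio input, and the 0.5 cut-off between a "major" and a "minor" decrease are all assumptions made for the example.

```python
def classify_clock(rhythm_persists_in_dd: bool,
                   amplitude_ratio: float,
                   weak_threshold: float = 0.5) -> str:
    """Classify the outcome of an LD-to-DD experiment (hypothetical helper).

    rhythm_persists_in_dd: whether the behavioural cycle continues in constant darkness.
    amplitude_ratio: rhythm amplitude in DD divided by amplitude in LD.
    weak_threshold: assumed cut-off; ratios below it count as a major decrease.
    """
    if not rhythm_persists_in_dd:
        # No free-running rhythm: the cycle depends on external cues.
        return "exogenous cues dominate"
    if amplitude_ratio < weak_threshold:
        # Rhythm persists but damps strongly in DD.
        return "weak endogenous clock"
    # Rhythm persists with only a minor loss of amplitude.
    return "strong endogenous clock"
```

      Under these assumptions, a rhythm that persists in DD at 30% of its LD amplitude would be labelled a weak endogenous clock, which matches the interpretation the reviews attribute to the authors.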

      Other aspects

      (i) 'Bimodal swimming' was used in the Abstract and later in the text without the term being fully explained. I could interpret it to mean a number of things so some explanation is required before the term is introduced.

      (ii) Midnight sinking - I was struck by Figure 2b with regards to the dip in activity after the initial ascent, as well as the rise in activity predawn. Cushing (1951) Biol Rev 26: 158-192 describes the different phases of a DVM common to a number of marine organisms observed in situ where there is a period of midnight sinking following the initial dusk ascent and a dawn rise prior to dawn descent. Tarling et al (2002) observe midnight sinking pattern in Calanus finmarchicus and consider whether it is a response to feeding satiation or predation avoidance (i.e. exogenous factors). Evidence from the present study indicates that midnight sinking (and potential dawn rise) behaviour could alternatively be under endogenous control to a greater or lesser degree. This is something that should certainly be mentioned in the Discussion, possibly in place of the molecular discussion element mentioned above - possibly adding to the paragraph Lines 303-319.

      (iii) Lines 200-207 - I struggled to follow this argument regarding Piccolin et al identifying a 12 h rhythm whereas the present study indicates a ~24 h rhythm. Is one contradicting the other - please make this clear.

      (iv) Although I agree that the hydroacoustic data should be included and is generally supportive of the results, I think that two further aspects should be made clear for context (a) whether there was any groundtruthing that the acoustic marks were indeed krill and not potentially some other group known to perform DVM such as myctophids (b) how representative were these patterns - I have a sense that they were heavily selected to show only ones with prominent DVM as opposed to other parts of the dataset where such a pattern was less clear - I am aware of a lot of krill research where DVM is not such a clear pattern and it is disingenuous to provide these patterns as the definitive way in which krill behaves. I ask this be made clear to the reader (note also that there is a suggestion of midnight sinking in Fig 5b on 28/2).

    4. Author response:

      Reviewer #1 (Public review):  

      Hüppe and colleagues had already developed an apparatus and an analytical approach to capture swimming activity rhythms in krill. In a previous manuscript they explained the system, and here they employ it to show a circadian clock, supplemented by exogenous light, produces an activity pattern consistent with "twilight" diel vertical migration (DVM; a peak at sunset, a midnight sink, and a peak in the latter half of the night). 

      They used light:dark (LD) followed by dark:dark (DD) photoperiods at two times of the year to confirm the circadian clock, coupled with DD experiments at four times of year to show rhythmicity occurs throughout the year along with DVM in the wild population. The individual activity data show variability in the rhythmic response, which is expected. However, their results showed rhythmicity was sustained in DD throughout the year, although the amplitude decayed quickly. The interpretation of a weak clock is reasonable, and they provide a convincing justification for the adaptive nature of such a clock in a species that has a wide distributional range and experiences various photic environments. These data also show that exogenous light increases the activity response and can explain the morning activity bouts, with the circadian clock explaining the evening and late-night bouts. This acknowledgement that vertical migration can be driven by multiple proximate mechanisms is important. 

      The work is rigorously done, and the interpretations are sound. I see no major weaknesses in the manuscript. Because a considerable amount of processing is required to extract and interpret the rhythmic signals (see Methods and previous AMAZE paper), it is informative to have the individual activity plots of krill as a gut check on the group data. 

      The manuscript will be useful to the field as it provides an elegant example of looking for biological rhythms in a marine planktonic organism and disentangling the exogenous response from the endogenous one. Furthermore, as high latitude environments change, understanding how important organisms like krill have the potential to respond will become increasingly important. This work provides a solid behavioral dataset to complement the earlier molecular data suggestive of a circadian clock in this species. 

      We appreciate the positive evaluation of our work by Reviewer 1, acknowledging our approach to record locomotor activity in krill as well as the importance of the findings in assessing krill’s potential to respond to environmental change in their habitat.  

      Reviewer #2 (Public review):  

      Summary: 

      This manuscript provides experimental evidence on circadian behavioural cycles in Antarctic krill. The krill were obtained directly from krill fishing vessels and the experiments were carried out on board using an advanced incubation device capable of recording activity levels over a number of days. A number of different experiments were carried out where krill were first exposed to simulated light:dark (L:D) regimes for some days followed by continuous darkness (DD). These were carried out on krill collected during late autumn and late summer. A further set of experiments was performed on krill across three different seasons (summer, autumn, winter), where incubations were all DD conditions. Activity was measured as the frequency by which an infrared beam close to the top of the incubation tube was broken over unit time. Results showed that patterns of increased and decreased activity that appeared synchronised to the LD cycle persisted during the DD period. This was interpreted as evidence of the operation of an internal (endogenous) clock. The amplitude of the behavioural cycles decreased with time in DD, which further suggests that this clock is relatively weak. The authors argued that the existence of a weak endogenous clock is an adaptation to life at high latitudes since allowing the clock to be modulated by external (exogenous) factors is an advantage when there is a high degree of seasonality. This hypothesis is further supported by seasonal DD experiments which showed that the periodicity of high and low activity levels differed between seasons. 

      Strengths 

      Although there have been a lot of field observations of various circadian type behaviour in Antarctic krill, relatively few experimental studies have been published considering this behaviour in terms of circadian patterns of activity. Krill are not a model organism and obtaining them and incubating them in suitable conditions are both difficult undertakings. Furthermore, there is a need to consider what their natural circadian rhythms are without the overinfluence of laboratory-induced artefacts. For this reason alone, the setup of the present study is ideal to consider this aspect of krill biology.

      Furthermore, the equipment developed for measuring levels of activity is well-designed and likely to minimise artefacts. 

      We would like to thank Reviewer 2 for their positive assessment of our approach to study the influence of the circadian clock on krill behavior. We are delighted that Reviewer 2 found our mechanistic approach to understanding daily behavioral patterns of Antarctic krill using the AMAZE set-up convincing, and that the challenging circumstances of working with a polar, non-model species are acknowledged.

      Weaknesses 

      I have little criticism of the rationale for carrying out this work, nor of the experimental design. Nevertheless, the manuscript would benefit from a clearer explanation of the experimental design, particularly aimed at readers not familiar with research into circadian rhythms. Furthermore, I have a more fundamental question about the relationship between levels of activity and DVM on which I will expand below. Finally, it was unclear how the observational results made here related to the molecular aspects considered in the Discussion. 

      (1) Explanation of experimental design - I acknowledge that the format of this particular journal insists that the Results are the first section that follows the Introduction. This nevertheless presents a problem for the reader since many of the concepts and terms that would generally be in the Methods are yet to be explained to the reader. Hence, right from the start of the Results section, the reader is thrown into the detail of what happened during the LD-DD experiments without being fully aware of why this type of experiment was carried out in the first place. Even after reading the Methods, further explanation would have been helpful. Circadian cycle type research of this sort often entrains organisms to certain light cycles and then takes the light away to see if the cycle continues in complete darkness, but this critical piece of knowledge does not come until much later (e.g. lines 369-372), leaving the reader guessing until this point why the authors took the approach they did. I would suggest the following: (1) that more effort is made in the Introduction to explain the exact LD/DD protocols adopted; (2) that a schematic figure is placed early on in the manuscript where the protocol is explained, including some logical flow charts of e.g. if behavioural cycle continues in DD then internal clock exists versus if cycle does not continue in DD, the exogenous cues dominate - followed by - major decrease in cyclic amplitude = weak clock versus minor decrease = strong clock and so on.

      We would like to thank Reviewer 2 for pointing out that the experimental design and the rationale behind it do not become clear early in the manuscript, especially for people outside the field of chronobiology. We think that the suggestion to include a schematic figure early in the manuscript is excellent and we plan to implement this in a revised version of the manuscript.

      (2) Activity vs kinesis - in this study, we are shown data that (i) krill have a circadian cycle - incubation experiments; (ii) that krill swarms display DVM in this region - echosounder data (although see my later point). My question here is regarding the relationship between what is being measured by the incubation experiments and the in situ swarm behaviour observations. The incubation experiments are essentially measuring the propensity of krill to swim upwards since it logs the number of times an individual (or group) breaks a beam towards the top of the incubation tube. I argue that krill may be still highly active in the rest of the tube but just do not swim close to the surface, so this approach may not be a good measure of "activity". Otherwise, I suggest a more correct term of what is being measured is the level of "upward kinesis". As the authors themselves note, krill are negatively buoyant and must always be active to remain pelagic. What changes over the day-night cycle is whether they decide to expend that activity on swimming upwards, downwards or remaining at the same depth. Explaining the pattern as upward kinesis then also explains why swarms move upwards during the night. Just being more active at night may not necessarily result in them swimming upwards.

      We believe that there is a slight misunderstanding about how what we call “activity” is measured. The experimental columns are equipped with five detector modules, evenly distributed over the height of the column. In our analysis we count all beam breaks that are caused by upward movement, i.e. every time a detector module is triggered after a detector module at a lower position has been triggered, and not only when the top detector module is triggered. In this way, we record upward swimming movements throughout the column, and not only when the krill swims all the way to the top of the column. This still means that what we are measuring is swimming activity, caused by upward swimming. We use this measure to deliberately separate increased swimming activity from baseline activity (i.e. swimming which solely compensates for negative buoyancy) and inactivity (i.e. passive sinking).

      A higher activity is thus at first interpreted as an increase in swimming activity, which in the field may result in upwards directed swimming but also could mean a horizontal increase in activity, for example representing increased foraging and feeding activity. This would explain the daily activity pattern observed under LD cycles (Fig. 2), which shows a general increase in activity during the dark phase. This nighttime increase could be used for both upward directed migration during sunset as well as horizontal directed swimming for feeding and foraging throughout the night.

      We will formulate the description of the activity metric more clearly in the revised version of the manuscript.
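
      As a minimal sketch of the counting rule described above (not the authors' actual analysis code), the following assumes that each column yields a time-ordered list of detector positions numbered 0 (lowest) to 4 (highest), and that "after a lower detector" refers to the immediately preceding trigger. The function name and the input format are hypothetical.

```python
from typing import Optional, Sequence


def count_upward_beam_breaks(triggers: Sequence[int]) -> int:
    """Count triggers that follow a trigger at a lower detector position.

    `triggers` is a time-ordered sequence of detector positions
    (assumed numbering: 0 = lowest module, 4 = highest module).
    Repeated triggers at the same position and downward transitions
    are not counted.
    """
    count = 0
    previous: Optional[int] = None
    for position in triggers:
        # An upward movement is a trigger higher than the previous one.
        if previous is not None and position > previous:
            count += 1
        previous = position
    return count


# Hypothetical trigger sequence for one krill in one time bin.
example = [0, 1, 2, 1, 1, 3, 4, 0]
print(count_upward_beam_breaks(example))
```

      Under these assumptions, upward transitions anywhere in the column contribute to the count, so a krill that moves between the lower modules without reaching the top module would still register activity.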

      (3) Molecular relevance - Although I am interested in molecular clock aspects behind these circadian rhythms, it was not made clear how the results of the present study allow any further insight into this. In lines 282 to 284, the findings of the study by Biscontin et al (2017) are discussed with regard to how TIM protein is degraded by light via the clock photoreceptor CRYPTOCHROME 1. This element of the Discussion would be a lot more relevant if the results of the present study were considered in terms of whether they supported or refuted this or any other molecular clock model. As it stands, this paragraph is purely background knowledge and a candidate for deletion in the interest of shortening the Discussion.

      We agree that this part is not directly related to the data presented in the manuscript and will therefore omit this part in the revised version of the manuscript to keep the discussion concise and focused on the results. 

      Other aspects 

      (i) 'Bimodal swimming' was used in the Abstract and later in the text without the term being fully explained. I could interpret it to mean a number of things so some explanation is required before the term is introduced. 

      We thank the Reviewer for pointing this out and will provide an explanation for the term “bimodal swimming” in a revised version of the manuscript. 

      (ii) Midnight sinking - I was struck by Figure 2b with regards to the dip in activity after the initial ascent, as well as the rise in activity predawn. Cushing (1951) Biol Rev 26: 158-192 describes the different phases of a DVM common to a number of marine organisms observed in situ where there is a period of midnight sinking following the initial dusk ascent and a dawn rise prior to dawn descent. Tarling et al (2002) observe a midnight sinking pattern in Calanus finmarchicus and consider whether it is a response to feeding satiation or predation avoidance (i.e. exogenous factors). Evidence from the present study indicates that midnight sinking (and potential dawn rise) behaviour could alternatively be under endogenous control to a greater or lesser degree. This is something that should certainly be mentioned in the Discussion, possibly in place of the molecular discussion element mentioned above - possibly adding to the paragraph Lines 303-319.

      We would like to thank the Reviewer for pointing this out and agree that it would be interesting to add the idea of an endogenous control of midnight sinking to the discussion. We plan to implement this in a revised version of the manuscript. 

      (iii) Lines 200-207 - I struggled to follow this argument regarding Piccolin et al identifying a 12 h rhythm whereas the present study indicates a ~24 h rhythm. Is one contradicting the other - please make this clear. 

      In our study we found that the circadian clock drives a bimodal pattern of swimming activity in krill, meaning it controls two bouts of activity in a 24 h cycle. Piccolin et al. (2020) identified a swimming activity pattern of ~12 h (i.e. two peaks in 24 h) at the group level, which is in line with our findings at the individual level. We will revisit the mentioned section for more clarity in a revised version.   

      (iv) Although I agree that the hydroacoustic data should be included and is generally supportive of the results, I think that two further aspects should be made clear for context (a) whether there was any groundtruthing that the acoustic marks were indeed krill and not potentially some other group known to perform DVM such as myctophids (b) how representative were these patterns - I have a sense that they were heavily selected to show only ones with prominent DVM as opposed to other parts of the dataset where such a pattern was less clear - I am aware of a lot of krill research where DVM is not such a clear pattern and it is disingenuous to provide these patterns as the definitive way in which krill behaves. I ask this be made clear to the reader (note also that there is a suggestion of midnight sinking in Fig 5b on 28/2).

      To clarify the mentioned points concerning the hydroacoustic data:

      a) As mentioned in the Methods section, only hydroacoustic data during active fishing was included in the analysis. E. superba occurs in large monospecific aggregations and the fishery is actively targeting E. superba and monitoring their catch and the proportion of non-target species continuously with cameras. Krill fishery bycatch rates are very low (0.1–0.3%, Krafft et al. 2018), and fishing operations would stop if non-target species were being caught in significant proportions at any time. Therefore, and supported by our own observations when we conducted the experiments, we argue that it is a valid assumption that the backscattering signal shown in Figure 5 is predominantly caused by E. superba. 

      b) We are aware of the fact that DVM patterns of Antarctic krill are highly variable and that normal DVM patterns do not need to be the rule (e.g. see our cited study on the plasticity of krill DVM by Bahlburg et al. 2023). The visualized data were not selected for their DVM pattern but represent the period directly preceding the sampling for behavioral experiments in four different seasons (namely S1-S4), including the day of sampling. These periods were chosen to assess the DVM behavior of krill swarms in the field in the days before and during the sampling for behavioral experiments. 

      We will include these aspects in the Methods section in a revised version of the manuscript in order to improve understanding.

    1. eLife Assessment

      In this study, the authors present compelling data illustrating a potential mechanism for a hitherto not described form of extracellular vesicle biogenesis. Their model suggests that small extracellular vesicles are secreted from cells within larger vesicles, termed amphiectosomes, which subsequently rupture to release their smaller vesicle contents. This discovery represents an important advancement in the field.

    2. Reviewer #1 (Public review):

      Summary:

      The authors' research group had previously demonstrated the release of large multivesicular body-like structures by human colorectal cancer cells. This manuscript expands on their findings, revealing that this phenomenon is not exclusive to colorectal cancer cells but is also observed in various other cell types, including different cultured cell lines, as well as cells in the mouse kidney and liver. Furthermore, the authors argue that these large multivesicular body-like structures originate from intracellular amphisomes, which they term "amphiectosomes." These amphiectosomes release their intraluminal vesicles (ILVs) through a "torn-bag mechanism." Finally, the authors demonstrate that the ILVs of amphiectosomes are either LC3B positive or CD63 positive. This distinction implies that the ILVs either originate from amphisomes or multivesicular bodies, respectively.

      Strengths:

      The manuscript reports a potential origin of extracellular vesicle (EV) biogenesis. The reported observations are intriguing.

      Weaknesses:

      In their revised version, the authors have addressed the majority of my criticisms. I have no further concerns regarding this manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      The authors had previously identified that a colorectal cancer cell line generates small extracellular vesicles (sEVs) via a mechanism where a larger intracellular compartment containing these sEVs is secreted from the surface of the cell and then tears to release its contents. Previous studies had suggested that intraluminal vesicles (ILVs) inside endosomal multivesicular bodies and amphisomes can be secreted by fusion of the compartment with the plasma membrane. The 'torn bag mechanism' considered in this manuscript is distinctly different, because it involves initial budding off of a plasma membrane-enclosed compartment (called the amphiectosome in this manuscript, or MV-lEV). The authors successfully set out to investigate whether this mechanism is common to many cell types and to determine some of the subcellular processes involved.

      The strengths of the study are:

      (1) The high-quality imaging approaches used, including live-cell imaging and EM, which seem to show good examples of the proposed mechanism.

      (2) They screen several cell lines for these structures, also search for similar structures in vivo, and show the tearing process by real-time imaging.

      (3) Regarding the intracellular mechanisms of ILV production, the authors also try to demonstrate the different stages of amphiectosome production and differently labelled ILVs using immuno-EM.

      Several of the techniques employed are technically challenging to do well, and so these are critical strengths of the manuscript.

      Overall, I think the authors have been successful in identifying amphiectosomes secreted from multiple cell lines and cells in vivo, and in demonstrating that the ILVs inside them have at least two origins (autophagosome membrane and late endosomal multivesicular body) based on the markers that they carry. Inevitably, it remains unclear how universal this mechanism is in vivo and its overall contribution to EV function.

      I think there could be a significant impact on the EV field and consequently on our understanding of cell-cell signalling based on these findings. It will flag the importance of investigating the release of amphiectosomes in other studies, especially as the molecular mechanisms involved in this type of 'ectosomal-style' release will be different from multivesicular compartment fusion to the plasma membrane and it should be possible to manipulate them independently.

      In general, the EV field has struggled to link up analysis of the subcellular biology of sEV secretion and the biochemical/physical analysis of the sEVs themselves, so from that perspective, the manuscript provides a novel angle on this problem.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors' research group had previously demonstrated the release of large multivesicular body-like structures by human colorectal cancer cells. This manuscript expands on their findings, revealing that this phenomenon is not exclusive to colorectal cancer cells but is also observed in various other cell types, including different cultured cell lines, as well as cells in the mouse kidney and liver. Furthermore, the authors argue that these large multivesicular body-like structures originate from intracellular amphisomes, which they term "amphiectosomes." These amphiectosomes release their intraluminal vesicles (ILVs) through a "torn-bag mechanism." Finally, the authors demonstrate that the ILVs of amphiectosomes are either LC3B positive or CD63 positive. This distinction implies that the ILVs either originate from amphisomes or multivesicular bodies, respectively.

      Strengths:

      The manuscript reports a potential origin of extracellular vesicle (EV) biogenesis. The reported observations are intriguing.

      Weaknesses:

      It is essential to note that the manuscript has issues with experimental designs and lacks consistency in the presented data. Here is a list of the major concerns:

      (1) The authors culture the cells in the presence of fetal bovine serum (FBS) in the culture medium. Given that FBS contains a substantial amount of EVs, this raises a significant issue, as it becomes challenging to differentiate between EVs derived from FBS and those released by the cells. This concern extends to all transmission electron microscopy (TEM) images (Figure 1, 2P-S, S5, Figure 4 P-U) and the quantification of EV numbers in Figure 3. The authors need to use an FBS-free cell culture medium.

      Although FBS indeed contains bovine EVs, the presence in FBS of the very large multivesicular EVs (amphiectosomes) that our manuscript focuses on has never been observed or reported. For reported size distributions of EVs in FBS, please find a few relevant references below:

      PMID: 29410778, PMID: 33532042, PMID: 30940830 and PMID: 37298194

      All the above publications show that the number of lEVs > 350-500 nm is negligible in FBS. The average diameter of the MV-lEVs (amphiectosomes) described in our manuscript is around 1.00-1.50 micrometers.

      Reviewer #1: These papers evaluated the effectiveness of various methods to eliminate EVs from FBS, emphasizing the challenges associated with the presence of EVs in FBS. They also caution against using FBS in EV studies due to these issues. However, I did not find a clear indication regarding the size distributions of EVs in FBS in these papers.

      Please provide an accurate reference supporting the claim that 'lEVs > 350-500 nm are negligible in FBS.' The papers cited by the authors do not address this specific point.

      In the revised manuscript, we addressed the point that, due to sterile filtering, FBS cannot contain large (>0.22 µm) EVs.

      Our response to Reviewer #1 point 2. When we demonstrated the TEM of isolated EVs, we consistently used serum-free conditioned medium (Fig2 P-S, Fig2S5 J, O) as described previously (Németh et al 2021, PMID: 34665280).

      Reviewer #1: This is an important point that is not mentioned in the original main text, figure legend or method. Please address.

      We agree and we apologize for it. We added this information to the revised manuscript.

      Our response to Reviewer #1 point 3. Our TEM images show cells captured in the process of budding and scission of large multivesicular EVs excluding the possibility that these structures could have originated from FBS.

      Reviewer #1: These images may also depict the engulfment of EVs in FBS. Hence, it is crucial to utilize EV-free or EV-depleted FBS.

      As we mentioned earlier, we added the information to the revised manuscript that sterile filtering of the FBS presumably removed particles >0.22 µm, including EVs.

      Our response to Reviewer #1 point 4. In addition, in our confocal analysis, we studied Palm-GFP positive, cell-line derived MV-lEVs. Importantly, in these experiments, FBS-derived EVs are non-fluorescent, therefore, the distinction between GFP positive MV-lEVs and FBS-derived EVs was evident.

      Reviewer #1: I agree that these fluorescent-labeled assays conclusively indicate that the MV-lEVs are originating from the cells. However, the images of concern are the non-fluorescent-labeled images (Figure 1, 2P-S, S5, Figure 4 P-U and Figure 3). The MV-lEVs may derive from both the cells and FBS.

      Please see above our response to points 1-3.

      Our response to Reviewer #1 point 5. In addition, culturing cells in FBS-free medium (serum starvation) significantly affects autophagy. Given that in our study, we focused on autophagy-related amphiectosome secretion, we intentionally chose to use FBS supplemented medium.

      Reviewer #1: If this is a concern, the authors should use EV-depleted FBS.

      As we discussed above, sterile filtration of FBS removes particles >0.22 µm. In addition, based on our preliminary experiments, EV-depleted serum may affect cell physiology.

      Our response to Reviewer #1 point 6. Even though the authors of this manuscript are not familiar with the technological details of how FBS is processed before commercialization, it is reasonable to assume that the samples are subjected to sterile filtration (through a 0.22 micron filter) after which MV-lEVs cannot be present in the commercial FBS samples.

      Reviewer #1: This is a fair comment that needs to be included in the manuscript.

      As you suggested, this comment is now included in the revised manuscript.

      (2) The data presented in Figure 2 is not convincingly supportive of the authors' conclusion. The authors argue that "...CD81 was present in the plasma membrane-derived limiting membrane (Figures 2B, D, F), while CD63 was only found inside the MV-lEVs (Fig. 2A, C, E)." However, in Figure 2G, there is an observable CD63 signal in the limiting membrane (overlapping with the green signals), and in Figure 2J, CD81 also exhibits overlap with MV-lEVs.

      Both CD63 and CD81 are tetraspanins known to be present both in the membrane of sEVs and in the plasma membrane of cells (for references, please see Uniprot subcellular location maps: https://www.uniprot.org/uniprotkb/P08962/entry#subcellular_location https://www.uniprot.org/uniprotkb/P60033/entry#subcellular_location). However, according to the feedback of the reviewer, for clarity, we will delete the implicated sentence from the text.

      Reviewer #1: Please also justify the statement questioned in (3) as these arguments are interconnected.

      We hope you find our above responses to your comment acceptable.

      (3) Following up on the previous concern, the authors argue that CD81 and CD63 are exclusively located on the limiting membrane and MV-lEVs, respectively (Figure 2-A-M). However, in lines 104-106, the authors conclude that "The simultaneous presence of CD63, CD81, TSG101, ALIX, and the autophagosome marker LC3B within the MV-lEVs..." This statement indicates that CD63 and CD81 co-localize to the MV-lEVs. The authors need to address this apparent discrepancy and provide an explanation.

      There must be a misunderstanding because we did not claim or imply in the text that “CD81 and CD63 are exclusively located on the limiting membrane and MV-lEVs”. Here we studied co-localization of the above proteins in the case of intraluminal vesicles (ILVs). In Fig. 2, we did not show any analysis of limiting membrane co-localization.

      Reviewer #1: I have indicated that this statement is found in lines 104-106, where the authors argue, 'The simultaneous presence of CD63, CD81, TSG101, ALIX, and the autophagosome marker LC3B within the MV-lEVs...' If the authors acknowledge the inaccuracy of this statement, please provide a justification for this argument.

      For clarity, we modified the description of data shown in Fig2 in the revised manuscript.

      (4) The specificity of the antibodies used in Figure 2 should be validated through knockout or knockdown experiments. Several of the antibodies used in this figure detect multiple bands on western blots, raising doubts about their specificity. Verification through additional experimental approaches is essential to ensure the reliability and accuracy of all the immunostaining data in this manuscript.

      We will consider this suggestion during the revision of the manuscript.

      Reviewer #1: Please do so.

      We carefully considered the suggestion, but we realized that it was not feasible for us to perform gene silencing for all the antibodies we used before resubmission of our revised manuscript. However, we repeated the Western blot for mouse anti-CD81 (Invitrogen MAA5-13548) and replaced the previous Western blot with it in the revised manuscript (Fig.2-S4H).

      (5) In Figures 2P-R, the morphology of the MV-lEVs does not resemble those shown in Figures 1-A, H, and D, indicating a notable inconsistency in the data.

      EM images in Figure 2P-R show sEVs separated from serum-free conditioned media as opposed to MV-lEVs, which were captured in situ in fixed tissue cultures (Fig. 1). Therefore, the two EV populations necessarily have different size and structure. Furthermore, Fig. 1 shows images of ultrathin sections while in Figure 2P-R, we used negative-positive contrasting of intact sEVs without embedding and sectioning.

      (6) There are no loading controls provided for any of the western blot data.

      Not even the latest MISEV 2023 guidelines give recommendations for proper loading control for separated EVs in Western blot (MISEV 2023, DOI: 10.1002/jev2.12404, PMID: 38326288). Here we applied our previously developed method (PMID: 37103858), which in our opinion, is the most reliable approach to be used for sEV Western blotting. For whole cell lysates, we used actin as loading control (Fig3-S2B).

      Reviewer #1: The blots referenced here (Fig2-S3; Fig2-S4B; Fig3-S2B) were conducted using total cell lysates, not EV extracts. Only one blot in Fig3-S2B includes an actin control. All remaining blots should incorporate actin controls for consistency.

      Fig2-S3 (corresponding to Fig2-S4 in the revised manuscript) only shows reactivity of the used antibodies. This Western blot is not intended to serve as a basis of any quantitative conclusions. Fig2-S4 (corresponding to Fig2-S5 in the revised manuscript) includes the actin control. Fig3-S2B shows the complete membrane, which was cut into 4 pieces, and the immune reactivity of different antibodies was tested. The actin band was included on the anti-LC3B blot. For clarity, we rephrased the figure legend.

      Additionally, for Figures 2-S4B, the authors should run the samples from lanes i-iii in a single gel.

      Please note that in Figure 2- S4B, we did run a single gel, and the blot was cut into 4 pieces, which were tested by anti-GFP, anti-RFP, anti-LC3A and anti-LC3B antibodies. Full Western blots are shown in Fig.3_S2 B, and lanes “1”, “2” and “3” correspond to “i”, “ii” and “iii” in Fig.2-S4, respectively.

      Reviewer #1: In the original Figure 2- S4B, the blots were sectioned into 12 pieces. If lanes "i," "ii," and "iii" were run on the same blot, the authors are advised to eliminate the grids between these lanes.

      Grids separating the lanes have been eliminated on Fig.2_S4 (now Fig.2_S5 in the revised manuscript).

      (7) In Figure 2-S4, is there co-localization observed between LC3RFP (LC3A?) with other MV-lEV markers? How about LC3B? Does LC3B co-localize with other MV-lEV markers?

      In Supplementary Figure 2-S4, we showed successful generation of HEK293T-PalmGFP-LC3RFP cell line. In this case we tested the cells, and not the released MV-lEVs. LC3A co-localized with the RFP signal as expected.

      Reviewer #1: Does LC3RFP colocalize with MV-lEV markers in HEK293T-PalmGFP-LC3RFP cell line? This experiment aims to clarify the conclusion made in lines 104-106, where the authors assert that 'The concurrent existence of CD63, CD81, TSG101, ALIX, and the autophagosome marker LC3B within the MV-lEVs...'

      In the case of PalmGFP-LC3RFP cells, LC3-RFP is overexpressed. Simultaneous assessment of this overexpressed protein with non-overexpressed, fluorescent antibody-detected molecules proved to be challenging because of spectral overlaps and inappropriate signal-to-noise ratios. Furthermore, in association with EVs, the number of antibody-detected molecules is substantially lower than in cells. Therefore, even though we tried, we could not successfully perform these experiments.

      (8) The TEM images presented in Figure 2-S5, specifically F, G, H, and I, do not closely resemble the images in Figure 2-S5 K, L, M, N, and O. Despite this dissimilarity, the authors argue that these images depict the same structures. The authors should provide an explanation for this observed discrepancy to ensure clarity and consistency in the interpretation of the presented data.

      As indicated in Materials and Methods, Fig 2-S5 F, G, H and I are conventional TEM images of samples fixed with 4% glutaraldehyde and 1% OsO4 for 2 h and embedded into Epon resin, with post-contrasting by 3.75% uranyl acetate for 10 min and lead citrate for 12 min. Samples processed this way have very high structure preservation and better image quality; however, they are not suitable for immune detection. In contrast, Fig. 2-S5 K, L, M, N show immunogold labelling of in situ fixed samples. In this case we used milder fixation (4% PFA, 0.1% glutaraldehyde, postfixed by 0.5% OsO4 for 30 min) and LR-White hydrophilic resin embedding. This special resin enables immunogold TEM analysis. The sections were exposed to H2O2 and NaBH4 to render the epitopes accessible in the resin. Because of the different applied techniques, the preservation of the structure is not the same. In the case of Fig.2 J, O, separated sEVs were visualised by negative-positive contrast and immunogold labelling as described previously (PMID: 37103858).

      Reviewer #1: Please include this justification in the revised version.

      We included this justification in the revised manuscript.

      (9) For Figures 3C and 3-S1, the authors should include the images used for EV quantification. Considering the concern regarding potential contamination introduced by FBS (concern 1), it is advisable for the authors to employ an independent method to identify EVs, thereby confirming the reliability of the data presented in these figures.

      In our revised manuscript, we will provide all the images used for EV quantification in Figure 3C. Given that Figures 3C and 3-S1 show MV-lEVs released by HEK293T-PalmGFP cells, the possible interference by FBS-derived non-fluorescent EVs can be excluded.

      Reviewer #1: Please provide all the images.

      Original LASX files are provided (DOI: 10.6019/S-BIAD1456).

      Reviewer #1: The images raising concerns regarding the contamination of EVs in FBS primarily consist of transmission electron microscopy (TEM) images, namely, Figure 1, 2P-S, S5, and Figure 4 P-U, along with the quantification of EV numbers in Figure 3. These concerns persist despite the use of fluorescent-labeled experiments. While fluorescent-labeled MV-lEVs are conclusively identified as originating from the cells, the MV-lEVs observed in Figure 1, 2P-S, S5, and Figure 4 P-U and Figure 3 may derive from both the cells and FBS.

      Large EVs (with diameter >800 nm) derived from FBS were not present in our experiments, as discussed above.

      (10) Do the amphiectosomes released from other cell types as well as cells in mouse kidneys or liver contain LC3B positive and CD63 positive ILVs?

      Based on our confocal microscopic analysis, in addition to the HEK293T-PalmGFP cells, HT29 and HepG2 cells also release similar LC3B- and CD63-positive MV-lEVs. Preliminary evidence shows MV-lEV secretion by additional cell types.

      The response of Reviewer #1: Please show these data in the revised manuscript. Moreover, do cells in mouse kidneys or liver contain LC3B positive and CD63 positive ILVs?

      We have added new confocal microscopic images to Fig2-S3 showing amphiectosomes also released by the H9c2 (ATCC) cardiomyoblast cell line. To preserve the ultrastructure of MV-lEVs in complex organs like kidney and liver, fixation with 4% glutaraldehyde with 1% OsO<sub>4</sub> appears to be essential. This fixation does not allow for immune detection to assess LC3B- and CD63-positive MV-lEVs in the ultrathin sections.

      Reviewer #2 (Public Review):

      Summary:

      The authors had previously identified that a colorectal cancer cell line generates small extracellular vesicles (sEVs) via a mechanism where a larger intracellular compartment containing these sEVs is secreted from the surface of the cell and then tears to release its contents. Previous studies have suggested that intraluminal vesicles (ILVs) inside endosomal multivesicular bodies and amphisomes can be secreted by the fusion of the compartment with the plasma membrane. The 'torn bag mechanism' considered in this manuscript is distinctly different because it involves initial budding off of a plasma membrane-enclosed compartment (called the amphiectosome in this manuscript, or MV-lEV). The authors successfully set out to investigate whether this mechanism is common to many cell types and to determine some of the subcellular processes involved.

      The strengths of the study are:

      (1) The high-quality imaging approaches used seem to show good examples of the proposed mechanism.

      (2) They screen several cell lines for these structures, also search for similar structures in vivo, and show the tearing process by real-time imaging.

      (3) Regarding the intracellular mechanisms of ILV production, the authors also try to demonstrate the different stages of amphiectosome production and differently labelled ILVs using immuno-EM.

      Several of these techniques are technically challenging to do well, and so these are critical strengths of the manuscript.

      The weaknesses are:

      (1) Most of the analysis is undertaken with cell lines. In fact, all of the analysis involving the assessment of specific proteins associated with amphiectosomes and ILVs are performed in vitro, so it is unclear whether these processes are really mirrored in vivo. The images shown in vivo only demonstrate putative amphiectosomes in the circulation, which is perhaps surprising if they normally have a short half-life and would need to pass through an endothelium to reach the vessel lumen unless they were secreted by the endothelial cells themselves.

      Our previous results analyzing PFA-fixed, paraffin embedded sections of colorectal cancer patients provided direct evidence that MV-lEV secretion also occurs in humans in vivo (PMID: 31007874). Regarding your comment on the presence of amphiectosomes in the circulation despite their short half-lives, we would like to point out that Fig1.X shows a circulating lymphocyte which releases MV-lEV within the vessel lumen. Furthermore, in the revised manuscript, an additional Fig.1-S1 is provided. Here, we show the release of MV-lEVs both by an endothelial and a sub-endothelial cell (Fig.1-S1G). In addition, these images show the simultaneous presence of MV-lEVs and sEVs in the circulation (Fig.1-S1.A,C,D,H and I). The transmission electron micrographs of mouse kidney and liver sections provide additional evidence that the MV-lEVs are released by different types of cells, and the “torn bag release” also takes place in vivo (Fig.1.V).

      (2) The analysis of the intracellular formation of compartments involved in the secretion process (Figure 2-S5) relies on immuno-EM, which is generally less convincing than high-/super-resolution fluorescence microscopy because the immuno-labelling is inevitably very sporadic and patchy. High-quality EM is challenging for many labs (and seems to be done very well here), but high-/super-resolution fluorescence microscopy techniques are more commonly employed, and the study already shows that these techniques should be applicable to studying the intracellular trafficking processes.

      As you suggested, in the revised manuscript, we present additional super-resolution microscopy (STED) data. The intracellular formation of amphisomes, the fragmentation of LC3B-positive membranes and the formation of LC3B-positive ILVs were captured (Fig. 3B-F).

      (3) One aspect of the mechanism, which needs some consideration, is what happens to the amphisome membrane, once it has budded off inside the amphiectosome. In the fluorescence images, it seems to be disrupted, but presumably, this must happen after separation from the cell to avoid the release of ILVs inside the cell. There is an additional part of Figure 1 (Figure 1Y onwards), which does not seem to be discussed in the text (and should be), that alludes to amphiectosomes often having a double membrane.

      We agree with your comment regarding the amphisome membrane and we added a sentence to the Discussion of the revised manuscript. Fig1Y onwards is now discussed in the manuscript. In addition, we labelled the surface of living HEK293 cells with wheat germ agglutinin (WGA), which binds to sialic acid and N-acetyl-D-glucosamine. After removing the unbound WGA by washes, the cells were cultured for an additional 3 hours, and the release of amphiectosomes was studied. The budding amphiectosome had a WGA-positive membrane, providing evidence that the external limiting membrane had a plasma membrane origin (Fig.3G).

      (4) The real-time analysis of the amphiectosome tearing mechanism seemed relatively slow to me (over three minutes), and if this has been observed multiple times, it would be helpful to know if this is typical or whether there is considerable variation.

      Thank you for this comment. In the revised manuscript, we highlight that the first released LC3-positive ILV was detected within as little as 40 s.

      Overall, I think the authors have been successful in identifying amphiectosomes secreted from multiple cell lines and demonstrating that the ILVs inside them have at least two origins (autophagosome membrane and late endosomal multivesicular body) based on the markers that they carry. The analysis of intracellular compartments producing these structures is rather less convincing and it remains unclear what cells release these structures in vivo.

      I think there could be a significant impact on the EV field and consequently on our understanding of cell-cell signalling based on these findings. It will flag the importance of investigating the release of amphiectosomes in other studies, and although the authors do not discuss it, the molecular mechanisms involved in this type of 'ectosomal-style' release will be different from multivesicular compartment fusion to the plasma membrane and should be possible to be manipulated independently. Any experiments that demonstrate this would greatly strengthen the manuscript.

      We appreciate the reviewer's comments. Experiments are under way to elucidate the mechanism of the “ectosomal style” exosome release and will be the topic of our next publication.

      In general, the EV field has struggled to link up analysis of the subcellular biology of sEV secretion and the biochemical/physical analysis of the sEVs themselves, so from that perspective, the manuscript provides a novel angle on this problem.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors describe a novel mode of release of small extracellular vesicles. These small EVs are released via the rupture of the membrane of so-called amphiectosomes that resemble "morphologically" Multivesicular Bodies.

      These structures have been initially described by the authors as released by colorectal cancer cells (https://doi.org/10.1080/20013078.2019.1596668). In this manuscript, they provide experiments that allow us to generalize this process to other cells. In brief, amphiectosomes are likely released by ectocytosis of amphisomes that are formed by the fusion of multivesicular endosomes with autophagosomes. The authors propose that their model puts forward the hypothesis that LC3 positive vesicles are formed by "curling" of the autophagosomal membrane which then gives rise to an organelle where both CD63 and LC3 positive small EVs co-exist and would be released then by a budding mechanism at the cell surface that appears similar to the budding of microvesicles /ectosomes. Very correctly the authors make the distinction from migrasomes because these structures appear very similar in morphology.

      Strengths:

      The findings are interesting despite that it is unclear what would be the functional relevance of such a process and even how it could be induced. It points to a novel mode of release of extracellular vesicles.

      Weaknesses:

      This reviewer has comments and concerns concerning the interpretation of the data and the proposed model. In addition, in my opinion, some of the results in particular micrographs and immunoblots (even shown as supplementary data) are not of quality to support the conclusions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Highlight MV-lEV, ILV and limiting membrane in Figure 1G, N, and U.

      Based on the suggestion, we revised Figure 1.

      (2) Figure 1-Y-AF are not mentioned in the text.

      In the revised manuscript, we discuss Figure 1Y-AF.

      (3) The term "IEVs" in Figure 2-S2 is not defined.

      We modified the figure legend: we changed MV-lEV to amphiectosome.

      (4) Need to quantify co-localization in Figure 2-S2.

      As suggested, we carried out the co-localisation analysis (Fig2-S2I), and Fig2-S2 was re-edited.

      Reviewer #2 (Recommendations For The Authors):

      I have two recommendations for improving the manuscript through additional experiments:

      (1) I think the description of the intracellular processes taking place in order to form amphiectosomes would be much stronger if some super-resolution imaging could be undertaken. This should label the different compartments before and after fusion with specific markers that highlight the protein signature of the different limiting and ILV membranes much more clearly than immuno-EM. It will also help in characterising the double-membrane structure of amphiectosomes at the point of budding and reveal whether the patchy labelling of the inner membrane emerges after amphiectosome release (the schematic model currently suggests that it happens before).

      Thank you for your suggestion. STED microscopy was applied and results are shown in new Fig3 and the schematic model was modified accordingly.

      (2) The implications of the manuscript would be more wide-ranging if the authors could test genetic manipulations that are believed to block exosome or ectosome release, eg. Rab27a or Arrdc1 knockdown. This may allow them to determine whether MV-lEVs can be released independently of the classical exosome release mechanism because they use a different route to be released from the plasma membrane. This experiment is not essential, but I think it would start to address the core regulatory mechanisms involved, and if successful, would easily allow the authors to determine the ratio of CD63-positive sEVs being secreted via classical versus amphiectosome routes.

      The suggestion is very valuable for us and these studies are being performed in a separate project.

      I think there are several other ways in which the manuscript could be improved to better explain some of the approaches, findings and interpretation:

      (1) Include some explanation in the text of certain key tools, particularly:

      a. Palm-GFP and whether its expression might alter the properties of the plasma membrane since this is used in a lot of experiments and is the only marker that seems to uniformly label the outer membrane of amphiectosomes. One concern might be that its expression drives amphiectosome secretion.

      We also found evidence for amphiectosome release in several different cell types that do not express Palm-GFP. We believe this excludes the possibility that Palm-GFP expression is the inducer of amphiectosome release. By both fluorescence and electron microscopy, the cells not expressing Palm-GFP showed very similar MV-lEVs. In addition, we made similar observations with non-transduced HEK293 cells labelled with fluorescent WGA.

      b. Lactadherin - does this label the amphiectosomes after their release or does the wash-off step mean that it only labels cells, which subsequently release amphiectosomes?

      Lactadherin labels the amphiectosomes after their release and fixation. Living cells cannot be labelled by lactadherin because phosphatidylserine (PS) is absent from the external plasma membrane layer of living cells. We used WGA on HEK293 cells to further support the plasma membrane origin of the external membrane of amphiectosomes.

      (2) Explain the EM and confocal imaging approaches more clearly. Most importantly, is a 3D reconstruction always involved to confirm that 'separated' amphiectosomes are not joined to cells in another Z-plane.

      Thank you for your suggestion. We have modified the manuscript accordingly.

      (3) Presenting triple-labelled images with red, green and yellow channels does not allow individual labelling to be determined without single-channel images and even then, it is much more informative to use three distinguishable colours that make a different colour with overlap, eg. CMY? Fig.2_S2D and E do not display individual channels, so definitely need to be changed.

      In the case of Fig.2_S2D, we now show the individual channels; the earlier E image has been removed. For the STED images, CMY colors were used, as you suggested.

      (4) Please discuss in the text the data in Figure 1Y onwards concerning single/double membranes on MV-lEVs.

      In the revised manuscript, we discuss the question of single/double membranes and we refer to Figure 1Y-AF.

      (5) On line 162, reword 'intraluminal TSPAN4 only' to 'one in which TSPAN4 is only intraluminal' to make it clear that other proteins are also marking the intraluminal region, not TSPAN4 only.

      We modified the text accordingly.

      (6) Points for further discussion and further conclusions:

      a. In vivo experiments - discuss the limitations of this part of the analysis - it seems that none of the amphiectosome markers have been analysed in this part of the study and the MV-lEVs are only in the circulation.

      b. Can the authors give any further indication of the levels of MV-lEVs relative to free sEVs from any of their studies?

      Using our current approach, it is not possible to determine the levels of MV-lEVs relative to free sEVs. Without analyzing serial ultrathin sections, determination of the relative ratio of MV-lEVs and sEVs would depend on the actual section plane. In future projects, we will determine the ratio of LC3-positive and LC3-negative sEVs by single EV analysis techniques (such as SP-IRIS). In the revised manuscript, additional TEM images are included to provide evidence for the simultaneous presence of sEVs and MV-lEVs, and for the presence of MV-lEVs both inside and outside of the circulation.

      c. Please discuss the single versus double membrane issue (relating to experiments proposed above).

      We discuss this question in more detail in the revised manuscript.

      d. Please point out that the release mechanism (plasma membrane budding) will involve different molecular mechanisms to establish exosome release, and this might provide a route to determine relative importance.

      We are currently running a systematic analysis of the release mechanism of amphiectosomes, and this will be the topic of a separate manuscript.

      Reviewer #3 (Recommendations For The Authors):

      * The model is not supported.

      * The data is not of quality.

      * The appropriate methods are not exploited.

      We are sorry, we cannot respond to these unsupported critiques.

    1. eLife Assessment

      This important study showing that sleep deprivation increases functional synapses while depleting silent synapses supports previous findings that excitatory signaling increases during wakefulness. This manuscript focuses in particular on AMPA/NMDA ratios. An interesting, although speculative, aspect of the manuscript is the inclusion of a model for the accumulation of sleep need that is based upon the MEF2C transcription factor but also links to the sleep-regulating SIK3-HDAC4/5 pathway. The authors have clarified some questions raised in the previous review, rendering this a solid piece of work that poses questions for future studies.

    2. Reviewer #2 (Public review):

      Summary:

      Here Vogt et al., provide new insights into the need for sleep and the molecular and physiological response to sleep loss. The authors expand on their previously published work (Bjorness et al., 2020) and draw from recent advances in the field to propose a neuron-centric molecular model for the accumulation and resolution of sleep need and basis of restorative sleep function. While speculative, the proposed model successfully links important observations in the field and provides a framework to stimulate further research and advances on the molecular basis of sleep function. In my review, I highlight the important advances of this current work, the clear merits of the proposed model, and indicate areas of the model that can serve to stimulate further investigation.

      Strengths: Reviewer comment on new data in Vogt et al., 2024

      Using classic slice electrophysiology, the authors conclude that wakefulness (sleep deprivation (SD)) drives a potentiation of excitatory glutamate synapses, mediated in large part by "un-silencing" of NMDAR-active synapses to AMPAR-active synapses. Using a modern single-nucleus RNAseq approach, the authors conclude that SD drives changes in gene expression primarily occurring in glutamatergic neurons. The two experiments combined highlight the accumulation and resolution of sleep need centered on the strength of excitatory synapses onto excitatory neurons. This view is entirely consistent with a large body of extant and emerging literature and provides important direction for future research.

      Consistent with prior work, wakefulness/SD drives an LTP-type potentiation of excitatory synaptic strength on principal cortical neurons. It has been proposed that LTP associated with wake leads to the accumulation of sleep need by increasing neuronal excitability, and by the "saturation" of LTP capacity. This saturation subsequently impairs the capacity for further ongoing learning. These new data provide a satisfying mechanism for this saturation phenomenon by introducing the concept of silent synapses. The new data show that in well-rested mice, a substantial number of synapses are "silent", containing an NMDAR component but not AMPARs. Silent synapses provide a type of reservoir for learning in that activity can drive their un-silencing, increasing the number of functional synapses. SD depletes this reservoir of silent synapses to essentially zero, explaining how SD can exhaust learning capacity. Recovery sleep led to restoration of silent synapses, explaining how recovery sleep can renew learning capacity. In their prior work (Bjorness et al., 2020) this group showed that SD drives an increase in mEPSC frequency onto these same cortical neurons, but without a clear change in pre-synaptic release probability, implying a change in the number of functional synapses. This prediction is now borne out in this new dataset.

      The new snRNAseq dataset indicates that sleep need is primarily seen (at the transcriptional level) in excitatory neurons, consistent with a number of other studies. First, this conclusion is corroborated by two independent, contemporary snRNAseq analyses recently published in iScience 2024 (doi: 10.1016/j.isci.2024.110752) and Neuroscience Research 2024 (https://doi.org/10.1016/j.neures.2024.03.004). A recently published analysis of the effects of SD in Drosophila imaged synapses in every brain region in a cell-type dependent manner (Weiss et al., PNAS 2024), concluding that SD drives brain-wide increases in synaptic strength almost exclusively in excitatory neurons. Further, Kim et al., Nature 2022, heavily cited in this work, show that the newly described SIK3-HDAC4/5 pathway promotes sleep depth via excitatory neurons and not inhibitory neurons.

      The new experiments provided in Fig1-3 are expertly conducted and presented. This reviewer has no comments of concern regarding the execution and conclusions of these experiments.

      Reviewer comment on model in Vogt et al., 2024

      In the view of this reviewer, the new model proposed by Vogt et al. is an important contribution. The model is not definitively supported by new data, and in this regard should be viewed as a perspective, providing mechanistic links between recent molecular advances, while still leaving areas that need to be addressed in future work. New snRNAseq analysis indicates SD drives expression of synaptic shaping components (SSCs), consistent with the excitatory synapse as a major target for the restorative basis of sleep function. SD-induced gene expression is also enriched for autism spectrum disorder (ASD) risk genes. As pointed out by the authors, sleep problems are commonly reported in ASD, but the emphasis has been on sleep amount. This new analysis highlights the need to understand the impact on sleep's functional output (synapses) to fully understand the role of sleep problems in ASD.

      Importantly, SD-induced gene expression in excitatory neurons overlaps with genes regulated by the transcription factor MEF2C and HDAC4/5 (Fig. 4). In their prior work, the authors show loss of MEF2C in excitatory neurons abolished the SD transcriptional response and the functional recovery of synapses from SD by recovery sleep. Recent advances identified HDAC4/5 as major regulators of sleep depth and duration (in excitatory neurons) downstream of the recently identified sleep promoting kinase SIK3. In Zhou et al., and Kim et al., Nature 2022, both groups propose a model whereby "sleep-need" signals from the synapse activate SIK3, which phosphorylates HDAC4/5, driving cytoplasmic targeting, allowing for the de-repression and transcriptional activation of "sleep genes". Prior work shows that HDAC4/5 are repressors of MEF2C. Therefore, the "sleep genes" derepressed by HDAC4/5 may be the same genes activated in response to SD by MEF2C. The new model thereby extends the signaling of sleep need at synapses (through SIK3-HDAC4/5) to the functional output of synaptic recovery by expression of synaptic/sleep genes by MEF2C. The model thereby links aspects of expression of sleep need with the resolution of sleep need by mediating sleep function: synapse renormalization.

      Weaknesses:

      Areas for further investigation.

      In the discussion section Vogt et al. explore the links between excitatory synapse strength, arguably the major target of "sleep function", and NREM slow-wave activity (SWA), the most established marker of sleep need. SIK3-HDAC4/5 have major effects on the "depth" of sleep by regulating NREM SWA. The effects of MEF2C loss of function on NREM SWA are less obvious, but clearly impact the recovery of glutamatergic synapses from SD. The authors point out how adenosine signaling is well established as a mediator of SWA, but the links between adenosine and glutamatergic strength are far from clear. The mechanistic links between SIK3-HDAC4/5, adenosine signaling, and MEF2C are far from understood. Therefore, the molecular/mechanistic links between a synaptic basis of sleep need and resolution and NREM SWA require further investigation.

      Additional work is also needed to understand the mechanistic links between SIK3-HDAC4/5 signaling and MEF2C activity. The authors point out that constitutively nuclear (cn) HDAC4/5 (acting as a repressor) will mimic MEF2C loss of function. This is reasonable; however, there are notable differences in the reported phenotypes of each. Notably, cnHDAC4/5 suppresses NREM amount and NREM SWA but had no effect on the NREM SWA increase following SD (Zhou et al., Nature 2022). Loss of MEF2C in CaMKII neurons had no effect on NREM amount and suppressed the increase in NREM SWA following SD (Bjorness et al., 2020). These instances indicate that cnHDAC4/5 and loss of MEF2C do not exactly match, suggesting that additional factors are relevant in these phenotypes. HDAC4/5 likely have functionally important interactions with other transcription factors, and likewise for MEF2C, suggesting areas for future analysis.

      One emerging theme may be that the SIK3-HDAC4/5 axis is a major regulator of the sleep state, perhaps stabilizing the NREM state once the transition from wakefulness occurs. MEF2C is less involved in regulating sleep per se, and more involved in executing sleep function, by promoting the restorative synaptic modifications to resolve sleep need.

      Finally, advances in the roles of the respective SIK3-HDAC4/5 and MEF2C pathways point towards transcription of "sleep genes", as clearly indicated in the model of Fig.4. Clearly more work is needed to understand how the expression of such genes ultimately leads to resolution of sleep need by functional changes at synapses. What are these sleep genes and how do they mechanistically resolve sleep need? Thus, the current work provides a mechanistic framework to stimulate further advances in understanding the molecular basis for sleep need and the restorative basis of sleep function.

      Comments on revisions:

      No further comments or concerns. I believe that the manuscript has been suitably revised, and the concerns raised by reviewers have been addressed. I am completely satisfied by the revisions and responses provided by the authors.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      eLife Assessment

      This important study showing that sleep deprivation increases functional synapses while depleting silent synapses supports previous findings that excitatory signaling increases during wakefulness. This manuscript focuses in particular on AMPA/NMDA ratios. An interesting, although speculative, aspect of the manuscript is the inclusion of a model for the accumulation of sleep need that is based upon the MEF2C transcription factor but also links to the sleep-regulating SIK3-HDAC4/5 pathway. The authors have clarified some questions raised in the previous review, but the evidence for major claims was still found to be incomplete, requiring additional experimentation.

      The major claims of this study are: 1) SD increases the AMPA/NMDA receptor ratio and RS restores it; 2) SD decreases silent synapses compared to CS and RS restores their number after SD; 3) the majority of SD-induced DEGs are found in ExIT cells (glutamate pyramidal neurons projecting within the telencephalon); 4) ExIT SD-induced DEGs are enriched for genes encoding synaptic shaping components and for autism spectrum disorder risk; and 5) these DEGs are also enriched for DEGs induced by Mef2c loss of function restricted to forebrain glutamate neurons (ExIT cells comprise a subset of these) and by over-expression of constitutively nuclear HDAC4 that represses MEF2C transcriptional function. The last claim is consistent with an intracellular signaling model (presented as a hypothesis to be tested, in figure 4B).

      [The above is added to the start of the discussion section.]

      The specific claims are supported by solid evidence provided in this manuscript. The statistical support is now more clearly presented, with several changes in response to queries by reviewer 1.

      The technical issues raised by reviewer 1 do not detract from the claims, which are thus supported. The rationale for this assessment is expanded below in response to reviewer 1.

      Summary:

      This manuscript by Vogt et al examines how the synaptic composition of AMPA and NMDA receptors changes over sleep and wake states. The authors perform whole-cell patch clamp recordings to quantify changes in silent synapse number across conditions of spontaneous sleep, sleep deprivation, and recovery sleep after deprivation. They also perform single nucleus RNAseq to identify transcriptional changes related to AMPA/NMDA receptor composition following spontaneous sleep and sleep deprivation. The findings of this study are consistent with a decrease in silent synapse number during wakefulness and an increase during sleep. However, these changes cannot be conclusively linked to sleep/wake states. Measurements were performed in motor cortex, and sleep deprivation was achieved by forced locomotion, raising the possibility that recent patterns of neuronal activity, rather than sleep/wake states, are responsible for the observed results.

      Strengths:

      This study examines an important question. Glutamatergic synaptic transmission has been a focus of studies in the sleep field, but AMPA receptor function has been the primary target of these studies. Silent synapses, which contain NMDA receptors but lack AMPA receptors, have important functional consequences for the brain. Exploring the role of sleep in regulating silent synapse number is important to understanding the role of sleep in brain function. The electrophysiological approach of measuring the failure rate ratio, supported by AMPA/NMDA ratio measurements, is a rigorous tool to evaluate silent synapse number.
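
      The failure-rate approach mentioned above can be illustrated with a minimal sketch. It assumes the commonly used estimate in which the fraction of silent synapses is 1 - ln(F_hyperpolarized)/ln(F_depolarized), which holds only if release sites are independent and share the same release probability. The function name, argument names and trial counts below are hypothetical, and the manuscript's exact calculation may differ.

```python
import math


def silent_synapse_fraction(failures_hyperpol, trials_hyperpol,
                            failures_depol, trials_depol):
    """Estimate the fraction of silent synapses from synaptic failure rates.

    Assumption (not taken from the manuscript): failures at a hyperpolarized
    holding potential reflect AMPAR-containing synapses only, while failures
    at a depolarized potential reflect all synapses (AMPAR + NMDAR-only).
    With independent release sites, ln(F) scales with the number of sites,
    so ln(F_hyperpol) / ln(F_depol) is the AMPAR-containing fraction.
    """
    f_hyperpol = failures_hyperpol / trials_hyperpol
    f_depol = failures_depol / trials_depol
    # The logarithm is undefined at 0 and the ratio is undefined at 1.
    if not (0.0 < f_hyperpol < 1.0 and 0.0 < f_depol < 1.0):
        raise ValueError("failure rates must lie strictly between 0 and 1")
    return 1.0 - math.log(f_hyperpol) / math.log(f_depol)


# Illustrative numbers only (not data from the study):
# 50/100 failures when hyperpolarized, 25/100 failures when depolarized.
example_fraction = silent_synapse_fraction(50, 100, 25, 100)
```

      Equal failure rates at both holding potentials give an estimate of zero silent synapses, which is the pattern the review describes after sleep deprivation.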

      The authors also perform snRNAseq to identify genes differentially expressed in the spontaneous sleep and sleep deprivation groups. This analysis reveals an intriguing pattern of upregulated genes controlled by HDAC4 and Mef2c, along with synaptic shaping component genes and genes associated with autism spectrum disorder, across cell types in the sleep deprivation group. This unbiased approach identifies candidate genes for follow-up studies. The finding that ASD-risk genes are differentially expressed during SD also raises the intriguing possibility that normal sleep function is disrupted in ASD.

      Weaknesses:

      A major consideration to the interpretation of this study is the use of forced locomotion for sleep deprivation. Measurements are made from motor cortex, and therefore the effects observed could be due to differences in motor activity patterns across groups, rather than lack of sleep per se.

      Experimentally induced lack of sleep always involves differences in motor activity. As previously noted in revision 1, motor learning is unlikely to occur in this paradigm, and inspection of the video (in supplementary materials) shows no repetitive motor behavioral sequences during the sleep deprivation. Nor can this be considered exercise, given the very slow speed of treadmill movement employed. The obvious major difference between groups is a lack of sleep per se. (See below in the “Recommendations for authors”, reviewer 1, for comments on localized wake activity inducing localized sleep-need responses.)

      Considering that other groups have failed to find a difference in AMPA/NMDA ratio in mice with different spontaneous sleep/wake histories (Bridi et al., Neuron 2020), confirmation of these findings in a different brain region would greatly strengthen the study.

      The study of Bridi et al., Neuron 2020, is not comparable to our study for several important reasons. First, their compared groups were from different circadian phases (180 degrees out of phase), whereas in our study, the circadian times for each group were matched (ZT = 6 hours). Second, experimentally induced sleep loss did not occur in their study, whereas it was a focus of ours. Third, spontaneous sleep/wake cannot be accurately matched amongst subjects, whereas in our study, sleep loss was matched exactly between groups.

      We agree that assessment of AMPA/NMDA ratio and silent synapse number in sleep deprived compared to ad libitum sleep in other areas of the neocortex is of great interest and something we hope to pursue. It would not be surprising to find differences as preliminarily reported by Bahl, et al., Nat Commun. 2024 Jan 26;15(1):779. However, such data would not further strengthen our already well supported evidence for the differences we report in the motor cortex.

      The electrophysiological measurements and statistical analyses raise several questions. Input resistance (cutoffs and actual values) is not provided, making it difficult to assess recording quality.

      As stated in our first reply, these data were omitted (an admitted oversight on our part) but are now supplied in the methods section as, “Series resistance values for the recording pipette ranged between 8 and 15 MOhm and experiments with changes larger than 25% were not used for further analyses”. We have now also added the Rs/Rm (as a separate column) for each recorded neuron in table 1.

      Parametric one-way ANOVAs were used, although the data do not appear to be normally distributed.

      We have now removed all the one-way ANOVA tests for clarity (non-parametric tests were previously supplied in addition to the one-way ANOVA tests). Determining significance with the non-parametric Kruskal-Wallis test has not altered the statistical support for our conclusions.

      Reviewer 1 correctly points out that we had not tested for normality of our distributions. The distributions are likely to be normal, but the sample size is too small to confidently make this call for the ratio data, which is why we removed the one-way ANOVAs entirely from table 1.

      Two-way ANOVAs are used to assess AMPA and EPSC amplitudes and failure rates (table 1, tabs 2 & 5) across sleep conditions. As now indicated (table 1, tabs 2 & 5), the distributions of AMPA and NMDA amplitudes and FRs passed the D'Agostino & Pearson test for normality, and QQ plots illustrate this claim.

      In addition, for the AMPA/NMDA and FRR measurements (Figures 1E, F), the SD group (rather than the control sleep group) was used as the control group for post-hoc comparisons, but it is unclear why.

      The label of “control group” is arbitrary. CS and RS groups are similar (sleep density for RS>CS as expected). Since this appears to be confusing, we now compare all groups to one another in table 1 with the same statistical outcome (additional comparison of CS to RS).

      While the data appear in line with the authors' conclusions, the number of mice (3/group) and cells recorded is low, and adding more would better account for inter-animal variability and increase the robustness of the findings.

      Of course, the larger the sample, the better the approximation to the population. Our sample sizes yielded significant differences at the usual p<=0.05 threshold with non-parametric testing. A larger sample size could allow for normality testing of the distributions of the data, but fortunately, this was not necessary to support our conclusions.

      The snRNAseq data are intriguing. However, several genes relevant to the AMPA/NMDA ratio are mentioned, but the encoded proteins would be expected to have variable effects on AMPA/NMDA receptor trafficking and function, making the model presented in Figure 4C oversimplified. A more thorough discussion of the candidate genes and pathways that are upregulated during sleep deprivation, the spatiotemporal/posttranslational control of protein expression, and their effects on AMPA/NMDA trafficking vs function is warranted.

      We have not studied the candidate genes at this point and do not yet understand their potential role(s) in sleep-related AMPA/NMDA functional ratio, only that their expression levels are altered with sleep condition. We agree with the reviewer that the data are intriguing and in need of further investigation. An important first step that can help direct such studies is the identification and preliminary characterization of good candidate genes with respect to their cell type specificity, significance and fold change, as we have done. Their potential roles likely depend on “the spatiotemporal/posttranslational control” and other factors, as reviewer 1 notes.

      Reviewer #2 (Public review):

      Here Vogt et al., provide new insights into the need for sleep and the molecular and physiological response to sleep loss. The authors expand on their previously published work (Bjorness et al., 2020) and draw from recent advances in the field to propose a neuron-centric molecular model for the accumulation and resolution of sleep need and basis of restorative sleep function. While speculative, the proposed model successfully links important observations in the field and provides a framework to stimulate further research and advances on the molecular basis of sleep function. In my review, I highlight the important advances of this current work, the clear merits of the proposed model, and indicate areas of the model that can serve to stimulate further investigation.

      Strengths:

      Reviewer comment on new data in Vogt et al., 2024

      Using classic slice electrophysiology, the authors conclude that wakefulness (sleep deprivation (SD)) drives a potentiation of excitatory glutamate synapses, mediated in large part by "un-silencing" of NMDAR-active synapses to AMPAR-active synapses. Using a modern single nuclear RNAseq approach the authors conclude that SD drives changes in gene expression primarily occurring in glutamatergic neurons. The two experiments combined highlight the accumulation and resolution of sleep need centered on the strength of excitatory synapses onto excitatory neurons. This view is entirely consistent with a large body of extant and emerging literature and provides important direction for future research.

      Consistent with prior work, wakefulness/SD drives an LTP-type potentiation of excitatory synaptic strength on principal cortical neurons. It has been proposed that LTP associated with wake leads to the accumulation of sleep need by increasing neuronal excitability, and by the "saturation" of LTP capacity. This saturation subsequently impairs the capacity for further ongoing learning. This new data provides a satisfying mechanism of this saturation phenomenon by introducing the concept of silent synapses. The new data show that in well-rested mice, a substantial number of synapses are "silent", containing an NMDAR component but not AMPARs. Silent synapses provide a type of reservoir for learning in that activity can drive the un-silencing, increasing the number of functional synapses. SD depletes this reservoir of silent synapses to essentially zero, explaining how SD can exhaust learning capacity. Recovery sleep led to restoration of silent synapses, explaining how recovery sleep can renew learning capacity. In their prior work (Bjorness et al., 2020) this group showed that SD drives an increase in mEPSC frequency onto these same cortical neurons, but without a clear change in pre-synaptic release probability, implying a change in the number of functional synapses. This prediction is now borne out in this new dataset.

      The new snRNAseq dataset indicates that sleep need is primarily seen (at the transcriptional level) in excitatory neurons, consistent with a number of other studies. First, this conclusion is corroborated by an independent, contemporary snRNAseq analysis recently available as a pre-print (Ford et al., 2023 BioRxiv https://doi.org/10.1101/2023.11.28.569011). A recently published analysis on the effects of SD in Drosophila imaged synapses in every brain region in a cell-type dependent manner (Weiss et al., PNAS 2024), concluding that SD drives brain wide increases in synaptic strength almost exclusively in excitatory neurons. Further, Kim et al., Nature 2022, heavily cited in this work, show that the newly described SIK3-HDAC4/5 pathway promotes sleep depth via excitatory neurons and not inhibitory neurons.

      The new experiments provided in Fig1-3 are expertly conducted and presented. This reviewer has no comments of concern regarding the execution and conclusions of these experiments.

      Reviewer comment on model in Vogt et al., 2024

      To the view of this reviewer the new model proposed by Vogt et al., is an important contribution. The model is not definitively supported by new data, and in this regard should be viewed as a perspective, providing mechanistic links between recent molecular advances, while still leaving areas that need to be addressed in future work. New snRNAseq analysis indicates SD drives expression of synaptic shaping components (SSCs) consistent with the excitatory synapse as a major target for the restorative basis of sleep function. SD induced gene expression is also enriched for autism spectrum disorder (ASD) risk genes. As pointed out by the authors, sleep problems are commonly reported in ASD, but the emphasis has been on sleep amount. This new analysis highlights the need to understand the impact on sleep's functional output (synapses) to fully understand the role of sleep problems in ASD.

      Importantly, SD-induced gene expression in excitatory neurons overlaps with genes regulated by the transcription factor MEF2C and HDAC4/5 (Fig. 4). In their prior work, the authors show loss of MEF2C in excitatory neurons abolished the SD transcriptional response and the functional recovery of synapses from SD by recovery sleep. Recent advances identified HDAC4/5 as major regulators of sleep depth and duration (in excitatory neurons) downstream of the recently identified sleep promoting kinase SIK3. In Zhou et al., and Kim et al., Nature 2022, both groups propose a model whereby "sleep-need" signals from the synapse activate SIK3, which phosphorylates HDAC4/5, driving cytoplasmic targeting, allowing for the de-repression and transcriptional activation of "sleep genes". Prior work shows that HDAC4/5 are repressors of MEF2C. Therefore, the "sleep genes" derepressed by HDAC4/5 may be the same genes activated in response to SD by MEF2C. The new model thereby extends the signaling of sleep need at synapses (through SIK3-HDAC4/5) to the functional output of synaptic recovery by expression of synaptic/sleep genes by MEF2C. The model thereby links aspects of expression of sleep need with the resolution of sleep need by mediating sleep function: synapse renormalization.

      Weaknesses:

      Areas for further investigation.

      In the discussion section Vogt et al., explore the links between excitatory synapse strength, arguably the major target of "sleep function", and NREM slow-wave activity (SWA), the most established marker of sleep need. SIK3-HDAC4/5 have major effects on the "depth" of sleep by regulating NREM-SWA. The effects of MEF2C loss of function on NREM SWA activity are less obvious, but clearly impact the recovery of glutamatergic synapses from SD. The authors point out how adenosine signaling is well established as a mediator of SWA, but the links with adenosine and glutamatergic strength are far from clear. The mechanistic links between SIK3/HDAC4/5, adenosine signaling, and MEF2C, are far from understood. Therefore, the molecular/mechanistic links between a synaptic basis of sleep need and resolution with NREM-SWA activity require further investigation.

      Additional work is also needed to understand the mechanistic links between SIK3-HDAC4/5 signaling and MEF2C activity. The authors point out that constitutively nuclear (cn) HDAC4/5 (acting as a repressor) will mimic MEF2C loss of function. This is reasonable, however, there are notable differences in the reported phenotypes of each. Notably, cnHDAC4/5 suppresses NREM amount and NREM SWA but had no effect on the NREM-SWA increase following SD (Zhou et al., Nature 2022).

      We speculate that the effect of cnHDAC4/5 to reduce NREM-SWA together with the reduction of NREM amount may be due to a localized increase in neuronal excitability of arousal centers, which would be expected to mask NREM-SWA. Rebound NREM-SWA may reflect the relative rebound increase of NREM-SWA still present under chronic masking conditions (induced by cnHDAC4/5) of increased arousal system excitability. A similar effect to overcome NREM-SWA masking was reported in a Kcna2 KO mouse (a Shaker homologue) by Douglas, et al. (2007, BMC Biol).

      Loss of MEF2C in CaMKII neurons had no effect on NREM amount and suppressed the increase in NREM-SWA following SD (Bjorness et al., 2020). These instances indicate that cnHDAC4/5 and loss of MEF2C do not exactly match suggesting additional factors are relevant in these phenotypes. Likely HDAC4/5 have functionally important interactions with other transcription factors, and likewise for MEF2C, suggesting areas for future analysis.

      This is not a surprising outcome, since both MEF2c and HDAC4/5 are transcription factors whose function(s) are determined by multiple other factors, a subset of which are relevant to sleep conditions while others are not necessarily relevant to sleep. These factors can include their phosphorylation state, genomic accessibility, and interaction with other transcription factors. All these other factors are known to be both cell type specific and determined by intracellular conditions that, in turn, are affected by extracellular conditions and ligands. We certainly agree there is much future analysis needed.

      One emerging theme may be that the SIK3-HDAC4/5 axis is a major regulator of the sleep state, perhaps stabilizing the NREM state once the transition from wakefulness occurs. MEF2C is less involved in regulating sleep per se, and more involved in executing sleep function, by promoting restorative synaptic modifications to resolve sleep need.

      A useful way to restate the above might be to distinguish between control of arousal levels determining the behavioral states, wake or sleep (including REM sleep) and control of sleep function. The term, sleep, is typically used to describe the behavioral state of sleep that acts as a permissive gate to sleep function (that resolves sleep need). The sleep state should not be conflated with sleep function. There is abundant evidence that control of arousal can be dissociated from sleep need and sleep function.

      Finally, advances in the roles of the respective SIK3-HDAC4/5 and MEF2C pathways point towards transcription of "sleep genes", as clearly indicated in the model of Fig.4. Clearly more work is needed to understand how the expression of such genes ultimately leads to resolution of sleep need by functional changes at synapses.

      We are in full agreement. We also note the SIK3-HDAC4/5 pathway may have more than one role, i.e., to affect arousal centers to alter behavioral state and, more generally, to control MEF2c’s transcriptional activity thus controlling sleep-related, glutamate, synaptic phenotype.

      What are these sleep genes and how do they mechanistically resolve sleep need? Thus, the current work provides a mechanistic framework to stimulate further advances in understanding the molecular basis for sleep need and the restorative basis of sleep function.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major comments:

      (1) I appreciate the authors' thoughtful discussion of the use of forced locomotion for their sleep deprivation technique in their response, as well as the additional information that was provided regarding use of the treadmill in the manuscript. However, given that previous studies have failed to find a difference in AMPA/NMDA ratio following spontaneous sleep vs wake, confirmation of the findings in a non-motor brain region with the same SD technique (or confirmation within motor cortex with a different technique, although the authors correctly point out that other techniques also increase locomotor activity) would greatly strengthen the paper.

      Addressed above

      Notably, differences in motor activity patterns, not necessarily overall amount of locomotion, may induce differential synaptic changes between groups. This point at least warrants acknowledgement and discussion, but this has not been incorporated into the text of the manuscript.

      We will incorporate the following into the discussion:

      There is evidence that learning of a motor task or experience of forced altered motor activity can result in localized increases in NREM (slow wave sleep)-slow wave activity in the motor cortex (Huber R, Ghilardi MF, Massimini M, Tononi G. Local sleep and learning. Nature. 2004;430(6995):78-81; Huber et al., 2006). Since SWS-SWA is considered a marker for sleep homeostasis, the increase of SWS-SWA induced by altered motor activity was considered evidence for sleep-related function. Our earlier work has clearly shown that the treadmill method of SD increases frontal cortical SWS-SWA rebound, indicating a sleep-homeostatic process (Bjorness et al., 2016; Bjorness et al., 2020). Furthermore, we have also shown that this means of experimental SD causes glutamate synaptic changes similar to those observed using other means of SD, like gentle handling (Liu, et al., JoNS 2010).

      (2) The number of mice and cells used for electrophysiology in this study remains low; more animals should be included to account for inter-animal variability.

      For this study, increasing the number of mice and cells will have p<0.05 chance of altering our conclusions by rejecting the null hypotheses of the electrophysiology findings.

      (3) The additional methodological information provided allays some of my concerns regarding the electrophysiological data. However, information about the input resistance (cutoffs used and/or actual values) is still not provided, which is important for assessing recording quality.

      We have now supplied the experimentally determined input resistance for each neuron used in this study (a separate column in table 1, tabs marked, “data”).

      (4) It is not meaningful to compare raw AMPA or NMDA responses because stimulus electrode placement will differ between cells, potentially activating different numbers of afferents. Presenting these comparisons (Figure 1C) has the potential to mislead the reader.

      This is not misleading (it didn’t mislead reviewer 1) as we described the conditions. As expected by reviewer 1, the variability using “raw AMPA or NMDA responses…” was too great, but did indicate an interaction between receptor responses and sleep condition. This provided (as stated in the results section) rationale to examine, and to only draw conclusions from the AMPA/NMDA amplitude and FR ratios.

      (5) I appreciate clarification on the statistics and the authors' response has answered some of my questions. However, this also raises additional questions. What test was used to determine normality (and therefore whether to perform a parametric vs nonparametric test)?

      Described above.

      Why was the FRR data analysis changed to a parametric test, when it does not appear that the data are normally distributed?

      Showing the parametric test was a mistake on our part: there are not enough samples to conclude that the distributions are normal, as reviewer 1 correctly suspects. However, the non-parametric Kruskal-Wallis tests that we also show in table 1 indicate significant differences between conditions, and the non-parametric, two-stage linear step-up procedure of Benjamini, Krieger and Yekutieli indicates significant differences between CS-SD and RS-SD but not for CS-RS, supporting our conclusions. The (unsupported) parametric tests are now removed from Table 1, leaving only the non-parametric tests.
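
      As a rough illustration of the analysis described in this response, the sketch below runs a Kruskal-Wallis test on three groups and then applies the two-stage linear step-up procedure of Benjamini, Krieger and Yekutieli to the pairwise p-values. Several things here are assumptions rather than the authors' method: the ratio values are invented, the pairwise p-values come from Dunn-type comparisons on pooled ranks (the response does not say how they were obtained), the FDR level of 0.05 is a placeholder, and the overall p-value uses the chi-square approximation, which has a closed form only for three groups.

```python
import math
from itertools import combinations

def average_ranks(values):
    """Return 1-based average ranks and the tie term sum(t**3 - t)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks, tie_term, i = [0.0] * len(values), 0, 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        tie_term += (j - i + 1) ** 3 - (j - i + 1)
        i = j + 1
    return ranks, tie_term

def kruskal_dunn(groups):
    """Kruskal-Wallis H for exactly three groups, plus Dunn pairwise p-values."""
    if len(groups) != 3:
        raise ValueError("p = exp(-H/2) is only valid for 3 groups (df = 2)")
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    n = len(pooled)
    ranks, tie_term = average_ranks(pooled)
    mean_rank, start = {}, 0
    for name in names:
        size = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + size]) / size
        start += size
    h = 12 / (n * (n + 1)) * sum(
        len(groups[k]) * mean_rank[k] ** 2 for k in names) - 3 * (n + 1)
    h /= 1 - tie_term / (n ** 3 - n)          # tie correction
    p_overall = math.exp(-h / 2)              # chi-square tail, df = 2
    var = n * (n + 1) / 12 - tie_term / (12 * (n - 1))
    pairwise = {}
    for a, b in combinations(names, 2):
        se = math.sqrt(var * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        pairwise[(a, b)] = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return h, p_overall, pairwise

def bh_reject_count(pvalues, q):
    """Number of rejections from the Benjamini-Hochberg step-up at level q."""
    m, count = len(pvalues), 0
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= rank * q / m:
            count = rank
    return count

def bky_two_stage(pvalues, q=0.05):
    """Map each comparison to True (discovery) or False using BKY two-stage."""
    m = len(pvalues)
    q1 = q / (1 + q)                          # stage 1 level
    r1 = bh_reject_count(list(pvalues.values()), q1)
    if r1 == 0:
        return {k: False for k in pvalues}
    if r1 == m:
        return {k: True for k in pvalues}
    q2 = q1 * m / (m - r1)                    # stage 2 level, m0 estimated as m - r1
    r2 = bh_reject_count(list(pvalues.values()), q2)
    cutoff = sorted(pvalues.values())[r2 - 1]
    return {k: p <= cutoff for k, p in pvalues.items()}

# Hypothetical ratio values for the three sleep conditions (not study data).
groups = {
    "CS": [0.42, 0.35, 0.51, 0.47, 0.39, 0.44],
    "SD": [0.02, 0.05, 0.01, 0.08, 0.03, 0.06],
    "RS": [0.40, 0.33, 0.48, 0.52, 0.37, 0.45],
}
h, p_overall, pairwise = kruskal_dunn(groups)
decisions = bky_two_stage(pairwise, q=0.05)
for pair, rejected in decisions.items():
    print(pair, round(pairwise[pair], 4), "discovery" if rejected else "n.s.")
```

      With these invented values the sketch flags CS-SD and SD-RS but not CS-RS, which mirrors the pattern reported in the response; it says nothing about the actual data in table 1.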

      Why were post-hoc tests chosen to compare to a control group rather than all pairwise comparisons,

      We now provide post-hoc all-pairwise comparisons, which give the same results using the BKY analysis.

      and why was the SD rather than CS group used as the control in Figures 1E and F?

      Why were different post-hoc tests chosen for the data in Figures 1E, F?

      There was no need for this, and we now show only the statistics that are used to draw our conclusions for the AMPA/NMDA EPSC ratio data shown in Figure 1E and the Failure Rate Ratio data shown in Figure 1F (the conclusions are supported by the non-parametric post-hoc test and remain unchanged).

      (6) Genes in the SSC, ASD, Mef2cKO, and HD4cn categories are almost exclusively upregulated in the SD group compared to the CS group (Figure 4A). As the authors point out in their response, "No claim of mechanism linking the changed expression to altered AMPAR or NMDAR activity can be made at this point," largely due to the fact that we do not know the spatiotemporal or posttranslational modification patterns of the translated proteins, and how they affect receptor trafficking vs function. This is in agreement with my original point: as written (and as illustrated in Figure 4C), the manuscript implies that upregulation during SD increases the AMPA/NMDA ratio via receptor trafficking,

      The model indicates a likely (but not necessarily exclusive) role for AMPA/NMDA trafficking to explain the functional electrophysiological data that we do report and which is not in dispute. The SSC-DEGs in ExIT cells are consistent with sleep-altered AMPA/NMDA trafficking but remain only a correlation. However, the point is taken and Figure 4c has been revised to only reflect what we have observed electrophysiologically and the speculated mechanism(s) mediated by observed SSC-DEGs are illustrated with “?’s”.

      while in reality the picture is likely much more complicated, and therefore a more thorough discussion is warranted. Some discussion was provided in the authors' response but does not appear to have been incorporated into the text or Figure 4C.

      As indicated above the proposed model is changed in Figure 4c to more explicitly indicate which aspects reflect our electrophysiological data and which aspects reflect only an association of observations. 

      Minor comments:

      (1) Please justify only using male mice

      We had to start somewhere with our limited resources. Our intentions are to follow up with similar experiments using female mice, should funding be realized.

      (2) The model in Figure 4C is oversimplified and remains problematic, for the reasons stated in comment #6, above.

      See responses above.

      (3) Figure 4D remains confusing

      We agree. The unnecessary addition of adenosine effects on cholinergic arousal centers (experimentally well supported) has been removed from the figure to provide a more focused indication of how SWS-SWA can be related to MEF2c and/or ADORA1 activation through reduction of glutamate synaptic strength. ADORA1 activation reduces glutamate synaptic activity through pre- and postsynaptic inhibition, whereas MEF2c activation is essential for the sleep-elicited reduction of glutamate EPSCs. Reduced glutamate synaptic strength, whatever the cause, is associated with increased SWS-SWA.

    1. Author response:

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The study by Aguirre-Botero et al. shows the dynamics of 3D11 anti-CSP monoclonal antibody (mAb) mediated elimination of rodent malaria Plasmodium berghei (Pb) parasites in the liver. The authors show that the anti-CSP mAb could protect against intravenous (i.v.) Pb sporozoite challenge along with the cutaneous challenge, but requires higher concentration of antibody. Importantly, the study shows that the anti-CSP mAb not only affects sporozoite motility, sinusoidal extravasation, and cell invasion but also partially impairs the intracellular development inside the liver parenchyma, indicating a late effect of this antibody during liver stage development. While the study is interesting and conducted well, the only novel yet very important observation made in this manuscript is the effect of the anti-CSP mAb on liver stage development.

      Major

      This observation is highlighted in the manuscript title but is supported by only limited data. As such, it needs to be substantiated and a mechanism should be investigated. The phenomenon of intracellular effects of the anti-CSP mAb should be analyzed in much more detail. For example, can the authors demonstrate uptake of the Ab together with the parasite during hepatocyte invasion? What cellular mechanism leads to elimination?

      Lines 234 - 243; 308 - 325: These results are the gist of the entire study and also defined the title of the manuscript. Thus, it would be premature to claim the substantial effect of 3D11 antibody in late killing of the parasite in the infected hepatocytes just by looking at the decreased GFP fluorescence. The authors need to at least verify the fitness of the liver stages by measuring the size of the developing parasites as well as using different parasite specific markers (UIS4, MSP1, HSP70 etc.) in immunofluorescence assays on the infected liver sections and in vitro infections.

      We greatly appreciate the comments. We have taken the suggestions into consideration and deepened the characterization of 3D11's late killing of parasites. We first analyzed the presence of 3D11 in the intracellular parasite after the invasion and compared it with the CSP expression on the surface of control parasites (new Fig. 4F). Next, we tested a potential action of 3D11 added in the cell culture after the invasion (new Fig. 4G). The two new panels and the text accompanying them are shown below.

      “Post-invasion labeling of 3D11 bound to the membrane of intracellular parasites revealed a strong staining surrounding the parasite at 2 and 15h, but only punctate traces of 3D11 at 44h (Figure 4F, 3D11, 3D11). Of note, CSP was detected surrounding the control parasites at all time-points indicating that the lack of staining at 44h is not due to a decrease in the CSP amount on the parasite surface (Figure 4F, CSP, Control). To evaluate the potential post-invasion entry of 3D11 into the PV of infected cells and posterior neutralization of intracellular parasites, we incubated invaded cells from 2 to 44 h with 3D11, but no effect on the parasite intracellular development was observed (Figure 4G, 2h p.i.). 3D11 incubated for 2 h with sporozoites and cells elicited, as expected, a dose-dependent inhibition of parasite development. Altogether, our results indicate that the late inhibition of parasite development is already achieved at 15h and likely caused by antibodies dragged inside cells bound to sporozoites before or during the invasion.”

      Finally, we better characterized the parasite loss of fitness caused by 3D11 in infected cells by quantifying the parasite size, GFP intensity and the presence and intensity of UIS4, a parasitophorous vacuole membrane developmental marker at 2, 4 and 44h as described below in the new figure 5 and accompanying text.

      “To further characterize the killing of intracellular parasites by 3D11 in HepG2 cells, we next evaluated the expression of the parasitophorous vacuole membrane (PVM) marker, UIS4 (ref. 37), to infer the parasite intracellular development at 2, 4 and 44h. HepG2 cells were incubated with Pb-GFP expressing sporozoites in the absence (Control, Figure 5) or presence of 1.25 µg/mL of 3D11 during the first two hours of incubation (3D11, Figure 5). The chosen 3D11 concentration led to ~50% decrease in cell invasion (Figure 4C, 2h) and ~30% decrease in the post-invasion number of EEFs (Figure 4D), leaving enough parasites to be analyzed by microscopy. To distinguish between extracellular and intracellular parasites at 2h, washed and fixed samples were incubated with mouse 3D11 mAb (1µg/mL) and revealed with a fluorescent anti-mouse secondary antibody (Figure 5A, 3D11 in blue). Samples were then permeabilized and incubated with a goat anti-UIS4 polyclonal antibody revealed with a fluorescent anti-goat secondary antibody (Figure 5A, UIS4 in red). DNA was stained with Hoechst (Figure 5A, DNA in white).

      Extracellular GFP+ sporozoites were identified by their 3D11+UIS4- phenotype (Figure 5A, 2h, extracellular). Conversely, intracellular parasites were identified by their 3D11- phenotype and stained positive or negative for UIS4 (Figure 5A, 2h and 44h, intracellular). UIS4+ PVM is normally associated with a productive cell infection (ref. 37). However, a small number of EEFs can develop in the absence of UIS4 (ref. 37), likely inside the host cell nucleus (Figure 5A, 44h, intranuclear).

      In the control and 3D11-treated groups, the percentage of intracellular UIS4- parasites decreased 2 to 3-fold from 2 to 44h, as expected of a parasite population negative for a marker of productive infection (Figure 5B). However, while at 2h in the control group, this population represented 14% of intracellular parasites, in the 3D11-treated group, it reached 48% (Figure 5B). This ~3-fold increase in the UIS4 negative population could explain the late killing of intracellular sporozoites by 3D11. Whether this population is constituted by intracellular transmigratory sporozoites lacking a PVM or parasites surrounded by a PVM, but incapable of secreting UIS4 still needs to be determined. At 44h, surviving EEFs in the 3D11-treated samples presented an area and UIS4 staining intensity similar to those of control parasites (Figure 5C, D). However, as observed by flow cytometry (Figure 4D), the GFP intensity of 3D11-treated parasites was significantly lower than that of control EEFs, indicating that 3D11 can somehow affect protein expression with undetermined effects in the genesis of red blood cell infecting stages.”

      Minor

      • Line 44 - 43: The statement is applicable only to the rodent infecting Plasmodium parasites. The authors need to clarify that.

      This is an important clarification. We have modified the text that now reads:

      “The sporozoite surface is covered by a dense coat of the circumsporozoite protein (CSP), shown to be an immunodominant protective antigen using a rodent malaria model”

      • Line 68: Replace the second 'against' after the CSP with 'of'.

      It is done.

      • Line 141 - 143: The 3D11 mAb does affect the homing and killing in the blood of cutaneous injected sporozoites. The authors need to clearly state that the statement is true only for i.v. injected sporozoites.

      Thank you for the comment. Now the text reads:

      “Altogether, these data indicate that 3D11 rather than having an early effect on i.v. inoculated sporozoites in the blood circulation, e.g. by inhibiting the homing or killing the parasite in the blood, requires more than 4 h to eliminate most parasites in the liver.”

      • Figure 3B: The numbers of sporozoites detected in the experiment vary from 0 h (line 172) to 2 h (line 184). Therefore, the numbers need to be mentioned on all the bars of each timepoint.

      We have now added the numbers at the top of the graph from Figure 3B.

      • Figure 3C: If the authors have used flk1-GFP mice, then how well they were able to detect the Pb-PfCSP GFP parasites in the vessel vs. parenchyma in the intravital imaging? The representative images for Pb-PfCSP GFP should also be included.

      Since 3D11 does not target PbPf parasites, most of them are motile in the movies, making them easily distinguishable from the endothelial cells. In addition, the stronger GFP intensity of sporozoites makes them detectable in the sinusoids. Representative images were added in the new Figure S3.

      • It is not mentioned anywhere how the viability of the sporozoites was determined. This has to be described especially in the methods section.

      • Also, the flow acquisition and data analysis of the sporozoites and infected HepG2 cells must be described in the method section.

      We briefly mentioned it in the results (line 228- 230): “In addition, by comparing the total number of recovered GFP+ sporozoites at 2 h in the two studied conditions, we measured the early lethality (%viable sporozoites, Figure 4B) of the anti-CSP Ab on the extracellular forms of the parasite (Figure 4A).”

      A more detailed description has been added in the methods section that now reads:

      “After 2 h, the supernatant was collected, and the culture was washed 2x with 0.5 volume of PBS. The cells were subsequently trypsinized. The supernatant plus the washing steps and the trypsinized cells were analyzed by flow cytometry to quantify the amount of GFP+ events inside and outside cells (Figure 3A and Figure S4). Viability was then quantified by the sum of the total number of sporozoites (GFP+ events) in the supernatant, inside and outside the cells. We calculated the percentage of parasite viability by dividing the average of the total number of sporozoites in the treated samples by the average in controls using three technical replicates for each condition. Additionally, we quantified the percentage of infected cells using the total number of GFP+ events in the HepG2 gate (Figure S4). To compare the biological replicates, we further normalized to the control of each experiment. For the samples used to analyze parasite development, the cells were incubated for 15 or 44 h after sporozoite addition, and the medium was changed after 2 and 24 h. The cells were trypsinized and the percentage of intracellular parasites was determined by flow cytometry as described above (Figure S4). The prolonged effect between 2 h and 15/44 h was calculated by normalizing the percentage of infected cells at 15/44 h to that of 2 h. For all flow cytometry measurements, the same volume was acquired.”
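
      The arithmetic in this methods passage can be sketched as follows. This is only an illustration under stated assumptions: the function names and all event counts are hypothetical and do not come from the manuscript, and the per-experiment normalization to controls across biological replicates is not shown.

```python
from statistics import mean

def total_events(supernatant, outside, inside):
    """Total GFP+ events for one replicate: supernatant plus events outside and inside cells."""
    return supernatant + outside + inside

def percent_viability(treated_totals, control_totals):
    """Mean treated total expressed as a percentage of the mean control total."""
    return 100.0 * mean(treated_totals) / mean(control_totals)

def prolonged_effect(pct_infected_late, pct_infected_2h):
    """Percentage of infected cells at 15 or 44 h normalized to the 2 h value."""
    return pct_infected_late / pct_infected_2h

# Hypothetical counts for three technical replicates per condition (not real data)
control_totals = [total_events(600, 250, 150), total_events(650, 300, 150), total_events(550, 200, 150)]
treated_totals = [total_events(300, 150, 30), total_events(320, 160, 40), total_events(310, 150, 40)]

viability = percent_viability(treated_totals, control_totals)
ratio = prolonged_effect(pct_infected_late=1.5, pct_infected_2h=3.0)
print(f"viability: {viability:.1f}% of control, late / 2 h ratio: {ratio:.2f}")
```

      A ratio below 1 in the last step would correspond to a loss of infected cells between 2 h and the later time point.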

      • Figure 4: The flow layouts should be included for at least comparing the 0 vs. 5 μg/ml of 3D11 mAb concentrations.

      Flow layouts were added to Supplementary Figure 4.

      • Line 651 (Figure S1 legend): Typographical error '14'.

      Thank you for noticing. We corrected it.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Aguirre-Botero and collaborators report on the dynamics of Plasmodium parasite elimination in the liver using the 3D11 anti-CSP monoclonal antibody (mAb). By using microscopy and bioluminescence imaging in the P. berghei rodent malaria model, the authors first demonstrate that higher antibody concentrations are required for protection against intravenous sporozoite challenge, when compared to cutaneous challenge, which is not surprising. The study also shows that the 3D11 mAb reduces sporozoite motility, impairs hepatic sinusoidal barrier crossing, and more relevantly inhibits intracellular development of liver stages through its cytotoxic activity. These findings highlight the role of this specific monoclonal antibody, 3D11 mAb against CSP, in targeting sporozoites in the liver.

      Major Comments

      The study provides valuable insights into the mechanisms of protection conferred by the 3D11 anti-CSP monoclonal antibody against P. berghei sporozoites, and these findings allow the field to speculate that other monoclonal antibodies against CSP of P. falciparum may act similarly. However, an important experiment is missing that would significantly strengthen the conclusions. Specifically, the authors should perform experiments where the monoclonal antibody is added immediately after the sporozoites have completed invasion. This should be done both in vitro and in vivo to show whether the antibody has any effect on intracellular development of liver stages when added after invasion.

      While the claims are generally supported by the data presented, to comprehensively conclude the late cytotoxic effects of 3D11, the additional experiment of post-invasion antibody application is relevant. This would help determine if the observed effects are due to the antibody's action during invasion or its continued action post-invasion.

      The data and methods are presented in a manner that allows for reproducibility. The use of microscopy and bioluminescence imaging is well-documented. The experiments appear adequately replicated, and statistical analyses are appropriate.

      We thank reviewer 2 for these important suggestions. To be sure that the effect might not come from the internalization of the antibodies after sporozoite invasion, we tested the amount of 3D11 bound to the parasite following invasion (new Fig. 4F) and the potential post-invasion neutralizing effect of 3D11 in vitro. The results obtained are presented below.

      “Post-invasion labeling of 3D11 bound to the membrane of intracellular parasites revealed a strong staining surrounding the parasite at 2 and 15h, but only punctual traces of 3D11 at 44h (Figure 4F, 3D11, 3D11). Of note, CSP was detected surrounding the control parasites at all time-points indicating that the lack of staining at 44h is not due to a decrease in the CSP amount on the parasite surface (Figure 4F, CSP, Control).  To evaluate the potential post-invasion entry of 3D11 into the PV of infected cells and posterior neutralization of intracellular parasites, we incubated invaded cells from 2 to 44 h with 3D11, but no effect on the parasite intracellular development was observed (Figure 4G, 2h p.i.). 3D11 incubated for 2 h with sporozoites and cells elicited, as expected, a dose-dependent inhibition of parasite development. Altogether, our results indicate that the late inhibition of parasite development is already achieved at 15h and likely caused by antibodies dragged inside cells bound to sporozoites before or during the invasion.”

      Minor Comments

      The text and figures are clear and accurate. Some minor typographical errors should be corrected.

      Thank you for the remark; we have verified the text again to remove typographical errors.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Aguirre-Botero et al have studied the effect of a potent monoclonal antibody against the circumsporozoite protein, the major surface protein of the malaria sporozoite. This is an elegantly designed, performed, and analyzed study. They have efficiently delineated the mode of action of anti-CSP repeat mAb and confirmed previous in vitro work (not cited) that demonstrated the same intracellular effect. 

      Specific comments

      Line 51: The authors claim a correlation between high antibody levels and protection. However, they did not provide direct proof that these antibodies were responsible for protection, nor did they establish a cut-off level of anti-CSP antibodies that would distinguish between protected and unprotected individuals.

      We thank reviewer 3 for the comments. Indeed, we agree with reviewer 3, these are correlative studies where the causality cannot be established. We modified the ensuing sentence to specify the causality between anti-CSP mAbs and in vivo protection against sporozoite infection. Now the text reads:

      “Extensive research has demonstrated a positive correlation between high levels of anti-CSP antibodies (Abs) induced by the RTS,S/AS01 vaccine and efficacy against malaria(11-13). Remarkably, anti-CSP monoclonal Abs (mAbs) have been proven to protect in vivo against malaria in various experimental settings, including, mice(14-21), monkeys(23), and humans(24-26)”

      Line 326: The late intrahepatic effect of mAb against the CSP repeat has been previously reported (see Figure 2, Nudelman et al, J Immunol, 1989). The effect was shown to affect the transition from liver trophozoites to liver schizonts. This study should be cited and discussed.

      Thank you for this important remark. We included this seminal reference and now the modified text reads:

      “Notably, a similar effect has been previously reported using sera from mice immunized with PfCSP or mAb against P. yoelii (Py) CSP. Incubation of Pf or Py sporozoites with the immune sera or mAbs not only affected sporozoite invasion in vitro but continued to affect intracellular forms for several days after invasion(38,39). Additionally, using anti-PfCSP sera, it was also observed that late EEFs from sera-treated sporozoites had abnormal morphology(38). Altogether, it was thus concluded that the anti-CSP Abs present in the sera had a long-term effect on the parasites(38,39).”

    2. eLife Assessment

      This important study shows that a monoclonal antibody against the repetitive region of the circumsporozoite protein (CSP) of the malaria-causing parasite P. berghei has neutralizing activity on parasite invasion and development. The authors present convincing in vivo data confirming previous in vitro work that suggested an intracellular post-invasion effect for this antibody. The findings offer insights into the inhibitory action of this anti-CSP antibody, which could inform the development of more effective malaria vaccines and therapeutic antibodies.

      [Editors' note: this paper was reviewed by Review Commons.]

    3. Reviewer #1 (Public review):

      The study by Aguirre-Botero et al. shows the dynamics of 3D11 anti-CSP monoclonal antibody (mAb) mediated elimination of rodent malaria Plasmodium berghei (Pb) parasites in the liver. The authors show that the anti-CSP mAb could protect against intravenous (i.v.) Pb sporozoite challenge along with the cutaneous challenge, but requires higher concentration of antibody. Importantly, the study shows that the anti-CSP mAb not only affects sporozoite motility, sinusoidal extravasation, and cell invasion but also partially impairs the intracellular development inside the liver parenchyma, indicating a late effect of this antibody during liver stage development. While the study is interesting and conducted well, the only novel yet very important observation made in this manuscript is the effect of the anti-CSP mAb on liver stage development.

      Comments on latest version:

      No further comments.

    5. Reviewer #3 (Public review):

      Summary:

      Aguirre-Botero et al have studied the effect of a potent monoclonal antibody against the circumsporozoite protein, the major surface protein of the malaria sporozoite. This is an elegantly designed, performed, and analyzed study. They have efficiently delineated the mode of action of anti-CSP repeat mAb and confirmed previous in vitro work (not cited) that demonstrated the same intracellular effect.

      Major comments from the previous round of review:

      Line 51: The authors claim a correlation between high antibody levels and protection. However, they did not provide direct proof that these antibodies were responsible for protection, nor did they establish a cut-off level of anti-CSP antibodies that would distinguish between protected and unprotected individuals.

      Line 326: The late intrahepatic effect of mAb against the CSP repeat has been previously reported (see Figure 2, Nudelman et al, J Immunol, 1989). The effect was shown to affect the transition from liver trophozoites to liver schizonts. This study should be cited and discussed.

      Significance:

      A well-done study that elucidates the mechanisms of a protective monoclonal antibody against malaria sporozoites. These data are important and will interest a large audience of researchers working in infectious diseases and immunology.

      Comments on latest version:

      With the addition of new experiments and proper addition of missing references and minor text correction, the manuscript has been improved.

    1. eLife Assessment

      This important study assessed the effects of food intake on sharp wave-ripples in the hippocampus of mice during subsequent sleep. Solid evidence supports the conclusion that sharp wave-ripples are enhanced by food consumption. This work will likely interest researchers studying multiple functions including memory, metabolism, and brain-body physiology.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Kaya et al. studies the effect of food consumption on hippocampal sharp wave ripples (SWRs) in mice. The authors use multiple foods and forms of food delivery to show that the frequency and power of SWRs increase following food intake, and that this effect depends on the caloric content of food. The authors also studied the effects of the administration of various food-intake-related hormones on SWRs during sleep, demonstrating that ghrelin, but not GLP-1, insulin, or leptin, negatively affects SWR rate and power. Finally, the authors use fiber photometry to show that GABAergic neurons in the lateral hypothalamus increase activity during a SWR event.

      Strengths:

      The experiments in this study seem to be well performed, and the data are well presented, visually. The data support the main conclusions of the manuscript that food intake enhances hippocampal SWRs. Taken together, this study is likely to be impactful to the study of the impact of feeding on sleep behavior, as well as the phenomena of hippocampal SWRs in metabolism.

      Weaknesses:

      Details of experiments are missing in the text and figure legends. Additionally, the writing of the manuscript could be improved.

    3. Reviewer #2 (Public review):

      Summary:

      Kaya et al uncover an intriguing relationship between hippocampal sharp wave-ripple production and peripheral hormone exposure, food intake, and lateral hypothalamic function. These findings significantly expand our understanding of hippocampal function beyond mnemonic processes and point a direction for promising future research.

      Strengths:

      Some of the relationships observed in this paper are highly significant. In particular, the inverse relationship between GLP1/Leptin and Insulin/Ghrelin is compelling, as it aligns well with opposing hormone functions on satiety.

      Weaknesses:

      I would be curious if there were any measurable behavioral differences that occur with different hormone manipulations.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Kaya et al. explores the effects of feeding on sharp wave-ripples (SWRs) in the hippocampus, which could reveal a better understanding of how metabolism is regulated by neural processes. Expanding on prior work that showed that SWRs trigger a decrease in peripheral glucose levels, the authors further tested the relationship between SWRs and meal consumption by recording LFPs from the dorsal CA1 region of the hippocampus before and after meal consumption. They found an increase in SWR magnitude during sleep after food intake, in both food restricted and ad libitum fed conditions. Using fiber photometry to detect GABAergic neuron activity in the lateral hypothalamus, they found increased activity locked to the onset of SWRs. They conclude that the animal's satiety state modulates the amplitude and rate of SWRs, and that SWRs modulate downstream circuits involved in regulating feeding. These experiments provide an important step forward in understanding how metabolism is regulated in the brain. However, currently, the paper lacks sufficient analyses to control for factors related to sleep quality and duration; adding these analyses would further support the claim that food intake itself, as opposed to sleep quality, is primarily responsible for changes in SWR activity. Adding this, along with some minor clarifications and edits, would lead to a compelling case for SWRs being modulated by a satiety state. The study will likely be of great interest in the field of learning and memory while carrying broader implications for understanding brain-body physiology.

      Strengths:

      The paper makes an innovative foray into the emerging field of brain-body research, asking how sharp wave-ripples are affected by metabolism and hunger. The authors use a variety of advanced techniques including LFP recordings and fiber photometry to answer this question. Additionally, they perform comprehensive and logical follow-up experiments to the initial food-restricted paradigm to account for deeper sleep following meal times and the difference between consumption of calories versus the experience of eating. These experiments lay the groundwork for future studies in this field, as the authors pose several follow-up questions regarding the role of metabolic hormones and downstream brain regions.

      Weaknesses:

      Major comments:

      (1) The authors conclude that food intake regulates SWR power during sleep beyond the effect of food intake on sleep quality. Specifically, they made an attempt to control for the confounding effect of delta power on SWRs through a mediation analysis. However, a similar analysis is not presented for SWR rate. Moreover, this does not seem to be a sufficient control. One alternative way to address this confound would be to subsample the sleep data from the ad lib and food restricted conditions (or high calorie and low calorie, etc), to match the delta power in each condition. When periods of similar mean delta power (i.e. similar sleep quality) are matched between datasets, the authors can then determine if a significant effect on SWR amplitude and rate remains in the subsampled data.

      (2) Relatedly, are the animals spending the same amount of time sleeping in the ad lib vs. food restricted conditions? The amount of time spent sleeping could affect the probability of entering certain stages of sleep and thus affect SWR properties. A recent paper (Giri et al., Nature, 2024) demonstrated that sleep deprivation can alter the magnitude and frequency of SWRs. Could the authors quantify sleep quantity and control for the amount of time spent sleeping by subsampling the data, similar to the suggestion above?

      (3) Plot 5I only reports significance but does not clearly show the underlying quantification of LH GABAergic activity. Upon reading the methods for how this analysis was conducted, it would be informative to see a plot of the pre-SWR and post-SWR integral values used for the paired t-test whose p-values are currently shown. For example, these values could be displayed as individual points overlaid on a pair of box-and-whisker plots of the pre- and post-distribution within the session (perhaps for one example session per mouse with the p-value reported, to supplement a plot of the distribution of p-values across sessions and mice). If these data are non-normal, the authors should also use a non-parametric statistical test.

      Minor comments:

      (4) A brief explanation (perhaps in the discussion) of what each change in SWR property (magnitude, rate, duration) could indicate in the context of the hypothesis may be helpful in bridging the fields of metabolism and memory. For example, by describing the hypothesized mechanistic consequence of each change, could the authors speculate on why ripple rate may not increase in all the instances where ripple power increases after feeding? Why do the authors speculate that ripple duration does not increase, given that prior work (Fernandez-Ruiz et al. 2019) has shown that prolonged ripples support enhanced memory?

      (5) The authors suggest that "SWRs could modulate peripheral metabolism" as a future implication of their work. However, the lack of clear effects from GLP-1, leptin and insulin complicates this interpretation. It might be informative for readers if the authors expanded their discussion of what specific role they speculate that SWRs could play in regulating metabolism, given these negative results.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Kaya et al. studies the effect of food consumption on hippocampal sharp wave ripples (SWRs) in mice. The authors use multiple foods and forms of food delivery to show that the frequency and power of SWRs increase following food intake, and that this effect depends on the caloric content of food. The authors also studied the effects of the administration of various food-intake-related hormones on SWRs during sleep, demonstrating that ghrelin, but not GLP-1, insulin, or leptin, negatively affects SWR rate and power. Finally, the authors use fiber photometry to show that GABAergic neurons in the lateral hypothalamus increase activity during a SWR event.

      Strengths:

      The experiments in this study seem to be well performed, and the data are well presented, visually. The data support the main conclusions of the manuscript that food intake enhances hippocampal SWRs. Taken together, this study is likely to be impactful to the study of the impact of feeding on sleep behavior, as well as the phenomena of hippocampal SWRs in metabolism.

      Weaknesses:

      Details of experiments are missing in the text and figure legends. Additionally, the writing of the manuscript could be improved.

      We thank the reviewer for their favorable assessment of the work and its potential impact. We will add all requested details in the text and figure legends and will revise the wording of the manuscript to improve its clarity.

      Reviewer #2 (Public review):

      Summary:

      Kaya et al uncover an intriguing relationship between hippocampal sharp wave-ripple production and peripheral hormone exposure, food intake, and lateral hypothalamic function. These findings significantly expand our understanding of hippocampal function beyond mnemonic processes and point a direction for promising future research.

      Strengths:

      Some of the relationships observed in this paper are highly significant. In particular, the inverse relationship between GLP1/Leptin and Insulin/Ghrelin is compelling, as it aligns well with opposing hormone functions on satiety.

      Weaknesses:

      I would be curious if there were any measurable behavioral differences that occur with different hormone manipulations.

      We thank the reviewer for their favorable assessment of the work and its contribution to our understanding of non-mnemonic hippocampal function. Whether there are behavioral differences that occur following administration of the different hormones is a great question, yet unfortunately our study design did not include fine behavioral monitoring to the degree that would allow answering it. While some previous studies have partially addressed the behavioral consequences of the delivery of these hormones (we will include a reference to these studies in the revised manuscript), how these changes may interact with the hippocampal and hypothalamic effects we observe is a very interesting next step.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Kaya et al. explores the effects of feeding on sharp wave-ripples (SWRs) in the hippocampus, which could reveal a better understanding of how metabolism is regulated by neural processes. Expanding on prior work that showed that SWRs trigger a decrease in peripheral glucose levels, the authors further tested the relationship between SWRs and meal consumption by recording LFPs from the dorsal CA1 region of the hippocampus before and after meal consumption. They found an increase in SWR magnitude during sleep after food intake, in both food restricted and ad libitum fed conditions. Using fiber photometry to detect GABAergic neuron activity in the lateral hypothalamus, they found increased activity locked to the onset of SWRs. They conclude that the animal's satiety state modulates the amplitude and rate of SWRs, and that SWRs modulate downstream circuits involved in regulating feeding. These experiments provide an important step forward in understanding how metabolism is regulated in the brain. However, currently, the paper lacks sufficient analyses to control for factors related to sleep quality and duration; adding these analyses would further support the claim that food intake itself, as opposed to sleep quality, is primarily responsible for changes in SWR activity. Adding this, along with some minor clarifications and edits, would lead to a compelling case for SWRs being modulated by a satiety state. The study will likely be of great interest in the field of learning and memory while carrying broader implications for understanding brain-body physiology.

      Strengths:

      The paper makes an innovative foray into the emerging field of brain-body research, asking how sharp wave-ripples are affected by metabolism and hunger. The authors use a variety of advanced techniques including LFP recordings and fiber photometry to answer this question. Additionally, they perform comprehensive and logical follow-up experiments to the initial food-restricted paradigm to account for deeper sleep following meal times and the difference between consumption of calories versus the experience of eating. These experiments lay the groundwork for future studies in this field, as the authors pose several follow-up questions regarding the role of metabolic hormones and downstream brain regions.

      We thank the reviewer for their appreciation and constructive review of the work.

      Weaknesses:

      Major comments:

      (1) The authors conclude that food intake regulates SWR power during sleep beyond the effect of food intake on sleep quality. Specifically, they made an attempt to control for the confounding effect of delta power on SWRs through a mediation analysis. However, a similar analysis is not presented for SWR rate. Moreover, this does not seem to be a sufficient control. One alternative way to address this confound would be to subsample the sleep data from the ad lib and food restricted conditions (or high calorie and low calorie, etc), to match the delta power in each condition. When periods of similar mean delta power (i.e. similar sleep quality) are matched between datasets, the authors can then determine if a significant effect on SWR amplitude and rate remains in the subsampled data.

      This is an important point that we believe we addressed in a few complementary ways. First, the mediation analysis we implemented measures the magnitude and significance of the contribution of food on SWR power after accounting for the effects of delta power, showing a highly significant food-SWR contribution. While the objective of subsampling is similar, mediation is a more statistically robust approach as it models the relationship between food, SWR power, and delta power in a way that explicitly accounts for the interdependence of these variables. Further, subsampling introduces the risk of losing statistical power by reducing the sample size, due to exclusion of data that might contain relevant and valuable information. Mediation analysis, on the other hand, uses the full dataset and retains statistical power while modeling the relationships between variables more holistically. However, as we were not satisfied with a purely analytical approach to test this issue, we carried out a new set of experiments in ad-libitum fed mice, where there is no potential issue of food restriction impairing sleep quality in the pre-sleep session. In these conditions food amount also significantly correlated with, and showed significant mediation of, the SWR power change. Finally, we acknowledge and discuss this point in the Discussion, highlighting that given the known relationship between cortical delta and SWRs, it is challenging to fully disentangle these signals.
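
      The mediation logic referred to in this response can be illustrated with a simple linear, product-of-coefficients decomposition. This is a hedged sketch rather than the authors' analysis: the function name `linear_mediation`, the variable roles (x for food amount, m for delta power change, y for SWR power change), and all numbers are assumptions, and significance testing (for example by bootstrapping the indirect effect) is omitted.

```python
def _cov(u, v):
    """Sum of cross-products about the means (unnormalized covariance)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def linear_mediation(x, m, y):
    """Decompose the total effect of x on y into direct and indirect (via m) parts.

    a:        slope of m regressed on x
    b:        slope of m in the regression of y on x and m
    direct:   slope of x in the regression of y on x and m
    indirect: a * b
    total:    slope of y regressed on x alone (equals direct + indirect)
    """
    sxx, smm, sxm = _cov(x, x), _cov(m, m), _cov(x, m)
    sxy, smy = _cov(x, y), _cov(m, y)
    det = sxx * smm - sxm ** 2
    if det == 0:
        raise ValueError("x and m are collinear or constant; effects are not identifiable")
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / det
    direct = (sxy * smm - smy * sxm) / det
    return {"a": a, "b": b, "direct": direct, "indirect": a * b, "total": sxy / sxx}

# Hypothetical, noise-free toy data: y depends on x directly (2) and on m (3)
food = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
wobble = [0.1, -0.2, 0.3, -0.1, 0.2, -0.3]
delta_change = [0.5 * f + w for f, w in zip(food, wobble)]
swr_change = [2.0 * f + 3.0 * d for f, d in zip(food, delta_change)]

effects = linear_mediation(food, delta_change, swr_change)
print(effects)
```

      In this framing, a direct effect that remains clearly different from zero after accounting for the mediator is what would support a food-SWR contribution beyond delta power.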

      (2) Relatedly, are the animals spending the same amount of time sleeping in the ad lib vs. food restricted conditions? The amount of time spent sleeping could affect the probability of entering certain stages of sleep and thus affect SWR properties. A recent paper (Giri et al., Nature, 2024) demonstrated that sleep deprivation can alter the magnitude and frequency of SWRs. Could the authors quantify sleep quantity and control for the amount of time spent sleeping by subsampling the data, similar to the suggestion above?

      We will include a comparison of sleep amount in the revised manuscript.

      Additionally, we will add details to the Methods section that were missing in the original submission that are relevant to this point. Specifically, within the sleep sessions, the ongoing sleep states were scored using the AccuSleep toolbox (https://github.com/zekebarger/AccuSleep) using the EEG and EMG signals. NREM periods were detected based on high EEG delta power and low EMG power, REM periods were detected based on high EEG theta power and low EMG power, and Wake periods were detected based on high EMG power. Importantly, only NREM periods were included for subsequent SWR detection, quantification and analyses (in particular, reported SWR rates reflect the number of SWRs per second of NREM sleep).
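
      The scoring criteria and the rate definition described above can be sketched as a simplified threshold rule. This is not how the AccuSleep toolbox itself classifies epochs; the thresholds, the epoch length, the handling of ambiguous epochs, and all function names are assumptions made for illustration.

```python
def score_epoch(delta, theta, emg, delta_thr=1.0, theta_thr=1.0, emg_thr=1.0):
    """Label one epoch from normalized EEG delta power, EEG theta power, and EMG power."""
    if emg >= emg_thr:
        return "Wake"      # high EMG power
    if delta >= delta_thr:
        return "NREM"      # low EMG, high delta (checked before theta by assumption)
    if theta >= theta_thr:
        return "REM"       # low EMG, high theta
    return "Unscored"      # low EMG but neither delta nor theta is high

def swr_rate_in_nrem(swr_times, states, epoch_len):
    """Number of SWRs occurring in NREM epochs per second of NREM sleep."""
    nrem_seconds = epoch_len * sum(1 for s in states if s == "NREM")
    if nrem_seconds == 0:
        return float("nan")
    count = 0
    for t in swr_times:
        idx = int(t // epoch_len)
        if 0 <= idx < len(states) and states[idx] == "NREM":
            count += 1
    return count / nrem_seconds

# Hypothetical session: four 2.5 s epochs given as (delta, theta, emg), and SWR times in seconds
epochs = [(2.0, 0.3, 0.2), (0.4, 0.5, 3.0), (1.8, 0.4, 0.1), (0.3, 2.2, 0.1)]
states = [score_epoch(d, th, e) for d, th, e in epochs]
rate = swr_rate_in_nrem([0.5, 1.0, 3.0, 5.5, 6.0, 9.0], states, epoch_len=2.5)
print(states, rate)
```

      Restricting both the numerator and the denominator to NREM time is what makes the rate comparable between sessions with different amounts of sleep.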

      (3) Plot 5I only reports significance but does not clearly show the underlying quantification of LH GABAergic activity. Upon reading the methods for how this analysis was conducted, it would be informative to see a plot of the pre-SWR and post-SWR integral values used for the paired t-test whose p-values are currently shown. For example, these values could be displayed as individual points overlaid on a pair of box-and-whisker plots of the pre- and post-distribution within the session (perhaps for one example session per mouse with the p-value reported, to supplement a plot of the distribution of p-values across sessions and mice). If these data are non-normal, the authors should also use a non-parametric statistical test.

      We will include this quantification and visual representation in the revised manuscript.
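
      One non-parametric option for the paired pre-SWR versus post-SWR comparison raised in this comment is an exact sign-flip permutation test on the paired differences. The sketch below is an illustration only: it is one possible alternative (a Wilcoxon signed-rank test is another), the function name and the values are hypothetical, and exhaustive enumeration is practical only for a small number of pairs.

```python
from itertools import product

def paired_sign_flip_test(pre, post):
    """Two-sided exact p-value for the paired differences under random sign flips."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    diffs = [b - a for a, b in zip(pre, post)]
    observed = abs(sum(diffs))
    n_extreme = 0
    n_total = 0
    # Under the null hypothesis, each difference is equally likely to be positive or negative
    for signs in product((1, -1), repeat=len(diffs)):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        n_total += 1
        if stat >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_total

# Hypothetical integral values of the photometry signal before and after SWR onset
pre_integral = [1.0, 2.0, 3.0, 4.0, 5.0]
post_integral = [2.0, 4.0, 6.0, 8.0, 10.0]
p_value = paired_sign_flip_test(pre_integral, post_integral)
print(p_value)
```

      With only five pairs the smallest attainable two-sided p-value is 2/32, which illustrates why the number of pairs per session matters when reporting significance.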

      Minor comments:

      (4) A brief explanation (perhaps in the discussion) of what each change in SWR property (magnitude, rate, duration) could indicate in the context of the hypothesis may be helpful in bridging the fields of metabolism and memory. For example, by describing the hypothesized mechanistic consequence of each change, could the authors speculate on why ripple rate may not increase in all the instances where ripple power increases after feeding? Why do the authors speculate that ripple duration does not increase, given that prior work (Fernandez-Ruiz et al. 2019) has shown that prolonged ripples support enhanced memory?

      We will include a discussion of these points in the revised manuscript.

      (5) The authors suggest that "SWRs could modulate peripheral metabolism" as a future implication of their work. However, the lack of clear effects from GLP-1, leptin and insulin complicates this interpretation. It might be informative for readers if the authors expanded their discussion of what specific role they speculate that SWRs could play in regulating metabolism, given these negative results.

      While we provided potential explanations for the lack of effects of the hormone administrations, we will further elaborate on this point in the revised manuscript.

    1. eLife Assessment

      This study presents fundamental insights into overcoming resistance in hormone receptor-positive breast cancer by demonstrating that sustained CDK4/6 inhibitor treatment, either alone or in combination with CDK2 inhibitors, significantly suppresses the growth of drug-resistant cancer cells. The findings are supported by compelling evidence from both in vitro cell line experiments and in vivo mouse models, highlighting the therapeutic potential of maintaining CDK4/6 inhibitors beyond disease progression. Additionally, the identification of cyclin E overexpression as a key driver of resistance offers a target that will be of value for future therapeutic strategies, potentially improving outcomes for patients with advanced breast cancer.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors identified that

      (1) CDK4/6i treatment attenuates the growth of drug-resistant cells by prolongation of the G1 phase;

      (2) CDK4/6i treatment results in an ineffective Rb inactivation pathway and suppresses the growth of drug-resistant tumors;

      (3) Addition of endocrine therapy augments the efficacy of CDK4/6i maintenance;

      (4) Addition of CDK2i with CDK4/6 treatment as second-line treatment can suppress the growth of resistant cells;

      (5) The role of cyclin E as a key driver of resistance to CDK4/6 and CDK2 inhibition.

      Strengths:

      To prove their complicated proposal, the authors employed orchestration of several kinds of live cell markers, timed in situ hybridization, IF and Immunoblotting. The authors strongly recognize the resistance of CDK4/6 + ET therapy and demonstrated how to overcome it.

      Weaknesses:

      The authors need to distinguish clearly between the results they achieved themselves and those achieved by other researchers.

    3. Reviewer #2 (Public review):

      Summary:

      This study elucidated the mechanism underlying drug resistance induced by CDK4/6i as a single agent and proposed a novel and efficacious second-line therapeutic strategy. It highlighted the potential of combining CDK2i with CDK4/6i for the treatment of HR+/HER2- breast cancer.

      Strengths:

      The study demonstrated that CDK4/6 induces drug resistance by impairing Rb activation, which results in diminished E2F activity and a delay in G1 phase progression. It suggests that the synergistic use of CDK2i and CDK4/6i may represent a promising second-line treatment approach. Addressing critical clinical challenges, this study holds substantial practical implications.

      Weaknesses:

      (1) Drug-resistant cell lines: Was a drug concentration gradient treatment employed to establish drug-resistant cell lines? If affirmative, this methodology should be detailed in the materials and methods section.

      (2) What rationale informed the selection of MCF-7 cells for the generation of CDK6 knockout cell lines? Supplementary Figure 3A indicates that CDK6 expression levels in MCF-7 cells are not notably elevated.

      (3) For each experiment, particularly those involving mice, the author must specify the number of individuals utilized and the number of replicates conducted, as detailed in the materials and methods section.

      (4) Could this treatment approach be extended to triple-negative breast cancer?

    4. Reviewer #3 (Public review):

      Summary:

      In their manuscript, Armand and colleagues investigate the potential of continuing CDK4/6 inhibitors or combining them with CDK2 inhibitors in the treatment of breast cancer that has developed resistance to initial therapy. Utilizing cellular and animal models, the research examines whether maintaining CDK4/6 inhibition or adding CDK2 inhibitors can effectively control tumor growth after resistance has set in. The key findings from the study indicate that the sustained use of CDK4/6 inhibitors can slow down the proliferation of cancer cells that have become resistant, and the combination of CDK2 inhibitors with CDK4/6 inhibitors can further enhance the suppression of tumor growth. Additionally, the study identifies that high levels of Cyclin E play a significant role in resistance to the combined therapy. These results suggest that continuing CDK4/6 inhibitors along with the strategic use of CDK2 inhibitors could be an effective strategy to overcome treatment resistance in hormone receptor-positive breast cancer.

      Strengths:

      (1) Continuous CDK4/6 Inhibitor Treatment Significantly Suppresses the Growth of Drug-Resistant HR+ Breast Cancer: The study demonstrates that the continued use of CDK4/6 inhibitors, even after disease progression, can significantly inhibit the growth of drug-resistant breast cancer.

      (2) Potential of Combined Use of CDK2 Inhibitors with CDK4/6 Inhibitors: The research highlights the potential of combining CDK2 inhibitors with CDK4/6 inhibitors to effectively suppress CDK2 activity and overcome drug resistance.

      (3) Discovery of Cyclin E Overexpression as a Key Driver: The study identifies overexpression of cyclin E as a key driver of resistance to the combination of CDK4/6 and CDK2 inhibitors, providing insights for future cancer treatments.

      (4) Consistency of In Vitro and In Vivo Experimental Results: The study obtained supportive results from both in vitro cell experiments and in vivo tumor models, enhancing the reliability of the research.

      (5) Validation with Multiple Cell Lines: The research utilized multiple HR+/HER2- breast cancer cell lines (such as MCF-7, T47D, CAMA-1) and triple-negative breast cancer cell lines (such as MDA-MB-231), validating the broad applicability of the results.

      Weaknesses:

      (1) The manuscript presents intriguing findings on the sustained use of CDK4/6 inhibitors and the potential incorporation of CDK2 inhibitors in breast cancer treatment. However, I would appreciate a more detailed discussion of how these findings could be translated into clinical practice, particularly regarding the management of patients with drug-resistant breast cancer.

      (2) While the emergence of resistance is acknowledged, the manuscript could benefit from a deeper exploration of the molecular mechanisms underlying resistance development. A more thorough understanding of how CDK2 inhibitors may overcome this resistance would be valuable.

      (3) The manuscript supports the continued use of CDK4/6 inhibitors, but it lacks a discussion on the long-term efficacy and safety of this approach. Additional studies or data to support the safety profile of prolonged CDK4/6 inhibitor use would strengthen the manuscript.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors identified that

      (1) CDK4/6i treatment attenuates the growth of drug-resistant cells by prolongation of the G1 phase; 

      (2) CDK4/6i treatment results in an ineffective Rb inactivation pathway and suppresses the growth of drug-resistant tumors;

      (3) Addition of endocrine therapy augments the efficacy of CDK4/6i maintenance;

      (4) Addition of CDK2i with CDK4/6 treatment as second-line treatment can suppress the growth of resistant cells;

      (5) The role of cyclin E as a key driver of resistance to CDK4/6 and CDK2 inhibition.

      Strengths:

      To prove their complicated proposal, the authors employed orchestration of several kinds of live cell markers, timed in situ hybridization, IF and Immunoblotting. The authors strongly recognize the resistance of CDK4/6 + ET therapy and demonstrated how to overcome it.

      Weaknesses:

      The authors need to distinguish clearly between the results they achieved themselves and those achieved by other researchers.

      Thank you for your thoughtful review and for highlighting both the strengths and weaknesses of our manuscript. We appreciate your recognition of the methodological rigor and the significance of our findings in addressing resistance to CDK4/6 inhibitors combined with endocrine therapy.

      To address your concern regarding the need to delineate our results from those achieved by other researchers, we will incorporate clarifications in the revised manuscript. Specifically, we will:

      (1) Clearly distinguish our novel contributions from prior findings in the field.

      (2) Explicitly cite and discuss relevant studies to contextualize our work, ensuring that our contributions are appropriately framed within the broader body of knowledge.

      These revisions will enhance the transparency and impact of our manuscript, as well as highlight the originality and significance of our findings. Thank you again for your constructive feedback.

      Reviewer #2 (Public review):

      Summary:

      This study elucidated the mechanism underlying drug resistance induced by CDK4/6i as a single agent and proposed a novel and efficacious second-line therapeutic strategy. It highlighted the potential of combining CDK2i with CDK4/6i for the treatment of HR+/HER2- breast cancer.

      Strengths:

      The study demonstrated that CDK4/6 induces drug resistance by impairing Rb activation, which results in diminished E2F activity and a delay in G1 phase progression. It suggests that the synergistic use of CDK2i and CDK4/6i may represent a promising second-line treatment approach. Addressing critical clinical challenges, this study holds substantial practical implications.

      Weaknesses: 

      (1) Drug-resistant cell lines: Was a drug concentration gradient treatment employed to establish drug-resistant cell lines? If affirmative, this methodology should be detailed in the materials and methods section. 

      We greatly appreciate the reviewer for raising this important question. In the revised manuscript, we will update the methods section to include a detailed description of how the drug-resistant cell lines were developed. Specifically, we will clarify whether a drug concentration gradient treatment was employed and provide step-by-step details to ensure reproducibility.

      (2) What rationale informed the selection of MCF-7 cells for the generation of CDK6 knockout cell lines? Supplementary Figure 3A indicates that CDK6 expression levels in MCF-7 cells are not notably elevated.

      We appreciate the reviewer’s insightful question about the rationale for selecting MCF-7 cells to generate CDK6 knockout cell lines. This choice was guided by prior studies highlighting the significant role of CDK6 in mediating resistance to CDK4/6 inhibitors (1-4). Moreover, we observed a 4.6-fold increase in CDK6 expression in CDK4/6 inhibitor-resistant MCF-7 cells compared to their drug-naïve counterparts (Supplementary Figure 3A). While we did not detect notable differences in CDK4/6 activity between wild-type and CDK6 knockout cells under CDK4/6 inhibitor treatment, these findings point to a potential non-canonical function of CDK6 in conferring resistance to CDK4/6 inhibitors.

      (3) For each experiment, particularly those involving mice, the author must specify the number of individuals utilized and the number of replicates conducted, as detailed in the materials and methods section. 

      We sincerely thank the reviewer for bringing this to our attention. In the revised manuscript, we will provide explicit details regarding the number of replicates and mice used for each experiment. This information will be included in the materials and methods section, figure legends, and relevant text to ensure transparency and clarity.

      (4) Could this treatment approach be extended to triple-negative breast cancer? 

      We greatly appreciate the reviewer’s inquiry about extending our findings to triple-negative breast cancer (TNBC). Based on our data presented in Figure 1 and Supplementary Figure 2, which include the TNBC cell line MDA-MB-231, we anticipate that the benefits of maintaining CDK4/6 inhibitors could indeed be applied to TNBC with an intact Rb/E2F pathway.

      Reviewer #3 (Public review):

      Summary:

      In their manuscript, Armand and colleagues investigate the potential of continuing CDK4/6 inhibitors or combining them with CDK2 inhibitors in the treatment of breast cancer that has developed resistance to initial therapy. Utilizing cellular and animal models, the research examines whether maintaining CDK4/6 inhibition or adding CDK2 inhibitors can effectively control tumor growth after resistance has set in. The key findings from the study indicate that the sustained use of CDK4/6 inhibitors can slow down the proliferation of cancer cells that have become resistant, and the combination of CDK2 inhibitors with CDK4/6 inhibitors can further enhance the suppression of tumor growth. Additionally, the study identifies that high levels of Cyclin E play a significant role in resistance to the combined therapy. These results suggest that continuing CDK4/6 inhibitors along with the strategic use of CDK2 inhibitors could be an effective strategy to overcome treatment resistance in hormone receptor-positive breast cancer.

      Strengths:

      (1) Continuous CDK4/6 Inhibitor Treatment Significantly Suppresses the Growth of Drug-Resistant HR+ Breast Cancer: The study demonstrates that the continued use of CDK4/6 inhibitors, even after disease progression, can significantly inhibit the growth of drug-resistant breast cancer.

      (2) Potential of Combined Use of CDK2 Inhibitors with CDK4/6 Inhibitors: The research highlights the potential of combining CDK2 inhibitors with CDK4/6 inhibitors to effectively suppress CDK2 activity and overcome drug resistance.

      (3) Discovery of Cyclin E Overexpression as a Key Driver: The study identifies overexpression of cyclin E as a key driver of resistance to the combination of CDK4/6 and CDK2 inhibitors, providing insights for future cancer treatments.

      (4) Consistency of In Vitro and In Vivo Experimental Results: The study obtained supportive results from both in vitro cell experiments and in vivo tumor models, enhancing the reliability of the research.

      (5) Validation with Multiple Cell Lines: The research utilized multiple HR+/HER2- breast cancer cell lines (such as MCF-7, T47D, CAMA-1) and triple-negative breast cancer cell lines (such as MDA-MB-231), validating the broad applicability of the results.

      Weaknesses:

      (1) The manuscript presents intriguing findings on the sustained use of CDK4/6 inhibitors and the potential incorporation of CDK2 inhibitors in breast cancer treatment. However, I would appreciate a more detailed discussion of how these findings could be translated into clinical practice, particularly regarding the management of patients with drug-resistant breast cancer. 

      We greatly appreciate this opportunity to further contextualize our findings within clinical practice. In the revised manuscript, we will expand the discussion to explore how the identified mechanisms can inform patient stratification and therapeutic combinations. We will also highlight the potential of integrating CDK2 inhibitors with continued CDK4/6 inhibition as a second-line strategy for HR+ breast cancer patients who exhibit resistance to CDK4/6 inhibitors, leveraging insights from current and ongoing clinical trials. This will provide a clearer framework for translating our findings into actionable therapeutic strategies.

      (2) While the emergence of resistance is acknowledged, the manuscript could benefit from a deeper exploration of the molecular mechanisms underlying resistance development. A more thorough understanding of how CDK2 inhibitors may overcome this resistance would be valuable. 

      Thank you for this insightful suggestion. In the revised manuscript, we will delve deeper into the molecular mechanisms by which CDK2 inhibitors counteract resistance to CDK4/6 inhibitors and endocrine therapy. We will emphasize the role of the non-canonical Rb inactivation pathway and upregulated transcriptional activity in reactivating CDK2, which contribute to resistance under CDK4/6 inhibition. Furthermore, we will discuss how dual inhibition of CDK4/6 and CDK2 effectively suppresses this resistance pathway, offering a mechanistic rationale for the therapeutic potential of this combination strategy.

      (3) The manuscript supports the continued use of CDK4/6 inhibitors, but it lacks a discussion on the long-term efficacy and safety of this approach. Additional studies or data to support the safety profile of prolonged CDK4/6 inhibitor use would strengthen the manuscript. 

      We greatly appreciate the reviewer for raising this important point. To address this, we will incorporate a discussion on the long-term safety and efficacy of CDK4/6 inhibitor maintenance therapy. Drawing from clinical trials and retrospective analyses (5-9), we will highlight data supporting the tolerability of prolonged CDK4/6i treatment, particularly in combination with endocrine therapy. We will also discuss its clinical benefits over chemotherapy or endocrine therapy alone, contextualizing these findings with our proposed therapeutic approach (6,8-11).

      References:

      (1) Yang C, Li Z, Bhatt T, Dickler M, Giri D, Scaltriti M, et al. Acquired CDK6 amplification promotes breast cancer resistance to CDK4/6 inhibitors and loss of ER signaling and dependence. Oncogene 2017;36:2255-64

      (2) Li Q, Jiang B, Guo J, Shao H, Del Priore IS, Chang Q, et al. INK4 Tumor Suppressor Proteins Mediate Resistance to CDK4/6 Kinase Inhibitors. Cancer Discov 2022;12:356-71

      (3) Ji W, Zhang W, Wang X, Shi Y, Yang F, Xie H, et al. c-myc regulates the sensitivity of breast cancer cells to palbociclib via c-myc/miR-29b-3p/CDK6 axis. Cell Death & Disease 2020;11:760

      (4) Wu X, Yang X, Xiong Y, Li R, Ito T, Ahmed TA, et al. Distinct CDK6 complexes determine tumor cell response to CDK4/6 inhibitors and degraders. Nature Cancer 2021;2:429-43

      (5) Martin JM, Handorf EA, Montero AJ, Goldstein LJ. Systemic Therapies Following Progression on First-line CDK4/6-inhibitor Treatment: Analysis of Real-world Data. Oncologist 2022;27:441-6

      (6) Xi J, Oza A, Thomas S, Ademuyiwa F, Weilbaecher K, Suresh R, et al. Retrospective Analysis of Treatment Patterns and Effectiveness of Palbociclib and Subsequent Regimens in Metastatic Breast Cancer. J Natl Compr Canc Netw 2019;17:141-7

      (7) Basile D, Gerratana L, Corvaja C, Pelizzari G, Franceschin G, Bertoli E, et al. First- and second-line treatment strategies for hormone-receptor (HR)-positive HER2-negative metastatic breast cancer: A real-world study. Breast 2021;57:104-12

      (8) Kalinsky K, Accordino MK, Chiuzan C, Mundi PS, Sakach E, Sathe C, et al. Randomized Phase II Trial of Endocrine Therapy With or Without Ribociclib After Progression on Cyclin-Dependent Kinase 4/6 Inhibition in Hormone Receptor–Positive, Human Epidermal Growth Factor Receptor 2–Negative Metastatic Breast Cancer: MAINTAIN Trial. Journal of Clinical Oncology;0:JCO.22.02392

      (9) Kalinsky K, Bianchini G, Hamilton EP, Graff SL, Park KH, Jeselsohn R, et al. Abemaciclib plus fulvestrant vs fulvestrant alone for HR+, HER2- advanced breast cancer following progression on a prior CDK4/6 inhibitor plus endocrine therapy: Primary outcome of the phase 3 postMONARCH trial. Journal of Clinical Oncology 2024;42:LBA1001-LBA

      (10) Mayer EL, Wander SA, Regan MM, DeMichele A, Forero-Torres A, Rimawi MF, et al. Palbociclib after CDK and endocrine therapy (PACE): A randomized phase II study of fulvestrant, palbociclib, and avelumab for endocrine pre-treated ER+/HER2- metastatic breast cancer. Journal of Clinical Oncology 2018;36:TPS1104-TPS

      (11) Llombart-Cussac A, Harper-Wynne C, Perello A, Hennequin A, Fernandez A, Colleoni M, et al. Second-line endocrine therapy (ET) with or without palbociclib (P) maintenance in patients (pts) with hormone receptor-positive (HR[+])/human epidermal growth factor receptor 2-negative (HER2[-]) advanced breast cancer (ABC): PALMIRA trial. Journal of Clinical Oncology 2023;41:1001-

    1. eLife Assessment

      Gating of mechanosensitive channels has been explained by the force-from-lipids model in which mechanical coupling of the channel protein to the plasma membrane transfers force from membrane tension to open the channel. In this important manuscript, the authors provide evidence for this mechanism in two different mechanically gated channels. The experiments were carried out in the same membranes, but the evidence is incomplete without a clear explanation of the relationship between measured mechanical parameters and membrane interfacial tension.

    2. Reviewer #1 (Public review):

      Summary:

      The authors study the effect of the addition of a synthetic amphiphile on the gating mechanism of the mechanosensitive channel MscL. They observe that the amphiphile reduces the membrane stretching and bending moduli, and increases the channel activation pressure. They then conclude that gating is sensitive to these two membrane parameters. This is explained by the effect of the amphiphile on the so-called membrane interfacial tension.

      Strengths:

      The major strength is that the authors found a way to tune the membrane's mechanical properties in a controlled manner, and find a progressive change of the suction pressure at which MscL gates. If analysed thoroughly, these results could give valuable information.

      Weaknesses:

      The weakness is the analysis and the discussion. I would like to have answers to some basic questions.

      (1) The explanation of the phenomenon involves a difference between interfacial tension and tension, without the difference between these being precisely defined. In the caption of Figure 4, one can read "Under tension, the PEO groups adsorb to the bilayer, suggesting adsorption is a thermodynamically favorable process that lowers the interfacial tension." What does this mean? Under what tension is the interfacial tension lowered? The fact that the system's free energy could be lowered by putting it under mechanical tension would result in a thermodynamic unstable situation. Is this what the authors mean?

      (2) From what I understand, a channel would feel the tension exerted by the membrane at its periphery, which is what I would call membrane tension. The fact that polymers may reorganise under membrane stretch to lower the system's free energy would certainly affect the membrane stretching modulus (as measured Figure 2E), but what the channel cares about is the tension (I would say). If the membrane is softer, a larger pipette pressure is required to reach the same level of tension, so it is not surprising that a given channel requires a larger activation pressure in softer membranes. To me, this doesn't mean that the channel feels the membrane stiffness, but rather that a given pressure leads to different tensions (which is what the channel feels) for different stiffnesses.

      (3) In order to support the authors' claim, the micropipette suction pressure should be appropriately translated into a membrane tension. One would then see whether the gating tension is affected by the presence of amphiphiles. In the micropipette setup used here, one can derive a relationship between pressure and tension, that involves the shape of the membrane. This relationship is simple (tension=pressure difference times pipette radius divided by 2) only in the limit where the membrane tongue inside the pipette ends with a hemisphere of constant radius independent of the pressure, and the pipette radius is much smaller than the GUV radius. None of these conditions seem to hold in Figure 2C. On the other hand, the authors do report absolute values of tension in the y-axis of Figure 2D. It seems quite straightforward to plot the activation tension (rather than pressure) as a function of the amphiphile volume fraction in Figure 2B. This is what needs to be shown.

      (4) The discussion needs to be improved. I could not find a convincing explanation of the role of interfacial tension in the discussion. The equation (p.14) distinguishes three contributions, which I understand to be (i) an elastic membrane deformation such as hydrophobic mismatch or other short-range effects, (ii) the protein conformation energy, and (iii) the work done by membrane tension. Apparently, the latter is where the effect is (which I agree with), but how this consideration leads to a gating energy difference (between lipid only and modified membrane) proportional to the interfacial tension is completely obscure (if not wrong).

      (5) I am rather surprised at the very small values of stretching and bending moduli found at high volume fraction. These quantities are obtained by fitting the stress-strain relationship (Figure 2D). Such a plot should be shown for all amphiphile volume fractions, so one can assess the quality of the fits.
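
      The pressure-to-tension conversion discussed in point (3) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `membrane_tension` and the example numbers are hypothetical. It uses the standard Laplace-law result for micropipette aspiration, which assumes a hemispherical membrane cap of radius equal to the pipette radius inside the pipette and a spherical vesicle outside; when the pipette radius is much smaller than the vesicle radius, it reduces to the simple form quoted above (pressure difference times pipette radius divided by 2).

```python
from typing import Optional


def membrane_tension(
    delta_p_pa: float,
    pipette_radius_m: float,
    vesicle_radius_m: Optional[float] = None,
) -> float:
    """Membrane tension (N/m) from aspiration pressure via Laplace's law.

    Assumptions of this sketch: the aspirated tongue ends in a
    hemispherical cap with radius equal to the pipette radius, and the
    vesicle portion outside the pipette is spherical.
    """
    if pipette_radius_m <= 0:
        raise ValueError("pipette radius must be positive")

    if vesicle_radius_m is None:
        # Limit R_p << R_v: tension = dP * R_p / 2
        return delta_p_pa * pipette_radius_m / 2.0

    if vesicle_radius_m <= pipette_radius_m:
        raise ValueError("vesicle radius must exceed pipette radius")

    # General spherical-cap form: tension = dP * R_p / (2 * (1 - R_p / R_v))
    ratio = pipette_radius_m / vesicle_radius_m
    return delta_p_pa * pipette_radius_m / (2.0 * (1.0 - ratio))


# Usage with illustrative numbers: 100 Pa suction, 2 um pipette, 10 um vesicle.
simple = membrane_tension(100.0, 2e-6)            # about 0.1 mN/m
corrected = membrane_tension(100.0, 2e-6, 10e-6)  # larger by a factor 1 / 0.8
print(simple, corrected)
```

      The correction factor shows why the simple form can underestimate tension when the pipette radius is not small compared with the vesicle radius, which is the regime the reviewer points to in Figure 2C.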

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript describes how synthetic polymers, primarily poloxamers of different sizes, influence bacterial mechanosensitive channel MscL gating by modifying the interfacial tension of the membrane. The authors expressed MscL in U2OS cells and chemically blebbed the cells to derive giant plasma membrane vesicles (GPMVs) containing MscL G22S. They applied micropipette aspiration on GPMVs to obtain bending rigidity (kc) and area expansion modulus (kA) and used patch clamping to obtain activation pressure. They found a negative correlation between kc and kA with activation pressure and attributed the changes to activation pressure to the lowering of the interfacial tension in the presence of polymers. They carried out coarse-grain molecular dynamics simulations and showed that under tension the hydrophilic PEO group adsorbs to the bilayer more, thereby lowering the interfacial tension. Besides MscL, they showed similar results with TREK-1 activation. The conclusion that differences in interfacial tension are what drive the changes in activation pressure is based on using a thermodynamic model.

      Strengths:

      (1) Reveals that synthetic polymer that lowers bending rigidity and area expansion modulus increases activation pressure of mechanosensitive channel by lowering interfacial tension - this is an important finding.

      (2) General data quality is high with detailed and thorough analysis. The use of both micropipette aspiration and patch clamp in the same study is noteworthy.

      (3) Discussion on nanoplastics and their effect on membrane properties and therefore their impact on mechanosensitivity is interesting.

      Weaknesses:

      Interfacial tension is not experimentally measured. Given the main argument of this paper is that synthetic polymers reduce interfacial tension, which increases MS channel activation pressure, it would be prudent to show experimental measurements to bolster their analysis.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors set out to test the "force from lipids" mechanism of mechanosensitive channel gating, which posits that mechanical properties of the membrane are directly responsible for converting membrane tension into useful energy for channel gating. They employ amphiphilic polymers called poloxamers to alter membrane mechanical properties and relate those to the threshold of mechanical activation of the MscL channel of E. coli.

      The authors heterologously express the channel, perform electrical recordings, and assess the mechanical properties of vesicles derived from the same membranes. This allows them to directly compare derived mechanical parameters to channel gating in the same environment.

      They further repeat experiments in a eukaryotic mechano-channel and show that the same principles apply to gating in this very different protein, providing support for the force from lipids hypothesis.

      Strengths:

      In this work, characterization of the mechanical properties of the plasma membrane and electrical recordings of channel activity are carried out in membranes derived from the same cells. This is a nice contribution to these experiments since usually these two properties are measured in separate membranes with differing compositions. The experiments are of high quality and the data analysis and interpretation are careful.

      Weaknesses:

      It is not clear to this reviewer what the relationship is between the mechanical properties the authors measure, the membrane area expansion modulus, and bending rigidity, to what they call "interfacial tension".

    5. Author response:

      We appreciate the time and thoughtful reviews of all three reviewers. Ahead of a full revision of the paper, we would like to respond briefly to a couple of points the reviewers raised, which we plan to treat in more detail in the revision.

      (1) The relationship between membrane tension and interfacial tension: The major request by reviewers was for a better explanation of the relationship between measured mechanical parameters and membrane interfacial tension. We plan to include a schematic of the different forces at play in the membrane and to clarify our discussion; here, we provide a brief explanation.

      In our study, we identified a relationship between channel activation pressure and two membrane mechanical properties (area expansion modulus (K<sub>A</sub>) and bending rigidity (K<sub>c</sub>)), though we did not find a correlation between channel activation pressure and a third mechanical property (membrane fluidity). Through further computational analysis of the membranes, we identified an additional property called interfacial tension that helps unify and explain our results. Interfacial tension (γ) is a property akin to surface tension that reflects the chemical composition at the interface of the membrane (between the polar headgroups of the lipids and the hydrophobic acyl chains of the lipids) and balances the repulsive interaction of the nonpolar hydrocarbon chains with the polar headgroup regions of the lipids. In the established polymer brush model, the expansion modulus is proportional to the interfacial tension (W. Rawicz, Biophysical Journal, 2000)

      γ = K<sub>A</sub>/C,

      where C is a constant. Interfacial tension occurs at the boundary between the lipid bilayer and the external aqueous environment and is different from mechanical tension. While mechanical membrane tension (t) reflects a physical force in the plane of the membrane, interfacial tension reflects the chemical composition at each interface of the membrane. Mechanical membrane tension depends on the size and shape of the membrane, whereas interfacial tension is independent of these features and depends on the molecular composition of the liquid-liquid interface. An expanded discussion on this topic was recently provided (Lipowsky. Faraday Discussions. 2024). While distinct, these two properties can be related to one another via the area expansion modulus (K<sub>A</sub>). Typically, one would imagine that reducing interfacial tension, and correspondingly reducing K<sub>A</sub>, should lower the energy needed to stretch the membrane to the same extent and should therefore reduce the activation pressure (and corresponding in-plane mechanical tension) required to open an embedded mechanosensitive channel. Interestingly, though, interfacial tension also works to pull the channel open, so a reduction in interfacial tension also means that more energy will be required to open the channel. We find that the reduction in interfacial tension, and the corresponding increase in the energy required to open embedded channels, outweighs the reduced tension that should be required to stretch the membrane. We plan to explain this tradeoff more clearly in our revision. Overall, our findings identify the exact properties driving mechanosensitive channel behavior in our study. Further, they provide a guide to understanding how and why shifts in mechanosensitive channel activation occur by connecting chemical composition changes to the changes in membrane tension propagation in a given membrane.

      (2) Data presentation to support determined area expansion modulus and bending rigidity values: We will show the stress-strain curves used to derive the K<sub>A</sub> and K<sub>c</sub> values.

      (3) Address why membrane tension data was not shown for ephys experiments: The micropipette and patch clamp setups are different, and we did not use the same system for both measurements. In fact, limitations in tools that would allow for concurrent tension measurements while conducting channel activation measurements have limited our understanding of the role of membrane tension on mechanosensation to date. While recent studies have attempted to resolve this limitation through the design of new tools that enable concurrent monitoring of mechanosensitive channel activation and membrane tension (Lüchtefeld et al. Nature Methods. 2024), these tools were not available to us during our study or now. Because our study also attempted to connect these two features (membrane tension and channel activation) but we lacked tools to do so simultaneously, we used two sets of measurements to separately uncover membrane mechanical properties and channel activation pressure.

      One reason it is difficult to measure membrane tension during a typical patch clamp study is the limitations of the imaging equipment and pipettes used for this assay. The experiment is usually done by looking through the eyepiece, and the pipette angle is around 45 degrees from the plane of the stage, so it is hard to visualize changes in the patch geometry in the tip of the pipette. In practice, we are able to see the pipette touch the GPMV, but cannot resolve the patch moving up the pipette. In response to the reviewer's comment that tension = pressure difference × pipette radius / 2, we were unable to measure the radius, or changes in the radius, of a patch upon increases in applied pressure because of the above-mentioned imaging constraints. This limitation is why we were unable to directly measure applied tension with our current patch clamp setup.
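
      To make the reviewer's relation concrete, the short Python sketch below evaluates tension = pressure difference × radius / 2 for one set of inputs. The function name and the numerical inputs are hypothetical and chosen only for illustration; as explained above, the patch radius could not be measured in this setup.

```python
def membrane_tension(delta_p_pa: float, radius_m: float) -> float:
    """Laplace relation quoted by the reviewer: T = dP * r / 2, returned in N/m."""
    if radius_m <= 0:
        raise ValueError("radius must be positive")
    return delta_p_pa * radius_m / 2.0


# Hypothetical inputs: 4000 Pa (about 30 mmHg) and a radius of 1 micrometre.
tension_n_per_m = membrane_tension(4000.0, 1e-6)
print(f"{tension_n_per_m * 1e3:.1f} mN/m")
```

      Because tension scales linearly with radius in this relation, an unresolved radius translates directly into an equally uncertain tension estimate.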

      (4) Interfacial tension is not experimentally measured: Interfacial tension = K<sub>A</sub>/C, where C is a constant (typically C = 4 for bilayer membranes). The best way to measure interfacial tension is to determine K<sub>A</sub> (the area expansion modulus), which we have done experimentally by generating stress vs. strain curves for GPMVs. In the literature, reductions in the interfacial tension of a membrane are typically determined experimentally by measuring a corresponding reduction in the associated K<sub>A</sub> value (e.g., Ly and Longo. Biophys J. 2004). We have followed the same approach.
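
      As a rough numerical illustration of this relation, the Python sketch below converts an area expansion modulus into an interfacial tension using C = 4, the value quoted in this response. The function name and the K<sub>A</sub> values are hypothetical and are not measurements from the study.

```python
def interfacial_tension(k_a_mn_per_m: float, c: float = 4.0) -> float:
    """Return gamma = K_A / C in mN/m (C = 4 is the value quoted in the response)."""
    if c <= 0:
        raise ValueError("C must be positive")
    return k_a_mn_per_m / c


# Hypothetical K_A values in mN/m, for illustration only.
for k_a in (240.0, 120.0):
    gamma = interfacial_tension(k_a)
    print(f"K_A = {k_a:.0f} mN/m -> gamma = {gamma:.0f} mN/m")
```

      Under this proportionality, halving K<sub>A</sub> halves the inferred interfacial tension, which is why a measured drop in K<sub>A</sub> is read as a drop in γ.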

    1. eLife Assessment

      This study develops useful tools for distinct optogenetic control of neuronal activity by red or blue light. The basic characterization of the activation of a red-shifted channelrhodopsin paired with a blue-light sensitive anion channel engineered to obtain desired inhibitory current kinetics is solid. However, evidence for their practical use under simultaneous multi-color or high-frequency stimulation in cells is missing.

    2. Reviewer #1 (Public review):

      Summary

      In this manuscript, the authors generate an AAV-deliverable tool that generates action potentials in response to red light, but not blue light, when expressed in neurons. To do this, they screen some red light-excitatory/blue light-inhibitory opsin pairs to find ones that are spectrally and temporally matched. They first show that this works with Chrimson and GtACR2; however, they expand their search after finding that the tau-off (inactivation after light cessation) kinetics of these two opsins are not well-matched. They directly examine a small set of options based on a literature search and settle on a variant of red light-excitatory Chrimson and blue light-inhibitory ZipACR. To match the kinetics of this pair even more closely, the authors create a structure homology model of the ZipACR retinal binding pocket and use this to guide generation of a small mutant panel, leading to a more optimized ZipACR mutant. They then show that a bicistronically expressed fusion arrangement of these opsins, plus some functional peptides, can drive action potentials up to 20 Hz with red light and does not do so with blue light, in hippocampal cells transduced by AAV. They also show function in vivo, in a mouse, using a physiological readout. They conclude that their new tool may be useful for complex experimental designs requiring multiple optical channels for write-in/read-out.

      The major advantage claimed by the authors over existing tools is the temporal time-locking of their inhibitory opsin - this is driven by the contrast between the tau-off kinetics of their ZipACR variant and those of GtACR2, which is used by the leading competitor tool (BiPOLES).

      Big thoughts

      While the authors carefully considered the potential influence of temporal kinetics on the efficiency of a tool such as this one, no experiments were conducted that make use of the unique properties of this molecular strategy (although the authors state that these experiments are now underway in their lab). They share some examples of how the tool could be useful in the discussion. Where do I think this could be useful?

      First, experimental designs that require multiple optical channels of control. This appears to be aligned with the authors' thoughts, as they state, correctly, that opsins utilizing retinal as a light-sensing chromophore are universally activated by blue light (the so-called 'blue shoulder'). Therefore, their tool may be useful for stimulating multiple populations using a blue excitatory opsin in neuron A and their tool for red excitation of neuron B - or, in the authors' own words, "A potential solution to the problem of cross-talk...". In this manuscript, the authors state that this is possible in theory and that there are no obvious reasons that it would not work, but do not present data that showcase their new tool for this purpose (e.g. Vierock, Johannes, et al. "BiPOLES is an optogenetic tool developed for bidirectional dual-color control of neurons." Nature communications 12.1 (2021): 4527. Figure 4f-I; 6). The same set-up could be imagined for green GECI (or equivalent) imaging of cells in the same volume that their tool is being used in - for instance, interleaving red stimulation light and blue imaging light, (perhaps) without the typical concern of imaging light bleed-through activating the opsin itself. I agree that it will likely work for multi-channel control, but only time will tell, at this point.

      Second, for high-frequency temporal control over both excitation and inhibition in the same neuron. Red light turns the cell on, and blue light turns the cell off (see, for instance, Zhang, Feng, et al. "Multimodal fast optical interrogation of neural circuitry." Nature 446.7136 (2007): 633-639. Figure 2; Vierock as above, Figure 4a,b). Again, here the authors are long on theory ("The new system...can drive time-locked high-frequency action potentials in response to red pulses") and short on explicit data. While they do show that red light = excitation and blue light = inhibition, they neither show 1) all-optical on/off modulation of the same cell; nor 2) high-frequency inhibition or excitation (max stim rate of 20 Hz, which is the same as the BiPOLES paper used for their LC stimulation paradigm; Vierock, as above, Figure 7a-d). They did provide a response to this critique, stating that the data showing excitation and inhibition spread across multiple panels were largely collected from the same cells.

      Despite these major shortcomings, the further development and characterization of tandem opsins, such as this one, is of interest to the community. There is on-going work by the BiPOLES team to create new iterations (e.g. Wahid, J., et al. "P-15 BiPOLES2 is a bidirectional optogenetic tool with a narrow activation spectrum and low red-light excitability." Clinical Neurophysiology 148 (2023): e16.). The authors have collected a substantial amount of additional data along the course of review and, even aside from the final tool, the overall data and approaches shown are useful.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Therefore, their tool may be useful for stimulating multiple populations using a blue excitatory opsin in neuron A and their tool for red excitation of neuron B… Yet, there are no data presented that showcases their new tool for this purpose

      We agree with the reviewer that in this manuscript we have not experimentally shown the applicability of our system for dual optical stimulation. However, the suppression of blue-light excitation of ZipV/T-IvfChr-expressing neurons strongly suggests that it can be used in experiments exciting populations of neurons, as similarly shown for BiPOLES. We see no theoretical reason why this experiment cannot be done if sufficient cell-targeting mechanisms (such as the use of cre-lox or retroAAV) are utilized. We have started several projects pursuing these applications in the meantime.

      While they do show that red light = excitation and blue light = inhibition, they neither show 1) all-optical on/off modulation of the same cell; nor 2) high-frequency inhibition or excitation (max stim rate of 20 Hz, which is the same as the BiPOLES paper used for their LC stimulation paradigm; Vierock, as above, Figure 7a-d).

      Regarding point 1, we understand that the reviewer asks if we have optically excited (with red light) and inhibited (with blue light) the same neurons. If so, Figure 4B1 (optical excitation of ZipT-IvfChr with red light) and Figure 5A (optical inhibition of ZipT-IvfChr with blue light) represent largely the same set of neurons.

      Regarding point 2, we respectfully disagree with the reviewer’s interpretation of Figure 7a-d in Vierock et al. As we understand it, in this part the authors apply a 20 Hz optical stimulation protocol to the LC neurons in vivo. However, there are no data showing that individual neurons follow this stimulation protocol. To be clear, we are not saying that BiPOLES cannot drive 20 Hz APs. Very likely it can, as it is based on ChrimsonR, which is capable of doing so (Klapoetke et al., Figure 2). Although we have not shown data for optical stimulation above 20 Hz in this manuscript, our system is based on vfChrimson, which is known to drive APs at 100 Hz and above (Mager et al., Figures 2 and 3).

      they must revise the manuscript to show that their approach is both 1) different in some way when compared to BiPOLES (it is my understanding that they did not do this, as per the supplementary alignment of the BiPOLES sequence and the sequence of the BiPOLES-like construct that they did test) and 2) that the properties that the investigators specifically tailored their construct to have confer some sort of experimental advantage when compared to the existing standard.

      In the latest version of the manuscript, we have compared our ZipV-IvfChr and the BiPOLES construct adapted with vfChrimson (Fig. 2 Suppl 1). The mean photocurrent amplitude of IvfChr in the ZipV-IvfChr construct is ~2.7 x higher than BiPOLES adapted with vfChrimson (14 randomly selected HEK293 cells in each group) (Fig. 2 Suppl 1B). We conducted this experiment in HEK293 cells to ensure accurate voltage-clamping and less biased cell selection. Even adjusting for the smaller photocurrent of vfChrimson vs ChrimsonR, this would still translate to ~1.6 x greater photocurrent with ZipV-IvfChr compared to the original BiPOLES utilizing ChrimsonR. We believe the increased efficiency of excitation is an important aspect of adapting vfChrimson for red-light excitation of neurons.

      Reviewer #2 (Public Review):

      (1) In the Introduction or Discussion, the authors could better motivate the need for a red-shifted actuator that lacks blue crosstalk, by giving some specific examples of how the tool could be productively used, e.g. pairing with another blue-shifted excitatory opsin in a different population, or pairing with a GFP-based fluorescent indicator, e.g. GCaMP. The motivation for the current tool is not obvious to non-experts.

      In the discussion, we now provide examples of potential uses of the tool. For example, the tool can be used to induce spike-timing-dependent plasticity with two wavelengths of light, where a blue-light channelrhodopsin such as oChIEF is used to evoke presynaptic release and ZipT-IvfChr is expressed in the postsynaptic neuron. In this situation, the rapid termination of the inhibitory response is critical so that it does not interfere with the induction of LTP or LTD. Other experiments include the alternating control of projection neurons and interneurons in cortical areas, and the independent control of neurons of the direct and indirect pathways in the striatum to manipulate behavior.

      (2) Simultaneous excitation and inhibition are not the same as non-excitation. The authors mentioned shunting briefly. Another possible issue is changes in osmotic balance. Activation of a Na+ channel and a Cl- channel will lead to net import of NaCl into the cell, possibly changing osmotic pressure. Please discuss.

      We agree with the notion that osmotic, ionic and pH changes in small neuronal structures can be disruptive to physiology. This is the reason we developed our approach, in which the fastest channelrhodopsins are used so that we can minimize the channel opening time and the flux of ions through the channels when brief light illuminations are applied. Not only are the fluxes of protons, sodium ions and calcium ions minimized, the flux of chloride should be minimal as well (as the membrane potential should be close to the chloride reversal potential, hence low ion flow). Our approach should therefore be minimally disruptive compared to most other existing channelrhodopsin-based approaches when short or minimal light pulses are used in conjunction with our tools. This recommendation is included in the updated manuscript.

      (3) The authors showed that in ZipT-IvfChr, orange light drives excitation and blue light does not. But what about simultaneous blue and orange light? Can the blue light overwhelm the effect of the orange light? Since the stated goal is to open the blue part of the spectrum for other applications, one is now worried about "negative" crosstalk. Please discuss and, ideally, characterize this phenomenon.

      We have now performed this experiment. Simultaneous blue (470 nm) and red (635 nm) light stimulation does not produce APs (Fig. 4 Suppl 1A). This suggests that the inhibitory effect of the ACR is more efficient than the excitatory effect of IvfChr due to its higher conductance, and it re-emphasizes that rapid termination of the ACR effect is critical for minimal disruption of physiological processes in such a pairing strategy.

      (3.1) Does the use of the new tool require careful balancing of the expression levels of the ZipT and the IvfChr? Does it require careful balancing of blue and orange light intensities?

      As with any optogenetic tool, users should validate the efficacy of the tool in their own system. Our tool relies solely on the balanced expression of the 2A system, the efficiency of the two opsins and their degradation over the time span of expression. These aspects of the tool would be better addressed in future versions of the tool or by improvement of the BiPOLES-type tandem expression in subsequent versions. From the instrumentation side, the light intensity and differential penetration depth require careful consideration. However, this holds true for most optogenetic and fluorescence imaging-based approaches as well. In the current update of the manuscript, we have included further discussion of these aspects.

      (3.2) Also, many opsins show complex and nonlinear responses to dual-wavelength illumination, so each component should be characterized individually under simultaneous blue + orange light.

      We have now performed this experiment (please see our response to point 3).

      (3.3) I was expecting to see photocurrents at different holding potentials as a function of illumination wavelength for the coexpressed construct (i.e. to see at what wavelength it switches from being excitatory to inhibitory); and also to see I-V curves of the photocurrent at blue and orange wavelengths for the co-expressed constructs (i.e. to see the reversal potential under blue excitation). Overall, the patch clamp and spectroscopic characterization of the individual constructs was stronger than that of the combined constructs.

      We have added the I-V curves for the co-expressed construct at different holding potentials for the 470 nm and 635 nm wavelengths. This shows the reversal potential for the two wavelengths that are intended for in vitro and in vivo applications. Performing a similar experiment for a variety of wavelengths would not be as valuable, in part due to the enormous amount of data generated. As we have shown in the study, the response of any channelrhodopsin varies with light duration and light intensity in addition to wavelength and holding potential. The results for each recorded cell could include stimulation at different wavelengths, illumination intensities, and light durations in addition to different holding potentials. Not only would the results be highly variable from cell to cell, there would potentially be hundreds or thousands of combinations to test per cell (e.g., 5 light intensities at 1, 2.5, 5, 10 and 20 mW/mm<sup>2</sup>; 8 wavelengths at 450 nm, 475 nm, 500 nm, 525 nm, 550 nm, 575 nm, 600 nm and 625 nm; 7 light durations at 1 ms, 5 ms, 10 ms, 50 ms, 100 ms, 500 ms and 1 s; and 6 holding potentials at -80 mV, -70 mV, -60 mV, -40 mV, -20 mV and 0 mV would result in 1680 stimulation conditions per recorded cell). Technically, the significant lowering of membrane resistance when both IvfChr and ZipACR variants are activated simultaneously would compromise the quality of voltage clamping even in HEK293 cells with series resistance compensation. We have yet to see any other study that includes such an ambitious electrophysiology experiment for channelrhodopsin characterization, likely because of the limited feasibility of such an experiment.
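
      The count of stimulation conditions quoted above can be checked with a short Python sketch that enumerates the full factorial design. The parameter values are taken from the response; the variable names are ours and purely illustrative.

```python
from itertools import product

# Parameter grids as listed in the response.
intensities_mw_mm2 = [1, 2.5, 5, 10, 20]
wavelengths_nm = [450, 475, 500, 525, 550, 575, 600, 625]
durations_ms = [1, 5, 10, 50, 100, 500, 1000]
holding_potentials_mv = [-80, -70, -60, -40, -20, 0]

# Every combination of intensity, wavelength, duration and holding potential.
conditions = list(
    product(intensities_mw_mm2, wavelengths_nm, durations_ms, holding_potentials_mv)
)
print(len(conditions))  # 5 * 8 * 7 * 6 = 1680
```

      Each additional parameter level multiplies the total, which is why a full sweep per cell quickly becomes impractical.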

      Reviewer #3 (Public Review):

      (1) The enhanced vf-Chrimson could potentially be a highlight of the manuscript, serving broader applications. Yet, gauging the overall improvements of ivf-Chrimson in comparison to other Chrimson variants remains intricate due to several reasons. First, photocurrents from ivf-Chrimson seem smaller than those from C-Chrimson (Supplemental Figure 3), and a direct comparison with standard vf-Chrimson is absent.

      We appreciate the reviewer’s positive view of our modified variant. We did not emphasize this particular modification as it was identical to our previously published modification and similar to those previously published by others (CsChrimson and C1Chrimson). In all these cases, improved membrane expression was consistently detected. We believe that the expression data and our comparison of C-Chrimson and IvfChr are sufficient to justify the improved membrane expression and function.

      Second, while membrane expression of ivf-Chrimson appears enhanced in provided brightfield recordings, the quantitative analysis would necessitate confocal microscopy and a membrane marker (Supplemental Figure)

      We have now quantified the results with a membrane palmitoylated mCherry using confocal microscopy shown in Fig 2 Suppl1 A. We measured the Pearson Correlation Coefficient of the mCherry with EGFP or Citrine signal for the 6 constructs (vfChrimson, vfChrimson with trafficking sequence, vfChrimson with N-terminal signaling peptide from oChIEF (C-vfChrimson), vfChrimson with trafficking sequence and N-terminal signaling peptide from oChIEF (IvfChr), BiPOLES with EGFP or citrine and vfChrimson) and the results were identical and consistent with the prior results using epifluorescence microscopy.
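
      For readers unfamiliar with this colocalization metric, the Python sketch below shows how a Pearson correlation coefficient can be computed from paired per-pixel intensities of two channels. It is a minimal illustration, not the authors' analysis pipeline; the function name and all intensity values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length intensity lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant channel")
    return cov / sqrt(var_x * var_y)


# Hypothetical flattened pixel intensities, not data from the study.
membrane_marker = [10, 50, 200, 220, 30, 15]
opsin_at_membrane = [12, 55, 190, 230, 28, 20]   # signal tracks the marker
opsin_intracellular = [150, 140, 30, 20, 160, 170]  # signal avoids the marker

print(round(pearson(membrane_marker, opsin_at_membrane), 2))
print(round(pearson(membrane_marker, opsin_intracellular), 2))
```

      A coefficient near 1 indicates that the opsin signal follows the membrane marker, whereas a low or negative value indicates that the opsin signal is located elsewhere.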

      (2) Finally, other N-terminal modified Chrimson variants, like CsChrimson by Klapoetke et al. in 2014 and C1Chrimson by Oda et al. in 2018, have been generated. Comparing ivf-Chrimson to vf-CsChrimson or vf-C1Chrimson would be important to evaluate the benefits of the applied N-terminal modification.

      Our development of IvfChrimson is similar to the approach of vf-CsChrimson and identical to that of vf-C1Chrimson and we do not claim these modifications to be unique or superior. However, we have developed our design independently of these other studies and we have more extensive functional comparison and characterization data of our IvfChrimson variant than the other studies.

      (2.1) The action spectra of ZipACR suggest peak absorption of ZipACR WT and its mutant at 525 - 550 nm (Fig. 3). This is even further red-shifted than previously reported by Govorunova et al. Further action spectra recordings differ for all constructs between recordings initiated with blue or red light (Supplementary Fig. 5). This discrepancy is unexpected and should be discussed.

      We thank the reviewer for the comment; this was a mistake in the traces used for the figure. The example traces were the spectral responses measured in the 400 nm to 650 nm order instead of the 650 nm to 400 nm order shown in the spectral data. This has now been corrected.

      Additionally, the representative photocurrents of Zip(151V) in Fig. 3D1 do not align with the corresponding action spectrum in Fig. 3D2 as they show maximal photocurrents for 400 nm excitation.

      Please, see point above.

      (3) The authors introduce two different bicistronic expression cassettes-ZipT-IvfChR and ZipV-IvfChR-without providing clear guidelines on their conditions of use. Although the authors assert that ZipT is slower and further red-shifted than ZipV, the differences in the data for both ACR mutants are small and the benefits of the different final constructs should be explained.

      In our testing in neurons, ZipT produced fewer ‘escaped’ spikes after the termination of the light pulses in the cells we tested. However, this depends on membrane properties of the cells, such as capacitance and resistance. ZipV has a faster termination time and may be necessary in some situations where reduced disruption of physiological processes is required.

      We have now included this discussion in our updated manuscript.

      (4) The ZipT/V-IvfChRs are designed as bicistronic constructs; yet, disparities in membrane trafficking and protein degradation between the two channels could lead to divergences in blue and red light photoresponses. For future applicants, understanding the extent of expression ratio variations across cells using the presented expression cassettes could be of significance and should be discussed.

      We now have included this discussion in our responses above.

      Reviewer #1 (Recommendations For The Authors):

      (1) The Figure 1a mV cartoon traces for chloride are confusing. The chloride currents are depolarizing, not hyperpolarizing. As noted by the authors, these channels largely generate AP blockade through shunting inhibition (division), not hyperpolarization (subtraction).

      The figure has been corrected.

      (2) Figure 2A does not show where the light is applied. Why are some of the bars blue and some of them not filled?

      This has been corrected.

      (3) Figure 2C1 does not show where the light is applied. There should be an inset to detail the blue-light-cessation-evoked AP. Also doesn't give the holding potential.

      The requested details are added.

      (4) Figure 2C2 inset is described as showing that "Light-induced currents with 470 nm illumination were initially outward but turned inward immediately following light offset." Is that correct? It looks to me like the current turns inward about half-way through the light pulse and then becomes even stronger after the light turns off. That is also consistent with the CC traces, which appear to show a transition toward depolarization during the light pulse before the AP initiation at light offset.

      Yes, the reviewer's observation is correct. There are blue light-induced outward and inward current peaks at the onset and offset of the light. Accordingly, we have modified the phrasing for Fig. 2C2.

      (5) Figure 3D1 shows that Zip(151V) has a peak current at 400nm, with a steady increase in current from red to blue, however, this is not the case in the summary data in 3D2. It's also not shown in Supplementary Figure 5B. What's going on?

      We apologize for the prior version of the figure associated with the first submission. The example traces from 400nm -> 650 nm were incorrectly included in the figure whereas the 650nm -> 400 nm example traces should be included. This has been corrected.

      (6) Figure 3D1 has no time scale.

      It has now been included.

      (7) Figure 3E1 should read "Transduced" and not "Transfected"

      This has been corrected.

      (8) IvfChr fidelity drops off dramatically at 20 Hz...down to 50% efficiency of generating APs. This is described in the legend as "high frequency". Maybe the cart came before the horse in this figure...as it looks like in panel C that using less light power density improves fidelity in the dual opsin configuration with red light stimulation...why not use that power for the characterization? Did you try any higher frequencies? Or longer pulse widths? This is an important characterization to inform further use of the tool. This shortcoming isn't a cell-intrinsic limitation, as the 470 nm stim with IvfChr was 100% successful at both 10 Hz and 20 Hz.

      It is known that red but not blue light pulses induce desensitization (optical fatigue) in red-shifted ChR variants. Indeed, one can reinstate the response to red light, by giving violet-blue light pulses (Fig 4. Suppl 2). We think this is the reason that the 470nm stimulation was more effective in inducing AP in cells expressing IvfChR. Higher light intensities induce greater desensitization, but are preferred for faster opening of channels and depolarization of neurons. This can explain why, in some situations, lower light intensities were more effective in producing APs when pulse trains were used. We have recordings from cells firing APs at 40Hz (not included). All these cells had high expression levels of the opsin.   

      (9) Figure 4D: why use 100ms pulse width? How do you know that this isn't causing depol block? Or some of the nefarious concerns that are raised in the discussion, such as "...disrupt[ion of] normal neuronal physiology and signal processing that occurs in millisecond time scale"?

      We used 100ms pulse duration to follow the published protocol that this experiment is based on (Lin et al., 2013, Nature Neuroscience). 

      (10) Figure 4E-bottom: What is the blue peak at light onset? Is the tool driving early activation before silencing?

      There seems to be an early, sharp and brief activation by blue light. We do not know the definite cause of this, but we speculate that it is driven by blue-light activation of ZipACR and not the IvfChr portion of the construct. The reason is that such a sharp rise is absent when only IvfChr is expressed (Fig. 4E, upper panel). A soma-targeting motif tethered to channelrhodopsins is known to result in preferential expression of channels close to the soma but does not exclude the expression of channelrhodopsin in axonal and dendritic compartments, especially when animals are allowed to recover for a long period of time after viral injection. We believe that ZipACR at axonal terminals, where the chloride concentration is high, can still cause blue-light-evoked depolarization and transmitter release. We observed this phenomenon in two mice in their first trial. The data for individual trials for each mouse are included in a supplementary table.

      (11) Figure 4G: Earlier in this same figure (B2, C), 470nm light was more effective at stimulating IvfChr than 635nm light. Is it unexpected that 638nm light would in this in vivo context be more effective at driving IvfChr responses than 450 nm light (at least as reflected by the AUC measurements)? Does this reflect fiber placement and light penetration/scattering?

      The spectral peaks of Chrimson-based variants, including vfChrimson, are all centered around 600 nm; at 635 / 638 nm, the amplitude of the photo-response declines, the channel onset slows, and the channels suffer greater desensitization. In isolated preparations where the light penetration is similar between 635 / 638 nm and 470 nm, 470 nm responses can outperform 635 / 638 nm responses due to their lack of desensitization and higher consistency. This is also a strong reason that we have developed our current approach. In the in vivo preparation shown in Fig. 4D-G, the much higher tissue penetration of 638 nm light, due to reduced absorption and reduced scattering, can offset the stronger response of IvfChr to 450 nm light.

      (12) In the methods, it is noted that different viral batches appear to generate different levels of neuronal toxicity. If that is the case, how did you differentiate between true differences between constructs vs. differential cell health effects?

      For figure 4D-F (whisker movement), we determined virus toxicity using NeuN staining. In slice recordings, we used the electrophysiological property of the neurons to assess their health. For this manuscript, we had one batch of virus that produced toxicity. We did not include any data from this batch.

      Reviewer #2 (Recommendations For The Authors):

      ● Define AUC on first use.

      It is now defined.

      ● Figure 3C2: Please explain how the photocurrents were normalized. As presented, it looks like under strong orange light, the ZipACR has higher photocurrent than the ivfChr.

      This is due to the fact vfChrimson and other Chrimson-based variants do not fully recover in the dark after 590 nm stimulation. We tested IvfChrimson with both reconditioning light pulse of 405 nm and without 405 nm and we can consistently reach a greater ‘maximal’ response from the same cell after 405 nm reconditioning (see Fig. 4 Suppl 2). We therefore normalize the response to the maximal recorded response of the cell often achieved with 10 or 20 mW/mm<sup>2</sup> 590 nm stimulation after 405 nm reconditioning. We understand this can be confusing and have now replaced the light-intensity response in Fig. 3C2 with the one with 405 nm reconditioning which is easier to interpret for the readers.

      ● P. 3: "As expected, blue light pulses induce transient membrane suppression..." Unclear what "suppression" means. Shunting? Hyperpolarization?

      We rephrased this to “As expected, blue light pulses transiently suppress APs…”

      ● P. 3: "illumination at 470 nm and 590 nm wavelengths led to similar amounts of courtship song (110.1 ± 12.8 and 78.5 ± 11.6, n = 16-17, respectively)". What are the units of "courtship song"?

      The unit for courtship song is the number of pulses per 10 seconds. This has been clarified in the figure.

      ● P. 5: The quantification of photocurrent in terms of pA/pF/A.U. is non-standard. I understand the impetus to normalize by expression to give something proportional to per-molecule conductance, but a user cares about overall photocurrent. Please also give the real photocurrents, either pA or pA/pF.

      We have provided the real photocurrent in pA or pA/pF where scientifically appropriate. To avoid selection and experimenter bias in our data, we did not set criteria for eliminating cells based on fluorescence intensity or photocurrent amplitude. As a result, responses from the same construct can vary by up to 20-fold in many experiments. We do not believe that averaging absolute photocurrent amplitudes would be justified because of the resulting imbalance in weighting. We acknowledge that not selecting or eliminating data points introduces higher noise from recordings with smaller responses, but this is preferable to the selection or experimenter bias that would likely be introduced otherwise.

      ● Please quote illumination intensities wherever possible.

      ● P. 7: why was the red light crosstalk into Zip(151T) tested at 635 nm instead of 590 nm? Isn't the relevant parameter 590 nm, since that will be used for the excitatory opsin?

      In all our characterizations of the constructs using slice electrophysiology recordings, we used 635nm instead of 590nm. The reason is that compared to 590nm wavelength, at 635nm the photocurrent for Zip(151T) and Zip(151V) is significantly reduced (Fig. 3D1,D2).

      ● P. 10: "we examined the power at which responses to 470 nm and 635 nm lights induce APs in neurons expressing ZipT-IvfChr, ZipV-IvfChr, or IvfChr", but the preceding sentence says you didn't test the ZipT-IvfChr. This is confusing, please clarify.

      The previous paragraph refers to the photocurrent recordings in HEK293 cells where our fast LED based illumination system is limited to 590 nm light, whereas the subsequent paragraph refers to the brain slice neuronal recordings. We have now emphasized the difference of the experiments in the rewrite.

      ● Fig. 4B1, top: Why don't the blue traces return to the same baseline after the stimulus epochs?

      We observed this shift in baseline (~4 mV more depolarized) in cells expressing IvfChR (or vfChR) only with blue light stimulation. This was observed in the neurons recorded in CA1 as well (data not shown). There was no such change following red light stimulation (Fig. 4B1). Therefore, this should not affect the applicability of our construct. The original paper introducing vfChR did not test the responses of its constructs to blue light. There could be another photocycle state that is activated more strongly by 470 nm than by 590 nm and has a slow off-rate, but this is only speculation on our part. It must be noted that we did not observe such a phenomenon in cells expressing ChrimsonR (Fig. 1 Suppl 1C).

      ● Fig. S3B, right: The two colors are barely distinguishable on the graph. Consider more distinct colors and/or different symbols.

      It has been changed accordingly.

      ● P. 15: "However, we do not recommend the use of orange light pulses, as we observed a significant photocurrent in this wavelength." Not clear what this is referring to. Which construct? Under which circumstances shouldn't one use orange light pulses? Where's the data showing this?

      This is referring to Fig. 3D1,D2 and Figure 4 suppl Fig. 2 which show a normalized ~40-50% photocurrent at 590nm. Now in the text, the reference figures for the data are added.

    1. eLife Assessment

      This study presents valuable findings on the relative cerebral blood volume of non-human primates that move us closer to uncovering the functional and architectonic principles that govern the interplay between neuronal and vascular networks. The evidence of areal variations is solid, but that of vessel counting and laminar analysis is incomplete. The lack of a direct comparison of their approach against better-established MRI-based methods for measuring hemodynamics and vascular structure weakens the evidence provided in the current paper version. The work will be of interest to NHP imaging scientists.

    2. Reviewer #1 (Public review):

      Summary:

      Audio et al. measured cerebral blood volume (CBV) across cortical areas and layers using high-resolution MRI with contrast agents in non-human primates. While the non-invasive CBV MRI methodology is often used to enhance fMRI sensitivity in NHPs, its application for baseline CBV measurement is rare due to the complexities of susceptibility contrast mechanisms. The authors determined the number of large vessels and the areal and laminar variations of CBV in NHP, and compared those with various other metrics.

      Strengths:

      Noninvasive mapping of relative cerebral blood volume is novel for non-human primates. A key finding was the observation of variations in CBV across regions; primary sensory cortices had high CBV, whereas other higher areas had low CBV. The measured CBV values correlated with previously reported neuronal and receptor densities.

      Weaknesses:

      A weakness of this manuscript is that the quantification of CBV with postprocessing approaches to remove susceptibility effects from pial and penetrating vessels is not fully validated, especially on a laminar scale. Further specific comments follow.

      (1) Baseline CBV indices were determined using contrast agent-enhanced MRI (deltaR2*). Although this approach is suitable for areal comparisons, its application at a laminar scale poses challenges due to significant contributions from large vessels including pial vessels. The primary concern is whether large-vessel contributions can be removed from the measured deltaR2* through processing techniques.

      (2) High-resolution MRI with a critical sampling frequency estimated from previous studies (Weber 2008, Zheng 1991) was performed to separate penetrating vessels. However, this approach is still insufficient to accurately identify the number of vessels due to the blooming effects of susceptibility and insufficient spatial resolution. The reported number of penetrating vessels is only applicable to the experimental and processing conditions used in this study, which cannot be generalized.

      (3) Baseline R2* is sensitive to baseline R2, vascular volume, iron content, and susceptibility gradients. Additionally, it is sensitive to imaging parameters; higher spatial resolution tends to result in lower R2* values (closer to the R2 value). Thus, it is difficult to correlate baseline R2* with physiological parameters.

      (4) CBV-weighted deltaR2* is correlated with various other metrics (cytoarchitectural parcellation, myelin/receptor density, cortical thickness, CO, cell-type specificity, etc.). While testing the correlation between deltaR2* and these other metrics may be acceptable as an exploratory analysis, it is challenging for readers to discern a causal relationship between them. A critical question is whether CBV-weighted deltaR2* can provide insights into other metrics in diseased or abnormal brain states.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript presents a new approach for non-invasive, MRI-based measurements of cerebral blood volume (CBV). Here, the authors use ferumoxytol, a high-contrast agent, and apply specific sequences to infer CBV. The authors then move to statistically compare measured regional CBV with the known distribution of different types of neurons, markers of metabolic load, and others. While the presented methodology captures an estimated 30% of the vasculature, the authors corroborated previous findings regarding the lack of vascular compartmentalization around functional neuronal units in the primary visual cortex.

      Strengths:

      Non invasive methodology geared to map vascular properties in vivo.

      Implementation of a highly sensitive approach for measuring blood volume.

      Ability to map vascular structural and functional vascular metrics to other types of published data.

      Weaknesses:

      The key issue here is the underlying assumption about the appropriate spatial sampling frequency needed to capture the architecture of the brain vasculature. Namely, ~7 penetrating vessels / mm2 as derived from Weber et al 2008 (Cer Cor). The cited work begins by characterizing the spacing of penetrating arteries and ascending veins using a vascular cast of 7 monkeys (Macaca mulatta, same as in the current paper). The ~7 penetrating vessels / mm2 is computed by dividing the total number of identified vessels by the area imaged. The problem here is that all measurements were made in a "non-volumetric" manner and only in V1. Extrapolating from here to the entire brain seems like an over-assumption, particularly given the region-dependent heterogeneity that the current paper reports.

      Comments on revisions:

      I appreciate the effort made to improve the manuscript. That said, the direct validation of the underlying assumption about spatial resolution sampling remains unaddressed in the final version of this manuscript. With the only intention to further strengthen the methodology presented here, I would encourage again the authors to seek a direct validation of this assumption for other brain areas.

      In their reply, the authors stated "... line scanning or single-plane sequences, at least on first impression, seem inadequate for whole-brain coverage and cortical surface mapping.". This seems to stem from a misunderstanding, as the method could be used to validate the mapping, not to map per se.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Audio et al. measured cerebral blood volume (CBV) across cortical areas and layers using high-resolution MRI with contrast agents in non-human primates. While the non-invasive CBV MRI methodology is often used to enhance fMRI sensitivity in NHPs, its application for baseline CBV measurement is rare due to the complexities of susceptibility contrast mechanisms. The authors determined the number of large vessels and the areal and laminar variations of CBV in NHP and compared those with various other metrics.

      Strengths:

      Non-invasive mapping of relative cerebral blood volume is novel for non-human primates. A key finding was the observation of variations in CBV across regions; primary sensory cortices had high CBV, whereas other higher areas had low CBV. The measured CBV values correlated with previously reported neuronal and receptor densities.

      Weaknesses:

      A weakness of this manuscript is that the quantification of CBV with postprocessing approaches to remove susceptibility effects from pial and penetrating vessels, as well as orientation dependency, is not fully validated, especially on a laminar scale. Further specific comments follow.

      We suspect that the comment regarding the lack of validation at the laminar level stems from an error made by the corresponding author in the original bioRxiv submission (v1, May 17th https://www.biorxiv.org/content/10.1101/2024.05.16.594068v1?versioned=true), where Figure 3, which contains the laminar validation, was lost during PDF conversion. After submitting to eLife, this mistake was quickly identified, and a corrected manuscript was re-uploaded to bioRxiv (v2, June 5th, https://doi.org/10.1101/2024.05.16.594068). Although we informed the eLife staff about the update, it appears that the revised manuscript may not have reached reviewer #1 in time. We sincerely apologize for any confusion or inconvenience this may have caused.

      (1) Baseline CBV indices were determined using contrast agent-enhanced MRI (deltaR2*). Although this approach is suitable for areal comparisons, its application on a laminar scale has not been validated in the literature or in this study. By comparing with histological vascular information of V1, the authors attempted to validate their approach. However, the generalization of their method is questionable. The main issue is whether the large vessel contribution is minimized by processing approaches properly in various cortical areas (such as clusters 1-3 in Figure 5). It would be beneficial to compare deltaR2* with deltaR2 induced by contrast agents in a few selected slices, as deltaR2 is supposed to be sensitive to microvessels, not macrovessels. Please discuss this issue.

      The requested validation is presented in Figure 3F, which compares our deltaR2* measurements with previous invasive estimates of large vessel, capillary and cytochrome oxidase (CO) levels in V1 (Weber et al., 2008; doi.org/10.1093/cercor/bhm259). Our deltaR2* values show a stronger correspondence with microvascularity and CO levels than with large vessels. Moreover, Figure 3D illustrates relative differences between V1 and V2, which closely align with the relative vascular volume differences reported by Zheng et al., 1991. It is important to note that Weber and colleagues averaged across V2-V5 due to similar vascularity across these areas. In our material, we also observed similar vascularity in these areas, though V5 (e.g., MT) has slightly denser vascularity, in agreement with reports of CO staining.

      Additionally, we report similar GM/WM vascular density, and high vascular density in primary sensory areas. Unfortunately, available ground-truth data on vascularity does not provide further (general) validation data for laminar vasculature in macaques (such as those in cluster 1-3; Fig. 5). That said, we have provided substantial evidence linking whole-brain vascular measures with variations in neuron (for data distribution, see Supp. Fig. 6F) and receptor densities, which we believe provides strong support for our approach.

      We would like to clarify that the authors do not assert that gradient-echo MRI is exclusively sensitive to microvessels and not macrovessels. This is not stated anywhere in the manuscript. If any sentence appears misleading, please let us know, and we will consider revising it. It is well-established that large vessels contribute to ΔR2* (Ogawa et al., 1993; Boxerman et al., 1995), and this is clearly stated in the manuscript (introduction, methods, results and discussion) and demonstrated in Figures 2A, B, and Supp. Figs. 2, 3, and 4. The primary concern, as the reviewer also noted, is whether we have sufficiently minimized the contribution of large vessels in our parcellated data analysis.

      At the parcellated level, we used the median value to avoid skewness in the data distribution, which primarily arises from large vessels, as regions near these vessels exhibit higher ΔR2*. The skewness of ΔR2* is also visible in Figure 1F, G. While this approach mitigates the large-versus-small vessel issue, it does not entirely resolve it, as a slight linear increase toward the cortical surface remains (in all parcels). This is likely due to our inability to delineate all penetrating vessels, as shown in Figure 2E, and because contrast agent accumulates toward the superficial layers, where blood originates from and returns to the pial surface. To mitigate this issue, we detrended the parcellated profiles across layers, obtaining results similar to the ground-truth measures of vascularity in V1-V5 and CO histology in V1.
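      The two steps described in this reply (a per-parcel median, then removal of a linear trend across layers) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the list-based data layout, and the choice to fit the trend separately for each parcel profile while preserving its mean are all assumptions made here for illustration.

```python
from statistics import median


def parcel_layer_profile(layers, parcel_ids, parcel):
    """Median value per layer over the vertices that belong to one parcel.

    layers: list of per-layer lists, one value per vertex (hypothetical layout).
    parcel_ids: one parcel label per vertex.
    """
    keep = [i for i, label in enumerate(parcel_ids) if label == parcel]
    return [median([layer[i] for i in keep]) for layer in layers]


def detrend_across_layers(profile):
    """Subtract the least-squares linear trend across layers, keeping the mean."""
    n = len(profile)
    depth_mean = (n - 1) / 2
    value_mean = sum(profile) / n
    sxx = sum((d - depth_mean) ** 2 for d in range(n))
    sxy = sum((d - depth_mean) * (v - value_mean) for d, v in enumerate(profile))
    slope = sxy / sxx
    return [v - slope * (d - depth_mean) for d, v in enumerate(profile)]


# Toy data: the third vertex mimics a voxel next to a large vessel.
layers = [[1.0, 2.0, 100.0, 7.0], [3.0, 4.0, 200.0, 9.0]]
labels = ["V1", "V1", "V1", "V2"]
profile = parcel_layer_profile(layers, labels, "V1")

# A purely linear profile becomes flat once the trend is removed.
flat = detrend_across_layers([5.0, 4.0, 3.0, 2.0])
```

      In the toy data the outlier does not move the median, which is the property the reply relies on; a mean over the same vertices would be dominated by it.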

      (2) High-resolution MRI with a critical sampling frequency estimated from previous studies (Weber 2008, Zheng 1991) was performed to separate penetrating vessels, which is considered one of the major advancements in this study. However, this approach is still insufficient to accurately identify the number of vessels due to the blooming effects of susceptibility and insufficient spatial resolution. There was no detailed description of the detection criteria. More importantly, the number of observable penetrating vessels is dependent on imaging parameters and the dose of the contrast agent. If imaging slices were obtained in parallel to the cortex with higher in-plane resolution, it would likely improve the detection of penetrating vessels. Using higher-field MRI would further enhance the detection of penetrating vessels. Therefore, the reported value is only applicable to the experimental and processing conditions used in this study. Detailed selection criteria should be mentioned, and all potential pitfalls should be discussed.

      We believe that Figure 2 represents a significant conceptual and data analysis advancement in the field of vascular imaging. To the best of our knowledge, this is the first MRI study attempting to assess vessel density across cortical layers and compare the number of vessels to the known ground-truth. While we do not claim to have achieved a perfect solution (as shown in Figure 2), we offer a robust challenge to the imaging community by introducing this novel benchmarking approach. Our hope is that this conceptual framework will inspire the MR imaging community to tackle this challenge.

      Regarding imaging parameters, TE did not have much effect on our results, with a slight effect observed in the superficial layers due to the presence of large pial vessels (blooming effect; Fig. 2C). This also suggests that similar results could be achieved by changing the contrast agent dose, though there are, of course, CNR requirements and limitations at either end of the spectrum.

      We completely agree with the reviewer that spatial resolution is critical in resolving the arterio-venous networks, and we have dedicated significant attention to this topic in the introduction, results and discussion sections. We also agree with the reviewer that if imaging slices were obtained in parallel to the cortex with higher in-plane resolution, it would improve the detection of vessels. However, while this approach is ideal for counting vessels in a single plane and isolated region of cortex, it is less suited to the surface mapping of vessels, which is the focus of our study.

      Regarding the exclusion of vessels, based on visual comparison of vessels in volume space, Frangi-filter detection of vessels in volume space, and surface detection of vessels, we found no evidence to develop additional exclusion criteria (Supp. Fig. 3). On the contrary, we identified a number of false negatives in both the surface maps and volume maps. Notable exceptions to this rule seemed to occur at premotor areas F2 and F3 (Matelli et al., 1984; Patterns of cytochrome oxidase activity in the frontal agranular cortex of the macaque monkey). In these regions, we observed peculiar “pockets” of signal drop-out in equivolumetric layers 4-5. It is unclear what these signal-voids represent but it is interesting to note that these cortical areas F1-F5 were originally delineated by distinct CO+ positive large cells (Matelli et al., 1984).

      (3) Attempts to obtain pial vascular structures were made (Figure 2). As mentioned in this manuscript, the blooming effect of susceptibility contrasts is problematic. In the MRI community, T1-based Gd contrast agents have been used for mapping large vasculature, which is a better approach for obtaining pial vascular structures. Alternatively, computer tomography with a blood contrast agent can be used for mapping blood vasculature noninvasively. This issue should be discussed.

      We agree with the reviewer that T1-based contrast agents may offer more precise direct localization of large vessels in the pial vasculature. However, the primary focus of our study was not on visualizing pial vascular structures, but rather on measuring vascular volume across cortical layers. For this purpose, we opted to use ferumoxytol, which provides superior T2*-contrast and about ten times longer plasma half-life compared to gadolinium. While we anticipated artifacts from the pial network, we developed a novel method to indirectly map these long-distance susceptibility artifacts arising from large vessels onto the cortical surface (Fig. 2A). If the goal were to specifically visualize pial vessels, we applaud the high-resolution TOF angiography developed for direct vessel visualization (Bollmann et al., 2022; https://doi.org/10.7554/eLife.71186).

      Changes in text:

      “4.1 Methodological considerations - vessel density informed MRI

      While the pial vessels can be directly visualized using high-resolution time-of-flight MRI (Bollmann et al., 2022), and computed tomography (Starosolski et al., 2015), imaging of the dense vascularity within the large and highly convoluted primate gray matter presents other formidable challenges. Here, we used a combination of ferumoxytol contrast agent and cortical layer resolution 3D gradient-echo MRI to map cerebrovascular architecture in macaque monkeys. These methods allowed us to indirectly delineate large vessels and indirectly estimate translaminar variations in cortical microvasculature.”

      (4) Since baseline R2* is related to baseline R2, vascular volume, iron content, and susceptibility gradients, it is difficult to correlate it with physiological parameters. Baseline R2* is also sensitive to imaging parameters; higher spatial resolution tends to result in lower R2* values (closer to the R2 value). Therefore, baseline R2* findings need to be emphasized.

      We agree with the reviewer's comment on the complexity of correlating baseline R2* with vasculature, given its sensitivity to multiple factors such as venous oxygenation, iron content, and imaging parameters such as image resolution. While our study focuses on vascular measurements, one could also highlight iron’s role in brain energy metabolism. Deoxygenated blood affects R2*, iron in oligodendrocytes supports myelination and neuronal signaling, and iron’s role in cytochrome c oxidase during electron transport impacts mitochondrial energy production. These metabolic factors collectively affect baseline R2* and link it to vasculature. Though quantitative susceptibility mapping (QSM) could help differentiate these different factors, it is beyond the scope of this study.

      (5) CBV-weighted deltaR2* is correlated with various other metrics (cytoarchitectural parcellation, myelin/receptor density, cortical thickness, CO, cell-type specificity, etc.). While testing the correlation between deltaR2* and these other metrics may be acceptable as an exploratory analysis, it is challenging for readers to discern a causal relationship between them. A critical question is whether CBV-weighted deltaR2* can provide insights into other metrics in diseased or abnormal brain states. If this is the case, then high-resolution deltaR2* will be useful. Please comment on this possibility.

      We agree with the reviewer that correlating deltaR2* with other metrics, such as myelin and cortical thickness, receptors and interneuron types, remains exploratory. Establishing causal relationships requires advanced multivariate analysis across cortical layers, but mapping histological stains to cortical layers is still under development. While this exploratory approach is promising, the ability to apply these insights to diseased or abnormal brain states is not yet clear. Layer-specific analysis of vasculature and function in disease is a future goal, and ongoing work aims to expand this line of inquiry. For now, while high-resolution deltaR2* may indeed offer diagnostic potential, we prefer to refrain from overstating its clinical utility at this stage. We agree that multimodal studies integrating neuroanatomy, function, and vascular metrics will be valuable for deeper insights into brain abnormalities.

      Changes in text:

      “4.3 The vascular network architecture is intricately connected to the neuroanatomical organization within cerebral cortex

      …To comprehensively understand the factors contributing to the vascular organization of the brain, experimental disentanglement through multivariate analysis of laminar cell types and receptor densities is needed (Hayashi et al., 2021, Froudist-Walsh et al., 2023).”

      (6) There is no discussion about the deltaR2* difference across subcortical areas (Figure 1). This finding is intriguing and warrants a thorough discussion in the context of the cortical findings.

      We thank the reviewer for this comment. We have expanded discussion on subcortical structures:

      Section 4.3, 1st paragraph:

      “In the cerebral cortex, neurons account for a significant portion (≈80-90%) of energy demand, with most of this energy allocated to signaling (≈80%) and maintaining membrane resting potentials (≈20%) (Attwell and Laughlin, 2001; Howarth et al., 2012). Since firing frequency is modulatory and the neural networks utilize distributed coding, the maintenance of resting-state membrane potential determines the minimal energy budget and the lower-limit for cerebral perfusion. Based on neuronal variability and energy dedicated to maintaining surface potential, this suggests an approximate (4 × 20% ≈) 80% variation in CBF and a resultant 25% variation in CBV across the cortex, in line with Grubb's law (CBV = 0.80 × CBF<sup>0.38</sup>) (Grubb et al., 1974). In the cerebellar cortex, neuron density is higher, and the resting potentials are thought to account for more than 50% of energy usage (Howarth et al., 2012), aligning with its higher vascular volume compared to the cerebral cortex (Fig. 1F). However, this is a simplified estimation, and a more comprehensive assessment would need to consider an aggregate of biophysical factors such as…”

      Section 4.3, 4th paragraph:

      “When viewed in terms of information flow, CBV appears to decrease along the canonical circuit pathway (e.g., L4→L2/3→L5) in the primary visual cortex (Douglas and Martin, 2007) and as one ascends the hierarchy (e.g., V1→V2→V3&4→MT→7A) from primary sensory areas (Fig. 3F, Supp. Fig. 8) (Felleman and Van Essen, 1991, Markov et al., 2014). A similar pattern is observed in the auditory hierarchy, where the inferior colliculus, an early processing hub, exhibits the highest vascular volume, followed by a gradual reduction along cortical auditory ‘where’ and ‘what’ pathways (Fig. 1F, Fig. 3B).”
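      The step from an 80% variation in CBF to a 25% variation in CBV in the first quoted paragraph follows from the exponent in Grubb's law alone, because the 0.80 prefactor cancels when two CBV values are compared. The short sketch below works through that arithmetic; the function names are illustrative and not taken from the manuscript.

```python
def grubb_cbv(cbf, scale=0.80, exponent=0.38):
    """Grubb's law as quoted in the text: CBV = 0.80 * CBF ** 0.38."""
    return scale * cbf ** exponent


def relative_cbv_change(relative_cbf_change, exponent=0.38):
    """Fractional CBV change implied by a fractional CBF change.

    The ratio CBV_2 / CBV_1 equals (CBF_2 / CBF_1) ** exponent, so the
    scale factor drops out.
    """
    return (1.0 + relative_cbf_change) ** exponent - 1.0


# An 80% higher CBF corresponds to roughly a 25% higher CBV.
cbv_variation = relative_cbv_change(0.80)
```

      The same ratio is obtained from `grubb_cbv(1.8 * x) / grubb_cbv(x)` for any positive baseline `x`, which is why the estimate does not depend on the absolute CBF units.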

      (7) Figure 3 is missing. Several statements in the manuscript require statistics (e.g., bimodality in Figure 2D, Figure 3F).

      We apologize to the reviewer for the absence of Figure 3 in the initial submission.

      As for statistical testing of bimodality, we respectfully disagree and feel that this would not add much value to the manuscript. We think a descriptive, rather than rigorous, approach is sufficient in this context.

      Reviewer #2 (Public review):

      Summary:

      This manuscript presents a new approach for non-invasive, MRI-based measurements of cerebral blood volume (CBV). Here, the authors use ferumoxytol, a high-contrast agent, and apply specific sequences to infer CBV. The authors then move to statistically compare measured regional CBV with the known distribution of different types of neurons, markers of metabolic load, and others. While the presented methodology captures an estimated 30% of the vasculature, the authors corroborated previous findings regarding the lack of vascular compartmentalization around functional neuronal units in the primary visual cortex.

      Strengths:

      Non-invasive methodology geared to map vascular properties in vivo.

      Implementation of a highly sensitive approach for measuring blood volume.

      Ability to map vascular structural and functional vascular metrics to other types of published data.

      Weaknesses:

      The key issue here is the underlying assumption about the appropriate spatial sampling frequency needed to capture the architecture of the brain vasculature. Namely, ~7 penetrating vessels / mm2 as derived from Weber et al 2008 (Cer Cor). The cited work begins by characterizing the spacing of penetrating arteries and ascending veins using a vascular cast of 7 monkeys (Macaca mulatta, same as in the current paper). The ~7 penetrating vessels / mm2 are computed by dividing the total number of identified vessels by the area imaged. The problem here is that all measurements were made in a "non-volumetric" manner and only in V1. Extrapolating from here to the entire brain seems like an over-assumption, particularly given the region-dependent heterogeneity that the current paper reports.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      - For broader readership, it would be beneficial to provide a guide on how to interpret baseline R2* versus ΔR2*.

      The text was edited as follows:

      “…For quantitative assessment, R<sub>2</sub>* values were estimated from multi-echo gradient-echo images acquired both before and after the administration of ferumoxytol contrast agent (Table 1). Subsequently, the baseline R<sub>2</sub>* and ΔR<sub>2</sub>*, an indirect proxy measure of CBV (Boxerman et al., 1995), volume maps for each subject were mapped onto the twelve native equivolumetric layers (ELs) (Fig. 1C). Each vertex was then corrected for the orientation of the cortical normal relative to the B<sub>0</sub> direction (Supp. Fig. 1). Surface maps for each subject were registered onto a Mac25Rhesus average surface using cortical curvature landmarks and then averaged across the subjects (Fig. 1D, E). Around cortical midthickness, the distribution of R<sub>2</sub>*, an aggregate measure for ferritin-bound iron, myelin content and venous oxygenation levels (Langkammer et al., 2012), resembled the spatial pattern of ΔR<sub>2</sub>* vascular volume. However, across cortical layers, these measures exhibited reversed patterns: R<sub>2</sub>* increased toward the white matter surface, whereas ΔR<sub>2</sub>* decreased (Fig. 1E, G).”

      - The legends in Figure 1 describe green/cyan arrows, which are not visible in the figure itself.

      We thank the reviewer for noting this discrepancy. The reference to green/cyan arrows was removed from the Figure 1 legend.

      - There are typos in Section 3.3: "(Figure 4A, E)" and "(cluster 3; Figure 3)" should be corrected to Figure 5.

      We thank the reviewer for noting this error. The references to the Figures were corrected.

      Reviewer #2 (Recommendations for the authors):

      The work is elegantly presented and very easy to follow. The figures and the data presented there are compelling and well-organized. I have enjoyed reading the paper, despite my disagreement with the validity of the methodology presented.

      Validation against MRA methods (high resolution needed here, Bolan et al 2006, cited also by the authors). Certainly, that work used a much higher magnetic field. This could be done through collaboration if such a magnet is not available. In my humble opinion, the current arguments provided in the paper as validation fall short in convincing future readers. Other TOF approaches might be better suited (in combination with line scanning or single plane sequences) for the 3T used in this work.

      We appreciate the reviewer’s suggestion regarding time-of-flight (TOF) angiography at ultra-high magnetic fields, such as 9.4T, for improved visualization of fast-flowing blood in arterial vessels, as elegantly demonstrated in Bolan et al., 2006. However, our focus was on mapping vasculature across cortical layers, and TOF is not optimal for imaging slow capillary blood inflow. To enhance CNR at the capillary level as well, we used ferumoxytol contrast agent to create quantitative CBV-weighted cortical layer maps (Boxerman et al., 1995).

      We are open to collaborative opportunities to revisit this work using ultra-high magnetic field strengths and more detailed neuroanatomical ground-truth measures. However, the recommended line scanning or single-plane sequences, at least on first impression, seem inadequate for whole-brain coverage and cortical surface mapping.

      Some of the methodology can be made more accessible to non-MRI readers. For example, a more elaborate explanation of R2* and ΔR2 could benefit future readers.

      Elaborated as requested (see above reply).

      A more detailed discussion of the limitations of the methodology could also be beneficial here. Explain the potential implications of under-sampling denser vascular areas (i.e. with potentially more than 7 penetrating vessels per mm2).

      V1, with its highest neuronal density, likely also has the highest feeding/draining vessel density. Based on this, we hypothesized that a 0.23 mm isotropic image resolution would sufficiently capture cortical arterio-venous networks, but we did not achieve the expected detection of 7 penetrating vessels per mm<sup>2</sup>. Consequently, we refrained from quantifying vessel density in other areas, although we did report the total vessel count.
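      To put the two numbers in this reply side by side, the sketch below compares a density of 7 penetrating vessels per mm<sup>2</sup> with 0.23 mm isotropic voxels. It assumes, purely for illustration, that vessels sit on a regular square grid; real vessel spacing is irregular, and susceptibility blooming is not modelled, so this is back-of-the-envelope arithmetic rather than the authors' sampling calculation.

```python
import math


def mean_vessel_spacing_mm(density_per_mm2):
    """Centre-to-centre spacing if vessels sat on a regular square grid."""
    return 1.0 / math.sqrt(density_per_mm2)


def voxels_per_vessel(voxel_mm, density_per_mm2):
    """In-plane voxels available per vessel at a given isotropic voxel size."""
    return (1.0 / voxel_mm ** 2) / density_per_mm2


spacing = mean_vessel_spacing_mm(7.0)      # about 0.38 mm between vessels
budget = voxels_per_vessel(0.23, 7.0)      # about 2.7 voxels per vessel
```

      Under this idealized assumption, each vessel has fewer than three in-plane voxels to itself, which is consistent with the reply's observation that not all penetrating vessels were detected.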

      This under-sampling likely biases our ΔR2* estimates, skewing them toward larger vessels. To address this, we used median parcel values to avoid over-representing large vessels (the long tail in the ΔR2* parcel data distribution represents large vessels) and corrected for the cortical surface bias where blood originates from and returns to the pial network. These steps helped mitigate large vessel bias as described in the methods, results and discussion (see also our response to Reviewer #1, question #1).

      To improve clarity for readers, we further clarified:

      Methods:

      “The effect of blood accumulation in large feeding arteries and draining veins toward the superficial layers was estimated using a linear model and regressed out from the parcellated ΔR<sub>2</sub>* maps.”

      Results:

      “To mitigate bias resulting from undersampling the large-caliber vessels (Fig. 2A, B), median parcel values were obtained and M132 parcellated ΔR2* profiles were then detrended across ELs in each subject and then averaged.”

      Discussion:

      “This methodology, however, has known limitations. First, gradient-echo imaging is more sensitized toward large pial vessels running along the cortical surface and large penetrating vessels, which could differentially bias the estimation of ΔR<sub>2</sub>* across cortical layers (Fig. 2A, 2B) (Boxerman et al., 1995; Zhao et al., 2006). Additionally, vessel orientation relative to the B<sub>0</sub> direction introduces strong layer-specific biases in quantitative ΔR<sub>2</sub>* measurements (Supp. Fig. 1C) (Ogawa et al., 1993; Viessmann et al., 2019; Lauwers et al., 2008). To address these concerns, we conducted necessary corrections for B<sub>0</sub>-orientation, obtained parcel median values and regressed out the linear trend, thereby mitigating the effect of undersampling large-caliber vessels across ELs (Fig. 2C, Supp. Fig. 1).”
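
      The two mitigation steps quoted above (parcel medians, then removal of a linear trend across layers) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy values, and the assumption that a profile is ordered from the pial surface to the white matter are all hypothetical, and the fit is a plain ordinary-least-squares line.

```python
from statistics import mean, median


def parcel_median(voxel_values):
    """Median of the values in one parcel.

    The median is less sensitive than the mean to a long tail of
    large-vessel voxels.
    """
    return median(voxel_values)


def detrend_linear(profile):
    """Remove the least-squares linear trend from a laminar profile.

    profile: one value per layer, assumed (hypothetically) to be ordered
    from the pial surface to the white matter.
    Returns the residuals after subtracting the fitted line.
    """
    depths = list(range(len(profile)))
    x_bar, y_bar = mean(depths), mean(profile)
    sxx = sum((x - x_bar) ** 2 for x in depths)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(depths, profile))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(depths, profile)]


# Toy parcel: four ordinary voxels and one large-vessel outlier.
voxels = [1.0, 1.2, 0.9, 1.1, 9.0]
print(parcel_median(voxels), mean(voxels))

# Toy laminar profile with a superficial-to-deep decline.
print(detrend_linear([5.0, 4.0, 3.5, 2.0, 1.0]))
```

      In this toy parcel the single outlier pulls the mean well above the median, which is the motivation given for using medians.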

      Please note, we are currently unable to create BALSA links to the figures due to maintenance issues at the data repository. As a result, we have opted to remove the links:

    1. eLife Assessment

      This important work addresses the relationship between the transdiagnostic compulsivity dimension and confidence as well as confidence-related behaviours like reminder setting. The relationship between confidence and compulsive disorders has recently received a lot of attention and has been considered to be a key cognitive change. The authors paired an elegant experimental design and pre-registration to give convincing evidence of the relationship between compulsivity, reminder setting, and confidence. In the revised version they thoroughly addressed the reviewers' comments, in particular adding new analyses clarifying how their findings relate to prediction-error-based learning, further strengthening the manuscript.

    2. Reviewer #1 (Public review):

      Summary:

      Boldt et al test several possible relationships between transdiagnostically-defined compulsivity and cognitive offloading in a large online sample. To do so, they develop a new and useful cognitive task to jointly estimate biases in confidence and reminder-setting. In doing so, they find that over-confidence is related to less utilization of reminder-setting, which partially mediates the negative relationship between compulsivity and reminder-setting. The paper thus establishes that, contrary to the over-use of checking behaviors in patients with OCD, greater levels of transdiagnostically-defined compulsivity predict less deployment of cognitive offloading. The authors offer speculative reasons as to why (perhaps it's perfectionism in less clinically-severe presentations that lowers the cost of expending memory resources), and set an agenda to understand the divergence in cognitive offloading between clinical and nonclinical samples. Because only a partial mediation had robust evidence, multiple effects may be at play, whereby compulsivity impacts cognitive offloading via overconfidence and also by other causal pathways.

      Strengths:

      The study develops an easy-to-implement task to jointly measure confidence and replicates several major findings on confidence and cognitive offloading. The study uses a useful measure of cognitive offloading - the tendency to set reminders to augment accuracy in the presence of experimentally manipulated costs. Moreover, the study utilizes multiple measures of presumed biases -- overall tendency to set reminders, the empirically estimated indifference point at which people engage reminders, and a bias measure that compares optimal indifference points to engage reminders relative to the empirically observed indifference points. That the study observes convergence along all these measures strengthens the inferences made relating compulsivity to the under-use of reminder-setting. Lastly, the study does find evidence for one of several a priori hypotheses and sets a compelling agenda to try to explain why such a finding diverges from an ostensible opposing finding in clinical OCD samples and the over-use of cognitive offloading.

      Weaknesses:

      Although I think this design and study are very helpful for the field, I felt that a feature of the design might reduce the task's sensitivity to measuring dispositional tendencies to engage cognitive offloading. In particular, the design introduces prediction errors that could induce learning and interfere with natural tendencies to deploy reminder-setting behavior. These PEs comprise whether a given selected strategy will be or not be allowed to be engaged. We know individuals with compulsivity can learn even when instructed not to learn (e.g., Sharp, Dolan and Eldar, 2021, Psychological Medicine), and that more generally, they have trouble with structure knowledge (eg Seow et al; Fradkin et al), and thus might be sensitive to these PEs. Thus, a dispositional tendency to set reminders might be differentially impacted for those with compulsivity after an NPE, where they want to set a reminder, but aren't allowed to. After such an NPE, they may avoid more so the tendency to set reminders. Those with compulsivity likely have superstitious beliefs about how checking behaviors lead to a resolution of catastrophes, that might in part originate from inferring structure in the presence of noise or from purely irrelevant sources of information for a given decision problem.

      It would be good to know if such learning effects exist, if they're modulated by PE (you can imagine PEs are higher if you are more incentivized - e.g., 9 points as opposed to only 3 points - to use reminders, and you are told you cannot use them), and if this learning effect confounds the relationship between compulsivity and reminder-setting.

      A more subtle point, I think this study can be more said to be an exploration than a deductive test of a particular model -> hypothesis -> experiment. Typically, when we test a hypothesis, we contrast it with competing models. Here, the tests were two-sided because multiple models, with mutually exclusive predictions (over-use or under-use of reminders) were tested. Moreover, it's unclear exactly how to make sense of what is called the direct mechanism, which is supported by the partial (as opposed to complete) mediation.

    3. Reviewer #2 (Public review):

      Summary:

      Boldt et al., investigated whether previously established relationships between transdiagnostic psychiatric symptom dimensions and confidence distortions would result in downstream influences on the confidence-related behaviour of reminder setting. 600 individuals from the general population completed a battery of psychiatric symptom questionnaires and an online reminder-setting task. In line with previous studies, individuals high in compulsivity (CIT) showed over-confidence in their task performance, whereas individuals high in anxious-depression (AD) tended to be under-confident. Crucially, the over-confidence associated with CIT partially mediated a decreased tendency to use external reminders during task performance, whereas the under-confidence associated with AD did not result in any alteration in external reminder setting. The authors suggest that metacognitive monitoring is impaired in CIT which has a knock-on effect on reminder setting behaviour, but that a direct link also exists between CIT and reduced reminder setting independently of confidence.

      Strengths:

      The study combines the latest advances in transdiagnostic approaches to psychopathology with a cleverly designed external reminder-setting task. The approach allows for investigation of what some of the downstream consequences associated with impaired metacognition in sub-clinical psychopathology may be.

      The experimental design and hypotheses were pre-registered prior to data collection.

      The manuscript is well written and rigorous analysis approaches are used throughout.

      Weaknesses:

      Participants only performed a single task so it remains unclear if the observed effects would generalise to reminder setting in other cognitive domains.

      The sample consisted of participants recruited from the general population. Future studies should investigate whether the effects observed extend to individuals with the highest levels of symptoms (including clinical samples).

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) You claim transdiagnostic phenotypes are temporally stable -- since they're relatively new constructs, do we know how stable? In what order?  

      This is an important question. We have added two recent references to support this claim on page 1 and cite these studies in the references on pages 25 and 28:

      “Using factor analysis, temporally stable (see Fox et al., 2023a; Sookud, Martin, Gillan, & Wise, 2024), transdiagnostic phenotypes can be extracted from extensive symptom datasets (Wise, Robinson, & Gillan, 2023).”

      Fox, C. A., McDonogh, A., Donegan, K. R., Teckentrup, V., Crossen, R. J., Hanlon, A. K., … Gillan, C. M. (2024). Reliable, rapid, and remote measurement of metacognitive bias. Scientific Reports, 14(1), 14941. https://doi.org/10.1038/s41598-024-64900-0

      Sookud, S., Martin, I., Gillan, C., & Wise, T. (2024, September 5). Impaired goal-directed planning in transdiagnostic compulsivity is explained by uncertainty about learned task structure. https://doi.org/10.31234/osf.io/zp6vk

      More specifically, Sookud and colleagues found the intraclass correlation coefficient (ICC) for both factors to be high after a 3- or 12-month period (ICC<sub>AD_3</sub> = 0.87; ICC<sub>AD_12</sub> = 0.87; ICC<sub>CIT_3</sub> = 0.81; ICC<sub>CIT_12</sub> = 0.76; see Tables S41 and S50 in Sookud et al., 2024).

      (2) On hypotheses of the study: 

      I didn't understand the logic behind the hypothesis relating TDx Compulsivity -> Metacognition -> Reminder-setting

      It seems that (a) Compulsivity relates to overconfidence which should predict less reminder-setting

      Compulsivity has an impaired link between metacognition and action, breaking the B->C link in the mediation described above in (a). What would this then imply about how Compulsivity is related to reminder-setting?

      "In the context of our study, a Metacognitive Control Mechanism would be reflected in a disrupted relationship between confidence levels and their tendency to set reminders."  What exactly does this predict - a lack of a correlation between confidence and reminder-setting, specifically in high-compulsive subjects?

      Lastly, there could be a direct link between compulsivity and reminder-usage, independent of any metacognitive influence. We refer to this as the Direct Mechanism  Why though theoretically would this be the case? 

      "We initially hypothesised to find support for the Metacognitive Control Mechanism and that highly compulsive individuals would offload more". 

      The latter part here, "highly compulsive individuals would offload more" is I think the exact opposite prediction of the Metacognitive control mechanism hypothesis (compulsive individuals offload less). How could you possibly have tried to find support, then, for both? 

      Is the hypothesis that compulsivity positively predicts reminder setting the "direct mechanism" - if so, please clarify that, and if not, it should be added as a distinct mechanism, and additionally, the direct mechanism should be specified. 

      There's more delineation of specific hypotheses (8 with caveats) in Methods. 

      "We furthermore also tested this hypothesis but predicted raw confidence (percentage of circles participants predicted they would remember; H6b and H8b respectively)," What is the reference of "this hypothesis" given that right before this sentence two hypotheses are mentioned?  To keep this all organized, it would be good to simply have a table with hypotheses listed clearly. 

      We agree with the reviewer that there is room to improve the clarity of how our hypotheses are presented. The confusion likely arises from the fact that, since we first planned and preregistered our study, several new pieces of work have emerged, which might have led us to question some of our initial hypotheses. We have taken great care to present the hypotheses as they were preregistered, while also considering the current state of the literature and organizing them in a logical flow to make them more digestible for the reader. We have clarified this point on page 4:

      “Back when we preregistered our hypotheses only a limited number of studies about confidence and transdiagnostic CIT were available. This resulted in us hypothesising to find support for the Metacognitive Control Mechanism and that highly compulsive individuals would offload more due to an increased need for checkpoints.”

      The biggest improvement we believe comes from our new Table 1, which we have included in the Methods section in response to the reviewer’s suggestion (pp. 21-22):

      “We preregistered 8 hypotheses (see Table 1), half of which were sanity checks (H1-H4) aimed to establish whether our task would generally lead to the same patterns as previous studies using a similar task (as reviewed in Gilbert et al., 2023).”

      We furthermore foreshadowed more explicitly how we would test the Metacognitive Control Mechanism in the Introduction section on page 4, as requested by the reviewer:

      “In the context of our study, a Metacognitive Control Mechanism would be reflected in a disrupted relationship between confidence levels and their tendency to set reminders (i.e., the interaction between the bias to be over- or underconfident and transdiagnostic CIT in a regression model predicting a bias to set reminders).”

      To avoid any confusion regarding the term ‘direct’ in the ‘Direct Mechanism’, we now explicitly clarify on page 4 that it refers to any non-metacognitive influences. Additionally, we had already emphasized in the Discussion section the need for future studies to specify these influences more directly.

      Page 4: “We refer to this as the Direct Mechanism and it constitutes any possible influences that affect reminder setting in highly-compulsive CIT participants outside of metacognitive mechanisms, such as perfectionism and the wish to control the task without external aids.”

      The reviewer was correct in pointing out that, in the Methods section, we incorrectly referred to ‘this hypothesis’ when we actually meant both of the previously mentioned hypotheses. We have corrected this on page 23:

      “We furthermore also tested these hypotheses but predicted raw confidence (percentage of circles participants predicted they would remember; H6b and H8b respectively), as well as extending the main model with the scores from the cognitive ability test (ICAR5) as an additional covariate (H6c and H8c respectively).”

      Finally, upon revisiting our Results section, we noticed that we had not made it sufficiently clear that hypothesis H6a was preregistered as non-directional. We have now clarified this on page 9:

      “We predicted that the metacognitive bias would correlate negatively with AD (Hypothesis 8a; more anxious-depressed individuals tend to be underconfident). For CIT, we preregistered a non-directional, significant link with metacognitive bias (Hypothesis H6a). We found support for both hypotheses, both for AD, β = -0.22, SE = 0.04, t = -5.00, p < 0.001, as well as CIT, β = 0.15, SE = 0.05, t = 3.30, p = 0.001, controlling for age, gender, and educational attainment (Figure 3; see also Table S1). Note that for CIT this effect was positive, more compulsive individuals tend to be overconfident.”

      (3) You say special circles are red, blue, or pink. Then, in the figure, the colors are cyan, orange, and magenta. These should be homogenized. 

      Apologies, this was not clear on our screens. We have corrected this now but used the labels “blue”, “orange” and “magenta” as our shade of blue is much darker than cyan:

      Page 16: “These circles flashed in a colour (blue, orange, or magenta) when they first appear on screen before fading to yellow.”

      (4) The task is not clearly described with respect to forced choice. From my understanding, "forced choice" was implicitly delivered by a "computer choosing for them". You should indicate in the graphic that this is what forced choice means in the graphic and description more clearly. 

      This is an excellent point. On pages 17 and 18 we now include a slightly changed Figure 6, which includes improved table row names and cell shading to indicate the choice people gave. Hopefully this clarifies what “forced choice” means.

      (5) If I have point (4) right, then a potential issue arises in your design. Namely, if a participant has a bias to use or not use reminders, they will experience more or less prediction errors during their forced choice. This kind of prediction error could introduce different mood impacts on subsequent performance, altering their accuracy. This will have an asymmetric effect on the different forced phases (ie forced reminders or not). For this reason, I think it would be worthwhile to run a version of the experiment, if feasible, where you simply remove choice prior to revealing the condition. For example, have a block of choices where people can "see how well you do with reminders" -- this removes expectation and PE effects. 

      [See also this point from the weaknesses listed in the public comments:]

      Although I think this design and study are very helpful for the field, I felt that a feature of the design might reduce the task's sensitivity to measuring dispositional tendencies to engage cognitive offloading. In particular, the design introduces prediction errors that could induce learning and interfere with natural tendencies to deploy reminder-setting behavior. These PEs comprise whether a given selected strategy will be or not be allowed to be engaged. We know individuals with compulsivity can learn even when instructed not to learn (e.g., Sharp, Dolan, and Eldar, 2021, Psychological Medicine), and that more generally, they have trouble with structure knowledge (eg Seow et al; Fradkin et al), and thus might be sensitive to these PEs. Thus, a dispositional tendency to set reminders might be differentially impacted for those with compulsivity after an NPE, where they want to set a reminder, but aren't allowed to. After such an NPE, they may avoid more so the tendency to set reminders. Those with compulsivity likely have superstitious beliefs about how checking behaviors lead to a resolution of catastrophes, which might in part originate from inferring structure in the presence of noise or from purely irrelevant sources of information for a given decision problem.

      It would be good to know if such learning effects exist, if they're modulated by PE (you can imagine PEs are higher if you are more incentivized - e.g., 9 points as opposed to only 3 points - to use reminders, and you are told you cannot use them), and if this learning effect confounds the relationship between compulsivity and reminder-setting.

      We would like to thank the reviewer for providing this interesting perspective on our task. If we understand correctly, the situation most at risk for such effects occurs when participants choose to use a reminder. Not receiving a reminder in the following trial can be seen as a negative prediction error (PE), whereas receiving one would represent the control condition (zero PE). Therefore, we focused on these two conditions in our analysis.

      We indeed found that participants had a slightly higher tendency to choose reminders again after trials where they successfully requested them compared to after trials where they were not allowed reminders (difference = 4.4%). This effect was statistically significant, t(465) = 2.3, p = 0.024. However, it is important to note that other studies from our lab have reported a general, non-specific response ‘stickiness,’ where participants often simply repeat the same strategy in the next trial (Scarampi & Gilbert, 2020), which could have contributed to this pattern.
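
      As a rough illustration of the comparison reported in this paragraph, the sketch below computes, for one participant, how often a reminder is chosen on the trial after a reminder request that was granted (zero PE) versus denied (negative PE), and then a paired t statistic across participants. The trial encoding, function names, and numbers are hypothetical and are not taken from the authors' analysis code.

```python
from math import sqrt
from statistics import mean, stdev


def repeat_rate(trials, granted):
    """Proportion of reminder choices on trials that follow a reminder request.

    trials: list of (chose_reminder, reminder_granted) booleans in task order
            (a hypothetical encoding).
    granted: True selects requests that were granted (zero PE),
             False selects requests that were denied (negative PE).
    Returns None when the participant has no qualifying trials.
    """
    followups = [
        int(nxt[0])
        for cur, nxt in zip(trials, trials[1:])
        if cur[0] and cur[1] == granted
    ]
    return sum(followups) / len(followups) if followups else None


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two matched lists."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# One hypothetical participant.
trials = [(True, True), (True, False), (False, False), (True, True), (True, True)]
print(repeat_rate(trials, granted=True), repeat_rate(trials, granted=False))

# Hypothetical per-participant rates after granted vs denied requests.
t_stat, df = paired_t([0.6, 0.7, 0.8], [0.5, 0.5, 0.6])
print(t_stat, df)
```

      Participants for whom either rate is None would have to be dropped before the paired test, which is one way a sample can shrink when not all cells of the design are filled.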

      When we used CIT to predict this effect in a simple linear regression model, we did not find a significant effect (β = -0.05, SE = 0.05, t = -1.13, p = 0.26).

      To further investigate this and potentially uncover an effect masked by the influence of the points participants could win in a given trial, we re-ran the model using a logistic mixed-effects regression model. This model predicted the upcoming trial’s choice (reminder or no reminder) from the presence of a negative prediction error in the current trial (dummy variable), the z-transformed number of points on offer, and the z-transformed CIT score (between-subject covariate), as well as the interaction of CIT and negative PE. In this model, we replicated the previous ‘stickiness’ effect, with a negative influence of a negative PE on the upcoming choice, β = -0.24, SE = 0.07, z = -3.44, p < 0.001. In other words, when a negative PE was encountered in the current trial, participants were less likely to choose reminders in the next trial. Additionally, there was a significant negative influence of points offered on the upcoming choice, β = -0.28, SE = 0.03, z = -8.82, p < 0.001. While this might seem counterintuitive, it could be due to a contrast effect: after being offered high rewards with reminders, participants might be deterred from using the reminder strategy in consecutive trials where lower rewards are likely to be offered, simply due to the bounded reward scale. CIT showed a small negative effect on upcoming reminder choice, β = -0.06, SE = 0.04, z = -1.69, p = 0.09, indicating that participants scoring higher on the CIT factor tended to be less likely to choose reminders, thus replicating one of the central findings of our study. It is unclear why this effect was not statistically significant, but this is likely due to the limited data on which the model was based (see below). Finally, and most importantly, the interaction between the current trial’s condition (negative PE or zero PE) and CIT was not significant, contrary to the reviewer’s hypothesis, β = 0.04, SE = 0.07, z = 0.57, p = 0.57.

      It should also be noted that this exploratory analysis is based on a limited number of data points: on average, participants had 2.5 trials (min = 0; max = 4) with a negative PE and 6.7 trials (min = 0; max = 12) with zero PE. There were more zero PE trials simply because to maximise the number of trials included in this analysis, each participant’s 8 choice-only trials were included and on those trials the participant always got what they requested (the trial then ended prematurely). Due to the fact that not all cells in the analysed design were filled, only 466 out of 600 participants could be included in the analysis. This may have caused the fit of the mixed model to be singular.

      In summary, given that these results are based on a limited number of data points, some models did not fit without issues, and no evidence was found to support the hypotheses, we suggest not including this exploratory analysis in the manuscript. However, if we have misunderstood the reviewer and should conduct a different analysis, we are happy to reconsider.

      Unfortunately, conducting an additional study without the forced-choice element is not feasible, as this would create imbalances in trial numbers for the design. The advantage of the current, condensed task is the result of several careful pilot studies that have optimized the task’s psychometric properties.

      Scarampi, C., & Gilbert, S. J. (2020). The effect of recent reminder setting on subsequent strategy and performance in a prospective memory task. Memory, 28(5), 677–691. https://doi.org/10.1080/09658211.2020.1764974

      (6) One can imagine that a process goes on in this task where a person must estimate their own efficacy in each condition. Thus, individuals with more forced-choice experience prior to choosing for themselves might have more informed choice. Presumably, this is handled by your large N and randomization, but could be worth looking into. 

      We would like to thank the reviewer for pointing this out, as we had not previously considered this aspect of our task. However, we believe it is not the experience with forced trials per se, but rather the frequency with which participants experience both strategies (reminder vs. no reminder), that could influence their ability to make more informed choices. To address this, we calculated the proportion of reminder trials during the first half of the task (excluding choice-only trials, where the reminder strategy was not actually experienced). We hypothesized that the absolute distance of this ‘informedness’ parameter should correlate positively with the absolute reminder bias at the end of the task, with participants who experienced both conditions equally by the midpoint of the task being less biased towards or away from reminders. However, this was not the case, r = 0.05, p = 0.21.

      Given the lengthy and complex nature of our preregistered analysis, we prefer not to include this exploratory analysis in the manuscript.
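
      The exploratory check described in this reply can be sketched as follows. The sketch assumes that the "absolute distance" is measured from 0.5 (equal experience with both strategies), which the reply does not state explicitly; the function names and toy data are hypothetical.

```python
from math import sqrt


def informedness_distance(first_half_used_reminder):
    """Absolute distance of the first-half reminder proportion from 0.5.

    first_half_used_reminder: list of 0/1 flags, one per experienced trial
    in the first half of the task (choice-only trials excluded).
    The reference point of 0.5 is an assumption made for this sketch.
    """
    p = sum(first_half_used_reminder) / len(first_half_used_reminder)
    return abs(p - 0.5)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical participants: first-half trial flags and end-of-task reminder bias.
first_halves = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 0, 0], [1, 0, 0, 0]]
reminder_bias = [2.0, -1.5, 0.5, -3.0]

distances = [informedness_distance(f) for f in first_halves]
abs_bias = [abs(b) for b in reminder_bias]
print(pearson_r(distances, abs_bias))
```

      A positive coefficient would indicate that participants with more lopsided early experience ended the task with a larger absolute reminder bias.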

      (7) Is the Actual indifference calculated from all choices? I believe so, given they don't know until after their choice whether it's forced or not, but good to make this clear. 

      Indeed, we use all available choice data to calculate the AIP. We now make this clear in two places in the main text:

      Page 5: “The ‘actual indifference point’ was the point at which they were actually indifferent, based on all of their decisions.”

      Page 6: “Please note that all choices were used to calculate the AIP, as participants only found out whether or not they would use a reminder after the decision was made.”

      (8) Related to 7, I believe this implies that the objective and actual indifference points are not entirely independent, given the latter contains the former. 

      Yes, the OIP and AIP were indeed calculated in part from events that happened within the same trials. However, since these events are non-overlapping (e.g., the choice from trial 6 contributes to the AIP but the accuracy measured several seconds later from that trial contributes to the OIP) and since our design dictates whether or not reminders can be used on those trials in question (by randomly assigning them to the forced internal/forced external condition) this could not induce circularity.

      (9) I thought perfectionism might be a trait that could explain findings and it was nice to see convergence in thinking once I reached the conclusion. Along these lines, I was thinking that perhaps perfectionism has a curvilinear relationship with compulsivity (this is an intuition I'm not sure if it's backed up empirically). If it's really perfectionism, do you see that, at the extreme end of compulsivity, there's more reminder-setting? I.e., did you try to model this relationship using a nonlinear function? You might get clues simply by visual inspection. 

      It is interesting to note that the reviewer reached a similar interpretation of our results. We considered this question during our analysis and conducted an additional exploratory analysis to examine how CIT quantile relates to reminder bias (see Author response image 1). Each circle reflects a participant. As shown, no clear nonlinearities are evident, which challenges this interpretation. We believe that adding this to the already lengthy manuscript may not be necessary, but we are of course happy to reconsider if Reviewer 1 disagrees.

      Author response image 1.

      (10) [From the weaknesses listed in the public comments.] A more subtle point, I think this study can be more said to be an exploration than a deductive test of a particular model -> hypothesis -> experiment. Typically, when we test a hypothesis, we contrast it with competing models. Here, the tests were two-sided because multiple models, with mutually exclusive predictions (over-use or under-use of reminders) were tested. Moreover, it's unclear exactly how to make sense of what is called the direct mechanism, which is supported by partial (as opposed to complete) mediation.

      The reviewer’s observation is accurate; some aspects of our study did take on a more exploratory nature, despite having preregistered hypotheses. This was partly due to the novelty of our research questions. We appreciate this feedback and will use it to refine our approach in future studies, aiming for more deductive testing.

      Reviewer #2:

      (1) Regarding the lack of relationship between AD and reminder setting, this result is in line with a recent study by Mohr et al (2023: https://osf.io/preprints/psyarxiv/vc7ye) investigating relationships between the same transdiagnostic symptom dimensions, confidence bias and another confidence-related behaviour: information seeking. Despite showing trial-by-trial under-confidence on a perceptual decision task, participants high in AD did not seek information any more than low AD participants. Hence, the under-confidence in AD had no knock-on effect on downstream information-seeking behaviour. I think it is interesting that converging evidence from your study and the Mohr et al (2023) study suggest that high AD participants do not use the opportunity to increase their confidence (i.e., through reminder setting or information seeking). This may be because they do not believe that doing so will be effective or because they lack the motivation (i.e., through anhedonia and/or apathy) to do so. 

      This is indeed an interesting parallel and we would like to thank the reviewer for pointing out this recently published study, which we had unfortunately missed. We included it in the Discussion section, extending our sub-section on the missing downstream effects of the AD factor, as well as listing it in the references on page 27.

      Page 14: “Our findings align with those reported in a recent study by Mohr, Ince, and Benwell (2024). The authors observed that while high-AD participants were underconfident in a perceptual task, this underconfidence did not lead to increased information-seeking behaviour. Future research should explore whether this is due to their pessimism regarding the effectiveness of confidence-modulated strategies (i.e., setting reminders or seeking information) or whether it stems from apathy. Another possibility is that the relevant downstream effects of anxiety were not measured in our study and instead may lie in reminder-checking behaviours.”

      Mohr, G., Ince, R.A.A. & Benwell, C.S.Y. Information search under uncertainty across transdiagnostic psychopathology and healthy ageing. Transl Psychiatry 14, 353 (2024). https://doi.org/10.1038/s41398-024-03065-w

      (2) Fox et al 2023 are cited twice at the same point in the second paragraph of the intro. Not sure if this is a typo or if these are two separate studies? 

      Those are indeed two different studies and should have been formatted as such. We have corrected this mistake in the following places and furthermore also corrected one of the references as the study has recently been published:

      P. 2 (top): “Previous research links transdiagnostic compulsivity to impairments in metacognition, defined as thinking about one’s own thoughts, encompassing a broad spectrum of self-reflective signals, such as feelings of confidence (e.g., Rouault, Seow, Gillan & Fleming, 2018; Seow & Gillan, 2020; Benwell, Mohr, Wallberg, Kouadio, & Ince, 2022; Fox et al., 2023a; Fox et al., 2023b; Hoven, Luigjes, Denys, Rouault, van Holst, 2023a).”

      P. 2 (bottom): “More specifically, individuals characterized by transdiagnostic compulsivity have been consistently found to exhibit overconfidence (Rouault, Seow, Gillan & Fleming, 2018; Seow & Gillan, 2020; Benwell, Mohr, Wallberg, Kouadio, & Ince, 2022; Fox et al., 2023a; Fox et al., 2023b; Hoven et al., 2023a).”

      P. 4: “Prior evidence exists for overconfidence in compulsivity (Rouault et al., 2018; Seow & Gillan, 2020; Benwell et al., 2022; Fox et al., 2023a; Fox et al., 2023b; Hoven et al., 2023a), which would therefore result in fewer reminders.”

      P. 23: “Though we did not preregister a direction for this effect, in the light of recent findings it has now become clear that compulsivity would most likely be linked to overconfidence (Rouault et al., 2018; Seow & Gillan, 2020; Benwell et al., 2022; Fox et al., 2023a; Fox et al., 2023b; Hoven et al., 2023a).”

      P. 24: “Fox, C. A., Lee, C. T., Hanlon, A. K., Seow, T. X. F., Lynch, K., Harty, S., … Gillan, C. M. (2023a). An observational treatment study of metacognition in anxious-depression. ELife, 12, 1–17. https://doi.org/10.7554/eLife.87193”

      P. 24: “Fox, C. A., McDonogh, A., Donegan, K. R., Teckentrup, V., Crossen, R. J., Hanlon, A. K., … Gillan, C. M. (2024). Reliable, rapid, and remote measurement of metacognitive bias. Scientific Reports, 14(1), 14941. https://doi.org/10.1038/s41598-024-64900-0”

      (3) Typo in the Figure 1 caption: "The preregistered exclusion criteria for the for the accuracies with....".  

      Thank you so much for pointing this out. We have changed the sentence in the caption of Figure 1 to read “The preregistered exclusion criteria for the accuracies with or without reminder are indicated as horizontal dotted lines (10% and 70% respectively).”

      Typo in the Figure 5 caption: "Standardised regression coefficients are given for each pat".

      Thank you so much for pointing this out to us, we have corrected the typo and the sentence in the caption of Figure 5 now reads “Standardised regression coefficients are given for each path.”

      [From the weaknesses listed in the public comments.] Participants only performed a single task so it remains unclear if the observed effects would generalise to reminder-setting in other cognitive domains.

      We appreciate the reviewer’s concern regarding the use of a single cognitive task in our study, which is indeed a common limitation in many cognitive neuroscience studies. The cognitive factors underlying offloading decisions are still under active debate. Notably, a previous study found that intention fulfilment in an earlier version of our task correlates with real-world behaviour, lending validity to our paradigm by linking it to realistic outcomes (Gilbert, 2015). Additionally, recent unpublished work (Grinschgl, 2024) has shown a correlation between offloading across two lab tasks, though a null effect was reported in another study with a smaller sample size by the same team (Meyerhoff et al., 2021), likely due to insufficient power. In summary, we agree that future research should replicate these findings with alternative tasks to enhance robustness.

      Gilbert, S. J. (2015). Strategic offloading of delayed intentions into the external environment. Quarterly Journal of Experimental Psychology, 68(5), 971–992. https://doi.org/10.1080/17470218.2014.972963

      Grinschgl, S. (2024). Cognitive Offloading in the lab and in daily life. 2nd Cognitive Offloading Meeting. [Talk]

      Meyerhoff, H. S., Grinschgl, S., Papenmeier, F., & Gilbert, S. J. (2021). Individual differences in cognitive offloading: a comparison of intention offloading, pattern copy, and short-term memory capacity. Cognitive Research: Principles and Implications, 6(1), 34. https://doi.org/10.1186/s41235-021-00298-x

      (6) [From the weaknesses listed in the public comments.] The sample consisted of participants recruited from the general population. Future studies should investigate whether the effects observed extend to individuals with the highest levels of symptoms (including clinical samples). 

      We agree that transdiagnostic research should ideally include clinical samples to determine, for instance, whether the subclinical variation commonly studied in transdiagnostic work differs qualitatively from clinical presentations. However, this approach poses challenges, as transdiagnostic studies typically require large sample sizes, and recruiting clinical participants can be more difficult. With advancements in online sampling platforms, such as Prolific, achieving better availability and targeting may make this more feasible in the future. We intend to monitor these developments closely and contribute to such studies whenever possible.

    1. eLife Assessment

      This valuable manuscript investigated the role of glutamate signaling in the dorsomedial striatum of rats in a treadmill-based task and reported that it differs in goal-trackers compared to sign-trackers in a way that corresponds to differences in behaviour. The evidence supporting these claims is solid but could be further strengthened by adding more analyses and more detailed descriptions of current analyses. These findings will primarily be of interest to behavioural neuroscientists.

    2. Reviewer #1 (Public review):

      Summary:

      The authors measured glutamate transients in the DMS of rats as they performed an action selection task. They identified diverse patterns of behavior and glutamate dynamics depending on the pre-existing behavioral phenotype of the rat (sign tracker or goal tracker). Using pathway-specific DREADDs, they showed that these behavioral phenotypes and their corresponding glutamate transients were differentially dependent on input from the prelimbic cortex to the DMS.

      Strengths:

      Overall there are some very interesting results that make an important contribution to the field. Notably, the results seem to point to differential recruitment of the PL-DMS pathway in goal-tracking vs sign-tracking behaviors.

      Weaknesses:

      (1) The controls for off-target effects of CNO are not given sufficient importance both in terms of power and in reporting of their results. Although there is precedent to accept that CNO at the dosage given is unlikely to disrupt the behaviour, this doesn't justify the assumption that glutamate transmission won't be affected, and this possibility hasn't been sufficiently ruled out.

      (2) The specificity of the viral approach needs to be clarified. Figure 8 indicates a large proportion of the PL neuron population that expresses mCherry in the absence of AAV-Cre. This implies that there are a large number of neurons inhibited by CNO administration that were outside the projection pathway, calling into question the specificity of the effects.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to determine whether goal-directed and cue-driven attentional strategies (goal- and sign-tracking phenotypes) were associated with variation in cued motor responses and dorsomedial striatal (DMS) glutamate transmission. They used a treadmill task in which cues indicated whether rats should turn or stop to receive a reward. They collected and analyzed several behavioral measures related to task performance with a focus on turns (performance, latency, duration) for which there are more measures than for stops. First, they established that goal-trackers perform better than sign-trackers in post-criterion turn performance (cued turns completed) and turn initiation. They used glutamate sensors to measure glutamate transmission in DMS. They performed analyses on glutamate traces that suggest phasic glutamate DMS dynamics to cues were primarily associated with successful turn performance and were more characteristic of goal-trackers (i.e., rats with "goal-directed" attentional strategy). Smaller and more frequent DMS glutamate peaks were associated with other task events, cued misses (missed turns), cued stops, and reward delivery and were more characteristic of sign-trackers (i.e., rats with "cue-driven" attentional strategies). Consistent with the reported glutamate findings, chemogenetic inhibition of prelimbic-DMS glutamate transmission had an effect on goal-trackers' turn performance without affecting sign-trackers' performance in the treadmill task.

      Strengths:

      The power of the sign- and goal-tracking model to account for neurobiological and behavioral variability is critically important to the field's understanding of heterogeneity of the brain in health and disease. The approach and methodology are sound in their contribution to this important effort.

      The authors establish behavioral differences, measure a neurobiological correlate of relevance, and then manipulate that correlate in a broader circuitry and show a causal role in behavior that is consistent with neurobiological measurements and phenotypic differences.

      Sophisticated analyses provide a compelling description of the authors' observations.

      Limitations:

      Considerable transparency was added in the revised preprint. The "n" for each analysis is now available in Tables 1 and 3, carefully cross-referenced by figure. Readers may now carefully consider the n's in drawing their own conclusions from reported data.

      While more conventional trial-averaged population activity traces are not presented or analyzed, the unique nature of the peak phenotypes is likely to "wash out" potentially meaningful signals if averaged across subjects. The distribution of peaks analyses (and shifts observed with chemogenetic inhibition) are improved in the revised preprint and are informative to illustrate this likelihood. Representative traces should theoretically be consistent with population averages within phenotype, and if not, discussion of such inconsistencies may have enriched the conclusions drawn from the study. For example, population traces of the phasic cue response in GT may resemble the representative peak examples, while smaller irregular peaks of ST may "wash out" in a population average (possibly resulting in a prolonged elevation) and could have strengthened the rationale for more sophisticated analyses of peak probability that remain the focus of the revised preprint.

    4. Reviewer #3 (Public review):

      Summary:

      Avila and colleagues investigate the role of glutamate signaling in the dorsomedial striatum in a treadmill-based task where rats learn to turn or stop their walking based on learning cue-associations that allow them to acquire rewards. Phenotypic variation in Pavlovian conditioned sign and goal-tracking behavior was examined, where behavioral differences in stopping and turning were observed. Glutamate signals in the DMS were recorded during the treadmill task, and were related to features of cue-controlled movement, with a stronger relationship seen for goal trackers. Finally, chemogenetic inhibition of prelimbic neurons projecting to the DMS (the predicted source of those glutamate signals) preferentially affected cued movement in goal trackers. The authors couch these experiments in the context of cognitive control-attentional mechanisms, movement disorders, and individual differences in cue reactivity.

      Strengths:

      Overall these studies are interesting and are of general relevance to a number of research questions in neurology and psychiatry. The assessment of the intersection of individual differences in cue-related learning strategies with movement-related questions - in this case cued turning behavior - is an interesting and understudied question. The link between this work and growing notions of corticostriatal control of action selection makes it timely.

      Weaknesses:

      The clarity of the manuscript could be improved in several places, including in the graphical visualization of data. It is difficult to interpret the glutamate results, as presented, in the context of specific behaviors. It is difficult to assess how many trials/subjects are represented in the data shown, and too much emphasis is placed on representative examples. Averaged traces of the glutamate data and other standard analysis approaches would improve the paper and allow for easier interpretation of the data.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Strengths:

      Overall there are some very interesting results that make an important contribution to the field. Notably, the results seem to point to differential recruitment of the PL-DMS pathway in goal-tracking vs sign-tracking behaviors.

      Thank you.

      Weaknesses:

      There is a lot of missing information and data that should be reported/presented to allow a complete understanding of the findings and what was done. The writing of the manuscript was mostly quite clear, however, there are some specific leaps in logic that require more elaboration, and the focus at the start and end on cholinergic neurons and Parkinson's disease are, at the moment, confusing and require more justification.

      In the revised paper, we provide additional graphs and information in support of results, and we further clarify procedures and findings. Furthermore, we expanded the description of the proposed interpretational framework that suggests that the contrasts between the cortical-striatal processing of movement cues in sign- versus goal-trackers are related to previously established contrasts in the capacity for the cortical cholinergic detection of attention-demanding cues.

      Reviewer #2 (Public review):

      Strengths:

      The power of the sign- and goal-tracking model to account for neurobiological and behavioral variability is critically important to the field's understanding of the heterogeneity of the brain in health and disease. The approach and methodology are sound in their contribution to this important effort.

      The authors establish behavioral differences, measure a neurobiological correlate of relevance, and then manipulate that correlate in a broader circuitry and show a causal role in behavior that is consistent with neurobiological measurements and phenotypic differences.

      Sophisticated analyses provide a compelling description of the authors' observations.

      Thank you.

      Weaknesses:

      It is challenging to assess what is considered the "n" in each analysis (trial, session, rat, trace (averaged across a session or single trial)). Representative glutamate traces (n = 5 traces (out of hundreds of recorded traces)) are used to illustrate a central finding, while more conventional trial-averaged population activity traces are not presented or analyzed. The latter would provide much-needed support for the reported findings and conclusions. Digging deeper into the methods, results, and figure legends provides some answers to the reader, but much can be done to clarify what each data point represents and, in particular, how each rat contributes to a reported finding (i.e., single trial-averaged trace per session for multiple sessions, or dozens of single traces across multiple sessions).

      Representative traces should in theory be consistent with population averages within phenotype, and if not, discussion of such inconsistencies would enrich the conclusions drawn from the study. In particular, population traces of the phasic cue response in GT may resemble the representative peak examples, while smaller irregular peaks of ST may be missed in a population average (averaged prolonged elevation) and could serve as a rationale for more sophisticated analyses of peak probability presented subsequently.

      We have added two new Tables to clarify the number of rats per phenotype and sex used for each experiment described in the paper (Table 1), and the number of glutamate traces (range, median and total number) extracted for each analysis of performance-associated glutamate levels and the impact of CNO-mediated inhibition of fronto-striatal glutamate (Table 3).

      As the timing of glutamate peaks varies between individual traces and subjects, relative to turn and stop cue onset or reward delivery, subject- and trial-averaged glutamate traces would “wash out” the essential findings of phenotype- and task event-dependent patterns of glutamate peaks. In the detailed responses to the reviewers, we illustrate the results of an analysis of averaged traces to substantiate this view. Furthermore, as detailed in the section on statistical methods, and as mentioned by the reviewer under Strengths, we used advanced statistical methods to ensure that data from individual animals contribute equally to the overall result, and to minimize the possibility that an inordinate number of trials obtained from just one or a couple of rats biased the overall analysis.
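
      The “wash-out” argument can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the authors' analysis: the Gaussian peak shape, the peak width, the 2-s bin sampled at 200 points, the number of trials and all function names are hypothetical choices made only for illustration.

```python
import math
import random


def simulate_trace(peak_time, n_samples=200, duration=2.0, width=0.05, amplitude=1.0):
    """Return one synthetic trace with a single Gaussian peak centred at peak_time (s)."""
    dt = duration / (n_samples - 1)
    return [
        amplitude * math.exp(-((i * dt - peak_time) ** 2) / (2 * width ** 2))
        for i in range(n_samples)
    ]


def average_traces(traces):
    """Point-by-point mean across traces of equal length."""
    n = len(traces)
    return [sum(column) / n for column in zip(*traces)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical: 100 trials, each with one peak at a variable time within the 2-s bin.
jittered = [simulate_trace(rng.uniform(0.2, 1.8)) for _ in range(100)]
# Comparison: the same peak, but time-locked at 0.5 s on every trial.
locked = [simulate_trace(0.5) for _ in range(100)]

mean_single_peak = sum(max(trace) for trace in jittered) / len(jittered)
peak_of_jittered_avg = max(average_traces(jittered))
peak_of_locked_avg = max(average_traces(locked))

print(f"mean of single-trace peak maxima: {mean_single_peak:.2f}")
print(f"maximum of the jittered average:  {peak_of_jittered_avg:.2f}")
print(f"maximum of the time-locked average: {peak_of_locked_avg:.2f}")
```

      In this toy setting, averaging preserves the peak only when it is time-locked across trials; with variable peak timing the average flattens towards a low plateau even though every individual trace contains a full-sized peak.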

      Reviewer #3 (Public review):

      Strengths:

      Overall these studies are interesting and are of general relevance to a number of research questions in neurology and psychiatry. The assessment of the intersection of individual differences in cue-related learning strategies with movement-related questions - in this case, cued turning behavior - is an interesting and understudied question. The link between this work and growing notions of corticostriatal control of action selection makes it timely.

      Thank you.

      Weaknesses:

      The clarity of the manuscript could be improved in several places, including in the graphical visualization of data. It is sometimes difficult to interpret the glutamate results, as presented, in the context of specific behavior, for example.

      We appreciate the reviewer’s concerns about the complexity of some of the graphics, particularly the results from the arguably innovative analysis illustrated in Figure 6. Figure 6 illustrates that the likelihood of a cued turn can be predicted based on single and combined glutamate peak characteristics. The revised legend for this figure provides additional information and examples to ease the readers’ access to this figure. In addition, as already mentioned above, we have added several graphs to further illustrate our findings.

      (Recommendations for the authors)

      Reviewer #1 (Recommendations for the authors):

      (1) The differences in behavioral phenotype according to vendor (Figure 1c) are slightly concerning, could the authors please elaborate on why they believe this difference is? Are there any other differences in these stocks- i.e. weight, appearance, other types of behaviors?

      Differences in PCA behavior across vendors or specific breeding colonies were documented previously and may reflect the impact of environmental, developmental and genetic factors (references added in the revised manuscript). We included animals from both vendors to increase phenotypic variability and due to animal procurement constraints during COVID-related restrictions.

      (2) Possibly related to the above, the rats in Figure 1a and Figure 2 are different strains. Please clarify.

      In the revised legend of Figure 2 we clarify that the rat shown in the photographs is a Long-Evans rat that was not part of the experiments described in this paper. This rat was used to generate these photos as the black-spotted fur provided better contrast against the white treadmill belt.

      (3) Figure 3c, the pairwise comparison showing a significant increase from Day 1 to Day 3 is hard to understand unless this is a lasting change. Is this increase preserved at Day 4? Examination of either a linear trend across days or a simple comparison of either Day 1 & 2 against Day 3 & 4 or, minimally Day 1 against Day 4 would communicate this message. Otherwise, there doesn't seem to be much of a case for improvement across test sessions, which would also be fine in my view.

      As the analysis of post-criterion performance also revealed an effect of DAY, we felt compelled to report and illustrate the results of pairwise comparisons in Fig. 3c. In agreement with the reviewer’s point, we did not further comment on this finding in the manuscript.

      (4) Figure 4e. I find it extremely unlikely that every included electrode was located exactly at anterior 0.5mm. Please indicate the range - most anterior and most posterior of the included electrodes in the study.

      The schematic section shown in Fig. 4e depicted the AP level of that section, with all placements collapsed onto that level. As detailed in Methods, electrode placements needed to be within the following stereotaxic space: AP: -0.3 to 0.6 mm, ML: 2 to 2.5 mm, and DV: -4.2 to -5 mm. To clarify this issue, the text in Results and the legend was modified and the 0.5 mm label was removed from Fig. 4e.

      (5) The paper generally is quite data light and there are a lot of extra results reported that aren't shown in the figures. There are 17 instances of the phrase "not shown", some are certainly justified, but a lot of results are missing…

      We followed the reviewer’s suggestion and added several graphs. The revised Figure 5 includes the new graph 5d that shows the number of glutamate traces with just 1, 2 or 3 peaks occurring during cue presentation period. Likewise, the revised Figure 7 includes the new graph 7h that shows the number of glutamate traces with just 1, 2 or 3 peaks following the administration of CNO or its vehicle. In both cases, we also revised the analysis of peak number data, by counting the number of cases (or traces) with just 1, 2 or 3 peaks and using Chi-squared tests to determine the impact of phenotype and, in the latter case, of CNO. In addition, the revised Figure 7 now includes a graph showing the main effects of phenotype and CNO in reward delivery-locked glutamate maximum peak concentrations (Fig. 7k). In revising these sections, we also removed the prior statement about glutamate current rise times as this isolated observation had no impact on subsequent analyses or the discussion.
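
      For readers unfamiliar with this type of test, the following is a minimal sketch of a chi-squared test of independence on a phenotype by peak-number table. The counts are invented purely for illustration and are not data from the manuscript; the function name is hypothetical, and the closed-form p-value used here applies only to the 2 degrees of freedom of a 2 x 3 table.

```python
import math


def chi_squared_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts of traces with 1, 2 or 3 peaks (columns) per phenotype (rows).
counts = [
    [60, 25, 15],  # "GT" row, invented numbers
    [30, 35, 35],  # "ST" row, invented numbers
]

statistic, dof = chi_squared_independence(counts)
# With exactly 2 degrees of freedom, the chi-squared survival function is exp(-x / 2).
p_value = math.exp(-statistic / 2) if dof == 2 else float("nan")
print(f"chi2({dof}) = {statistic:.2f}, p = {p_value:.4g}")
```

      Note that a test of this kind treats every trace as an independent observation, so it does not by itself account for multiple traces coming from the same rat.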

      Concerning the reviewer’s point 5d (DMS eGFP transfection correlations Figure 8), the manuscript clarifies that the absence of such a correlation was expected given that eGFP expression in the DMS does not accurately reproduce the prelimbic-DMS projection space that was inhibited by CNO. In contrast, the correlations between the efficacy of CNO and DREADD expression measures in prelimbic cortex were significant and are graphed (Figs. 8g and 8j).

      (6) Please clarify the exact number of animals in each experiment. The caption of Figure 3 seems to suggest there are 29 GTs and 22 STs in the initial experiment, but the caption of Figure 5b seems to suggest there are N=30 total rats being analyzed (leaving 21 un-accounted for), or is this just the number of GTs (meaning there is one extra)?

      We have added Table 1 to clarify the number of animals used across different experiments and stages. Additionally, we have included a new Table 3 that identifies, for each graph showing results from the analyses of glutamate concentrations, the number of rats from which recordings were obtained and the number of traces per rat (range, median, and total).

      (7) Relatedly, in Figures 5c-f and Figures 7g-i, the data seem to be analyzed by trial rather than subject-averaged, please clarify and what is the justification for this?

      As detailed in Experimental design and statistical analyses, we employed linear mixed-effects modeling to analyze the amperometric data that generated figures 5 and 7 to minimize the risk of bias due to an excessive number of trials obtained from specific rats. LMMs were chosen to analyze these repeated (non-independent) data to address issues that may be present with subject-averaged data. For clarity, throughout the results for these figures, the numerator in the F-ratio reflects the degrees of freedom from the fixed effects (phenotype/sex) and the denominator reflects the error term influenced by the number of subjects and the within-subject variance.
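
      The bias that motivates this choice can be shown with a toy example. The sketch below is not a linear mixed-effects model and does not reproduce the authors' analysis; it only contrasts the two extremes, pooling all trials versus averaging within each rat first, using invented values and hypothetical rat labels. A mixed model with a random intercept per rat typically yields an estimate between these two extremes.

```python
from statistics import mean

# Hypothetical peak concentrations (arbitrary units) per rat. Rat "C" contributes
# far more trials than the others and happens to show higher values.
trials_by_rat = {
    "A": [1.0, 1.2, 0.8],
    "B": [1.1, 0.9, 1.0],
    "C": [3.0] * 30,
}

# Complete pooling: every trial counts equally, so rat "C" dominates the estimate.
pooled_mean = mean(value for values in trials_by_rat.values() for value in values)

# Subject averaging: every rat counts equally, regardless of its number of trials.
subject_mean = mean(mean(values) for values in trials_by_rat.values())

print(f"pooled over trials: {pooled_mean:.2f}")
print(f"mean of rat means:  {subject_mean:.2f}")
```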

      Concerning the illustration and analysis of trial- or subject-averaged glutamate traces please see reviewer 2, point 1 and the graph in that section. Within a response bin, such as the 2-s period following turn cues, glutamate peaks – as defined in Methods - occur at variable times relative to cue onset. Averaging traces over a population of rats or trials would “wash-out” the phenotype- and task event-dependent patterns of glutamate concentration peaks, yielding, for example, a single, nearly 2-s long plateau for cue-locked glutamate recordings from STs (see Figure 5b versus the graph shown in response to reviewer 2, point 1).

      (8) Likewise on page 22, the number of animals from which these trials were taken should be stated "The characteristics of glutamate traces (maximum peak concentration, number of peaks, and time to peak) were extracted from 548 recordings of turn cue trials, 364 of which yielded a turn (GTs: 206, STs: 158) and 184 a miss (GTs: 112, STs: 72).".

      The number of animals is now included in the text and listed in Table 3.

      (9) The control group for Figure 7 given the mCherry fluorophore - given the known off-target effects of CNO, this is a very important control. Minimally, this data should be shown, but it is troubling that the ST group has n=2, I don't really understand how any sort of sensible stats can be conducted with a group this size, and obviously it's too small to find any significant differences if they were there.

      As discussed on p. 14-15 in the manuscript under the section Clozapine N-Oxide, the conversion rate of CNO to clozapine suggests that approximately 50-100 times the dose of clozapine (compared to our 5.0 mg/kg CNO dosage) would be required to produce effects on rodent behavior (references on p. 14-15).

      Regarding evidence from control rats expressing the empty construct, the revised manuscript clarifies that no effects of CNO on cued turns were found in 5 GTs expressing the empty control vector. Although CNO had no effects in STs expressing the DREADD, we also tested the effects of CNO in 2 STs expressing the empty control vector (individual turn rates following vehicle and CNO are reported for these 2 STs). Moreover, we extracted turn cue-locked glutamate traces (vehicle: 18 traces; CNO: 16 traces) from an empty vector-expressing GT and found that administration of CNO reduced neither maximum glutamate peak concentrations nor the proportion of traces with just one peak. The absence of effects of CNO on cued turning performance and on turn cue-locked glutamate dynamics is consistent with prior studies showing no effects of 5.0 mg/kg CNO in rats not expressing the DREADD vector (references in manuscript).

      (10) Figure 8b - the green circle indicated by 1 is definitely not the DMS, this is the DLS, and animals with virus placement in this region should be excluded.

      The reviewer of course is correct and that exactly was the point of that illustration, as such a transfection space would have received the lowest possible rating (as indicated by the “1” in the green space). Fig. 8b was intended to illustrate expression efficacy ratings and does not indicate actual viral transfection spaces. Because the results described in the manuscript did not include data from a brain with a striatal transfection space as was illustrated in green in the original Fig. 8b, we removed that illustration of an off-target transfection space.  

      (11) Figure 8j, the correlation specifically counts double-labeled PL hM4Di + eGFP neurons. Separating dual-labeled cells from all mCherry-labeled cells seems very strange given the nature of the viral approach. There seems to be an assumption that there are some neurons that express the mCherry-hM4Di that don't also have the AAV-Cre (eGFP). Obviously, if that were true this poses a huge problem for your viral approach and would mean that you're inhibiting a non-selective population of neurons. More likely, the AAV-Cre (eGFP) is present in all of your mCherry-hM4Di cells, just not at levels visible without GFP antibody amplification. Ideally, staining should be done to show that all cells with mCherry also have eGFP, but minimally this correlation should include all cells expressing mCherry with the assumption that they must also have the AAV-Cre.

      As noted on page 15 in the Visualization and Quantification of eGFP/mCherry-Expressing Neurons section, eGFP expression in our viral approach was notably bright and did not necessitate signal enhancement. Furthermore, given the topographic organization of prelimbic-DMS projections on the one hand, and the variable transfection spaces in cortex and striatum on the other hand, the speculation that AAV-Cre may have been present in all mCherry cells is without basis. Second, there certainly are mCherry-positive cells that do not also express the retrogradely transported AAV-Cre, and that therefore were not affected by CNO. Third, the entire point of this dual vector strategy was to selectively inhibit prelimbic-striatal projections, and the strong correlation between double-labeled neuron numbers and cued turn scores substantiates the usefulness of this approach.

      (12) Discussion, a bit more interpretation of the results would be good. Specifically - does the PL-DMS inhibition convert GTs to STs? There were several instances where the behavior and glutamate signals seemed to be pushed to look like STs but also a lot of missing data so it is hard to say. One would assume this kind of thing if, as I think is being said (please clarify), the ST phenotype is being driven by glutamatergic drive either locally or from sources other than PL cell bodies, presumably silencing the PL cell body inputs in GTs also leaves other glutamatergic inputs as the primary sources?

      We agree with the reviewer that one could say, perhaps somewhat colloquially, that PL-DMS inhibition turns GTs to STs, in terms of turning performance and associated glutamate peak dynamics. The newly added data graphs are consistent with this notion. However, there are of course numerous other neurobiological characteristics which differ between GTs and STs and are revealed in the context of other behavioral or physiological functions. In the Discussion, and as noted by the reviewer, we discuss alternative sources of glutamatergic control in STs and the functional implications of bottom-up mechanisms. In the revised manuscript, we have updated references and made minor revisions to improve this perspective.

      (13) I found the abstract really detailed and very dense, it is pretty hard to understand in its current form for someone who hasn't yet read the paper. At this level, I would recommend more emphasis on what the results mean rather than listing the specific findings, given that the task is still quite opaque to the reader.

      We revised the abstract, in part by deleting two rather dense but non-essential statements of results and by adding a more accessible conclusion statement.

      (14) There are a lot of abbreviations: CTTT, PD, PCA, GT, ST, MEA, GO, LMM, EMMs, PL, DMS. Some of these are only mentioned a few times: MEA, LMM, and EMMs are all mentioned less than 5 times. To reduce mental load for the reader, you could spell these ones out, or include a table somewhere with all of the abbreviations.

      We added a list of Abbreviations and Acronyms and eliminated abbreviations that were used infrequently.

      (15) Generally, the logic that cortico-striatal connections contribute to GT vs ST seems easy to justify, however, the provided justification is missing a line of connection: "As such biases of GTs and STs were previously shown to be mediated in part via contrasting cholinergic capacities for the detection of cues (Paolone et al., 2013; Koshy Cherian et al., 2017; Pitchers et al., 2017a; Pitchers et al., 2017b), we hypothesized that contrasts in the cortico-striatal processing of movement cues contribute to the expression of these opponent biases." Please elaborate on why specifically cholinergic involvement suggests corticostriatal involvement. I think there are probably more direct reasons for the current hypothesis.

      Done – see p. 4-5.

      (16) Along the same line, paragraph 3 of the intro about Parkinson's disease and cholinergics seems slightly out of place. This is because the specific or hypothesized link between these things and corticostriatal glutamate has not been made clear. Consider streamlining the message specifically to corticostriatal projections in the context of the function you are investigating.

      Done – see p. 4-5.

      (17) Page 8, paragraph 2. There is a heading or preceding sentence missing from the start of this paragraph: "Contrary to the acclimation training phase, during which experimenters manually controlled the treadmill, this phase was controlled entirely by custom scripts using Med-PC software and interface (MedAssociates).".

      Revised and clarified.

      (18) Page 13 "We utilized a pathway-specific dual-vector chemogenetic strategy (e.g., Sherafat et al., 2020) to selectively inhibit the activity of fronto-cortical projections to the DMS". The Hart et al (2018) reference seems more appropriate being both the same pathway and viral combination approach.

      Yes, thank you, we’ve updated the citation.

      (19) Pages 20-21: "Maximum glutamate peak concentrations recorded during the cue period were significantly higher in GTs than in STs (phenotype: F(1,28.85)= 8.85, P=0.006, ηp 2=0.23; Fig. 5c). In contrast, maximum peak amplitudes locked to other task events all were significantly higher in STs." The wording here is misleading, both Figures 5c and 5d report glutamate peaks during the turn cue, the difference is what the animal does. So, it should be something like "Maximum glutamate peak concentrations recorded during the cue period were significantly higher in GTs than in STs when the animal correctly made a turn (stats) but this pattern reversed on missed trials when the animal failed to turn (stats)..." or something similar.

      Yes, thank you. We have revised this section accordingly.  

      (20) Same paragraph: "Contingency tables were used to compare phenotype and outcome-specific proportions and to compute the probability for turns in GTs relative to STs." What is an outcome-specific proportion?

      This has been clarified.

      (21) Page 22 typo: "GTs were only 0.74 times as likely as GTs to turn".

      Fixed.

      (22) The hypothesis for the DREADDs experiment isn't made clear enough. Page 23 "In contrast, in STs, more slowly rising, multiple glutamate release events, as well as the presence of relatively greater reward delivery-locked glutamate release, may have reflected the impact of intra-striatal circuitry and ascending, including dopaminergic, inputs on the excitability of glutamatergic terminals of corticostriatal projections" As far as I can understand, the claim seems to be that glutamate release might be locally modulated in the case of ST, on account of the profile of glutamate release- more slowly rising, multiple events, and reward-locked. Please clarify why these properties would preferentially suggest local modulation.

      We have revised and expanded this section to clarify the basis for this hypothesis.

      (23) The subheadings for the section related to Figure 7 "CNO disrupts..." "CNO attenuates..." presumably you mean fronto-striatal inhibition disrupts/attenuates. As it stands, it reads like the CNO per se is having these effects, off-target.

      Fixed.

      (24) The comparison of the results in the discussion against a "hypothetical" results section had the animals not been phenotyped behaviorally is unnecessary and overly speculative, given that 30-40% of rats don't fall into either of these two categories. I think the point here is to emphasize the importance of taking phenotype into account. This point can surely be made directly in its own sentence, probably somewhere towards the end of the discussion.

      We have partly followed the reviewer’s advice and separated the discussion of the hypothetical results from the summary of main findings. However, we did not move this discussion toward the end of the Discussion section as we believe that it justifies the guiding focus of the discussion on the impact of phenotype.

      (25) The discussion, like the introduction, talks a lot about cholinergic activity. As noted, this link is unclear - particularly how it links with the present results, please clarify or remove. Likewise high-frequency oscillations.

      We have revised relevant sections in the Introduction (see above) and Discussion sections. However, given the considerable literature indicating contrasts between the cortical cholinergic-attentional capacities of GTs and STs, the interpretation of the current findings in that larger context is justified.

      (26) Typo DSM in the discussion x 2.

      Thanks, fixed.

      Reviewer #2 (Recommendations for the authors):

      (1) As mentioned in the Public Review, it is challenging to assess what is considered the "n" in each analysis, particularly for the glutamate signal analysis (trial, session, rat, trace (averaged across session or single trial)). Representative glutamate traces are used to illustrate a central finding, while more conventional trial-averaged population activity traces are not presented or analyzed. For example, n = 5 traces, out of hundreds of recorded traces, with each rat contributing 1-27 traces across multiple sessions suggests ~1-2% of the data are shown as time-resolved traces. Representative traces should in theory be consistent with population averages within phenotype, and if not, discussion of such inconsistencies would enrich the conclusions drawn from the study. In particular, population traces of the phasic cue response in GT may resemble the representative peak examples, while smaller irregular peaks of ST may be missed in a population average (averaged prolonged elevation in signal) and could serve as rationale for more sophisticated analyses of peak probability presented subsequently (and relevant to opening paragraph of discussion where hypothetical data rationale is presented).

      We have added the new Table 1 to provide a complete account of the number of rats, per phenotype and sex, for each component of the experiments. In addition, the new Table 3 provides the range, median and total number of glutamate traces that were analyzed and formed the foundation of the individual data graphs depicting the results of glutamate concentration analyses.

      We chose not to present trial- or subject-averaged traces, as glutamate peaks occur at variable times relative to the onset of turn and stop cues and reward delivery, and therefore averaging across a population of rats or trials would obscure phenotype- and task event-dependent patterns of glutamate peaks. The attached graph serves to illustrate this issue. The graph shows turn cue-locked glutamate concentrations (M, SD) from trials that yielded turns, averaged over all traces used for the analysis of the data shown in Fig. 5d (see also Table 3, top row). Because of the variability of peak times, trial- and subject-averaging of traces from STs yielded a nearly 2-s long elevated plateau of glutamate concentrations (red triangles), contrasting with the presence of single and multiple peaks in STs as illustrated in Figs. 5b and 5e. Furthermore, averaging of traces from GTs obscured the presence of primarily single turn cue-locked peaks. Because of the relatively large variances of averaged data points, again reflecting the variability of peak times, analysis of glutamate levels during the cue period did not indicate an effect of phenotype (F(1,190)=1.65, P=0.16). Together, subject- or trial-averaged traces would not convey the glutamate dynamics that form the essence of the amperometric findings obtained from our study. We recognize, as inferred by the reviewer, that smaller irregular peaks in STs may have been missed given the definition of a glutamate peak (see Methods). It is in part for that reason that we conducted a prospective analysis of the probability for turns given a combination of peak characteristics (maximum peak concentration and peak numbers; Fig. 6).
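      The averaging argument above can be illustrated with a toy calculation. The following is a minimal sketch using made-up numbers, not data from the study: the function names, the 4-unit peak amplitude, and the peak latencies are all hypothetical. It shows that when each trial contains one sharp peak at a different latency, the bin-wise mean is a low, broad plateau rather than a peak.

```python
# Hypothetical illustration only: amplitudes and latencies are invented,
# not taken from the recordings discussed in the response.
N_BINS = 10  # a 2-s cue period sampled at 5 Hz (200-ms bins)


def single_peak_trace(peak_bin, amplitude=4.0, baseline=0.0):
    """Return a trace with one sharp peak at `peak_bin` and baseline elsewhere."""
    return [amplitude if i == peak_bin else baseline for i in range(N_BINS)]


def average_traces(traces):
    """Bin-wise mean across a list of equal-length traces."""
    n = len(traces)
    return [sum(t[i] for t in traces) / n for i in range(N_BINS)]


# Eight trials, each with a single peak, each at a different latency (bins 1..8).
traces = [single_peak_trace(b) for b in range(1, 9)]
mean_trace = average_traces(traces)

max_single = max(max(t) for t in traces)  # largest value in any single trial
max_mean = max(mean_trace)                # largest value in the averaged trace
```

      In this toy case every individual trace reaches the full peak amplitude, whereas the averaged trace never exceeds one eighth of it and stays elevated across eight consecutive bins, which is the plateau-like shape described for the averaged traces.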

      (2) To this latter point, the relationship between the likelihood to turn and the size of glutamate peak is focused on the GT phenotype, which limits understanding of how smaller multiple peaks relate to variables of interest in ST (missed turns, stops, reward). If it were possible to determine the likelihood for each phenotype, without a direct contrast of one phenotype relative to the other, this would be a more straightforward description of how signal frequency and amplitude relate to relevant behaviors in each group. Depending on the results, this could be done in addition to or instead of the current analysis in Figure 6.

      We considered the reviewer’s suggestion but could not see how attempts to analyze the role of maximum glutamate concentrations and number of peaks within a single phenotype would provide any significant insights beyond the current description of results. Moreover, as stressed in the 2nd paragraph of the Discussion (see Reviewer 1, point 24), the removal of the phenotype comparison would nearly completely abolish the relationships between glutamate dynamics and behavior from the current data set.

      Author response image 1.

      (3) If Figure 6 is kept, a point made in the text is that GT is 1.002x more likely than ST to turn at a given magnitude of Glu signal. 1.002 x more likely is easily (perhaps mistakenly) interpreted as nearly identical likelihood. Looking closely at the data, perhaps what is meant is @ >4uM the difference between top-line labeled {b} and bottom-line labeled {d,e} is 1.002? If not, there may be a better way to describe the difference as 1x could be interpreted as the same/similar.

      Concerning the potential for misinterpretation, the original manuscript stated (key phrase marked here in red font): Comparing the relative turn probabilities at maximum peak concentrations >4 µM, GTs were 1.002 times more likely (or nearly exactly twice as likely) as STs to turn if the number of cue-evoked glutamate peaks was limited to one (rhombi in Fig. 6a)  when compared to the presence of 2 or 3 peaks (triangles in Fig. 6a). However, we appreciate the reviewer’s concern about the complexity of this statement and, as it merely re-emphasized a result already described, it was deleted.

      (4) For Figure 7e, the phenotype x day interaction is reported, but posthocs are looking within phenotype (GT) at treatment effects. Is there a phenotype x day x treatment, or simply phenotype x treatment (day collapsed) to justify within-group treatment posthocs?

      We have revised the analysis and illustration of the data shown in Figs 7e and 7f, by averaging the test scores from the two tests, per animal, of the effects of vehicle and CNO, to be able to conduct a simpler 2-way analysis of the effects of phenotype and treatment.
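      The collapsing step described above can be sketched as follows. This is an illustrative example with invented records, not the authors' analysis code: the record layout, rat identifiers, and scores are hypothetical. It averages the two test days per animal and treatment so that each animal contributes one score per treatment to a phenotype x treatment design.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (rat_id, phenotype, treatment, test_day, score).
# All values are invented for illustration.
records = [
    ("r1", "GT", "vehicle", 1, 0.50),
    ("r1", "GT", "vehicle", 2, 0.75),
    ("r1", "GT", "CNO", 1, 0.25),
    ("r1", "GT", "CNO", 2, 0.50),
    ("r2", "ST", "vehicle", 1, 0.50),
    ("r2", "ST", "vehicle", 2, 0.50),
]


def collapse_over_days(rows):
    """Average scores across test days for each (rat, phenotype, treatment)."""
    grouped = defaultdict(list)
    for rat, phenotype, treatment, _day, score in rows:
        grouped[(rat, phenotype, treatment)].append(score)
    return {key: mean(scores) for key, scores in grouped.items()}


collapsed = collapse_over_days(records)
```

      After this step the day factor no longer appears in the data, so only phenotype and treatment remain as factors for the subsequent two-way analysis.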

      (5) Ideally, viral control is included as a factor in this analysis as well. The separate analysis for viral controls was likely done due to low n, however negative findings from an ANOVA in which an n=2 (ST) should be interpreted with extreme caution. The authors already have treatment control (veh, CNO) and may consider dropping the viral controls completely due to the lack of power to perform appropriate analyses.

      This issue has been clarified – see reviewer 1, point 9.

      Minor:

      (1) In the task description, it could be clearer how reward delivery relates to turns and stops. For example, does the turn cue indicate the rat will be rewarded at the port behind it? Does the stop cue indicate that the rat will be rewarded at the port in front of it? This makes logical sense, but the current text does not describe the task in this way, instead focusing on what is the correct action (seemingly but unlikely independent of reinforcement).

      We have updated the task description in Methods and the legend of Figure 2 to indicate the location of reward delivery following turns and stops.

      (2) For the peak analysis, what is the bin size for determining peaks? It is indicated that the value before and after the peak is >1 SD below the peak value, so it is helpful to know the temporal bin resolution for this definition.

      As detailed on p 11-12 under Amperometry Data Processing and Analysis of Glutamate Peaks, we analyzed glutamate concentrations recorded at a frequency of 5 Hz (200 ms bins) throughout the 2-second-long presentation of turn and stop cues and for a 2-second period following reward delivery.
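      For orientation, the bin arithmetic and a peak criterion of the kind the reviewer describes can be sketched as below. This is a hedged illustration, not the procedure from the Methods: the function `find_peaks_sd`, the choice of the sample SD of the whole window as the yardstick, and the example trace are assumptions made for this sketch.

```python
from statistics import stdev

SAMPLING_HZ = 5  # recording frequency stated in the response
WINDOW_S = 2     # length of the cue or post-reward window, in seconds

n_bins = SAMPLING_HZ * WINDOW_S  # number of samples per window
bin_ms = 1000 / SAMPLING_HZ      # duration of each bin in milliseconds


def find_peaks_sd(trace):
    """Return indices whose neighbours are both more than 1 SD below them.

    Assumption: the SD is the sample SD of the whole trace; the actual
    definition used in the manuscript may differ.
    """
    sd = stdev(trace)
    peaks = []
    for i in range(1, len(trace) - 1):
        threshold = trace[i] - sd
        if trace[i - 1] < threshold and trace[i + 1] < threshold:
            peaks.append(i)
    return peaks


# Invented 10-bin trace with one sharp deflection at bin 3.
example_trace = [0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
peaks = find_peaks_sd(example_trace)
```

      With 200-ms bins, a criterion based on the immediately neighbouring samples evaluates the signal over a span of roughly 400 ms around each candidate peak, which is the temporal resolution the reviewer asks about.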

      (3) Long Evans rats are pictured in Figure 2 (presumably contrast with a white background is better here), while SD rats are pictured in Figure 1. Perhaps stating why LE rats are pictured would help clear up any ambiguity about the strains used, as a quick look gives the impression two strains are used in two different tasks.

      Yes, see reviewer 1, point 2.

      (4) In Figure 7e, the ST and GT difference in turns/turn cue does not seem to replicate prior findings for tracking differences for this measure (Figure 3b). ST from the chemogenetic cohort seems to perform better than rats whose behavior was examined prior to glutamate sensor insertion. What accounts for this difference? Training and testing conditions/parameters?

      The reviewer is correct. The absence of a significant difference between vehicle-treated GTs and vehicle-treated STs in Fig. 7e reflects a relatively lower turn rate in GTs than was seen in the analysis of baseline behavior (Fig. 3b; note the different ordinates of the two figures, needed to show the impact of CNO in Fig. 7e). Notably, the data in Fig. 7e are based on fewer rats (12 versus 29 GTs and 10 versus 22 STs; Table 1) and on rats which at this point had undergone additional surgeries to infuse the DREADD construct and implant electrode arrays. We can only speculate that these surgeries had greater detrimental effects in GTs, perhaps consistent with evidence suggesting that immune challenges trigger a relatively greater activation of their innate immune system (Carmen et al., 2023). We acknowledged this issue in the revised Results.

      (5) The authors are encouraged to revise for grammar (are vs. is, sentence ending with a preposition, "not only" clause standing alone) and word choice (i.e. in introduction: insert, import, auditorily). Consider revising the opening sentence on page 5 for clarity.

      We have revised the entire text to improve grammar and word choice.

      (6) Do PD fallers refer to rats or humans? If the latter, this may be a somewhat stigmatizing word choice.

      We have replaced such phrases using more neutral descriptions, such as referring to people with PD who frequently experience falls.

      (7) Page 27 What does "non-instrumental" behavior mean?

      We have re-phrased this statement without using this term.

      (8) The opening paragraph of the discussion is focused on comparing reported results (with phenotype as a factor) to a hypothetical description of results (without phenotype as a factor) that were not presented in the results section. There is one reference to a correlation analysis on collapsed data, but otherwise, no reporting of data over all rats without phenotype as a factor. If this is a main focus, including these analyses in the results would be warranted. If this is only a minor point leading to discussion, authors could consider omitting the hypothetical comparison.

      We have revised this section - see reviewer 1 point 24.

      Reviewer #3 (Recommendations for the authors):

      (1) These are really interesting studies. I think there are issues in data presentation/analysis that make it difficult to parse what exactly is happening in the glutamate signals, and when. Overall the paper is just a bit of a difficult read. A generally standard approach for showing neural recording data of many kinds, including, for example, subject-averaged traces, peri-event histograms, heatmaps, etc summarizing and quantifying the results - would be helpful. Beyond the examples in Figure 5, I would suggest including averaged traces of the glutamate signals and quantification of those traces.

      We have addressed these issues in multiple ways, see the response to several points of reviewers 1 and 2, particularly reviewer 2, point 1.

      (2) Figure 6 (and the description in the response letter) is also very non-intuitive. It's unclear how the examples shown relate to the reported significance indicators/labels/colors etc in the figure. I would suggest rethinking this figure overall, and if there is a more direct quantitative way to connect signal features with behavior. Again, drawing from standard visualization approaches for neural data could be one approach.

      See also reviewer 2 points 1 and 3. Furthermore, we have revised the text in Results and the legend to improve the accessibility of Fig. 6.

      (3) As far as I can tell, all of the glutamate sensor conclusions reflect analysis collapsed across 100s of trials. Do any of the patterns hold for a subjects-wise analysis? How variable are individual subjects?

      We employed linear mixed-effect model analyses and added a random subject intercept to account for subject variability outside fixed effects (phenotype and treatment). The variance of the intercept ranged 0.01-1.71 SEM across outcome (cued turns/cued stops/misses). See also reviewer 1, point 7 and reviewer 2, point 1.

    1. eLife Assessment

      This study investigates the conditions under which abstract knowledge transfers to new learning. It presents solid evidence across a number of behavioral experiments that when explicit awareness of learned statistical structure is present, knowledge can transfer immediately, but that otherwise similar transfer requires sleep-dependent consolidation. The valuable results provide new constraints on theories of transfer learning and consolidation.

    2. Reviewer #1 (Public review):

      Summary:

      This paper investigates the effects of the explicit recognition of statistical structure and sleep consolidation on the transfer of learned structure to novel stimuli. The results show a striking dissociation in transfer ability between explicit and implicit learning of structure, finding that only explicit learners transfer structure immediately. Implicit learners, on the other hand, show an intriguing immediate structural interference effect (better learning of novel structure) followed by successful transfer only after a period of sleep.

      Strengths:

      This paper is very well written and motivated, and the data are presented clearly with a logical flow. There are several replications and control experiments and analyses that make the pattern of results very compelling. The results are novel and intriguing, providing important constraints on theories of consolidation. The discussion of relevant literature is thorough. In sum, this work makes an exciting and important contribution to the literature.

      Weaknesses:

      There have been several recent papers which have identified issues with alternative forced choice (AFC) tests as a method of assessing statistical learning (e.g. Isbilen et al. 2020, Cognitive Science). A key argument is that while statistical learning is typically implicit, AFC involves explicit deliberation and therefore does not match the learning process well. The use of AFC in this study thus leaves open the question of whether the AFC measure benefits the explicit learners in particular, given the congruence between knowledge and testing format, and whether, more generally, the results would have been different had the method of assessing generalization been implicit. Prior work has shown that explicit and implicit measures of statistical learning do not always produce the same results (e.g. Kiai & Melloni, 2021, bioRxiv; Liu et al. 2023, Cognition).

      The authors argued in their response to this point that this issue could have quantitative but not qualitative impacts on the results, but we see no reason that the impact could not be qualitative. In other words, it should be acknowledged that an implicit test could potentially result in the implicit group exhibiting immediate structure transfer.

      Given that the explicit/implicit classification was based on an exit survey, it is unclear when participants who are labeled "explicit" gained that explicit knowledge. This might have occurred during or after either of the sessions, which could impact the interpretation of the effects and deserves discussion.

    3. Reviewer #2 (Public review):

      Summary:

      Sleep has not only been shown to support the strengthening of memory traces, but also their transformation. A special form of such transformation is the abstraction of general rules from the presentation of individual exemplars. The current work used large online experiments with hundreds of participants to shed further light on this question. In the training phase participants saw composite items (scenes) that were made up of pairs of spatially coupled (i.e., they were next to each other) abstract shapes. In the initial training, they saw scenes made up of six horizontally structured pairs and in the second training phase, which took place after a retention phase (2 min awake, 12 h incl. sleep, 12 h only wake, 24 h incl. sleep), they saw pairs that were horizontally or vertically coupled. After the second training phase, a two-alternatives-forced-choice (2-AFC) paradigm, where participants had to identify true pairs versus randomly assembled foils, was used to measure performance on all pairs. Finally, participants were asked five questions to identify whether they had insight into the pair structure, and post-hoc groups were assigned based on this. Mainly the authors find that participants in the 2 minute retention experiment without explicit knowledge of the task structure were at chance level performance for the same structure in the second training phase, but had above chance performance for the vertical structure. The opposite was true for both sleep conditions. In the 12 h wake condition these participants showed no ability to discriminate the pairs from the second training phase at all.

      Strengths:

      All in all, the study was performed to a high standard and the sample size in the implicit condition was large enough to draw robust conclusions. The authors make several important statistical comparisons and also report an interesting resampling approach. There is also a lot of supplemental data regarding robustness.

      Weaknesses:

      My main concern regards the small sample size in the explicit group and the lack of experimental control.

    4. Reviewer #3 (Public review):

      In this project, Garber and Fiser examined how the structure of incidentally learned regularities influences subsequent learning of regularities, that either have the same structure or a different one. Over a series of six online experiments, it was found that the structure (spatial arrangement) of the first set of regularities affected learning of the second set, indicating that it has indeed been abstracted away from the specific items that have been learned. The effect was found to depend on the explicitness of the original learning: Participants who noticed regularities in the stimuli were better at learning subsequent regularities of the same structure than of a different one. On the other hand, participants whose learning was only implicit had an opposite pattern: they were better in learning regularities of a novel structure than of the same one. However, when an overnight sleep separated the first and second learning phases, this opposite effect was reversed and came to match the pattern of the explicit group, suggesting that the abstraction and transfer in the implicit case were aided by memory consolidation.

      In their revision the authors addressed my major comments successfully and I commend them for that.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper investigates the effects of the explicit recognition of statistical structure and sleep consolidation on the transfer of learned structure to novel stimuli. The results show a striking dissociation in transfer ability between explicit and implicit learning of structure, finding that only explicit learners transfer structure immediately. Implicit learners, on the other hand, show an intriguing immediate structural interference effect (better learning of novel structure) followed by successful transfer only after a period of sleep.

      Strengths:

      This paper is very well written and motivated, and the data are presented clearly with a logical flow. There are several replications and control experiments and analyses that make the pattern of results very compelling. The results are novel and intriguing, providing important constraints on theories of consolidation. The discussion of relevant literature is thorough. In summary, this work makes an exciting and important contribution to the literature.

      Weaknesses:

      There have been several recent papers that have identified issues with alternative forced choice (AFC) tests as a method of assessing statistical learning (e.g. Isbilen et al. 2020, Cognitive Science). A key argument is that while statistical learning is typically implicit, AFC involves explicit deliberation and therefore does not match the learning process well. The use of AFC in this study thus leaves open the question of whether the AFC measure benefits the explicit learners in particular, given the congruence between knowledge and testing format, and whether, more generally, the results would have been different had the method of assessing generalization been implicit. Prior work has shown that explicit and implicit measures of statistical learning do not always produce the same results (e.g. Kiai & Melloni, 2021, bioRxiv; Liu et al. 2023, Cognition).

      We agree that numerous papers in the Statistical Learning literature discuss how different test measures can lead to different results and, in principle, using a different measure could have led to varying results in our study. In addition, we believe there are numerous additional factors relevant to this issue, including the dichotomous vs. continuous nature of implicit vs. explicit learning and the complexity of the interactions between the (degree of) explicitness of the participants' knowledge and the applied test method. These factors transcend a simple labeling of tests as implicit or explicit and strongly constrain the types of variation that the results of different tests would produce. Therefore, running the same experiments with different learning measures in future studies could provide additional interesting data with potentially different results.

      However, the most important aspect of our reply concerning the reviewer's comment is that although quantitative differences between the learning rate of explicit and implicit learners are reported in our study, they are not of central importance to our interpretations. What is central are the different qualitative patterns of performance shown by the explicit and the implicit learners, i.e., the opposite directions of learning differences for “novel” and “same” structure pairs, which are seen in comparisons within the explicit group vs. within the implicit group and in the reported interaction. Following the reviewer's concern, any advantage an explicit participant might have in responding to 2AFC trials using “novel” structure pairs should also be present in the replies of 2AFC trials using the “same” structure pairs and this effect, at best, could modulate the overall magnitude of the across groups (Expl/Impl.) effect but not the relative magnitudes within one group. Therefore, we see no parsimonious reason to believe that any additional interaction between the explicitness level of participants and the chosen test type would impede our results and their interpretation.

      Given that the explicit/implicit classification was based on an exit survey, it is unclear when participants who are labeled "explicit" gained that explicit knowledge. This might have occurred during or after either of the sessions, which could impact the interpretation of the effects.

      We agree that this is a shortcoming of the current design, and obtaining the information about participants’ learning immediately after Phase 1 would have been preferred. However, we made this choice deliberately as the disadvantage of assessing the level of learning at the end of the experiment is far less damaging than the alternative of exposing the participants to the exit survey question earlier and thereby letting them achieve explicitness or influence their mindset otherwise through contemplating the survey questions before Phase 2. Our Experiment 5 shows how realistic this danger of unwanted influence is: with a single sentence alluding to pairs in the instructions of Exp 5, we could completely change participants' quantitative performance and qualitative response pattern. Unfortunately, there is no implicit assessment of explicitness we could use in our experimental setup. We also note that given the cumulative nature of statistical learning, we expect that the effect of using an exit survey for this assessment only shifts absolute magnitudes (i.e. the fraction of people who would fall into the explicit vs. implicit groups) but not aspects of the results that would influence our conclusions.

      Reviewer #2 (Public Review):

      Summary:

      Sleep has not only been shown to support the strengthening of memory traces but also their transformation. A special form of such transformation is the abstraction of general rules from the presentation of individual exemplars. The current work used large online experiments with hundreds of participants to shed further light on this question. In the training phase, participants saw composite items (scenes) that were made up of pairs of spatially coupled (i.e., they were next to each other) abstract shapes. In the initial training, they saw scenes made up of six horizontally structured pairs, and in the second training phase, which took place after a retention phase (2 min awake, 12 h incl. sleep, 12 h only wake, 24 h incl. sleep), they saw pairs that were horizontally or vertically coupled. After the second training phase, a two-alternatives-forced-choice (2-AFC) paradigm, where participants had to identify true pairs versus randomly assembled foils, was used to measure the performance of all pairs. Finally, participants were asked five questions to identify whether they had insight into the pair structure, and post-hoc groups were assigned based on this. Mainly the authors find that participants in the 2-minute retention experiment without explicit knowledge of the task structure were at chance level performance for the same structure in the second training phase, but had above chance performance for the vertical structure. The opposite was true for both sleep conditions. In the 12 h wake condition these participants showed no ability to discriminate the pairs from the second training phase at all.

      Strengths:

      All in all, the study was performed to a high standard and the sample size in the implicit condition was large enough to draw robust conclusions. The authors make several important statistical comparisons and also report an interesting resampling approach. There is also a lot of supplemental data regarding robustness.

      Weaknesses:

      My main concern regards the small sample size in the explicit group and the lack of experimental control.

      The sample sizes of the explicit participants in our experiments are, indeed, much smaller than those of the implicit participants due to the process of how we obtain the members of the two groups. However, these sample sizes of the explicit groups are not small at all compared to typical experiments reported in Visual Statistical Learning studies; rather, they tend to be average to large. It is the sizes of the implicit subgroups that are unusually high due to the aforementioned data collection process. Moreover, the explicit subgroups have significantly larger effect sizes than the implicit subgroup, bolstering the achieved power. This is also confirmed by the reported Bayes Factors, which support the “effect” or the “no effect” conclusions in the various tests and range in value from substantial to very strong. Based on these statistical measures, we think the sample sizes of the explicit participants in our studies are adequate.

      As for the lack of experimental control, indeed, we could not fully randomize consolidation condition assignment. Instead, the assignment was a product of when the study was made available on the online platform Prolific. This method could, in theory, lead to an unobserved covariate, such as morningness, being unbalanced between conditions. We do not have any reasons to believe that such a condition would critically alter the effects reported in our study, but as it follows from the nature of unobserved variables, we obviously cannot state this with certainty. Therefore, we added an explicit discussion of these potential pitfalls in the revised version of the manuscript.

      Reviewer #3 (Public Review):

      In this project, Garber and Fiser examined how the structure of incidentally learned regularities influences subsequent learning of regularities, that either have the same structure or a different one. Over a series of six online experiments, it was found that the structure (spatial arrangement) of the first set of regularities affected the learning of the second set, indicating that it has indeed been abstracted away from the specific items that have been learned. The effect was found to depend on the explicitness of the original learning: Participants who noticed regularities in the stimuli were better at learning subsequent regularities of the same structure than of a different one. On the other hand, participants whose learning was only implicit had an opposite pattern: they were better in learning regularities of a novel structure than of the same one. This opposite effect was reversed and came to match the pattern of the explicit group when an overnight sleep separated the first and second learning phases, suggesting that the abstraction and transfer in the implicit case were aided by memory consolidation.

      These results are interesting and can bridge several open gaps between different areas of study in learning and memory. However, I feel that a few issues in the manuscript need addressing for the results to be completely convincing:

      (1) The reported studies have a wonderful and complex design. The complexity is warranted, as it aims to address several questions at once, and the data is robust enough to support such an endeavor. However, this work would benefit from more statistical rigor. First, the authors base their results on multiple t-tests conducted on different variables in the data. Analysis of a complex design should begin with a large model incorporating all variables of interest. Only then, significant findings would warrant further follow-up investigation into simple effects (e.g., first find an interaction effect between group and novelty, and only then dive into what drives that interaction). Furthermore, regardless of the statistical strategy used, a correction for multiple comparisons is needed here. Otherwise, it is hard to be convinced that none of these effects are spurious. Last, there is considerable variation in sample size between experiments. As the authors have conducted a power analysis, it would be good to report that information per each experiment, so readers know what power to expect in each.

      Answering the questions we were interested in required us to investigate two related but separate types of effects within our data: general above-chance performance in learning, and within- and across-group differences.

      Above-chance performance: As is typical in SL studies, we needed to assess whether learning happened at all and which types of items were learned. For this, a comparison to the chance level is crucial and, therefore, a one-sample t-test is the statistical test of choice. Note that all our t-tests were subject to experiment-wise correction for multiple comparisons using the Holm-Bonferroni procedure, as reported in the Supplementary Materials.
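
      For readers unfamiliar with the Holm-Bonferroni procedure mentioned above, the step-down logic can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `holm_bonferroni` and the example p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per test: True where the null hypothesis is rejected.

    The p-values are visited from smallest to largest. The k-th smallest
    (k = 0, 1, ...) is compared against alpha / (m - k); testing stops at
    the first p-value that exceeds its threshold.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        threshold = alpha / (m - rank)
        if p_values[idx] <= threshold:
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject


# Hypothetical p-values from four one-sample t-tests within one experiment.
example_p = [0.01, 0.04, 0.03, 0.005]
decisions = holm_bonferroni(example_p)
```

      With these made-up values, the two smallest p-values pass their thresholds (0.0125 and roughly 0.0167), while 0.03 exceeds 0.025, so the procedure stops there.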

      Within- and across-group differences: To obtain our results regarding group and pair-type differences and their interactions, we used mixed ANOVAs and appropriate post-hoc tests as the reviewer suggested. These results are reported in the Methods section.

      Concerning power analysis, in the revised version of the manuscript we added analysis of achieved power for the statistical tests most critical to our arguments.

      (2) Some methodological details in this manuscript I found murky, which makes it hard to interpret results. For example, the secondary results section of Exp1 (under Methods) states that phase 2 foils for one structure were made of items of the other structure. This is an important detail, as it may make testing in phase 2 easier, and tie learning of one structure to the other. As a result, the authors infer a "consistency effect", and only 8 test trials are said to be used in all subsequent analyses of all experiments. I found the details, interpretation, and decision in this paragraph to lack sufficient detail, justification, and visibility. I could not find either of these important design and analysis decisions reflected in the main text of the manuscript or in the design figure. I would also expect to see a report of results when using all the data as originally planned.

      We thank the reviewer for pointing out these critical open questions in our manuscript that need further clarification. The inferred “consistency effect” is based on patterns found in the data, which show an increase in negative correlation between test types during the test phase. As this is apparently an effect of the design of the test phase and not an effect of the training phase, which we were interested in, we decided to minimize this effect as far as possible by focusing on the early test trials. For the revised version of the manuscript, we revamped and expanded the discussion of how this issue was handled and also added a short comment in the main text, mentioning the use of only a subset of test trials and pointing the interested reader to the details.

      Similarly, the matched sample analysis is a great addition, but details are missing. Most importantly, it was not clear to me why the same matching method should be used for all experiments instead of choosing the best matching subgroup (regardless of how it was arrived at), and why the nearest-neighbor method with replacement was chosen, as it is not evident from the numbers in Supplementary Table 1 that it was indeed the best-performing method overall. Such omissions hinder interpreting the work.

      Since our approach provided four different balance metrics (see Supp. Tables 1-4) for each matching method, it is not completely straightforward to make a principled decision across the methods. In addition, selecting the best method for each experiment separately carries the suspicion of cherry-picking the most suitable results for our purposes. For the revised version, we expanded on our description of the matching and decision process and added supplementary descriptive plots showing what our data looks like under each matching method for each experiment. These plots highlight that the matching techniques produce qualitatively very similar results and that picking one of them over the others does not alter the conclusions of the test. The plots give the interested reader all the necessary information to assess the extent to which our design decisions influence our results.

      (3) To me, the most surprising result in this work relates to the performance of implicit participants when phase 2 followed phase 1 almost immediately (Experiment 1 and Supplementary Experiment 1). These participants had a deficit in learning the same structure but a benefit in learning the novel one. The first part is easier to reconcile, as primacy effects have been reported in statistical learning literature, and so new learning in this second phase could be expected to be worse. However, a simultaneous benefit in learning pairs of a new structure ("structural novelty effect") is harder to explain, and I could not find a satisfactory explanation in the manuscript.

      Although we might not have worded it clearly, we do not claim that our "structural novelty effect" comes from a “benefit” in learning pairs of the novel structure. Rather, we used the term “interference” and the lack of this interference. In other words, we believe that one possible explanation is that there is no actual benefit for learning pairs of the novel structure but simply unhindered learning for pairs of the novel structure and simultaneous interference for learning pairs of the same structure. Stronger interference for the same compared to the novel structure items seems a reasonable interpretation, as similarity-based interference is well established in the general (not SL-specific) literature under the label of proactive interference.

      After possible design and statistical confounds (my previous comments) are ruled out, a deeper treatment of this finding would be warranted, both empirically (e.g., do explicit participants collapsed across Experiments 1 and Supplementary Experiment 1 show the same effect?) and theoretically (e.g., why would this phenomenon be unique only to implicit learning, and why would it dissipate after a long awake break?).

      Across all experiments, the explicit participants showed the same pattern of results but no significant difference between pair types, probably due to the insufficient sample sizes available. We already included in the main text the collapsed explicit results across Experiments 1-4 and Supplementary Experiment 1 (p. 16). This analysis confirmed that, indeed, there was a significant generalization for explicit participants across the two learning phases. We could re-run the same analysis for only Experiment 1 and Supplementary Experiment 1, but due to the small sample of N=12 in Suppl. Exp. 1, this test would likely be completely underpowered. Obtaining a sufficient sample size for this one test would require an excessive number (several hundreds) of new participants.

      In terms of theoretical treatment, we already presented our interpretation of our results in the discussion section, which we expanded on in the revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) It would be very useful to add individual data points (and/or another depiction of the distribution) to the bar plots. If not in the main figures, as added figures in the supplement.

      We added violin plots for all results in the Supplementary.

      (2) It would be helpful to include in the supplement some examples of responses that led to the 'explicit' or 'implicit' classification. Specifically, what kind of response was considered to contain a partial recognition of the underlying structure vs. no recognition?

      We added example responses used for classification in the Supplementary.

      (3) It would be useful to show the results of Experiment 5 as well as the diagonal version as supplemental figures.

      We added the requested figures in the Supplementary.

      Typos: page 10: "in in the tests", page 15: "rerun"

      Fixed.

      Reviewer #2 (Recommendations For The Authors):

      (1) My strongest reservation relates to the small sample size in the explicit group. The authors do report stats for all experiments together in one analysis and I think this is the only robust finding for this group. I would suggest removing any comparisons between this smaller group and the larger implicit group since they do not make a lot of sense due to the imbalance in sample size in my opinion. If they do want to report the explicit group individually for each experiment, they should at least test for differences between the experiments also for this group using ANOVA.

      We do agree that the unbalanced nature of the sample sizes can be problematic for the between-group comparisons. The t-tests reported for between-group comparisons are in fact Welch’s t-tests, which are better suited for unequal sample sizes and variances. Previously, we failed to report that these t-tests were Welch’s t-tests, which we fixed in the revised version.
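
      To make the distinction from the pooled-variance (Student's) t-test concrete, the Welch statistic and its Welch-Satterthwaite degrees of freedom can be sketched as below. This is an illustrative sketch only: the function name `welch_t` and the two small samples are hypothetical and unrelated to the study's data.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test.

    Each group contributes its own variance estimate to the standard
    error; the degrees of freedom follow the Welch-Satterthwaite formula.
    """
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # unbiased sample variances
    se2 = va / na + vb / nb            # squared standard error of the difference
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical groups with unequal sizes and unequal variances.
small_group = [1, 2, 3, 4, 5]
large_group = [2, 4, 6, 8, 10, 12, 14]
t_stat, dof = welch_t(small_group, large_group)
```

      The resulting degrees of freedom are generally non-integer and never exceed those of the pooled test (n1 + n2 - 2), which is what makes the test more conservative when group sizes and variances differ.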

      In the Supplementary, we previously reported an ANOVA including all explicit participants from all experiments. This showed a significant main effect of Experiment and test type, but no significant interaction. We take this as evidence that although specific levels of learning vary by experimental condition, the overall pattern of learning (i.e. which pairs are learned better) is the same across all experiments.

      (2) Moreover, the explicit group does not only differ in the explicitness of their memory but also regarding learning performance per se (as evidenced by performance differences for the first training). This important confound needs to be acknowledged and discussed more thoroughly!

      We agree that this topic is important, which is why the subsection “The Type of Transfer Depends on Quality of Knowledge, Not Quantity of Knowledge” deals exclusively with this issue. See our reply to the next point.

      (3) The resampling approach is somewhat interesting to solve the issue raised in 2. However, I doubt that the authors actually achieve what they are claiming. Since we have a 2-AFC task the possibility must be considered that participants who chose correctly in the implicit group did so by chance. This means that the assumption that the matched pairs actually have the same amount of memory for the first training period as the explicit group is likely false. Therefore, this analysis is still comparing apples and oranges.

      We address this idea in detail in the supplementary materials. First, we point out that the matched results showed the same pattern as the full results, suggesting that Phase 1 and Phase 2 results are independent for this group. Second, we argue that a randomly selected subset of participants should not show a significant deviation from null performance in the Same vs. Novel comparison in Phase 2.
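
      The concern raised in point (3), that a high 2-AFC score can arise from guessing alone, can be quantified with a simple binomial tail probability. The sketch below is only illustrative: the function name `prob_at_least` and the trial counts are hypothetical and are not taken from the study.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Probability of k or more correct responses in n independent trials.

    With p = 0.5 this describes a participant who guesses on every trial
    of a two-alternative forced-choice test.
    """
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Hypothetical example: chance of a pure guesser scoring 18 or more
# out of 24 trials.
p_lucky = prob_at_least(18, 24)
```

      Multiplying such a tail probability by the number of participants in a large sample gives a rough expectation of how many pure guessers would reach a given score, which is the quantity at stake when matching implicit to explicit participants on Phase 1 performance.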

      (4) One important issue, when conducting online experiments is assuring random allocation of participants. How did the authors recruit participants to ensure they did not select participants for the different experiments that differed regarding their preference for wake vs. sleep retention intervals? If no care was taken in this regard, I would suggest reporting this and maybe briefly discussing it.

      This shortcoming is now reported and addressed in the discussion section of the revised manuscript.

      (5) I could not find any information about the exact questions that were asked about the task rules. Also, there was no information on how the answers were used to assign groups. Both should be added.

      The exact questions were added to the revised Supplementary.

      (6) I think that the literature on sleep and rule extraction is well-represented in the manuscript. However, I think also referring more thoroughly to the literature on how sleep leads to gist extraction, schemas, and insight would help understand the relevance of the present research.

      We subsumed references to the mentioned areas of research under the labels of abstraction and generalization. In the revised section, we listed the appropriate labels along with the already used references to make the connection to a vast literature treating generalization in related but distinct ways more explicit.

      (7) It is unclear to me why the items learned in the first learning phase interfere with those learned in the second learning phase (without sleep) and not vice versa. What is the author's explanation for this?

      We added a paragraph on this to our revised discussion section. In short, there may also be retroactive interference. However, we would need yet another variation of the paradigm to properly measure it, and this was outside the scope of the current work.

      (8) As far as I can tell the study lacks all of the usual control tasks that are used in the field of sleep and memory (especially subjective sleepiness and objective vigilance). In addition, this research has the circadian confound, and therefore additional controls would have been warranted, e.g., morningness-eveningness, retrieval capabilities. Also, performance immediately after training phase 1 was not tested, which would serve as an important control for circadian differences in initial learning of the rule.

      The study uses a number of the control measures established in the sleep and memory literature, such as habitual sleep quality and sleep quality during the night of and the night before the experiment. However, there are, of course, more potentially interesting measures, such as the ones named by the reviewer.

      Testing performance right after training phase 1 would have been very interesting indeed. However, due to the nature of statistical learning tasks, this would have completely confounded the implicitness of learning by presenting participants with segmented input; i.e. isolated pairs. Therefore, we opted for the lesser of two evils in our design decision.

      (9) As far as I can tell, there is no effect of sleep on correctly identifying pairs from training phase 1. This would be expected and thus should be discussed.

      As noted and referenced in the discussion section, the effect of sleep on statistical learning per se is a subject of controversy in the literature, where some studies apparently find effects, while others find no effect on statistical learning whatsoever.

      (10) The manuscript should explicitly mention if the study was preregistered.

      It was not.

      Reviewer #3 (Recommendations For The Authors):

      The topic of this project is close to my heart, and I commend the authors for conducting numerous variations of the experiment with large sample sizes. I have some suggestions I feel will make the paper stronger, and a few minor comments that caught my eye during reading:

      (1) First and foremost, I found the paper's structure cumbersome. For instance, different aspects of Experiment 1 results are reported in (1) the main text, (2) under methods, and (3) in Supplementary. This makes reading unnecessarily difficult. This relates not only to the analysis results - the sample size is reported as 226 in the main text, 226+3 in Methods, and 226+3+19 in Supplementary. I strongly suggest removing all results from the Methods section and merging the supplementary results with the main results.

      We overhauled the structure of the paper, moving much more information into the proper method section and out of the Supplementary.

      (2) "Attention checks" and "response bias" appear first in Supplementary Experiment 1 but are explained only later under Experiment 1. The same thing for the experimental procedure. I therefore suggest placing Experiment 1 before Supplementary Experiment 1, but related to my previous comment - have one paragraph dedicated to Subject Exclusion of all experiments.

      The new structure of the Method sections solves this.

      (3) Figure 4 is mentioned but does not appear in the manuscript.

      This has been fixed. The paragraph in question now references the correct supplementary figure.

      (4) OSF project includes only data with no README file on how to understand the data. The work would also benefit from sharing the experimental and analysis codes.

      A README file was added.

      (5) This sentence is repeated in relation to four experiments: "Bayes Factors from Bayesian t-tests for implicit participants reported for experiments 1, 2, and 3 used an r-scale parameter of 0.5 instead of the default √2/2, reflecting that Experiment 1 found small effect sizes for this group". First, it is missing an explanation of what the r-scale means. Second, it sounds as if this was a product of the procedure, but in fact it was a decision by the researcher if I am correct. If so, it is missing a description of how and why this choice was made.

      This was indeed a decision by the researchers, in line with a Bayesian logic of evidence accumulation. We made the explanation in the paper clearer.

      (6) Did I understand correctly that each pair was tested 4 times? Was it against the same foil? Did you make sure not to repeat the same pair in back-to-back trials? These details, in addition to what I noted in the public review, are needed.

      Each pair was tested 4 times. Each time against a different foil pair. Details have been added to the Method section.

      (7) Also in relation to my public review, I could not understand why the sample size was overshot by so much in Experiment 1 (229 instead of 198.15)?

      The calculated sample size of 198.15 was for the implicit subgroup alone, while 229 included explicit and implicit participants.

      (8) The correlation between phase 1 and phase 2 is only tested in explicit participants. Why is that? A test in implicit participants is needed for completeness.

      Correlations for implicit participants have been added.

      (9) There is a known asymmetry between the horizontal and vertical planes in our visual system (with a preference for horizontal stimuli). I was missing a comparison between learning in the two structures, and a report of how many participants received either in Phase 1.

      The allocation of participants to horizontal and vertical conditions was balanced. In the Method section we already report an ANOVA testing for a potential effect of orientation condition, which was not significant.

      Minor/aesthetic comments:

      (1) "In Phase 2, explicit participants performed above chance for learning pairs that shared their higher level orientation structure with that of pairs in Phase 1". This sounds as if there was a separate test following the two learning phases. Perhaps reword to "for phase 2 pairs".

      Fixed

      (2) "the two asleep-consolidation groups (Exp. 3 and 4)" - I think you mean Exp. 2 and 4.

      Fixed.

      (3) "acquiring explicitness in Experiment 5 as compared to 1" I think you mean Supplementary Experiment 1 as compared to 1.

      Fixed

      (4) "without such a redescription, the previously learned patterns in Phase 1 interfere with new ones in Phase 2, when redescription occurs..." The comma should be a dot.

      Fixed

      (5) In Experiment 4, did 168 or 169 participants survive exclusion? Both accounts exist, and so do reports of degrees of freedom that allow both 23 and 24 explicit participants.

      Fixed.

      (6) "Implicit learners also performed above chance.." in Experiment 2 is missing (n=XX).

      Fixed.

    1. eLife Assessment

      The study presents compelling evidence that the melanocortin system originating in the arcuate nucleus of the hypothalamus plays a crucial role in puberty onset, representing a significant advance in our understanding of reproductive biology. The research employs innovative approaches and benefits from the combined expertise of two respected laboratories, enhancing the robustness of the findings. Given the potential impact on human health and the strength of the evidence presented, this fundamental work will likely influence the field substantially and may inform future clinical applications.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigate the role of the melanocortin system in puberty onset. They conclude that POMC neurons within the arcuate nucleus of the hypothalamus provide important but differing input to kisspeptin neurons in the arcuate or rostral hypothalamus.

      Strengths:

      Innovative and novel
      Technically sound
      Well-designed
      Thorough

      Weaknesses:

      There were no major weaknesses identified.

    3. Reviewer #2 (Public review):

      Summary:

      This interesting manuscript describes a study investigating the role of MC4R signalling on kisspeptin neurons. The initial question is a good one. Infertility associated with MC4 mutations in humans has typically been ascribed to the consequent obesity and impaired metabolic regulation. Whether there is a direct role for MC4 in regulating the HPG axis has not been thoroughly examined. Here, the researchers have assembled an elegant combination of targeted loss of function and gain of function in vivo experiments, specifically targeting MC4 expression in kisspeptin neurons. This excellent experimental design should provide compelling evidence for whether melanocortin signalling directly affects arcuate kisspeptin neurons to support normal reproductive function. There were definite effects on reproductive function (irregular estrous cycle, reduced magnitude of LH surge induced by exogenous estradiol). However, the magnitude of these responses and the overall effect on fertility were relatively minor. The mice lacking MC4R in kisspeptin neurons remained fertile despite these irregularities. The second part of the manuscript describes a series of electrophysiological studies evaluating the pharmacological effects of melanocortin signalling in kisspeptin cells in ex vivo brain slices. These studies characterised interesting differential actions of melanocortins in two different populations of kisspeptin neurons. Collectively, the study provides some novel insights into how direct actions of melanocortin signalling via the MC4 receptor in kisspeptin neurons contribute to the metabolic regulation of the reproductive system. Importantly, however, it is clear that other mechanisms are also at play.

      Strengths:

      The loss of function/gain of function experiments provides a conceptually simple but hugely informative experimental design. This is the key strength of the current paper - especially the knock-in study that showed improved reproductive function even in the presence of ongoing obesity. This is a very convincing result that documents that reproductive deficits in MC4R knockout animals (and humans with deleterious MC4R gene variants) can be ascribed to impaired signalling in the hypothalamic kisspeptin neurons and not necessarily caused as a consequence of obesity. As concluded by the authors: "reproductive impairments observed in MC4R deficient mice, which replicate many of the conditions described in humans, are largely mediated by the direct action of melanocortins via MC4R on Kiss1 neurons and not to their obese phenotype." This is important, as it might change how such fertility problems are treated.

      I would like to see the validation experiments for the genetic manipulation studies given greater prominence in the manuscript because they are critical to interpretation. Presently, only single unquantified images are shown, and a much more comprehensive analysis should be provided.

      Weaknesses:

      (1) Given that mice lacking MC4R in kisspeptin neurons remained fertile despite some reproductive irregularities, this can be described as a contributing pathway, but other mechanisms must also be involved in conveying metabolic information to the reproductive system. This is now appropriately covered in the discussion.

      (2) The mechanistic studies evaluating melanocortin signalling in kisspeptin neurons were all completed in ovariectomised animals (with and without exogenous hormones) that do not experience cyclical hormone changes. Such cyclical changes are fundamental to how these neurons function in vivo and may dynamically alter how they respond to hormones and neuropeptides. Eliminating this variable makes interpretation difficult, but the authors have justified this as a reductionist approach to evaluate estradiol actions specifically. However, this does not reflect the actual complexity of reproductive function.

      For example, the authors focus on a reduced LH response to exogenous estradiol in ovariectomised mice as evidence that there might be a sub-optimal preovulatory LH surge. However, the preovulatory LH surge (in intact animals) was not measured.

      They have not assessed why some follicles ovulated, but most did not. They have focused on the possibility that the ovulation signal (LH surge) was insufficient rather than asking why some follicles responded and others did not. This suggests some issue with follicular development, likely due to changes in gonadotropin secretion during the cycle and not simply due to an insufficient LH surge.

    4. Reviewer #3 (Public review):

      The manuscript by Talbi R et al. generated transgenic mice to assess the reproductive function of MC4R in Kiss1 neurons in vivo and used electrophysiology to test how MC4R activation regulates Kiss1 neuronal firing in ARH and AVPV/PeN. This timely study is highly significant in neuroendocrinology research for the following reasons.

      (1) The authors' findings are significant in the field of reproductive research. Despite the known presence of MC4R signaling in Kiss1 neurons, the exact mechanisms of how MC4R signaling regulates different Kiss1 neuronal populations in the context of sex hormone fluctuations are not entirely understood. The authors reported that knocking out Mc4r from Kiss1 neurons replicates the reproductive impairment of MC4RKO mice, and Mc4r expression in Kiss1 neurons in the MC4R null background partially restored the reproductive impairment. MC4R activation excites Kiss1 ARH neurons and inhibits Kiss1 AVPV/PeN neurons (except for elevated estradiol).

      (2) Reproductive dysfunction is one of the comorbidities of obesity. MC4R loss-of-function mutations cause an obese phenotype and impaired reproduction. However, it is hard to determine the causality. The authors carefully measured the body weight of the different mouse models (Figure 1C, Figure 2A, Figure 3B). For example, the Kiss1-MC4RKO females showed no body weight difference at puberty onset. This clearly demonstrated that MC4R signaling has a direct function in reproduction that is not a consequence of excessive adiposity.

      (3) Gene expression findings in the "KNDy" system align with the reproduction phenotype.

      (4) The electrophysiology results reported in this manuscript are innovative and provide more details of MC4R activation and Kiss1 neuronal activation.

      Overall, the authors have presented sufficient background in a clear, logical, and organized structure, clearly stated the key question to be addressed, used the appropriate methodology, produced significant and innovative main findings, and made a justified conclusion.

      Comments on revisions:

      The authors have addressed my comments.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      We are grateful to the reviewers and the editorial team for their feedback and thorough revisions of our paper. We also appreciate their acknowledgement that this study represents a significant advancement in the field of reproductive neuroendocrinology and offers insights on the contribution of obesity vs melanocortin signaling in women’s fertility. In the revised version, we will provide a more detailed clarification of the data and methodology and adhere to the reviewers’ suggestions.

      Please find below our answers to specific concerns in the public review:

      Given the fact that mice lacking MC4R in Kiss1 neurons remained fertile despite some reproductive irregularities, the overall tone and some of the conclusions of the manuscript (e.g., from the abstract: "... Mc4r expressed in Kiss1 neurons is required for fertility in females") were overstated. Perhaps this can be described as a contributing pathway, but other mechanisms must also be involved in conveying metabolic information to the reproductive system.

      We will tone down these statements throughout the manuscript to indicate that MC4R in Kiss1 neurons plays a role in the metabolic control of fertility (rather than “…is required for fertility”).

      The mechanistic studies evaluating melanocortin signalling in Kiss1 neurons were all completed in ovariectomised animals (with and without exogenous hormones) that do not experience cyclical hormone changes. Such cyclical changes are fundamental to how these neurons function in vivo and may dynamically alter the way they respond to neuropeptides. Therefore, eliminating this variable makes interpretation difficult.

      Mice lack true follicular and luteal phases, and it is therefore impossible to separate estrogen-mediated changes from progesterone-mediated changes (e.g., in a proestrous female). We therefore use an ovariectomized female model in which we can generate an LH surge with an E2-replacement regimen [1]. This model enables us to focus on estrogen effects, exclude progesterone effects, and minimize variability. Inclusion of cycling females would make interpretation much more difficult.

      (1) Bosch et al., 2013 Mol & Cell Endo; https://doi.org/10.1016/j.mce.2012.12.021

      Use of the POMC-Cre to target optogenetic inputs to Kiss1 neurons might have targeted a wider population of cells than intended.

      POMC is transiently expressed during embryonic development in a portion of cells fated to be Kiss1 or NPY/AgRP neurons [1-2]. Therefore, this is a valid concern when crossing with a floxed mouse. However, use of AAVs in adult animals avoids this issue and leads to specific expression in POMC neurons [3]. This POMC-Cre mouse has been used extensively with AAVs to drive specific expression in POMC neurons by other laboratories [4-7]. Therefore, we are confident that our optogenetic studies have narrowly targeted POMC inputs.

      (1) Padilla et al., 2010 Nat Med; https://doi.org/10.1038/nm.2126

      (2) Lam et al., 2017 Mol Metab; https://doi.org/10.1016/j.molmet.2017.02.007

      (3) Stincic et al., 2018 eNeuro; https://doi.org/10.1523/eneuro.0103-18.2018

      (4) Fenselau et al., 2017 Nat Neuro; https://doi.org/10.1038/nn.4442

      (5) Rau & Hentges, 2019 J Neuro; https://doi.org/10.1523/jneurosci.3193-18.2019

      (6) Fortin et al., 2021 Nutrients; https://doi.org/10.3390/nu13051642

      (7) Villa et al., 2024 J Neuro; https://doi.org/10.1523/jneurosci.0222-24.2024

      Recommendations for Authors

      We thank the reviewers and the editorial team for their comments and thorough revisions of our paper. We have now addressed their comments and edited the manuscript accordingly:

      Reviewer #1 (Recommendations For The Authors):

      L80 -This is an awkward sentence; it isn't an inverse agonist of the AgRP; this may read better just to say that the inverse agonist, AgRP.

      Thank you for this comment. This has now been changed in the text (L80).

      L86 - This text reads as if mice have an inherent obesity issue.

      This has also now been addressed in the text (L86).

      L131 - The numbers of digits past the decimal point should match for both mean and SEM.

      This has also now been addressed throughout the text.

      Figure 1D: Revise the bar graphs with distinct SEM bars, as these data are not generated within the same mice.

      The graphs are now changed, and they include distinct SEM and individual data points.

      Figure 2I-L - An n of 3 for controls is pretty minimal, though the clustering of data points is tight.

      We thank the reviewer for this comment. While we agree that an n=3 for controls is minimal, the mRNA levels in this group are close to one another, so the data points cluster tightly. We are happy to provide the raw data values for these groups if the reviewer wishes.

      L159 - The role of reduced dynorphin mRNA is pretty speculative with regard to basal levels of LH, especially since no other indices of LH secretion were affected. It should also be recognized that mRNA levels do not always equate to activity.

      We agree with the reviewer that our explanation of the role of reduced dynorphin in the elevated basal LH is speculative. However, we only report that the higher LH levels correlate with lower Pdyn gene expression, which is in line with the well-documented role of dynorphin in inhibiting LH secretion. We also recognize that mRNA levels do not necessarily reflect activity. We have now added this statement to the text (L159).

      L164 - Given the ovary data, it seems that the increase seen in KO mice isn't quite sufficient, but is it known how much of a surge is necessary for ovulation in mice?

      We agree with the reviewer’s comment that the LH surge in the Kiss1MC4RKO group is not enough to consistently induce ovulation, which is supported by the decrease in the number of corpora lutea (Figure 2, O).

      According to the literature, an LH surge in female mice is identified by an LH value >4 ng/ml (Bahougne et al., 2020). By this criterion, our data show that only two of six females had an LH surge in the KO group, while four of five females had an LH surge in the control group.
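
      The threshold rule described above can be sketched as a short check. This is only an illustration: the 4 ng/ml cutoff comes from the response, but the function name `count_surges` and the LH values are hypothetical placeholders chosen to reproduce the reported proportions (two of six, four of five), not the study's measurements.

```python
from typing import Sequence

# Cutoff quoted in the author response (Bahougne et al., 2020).
SURGE_THRESHOLD_NG_ML = 4.0


def count_surges(peak_lh_ng_ml: Sequence[float],
                 threshold: float = SURGE_THRESHOLD_NG_ML) -> int:
    """Count animals whose peak LH strictly exceeds the surge threshold."""
    return sum(1 for value in peak_lh_ng_ml if value > threshold)


# Hypothetical peak LH values (ng/ml); NOT data from the study.
example_ko = [1.2, 5.1, 0.8, 2.3, 6.4, 3.9]
example_control = [7.5, 4.6, 9.1, 2.0, 5.3]

print(count_surges(example_ko), "of", len(example_ko), "KO females surged")
print(count_surges(example_control), "of", len(example_control), "controls surged")
```

      A strict inequality is used because the response states the criterion as ">4 ng/ml"; a value of exactly 4 ng/ml would not count as a surge under this reading.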

      L211 - According to the figure, LH pulses were not recovered and remained similar to KO levels. Looking at the LH secretory patterns presented, it seems like the pulse frequency data should be interpreted with some caution, given that some of the pulses identified are tenuous at best.

      We agree that the LH pulses identified by our software (criteria described in the methods) are variable in shape (LH pulses are difficult to detect clearly in gonad-intact females) and did not differ in number between groups; however, the reinsertion of Mc4r within Kiss1 neurons restored LH basal levels, amplitude and total secretory mass, which are clear indications of a significant improvement in the ability of these mice to release LH.

      L218 - Is there a reason why the surge was not looked at in these groups?

      Ovarian histology is the best indicator of ovulation. In these mice, corpora lutea were absent, indicating impaired ovulation; thus, we did not consider it necessary to perform an LH surge protocol.

      L244 - This would also fit with previous findings in sheep that not all Kiss neurons express MC receptors

      We agree with this comment.

      L329 - Given the rapidity of its actions, how would this membrane ER function during a normal surge?

      Rapid estrogen signaling can act to ease transitions between states. Membrane delimited E2 actions can quickly attenuate or enhance coupling between receptors and signaling cascades. These effects will precede E2-driven changes in gene expression that produce more stable alterations in signaling. This combination of mechanisms will reduce any lag between rises in serum E2 and physiological effects. Considering the abbreviated mouse reproductive cycle, parallel mechanisms acting on different timescales are particularly important.

      L365 - I'm a little confused as to how this particular work sheds light on a role for MC3R. Is the relative distribution of the two isoforms within Kiss neurons known?

      In the present study, we report that hypothalamic Mc3r expression decreases leading up to the age of puberty onset (p30), in line with the profile of expression of Mc4r and a recent publication involving Mc3r in puberty onset (Lam et al., 2021), suggesting that both receptors may be involved in the control of reproductive function, potentially through the direct regulation of Kiss1 neurons as characterized in our present study.

      L422 - While I understand the nature of this statement, the receptor may simply reflect the activity of what binds to it, i.e., AgRP vs. alpha-MSH, suggesting that maybe the prepubertal period is more AgRP-dominated.

      We agree with this statement, and this needs to be further investigated.

      L495 - Reinsertion of Mc4R in Kiss1 neurons

      Thank you for this comment. This is now corrected in the text (L501).

      L524 - Bilateral ovariectomy of 6-month

      Thank you for this comment. This is now corrected in the text (L530).

      L538 - Is it known what stage of the cycle these mice were in when samples were collected?

      Yes, the samples were collected in diestrus. This is now mentioned in the text (L548).

      L556 - Pulse amplitude is usually measured relative to the preceding nadir.

      The method that we have consistently used in our lab is the average of the 4 highest LH values in the sample collection period for each animal. We have found this to be consistent and representative of the overall amplitude (McCarthy et al., 2021; Talbi et al., 2021).
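
      As a minimal sketch of the amplitude measure described above (the mean of the 4 highest LH values per animal), assuming each animal's samples are available as a plain list of concentrations; the function name `lh_amplitude` and the example values are hypothetical and do not come from the manuscript:

```python
from typing import Sequence


def lh_amplitude(lh_values: Sequence[float], n_highest: int = 4) -> float:
    """Return the mean of the n_highest largest LH values for one animal."""
    if len(lh_values) < n_highest:
        raise ValueError("need at least n_highest LH samples")
    top_values = sorted(lh_values, reverse=True)[:n_highest]
    return sum(top_values) / n_highest


# Hypothetical LH samples (ng/ml) from one collection period; NOT study data.
example_samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(lh_amplitude(example_samples))  # mean of 6, 5, 4 and 3
```

      Unlike the nadir-referenced amplitude the reviewer mentions, this measure does not depend on pulse detection, so it is unaffected by how individual pulses are called.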

      L594 - This is a little confusing - the whole MBH would contain the ARH, but only the ARH was collected from the KO mice. If the whole MBH, dynorphin and Tac3, and Tac3 are expressed outside of the ARC, making interpretation of changes specifically within the ARH is difficult.

      Here (L592), we describe two different experiments, as mentioned by i) and ii).

      For experiment 1 (i): MBH was used in the WT mice at ages P10, P15, P22 and P30 to investigate the expression of the melanocortin genes (Agrp, Pomc, Mc3r and Mc4r).

      For experiment 2 (ii): In both KO and control groups, only the micro-dissected ARH was used to investigate the expression of Pdyn, Kiss1, Tac2 and Tacr3.

      Reviewer #2 (Recommendations For The Authors):

      The validation experiments for the various manipulations are currently presented in the supplementary data. Still, in my opinion, these are critically important for interpreting the data, and it should be considered to present these more comprehensively in the main body of the manuscript. In Figure S1, it seems that the exposure of the two images is not the same, with a higher background in the control. Has this image been adjusted to highlight the staining, while the other has not? It looks like there remains a low level of expression still present in at least some of the KO cells - this may reflect difficulties using RNAscope (with its extreme amplification) to detect the absence of a signal, or it could also be that the knockout is incomplete. A percentage of cells still express MC4R. I think this should be acknowledged or discussed.

      We thank the reviewer for the feedback. While we agree that the validation of the mouse model is critical, we would like to keep it in the supplemental data.

      We also agree that the exposure looks different between the KO and WT controls, and we thank the reviewer for this comment. The quality of the photograph decreased when transferring to the manuscript. This has now been improved in the revised figure.

      As for the MC4R expression in some of the KO cells, we believe that MC4R is expressed in non-Kiss1 cells, as shown in the merged figure. Therefore, we believe that the knockout of Mc4r in Kiss1 neurons is complete in these mice.

      The clear difference from the PVN's lack of effect is convincing and indicates that a specific knockout has been achieved. Is equivalent data also available for the AVPV population of cells that are examined later in the manuscript? Do those Kiss1 neurons also express the MC4R? The same question applies to the knock-in experiment: Was the expression of MC4R also driven in the AVPV population using this approach?

      Yes, Kiss1 neurons in the AVPV also express MC4R as indicated in this study, and thus Mc4r is removed/reinserted in the AVPV as well in this mouse model.

      The quantitative RT-qPCR data on developmental changes in metabolic signaling molecules are really peripheral to the paper's main question. Relative to the validation experiments (as discussed above), I think these are less important data and could be placed into a supplementary figure. The discussion of these data becomes problematic, e.g., on line 359, the changes are described as "a low melanocortin tone..." but this seems problematic when referring to reduced expression of AgRP, an inverse agonist at the MC4R. If you are going to present these data, individual data points should be shown. Similarly, the question about whether this is a PCOS-like phenotype is perhaps worth asking. Still, the simple assessment of T and AMH could also be reported in a sentence without necessarily showing the data (or placing it in a supplementary figure). Better to focus on the key question - which is the role of MC4R signaling in Kiss1 neurons.

      We understand this reviewer’s concerns. However, due to the impact of MC4R signaling (particularly in the context of AgRP) on puberty, we strongly believe that the reader will benefit from the expression profile across ages, so we respectfully disagree and will keep these data in the main figure.

      Per this reviewer’s comment, we have now added individual data points to Figure 1D.

      We also agree with the reviewer that the T and AMH data are not in the main scope of the paper, but since we uncovered a PCOS-like phenotype in female mice with specific deletion of Mc4r from Kiss1 neurons, it is important to keep these data in the main figure to show that the phenotype does not fully resemble a PCOS model.

      Having praised the experimental design, I think it is fair to acknowledge that the reproductive data from these experiments remain difficult to interpret. I understand that it is difficult to illustrate estrous cycles, but the "quantitative" data on percentages of time spent in any one stage are not as informative as seeing the actual individual patterns in Figure 2B. Were all of the animals consistently like the one illustrated, with persistent diestrus and only occasional evidence of ovulation?

      We agree that Figure 2C may be difficult to interpret, but it is the best way to capture all the data points for each group.

      All 5 Kiss1MC4RKO females had persistent diestrus with only one or two days of estrus over 15 days (except for one female that had 4 days of estrus), compared to control females, which had 7 to 9 days of estrus, as shown in the graph (except for one female that had 5 days of estrus over the 15-day period).

      Given that LH pulses appear to be normal, does this, in fact, suggest an ovarian problem? Is that possible? Are MC4R and Kiss1 co-expressed in the ovary? Or do you think this suggests an ovulation problem, perhaps driven by the impaired LH surge?

      This reviewer is correct in that our findings suggest a central defect in ovulation based on the deficit observed in the preovulatory LH surge. It is possible to have normal LH pulses, which are driven by one population of Kiss1 neurons (ARH), together with an impaired LH surge, which is driven by a distinct population of Kiss1 neurons (AVPV).

      Similarly, the response to the "LH surge induction protocol" is impaired (why not look at endogenous LH surges?). It seems that ovulation should be an all-or-none phenomenon in that if the LH surge is sufficient to induce ovulation, then all available follicles would be ovulated. If it is not, then no follicles will be ovulated. Why fewer follicles are ovulated in the gene-targeted animals seems more likely to be due to impaired follicular development rather than a subthreshold LH surge. So, this again points back to the ovary. Or perhaps we need a more thorough assessment of the pattern of LH pulses throughout the cycles in these animals.

      An LH surge induction protocol allows us to submit all female mice to the same conditions and expect a similar response, which is optimal for comparison with animals with an expected ovulation deficit, as it eliminates external factors. We disagree that ovulation is an all-or-none phenomenon, because in mice numerous follicles mature at the same time, and thus a decrease in the number of ovulated oocytes may be significant between groups even if the animals are not completely infertile.

      Collectively, my assessment of these data is that there are effects on reproduction, but they are actually relatively subtle. There were abnormal cycles and impaired LH surge in response to exogenous estrogen. But the animals are not actually infertile, so can ovulate and express normal reproductive behavior. So while there is a role for MC4R signalling in Kiss1 neurons, it may be a contributing modulatory role rather than a major regulatory mechanism. I think the tone of the descriptions should reflect this. I like the way it is framed in some parts of the discussion ("reproductive impairments...mediated by MC4R in Kiss1 neurons and not by their obese phenotype"), but the overall significance of this is overstated in some places, such as the abstract and in other parts of the discussion ("this population is tightly controlled by melanocortins").

      As mentioned in previous responses, ovulation in mice is not all-or-nothing. So, while the mice can reproduce, the disruption in the central mechanisms that control ovulation and the irregular estrous cycles are a significant advancement in the field, with strong translational potential to species where only one oocyte is usually ovulated, as in humans, where reproductive disorders in MC4R patients had been attributed to the obesity phenotype rather than to a central action of MC4R (as the reviewer captured in their comment). This is one of the main findings of this study.

      The overstatement has now been addressed throughout the text.

      For in vitro studies, all mice were ovariectomized and given estradiol "replacement." What was the rationale for this? Wouldn't this suppress the basal activity of these neurons? Then it appears that some of the animals were studied as ovariectomised (for an unspecified time but apparently ">7 days") without hormone replacement. The basal activity of these cells would be dramatically different. I think these artificial manipulations make these data quite difficult to interpret. How does this reflect the situation in a normal (or abnormal) estrous cycle? My understanding is that the brain slice approach already compromises the ability of this population of cells to function as a coordinated network (i.e., coordinated episodes of activity that are seen in vivo have not been observed in vitro in brain slices). Ovariectomizing and providing exogenous hormones also removes the additional regulatory elements of the cyclical changes in hormone inputs, so the cells may or may not behave like they would in vivo. Perhaps the authors could justify their choice of experimental model.

      We have clarified that the mice were ovariectomized for 7-10 days. A group of 3 mice is OVXed at once and then used on subsequent days a week later. This delay is both for the recovery of the animal and to allow for “washout” of endogenous ovarian hormones. For optogenetic studies, we were not measuring basal activity. Rather, we prioritized the ability to detect a postsynaptic response. While E2 decreases the networked activity of Kiss1<sup>ARH</sup> neurons, the Hcn channels, calcium channels, and Vglut2 expression are all increased. This leads to increased excitability and more glutamate release. Mice lack true follicular and luteal phases, and it is therefore impossible to separate estrogen-mediated changes from progesterone-mediated changes (e.g., in a proestrous female). We therefore use an ovariectomized female model in which we can generate an LH surge with an E2-replacement regimen (Bosch et al., J Mol Cell Endocrinology 2013). This model enables us to focus on estrogen effects, exclude progesterone effects, and minimize variability. Finally, we have documented that Kiss1<sup>ARH</sup> neurons retain the synchronization of their neuronal firing in the hypothalamic slice preparation (Qiu et al., eLife 2016).

      Figure 4E shows neurons' staining after expressing a Cre-dependent channel rhodopsin vector into POMC-Cre mice. The number of labelled cells looks markedly larger than expected for adult POMC neurons. Was the specificity of this approach to neurons expressing POMC checked? I understand that the POMC-Cre mice have been criticised for ectopic expression of Cre during development in other populations of neurons in the arcuate nucleus that do not express POMC, such as the AgRP neurons (e.g., PMID: 22166984). Is it possible that this is not a problem in adult animals? Has that been validated in these animals? The description of the method suggests that it is acknowledged that some of the expression driven in these animals might be in AgRP neurons. Still, optogenetic activation of these cells will include all cells expressing Cre at the time of AAV administration.

      POMC is transiently expressed during embryonic development in a portion of cells fated to be Kiss1 or NPY/AgRP neurons. Therefore, this is a valid concern when crossing with a floxed mouse. However, use of AAVs in adult animals avoids this issue and leads to specific expression in POMC neurons. This POMC-Cre mouse has been used extensively with AAVs to drive specific expression in POMC neurons by other laboratories (Padilla et al., Nat Med 2010; Lam et al., Mol Metab 2017; Stincic et al., eNeuro 2018; Fenselau et al., Nat Neuro 2017). We have previously shown that AAV-driven mCherry expression is limited to cells labeled with a beta-endorphin antibody (Stincic et al., eNeuro 2018). Therefore, we are confident that our optogenetic studies have narrowly targeted POMC inputs.

      Some additional explanation of the electrophysiology result may be required. For example, on Line 292, I'm confused by Fig 4M. Why is the response to 20Hz stimulation different in this cell (compared to the one in 4L) before administering naloxone? What proportion of cells showed this opposite response? On line 307: Is 5 cells sufficient for testing the POMC inputs onto AVPV and PeN Kiss1 neurons? How many slices/animals are included in collecting these 5 cells? The rapid action of STX illustrates the ability to modulate the response to MTII, but I am struggling to understand the implications of this in a physiological context. Suppose this response is desensitized by longer-term treatment with E2 (as indicated in the manuscript). Is it relevant to normal regulation during the cycle (particularly in the AVPV, where the key regulatory step seems to be the prolonged exposure to high estradiol as part of the preovulatory signals leading up to the LH surge)?

      As stated in the text, E2 has been shown to increase POMC expression and β-endorphin immunostaining. We do not know the effects of E2 on αMSH expression and release. E2 also tends to attenuate the coupling between inhibitory postsynaptic metabotropic (Gi,o-coupled) receptors and signaling cascades. So, there is likely a combination of pre- and post-synaptic mechanisms contributing to these responses. However, the focus of the current studies was on the predominant melanocortin signaling and, as such, we chose to eliminate the influence of opioid signaling. We have added two more cells to this group, both of which were successfully rescued for a total of 5 of 6 cells (6 slices, 5 animals). Between the labeling of β-endorphin fibers and the high rate of rescue, we do believe that this is sufficient evidence to support a direct POMC input to Kiss1<sup>AVPV/PeN</sup> neurons.

      Line 52: "Here, we show that Mc4r expressed in Kiss1 neurons is required for fertility in females." The knockout animals remain fertile, so this conclusion needs to be re-worded.

      Thank you for this comment. This has now been changed (L52).

      Line 80: "The melanocortin 4 receptor (MC4R) binds α-melanocyte stimulating hormone (αMSH), an agonist product of the pro-opiomelanocortin (Pomc) gene, and the inverse agonist of the agouti-related peptide (AgRP) to regulate food intake and energy expenditure" Is this the correct wording? I think it should be stated that AgRP is an inverse agonist at the MC4R, not that αMSH is the inverse agonist of AgRP. Re-work this sentence.

      Thank you for this comment. This has now been changed (L79-80).

      Line 88: "... however, conflicting reports exist". Describe what these conflicting reports show. Many MC4 variants ("mutations") are expressed in humans, but few will fully inactivate signalling like the mouse knockout.

      We thank the reviewer for this comment. By conflicting data, we refer to the studies that report no reproductive impairments in women with MC4R mutations. This may be either because the metabolic impairments (obesity, hyperphagia, hyperinsulinemia, hyperleptinemia, etc.) are so strong that the focus is skewed to these issues, without a full reproductive assessment in these women, or simply because, as the reviewer mentioned, not all MC4R mutations fully inactivate its signaling in humans, as opposed to mouse models where reproductive disruption has been described previously in full-body MC4RKOs.

      Line 91: "...that largely affects females". Is this a genuine sex difference, or are reproductive deficits simply more overt in female rodents? I think the Coss paper (reference 19 in the manuscript) showed a greater effect of diet-induced obesity in males than in females.

      We believe that sex differences exist with regards to the role of MC4R in the regulation of fertility, as we show that most of this effect is mediated by MC4R signaling in Kiss1 AVPV neurons, a neuronal population that is specific to the female brain.

      As far as we can tell, the Coss paper (Villa et al., 2024) has only tested males but not females. Moreover, they investigated the effect of diet induced obesity in mice on their fertility (specifically LH secretion), while in this study we are specifically looking at the deletion of MC4R from Kiss1 neurons, and these mice were not obese (Figure 2A). While both these conditions induce impaired fertility, the mechanisms and signaling pathways are different (our mice lack MC4R signaling while the obese mice have a decrease in MC4R expression but the signaling is still functional).

      Line 392: also Hessler et al. PMID: 32337804.

      This reference is now added to the text (Line 393).

      Line 433. The discussion of how advanced puberty onset (seen in the Kiss1-specific KO animals) might be caused by MC4R signalling in AVPV Kiss1 neurons, which are sexually dimorphic, which might explain sex differences in puberty timing in mammals seems extremely speculative and based on limited data. More targeted experiments would be needed to address this, and I think this speculation should be removed here.

      This speculation has now been removed from the text.

      Line 438: "Furthermore, our findings suggest that metabolic cues, through the regulation of the melanocortin output onto Kiss1AVPV/PeN neurons, are essential for the timing and magnitude of the GnRH/LH surge." Again, I think this is overstating the present data, which has only looked at an artificial hormone administration regime. The animals are fertile and, thus, must be able to mount a sufficient LH surge. The major effect, in fact, seems to be on their cycle, perhaps leading to impaired follicular development. Please acknowledge that this will be one of the multiple pathways by which metabolic information is fed into the HPG axis.

      In addition to the effect on their cycles mentioned by the reviewer, the Kiss1MC4RKO females also display impaired fertility (Figure 2, S-T) and fewer corpora lutea, which is in line with the impaired mounting of the LH surge (Figure 2, M). Even though the LH surge is induced by the hormone administration protocol, it still reflects the natural ability of the HPG axis to mount the surge, as this regimen only mimics, in a controlled manner, the endogenous hormonal changes leading to the LH surge and therefore ovulation. Nonetheless, we agree with this reviewer that this is not the sole mechanism by which metabolism regulates reproductive function, and this has been emphasized in the paper (line 443).

      Reviewer #3 (Recommendations For The Authors):

      The decreased melanocortin tone drives puberty onset (Figure 1D), and this is correlative. The transgenic animals' hypothalamic expression of Agrp, Pomc, Mc4r, and Mc3r can be measured to strengthen the claim. Hprt expression should be demonstrated, as this housekeeping gene was used as a common denominator.

      We thank the reviewer for this comment. While we agree that measuring Agrp, Pomc, Mc4r, and Mc3r gene expression in the transgenic mice would strengthen our claim and give more insight into melanocortin tone during pubertal maturation, this is unfortunately not feasible, as it would involve generating a lot of mice (at least n=40 pups for an n=5/group, KO and control littermates, females only -which will require setting up lots of breeding pairs-) at different ages throughout puberty.

      As for the gene expression of Hprt, because we have 6 mice per age and 4 ages in total, every gene (Agrp, Pomc, Mc4r, Mc3r) was run in a separate plate with Hprt as its own housekeeping gene. Samples were run in duplicate for both Hprt and the melanocortin gene in a 96-well plate: 48 wells for Hprt and 48 wells for the melanocortin gene. Therefore, it is not possible to represent a single Hprt expression for all four genes; however, every gene was normalized to the Hprt expression that was run in the same plate.
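
      The plate arithmetic above can be checked with a short sketch, followed by one common way to normalize a target gene to a housekeeping gene run on the same plate. The well counts follow the response (6 mice per age, 4 ages, duplicates); the 2^-ΔCt formula, the function name `relative_expression`, and the Ct values are assumptions made for illustration, since the response does not state which quantification method was used.

```python
from statistics import mean
from typing import Sequence

MICE_PER_AGE = 6
AGES = ("P10", "P15", "P22", "P30")  # ages listed in the response
TECHNICAL_REPLICATES = 2             # samples run in duplicate

# Wells needed for one gene across all animals and replicates.
wells_per_gene = MICE_PER_AGE * len(AGES) * TECHNICAL_REPLICATES
# One plate holds the melanocortin gene plus Hprt for the same samples.
wells_per_plate = 2 * wells_per_gene


def relative_expression(ct_target: Sequence[float],
                        ct_hprt: Sequence[float]) -> float:
    """Assumed 2^-dCt normalization using replicate Ct values from one plate."""
    delta_ct = mean(ct_target) - mean(ct_hprt)
    return 2.0 ** (-delta_ct)


print(wells_per_gene, wells_per_plate)
# Hypothetical duplicate Ct values for one sample; NOT study data.
print(relative_expression([25.0, 25.0], [20.0, 20.0]))
```

      Because each melanocortin gene shares a plate with its own Hprt wells, the normalization in this sketch never mixes Ct values across plates, which is the point the response makes.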

      In Figures 4 and 5, dot plots can be used (as opposed to the bar graphs) to better reflect the individual data points.

      Figures 4 and 5 have been revised to include individual data points.

      The electrophysiology experiment requires more details in the method section. In addition to the publication cited, a brief recap of the methodology used in this paper, such as the focal application of MTII (Figure 4B), is also needed.

      We have added more details to the Methods.

    1. eLife Assessment

      This study describes a highly complex automated algorithm for analyzing vascular imaging data from two-photon microscopy. The proposed tool has the potential to be extremely valuable to the field and to fill gaps in knowledge of hemodynamic activity across a regional network. The biological application provided, however, has several problems that make many of the scientific claims in the paper incomplete.

    2. Reviewer #2 (Public review):

      This work describes a highly complex automated algorithm for analyzing vascular imaging data from two-photon microscopy. This tool has the potential to be extremely useful to the field and to fill gaps in knowledge of hemodynamic activity across a regional network. The biological application provided, however, has several problems that make many of the scientific claims in the paper questionable.

      The authors have commented on my main concerns. They have provided some limited evidence in the literature of prolonged vascular signals - though still nothing close to the several hundred-second long vascular responses oscillating between dilation and constriction shown here. And they have added a nice experiment showing they can resolve small beads (though still quite a bit bigger than their average capillary diameter) with their system. They have also added comparisons with other software which show some modest but clear improvement in some aspects. All these make the paper stronger.

      However, I still think the main overall problem from the biological interpretation side of the paper is still not fixed. Perhaps I am too skeptical but I have a hard time accepting the conclusions about dilators and constrictors (depth dependence, distance from nearest neuron, etc.) because the data are just too temporally sparse and too unconventional in their duration and fluctuation. Also, the differences are often very small compared to the variability.

      Regarding the spatial resolution, I was more concerned that if the pixel size is about 1 micron, then detecting around 1 micron dilations (or even less) is really below the resolution of the system. While the bead imaging is good for showing they can extract these diameters very close to the real value, this is still not like a living brain with imaging and motion artifacts. Given the temporal resolution issues already mentioned, this makes me highly skeptical of the biological claims. I think the discussion should at least strongly emphasize that a major caveat in their analysis is that the diameters are only sampled every 42 seconds, and, given the fluctuation in vessel diameter above and below baseline, this makes the classification of a vessel as constrictor or dilator, and the size of its response, highly dependent on the time point at which the vessel diameter was sampled.
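
      The sampling caveat raised above can be illustrated with a toy calculation. The response shape, amplitudes, and classification threshold below are hypothetical and chosen only to show how one biphasic time course can receive different labels depending on when it is sampled; they are not taken from the manuscript.

```python
import math


def biphasic_response(t):
    """Hypothetical diameter change (in microns) t seconds after stimulation:
    an early constriction followed by a later dilation."""
    constriction = -1.0 * math.exp(-((t - 20.0) / 10.0) ** 2)
    dilation = 0.8 * math.exp(-((t - 60.0) / 15.0) ** 2)
    return constriction + dilation


def classify(delta, threshold=0.2):
    """Label a vessel from a single diameter change (threshold is assumed)."""
    if delta > threshold:
        return "dilator"
    if delta < -threshold:
        return "constrictor"
    return "non-responder"


# The same vessel, sampled at three different times after the stimulus.
labels = {t: classify(biphasic_response(t)) for t in (21, 42, 63)}
```

      In this toy case the same vessel is labeled differently at each of the three sampling times, which is the dependence on sampling time that the comment describes.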

      Although the computational side of the paper is not my strong point, it seems there is potential for the pipeline to be useful in other applications. But given the limitations of the system they are using, I feel that it is a methods paper in its current form more than anything that should be making the biological claims included.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      In the manuscript, the authors describe a new pipeline to measure changes in vasculature diameter upon optogenetic stimulation of neurons. The work is useful to better understand the hemodynamic response on a network/graph level.

      Strengths:

      The manuscript provides a pipeline that makes it possible to detect changes in vessel diameter while simultaneously locating the neurons driven by stimulation.

      The resulting data could provide interesting insights into the graph level mechanisms of regulating activity dependent blood flow.

      Weaknesses:

      (1) The manuscript contains (new) wrong statements and (still) wrong mathematical formulas.

      The symbols in these formulas have been updated to disambiguate them, and the accompanying statements have been adjusted for clarity.

      (2) The manuscript does not compare results to existing pipelines for vasculature segmentation (open-source or commercial). Comparing the performance of the pipeline to a random forest classifier (ilastik) on images that are not preprocessed (i.e. corrected for background etc.) does not seem a particularly useful comparison.

      We’ve now included comparisons to Imaris (commercial software) for segmentation and VesselVio (open-source software) for graph extraction.

      For the ilastik comparison, the images were preprocessed prior to ilastik segmentation, specifically by doing intensity normalization.

      Example segmentations utilizing Imaris have now been included. Imaris leaves gaps and discontinuities in the segmentation masks, as shown in Supplementary Figure 10. The Imaris segmentation masks also tend to be more circular in cross-section despite irregularities on the surface of the vessels observable in the raw data and identified in manual segmentation. This approach also requires days to months to generate per image stack.

      “Comparison with commercial and open-source vascular analysis pipelines

      To compare our results with those achievable on these data with other pipelines for segmentation and graph network extraction, we compared segmentation results qualitatively with Imaris version 9.2.1 (Bitplane) and vascular graph extraction with VesselVio [1]. For the Imaris comparison, three small volumes were annotated by hand to label vessels. Example slices of the segmentation results are shown in Supplementary Figure 10. Imaris tended to either over- or under-segment vessels, disregard fine details of the vascular boundaries, and produce jagged edges in the vascular segmentation masks. In addition to these issues with segmentation mask quality, manual segmentation of a single volume took days for a rater to annotate. To compare to VesselVio, binary segmentation masks (one before and one after photostimulation) generated with our deep learning models were loaded into VesselVio for graph extraction, as VesselVio does not have its own method for generating segmentation masks. This also facilitates a direct comparison of the benefits of our graph extraction pipeline to VesselVio. Visualizations of the two graphs are shown in Supplementary Figure 11. Vesselvio produced many hairs at both time points, and the total number of segments varied considerably between the two sequential stacks: while the baseline scan resulted in 546 vessel segments, the second scan had 642 vessel segments. These discrepancies are difficult to resolve in post-processing and preclude a direct comparison of individual vessel segments across time. As the segmentation masks we used in graph extraction derive from the union of multiple time points, we could better trace the vasculature and identify more connections in our extracted graph. 
      Furthermore, VesselVio relies on the distance transform of the user-supplied segmentation mask to estimate vascular radii; consequently, these estimates are highly susceptible to variations in the input segmentation masks. We repeatedly saw slight variations between boundary placements of all of the models we utilized (ilastik, UNet, and UNETR) and those produced by raters. Our pipeline mitigates this segmentation method bias by using intensity gradient-based boundary detection from centerlines in the image (as opposed to using the distance transform of the segmentation mask, as in VesselVio).”
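
      The sensitivity of distance-transform radii to mask boundary placement can be sketched on a synthetic cross-section. This is a minimal illustration using scipy.ndimage.distance_transform_edt on hypothetical disk masks; it is not VesselVio's code, and the helper names are assumptions.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def disk_mask(radius_px, size=41):
    """Binary disk standing in for a vessel cross-section in a mask."""
    yy, xx = np.mgrid[:size, :size]
    c = size // 2
    return (yy - c) ** 2 + (xx - c) ** 2 <= radius_px ** 2


def radius_from_distance_transform(mask):
    """Radius estimate: largest distance from a mask pixel to the background."""
    return float(distance_transform_edt(mask).max())


# Two masks of the same vessel whose boundaries differ by one pixel.
r_tight = radius_from_distance_transform(disk_mask(4))
r_loose = radius_from_distance_transform(disk_mask(5))
radius_shift = r_loose - r_tight
```

      A boundary placed one pixel further out shifts the radius estimate by roughly one pixel, so any bias in the segmentation mask passes directly into the estimated radius.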

      (3) The manuscript does not clearly visualize performance of the segmentation pipeline (e.g. via 2d sections, highlighting also errors etc.). Thus, it is unclear how good the pipeline is, under what conditions it fails or what kind of errors to expect.

      In response to the reviewer’s comment, 2D slices have been added to Supplementary Figure 4.

      (4) The pipeline is not fully open-source due to use of matlab. Also, the pipeline code was not made available during review contrary to the authors claims (the provided link did not lead to a repository). Thus, the utility of the pipeline was difficult to judge.

      All code has been uploaded to GitHub and is available at the following location: https://github.com/AICONSlab/novas3d

      The Matlab code for skeletonization is better at preserving centerline integrity during the pruning of hairs from centerlines than the currently available open-source methods.

      - Generalizability: The authors addressed the point of generalizability by applying the pipeline to other data sets. This demonstrates that their pipeline can be applied to other data sets and makes it more useful. However, from the visualizations it is hard to judge the performance of the pipeline, where the pipeline fails, etc. The 3D visualizations are not particularly helpful in this respect. In addition, the Dice measure seems quite low, indicating that roughly 20-40% of voxels do not overlap between inferred and ground truth. I did not notice this high discrepancy earlier. A thorough discussion of the errors appearing in the segmentation pipeline would be necessary in my view to better assess the quality of the pipeline.

      2D slices from the additional datasets have been added to Supplementary Figure 13 to aid in visualizing the models’ ability to generalize to other datasets.

      The Dice range we report (0.7-0.8) compares well with those (0.56-0.86) reported for 3D segmentations of large microscopy datasets [2], [3], [4], [5], [6]. Furthermore, we had two additional raters segment three images from the original training set. We found that the raters had a mean inter-class correlation of 0.73 [7]. Our model matched or exceeded this baseline on unseen data: Dice scores from our generalizability tests on C57 mice and Fischer rats were on par with or higher than it.

      Reviewer #2 (Public review):

      The authors have addressed most of my concerns sufficiently. There are still a few serious concerns I have. Primarily, the temporal resolution of the technique still makes me dubious about nearly all of the biological results. It is good that the authors have added some vessel diameter time courses generated by their model. But I still maintain that data sampling every 42 seconds - or even 21 seconds - is problematic. First, the evidence for long vascular responses is lacking. The authors cite several papers:

      Alarcon-Martinez et al. 2020 show and explicitly state that their responses (stimulus-evoked) returned to baseline within 30 seconds. The responses to ischemia are long lasting but this is irrelevant to the current study using activated local neurons to drive vessel signals.

      Mester et al. 2019 show responses that all seem to return to baseline by around 50 seconds post-stimulus.

      In Mester et al. 2019, diffuse stimulations with blue light showed a return to baseline around 50 seconds post-stimulus (cf. Figure 1E, 2C, 2D). However, focal stimulations, where the stimulation light is raster scanned over a small region focused in the field of view, show longer-lasting responses (cf. Figure 4) that have not returned to baseline by 70 seconds post-stimulus [8]. Alarcon-Martinez et al. do report that their responses return to baseline within 30 seconds; however, their physiological stimulation may lead to different neuronal and vessel response kinetics than those elicited by optogenetic stimulation as in the current work.

      O'Herron et al. 2022 and Hartmann et al. 2021 use opsins expressed in vessel walls (not neurons as in the current study) and directly constrict vessels with light. So this is unrelated to neuronal activity-induced vascular signals in the current study.

      We agree that optogenetic activation of vessel-associated cells is distinct from optogenetic activation of neurons, but we do expect the effects of such perturbations on the vasculature to have some commonalities.

      There are other papers including Vazquez et al 2014 (PMID: 23761666) and Uhlirova et al 2016 (PMID: 27244241) and many others showing optogenetically-evoked neural activity drives vascular responses that return back to baseline within 30 seconds. The stimulation time and the cell types labeled may be different across these studies which can make a difference. But vascular responses lasting 300 seconds or more after a stimulus of a few seconds are just not common in the literature and so are very suspect - likely at least in part due to the limitations of the algorithm.

      Vazquez et al. 2014 used diffuse photostimulation with a fiberoptic probe, similar to the diffuse stimulation in Mester et al. 2019, as opposed to the raster-scanned focal stimulation we used in this study and in Mester et al. 2019, where we observed focal photostimulation to elicit vascular responses lasting longer than a minute. Uhlirova et al. 2016 used photostimulation powers between 0.7 and 2.8 mW, likely lower than our 4.3 mW/mm2 photostimulation. Further, even with focal photostimulation, we do see light intensity dependence of the duration of the vascular responses. Indeed, in Supplementary Figure 2, 1.1 mW/mm2 photostimulation leads to briefer dilations/constrictions than does 4.3 mW/mm2; the 1.1 mW/mm2 responses are in line, duration-wise, with those in Uhlirova et al. 2016.

      Critically, as per Supplementary Figure 2, the analysis of the experimental recordings acquired at 3-second temporal resolution did likewise show responses in many vessels lasting for tens of seconds and even hundreds of seconds in some vessels.

      Another major issue is that the time courses provided show that the same vessel constricts at certain points and dilates later. So where in the time course the data is sampled will have a major effect on the direction and amplitude of the vascular response. In fact, I could not find how the "response" window is calculated. Is it from the first volume collected after the stimulation - or an average of some number of volumes? But clearly down-sampling the provided data to 42 or even 21 second sampling will lead to problems. If the major benefit to the field is the full volume over large regions that the model can capture and describe, there needs to be a better way to capture the vessel diameter in a meaningful way.

      In the main experiment (i.e. excluding the additional experiments presented in the Supplementary Figure 2 that were collected over a limited FOV at 3s per stack), we have collected one stack every 42 seconds. The first slice of the volume starts following the photostimulation, and the last slice finishes at 42 seconds. Each slice takes ~0.44 seconds to acquire. The data analysis pipeline (as demonstrated by the Supplementary Figure 2) is not in any way limited to data acquired at this temporal resolution and - provided reasonable signal-to-noise ratio (cf. Figure 5) - is applicable, as is, to data acquired at much higher sampling rates.

      It still seems possible that if responses are bi-phasic, then depth dependencies of constrictors vs dilators may just be due to where in the response the data are being captured - maybe the constriction phase is captured in deeper planes of the volume and the dilation phase more superficially. This may also explain why nearly a third of vessels are not consistent across trials - if the direction the volume was acquired is different across trials, different phases of the response might be captured.

      Alternatively, like neuronal responses to physiological stimuli, the vascular responses elicited by increases in neuronal activity may themselves be variable in both space and time.

      I still have concerns about other aspects of the responses but these are less strong. Particularly, these bi-phasic responses are not something typically seen and I still maintain that constrictions are not common. The authors are right that some papers do show constriction. Leaving out the direct optogenetic constriction of vessels (O'Herron 2022 & Hartmann 2021), the Alarcon-Martinez et al. 2020 paper and others such as Gonzales et al 2020 (PMID: 33051294) show different capillary branches dilating and constricting. However, these are typically found either with spontaneous fluctuations or due to highly localized application of vasoactive compounds. I am not familiar with data showing activation of a large region of tissue - as in the current study - coupled with vessel constrictions in the same region. But as the authors point out, typically only a few vessels at a time are monitored so it is possible - even if this reviewer thinks it unlikely - that this effect is real and just hasn't been seen.

      Uhlirova et al. 2016 (PMID: 27244241) observed biphasic responses in the same vessel with optogenetic stimulation in anesthetized and unanesthetized animals (cf Fig 1b and Fig 2, and section “OG stimulation of INs reproduces the biphasic arteriolar response”). Devor et al. (2007) and Lindvere et al. (2013) also reported on constrictions and dilations being elicited by sensory stimuli.

      I also have concerns about the spatial resolution of the data. It looks like the data in Figure 7 and Supplementary Figure 7 have a resolution of about 1 micron/pixel. It isn't stated so I may be wrong. But detecting changes of less than 1 micron, especially given the noise of an in vivo prep (brain movement and so on), might just be noise in the model. This could also explain constrictions as just spurious outputs in the model's diameter estimation. The high variability in adjacent vessel segments seen in Figure 6C could also be explained the same way, since these also seem biologically and even physically unlikely.

      Thank you for your comment. To address this important issue, we performed an additional validation experiment where we placed a special order of fluorescent beads with a known diameter of 7.32 ± 0.27 μm, imaged them following our imaging protocol, and subsequently used our pipeline to estimate their diameter. Our analysis converged on the manufacturer-specified diameters, estimating the diameter to be 7.34 ± 0.32 μm. The manuscript has been updated to detail this experiment, as below:

      Methods section insert

      “Second, our boundary detection algorithm was used to estimate the diameters of fluorescent beads of a known diameter imaged under similar acquisition parameters. Polystyrene microspheres labelled with Flash Red (Bangs Laboratories, Inc., CAT# FSFR007) with a nominal diameter of 7.32 μm and a specified range of 7.32 ± 0.27 μm, as determined by the manufacturer using a Coulter counter, were imaged on the same multiphoton fluorescence microscope set-up used in the experiment (identical light path, resonant scanner, objective, detector, excitation wavelength and nominal lateral and axial resolutions, with 5x averaging). The images of the beads had a higher SNR than our images of the vasculature, so Gaussian noise was added to the images to degrade the SNR to the same level as that of the blood vessels. The images of the beads were segmented with a threshold, centroids were calculated for individual spheres, and planes with a random normal vector were extracted from each bead and used to estimate the diameter of the beads. The same smoothing and PSF deconvolution steps were applied in this task. We then reported the mean and standard deviation of the distribution of the diameter estimates. A variety of planes were used to estimate the diameters.”

      Results Section Insert

      “Our boundary detection algorithm successfully estimated the radius of precisely specified fluorescent beads. The bead images had a signal-to-noise ratio of 6.79 ± 0.16 (about 35% higher than our in vivo images): to match their SNR to that of in vivo vessel data, following deconvolution, we added Gaussian noise with a standard deviation of 85 SU to the images, bringing the SNR down to 5.05 ± 0.15. The data processing pipeline was kept unaltered except for the bead segmentation, performed via image thresholding instead of our deep learning model (trained on vessel data). The bead boundary was computed following the same algorithm used on vessel data: i.e., by the average of the minimum intensity gradients computed along 36 radial spokes emanating from the centreline vertex in the orthogonal plane. To demonstrate an averaging-induced decrease in the uncertainty of the bead radius estimates on a scale that is finer than the nominal resolution of the imaging configuration, we tested four averaging levels in 289 beads. Three of these averaging levels were lower than that used on the vessels, and one matched that used on the vessels (36 spokes per orthogonal plane and a minimum of 10 orthogonal planes per vessel). As the amount of averaging increased, the uncertainty on the diameter of the beads decreased, and our estimate of the bead's diameter converged upon the manufacturer's Coulter counter-based specifications (7.32 ± 0.27 μm), as tabulated in Table 1.”
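
      The spoke-based boundary estimate described in this insert can be sketched on a single synthetic plane. This is a minimal illustration, not the authors' implementation: the blurred disk, the sampling step, the linear interpolation, and the function name are assumptions, and the averaging over multiple orthogonal planes is omitted.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates


def estimate_radius(plane, center, n_spokes=36, max_r=20.0, step=0.25):
    """Average, over radial spokes, of the distance from `center` to the
    point of minimum (most negative) intensity gradient along the spoke."""
    radii = np.arange(0.0, max_r, step)
    angles = np.linspace(0.0, 2.0 * np.pi, n_spokes, endpoint=False)
    boundary = []
    for theta in angles:
        rows = center[0] + radii * np.sin(theta)
        cols = center[1] + radii * np.cos(theta)
        # Linear interpolation of the image along the spoke.
        profile = map_coordinates(plane, [rows, cols], order=1)
        gradient = np.gradient(profile, step)
        boundary.append(radii[np.argmin(gradient)])
    return float(np.mean(boundary))


# Synthetic bright disk (radius 8 px), blurred to mimic a soft boundary.
size, true_radius = 64, 8.0
yy, xx = np.mgrid[:size, :size]
disk = ((yy - 32) ** 2 + (xx - 32) ** 2 <= true_radius ** 2).astype(float)
plane = gaussian_filter(disk, sigma=1.5)
estimated_radius = estimate_radius(plane, center=(32, 32))
```

      Averaging the per-spoke boundary positions is what allows the estimate to be finer than the pixel grid; adding more spokes or more planes reduces the scatter of the mean further.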

      Reviewer #1 (Recommendations for the authors):

      Comments to the authors replies to the reviews:

      - Supplementary Figure 13:

      As indicated before the 3d images + scale makes it impossible to judge the quality of the outputs.

      As aforementioned, 2D slices have been added to the Supplementary Figure 13.

      - Supplementary Table 3:

      There is a significant increase in the Hausdorff and Mean Surface Distance measures for the new data. Why?

      A single vessel being missed by either the rater or the model would significantly affect the Hausdorff distance (HD) and by extension Mean Surface Distance: this is particularly pertinent in the LSFM image with its much larger FOV and thus a potential for much larger max distances to result from missed vessels in the prediction or ground truth data. Large Hausdorff distances may indicate a vessel was missed in either the ground truth or the segmentation mask.
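
      The effect of a single missed vessel on these distance metrics can be sketched with toy point sets. The coordinates below are hypothetical, and the pooled-percentile form of HD95 is only one of several definitions in use; it is not necessarily the one in the manuscript.

```python
import numpy as np


def directed_distances(a, b):
    """For each point in `a`, the distance to its nearest point in `b`."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def surface_metrics(a, b):
    d_ab = directed_distances(a, b)
    d_ba = directed_distances(b, a)
    pooled = np.concatenate([d_ab, d_ba])
    return {
        "hausdorff": float(max(d_ab.max(), d_ba.max())),
        # Assumption: 95th percentile of the pooled directed distances.
        "hd95": float(np.percentile(pooled, 95)),
        "mean_surface_distance": float(pooled.mean()),
    }


# Ground truth: one long vessel plus one short vessel far away from it.
main_vessel = np.array([[x, 0.0] for x in range(100)], dtype=float)
short_vessel = np.array([[x, 50.0] for x in range(10)], dtype=float)
ground_truth = np.vstack([main_vessel, short_vessel])

# Prediction: the long vessel with a one-unit offset; short vessel missed.
prediction = main_vessel + np.array([0.0, 1.0])

with_missed_vessel = surface_metrics(ground_truth, prediction)
without_missed_vessel = surface_metrics(main_vessel, prediction)
```

      In this toy case the Hausdorff distance is set entirely by the missed vessel and the mean surface distance rises with it, even though the long vessel is segmented equally well in both comparisons.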

      Of note, these additional datasets were annotated by a different rater than those who labeled the ground truth data. There is a high variability in boundary placements between raters. On a test where three raters segmented the same three images from the original dataset, we computed an ICC of 0.73 across their segmentations. Our model Dice scores on predictions in out-of-distribution data sets were on par with the inter-rater ICC on the Thy1ChR2 2PFM data.

      - Supplementary Figure 2: The authors provide useful data on the time responses. However, looking at those figures, it is puzzling why certain vessels were selected as responding as there seems almost no change after stimulation. In addition, some of the responses seem to actually start several tens of seconds before the actual stimulus (particularly in A).

      Only some traces in C and D (dark blue) seem to be actually responding vessels.

      This is not discussed and unclear.

      Supplementary Figure 2 displays the time courses of vessel calibre for all vessels in the FOV, not just those deemed responders.

      The aforementioned effects are due to the loess smoothing filter having been applied to the time courses for the preliminary response, which has been rectified in the updated figures. In particular, Supplementary Figure 2 has been updated with separate loess smoothing before and after photostimulation. The (pre-stimulation) effect is gone once the loess smoothing has been separated.
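
      The pre-stimulus artifact and its fix can be reproduced on a toy trace. A centered moving average is used here as a simple stand-in for loess, and the step-shaped response is hypothetical; the point is only that a smoother spanning the stimulus time leaks the response backwards.

```python
import numpy as np


def moving_average(x, window):
    """Centered moving average with edge padding (stand-in for loess)."""
    pad = window // 2
    padded = np.pad(x, pad, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")


stim_index = 20
# Hypothetical trace: flat baseline, then a step change at stimulation.
trace = np.concatenate([np.zeros(stim_index), np.ones(20)])

# Smoothing across the stimulus leaks the response into the baseline.
smoothed_jointly = moving_average(trace, window=7)

# Smoothing each segment on its own keeps the baseline untouched.
smoothed_separately = np.concatenate([
    moving_average(trace[:stim_index], window=7),
    moving_average(trace[stim_index:], window=7),
])
```

      With joint smoothing the last few pre-stimulus samples are already nonzero, which resembles a response starting before the stimulus; with separate smoothing they remain at baseline.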

      - R Point 7: As indicated before and in agreement with the other reviewer, the quality of the results in 3D is difficult to judge. No 2D sections that compare 'ground truth' with inferred results are shown in the current manuscript, which would enable a much better judgment. The provided video is still 3D and not a video going through 2D slices. Also, in the video the overlap of vasculature and raw data seems to be very good and near 100%, so why is the Dice measure reported earlier so low? Is this a particularly good example?

      Some examples, indicating where the pipeline fails (and why) would be helpful to see, to judge its performance better (ideally in 2d slices).

      As discussed in the public comments, 2D slices are now included in Supplementary Figure 4 and Supplementary Figure 13 to facilitate visual assessment. The vessels are long and thin, so slight dilations or constrictions affect the Dice scores without being easy to see.
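
      The dependence of the Dice score on vessel width can be sketched with two synthetic straight vessels. The mask sizes are hypothetical; in both cases the prediction is exactly one pixel wider on each side than the reference.

```python
import numpy as np


def dice(a, b):
    """Dice coefficient between two binary masks."""
    a = a.astype(bool)
    b = b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())


def vessel_mask(half_width, shape=(200, 40)):
    """Straight vessel along axis 0, (2 * half_width + 1) pixels wide."""
    mask = np.zeros(shape, dtype=bool)
    c = shape[1] // 2
    mask[:, c - half_width : c + half_width + 1] = True
    return mask


# Thin vessel: 3 px reference vs 5 px prediction.
dice_thin = dice(vessel_mask(1), vessel_mask(2))
# Thick vessel: 17 px reference vs 19 px prediction.
dice_thick = dice(vessel_mask(8), vessel_mask(9))
```

      The same one-pixel boundary disagreement costs the thin vessel far more Dice than the thick one, which is why a capillary-dominated volume can look well segmented and still score around 0.7-0.8.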

      - Author response images 6 and 7. From the presented data the constrictions measured in the smaller vessels may be a result (at least partly) of noise. This seems to be particularly the case in Author response image 7 left top and bottom for example. It would be helpful to show the actual estimates of the vessels radii overlaid in the (raw) images. In some of the pictures the noise level seems to reach higher values than the 10-20% of noise used in the tests by the authors in the revision.

      The vessel radii are estimated as averages across all vertices of the individual vessels, so it is not possible to overlay them meaningfully on 2D slices. In Figure 2B, we do show a rendering of sample vessel-wise radius estimates.

      - "We tested the centerline detection in Python, scipy (1.9.3) and Matlab. We found that the Matlab implementation performed better due to its inclusion of a branch length parameter for the identification of terminal branches, which greatly reduced the number of false branches; the Python implementation does not include this feature (in any version) and its output had many more such "hair" artifacts. Clearmap skeletonization uses an algorithm by Palagyi & Kuba(1999) to thin segmentation masks, which does not include hair removal. Vesselvio uses a parallelized version of the scipy implementation of Lee et al. (1994) algorithm which does not do hair removal based on a terminal branch length filter; instead, Vesselvio performs a threshold-based hair removal that is frequently overly aggressive (it removes true positive vessel branches), as highlighted by the authors."

      This statement is wrong. The removal of small branches in skeletons is algorithmically independent of the skeletonization algorithm itself. The authors cite a reference concerned with the algorithm they are currently employing for the skeletonization. Careful assessment of that reference shows that this algorithm removes small length branches after skeletonization is performed. This feature is available in open-source packages as well, or could be easily implemented.

      We appreciate that skeletonization is distinct from hair removal and have reworded this paragraph for clarity. We are currently working with SciPy developers to implement hair removal in their image processing pipeline so as to render our pipeline fully open-source.

      The removal of hairs after skeletonization with length based thresholding leads to the possibility of removing parts of centerlines in the main part of vessels after branch points with hairs. The Matlab implementation does not do this and leaves the main branches intact.

      This text has been updated to:

      “Hair” segments shorter than 20 μm and terminal on one end were iteratively removed, starting with the shortest hairs and merging the longest hairs at junctions with 2 terminal branches with the main vessel branch to reduce false positive vascular branches and minimize the amount of centerlines removed. This iterative hair removal functionality of the skeletonization algorithm is currently unavailable in Python, but is available in Matlab [9].
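
      The iterative pruning rule in the updated text can be sketched on a small segment graph. This is a simplified illustration of the idea, not the Matlab implementation: the graph representation, node names, and function name are hypothetical, and parallel edges and loops are not handled.

```python
from collections import defaultdict


def prune_hairs(edges, min_length=20.0):
    """Iteratively remove the shortest terminal segment below `min_length`.

    `edges` maps frozenset({u, v}) to a segment length in microns. After each
    removal, a junction left with two segments is dissolved and the two
    segments are merged into one.
    """
    edges = dict(edges)
    while True:
        degree = defaultdict(int)
        for edge in edges:
            for node in edge:
                degree[node] += 1
        hairs = [
            (length, sorted(edge))
            for edge, length in edges.items()
            if length < min_length
            and min(degree[n] for n in edge) == 1  # terminal on one end
            and max(degree[n] for n in edge) >= 3  # attached to a junction
        ]
        if not hairs:
            return edges
        _, (u, v) = min(hairs)  # shortest hair first
        del edges[frozenset((u, v))]
        junction = u if degree[u] >= 3 else v
        if degree[junction] - 1 == 2:
            (e1, l1), (e2, l2) = [(e, l) for e, l in edges.items() if junction in e]
            (a,) = e1 - {junction}
            (b,) = e2 - {junction}
            if a != b and frozenset((a, b)) not in edges:
                del edges[e1], edges[e2]
                edges[frozenset((a, b))] = l1 + l2


# Main vessel A-B-D with a side hair at B and two terminal branches at D.
segments = {
    frozenset(("A", "B")): 100.0,
    frozenset(("B", "C")): 5.0,
    frozenset(("B", "D")): 80.0,
    frozenset(("D", "E")): 12.0,
    frozenset(("D", "F")): 15.0,
}
pruned = prune_hairs(segments)
```

      Removing the shortest hair first means that, at the junction with two terminal branches, the longer branch is merged into the main vessel rather than deleted, so the centerline keeps its full extent.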

      - "On the reviewer's comment, we did try inputting normalized images into Ilastik, but this did not improve its results." This is surprising. Reasonable standard preprocessing (e.g. background removal, equalization, and vessel enhancement) would probably restore most of ilastik's performance in the indicated panel.

      While the improvement may be present in a particular set of images, the generalizability of such improvement to other patches is often poor in our experience, as reflected by the aforementioned results and the widespread uptake of DL approaches to image segmentation. The in vivo datasets also contain artifacts arising from, e.g., bleeding into the FOV, to which ilastik is highly sensitive. This is an example of noise that is not easily removed by standard preprocessing.

      - "Typical pre-processing/standard computer vision techniques with parameter tuning do not generalize on out-of-distribution data with different image characteristics, motivating the shift to DL-based approaches."

      I disagree with this statement. DL approaches can typically generalize when trained with a sufficient amount of diverse data. However, DL approaches can also fail with new out-of-distribution data. In that situation they can only be 'rescued' via new, time-intensive data generation and retraining. Simple standard image pre-processing steps (e.g. to remove background or boost vessel structures) have well-defined parameters that can be easily adapted to new out-of-distribution data, as clear interpretations are available. The time to adapt those parameters is typically much shorter than that needed to retrain DL frameworks.

      We find that standard image processing approaches with parameter tuning work robustly only if fine-tuned on individual images; i.e., the tuning does not generalize across datasets. This approach thus does not scale to high-throughput experiments or experiments yielding large images. While DL models may not generalize to out-of-distribution data, fine-tuning DL models with a small subset of labels generally produces models that are superior to parameter tuning and can be applied to entire studies. Moreover, DL fine-tuning is typically efficient because it requires very limited labelling and training time.

      - It is still unclear how the authors' pipeline performs compared with other (open-source or commercially) available pipelines. As indicated before, comparing to ilastik, particularly when feeding non-preprocessed data, does not seem to be a particularly high bar.

      This question has also been raised by the other reviewer who asked to compare to commercially available pipelines.

      This question was not answered by the authors, and instead the authors reply by claiming to provide an open-source pipeline. In fact, the use of Matlab in their pipeline does not make it fully open-source either. Moreover, as mentioned before, open-source pipelines for comparisons do exist.

      As discussed above, the manuscript now includes comparisons to Imaris for segmentation and VesselVio for graph extraction. The pipeline is on GitHub.

      -"We agree with the review that this question is interesting; however, it is not addressable using present data: activated neuronal firing will have effects on their postsynaptic neighbors, yet we have no means of measuring the spread of activation using the current experimental model."

      Distances to the closest neuron in the manuscript are measured without checking if it's active. Thus, distances to the first set of n neurons could be measured in the same way, ignoring activation effects.

      Shorter distances to an entire ensemble of neurons would still be (more) informative of metabolic demands.

      This could indeed be done within the existing framework. The connected-components-3d package can be used to extract individual neurons in the FOV from the neuron segmentation mask. The distance from each neuron to each point on the vessel centerlines could then be calculated.
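
      The suggested ensemble-distance measurement can be sketched as follows. To keep the example self-contained it uses scipy.ndimage.label in place of connected-components-3d, represents each neuron by its centroid, and uses a hypothetical mask and centerline point; all of these are assumptions rather than the pipeline's actual choices.

```python
import numpy as np
from scipy import ndimage


def distances_to_nearest_neurons(neuron_mask, centerline_points,
                                 n_neurons=3, voxel_size=(1.0, 1.0, 1.0)):
    """For each centerline point, sorted distances to the n nearest neurons.

    Neurons are the connected components of `neuron_mask`, each represented
    by its centroid (an assumption made for brevity).
    """
    labels, count = ndimage.label(neuron_mask)
    centroids = np.array(
        ndimage.center_of_mass(neuron_mask, labels, np.arange(1, count + 1))
    )
    diff = centerline_points[:, None, :] - centroids[None, :, :]
    dist = np.linalg.norm(diff * np.asarray(voxel_size), axis=-1)
    return np.sort(dist, axis=1)[:, :n_neurons]


# Hypothetical mask with three separate neurons.
mask = np.zeros((20, 20, 20), dtype=bool)
mask[2:5, 2:5, 2:5] = True        # centroid (3, 3, 3)
mask[10:13, 10:13, 10:13] = True  # centroid (11, 11, 11)
mask[16:19, 2:5, 2:5] = True      # centroid (17, 3, 3)

centerline = np.array([[3.0, 3.0, 10.0]])
nearest = distances_to_nearest_neurons(mask, centerline, n_neurons=2)
```

      Taking the mean or sum of each row would give one ensemble-distance value per centerline point, which could then be averaged per vessel in the same way as the nearest-neuron distance.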

      - model architecture:

      It is unclear from the description if any positional encoding was used for the image patches.

      It is unclear whether the architecture/pipeline can handle arbitrary volume sizes or is trained on a fixed volume shape. In the latter case, how is the pipeline applied?

      The model includes positional encoding, as described in Hatamizadeh et al. 2021.

      The model can be applied to images of any size, as demonstrated on larger images in Supplementary Figure 9 and on smaller images in Supplementary Figure 2. The pipeline is applied in the same way. It will read in the size of an input image and output an image of the same size.

      - Transformer models often show better results when using a learning rate scheduler that adjusts the learning rate (typically with up and down ramps). Did the authors test such approaches?

      We did not use a learning rate scheduler, as we found we were getting good results without using one.
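
      For reference, the kind of up-and-down schedule the reviewer describes can be sketched in a few lines. This is a generic linear warm-up followed by cosine decay with hypothetical values; it was not used in the manuscript.

```python
import math


def warmup_cosine_lr(step, total_steps, base_lr=1e-4, warmup_steps=100):
    """Linear ramp up to `base_lr`, then cosine decay towards zero."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


schedule = [warmup_cosine_lr(step, total_steps=1000) for step in range(1000)]
```

      The returned value would be assigned to the optimizer's learning rate at each training step; the warm-up length and base rate are tuning choices.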

      - formula (4): The 95% percentile of two numbers is the max, and thus (5) is certainly not what the HD95 metric is. The formula is simply wrong as displayed.

      Thank you. The formula has been updated.

      - formula (5): formula 5 is certainly wrong: n_X, n_y are either integer numbers as indicated by the sum indices or sets when used in the distances, but can't be both at the same time.

      Thank you for your comment. The Formula has been updated.

      - The statement:

      "this functionality of the skeletonization algorithm is currently unavailable in any python implementation, but is available in Matlab [56]."

      is not correct (see reply above)

      Please see the response above. This text has been updated to:

      “Hair” segments shorter than 20 μm and terminal on one end were iteratively removed, starting with the shortest hairs and merging the longest hairs at junctions with 2 terminal branches with the main vessel branch to reduce false positive vascular branches and minimize the amount of centerlines removed. This iterative hair removal functionality of the skeletonization algorithm is currently unavailable in Python, but is available in Matlab [9].

      - the centerline extraction is performed after taking the union of smoothed masks. The union operation can induce novel 'irregular' boundaries that degrade skeletonization performance. I would expect to apply smoothing after the union?

      Indeed the images were smoothed via dilation after taking the union, as described in the previous set of responses to private comments.

      - "The radius estimate defined the size of the Gaussian kernel that was convolved with the image to smooth the vessel: smaller vessels were thus convolved with narrower kernels."

      It is unclear which images were filtered.

      We have updated this text for clarity:

      The radius estimate defined the size of the Gaussian kernel that was convolved with the 2D image slice to smooth the vessel: smaller vessels were thus convolved with narrower kernels.

      - Was deconvolution applied to the raw images or after Gaussian filtering?

      The deconvolution was applied before Gaussian filtering.

      - ",we extracted image intensities in the orthogonal plane from the deconvolved raw registered image. A 2D Gaussian kernel with sigma equal to 80% of the estimated vessel-wise radius was used to low-pass filter the extracted orthogonal plane image and find the local signal intensity maximum searching, in 2D, from the center of the image to the radius of 10 pixels from the center."

      Would it not be better to filter the 3D image first and then extract the 2D plane?

      That could be done, but would incur a significant computational speed penalty. 2D convolutions are faster, and produced excellent accuracy when estimating radii in our bead experiment.
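
      The 2D filtering step quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `smoothed_peak` and its parameters are hypothetical, and the "search from the center" is simplified to taking the maximum inside a disc of 10 pixels around the plane center.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def smoothed_peak(plane, radius_px, search_radius_px=10, sigma_fraction=0.8):
    """Low-pass filter a 2D orthogonal-plane image and locate the intensity peak.

    sigma is set to a fraction (0.8 here, as in the quoted text) of the
    estimated vessel radius, so smaller vessels get narrower kernels.
    """
    sigma = sigma_fraction * radius_px
    smoothed = gaussian_filter(plane.astype(float), sigma=sigma)

    # Restrict the search to a disc around the center of the plane.
    cy, cx = (np.array(plane.shape) - 1) / 2.0
    yy, xx = np.indices(plane.shape)
    inside = (yy - cy) ** 2 + (xx - cx) ** 2 <= search_radius_px ** 2
    masked = np.where(inside, smoothed, -np.inf)

    row, col = np.unravel_index(np.argmax(masked), plane.shape)
    return int(row), int(col)
```

      Because only a small 2D array is filtered per vertex, the cost scales with the number of centerline vertices and not with the full 3D volume, which is consistent with the speed argument given in the response.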

      What algorithm was used to obtain the 2D images?

      The 2D images were obtained using scipy.ndimage.map_coordinates.
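
      As a hedged sketch of how scipy.ndimage.map_coordinates can resample a plane orthogonal to a vessel, consider the example below. The function name `orthogonal_plane`, the way the in-plane axes are constructed, and the choice of linear interpolation (`order=1`) are assumptions made for illustration; the response does not state these details.

```python
import numpy as np
from scipy.ndimage import map_coordinates


def orthogonal_plane(volume, center, tangent, half_size=10):
    """Sample a (2*half_size+1)^2 plane through `center`, normal to `tangent`.

    `center` and `tangent` are given in array index order (axis 0, 1, 2).
    """
    tangent = np.asarray(tangent, dtype=float)
    tangent = tangent / np.linalg.norm(tangent)

    # Pick a helper vector that is not parallel to the tangent,
    # then build two orthonormal in-plane axes u and v.
    helper = np.array([1.0, 0.0, 0.0])
    if abs(np.dot(helper, tangent)) > 0.9:
        helper = np.array([0.0, 1.0, 0.0])
    u = np.cross(tangent, helper)
    u = u / np.linalg.norm(u)
    v = np.cross(tangent, u)

    offsets = np.arange(-half_size, half_size + 1)
    a, b = np.meshgrid(offsets, offsets, indexing="ij")
    center = np.asarray(center, dtype=float)
    # coords has shape (3, n, n): one 3D sample position per plane pixel.
    coords = (center[:, None, None]
              + u[:, None, None] * a
              + v[:, None, None] * b)
    return map_coordinates(volume, coords, order=1, mode="nearest")
```

      In practice the tangent would come from the local direction of the centerline at each vertex, and the sampled plane would then be passed to the 2D filtering step.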

      - Figure 2H: Is this the filtered image or the raw data?

      Panel H is raw data.

      - It would be good to see a few examples of the raw data overlaid with the radial estimates to evaluate the approach (beyond the example in K).

      Additional examples are shown in Figure 5.

      - Figure 2 K: Why are boundary points greater than 2 standard deviations away from the mean excluded ?

      They are excluded to account for irregularities as vessels approach junctions [10], [11].

      - Figure 2L: What exactly is plotted here? What are vertex-wise changes: is that the difference between the minimum and maximum of all the detected radii for a single vertex? Why do some vessels (red) show high values consistently throughout the vessel?

      Figure 2L displays the change in radius at each vertex in this FOV following photostimulation, relative to baseline.

      - Assortativity: to calculate the assortativity, are radius changes binned in any form to account for the fact that otherwise, $e_{xy}$ and related measures will likely be based on single data points?

      Assortativity is not calculated from single data points. It can be calculated either by binning values into categories or directly from scalars, e.g. the average radius across a vessel segment.

      See here for information on calculating assortativity from binned categories (i.e. classifying a vessel as a constrictor, dilator or non-responder):

      https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.assortativity.attribute_assortativity_coefficient.html#networkx.algorithms.assortativity.attribute_assortativity_coefficient

      And see here for calculating assortativity from scalar values:

      https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.assortativity.numeric_assortativity_coefficient.html#networkx.algorithms.assortativity.numeric_assortativity_coefficient

      We calculated the assortativity using scalar values.

      In both cases, one uses all nodes and calculates the correlation between each node and its neighbours with an attribute that can be binned or is a scalar. Binning the value on a given node would not affect the number of nodes in a graph.
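
      The two NetworkX functions linked above can be compared on a toy graph. This is a minimal sketch, not the authors' analysis: the six-node path graph, the attribute names `radius_change` and `response`, the ±0.1 threshold in `response_class`, and all values are hypothetical, and a recent NetworkX release that accepts floating-point attributes is assumed.

```python
import networkx as nx

# Toy network: six nodes in a chain, with neighbours tending to respond alike.
G = nx.path_graph(6)
radius_change = {0: -0.2, 1: -0.2, 2: 0.0, 3: 0.0, 4: 0.3, 5: 0.3}
nx.set_node_attributes(G, radius_change, "radius_change")


def response_class(change, threshold=0.1):
    """Bin a scalar radius change into one of three categories."""
    if change <= -threshold:
        return "constrictor"
    if change >= threshold:
        return "dilator"
    return "non-responder"


labels = {node: response_class(c) for node, c in radius_change.items()}
nx.set_node_attributes(G, labels, "response")

# Scalar version: correlation of the attribute across the two ends of each edge.
r_scalar = nx.numeric_assortativity_coefficient(G, "radius_change")
# Binned version: computed from the category-by-category mixing matrix.
r_binned = nx.attribute_assortativity_coefficient(G, "response")
```

      Both coefficients are positive for this toy graph because connected nodes tend to share similar values, and both use every edge of the graph regardless of whether the attribute is binned.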

      - "Ilastik tended to over-segment vessels, i.e. the model returned numerous false positives, having a high recall (0.89{plus minus}0.19) but low precision (0.37{plus minus}0.33) (Figure 3, Supplementary Table 3)."

      As indicated before, and looking at Figure 4, the over-segmentation seems to be due to excessively high background. A preprocessing step to remove background from the raw images, as suggested, could have avoided this.

      The images were normalized in preprocessing.
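
      To make the relationship between over-segmentation, recall and precision concrete, the sketch below computes these scores from binary masks. The function name `segmentation_scores` and the toy masks are hypothetical and are not taken from the manuscript.

```python
import numpy as np


def segmentation_scores(pred, truth):
    """Return (precision, recall, dice) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.logical_and(pred, truth).sum()    # vessel voxels correctly labelled
    fp = np.logical_and(pred, ~truth).sum()   # background labelled as vessel
    fn = np.logical_and(~pred, truth).sum()   # vessel voxels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    return precision, recall, dice


# Toy over-segmentation: 9 of 10 true voxels found, plus 15 false positives.
truth = np.zeros(100, dtype=bool)
truth[:10] = True
pred = np.zeros(100, dtype=bool)
pred[1:25] = True
precision, recall, dice = segmentation_scores(pred, truth)
```

      In this toy case recall is 0.9 while precision is only 0.375, the same qualitative pattern as reported for Ilastik above.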

      - Figure 4: The 3D panels are not much easier to read in the revised version. As suggested by other reviewers, 2D sections indicating the differences and errors would be much more helpful for judging the pipeline's quality.

      As discussed above, 2D sections are now available in a supplementary figure.

      - Figure 3: What would be the Dice score (and other measures) between two ground truths produced by two human annotators (assisted e.g. by Ilastik)?

      Two additional human raters annotated images. We observed an ICC of 0.73 across a total of three raters on the three images.

      - Figure 5: The authors only provide the absolute value of SU for the sigma noise levels. This only has some meaning when compared to the mean or median SU of the images. In the text the maximal intensity of 1023 SU is mentioned, but what are those values in images with weaker / smaller vessels (as provided in the constriction examples in the revision)?

      I am unclear why this validation figure should be part of the main manuscript while generalization performance is left out.

      The manuscript has been updated with the mean SNR value of 5.05 ± 0.15 to provide context for the quality of our images.

      Bibliography

      (1) J. R. Bumgarner and R. J. Nelson, “Open-source analysis and visualization of segmented vasculature datasets with VesselVio,” Cell Rep. Methods, vol. 2, no. 4, Apr. 2022, doi: 10.1016/j.crmeth.2022.100189.

      (2) G. Tetteh et al., “DeepVesselNet: Vessel Segmentation, Centerline Prediction, and Bifurcation Detection in 3-D Angiographic Volumes,” Front. Neurosci., vol. 14, Dec. 2020, doi: 10.3389/fnins.2020.592352.

      (3) N. Holroyd, Z. Li, C. Walsh, E. Brown, R. Shipley, and S. Walker-Samuel, “tUbe net: a generalisable deep learning tool for 3D vessel segmentation,” Jul. 24, 2023, bioRxiv. doi: 10.1101/2023.07.24.550334.

      (4) W. Tahir et al., “Anatomical Modeling of Brain Vasculature in Two-Photon Microscopy by Generalizable Deep Learning,” BME Front., vol. 2020, p. 8620932, Dec. 2020, doi: 10.34133/2020/8620932.

      (5) R. Damseh, P. Delafontaine-Martel, P. Pouliot, F. Cheriet, and F. Lesage, “Laplacian Flow Dynamics on Geometric Graphs for Anatomical Modeling of Cerebrovascular Networks,” arXiv:1912.10003, Dec. 2019. Accessed: Dec. 09, 2020. [Online]. Available: http://arxiv.org/abs/1912.10003

      (6) T. Jerman, F. Pernuš, B. Likar, and Ž. Špiclin, “Enhancement of Vascular Structures in 3D and 2D Angiographic Images,” IEEE Trans. Med. Imaging, vol. 35, no. 9, pp. 2107–2118, Sep. 2016, doi: 10.1109/TMI.2016.2550102.

      (7) T. B. Smith and N. Smith, “Agreement and reliability statistics for shapes,” PLOS ONE, vol. 13, no. 8, p. e0202087, Aug. 2018, doi: 10.1371/journal.pone.0202087.

      (8) J. R. Mester et al., “In vivo neurovascular response to focused photoactivation of Channelrhodopsin-2,” NeuroImage, vol. 192, pp. 135–144, May 2019, doi: 10.1016/j.neuroimage.2019.01.036.

      (9) T. C. Lee, R. L. Kashyap, and C. N. Chu, “Building Skeleton Models via 3-D Medial Surface Axis Thinning Algorithms,” CVGIP Graph. Models Image Process., vol. 56, no. 6, pp. 462–478, Nov. 1994, doi: 10.1006/cgip.1994.1042.

      (10) M. Y. Rennie et al., “Vessel tortuousity and reduced vascularization in the fetoplacental arterial tree after maternal exposure to polycyclic aromatic hydrocarbons,” Am. J. Physiol.-Heart Circ. Physiol., vol. 300, no. 2, pp. H675–H684, Feb. 2011, doi: 10.1152/ajpheart.00510.2010.

      (11) J. Steinman, M. M. Koletar, B. Stefanovic, and J. G. Sled, “3D morphological analysis of the mouse cerebral vasculature: Comparison of in vivo and ex vivo methods,” PLOS ONE, vol. 12, no. 10, p. e0186676, Oct. 2017, doi: 10.1371/journal.pone.0186676.

    1. eLife Assessment

      This important study enhances our understanding of ephaptic interactions by utilizing earthworm recordings to refine a general model and use it to predict ephaptic influences across various synaptic configurations. The integration of experimental evidence, a robust mathematical framework and computer simulations convincingly demonstrate the effects of action potential propagation and collision properties on nearby membranes. The study will interest both computational neuroscientists and physiologists.

    2. Reviewer #2 (Public review):

      In this study, the authors measured extracellular electrical features of colliding APs travelling in different directions down an isolated earthworm axon. They then used these features to build a model of the potential ephaptic effects of AP annihilation, i.e. the electrical signals produced by colliding/annihilating APs that may influence neighbouring tissue. The model was then applied to some different hypothetical scenarios involving synaptic connections. In a revised version of the manuscript, it was also applied, with success, to published experimental data on the cerebellar basket cell-to-Purkinje cell pinceau connection. The conclusion is that an annihilating AP at a presynaptic terminal can ephaptically influence the voltage of a postsynaptic cell (the 'electrical coupling between neurons' of the title), and that the nature of this influence depends on the physical configuration of the synapse.

      As an experimental neuroscientist who has never used computational approaches, I am unable to comment on the rigour of the analytical approaches that form the bulk of this paper. The experimental approaches appear very well carried out, and the data showing equal conduction velocity of anti- and orthodromically propagating APs in every preparation are convincing.

      The conclusions drawn from the synaptic modelling are considerably strengthened by the data in Figure 5. Here, the authors' model - including AP annihilation at a synaptic terminal - is used to predict the amplitude and direction of experimentally observed effects at the cerebellar basket cell-to-Purkinje cell synapse (Blot & Barbour 2014). One particular form of the model (RTM with tau=0.5ms and realistic non-excitability of the terminal) matches the experimental data extremely well. The authors also include a convincing demonstration (Panel A) that a propagating but not annihilating AP has almost no effect on a neighbouring neuron's activity. Given that the authors' model of ephaptic effects can quantitatively explain key features of experimental data pertaining to synaptic function, the implications for the relevance of ephaptic coupling at different synaptic contacts may be widespread and important.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1:

      The authors explain that an action potential that reaches an axon terminal emits a small electrical field as it "annihilates". This happens even at chemical synapses, where there is no gap junction. The generated electrical field is simulated to show that it can affect a nearby, disconnected target membrane by tens of microvolts for tenths of a microsecond. Longer effects are simulated for target locations a few microns away.

      To simulate action potentials (APs), the paper does not use the standard Hodgkin-Huxley formalism because it fails to explain AP collision. Instead, it uses the Tasaki and Matsumoto (TM) model, which is simplified to model APs with only three parameters and as a membrane transition between two states, resting and excited. The authors expand the strictly binary, discrete TM method to a Relaxing Tasaki Model (RTM) that models the relaxation of the membrane potential after an AP. They find that the membrane leak can be neglected in determining AP propagation and that the capacitive currents dominate the process.

      The strength of the work is that the authors identified an important interaction between neurons that is neglected by the standard models. A weakness of the proposed approach is the assumptions that it makes. For instance, the external medium is modeled as a homogeneous conductive medium, an assumption that may need further exploration to properly account for biological processes. To the authors’ credit, the external medium can vary widely and could be left out of the general model, to be modeled only in specific instances.

      The authors provide convincing evidence by performing experiments to record action potential propagation and collision properties and then developing a theoretical framework to simulate the effect of their annihilation on nearby membranes. They provide both experimental evidence and rigorous mathematical and computer simulation findings to support their claims. The work has the potential to explain significant electrical interaction between nerve centers that are connected via a large number of parallel fibers.

      Comments on revisions:

      The authors responded to all of my previous concerns and significantly improved the manuscript.

      We thank the reviewer for his comments and are pleased that we were able to adequately address all of his previous concerns. As a small comment to the remark of the reviewer “potential of explaining ... interaction ... via a large number of parallel fibers” we would like to add: The ephaptic coupling is prominent when APs annihilate at axon terminals, as we illustrate in Figure 4 and 5. Across parallel fibers, the impact of propagating APs is much lower but still may result in synchronization of APs.

      Reviewer 2:

      In this study, the authors measured extracellular electrical features of colliding APs travelling in different directions down an isolated earthworm axon. They then used these features to build a model of the potential ephaptic effects of AP annihilation, i.e. the electrical signals produced by colliding/annihilating APs that may influence neighbouring tissue. The model was then applied to some different hypothetical scenarios involving synaptic connections. In a revised version of the manuscript, it was also applied, with success, to published experimental data on the cerebellar basket cell-to-Purkinje cell pinceau connection. The conclusion is that an annihilating AP at a presynaptic terminal can ephaptically influence the voltage of a postsynaptic cell (this is, presumably, the ’electrical coupling between neurons’ of the title), and that the nature of this influence depends on the physical configuration of the synapse.

      As an experimental neuroscientist who has never used computational approaches, I am unable to comment on the rigour of the analytical approaches that form the bulk of this paper. The experimental approaches appear very well carried out, and the data showing equal conduction velocity of anti- and orthodromically propagating APs in every preparation is now convincing.

      The conclusions drawn from the synaptic modelling have been considerably strengthened by the new Figure 5. Here, the authors’ model - including AP annihilation at a synaptic terminal - is used to predict the amplitude and direction of experimentally observed effects at the cerebellar basket cell-to-Purkinje cell synapse (Blot & Barbour 2014). One particular form of the model (RTM with tau=0.5ms and realistic non-excitability of the terminal) matches the experimental data extremely well. This is a much more convincing demonstration that the authors’ model of ephaptic effects can quantitatively explain key features of experimental data pertaining to synaptic function. As such, the implications for the relevance of ephaptic coupling at different synaptic contacts may be widespread and important.

      However, it appears that all of the models in the new Fig5 involve annihilating APs, yet only one fits the data closely. A key question, which should be addressed if at all possible, is what happens to the predictive power of the best-fitting model in Fig5 if the annihilation, and only the annihilation, is removed? In other words, can the authors show that it is specifically the ephaptic effects of AP annihilation, rather than other ephaptic effects of, say AP waveform/amplitude/propagation, that explain the synaptic effects measured in Blot & Barbour (2014)? This would appear to be a necessary demonstration to fully support the claims of the title.

      Reviewer 2 (Recommendations for the authors):

      Can you clarify whether all models shown in Fig5 involve an annihilating AP? Is it possible to plot the predicted effects of the most successful model (RTM 0.5ms in B) with *only* the annihilation selectively removed?

      We are grateful for the reviewer’s comments and the specific suggestion for improvement (’...can the authors show that it is specifically the ephaptic effects of AP annihilation, rather than other ephaptic effects...’). For illustrating the importance of annihilation, we added the results of our calculation when no annihilation occurs, i.e. for propagating APs in the source neuron (Figure 5A) and we modified the geometry of the source neuron in Figure 5B such that only the annihilation takes place. Together with the source neuron with similar properties to the Basket cell (Figure 5C), we now show the effect of annihilation and the effect of Basket cell specific geometry and physiology. We added and edited in the main text the following 4 sentences:

      ll 271: In our two models (TM and RTM), the modulation of not terminating but propagating APs along the source axon on the AP rate of the target cell is minute (Figure 5A). Note that this geometry does not correspond to the Purkinje cell-Basket cell connectivity. For annihilating APs at the axon terminal, with excitable segments up to the very end, our models reveal a moderate modulation, and only about half of what was reported for the Purkinje cell by Blot and Barbour (2014). This illustrates the importance of AP annihilation for ephaptic coupling (Figure 5B). We added and edited the figure legend:

      Figure 5. ... (A) excluding the annihilation of an AP at the source neuron, i.e. a propagating AP, cause only minute modulation of the predicted AP rate in the target neuron. Note that this example does not represent the Basket cell terminal with annihilating APs. (B) annihilation of an AP at the terminal of the source neuron, with all segments being excitable in our calculation, cause moderate modulation. (C) source neuron with similar properties to the Basket cell, i.e. a bouton and last segments non-excitable (corresponding to 15 µm with no switch from resting state to excited state), cause inhibition and rebound that is very similar as described by Blot and Barbour (2014).

      In the discussion, we extended one sentence to refer to Figure 5:

      ll 346: This may cause synchronization of APs and our proposed model also can be used to study the observed phenomena of synchronization due to ephaptic coupling, even in the case of zero discharge (see Figure 4A, and local impact on the target, integrated on timescales >1 ms in Figure 5).

    1. eLife Assessment

      This study presents important findings on the early development of cardiac and respiratory interoceptive sensitivity based on an investigation of infants aged 3, 9 and 18 months and on extensive statistical analyses. The evidence supporting the conclusions is convincing, although the research faced technical challenges that limited the interpretation of some of the findings. This study will be of significant interest to developmental psychologists and neuroscientists working on interoception and its influence on socio-cognitive development.

    2. Reviewer #1 (Public review):

      Summary:

      The authors of this study investigated the development of interoceptive sensitivity in the context of cardiac and respiratory interoception in 3-, 9-, and 18-month-old infants using a combination of both cross-sectional and longitudinal designs. They utilised the cardiac interoception paradigm developed by Maister et al (2017) and also developed a new paradigm to investigate respiratory interoception in infants. The main findings of this research are that 9-month-old infants displayed a preference for stimuli presented synchronously with their own heartbeat and respiration. The authors found less reliable effects in the 18-month-old group, and this was especially true for the respiratory interoceptive data. The authors replicated a visual preference for synchrony over asynchrony for the cardiac domain in 3-month-old infants, while they found inconclusive evidence regarding the respiratory domain. Considering the developmental nature of the study, the authors also investigated the presence of developmental trajectories and associations between the two interoceptive domains. They found evidence for a relationship between cardiac and respiratory interoceptive sensitivity at 18 months only and preliminary evidence for an increase in respiratory interoception between 9 and 18 months.

      Strengths:

      The conclusions of this paper are mostly well supported by data, and the data analysis procedures are rigorous and well-justified. The main strengths of the paper are:

      - A first attempt to explore the association between two different interoceptive domains. How different organ-specific axes of interoception relate to each other is still an open question, and exploring this through a developmental lens can help shed light on possible relationships. The authors are to be commended for developing a novel interoceptive task aimed at assessing respiratory interoceptive sensitivity in infants and toddlers, and for trying to assess the relationship between cardiac and respiratory interoception across developmental time.

      - A thorough justification of the developmental ages selected for the study. The authors provide a rationale behind their choice to examine interoceptive sensitivity at 3, 9, and 18 months of age. These are well justified based on the literature pertaining to self- and social development. Sometimes, I wondered whether explaining the link between these self and social processes and interoception would have been beneficial, as a reader not familiar with the topics may miss the point.

      - An explanation of the direction of looking behaviour using latent curve analysis. I found this additional analysis extremely helpful in providing a better understanding of the data based on previous research and analytical choices. As the authors explain in the manuscript, it is often difficult to interpret the direction of infant looking behaviour, as novelty and familiarity preferences can also be driven by hidden confounders (e.g. task difficulty). The authors provide compelling evidence that analytical choices can explain some of these effects. Beyond the field of interoception, these findings will be relevant to developmental psychologists and will inform future studies using looking time as a measure of infants' ability to discriminate among stimuli.

      - The use of simulation analysis to account for small sample size. The authors acknowledge that some of the effects reported in their study could be explained by a small sample size (i.e. the 3-month-olds and 18-month-olds data). Using a simulation approach, the authors try to overcome some of these limitations and provide convincing evidence of interoceptive abilities in infancy and toddlerhood (but see also my next point).

      Comments on revision:

      The authors have clearly addressed the comments on the previous version of this manuscript. I have no further comments.

    3. Reviewer #2 (Public review):

      Summary:

      This study by Tünte et al. investigated the development of interoceptive sensitivity during the first year of life, focusing specifically on cardiac and respiratory sensitivity in infants aged 3, 9, and 18 months. The research employed a previously developed experimental paradigm for the cardiac domain and adapted it for a novel paradigm in the respiratory domain. This approach assessed infants' cardiac and respiratory sensitivity based on their preferential looking behavior toward visuo-auditory stimuli displayed on a monitor, which moved either in sync or out of sync with the infants' own heartbeats or breathing. The results in the cardiac domain showed that infants across all age groups preferred stimuli moving synchronously rather than asynchronously with their heartbeat, suggesting the presence of cardiac sensitivity as early as 3 months of age. However, it is noteworthy that this preference direction contradicts a previous study, which found that 5-month-old infants looked longer at stimuli moving asynchronously with their heartbeat (Maister et al., 2017). In the respiratory domain, only the group of 9-month-old infants showed a preference for stimuli presented synchronously with their breathing. The authors conducted various statistical analyses to thoroughly examine the obtained data, providing deeper insights valuable for future research in this field.

      Strengths:

      Few studies have explored the early development of interoception, making the replication of the original study by Maister et al. (2017) particularly valuable. Beyond replication, this study expands the investigation into the respiratory domain, significantly enhancing our understanding of interoceptive development. The provision of longitudinal and cross-sectional data from infants at 3, 9, and 18 months of age is instrumental in understanding their developmental trajectory.

      Weaknesses:

      Due to a technical error, this study failed to counterbalance the conditions of the first trial in both the iBEAT and iBREATH tests. Although the authors addressed this issue as much as possible by employing alternative analyses, it should be noted that this error may have critically influenced the results and, thus, the conclusions.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      We sincerely appreciate the time and effort you and the reviewers have invested in evaluating our work.

      We are grateful for the reviewers' constructive criticism. Building on their feedback, we have made additions to the reviewed preprint. Specifically, we have added information to the supplementary materials to give additional context on the impact of the fixed experimental design on infants’ looking behavior. Further, we have adapted the text throughout the manuscript to incorporate a thorough discussion of the impact of the experimental design.

      We believe that these revisions and the inclusion of supplementary analyses provide a clearer understanding of our findings.

    1. eLife Assessment

      In flies defective for axonal transport of mitochondria, the authors report the upregulation of one subunit, the beta subunit, of the heterotrimeric eIF2 complex via mass spectroscopy proteome analysis. Neuronal overexpression of eIF2β phenocopied aspects of neuronal dysfunction observed when axonal transport of mitochondria was compromised. Conversely, lowering eIF2β expression suppressed aspects of neuronal dysfunction. While these are intriguing, potentially useful observations, several technical weaknesses limit the interpretation and mean the evidence supporting the current claims is incomplete.

    2. Reviewer #1 (Public Review):

      The authors observed a decline in autophagy and proteasome activity in the context of Milton knockdown. Through proteomic analysis, they identified an increase in the protein levels of eIF2β, subsequently pinpointing a novel interaction within eIF subunits where eIF2β contributes to the reduction of eIF2α phosphorylation levels. Furthermore, they demonstrated that overexpression of eIF2β suppresses autophagy and leads to diminished motor function. It was also shown that in a heterozygous mutant background of eIF2β, Milton knockdown could be rescued. This work represents a novel and significant contribution to the field, revealing for the first time that the loss of mitochondria from axons can lead to impaired autophagy function via eIF2β, potentially influencing the acceleration of aging.

    3. Reviewer #2 (Public Review):

      In the manuscript, the authors aimed to elucidate the molecular mechanism that explains neurodegeneration caused by the depletion of axonal mitochondria. In Drosophila, starting with siRNA depletion of Milton and Miro, the authors attempted to demonstrate that the depletion of axonal mitochondria induces the defect in autophagy. From proteome analyses, the authors hypothesized that autophagy is impacted by the abundance of eIF2β and the phosphorylation of eIF2α. The authors followed up the proteome analyses by testing the effects of eIF2β overexpression and depletion on autophagy. With the results from those experiments, the authors proposed a novel role of eIF2β in proteostasis that underlies neurodegeneration derived from the depletion of axonal mitochondria.

      The manuscript has several weaknesses. The reader should take extra care while reading this manuscript and when acknowledging the findings and the model in this manuscript.

      The defect in autophagy by the depletion of axonal mitochondria is one of the main claims in the paper. The authors should work more on describing their results of LC3-II/LC3-I ratio, as there are multiple ways to interpret the LC3 blotting for the autophagy assessment. Lysosomal defects result in the accumulation of LC3-II thus the LC3-II/LC3-I ratio gets higher. On the other hand, the defect in the early steps of autophagosome formation could result in a lower LC3-II/LC3-I ratio. From the results of the actual blotting, the LC3-I abundance is the source of the major difference for all conditions (Milton RNAi and eIF2β overexpression and depletion).
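
      The ambiguity the reviewer describes can be illustrated with a short arithmetic sketch. All band intensities below are hypothetical values in arbitrary units, chosen only to show that the same LC3-II/LC3-I ratio can arise from a change in either band; they are not taken from the manuscript.

```python
def lc3_ratio(lc3_ii, lc3_i):
    """LC3-II/LC3-I ratio from two band intensities (arbitrary units)."""
    return lc3_ii / lc3_i


# Hypothetical intensities, for illustration only.
control = lc3_ratio(lc3_ii=1.0, lc3_i=1.0)

# Scenario A: LC3-II is halved while LC3-I is unchanged.
less_lc3_ii = lc3_ratio(lc3_ii=0.5, lc3_i=1.0)

# Scenario B: LC3-II is unchanged while LC3-I doubles.
more_lc3_i = lc3_ratio(lc3_ii=1.0, lc3_i=2.0)
```

      Both scenarios give a ratio of 0.5 against a control of 1.0, so the ratio alone cannot show which band changed; the individual band abundances need to be reported and interpreted alongside it.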

      Another main point of the paper is that the up-regulation of eIF2β caused by depleting axonal mitochondria leads to a proteostasis crisis. This claim is based on the findings from the proteome analyses. The authors should have presented and explained their proteomic data much more thoroughly. As shown in the experimental scheme in Figure 4A, the authors performed two proteome analyses: one from the 7-day-old sample and the other from the 21-day-old sample. The manuscript only shows a plot of the result from the 7-day-old sample, but not that from the 21-day-old sample. For the 21-day-old sample, the authors only provided data in the supplemental table, in which the abundance ratio of eIF2β is 0.753, meaning eIF2β is depleted in the 21-day-old sample. The authors should have explained the impact of the eIF2β depletion in the 21-day-old sample, so that the reader could fully understand the authors' interpretation of the role of eIF2β in proteostasis.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors observed a decline in autophagy and proteasome activity in the context of Milton knockdown. Through proteomic analysis, they identified an increase in the protein levels of eIF2β, subsequently pinpointing a novel interaction within eIF subunits where eIF2β contributes to the reduction of eIF2α phosphorylation levels. Furthermore, they demonstrated that overexpression of eIF2β suppresses autophagy and leads to diminished motor function. It was also shown that in a heterozygous mutant background of eIF2β, Milton knockdown could be rescued. This work represents a novel and significant contribution to the field, revealing for the first time that the loss of mitochondria from axons can lead to impaired autophagy function via eIF2β, potentially influencing the acceleration of aging. To further support the authors' claims, several improvements are necessary, particularly in the methods of quantification and the points that should be demonstrated quantitatively. It is crucial to investigate the correlation between aging and the proteins eIF2β and eIF2α.

      Thank you so much for your review and comments. We included analyses of protein levels of eIF2α, eIF2β, and eIF2γ at 7 days and 21 days (Figure 4D). The manuscript was revised as below;

      Lines 242-245 ‘As for the other subunits of eIF2 complex, proteome analysis did not detect a significant difference in the protein levels of eIF2α and eIF2γ between milton knockdown and control flies at 7 and 21 days (Figure 4D).’

      Reviewer #2 (Public Review):

      In the manuscript, the authors aimed to elucidate the molecular mechanism that explains neurodegeneration caused by the depletion of axonal mitochondria. In Drosophila, starting with siRNA depletion of Milton and Miro, the authors attempted to demonstrate that the depletion of axonal mitochondria induces the defect in autophagy. From proteome analyses, the authors hypothesized that autophagy is impacted by the abundance of eIF2β and the phosphorylation of eIF2α. The authors followed up the proteome analyses by testing the effects of eIF2β overexpression and depletion on autophagy. With the results from those experiments, the authors proposed a novel role of eIF2β in proteostasis that underlies neurodegeneration derived from the depletion of axonal mitochondria.

      The manuscript has several weaknesses. The reader should take extra care while reading this manuscript and when acknowledging the findings and the model in this manuscript.

      The defect in autophagy by the depletion of axonal mitochondria is one of the main claims in the paper. The authors should work more on describing their results of LC3-II/LC3-I ratio, as there are multiple ways to interpret the LC3 blotting for the autophagy assessment. Lysosomal defects result in the accumulation of LC3-II thus the LC3-II/LC3-I ratio gets higher. On the other hand, the defect in the early steps of autophagosome formation could result in a lower LC3-II/LC3-I ratio. From the results of the actual blotting, the LC3-I abundance is the source of the major difference for all conditions (Milton RNAi and eIF2β overexpression and depletion). In the text, the authors simply state the observation of their LC3 blotting. The manuscript lacks an explanation of how to evaluate the LC3-II/LC3-I ratio. Also, the manuscript lacks an elaboration on what the results of the LC3 blotting indicate about the state of autophagy by the depletion of axonal mitochondria.

      Thank you for pointing it out, and we apologize for the insufficient description of the results. We included quantitation of the levels of LC3-I and LC3-II in Figures 2A, 2D, 3D, 6B and 7B. As the reviewer pointed out, changes in the LC3-II/LC3-I ratio do not necessarily indicate autophagy defects. However, p62 also accumulated (Figure 2B, 2E, 3E, 6C, 7C in the original manuscript), and these results collectively suggest that autophagy is lowered. We revised the manuscript to include this discussion as below:

      Lines 174-186 ‘During autophagy progression, LC3 is conjugated with phosphatidylethanolamine to form LC3-II, which localizes to isolation membranes and autophagosomes. LC3-I accumulation occurs when autophagosome formation is impaired, and LC3-II accumulation is associated with lysosomal defects(31,32). p62 is an autophagy substrate, and its accumulation suggests autophagic defects(31,32). We found that milton knockdown increased LC3-I, and the LC3-II/LC3-I ratio was lower in milton knockdown flies than in control flies at 14-day-old (Figure 2A). We also analyzed p62 levels in head lysates sequentially extracted using detergents with different stringencies (1% Triton X-100 and 2% SDS). Western blotting revealed that p62 levels were increased in the brains of 14-day-old of milton knockdown flies (Figure 2B). The increase in the p62 level was significant in the Triton X-100-soluble fraction but not in the SDS-soluble fraction (Figure 2B), suggesting that depletion of axonal mitochondria impairs the degradation of less-aggregated proteins.’

      Line 189-190 : ‘At 30 day-old, LC3-I was still higher, and the LC3-II/LC3-I ratio was lower, in milton knockdown compared to the control (Figure 2D).’

      Line 199-201: ‘However, in contrast with milton knockdown, Pfk knockdown did not affect the levels of LC3-I, LC3-II or the LC3-II/LC3-I ratio (Figure 3D).’

      Line 275-281: ‘Neuronal overexpression of eIF2β increased LC3-II, while the LC3-II/LC3-I ratio was not significantly different (Figure 6A and B). Overexpression of eIF2β significantly increased the p62 level in the Triton X-100-soluble fraction (Figure 6C, 4-fold vs. control, p < 0.005 (1% Triton X-100)) but not in the SDS-soluble fraction (Figure 6C, 2-fold vs. control, p = 0.062 (2% SDS)), as observed in brains of milton knockdown flies (Figure 2B). These data suggest that neuronal overexpression of eIF2β accumulates autophagic substrates.’

      Line 307-315: ‘Neuronal knockdown of milton causes accumulation of autophagic substrate p62 in the Triton X-100-soluble fraction (Figure 2B), and we tested if lowering eIF2β ameliorates it. We found that eIF2β heterozygosity caused a mild increase in LC3-I levels and decreases in LC3-II levels, resulting in a significantly lower LC3-II/LC3-I ratio in milton knockdown flies (Figure 7B). eIF2β heterozygosity decreased the p62 level in the Triton X-100-soluble fraction in the brains of milton knockdown flies (Figure 7C). The p62 level in the SDS-soluble fraction, which is not sensitive to milton knockdown (Figure 2B), was not affected (Figure 7C). These results suggest that suppression of eIF2β ameliorates the impairment of autophagy caused by milton knockdown.’
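
      The reviewer's point that the LC3-II/LC3-I ratio can shift for more than one reason can be illustrated with a small calculation. The sketch below is only an illustration: the function name `lc3_summary` and all band intensities are hypothetical, not values from the manuscript. It shows a ratio that halves purely because LC3-I doubles while LC3-II stays constant, which is why reporting both bands separately (as the authors now do) is more informative than the ratio alone.

```python
def lc3_summary(lc3_i, lc3_ii, loading):
    """Normalize LC3-I and LC3-II band intensities to a loading control
    and report the LC3-II/LC3-I ratio alongside them."""
    if lc3_i <= 0 or loading <= 0:
        raise ValueError("LC3-I and loading-control intensities must be positive")
    return {
        "LC3-I": lc3_i / loading,
        "LC3-II": lc3_ii / loading,
        "ratio": lc3_ii / lc3_i,
    }


# Hypothetical densitometry readings (arbitrary units), for illustration only.
control = lc3_summary(lc3_i=100.0, lc3_ii=50.0, loading=200.0)
knockdown = lc3_summary(lc3_i=200.0, lc3_ii=50.0, loading=200.0)

# The ratio drops from 0.5 to 0.25 although normalized LC3-II is unchanged.
print(control["ratio"], knockdown["ratio"])
print(control["LC3-II"] == knockdown["LC3-II"])
```

      A lower ratio in this toy case reflects LC3-I accumulation only, so it needs a second readout, such as p62 levels, before it can be interpreted as reduced autophagy.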

      Another main point of the paper is that the up-regulation of eIF2β by depleting the axonal mitochondria leads to the proteostasis crisis. This claim is formed by the findings from the proteome analyses. The authors should have presented their proteomic data with a much more thorough presentation and explanation. As in the experiment scheme shown in Figure 4A, the authors did two proteome analyses: one from the 7-day-old sample and the other from the 21-day-old sample. The manuscript only shows a plot of the result from the 7-day-old sample, but not that of the result from the 21-day-old sample. For the 21-day-old sample, the authors only provided data in the supplemental table, in which the abundance ratio of eIF2β from the 21-day-old sample is 0.753, meaning eIF2β is depleted in the 21-day-old sample. The authors should have explained the impact of the eIF2β depletion in the 21-day-old sample, so the reader could fully understand the authors' interpretation of the role of eIF2β on proteostasis.

      Thank you for pointing it out. We included plots of the results of 21-day-old proteome as a part of the main figure (Figure 4C). As the reviewer pointed out, eIF2β protein levels are reduced at the 21-day-old. Since a reduction in the eIF2β ameliorated milton knockdown-induced locomotor defects in aged flies (Figure 7D), the reduction in eIF2β observed in the 21-day-old milton knockdown flies is not likely to negatively contribute to milton knockdown-induced defects. We included this discussion in the manuscript as below:

      Lines 337-341:‘eIF2β protein levels are reduced at the 21-day-old; however, since a reduction in the eIF2β ameliorated milton knockdown-induced locomotor defects in aged flies (Figure 7), the reduction in eIF2β observed in the 21-day-old is not likely to negatively contribute to milton knockdown-induced defects.’

      The manuscript consists of several weaknesses in its data and explanation regarding translation.

      (1) The authors are likely misunderstanding the effect of phosphorylation of eIF2α on translation. The P-eIF2α is inhibitory for translation initiation. However, the authors seem to be mistaken that the down-regulation of P-eIF2α inhibits translation.

      We are sorry for our insufficient explanation in the previous version. As the reviewer pointed out, it is well known that the phosphorylated form of eIF2α inhibits translation initiation. Neuronal knockdown of milton caused a reduction in p-eIF2α (Figure 4J and K), and it also lowered translation (Figure 5); the relationship between these two events is currently unclear. We do not think that a reduction in the p-eIF2α suppressed translation; rather, we propose that the unbalance of expression levels of the components of eIF2 complexes negatively affects translation. We revised discussion sections to describe our interpretation more in detail as below:

      Line 368-378: ‘eIF2β is a component of eIF2, which meditates translational regulation and ISR initiation. When ISR is activated, phosphorylated eIF2α suppresses global translation and induces translation of ATF4, which mediates transcription of autophagy-related genes(39,40). Since ISR can positively regulate autophagy, we suspected that suppression of ISR underlies a reduction in autophagic protein degradation. We found neuronal knockdown of milton reduced phosphorylated eIF2α, suggesting that ISR is reduced (Figure 4). However, we also found that global translation was reduced (Figure 5). It may be possible that increased levels of eIF2β disrupt the eIF2 complex or alter its functions. The stoichiometric mismatch caused by an imbalance of eIF2 components may inhibit ISR induction. Supporting this model, we found that eIF2β upregulation reduced the levels of p-eIF2α (Figure 6).’

      We have revised the graphical abstract and removed the eIF2 complex since its role in the loss of proteostasis caused by milton knockdown has not been elucidated yet.

      (2) The result of polysome profiling in Figure 4H is implausible. By 10%-25% sucrose density gradient, polysomes are not expected to be observed. The authors should have used a gradient with much denser sucrose, such as 10-50%.

      Thank you for pointing it out. The stated 10%-25% was a mistake for 10-50%, and we apologize for the oversight. It was corrected (Figure 5).

      (3) Also on the polysome profiling, as in the method section, the authors seemed to fractionate ultra-centrifuged samples from top to bottom and then measured A260 by a plate reader. In that case, the authors should have provided a line plot with individual data points, not the smoothly connected ones in the manuscript.

      Thank you for pointing it out. We revised the graph (Figure 5).

      (4) For both the results from polysome profiling and puromycin incorporation (Figure 4H and I), the difference between control siRNA and Milton siRNA are subtle, if not nonexistent. This might arise from the lack of spatial resolution in their experiment as the authors used head lysate for these data but the ratio of Phospho-eIF2α/eIF2α only changes in the axons, based on their results in Figure 4E-G. The authors could have attempted to capture the spatial resolution for the axonal translation to see the difference between control siRNA and Milton siRNA.

      Thank you for your comment. We agree that it would be an interesting experiment, but it will take a considerable amount of time to analyze axonal translation with spatial resolution. We will try to include such analyses in the future. For this manuscript, we revised the discussion section to include the reviewer's suggestion as below;

      Lines 351-353: ‘Further analyses to dissect the effects of milton knockdown on proteostasis and translation in the cell body and axon by experiments with spatial resolution would be needed.’

      Recommendations for the authors:

      From the Reviewing Editor:

      As the Reviewing Editor, I have read your manuscript and the associated peer reviews. I have concerns about publishing this work in its current form. I think that your manuscript cannot claim to have found a novel function of eIF2beta because of technical uncertainties and conceptual problems that should be addressed.

      Thank you so much for your review and comments. We addressed all the concerns raised by the reviewers. Point-by-point responses are listed below.

      First, your manuscript is based partly on what appears to be a mistaken understanding of the mechanistic basis of the ISR. Specifically, eIF2 is a heterotrimeric complex of alpha, beta, and gamma subunits. When eIF2a is phosphorylated, the heterotrimer adopts a new conformation. This conformation directly binds and inhibits eIF2B, the decameric GEF that exchanges the GDP bound to the gamma subunit of the eIF2 complex for GTP. Unless I misunderstood your paper, you seem to propose that decreasing levels of phospho-eIF2a will inhibit translation, but this is backward from what we know about the ISR.

      Thank you for your insightful comment, and we are sorry for the confusion. We did not mean to propose that decreasing levels of phospho-eIF2a inhibits translation. We apologize for our insufficient explanation, which might have caused a misunderstanding (Lines 312-318 in the original version). We agree with the reviewer that ‘mismatch due to elevated eIF2-beta could change the behavior of the ISR’. We revised the text in the result section as follows:

      Lines 259-264 (in the Result section) ‘Phosphorylation of eIF2α induces conformational changes in the eIF2 complex and inhibits global translation(36). To analyze the effects of milton knockdown on translation, we performed polysome gradient centrifugation to examine the level of ribosome binding to mRNA. Since p-eIF2α was downregulated, we hypothesized that milton knockdown would enhance translation. However, unexpectedly, we found that milton knockdown significantly reduced the level of mRNAs associated with polysomes (Figure 5A and B).’

      Lines 368-378 (in the Discussion section): ‘eIF2β is a component of eIF2, which meditates translational regulation and ISR initiation. When ISR is activated, phosphorylated eIF2α suppresses global translation and induces translation of ATF4, which mediates transcription of autophagy-related genes(39,40). Since ISR can positively regulate autophagy, we suspected that suppression of ISR underlies a reduction in autophagic protein degradation. We found neuronal knockdown of milton reduced phosphorylated eIF2α, suggesting that ISR is reduced (Figure 4). However, we also found that global translation was reduced (Figure 5). It may be possible that increased levels of eIF2β disrupt the eIF2 complex or alter its functions. The stoichiometric mismatch caused by an imbalance of eIF2 components may inhibit ISR induction. Supporting this model, we found that eIF2β upregulation reduced the levels of p-eIF2α (Figure 6).’

      It may be possible that a stoichiometric mismatch due to elevated eIF2-beta could change the behavior of the ISR, but your paper doesn't adequately address the expression levels of all three eIF2 subunits: alpha, beta, and gamma. The proteomic data shown in Fig 4B is unconvincing on its own because the changes in the beta subunit are subtle. The Western blot in Figure 4C suggests that the KD changes the mass or mobility of the beta subunit, and most importantly, there are no Western blots measuring the levels of eIF2a, eIF2a-phospho, or eIF2-gamma.

      We appreciate the reviewer’s comment and agree that the stoichiometric mismatch due to elevated eIF2β may interfere with ISR. We found overexpression of eIF2β lowered p-eIF2 alpha (Figure S2 in V1), which supports this model. We included this data in the main figure in the revised manuscript (Figure 6D) and revised the text as below:

      Lines 279-281: ‘Since milton knockdown reduced the p-eIF2α level (Figure 4K), we asked whether an increase in eIF2β affects p-eIF2α. Neuronal overexpression of eIF2β did not affect the eIF2α level but significantly decreased the p-eIF2α level (Figure 6D, E).’

      Expression data of eIF2α and eIF2γ have been extracted from the proteome analyses and included as a table (Figure 4D). Western blots of phospho-eIF2a (Figure S1 in V1) are now included in the main figure (Figure 4G). The result section was revised as below:

      Lines 242-245: ‘As for the other subunits of eIF2 complex, proteome analysis did not detect a significant difference in the protein levels of eIF2α and eIF2γ between milton knockdown and control flies at 7 and 21 days (Figure 4D).’

      Reviewer #1 (Recommendations For The Authors):

      L125-128: In this section, while the efficiency of Milton knockdown is referenced from a previous publication, it is necessary to also mention that the Miro knockdown has been similarly reported in the literature. Additionally, the Methods section lacks details on the Miro RNAi line used, and Table 2 does not include the genotype for Miro RNAi. This information should be included for clarity and completeness.

      Thank you for pointing it out. Knockdown efficiency with this strain has been reported (Iijima-Ando et al., PLoS Genet, 2012). We revised the text to include citation and knockdown efficiency as follows:

      Lines 139-147: ‘There was no significant increase in ubiquitinated proteins in milton knockdown flies at 1-day old, suggesting that the accumulation of ubiquitinated proteins caused by milton knockdown is age-dependent (Figure S1). We also analyzed the effect of the neuronal knockdown of Miro, a partner of milton, on the accumulation of ubiquitin-positive proteins. Since severe knockdown of Miro in neurons causes lethality, we used UAS-Miro RNAi strain with low knockdown efficiency, whose expression driven by elav-GAL4 caused 30% reduction of Miro mRNA in head extract(24). Although there was a tendency for increased ubiquitin-positive puncta in Miro knockdown brains, the difference was not significant (Figure 1B, p>0.05 between control RNAi and Miro RNAi). These data suggest that the depletion of axonal mitochondria induced by milton knockdown leads to the accumulation of ubiquitinated proteins before neurodegeneration occurs.’

      L132-L136: The current phrasing in this section suggests an increase in ubiquitinated proteins for both Milton and Miro knockdowns. However, since there is no significant difference noted for Miro, it is incorrect to state an increase in ubiquitin-positive puncta. Furthermore, combining the results of Milton knockdown to claim an increase in ubiquitinated proteins prior to neurodegeneration is misleading. At the very least, the expression here needs to be moderated to accurately reflect the findings.

      Thank you for pointing it out. We revised the text as above.

      L137-L141: Results in Figure 1 indicate that Milton knockdown leads to an increase in ubiquitinated proteins at 14 days, while Miro knockdown shows no difference from the control at either 14 or 30 days. Conversely, both the control and Miro exhibit an increase in ubiquitinated proteins with aging, but this trend does not seem to apply to Milton knockdown. This observation suggests that Milton KD may not affect the changes in protein quality control associated with aging. It implies that Milton's function might be more related to protein homeostasis in younger cells, or that changes due to aging might overshadow the effects of Milton knockdown. These interpretations should be included in the Results or Discussion sections for a more comprehensive analysis.

      Thank you for your insightful comment. We revised the text to include those points as follows:

      Lines 152-153: ‘These results suggest that depletion of axonal mitochondria may have more impact on proteostasis in young neurons than in old neurons.’

      Lines 355-362: ‘The depletion of axonal mitochondria and accumulation of abnormal proteins are both characteristics of aged brains(37,38). Our results suggest that the loss of axonal mitochondria is an event upstream of proteostasis collapse during aging. Neuronal knockdown of milton had more impact on proteostasis in young neurons than the old neurons (Figure 1). Proteome analyses also showed that age-related pathways, such as immune responses, are enhanced in young flies with milton knockdown (Table 2). The reduction in axonal transport of mitochondria may be one of the triggering events of age-related changes and accelerates the onset of aging in the brain.’

      L143 : Please remove the erroneously included quotation mark.

      Thank you for pointing it out. We corrected it.

      L145-L147:

      - While it is understood that Milton knockdown results in a reduction of mitochondria in axons, as reported previously and seemingly indicated in Figure 1E, this paper repeatedly refers to axonal depletion of mitochondria. Therefore, it would be beneficial to quantitatively assess the number of mitochondria in the axonal terminals located in the lamina via electron microscopy. Such quantification would robustly reinforce the argument that mitochondrial absence in axons is a consequence of Milton knockdown.

      Thank you for pointing it out. We included quantitation of the number of mitochondria in the synaptic terminals (Figure 1E).

      The text and figure legend were revised accordingly:

      Lines 156-157: ‘As previously reported(24), the number of mitochondria in presynaptic terminals decreased in milton knockdown (Figure 1E).’

      - The knockdown of Milton is known to reduce mitochondrial transport from an early stage, but what about swelling? By observing swelling at 1 day and 14 days, it may be possible to confirm the onset of swelling and discuss its correlation with the accumulation of ubiquitinated proteins.

      Quantitation of axonal swelling has also been included (Figure 1F).

      We appreciate the reviewer’s comments on the correlation between the accumulation of ubiquitinated proteins and axonal swelling. Axonal swelling was not observed at 3-days-old (Iijima-Ando et al., PLoS Genetics, 2012), indicating that axonal swelling is an age-dependent event. Dense materials are found in swollen axons more often than in normal axons, suggesting a positive correlation between disruption of proteostasis and axonal damage. It would be interesting to analyze the time course of events further; however, we feel it is beyond the scope of this manuscript. We revised the text as below to include this discussion:

      Lines 157-159: ‘The swelling of presynaptic terminals, characterized by the enlargement and roundness, was not reported at 3-day-old(24) but observed at this age with about 4% of total presynaptic terminals (Figure 1F, asterisks).’

      Lines 162-167: ‘Dense materials are rarely found in age-matched control neurons, indicating that milton knockdown induces abnormal protein accumulation in the presynaptic terminals (Figure 1G and H). In milton knockdown neurons, dense materials are found in swollen presynaptic terminals more often than in presynaptic terminals without swelling, suggesting a positive correlation between the disruption of proteostasis and axonal damage (Figure 1G).’

      Lines 362-365: ‘Disruption of proteostasis is expected to contribute neurodegeneration(38), and it would be interesting to analyze the sequence of protein accumulation and axonal degeneration in milton knockdown ((24,29) and Figure 1) in detail with higher time resolution.’

      L147-L151: Though Figures 1F and 1G provide qualitative representations, it is advisable to quantitatively assess whether dense materials significantly accumulate. Such quantitative analysis would be required to verify the accumulation of dense materials in the context of the study.

      Thank you for pointing it out. We included quantitation of the number of neurons with dense material (Figure 1G). We revised the manuscript as follows:

      Line 161-163: ‘Dense materials are rarely found in age-matched control neurons, indicating that milton knockdown induces abnormal protein accumulation in the presynaptic terminals (Figure 1G and H).’

      Regarding Figure 1B, C:

      - Even though the count of puncta in the whole brain appears to be fewer than 400, the magnification of the optic lobe suggests a substantial presence of puncta. Please clarify in the Methods section what constitutes a puncta and whether the quantification in the whole brain is based on a 2D or 3D analysis. Detail the methodology used for quantification.

      Thank you for your comment. We revised the method section to include more details as below:

      Lines 434-437: ‘Quantitative analysis was performed using ImageJ (National Institutes of Health) with maximum projection images derived from Z-stack images acquired with same settings. Puncta was identified with mean intensity and area using ImageJ.’

      - What about 1-day-old specimens? Does Milton knockdown already show an increase in ubiquitinated protein accumulation at this early stage? Investigating whether ubiquitin-protein accumulation is involved in aging promotion or is already prevalent during developmental stages is a necessary experiment.

      Thank you for your comment. We carried out immunostaining with an anti-ubiquitin antibody in the brains at 1-day-old. No significant difference was detected between the control and milton knockdown. This result has been included as Figure S1 in the revised manuscript. The result section was revised as below:

      Line 136-139 ‘There was no significant increase in ubiquitinated proteins in milton knockdown flies at 1-day old, suggesting that the accumulation of ubiquitinated proteins caused by milton knockdown is age-dependent (Figure S1).’

      For Figure 1E: In the Electron Microscopy section of the Methods, define how swollen axons were identified and describe the quantification methodology used.

      Thank you for your comment. Swollen axons are, unlike normal axons, round in shape and enlarged. We revised the text as below;

      Lines 157-160: ‘The swelling of presynaptic terminals, characterized by the enlargement and roundness, was not reported at 3-day-old(24) but observed at this age with about 4% of total presynaptic terminals (Figure 1F, asterisks).’

      Lines 683-684, Figure 1 legend: ‘Swollen presynaptic terminals (asterisks in (F)), characterized by the enlargement and higher circularity, were found more frequently in milton knockdown neurons.’

      L218-L219: Throughout the text, the expression 'eIF2β is "upregulated" in response to Milton knockdown' is frequently used. However, considering the presented results, it might be more accurate to interpret that under the condition of Milton knockdown, eIF2β is not undergoing degradation but rather remains stable.

      Thank you for pointing it out. We replaced ‘upregulated’ with ‘increased’ throughout the text.

      L234-L235: On what basis is the conclusion drawn that there is a reduction? Given that three experiments have been conducted, it would be possible and more convincing to quantify the results to determine if there is a significant decrease.

      Thank you for pointing it out. We quantified the AUC of polysome fraction and carried out statistical analysis. There is a significant decrease in polysome in milton knockdown, and this result has been included in Figure 5B. We revised the figure and the legend accordingly.
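
      The quantification described here, an area under the curve (AUC) for the polysome region of the A260 profile, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the trapezoidal rule, the normalization to the total profile area, the fraction at which polysomes are taken to begin (`polysome_start`), and every absorbance value are hypothetical choices made for the example.

```python
def trapezoid_auc(positions, absorbance, start, end):
    """Trapezoidal area under the A260 profile for fractions with
    start <= position <= end. Positions must be sorted in ascending order."""
    pts = [(x, y) for x, y in zip(positions, absorbance) if start <= x <= end]
    if len(pts) < 2:
        raise ValueError("need at least two fractions inside the window")
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def polysome_fraction(positions, absorbance, polysome_start):
    """Polysome AUC divided by the AUC of the whole profile."""
    total = trapezoid_auc(positions, absorbance, positions[0], positions[-1])
    poly = trapezoid_auc(positions, absorbance, polysome_start, positions[-1])
    return poly / total


# Hypothetical plate-reader A260 values for six fractions, top to bottom.
fractions = [1, 2, 3, 4, 5, 6]
control_a260 = [0.2, 0.8, 0.4, 0.3, 0.3, 0.2]
knockdown_a260 = [0.2, 0.8, 0.4, 0.2, 0.2, 0.2]

# Assumption for this example: polysomes start at fraction 4.
control_poly = polysome_fraction(fractions, control_a260, polysome_start=4)
knockdown_poly = polysome_fraction(fractions, knockdown_a260, polysome_start=4)
print(control_poly, knockdown_poly)
```

      Computing one such value per biological replicate gives the per-sample numbers on which a statistical comparison between control and milton knockdown can then be run.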

      L236: 5H-> 4H

      Thank you for pointing it out, and we are sorry for the confusion. We corrected it.

      L238-L239: Since there is no significant difference observed, it may not be accurate to interpret a reduction in puromycin incorporation.

      Thank you for pointing it out. As described above, quantification of polysome fractions showed that milton knockdown significantly reduced polysomes (Figure 5B). We revised the manuscript as below:

      Lines 263-264: ‘However, unexpectedly, we found that milton knockdown significantly reduced the level of mRNAs associated with polysomes (Figure 5A and B).’

      Figure 5D and Figure 6D: Climbing assays have been conducted, but I believe experiments should also be performed to examine whether overexpression or heterozygous mutants of eIF2β induce or suppress degeneration.

      Thank you for pointing it out. We analyzed the eyes with eIF2β overexpression for neurodegeneration. Although there was a tendency of elevated neurodegeneration in the retina with eIF2β overexpression, the difference between control and eIF2β overexpression did not reach statistical significance (Figure S2). This result has been included as Figure S2 in the revised manuscript, and the following sentences have been included in the text:

      Lines 288-293: ‘We asked if eIF2β overexpression causes neurodegeneration, as depletion of axonal mitochondria in the photoreceptor neurons causes axon degeneration in an age-dependent manner(24). eIF2β overexpression in photoreceptor neurons tends to increase neurodegeneration in aged flies, while it was not statistically significant (p>0.05, Figure S2).’

      L271-L272: The results in Figure 6B are surprising. I anticipated a greater increase compared to the Milton knockdown alone. While p62 appears to be reduced, it is not clear why these results lead to the conclusion that lowering eIF2β rescues autophagic impairment. Please add a discussion section to address this point.

      Thank you for pointing it out. We apologize for the unclear description of the result. Milton knockdown flies show p62 accumulation (Figure 2), and deleting one copy of eIF2beta in milton knockdown background reduced p62 accumulation (Figure 7C). We revised the text as below:

      Lines 307-315: ‘Neuronal knockdown of milton causes accumulation of autophagic substrate p62 in the Triton X-100-soluble fraction (Figure 2B), and we tested if lowering eIF2β ameliorates it. We found that eIF2β heterozygosity caused a mild increase in LC3-I levels and decreases in LC3-II levels, resulting in a significantly lower LC3-II/LC3-I ratio in milton knockdown flies (Figure 7B). eIF2β heterozygosity decreased the p62 level in the Triton X-100-soluble fraction in the brains of milton knockdown flies (Figure 7C). The p62 level in the SDS-soluble fraction, which is not sensitive to milton knockdown (Figure 2B), was not affected (Figure 7C). These results suggest that suppression of eIF2β ameliorates the impairment of autophagy caused by milton knockdown.’

      L369: Please specify the source of the anti-ubiquitin antibody used.

      Thank you for pointing it out. We included the antibody information in the method section.

      Figure 7: While the relationship between Milton knockdown and the eIF2β and eIF2α proteins has been elucidated through the authors' efforts, I would like to see an investigation into whether eIF2β is upregulated and eIF2α phosphorylation is reduced in simply aged Drosophila. This would help us understand the correlation between aging and eIF2 protein dynamics.

      Thank you for your comment. We agree that it is an important question, and we are working on it. However, we feel that it is beyond the scope of the current manuscript.

      L645-L646: If the mushroom body is identified using mito-GFP, then include mito-GFP in the genotype listed in Supplementary Table 2.

      We are sorry for the oversight. We corrected it in Supplementary Table 2.

      Additionally, while it is presumed that the mito-GFP signal decreases in axons with Milton RNAi, how was the lobe tips area accurately selected for analysis? Please include these details along with a comprehensive description of the quantification methodology in the Methods section.

      Thank you for your comment. Although the mito-GFP signal in the axon is weak in the milton knockdown neurons, it is sufficient to distinguish the mushroom body structure from the background. We revised the method section to include this information:

      Line 437-438: ‘For eIF2α and p-eIF2α immunostaining, the mushroom body was detected by mitoGFP expression.’

    1. eLife Assessment

      The paper presents valuable computational findings on how growth feedback affects the performance of synthetic gene circuits designed for adaptive responses. By systematically analyzing over four hundred circuit topologies, the authors provide solid evidence for their conclusions on failure mechanisms and design features that enhance robustness against growth dynamics. While the study's significance and rigor are somewhat constrained by its reliance on previously published network topologies, these results are highly relevant for advancing the engineering of gene circuits in various applications.

    2. Joint Public Review:

      Engineered artificial gene regulatory networks ("circuits") have a wide range of applications, but their design is often hindered by unforeseen interactions between the host and circuit processes. This manuscript employs computational modeling to investigate how growth feedback influences the performance of synthetic gene circuits capable of adaptation. By analyzing 425 hypothetical circuits previously identified as achieving nearly perfect adaptation (Ma et al., 2009; Shi et al., 2017), the authors introduce growth feedback into their models using additional terms in ordinary differential equations. Their simulations reveal that growth feedback can disrupt adaptation dynamics in diverse ways but also identify core motifs that ensure robust performance under such conditions. Additionally, they establish a scaling law linking circuit robustness to the strength of growth feedback. The findings have important implications for synthetic biology, where host-circuit interactions frequently compromise desired behaviors, and for systems biology, by advancing the understanding of network motif dynamics. The authors' classification schemes will be highly valuable to the community, offering a framework for addressing growth-related challenges in circuit design.

      Strengths

      - A detailed investigation into the reasons for adaptation failure upon the introduction of cell growth was conducted, distinguishing this work from other studies of functional screening in gene regulatory network topologies. The comprehensiveness of the analysis is particularly noteworthy.

      - Approaches for assessing robustness, such as the survival ratio Q, were employed, providing tools that may be applicable to a broad range of network topologies beyond adaptation. The scaling law derived from these approaches is both novel and insightful.

      - A thorough numerical analysis of three gene regulatory networks exhibiting adaptation was performed. For each of the 425 topologies analyzed, approximately 2e5 circuits were sampled using Latin hypercube sampling, ensuring robust coverage of the parameter space. Among these, 1.5e5 circuits were identified as showing adaptation and subsequently subjected to further analysis, yielding approximately 350 parametric designs per topology for deeper investigation.

      - The systematic approach and depth of the analysis position this study as a significant contribution to the understanding of gene regulatory networks and their response to growth feedback. The combination of detailed investigation, novel robustness metrics, and rigorous computational techniques enhances the impact of this work within the field.

      Weaknesses

      - The study focuses exclusively on a preselected set of 425 topologies previously shown to achieve adaptation, limiting the exploration of whether growth feedback could enable adaptation in circuits not inherently adaptive. While the authors have discussed and justified this choice, the focus restricts the generality of the conclusions, as the potential for growth feedback to induce adaptation in non-adaptive circuits remains unaddressed. The analysis includes scenarios where higher growth feedback restores adaptation in circuits that lose it at intermediate levels, but further elaboration on the implications for circuit design would strengthen the impact. The numerical framework and parameter choices align well with established methods, and an overview of the selected topologies has been provided. However, offering detailed information in supplementary materials or a public repository would further enhance the paper's accessibility and reproducibility.

      - The model fails to capture the influence of protein levels on growth. To ensure accurate modeling of protein-level effects on growth, the b(t) term should be scaled appropriately, similar to Tan et al. Nature Chemical Biology 5:842-848 (2009).

      - The authors propose bistability or multistability as the primary mechanisms behind different types of adaptation failure, explaining why the failures do not occur precisely at bifurcation points. They argue that their ODE simulations provide evidence for oscillation-related bifurcations, and an included appendix explores this phenomenon further, detailing how it can be observed in their results. While the authors choose not to apply semi-analytic methods, such as numerical continuation and eigenvalue analysis, to validate the existence of bifurcations, their approach offers valuable insights into the underlying dynamics of adaptation failures.

      - The analysis in this work is carried out exclusively in a deterministic regime, as the focus is on scenarios where the effects of noise are assumed to be minimal. This approach is justified, and the authors acknowledge the complexity of extending their analysis to include stochasticity, which they suggest as an avenue for future research. The discussion has been expanded to address the potential impact of noise, its handling, and the assumptions underlying its exclusion. It is important to note, however, that noise can significantly alter system behavior, for instance by stabilizing trajectories and removing oscillations, as shown in prior studies (e.g., doi:10.1016/j.cels.2016.01.004). Additionally, variability in experimental implementations may influence the dynamics beyond what is predicted in deterministic models. These factors should be considered when interpreting the results.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Point-by-point response to the public review:

      General Comment: “Using computational modeling, this manuscript explores the effect of growth feedback on the performance of gene networks capable of adaptation. The authors selected 425 hypothetical synthetic circuits that were shown to achieve nearly perfect adaptation in two earlier computational studies (see Ma et al. 2009, and Shi et al. 2017). They examined the effects of cell growth feedback by introducing additional terms to the ordinary differential equation-based models, and performed numerical simulations to check the retainment and the loss of the adaptation responses of the circuits in the presence of growth feedback. The authors show that growth feedback can disrupt the gene network adaptation dynamics in different ways, and report some exceptional core motifs which allow for robust performance in the presence of growth feedback. They also used a metric to establish a scaling law between a circuit robustness measure and the strength of growth feedback. These results have important implications in the field of synthetic biology, where unforeseen interactions between designed gene circuits and the host often disrupt the desired behavior. The paper’s conclusions are supported by their simulation results, although these are presented in their summary formats and it would be useful for the community if the detailed results for each topology were available as a supplementary file or through the authors’ GitHub repository.”

      We are grateful for the referee’s positive evaluation of our work. We have updated our GitHub and OSF repositories with detailed results for each topology. Additionally, we have included other simulation codes, result data, and detailed explanations in these two repositories that may be of interest to our readers.

      Strength 1: “This work included a detailed investigation of the reasons for adaptation failure upon introducing cell growth to the systems. The comprehensiveness of the analysis makes the work stand out among studies of functional screening of network topologies of gene regulation.”

      We are grateful for the referee’s positive assessment of our work, notably the recognition of the ‘detailed investigation’ we conducted, and the ‘comprehensiveness of the analysis’ we provided.

      Strength 2: “The authors’ approaches for assessment of robustness, such as the survival ratio Q, can be useful for a wide range of topologies beyond adaptation. The scaling law obtained with those approaches is interesting.”

      We are grateful for the referee’s positive evaluation of our defined factors for assessing circuit robustness. We also appreciate the acknowledgment of the “interesting” nature of the scaling law we discovered using the assessment factor R.

      Weaknesses 1: “The title suggests that the work investigates the ’effects of growth feedback on gene circuits’. However, the performance of ’nearly perfect adaptation’ was chosen for the majority of the work, leaving the question of whether the authors’ conclusion regarding the effects of growth feedback is applicable to other functional networks.”

      We agree that our original title was too broad, and we have changed it from “Effects of growth feedback on gene circuits: A dynamical understanding” to “Effects of growth feedback on adaptive gene circuits: A dynamical understanding”. Although we include some brief results and discussion on gene circuits with bistability, we acknowledge that most of our results and discussion focus on circuits that exhibit adaptation.

      The new title is more specific and should be a more appropriate summary of the paper.

      Weaknesses 2: “This work relies extensively on an earlier study, evaluating only a selected set of 425 topologies that were shown to give adaptive responses (Shi et al., 2017). This limited selection has two potential issues. First, as the authors mentioned in the introduction, growth feedback can also induce emerging dynamics even without existing function-enabling gene circuits, as an example of the ”effects of growth feedback on gene circuits”. Limiting the investigation to only successful circuits for adaptation makes it unclear whether growth feedback can turn the circuits that failed to produce adaptation by themselves into adaptation-enabling circuits. Secondly, as the Shi et al. (2017) study also used numerical experiments to achieve their conclusions about successful topologies, it is unclear whether the numerical experiments in the present study are compatible with the earlier work regarding the choice of equation forms and ranges of parameter values. The authors also assumed that all readers have sufficient understanding of the 425 topologies and their derivation before reading this paper.”

      We agree with the reviewer that several issues need to be clarified in our new manuscript. We have added new discussions for all of them.

      We agree with the reviewer that growth feedback could turn non-adaptive circuits into adaptation-enabling circuits, and this indeed presents a compelling topic for future research. We have added the following discussion to our paper, which addresses a related matter. We find that in our simulated dataset, there are cases where a higher degree of growth feedback can restore the adaptation that has been lost in a circuit. However, as we discuss in this new paragraph, a comprehensive study of turning non-adaptive circuits into adaptation-enabling circuits will “require entirely different approaches for sampling circuit parameters and selecting candidate network topologies, demanding significantly high computational costs.” Given that this topic extends beyond the scope of the current paper, we leave this matter to future research.

      “Although the primary focus of this paper is on how growth feedback can undermine an originally adaptive circuit and how to design circuits that are robust against such feedback, our simulated dataset reveals instances where growth feedback can benefit the circuit within certain ranges. Specifically, we identified 2,092 circuits across 306 different topologies where adaption, lost at an intermediate level of growth feedback, is restored at higher levels. This is 1.4% of all circuits tested. We anticipate that additional circuits exhibiting this loss-and-recovery behavior exist, as our sampling of six discrete levels of k<sub>g</sub> (0,0.2,0.4,0.6,0.8,1.0) might have overlooked numerous cases. This result again suggests the possible advantages of growth feedback in gene circuits (Tan et al., 2009; Nevozhay et al., 2012; Deris et al., 2013; Feng et al., 2014; Melendez-Alvarez and Tian, 2022). A comprehensive study into how growth feedback can endow or enhance adaption in circuits would require entirely different approaches for sampling circuit parameters and selecting candidate network topologies, demanding significantly high computational costs. Given that this topic extends beyond the scope of the current paper, we leave this matter to future research.”
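
      The loss-and-recovery pattern described above can be illustrated with a short check over the six sampled k<sub>g</sub> levels. This is only a sketch: the function name and the boolean-list input format are assumptions made for illustration and are not taken from the authors' repositories.

```python
from typing import Sequence

# The six growth-feedback levels quoted in the text.
KG_LEVELS = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)


def shows_loss_and_recovery(adaptive: Sequence[bool]) -> bool:
    """Return True if adaptation is lost at some level and restored at a higher one.

    adaptive[i] states whether the circuit adapts at KG_LEVELS[i]. The circuit
    is assumed to be adaptive without growth feedback (adaptive[0] is True).
    """
    if not adaptive or not adaptive[0]:
        return False
    lost = False
    for ok in adaptive[1:]:
        if not ok:
            lost = True          # adaptation failed at this level
        elif lost:
            return True          # adaptive again after at least one failure
    return False
```

      Because only six discrete levels are tested, a circuit that fails and recovers between two neighboring levels would not be flagged, which is consistent with the authors' remark that the reported count is likely an underestimate.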

      We have added the following discussions about the reasoning behind using the 425 network topologies selected from the study Shi et al. (2017).

      “We use these 425 network topologies from the study (Shi et al., 2017), avoiding redundancy with established results. Due to the unique focus of our research on the effects of growth feedback and the need to evaluate quantitative ratios of robust circuits among all functional ones, we have chosen to use a 20-fold increase in the number of random parameter sets for each network topology compared to the simulations in (Shi et al., 2017). This approach makes it computationally prohibitive to scan all possible 16,038 three-node circuits. We carefully follow the settings in (Shi et al., 2017), which also analyzed TRNs with the AND logic as in this paper. Detailed descriptions of our simulation experiments are provided in the Methods section. To make our results more convincing, we have adopted a set of adaptation criteria that are stricter than those used in (Shi et al., 2017). Consequently, the ratio of adaptive circuits is somewhat lower in our study, with 4 out of the 425 network topologies not demonstrating adaptation.”

      Other than the stricter adaptation criteria and much larger sample sizes mentioned in this paragraph, we have carefully followed the simulation details of the study Shi et al. (2017). This includes but is not limited to: the dynamical equations (when k<sub>g</sub> = 0), the input signals, the scales and ranges of the circuit parameters to be randomly sampled, and the sampling method (Latin hypercube sampling). One of the authors of the current paper was also the first author of the study Shi et al. (2017) and helped us verify the details of the simulations (among many other contributions). These identical settings justify our use of the established results for the 425 network topologies.
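
      For readers unfamiliar with Latin hypercube sampling, a minimal standard-library sketch is given below. It is not the authors' code: the function names are hypothetical, and the log-uniform mapping is an assumption about how parameters spanning several orders of magnitude might be handled.

```python
import random
from typing import List


def latin_hypercube(n_samples: int, n_dims: int, rng: random.Random) -> List[List[float]]:
    """Draw n_samples points in [0, 1)^n_dims with one point per stratum in every dimension."""
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)  # independent random pairing of strata across dimensions
        columns.append([(s + rng.random()) / n_samples for s in strata])
    # Transpose so that each row is one sampled parameter set.
    return [[columns[d][i] for d in range(n_dims)] for i in range(n_samples)]


def to_log_range(u: float, low: float, high: float) -> float:
    """Map u in [0, 1) to a log-uniform value in [low, high) (assumed scaling)."""
    return low * (high / low) ** u
```

      Each dimension is divided into `n_samples` equal strata and every stratum is used exactly once, which gives more even coverage of the parameter space than independent uniform draws for the same number of samples.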

      To provide more information about these 425 network topologies, we have added the following introduction. It introduces the structural features of the networks, especially the shared core motifs for adaptation. In our GitHub and OSF repositories, we have also provided relevant data about the 425 topologies, including the topology structures and the parameter sets we scanned.

      “These topologies can be classified into two families based on the core topology: networks with a negative feedback loop (NFBL) and networks with an incoherent feed-forward loop (IFFL) (Shi et al., 2017). More specifically, there are 206 network topologies in the NFBL family. All of these NFBL topologies have a negative feedback loop for node B. This negative feedback loop can be formed by the loop from node B to A and back to B (such as the circuit shown in Fig. 1 (a)), by node B to C and back to B, or by a longer route, from node B to A and then to C and back to B. There is always a self-activation link from B to B in all these 206 NFBL networks. There are 219 network topologies in the IFFL family. All of them have two feed-forward pathways from the input node A to the output node C. One pathway goes from node A to C directly, while the other involves node B in the middle. One of the pathways is activating while the other one is inhibitory.”
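
      The two core motifs described above can be checked on a signed edge list, as in the sketch below. The edge-dictionary representation and both function names are assumptions for illustration. The sketch tests only the sign patterns named in the text and does not reproduce the classification procedure of Shi et al. (2017).

```python
from typing import Dict, Tuple

# (source, target) -> +1 for activation, -1 for inhibition; absent edges are omitted.
Edges = Dict[Tuple[str, str], int]


def has_iffl(edges: Edges) -> bool:
    """True if the direct A->C path and the indirect A->B->C path have opposite signs."""
    direct = edges.get(("A", "C"), 0)
    indirect = edges.get(("A", "B"), 0) * edges.get(("B", "C"), 0)
    return direct != 0 and indirect != 0 and direct == -indirect


def has_b_negative_feedback(edges: Edges) -> bool:
    """True if node B lies on a negative loop B->A->B, B->C->B, or B->A->C->B."""
    loops = [
        (("B", "A"), ("A", "B")),
        (("B", "C"), ("C", "B")),
        (("B", "A"), ("A", "C"), ("C", "B")),
    ]
    for loop in loops:
        sign = 1
        for edge in loop:
            sign *= edges.get(edge, 0)  # a missing edge zeroes the product
        if sign < 0:
            return True
    return False
```

      For example, a network with A activating C, A activating B, and B inhibiting C satisfies `has_iffl`, whereas a network in which A activates B and B inhibits A satisfies `has_b_negative_feedback`.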

      Weaknesses 3: “The authors’ model does not describe the impact of growth via a biological mechanism: they model growth as an additional dilution rate and calculate growth rate based on a phenomenological description with growth rate occurring at a maximum (k<sub>g</sub>) scaled by the circuit ’burden’ b(t). Therefore, the authors’ model does not capture potential growth rate changes in parameter values (e.g., synthetic protein production falls with increasing growth rate; see Scott & Hwa, 2023).”

      In our paper, we consider dilution due to cell growth as the dominant factor of growth feedback. Here we compared the adaptive circuits under no-growth conditions and their ability to maintain their adaptive behaviors after dilution into a fresh medium, which mediated a significant dilution to the circuits. This is based on our previous work, Zhang et al., Nature Chemical Biology 16.6 (2020): 695-701. We agree that an increased growth rate can change synthetic protein production. However, the dynamic roles of dilution and the growth-affected production rate should be analogous, given that they both act as inhibitory factors arising from cell growth, as mentioned by the reviewer. Still, we agree that taking the growth effect on the production rate into account would provide a more comprehensive study, but it is beyond the scope of the present work. We have added the following paragraph in the Discussion section of our paper.

      “In our paper, we consider dilution due to cell growth as the dominant factor of growth feedback. Here we compared the adaptive circuits under no-growth conditions and their ability to maintain their adaptive behaviors after dilution into a fresh medium, which mediated a significant dilution to the circuits. This is based on our previous work (Zhang et al. (2020)). However, growth feedback is inherently complex (Klumpp et al. (2009)). For instance, an increased growth rate can change protein synthesis rate (Hintsche and Klumpp (2013); Scott and Hwa (2023)), and cell growth rates can affect the distribution of protein expression in cell populations (Gouda et al. (2019)). In our paper, we concentrate on a simplified model with dilution, which we consider to have captured the dominant factor. The dynamic roles of the dilution and growth-affected production rate should be analogous, given that they both act as inhibitory factors arising from cell growth. Incorporating the impact of growth rate on protein synthesis into our model would offer a more comprehensive analysis, a task beyond the scope of this paper but presenting an intriguing opportunity for future research to address the complexities of growth feedback.”

      Weaknesses 4: “The authors made several claims about the bifurcations (infinite-period, saddle-node, etc) underlying the abrupt changes leading to failures of adaptations. There is a lack of evidence supporting these claims. Both local and global bifurcations can be demonstrated with semi-analytic approaches such as numerical continuation along with investigations of eigenvalues of the Jacobian matrix. The claims based on ODE solutions alone are not sound.”

      After our further simulations and verification, we found that most of the bifurcation-induced failures we mentioned in type-V and type-VI failures should be categorized as bistability or multistability-induced failures. They are still abrupt switching between adaptive and non-adaptive states, as we described in the previous version of the manuscript. However, they are actually still far away from the bifurcation points at the critical k<sub>g</sub>. We have corrected all relevant descriptions and figures, including panel Fig. 4 (c) and its captions. We have added the following paragraph in the paper to explain this issue.

      “One might expect bifurcations to play an important role in many type-V and type-VI failures. However, in our simulations, failures precisely at the bifurcation point are not observed. This is because the bifurcation points under consideration, such as fold bifurcations, are where one of the attraction basins diminishes to zero. For a failure to occur exactly at the bifurcation point, the initial condition would need to coincide precisely with the infinitesimally small basin just before it vanishes. More realistically, failures almost always largely precede the exact bifurcation point. They happen while the basin is still contracting and the basin boundary crosses the initial condition or O<sub>1</sub>. An example is shown in Fig. 4(b), where bistability persists, yet the lighter orange basin with a larger O<sub>1</sub>(C) cannot be reached as the boundary shifts away from the initial condition A<sub>0</sub> and B<sub>0</sub>. As another example, in Fig. 4 (c) from a different circuit, the higher O<sub>2</sub>(C) state disappears at k<sub>g</sub> ≈ 0.012 and switches to a lower O<sub>2</sub>(C), but this point is not a bifurcation.

      It is the point where the stable O<sub>1</sub> continuously crosses the basin boundary of O<sub>2</sub>.”
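
      The distinction between a basin-boundary crossing and a bifurcation can be seen in a one-dimensional toy system. The sketch below is an illustrative assumption and is not one of the circuits studied in the paper: both stable states exist for every value of the parameter `u`, yet the final state reached from a fixed initial condition switches abruptly once the basin boundary passes that initial condition.

```python
def final_state(u: float, x0: float = 1.0, dt: float = 0.01, steps: int = 5000) -> float:
    """Integrate dx/dt = -x (x - u) (x - 2) with forward Euler, starting from x0.

    For every 0 < u < 2 the points x = 0 and x = 2 are stable, and x = u is the
    unstable fixed point that separates their basins of attraction.
    """
    x = x0
    for _ in range(steps):
        x += dt * (-x * (x - u) * (x - 2.0))
    return x


# The boundary x = u moves past the fixed initial condition x0 = 1.
high = final_state(0.8)  # boundary below x0: trajectory settles at the upper state
low = final_state(1.2)   # boundary above x0: trajectory settles at the lower state
```

      No fixed point appears or disappears between `u = 0.8` and `u = 1.2`, so the switch in outcome is not a bifurcation. It mirrors the authors' explanation that failures occur when the basin boundary crosses the initial condition.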

      Our further simulations have verified the existence of the oscillation-related bifurcations. We have added a new appendix discussing the phenomena associated with them in more detail.

      Weaknesses 5: “The impact of biochemical noise is not evaluated in this work; the author’s analysis is only carried out in a deterministic regime.”

      In this paper, we have not taken into account biochemical noise as we focus solely on scenarios where all protein concentrations are high. In these circumstances, the influence of noise is relatively minor. Incorporating biochemical noise, which originates from various sources and possesses diverse characteristics, would significantly complicate the analysis beyond the scope of our current work. However, exploring this aspect could be an intriguing avenue for future research. We have included the following discussions in our paper.

      “Our study focuses on scenarios where random noises are ignored. Realistically, gene circuits are subjected to diverse types of noise, which can complicate their predictable behavior and design. These noises can originate externally from a noisy input signal I, or intrinsically, directly affecting the circuit components. Further, these noises can be classified based on various mechanisms that cause them (Colin et al. (2017); Sartori and Tu (2011)). And with different mechanisms, each type of noise can be characterized by different attributes such as frequency, amplitude, and noise color. These variances can lead to different impacts on the circuits, potentially necessitating unique mechanisms or designs for the attenuation of each category (Sartori and Tu (2011); Qiao et al. (2019)). Given the extensive complexity and the need for thorough investigation, these noise-related challenges are beyond the scope of this paper and require a series of future studies.”

      Point-by-point response to the recommendations for the authors:

      Comment 1: - The authors’ github repository, detailed in their code availability statement, is currently unavailable and likely contains some of the answers to the queries here.

      We have updated our GitHub and OSF repositories with simulation codes, result data, and detailed explanations. The link to our GitHub repository in the previous version of the manuscript contained a format error, making it inaccessible to the referees. We apologize for this mistake and have corrected it.

      Comment 2: - At present, it is not clear how the 425 topologies are created from the system of equations (Eq. 6-8) or from the circuit diagram in Fig 1a. This could do with being explicitly stated for the reader.

      We have added the following paragraph to discuss how the 425 topologies are selected and which common motifs and connections they share.

      “Previous research identified 425 different three-node TRN network topologies that can achieve adaptation in the absence of growth feedback (Shi et al., 2017), providing the base of our computational study. These topologies can be classified into two families based on the core topology: networks with a negative feedback loop (NFBL) and networks with an incoherent feed-forward loop (IFFL) (Shi et al., 2017). More specifically, there are 206 network topologies in the NFBL family. All of these NFBL topologies have a negative feedback loop for node B. This negative feedback loop can be formed by the loop from node B to A and back to B (such as the circuit shown in Fig. 1 (a)), by node B to C and back to B, or by a longer route, from node B to A and then to C and back to B. There is always a self-activation link from B to B in all these 206 NFBL networks. There are 219 network topologies in the IFFL family. All of them have two feed-forward pathways from the input node A to the output node C. One pathway goes from node A to C directly, while the other involves node B in the middle. One of the pathways is activating while the other one is inhibitory. We use these 425 network topologies from the study (Shi et al., 2017), avoiding redundancy with established results. Due to the unique focus of our research on the effects of growth feedback and the need to evaluate quantitative ratios of robust circuits among all functional ones, we have chosen to use a 20-fold increase in the number of random parameter sets for each network topology compared to the simulations in (Shi et al., 2017). This approach makes it computationally prohibitive to scan all possible 16,038 three-node circuits. We carefully follow the settings in (Shi et al., 2017), which also analyzed TRNs with the AND logic as in this paper. Detailed descriptions of our simulation experiments are provided in the Methods section. 
      To make our results more convincing, we have adopted a set of adaptation criteria that are stricter than those used in (Shi et al., 2017). Consequently, the ratio of adaptive circuits is somewhat lower in our study, with 4 out of the 425 network topologies not demonstrating adaptation.”

      Comment 3: - In the main text, the authors mentioned that they chose 425 network topologies for this study, whereas the number is 435 in the abstract. Please correct the error.

      The number 435 in our previous abstract referred to the 10 four-node circuits that we studied in the appendix, in addition to the 425 three-node network topologies. To avoid confusion and potential misunderstandings among readers, we have revised this expression of “435 distinct topological structures” to “more than four hundred topological structures”.

      Comment 4: - Please can the authors include the topologies they have studied in an appendix or as supplementary material. The impact of this work would increase significantly if for each topology the authors could include a pie chart similar to the one shown in Fig 2 so that others can use these results.

      We fully acknowledge the potential benefits of providing simulation results for each topology. However, including over four hundred more figures in this paper is not feasible. Moreover, we expect that many readers may also be interested in results not only for individual topologies but also for subsets sharing specific motifs or regulatory connections. Therefore, we have provided all the necessary data and codes in our GitHub repository to make these pie charts. We have included a detailed guide on how to generate these pie charts in the GitHub Readme file. These allow readers to plot the pie chart and extract distributions for any individual topology or use conditions to filter any subset of topologies as required. We believe this approach offers greater flexibility for our readers. We have also added the following explanation in the Methods section.

      “The codes implementing these criteria are available in our GitHub repository, with the link provided in the ”Code Availability” section. The failure type results for all circuits tested are available in our OSF repository, with the link provided in the ”Data Availability” section. An additional note is provided in the README file of our GitHub repository for further guidance on generating pie charts similar to Fig. 2 for any network topology or subset of topologies.”

      Comment 5: - At present, the authors have not given sufficient detail for their numerical methods (e.g. to identify bistability or oscillations) to enable the work to be repeated. I would appreciate it if the authors could expand their Methods section or provide a description of their method as an appendix. Additionally, the authors must clarify how many parameter sets per topology showed successful adaptation.

      In response to this comment, we have reorganized and expanded our Methods section, especially the new “Numerical simulations of circuit dynamics” and “Numerical criteria for functional adaptation and failure types” subsections. We added details on how we define and evaluate a “relatively steady state”, how to determine if there is an oscillation, how to determine the critical k<sub>g</sub> value, and how to determine if a failure is continuous or abrupt. Readers can also find the corresponding codes in our GitHub repository, where we provide a README file to help the readers locate the script file they need.

      The number of parameter sets per topology that showed successful adaptation is precisely our definition of the Q-value. The Q-values of most of the circuits we tested are shown in multiple figures in the paper. A complete table of Q-values for different topologies and different k<sub>growth</sub> values can be found in our OSF repository.
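
      As an illustration of how such a table could be tallied from per-circuit results, consider the sketch below. The record format and function names are hypothetical and do not describe the layout of the OSF data. The `surviving_fraction` helper is an added convenience for comparing against the no-feedback case and is not a quantity defined in this response.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, float, bool]  # (topology id, k_g, adaptive?) -- assumed format


def q_values(records: Iterable[Record]) -> Dict[Tuple[str, float], int]:
    """Count the adaptive parameter sets for each (topology, k_g) pair."""
    counts: Dict[Tuple[str, float], int] = defaultdict(int)
    for topology, k_g, adaptive in records:
        counts[(topology, k_g)] += int(adaptive)  # non-adaptive sets still register the key
    return dict(counts)


def surviving_fraction(q: Dict[Tuple[str, float], int], topology: str, k_g: float) -> float:
    """Fraction of the k_g = 0 adaptive sets that remain adaptive at k_g (illustrative)."""
    baseline = q.get((topology, 0.0), 0)
    return q.get((topology, k_g), 0) / baseline if baseline else 0.0
```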

      Comment 6: - Looking at the Model Description, there seem to be multiple issues, as follows. The model should be rewritten and all simulations redone with the model corrected as described below:

      (a) The ”strength of growth feedback” is modeled by the maximal growth parameter k<sub>g</sub> in Equation (12). However, this rate does not represent growth feedback. In fact, this parameter must be present also for the system without growth feedback, Equations (6 - 8), because those cells grow as well! So Equation (12) with b(t)=0 should also be added to Equations (6 - 8), in addition to the dilution terms in each equation.

      (b) The dilution due to growth (dN/dt)*(B/N) is only added to Equations (9 - 11). This is wrong: growth affects (dilutes) all protein concentrations, even without growth feedback, so similar terms must be added even to equations without growth feedback, i.e., to Equations (6 - 8).

      (c) The term representing growth feedback is actually the fraction 1/(1+b(t)). To adjust the strength of growth feedback, some parameters should be introduced into this term. Specifically, the term currently has a Hill form with Hill coefficient = 1 and sensitivity = 1. The term should be converted into a general Hill function, and the parameters of that function should be altered to represent growth feedback. This Hill function is called a cellular (phenotypic) fitness landscape, see Nevozhay et al., 2012.

      Equations (6-8) describe only one part of the full model we study. We present them solely to avoid overwhelming readers with a large number of newly defined parameters. They are not used in our simulations and serve only to explain the meaning of the parameters. Throughout the paper, our simulations use only Eqs. (9-13) (with various topologies). We have revised the text to make this point clear and have added the following descriptions in the Model Description section:

      “In order not to overwhelm readers with too many terms and parameters, we first describe a partial model (an isolated circuit without growth feedback) before introducing the complete model that we study in this work.”

      “Equations (9) to (13) are the dynamical equations we actually use for simulating the circuit dynamics.”

      Additionally, in the newly added subsection “Numerical simulations of circuit dynamics” in the Methods, we explicitly mention that:

      “The dynamical equations we use are similar to Eqs. (9-13) but with different topologies.”

      We consider dilution due to cell growth as the dominant factor of growth feedback. In fact, we study the adaptive circuits without growth and their ability to maintain their adaptive behaviors after dilution into a fresh medium, based on a recent work [Zhang, et al., Nature Chemical Biology 16.6 (2020): 695-701]. The dynamic roles of the dilution and growth-affected production rate should be analogous, given that they both act as inhibitory factors arising from cell growth. The term mentioned in the comment is about how the burden of the circuit affects cell growth. We agree that it can be interesting to have a more comprehensive study on how different degrees of nonlinearity of this term can have different effects on the overall robustness towards the growth feedback problem, but this is not part of our primary focus and is beyond the scope of this paper. In this paper, we are mostly concerned with the variability of the strength of the growth feedback/dilution, controlled by the parameter k<sub>g</sub>, instead of the different types of nonlinearity.

      Comment 7:  - On the right side of Equation (7), the first term should be inhibitory, right?

      This is indeed an error. We accidentally reversed the regulation from A to B and B to A when inputting the formula. We have corrected both terms.

      Comment 8: - It seems to me that a better transition from Figs 6 and 7 to Fig 8 can be made. Did the authors choose the three circuits in Fig 8 based on the three distinct groups shown in Fig 6 and 7? The rationale for choosing the three topologies given the clusters identified earlier can be explained more clearly.

      We agree more explanation can be provided here. We have added the following descriptions, in the caption of Fig.8:

      “The other three curves represent circuits with different robustness levels: high (Circuit No. 98), moderate (Circuit No. 3), and low (Circuit No. 28) values of R, to demonstrate that this scaling behavior is generic. Each of these three circuit topologies is selected from one of the three groups illustrated in Fig. 6 and Fig. 7, and they have the highest Q(k<sub>g</sub> = 0) value within their respective groups.”

      and in the main text:

      “The three other curves represent circuit topologies that have a relatively high, moderate, and low value R among the 425 topologies tested, to demonstrate that this scaling behavior is generic. (These three topologies are the highest Q(k<sub>g</sub> = 0) topology in each of the three groups shown in Fig. 6 and Fig. 7.)”

      Comment 9: - The insights from the neural network model seem to be very limited. It would be interesting to see if the model can predict the performance of network topologies that have not been exposed to the model during training.

      Machine learning is not a focus of this paper. For the section the comment was referring to, the main research question is on the relationship between circuit robustness and topology, and the point we are trying to make is that the robustness dependency varies across different connections — some connections are critical, while others are less impactful. The neural-network-based analysis was only used to provide further support to this point by demonstrating that through optimization, neural networks automatically assign different levels of weights to different connections in the circuits.

      We agree that it can be an interesting topic to study how machine learning can be used to help us design functional and robust circuits, as discussed in the final paragraph of the Discussion section. However, such an investigation would require a series of more comprehensive and carefully designed simulation experiments to validate if “neural networks can predict the performance of network topologies that have not been exposed to the model during training”. One point one should take extra care of is that many network topologies we study are very similar to many others, with shared motifs and links. These considerations extend beyond the scope of this paper.

      Other potential improvements or future work

      Comment 10: - The growth feedback examined in this paper comes from the effect of protein levels on the cell division rate (growth rate). However, the opposite effect can also occur; cell growth rates can affect the distribution of protein expression in cell populations. A good reference is Kheir Gouda et al., which is already on the list of references. These opposite effects should be described and discussed.

      We agree that growth feedback is inherently complex and has many biological effects, and in our paper, we are using a simplified model to study the dominant factor of growth feedback. We have added the following paragraph in the Discussion section, which involves the opposite effect mentioned in the comment.

      “In our paper, we consider dilution due to cell growth as the dominant factor of growth feedback. Here we compared the adaptive circuits under no-growth conditions and their ability to maintain their adaptive behaviors after dilution into a fresh medium, which mediated a significant dilution to the circuits. This is based on our previous work (Zhang et al. (2020)). However, growth feedback is inherently complex (Klumpp et al. (2009)). For instance, an increased growth rate can change protein synthesis rate (Hintsche and Klumpp (2013); Scott and Hwa (2023)), and cell growth rates can affect the distribution of protein expression in cell populations (Gouda et al. (2019)). In our paper, we concentrate on a simplified model with dilution, which we consider to have captured the dominant factor. The dynamic roles of the dilution and growth-affected production rate should be analogous, given that they both act as inhibitory factors arising from cell growth. Incorporating the impact of growth rate on protein synthesis into our model would offer a more comprehensive analysis, a task beyond the scope of this paper but presenting an intriguing opportunity for future research to address the complexities of growth feedback.”

      Comment 11: - It may be worth mentioning that growth feedback can lead to persistence, see PMID:27010473.

      We have included this research as a citation.

      Comment 12: - While some other networks (two-node) are discussed, it would be worth doing this analysis for all one- and two-node networks, perhaps controlled by small molecules added externally. If not here, then as a future plan.

      We agree that this is an interesting idea for future studies.

      Comment 13: - The manuscript analyzes the deterministic dynamics of a set of gene networks. However, gene expression is always stochastic, and gene circuits have been designed to control stochastic gene expression. For example, gene expression distributions can be reshaped, or even new peaks can appear, which would be worth mentioning, PMID: 30341217. The effect of growth feedback on stochastic gene expression and future perspectives of systematically studying this should be discussed.

      We have added the following paragraph in the Discussion section to discuss the effects of noises and stochasticity. The research mentioned in the comment is also included.

      “Our study focuses on scenarios where random noises are ignored. Realistically, gene circuits are subjected to diverse types of noise, which can complicate their predictable behavior and design. These noises can originate externally from a noisy input signal I, or intrinsically, directly affecting the circuit components. Further, these noises can be classified based on various mechanisms that cause them (Colin et al. (2017); Sartori and Tu (2011)). And with different mechanisms, each type of noise can be characterized by different attributes such as frequency, amplitude, and noise color. These variances can lead to different impacts on the circuits, potentially necessitating unique mechanisms or designs for the attenuation of each category (Sartori and Tu (2011); Qiao et al. (2019)). Given the extensive complexity and the need for thorough investigation, these noise-related challenges are beyond the scope of this paper and require a series of future studies.”

    1. eLife Assessment

      This manuscript provides an important overview of potential resistance mutations within MET Receptor Tyrosine Kinase. The evidence supporting the findings is convincing - it should be pointed out that the approach is comparatively new for the application of protein kinases and the results are therefore of potentially great value. The results will be of value for clinicians facing drug resistance mutations, computational biologists who are training models of drug resistance mechanisms and biologists with an interest in cell signaling.

    2. Reviewer #2 (Public review):

      Summary:

      This manuscript provides a comprehensive overview of potential resistance mutations within MET Receptor Tyrosine Kinase and defines how specific mutations affect different inhibitors and modes of target engagement. The goal is to identify inhibitor combinations with the lowest overlap in their sensitivity to resistant mutations and determine if certain resistance mutations/mechanisms are more prevalent for specific modes of ATP-binding site engagement. To achieve this, the authors measured the ability of ~6000 single mutants of MET's kinase domain (in the context of a cytosolic TPR fusion) to drive IL-3-independent proliferation (used as a proxy for activity) of Ba/F3 cells (deep mutational profiling) in the presence of 11 different inhibitors. The authors then used co-crystal and docked structures of inhibitor-bound MET complexes to define the mechanistic basis of resistance and applied a protein language model to develop a predictive model of inhibitor sensitivity/resistance.

      Strengths:

      The major strengths of this manuscript are the comprehensive nature of the study and the rigorous methods used to measure the sensitivity of ~6000 MET mutants in a pooled format. The dataset generated will be a valuable resource for researchers interested in understanding kinase inhibitor sensitivity and, more broadly, small molecule ligand/protein interactions. The structural analyses are systematic and comprehensive, providing interesting insights into resistance mechanisms. Furthermore, the use of machine learning to define inhibitor-specific fitness landscapes is a valuable addition to the narrative. Although the ESM1b protein language model is only moderately successful in identifying the underlying mechanistic basis of resistance, the authors' attempt to integrate systematic sequence/function datasets with machine learning serves as a foundation for future efforts.

      Weaknesses:

      The main limitation of this study is that the authors' efforts to define general mechanisms between inhibitor classes were only moderately successful due to the challenge of uncoupling inhibitor-specific interaction effects from more general mechanisms related to the mode of ATP-binding site engagement. However, this is a minor limitation that only minimally detracts from the impressive overall scope of the study.

    3. Reviewer #3 (Public review):

      Summary:

      In the manuscript 'Mapping kinase domain resistance mechanisms for the MET receptor tyrosine kinase via deep mutational scanning' by Estevam et al, deep mutational scanning is used to assess the impact of ~5,764 mutants in the MET kinase domain on the binding of 11 inhibitors. Analyses were divided by individual inhibitor and kinase inhibitor subtype (I,II, I 1/2, and III). While a number of mutants were consistent with previous clinical reports, novel potential resistance mutants were also described. This study has implications for the development of combination therapies, namely which combination of inhibitors to avoid based on overlapping resistance mutant profiles. While one suggested pair of inhibitors with least overlapping resistance mutation profiles was suggested, this manuscript presents a proof of concept toward a more systematic approach for improved selection of combination therapeutics. Furthermore, in a final part of this manuscript the data was used to train a machine learning model, the ESM-1b protein language model augmented with an XG Boost Regressor framework, and found that they could improve predictions of resistance mutations above the initial ESM-1b model.

      Strengths:

      Overall this paper is a tour-de-force of data collection and analysis to establish a more systematic approach for the design of combination therapies, especially in targeting MET and other kinases, a family of proteins significant to therapeutic intervention for a variety of diseases. The presentation of the work is mostly concise and clear with thousands of data points presented neatly and clearly. The discovery of novel resistance mutants for individual MET inhibitors, kinase inhibitor subtypes within the context of MET, and all resistance mutants across inhibitor subtypes for MET has clinical relevance. However, probably the most promising outcome of this paper is the proposal of the inhibitor combination of Crizotinib and Cabozantinib as Type I and Type II inhibitors, respectively, with the least overlapping resistance mutation profiles and therefore potentially the most successful combination therapy for MET. While this specific combination is not necessarily the point, it illustrates a compelling systematic approach for deciding how to proceed in developing combination therapy schedules for kinases. In an insightful final section of this paper, the authors approach using their data to train a machine learning model, perhaps understanding that performing these experiments for every kinase for every inhibitor could be prohibitive to applying this method in practice.

      Weaknesses:

      This paper presents a clear set of experiments with a compelling justification. The content of the paper is overall of high quality. The points below mostly concern clarifications in presentation.

      Two places could use more computational experiments and analysis, however. Both are presented as suggestions, but at least a discussion of these topics would improve the overall relevance of this work. In the first case it seems that while the analyses conducted on this dataset were chosen with care to be the most relevant to human health, further analyses of these results and their implications for our understanding of allosteric interactions and their effects on inhibitor binding would be a relevant addition. For example, for any given residue type found to be a resistance mutant, are there consistent amino acid mutations for which a large or small effect is found? Is a mutation from alanine to phenylalanine always deleterious, for instance, even though one can assume the exact location of a residue matters significantly? Some of this analysis is done in dividing resistance mutants by those that are near the inhibitor binding site and those that aren't, but more of these types of analyses could help the reader understand the large amount of data presented here. A mention at least of the existing literature in this area and the lack or presence of trends would be worthwhile. For example, is there any correlation with a simpler metric like the Grantham score to predict effects of mutations (in a way the ESM-1b model is a better version of this, so this is somewhat implicitly discussed)?

      Indeed, this discussion relates to the second point this manuscript could improve upon: the machine learning section. The main actionable item here is that this results section seems the least polished and could do a better job describing what was done. In the figure it looks like results for certain inhibitors were held out as test data - was this all mutants for a single inhibitor, or some other scheme? Overall I think the implications of this section could be fleshed out, potentially with more experiments. As mentioned in the 'Strengths' section, one of the appealing aspects of this paper is indeed its potential wide applicability across kinases -- could you use this ML model to predict resistance mutants for an entirely different kinase? This doesn't seem far-fetched, and would be an extremely compelling addition to this paper to prove the value of this approach.

      Another area in which this paper could improve its clarity is in the description of caveats of the assay. The exact math used to define resistance mutants and its dependence on the DMSO control is interesting, and it is worth discussing what the failure modes of this procedure might be. Could it be that the resistance mutants identified in this assay would differ significantly from those found in patients? That results here are consistent with those seen in the clinic is promising, but discrepancies could remain. Furthermore, a more in-depth discussion of the MetdelEx14 results is warranted. For example, why is the DMSO signature in Figure 1 - supplement 4 so different from that of Figure 1? And finally, there is a lot of emphasis put on the unexpected results of this assay for the tivantinib "type III" inhibitor - could this in fact be because the molecule "is highly selective for the inactive or unphosphorylated form of c-Met" according to Eathiraj et al JBC 2011? These points are addressed in previous work (Estevam et al 2024) or in the detailed methods section, but are not obvious in the main text of the paper.

      This paper is crisply written with beautiful figures, and the complexity of the data is easy to understand from an in depth discussion of the mutants that have been previously reported.

      Finally, the potential impacts and follow-ups of this excellent study could be used as a resource for the community both as a dataset and as a proof of concept. It is exciting that this approach can be altered and/or improved in the future to facilitate the general application of this approach for combination therapies and the understanding of mechanism for other targets.

      Comments on revisions:

      Thank you for your additions and changes - they have improved the quality of this paper.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, the authors present a cornucopia of data generated using deep mutational scanning (DMS) of variants in MET kinase, a protein target implicated in many different forms of cancer. The authors conducted a heroic amount of deep mutational scanning, using computational structural models to augment the interpretation of their DMS findings.

      Strengths:

      This powerful combination of computational models, experimental structures in the literature, dose-response curves, and DMS enables them to identify resistance and sensitizing mutations in the MET kinase domain, as well as consider inhibitors in the context of the clinically relevant exon-14 deletion. They then try to use the existing language model ESM1b augmented by an XGBoost regressor to identify key biophysical drivers of fitness. The authors provide an incredible study that has a treasure trove of data on a clinically relevant target that will appeal to many.

      We thank Reviewer 1 for their generous assessment of our manuscript!

      Weaknesses:

      However, the authors do not equally consider alternative possible mechanisms of resistance or sensitivity beyond the impact of mutation on binding, even though the measure used to discuss resistance and sensitivity is ultimately a resistance score derived from the increase or decrease of the presence of a variant during cell growth.

      For this resistance screen, Ba/F3 was a carefully chosen cellular selection system due to its addiction to exogenously provided IL-3, undetected expression of endogenous RTKs (including MET), and dependence on kinase transgenes to promote signaling and growth under IL-3 withdrawal. Together this allows for the readout of variants that alter kinase-driven proliferation without the caveat of bypass resistance. In our previous phenotypic screen (Estevam et al., 2024, eLife), we also carefully examined the impact of all possible MET kinase domain mutations both in the presence and absence of IL-3 withdrawal, but no inhibitors. There, we identified a small group of mutations that were associated with gain-of-function behavior located at conserved regulatory motifs outside of the catalytic site, yet these mutations were largely sensitive to inhibitors within this screen.

      Here, the majority of resistance mutations were located at or near the ATP-binding pocket, suggesting an impact on resistance through direct drug interactions. However, there was also a small population of distal mutations that met our statistical definitions of resistance. Within the crizotinib selection, sites such as T1293, L1272, T1261, amongst others, demonstrated resistance profiles but were located in C-lobe away from the catalytic site. While we did not experimentally validate these specific mutations, it is possible that non-direct drug binders instead promote resistance through allosteric or conformational mechanisms which preserve kinase activity and signaling. Indeed, our ML framework explicitly included conformational and stability effects as significant in improving predictions.

      We would be happy to further discuss any specific alternative resistance mechanisms Reviewer 1 has in mind! Thank you for highlighting this!

      There are also points of discussion and interpretation that rely heavily on docked models of kinase-inhibitor pairs without considering alternative binding modes or providing any validation of the docked pose. Lastly, the use of ESM1b is powerful but constrained heavily by the limited structural training data provided, which can lead to misleading interpretations without considering alternative conformations or poses.

      The majority of our interpretations are grounded in the X-ray structures of WT MET bound to the inhibitors studied (or close analogs). The use of docked models (note - to mutant structures predicted by UMol, not ESM, that can have conformational changes) is primarily in the ML part of the manuscript. Indeed, in our models, conformational and binding mode changes are taken into account as features (see Ligand RMSD, Residue RMSD). There are certainly improved methods (AF3 variants) emerging that might have even more power to model these changes, but they come with greater computational costs and are something we will be evaluating in the future.

      We added to the results section: “While our features can account for some changes in MET-mutant conformation and altered inhibitor binding pose, the prediction of these aspects can likely be improved with new methods.”

      Reviewer #2 (Public review):

      Summary:

      This manuscript provides a comprehensive overview of potential resistance mutations within MET Receptor Tyrosine Kinase and defines how specific mutations affect different inhibitors and modes of target engagement. The goal is to identify inhibitor combinations with the lowest overlap in their sensitivity to resistant mutations and determine if certain resistance mutations/mechanisms are more prevalent for specific modes of ATP-binding site engagement. To achieve this, the authors measured the ability of ~6000 single mutants of MET's kinase domain (in the context of a cytosolic TPR fusion) to drive IL-3-independent proliferation (used as a proxy for activity) of Ba/F3 cells (deep mutational profiling) in the presence of 11 different inhibitors. The authors then used co-crystal and docked structures of inhibitor-bound MET complexes to define the mechanistic basis of resistance and applied a protein language model to develop a predictive model of inhibitor sensitivity/resistance.

      Strengths:

      The major strengths of this manuscript are the comprehensive nature of the study and the rigorous methods used to measure the sensitivity of ~6000 MET mutants in a pooled format. The dataset generated will be a valuable resource for researchers interested in understanding kinase inhibitor sensitivity and, more broadly, small molecule ligand/protein interactions. The structural analyses are systematic and comprehensive, providing interesting insights into resistance mechanisms. Furthermore, the use of machine learning to define inhibitor-specific fitness landscapes is a valuable addition to the narrative. Although the ESM1b protein language model is only moderately successful in identifying the underlying mechanistic basis of resistance, the authors' attempt to integrate systematic sequence/function datasets with machine learning serves as a foundation for future efforts.

      We thank Reviewer 2 for their thoughtful assessment of our manuscript!

      Weaknesses:

      The main limitation of this study is that the authors' efforts to define general mechanisms between inhibitor classes were only moderately successful due to the challenge of uncoupling inhibitor-specific interaction effects from more general mechanisms related to the mode of ATP-binding site engagement. However, this is a minor limitation that only minimally detracts from the impressive overall scope of the study.

      We agree. We have added to the discussion: “A full landscape of mutational effects can help to predict drug response and guide small molecule design to counteract acquired resistance. The ability to define molecular mechanisms towards that goal will likely require more purposefully chosen chemical inhibitors and combinatorial mutational libraries to be maximally informative.”

      Reviewer #3 (Public review):

      Summary:

      In the manuscript 'Mapping kinase domain resistance mechanisms for the MET receptor tyrosine kinase via deep mutational scanning' by Estevam et al, deep mutational scanning is used to assess the impact of ~5,764 mutants in the MET kinase domain on the binding of 11 inhibitors. Analyses were divided by individual inhibitor and kinase inhibitor subtypes (I, II, I 1/2, and III). While a number of mutants were consistent with previous clinical reports, novel potential resistance mutants were also described. This study has implications for the development of combination therapies, namely which combination of inhibitors to avoid based on overlapping resistance mutant profiles. While one suggested pair of inhibitors with the least overlapping resistance mutation profiles was suggested, this manuscript presents a proof of concept toward a more systematic approach for improved selection of combination therapeutics. Furthermore, in a final part of this manuscript the data was used to train a machine learning model, the ESM-1b protein language model augmented with an XG Boost Regressor framework, and found that they could improve predictions of resistance mutations above the initial ESM-1b model.

      Strengths:

      Overall this paper is a tour-de-force of data collection and analysis to establish a more systematic approach for the design of combination therapies, especially in targeting MET and other kinases, a family of proteins significant to therapeutic intervention for a variety of diseases. The presentation of the work is mostly concise and clear with thousands of data points presented neatly and clearly. The discovery of novel resistance mutants for individual MET inhibitors, kinase inhibitor subtypes within the context of MET, and all resistance mutants across inhibitor subtypes for MET has clinical relevance. However, probably the most promising outcome of this paper is the proposal of the inhibitor combination of Crizotinib and Cabozantinib as Type I and Type II inhibitors, respectively, with the least overlapping resistance mutation profiles and therefore potentially the most successful combination therapy for MET. While this specific combination is not necessarily the point, it illustrates a compelling systematic approach for deciding how to proceed in developing combination therapy schedules for kinases. In an insightful final section of this paper, the authors approach using their data to train a machine learning model, perhaps understanding that performing these experiments for every kinase for every inhibitor could be prohibitive to applying this method in practice.

      We thank Reviewer 3 for their assessment of our manuscript (we are very happy to have it described as a tour-de-force!)

      Weaknesses:

      This paper presents a clear set of experiments with a compelling justification. The content of the paper is overall of high quality. The points below mostly concern clarifications in presentation.

      Two places could use more computational experiments and analysis, however. Both are presented as suggestions, but at least a discussion of these topics would improve the overall relevance of this work. In the first case it seems that while the analyses conducted on this dataset were chosen with care to be the most relevant to human health, further analyses of these results and their implications for our understanding of allosteric interactions and their effects on inhibitor binding would be a relevant addition. For example, for any given residue type found to be a resistance mutant, are there consistent amino acid mutations for which a large or small effect is found? Is a mutation from alanine to phenylalanine always deleterious, for instance, even though one can assume the exact location of a residue matters significantly? Some of this analysis is done in dividing resistance mutants by those that are near the inhibitor binding site and those that aren't, but more of these types of analyses could help the reader understand the large amount of data presented here. A mention at least of the existing literature in this area and the lack or presence of trends would be worthwhile. For example, is there any correlation with a simpler metric like the Grantham score to predict effects of mutations (in a way the ESM-1b model is a better version of this, so this is somewhat implicitly discussed)?

      Indeed we experimented with including these types of features in the XGBoost scheme (particularly residue volume change and distance) to augment the predictive power of the ESM model - see Figure 8 - figure supplement 1; however, we didn’t find them as significant. Therefore, the signal is likely very small and/or incorporated into the baseline ESM model.

      Indeed, this discussion relates to the second point this manuscript could improve upon: the machine learning section. The main actionable item here is that this results section seems the least polished and could do a better job describing what was done. In the figure it looks like results for certain inhibitors were held out as test data - was this all mutants for a single inhibitor, or some other scheme? Overall I think the implications of this section could be fleshed out, potentially with more experiments.

      Figure 8A and the methods section contain a very detailed explanation of test data. We have thought about it and do not have any easy path to improve the description, which we reproduce here:

      “Experimental fitness scores of MET variants in the presence of DMSO and AMG458 were ignored in model training and testing since having just one set of data for a type I ½ inhibitor and DMSO leads to learning by simply memorizing the inhibitor type, without generalizability. The remaining dataset was split into training and test sets to further avoid overfitting (Figure 8A). The following data points were held out for testing - (a) all mutations in the presence of one type I (crizotinib) and one type II (glesatinib analog) inhibitor, (b) 20% of randomly chosen positions (columns) and (c) all mutations in two randomly selected amino acids (rows) (e.g. all mutations to Phe, Ser). After splitting the dataset into train and test sets, the train set was used for XGBoost hyperparameter tuning and cross-validation. For tuning the hyperparameters of each of the XGBoost models, we held out 20% of randomly sampled data points in the training set and used the remaining 80% data for Bayesian hyperparameter optimization of the models with Optuna (Akiba et al., 2019), with an objective to minimize the mean squared error between the fitness predictions on 20% held out split and the corresponding experimental fitness scores. The following hyperparameters were sampled and tuned: type of booster (booster - gbtree or dart), maximum tree depth (max_depth), number of trees (n_estimators), learning rate (eta), minimum leaf split loss (gamma), subsample ratio of columns when constructing each tree (colsample_bytree), L1 and L2 regularization terms (alpha and beta) and tree growth policy (grow_policy - depthwise or lossguide). After identifying the best combination of hyperparameters for each of the models, we performed 10-fold cross validation (with re-sampling) of the models on the full training set. The training set consists of data points corresponding to 230 positions and 18 amino acids. 
We split these into 10 parts such that each part corresponds to data from 23 positions and 2 amino acids. Then, at each of 10 iterations of cross-validation, models were trained on 9 of 10 parts (207 positions and 16 amino acids) and evaluated on the 1 held out part (23 positions and 2 amino acids). Through this protocol we ensure that we evaluate performance of the models with different subsets of positions and amino acids. The average Pearson correlation and mean squared error of the models from these 10 iterations were calculated and the best performing model out of 8192 models was chosen as the one with the highest cross-validation correlation. The final XGBoost models were obtained by training on the full training set and also used to obtain the fitness score predictions for the validation and test sets. These predictions were used to calculate the inhibitor-wise correlations shown in Figure 8B.”
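
      The hold-out and cross-validation scheme quoted above can be sketched roughly as follows. This is an illustrative sketch, not the code used in the study: the function names, the seed handling, and the total of 287 positions (chosen only so that 230 remain after a 20% hold-out) are assumptions made for the example.

```python
import random

def split_holdout(positions, amino_acids, test_inhibitors,
                  frac_positions=0.2, n_test_aas=2, seed=0):
    """Pick held-out positions (columns) and amino acids (rows).

    Hypothetical helper: a data point goes to the test set if it belongs to a
    held-out inhibitor, a held-out position, or a held-out amino acid.
    """
    rng = random.Random(seed)
    n_pos = round(frac_positions * len(positions))
    test_positions = set(rng.sample(positions, n_pos))
    test_aas = set(rng.sample(amino_acids, n_test_aas))
    held_out_inhibitors = set(test_inhibitors)

    def is_test(position, aa, inhibitor):
        return (inhibitor in held_out_inhibitors
                or position in test_positions
                or aa in test_aas)

    return is_test, test_positions, test_aas

def cv_parts(train_positions, train_aas, n_parts=10, aas_per_part=2, seed=0):
    """Split training positions into equal blocks and pair each block with
    a re-sampled set of amino acids to hold out in that iteration."""
    rng = random.Random(seed)
    shuffled = list(train_positions)
    rng.shuffle(shuffled)
    size = len(shuffled) // n_parts
    parts = []
    for k in range(n_parts):
        pos_block = set(shuffled[k * size:(k + 1) * size])
        aa_block = set(rng.sample(train_aas, aas_per_part))  # sampled anew per part
        parts.append((pos_block, aa_block))
    return parts

# Assumed sizes for illustration only.
positions = list(range(287))
amino_acids = list("ACDEFGHIKLMNPQRSTVWY")
is_test, test_pos, test_aas = split_holdout(
    positions, amino_acids, ["crizotinib", "glesatinib analog"])
train_pos = [p for p in positions if p not in test_pos]
train_aas = [a for a in amino_acids if a not in test_aas]
parts = cv_parts(train_pos, train_aas)
```

      In each cross-validation iteration, a model would be trained on the points whose position and amino acid both fall outside the held-out block and evaluated on the points inside it, which is one way to read the "207 positions and 16 amino acids" versus "23 positions and 2 amino acids" description.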

      As mentioned in the 'Strengths' section, one of the appealing aspects of this paper is indeed its potential wide applicability across kinases -- could you use this ML model to predict resistance mutants for an entirely different kinase? This doesn't seem far-fetched, and would be an extremely compelling addition to this paper to prove the value of this approach.

      This is exactly where we want to go next! But as we see here, it is going to be hard and will require more purposeful selection of chemicals, and likely combinatorial mutations, to be maximally informative (see also our response to Reviewer 2, where we have added text).

      Another area in which this paper could improve its clarity is in the description of caveats of the assay. The exact math used to define resistance mutants and its dependence on the DMSO control is interesting; it is worth discussing where the failure modes of this procedure might be. Could it be that the resistance mutants identified in this assay would differ significantly from those found in patients? That results here are consistent with those seen in the clinic is promising, but discrepancies could remain.

      Thank you for pointing this out. The greatest trade-off of probing the intracellular MET kinase (juxtamembrane, kinase domain, c-tail) in the constitutively active TPR system is that while we gain cytoplasmic expression, constitutive oligomerization, and HGF-independent activation, other features like membrane-proximal effects are lost, and translatability of some mutations in non-proliferative conditions may also be limited. Nevertheless, Ba/F3 allows IL-3 withdrawal to serve as an effective readout of transgenic kinase variant effects due to its undetectable expression of endogenous RTKs and addiction to exogenous interleukin-3 (IL-3).

      In our previous study, we were also interested in comparing the phenotypic results to available patient populations in cBioPortal. We observed that our DMS captured known oncogenic MET kinase variants, in addition to a population of gain-of-function variants within clinical residue positions that have not been clinically reported. Interestingly, the population of possible novel gain-of-function mutant codons was more distant in genetic space (2-3 Hamming distance) from wild type than the clinically reported variant codons (1-2 Hamming distance).
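
      For readers unfamiliar with the term, the Hamming distance between two codons is simply the number of nucleotide positions at which they differ. A minimal sketch follows; the function name and the example codons are illustrative assumptions and are not taken from the MET sequence used in the study.

```python
def codon_hamming(codon_a: str, codon_b: str) -> int:
    """Count nucleotide positions at which two codons differ."""
    if len(codon_a) != 3 or len(codon_b) != 3:
        raise ValueError("codons must be exactly three nucleotides long")
    return sum(a != b for a, b in zip(codon_a.upper(), codon_b.upper()))

# Illustrative codons from the standard genetic code:
# CAT (His) -> TAT (Tyr) needs one nucleotide change,
# CAT (His) -> TGG (Trp) needs three.
single_step = codon_hamming("CAT", "TAT")
triple_step = codon_hamming("CAT", "TGG")
```

      A variant at distance 1 can arise from a single nucleotide substitution, whereas variants at distance 2 or 3 require multiple substitutions within the same codon.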

      For this inhibitor screen, we also carefully compared previously reported and validated resistance mutations across referenced publications to those of our inhibitor screen, and observed large agreement, as noted in-text. While discrepancies could definitely remain, there is precedent for consistency.

      Furthermore, a more in-depth discussion of the MetdelEx14 results is warranted. For example, why is the DMSO signature in Figure 1 - supplement 4 so different from that of Figure 1?

      In our previous study (Estevam et al., 2024), we more directly compared MET and METΔExon14, and while we observed several differences, especially at conserved regulatory motifs, the TPR expression system did not provide a robust differential. Therefore, we hypothesize that a membrane-bound context is likely necessary to obtain a differential that captures juxtamembrane regulatory effects for these two isoforms. For that reason, we did not place heavy emphasis on the differences between MET and METΔExon14 in this study. Nevertheless, we performed parallel analysis of the METΔExon14 inhibitor DMS and provided all source and analyzed data in our GitHub repository (https://github.com/fraser-lab/MET_kinase_Inhibitor_DMS).

      In our analysis of resistance, we used Rosace to score and compare DMSO and inhibitor landscapes. We present the full distribution of raw scores in Figure 1 for each condition. However, to visually highlight resistance mutations as a heatmap, we subtracted the scores of each variant in each inhibitor condition from the raw DMSO score, making the heatmaps in Figure 1 - supplement 4 appear more “blue.”

      And finally, there is a lot of emphasis put on the unexpected results of this assay for the tivantinib "type III" inhibitor - could this in fact be because the molecule "is highly selective for the inactive or unphosphorylated form of c-Met" according to Eathiraj et al JBC 2011?

      The work presented by Eathiraj et al JBC 2011 is a key study we reference and is foundational to tivantinib. While the point brought up about tivantinib’s selective preference for an inactive conformation is valid, this is also true for type II kinase inhibitors. In our study, regardless of inhibitor conformational preference, tivantinib was the only one with a nearly identical landscape to DMSO and exhibited selection even in the absence of Ba/F3 MET-addiction (Figure 1E). This result is in closer agreement with the MET-agnostic behavior reported by Basilico et al., 2013 and Katayama et al., 2013.

      While this paper is crisply written with beautiful figures, the complexity of the data warrants a bit more clarity in how the results are visualized. Namely, clearly highlighting mutants that have been previously reported and those identified by this study across all figures could help significantly in understanding the more novel findings of the work.

      To better compare and contrast novel mutations identified in this study to others, we compiled a list of reported resistance mutations from recent clinical and experimental studies (Pecci et al., 2024; Yao et al., 2023; Bahcall et al., 2022; Recondo et al., 2020; Rotow et al., 2020; Fujino et al., 2019), since a direct database with resistance annotations does not exist for MET, to the best of our knowledge. In total, this amounted to 31 annotated resistance mutations across crizotinib, capmatinib, tepotinib, savolitinib, cabozantinib, merestinib, and glesatinib, which we have now tabulated in a new figure (Figure 4) with commentary in the main text:

      To assess the agreement between our DMS and previously annotated resistance mutations, we compiled a list of reported resistance mutations from recent clinical and experimental studies (Pecci et al., 2024; Yao et al., 2023; Bahcall et al., 2022; Recondo et al., 2020; Rotow et al., 2020; Fujino et al., 2019) (Figure 4A,B). Overall, previously discovered mutations are strongly shifted to a GOF distribution for the drugs where resistance is reported from treatment or experiment; in contrast, the distribution is centered around neutral for those sites for other drugs not reported in the literature (Figure 4C). However, even in cases such as L1195V, we observe GOF DMS scores indicative of resistance to previously reported inhibitors. Given this overall strong concordance with prior literature and clinical results, we can also provide hypotheses to clarify the role of mutations that are observed in combination with others. For example, H1094Y is a reported driver mutation that has been linked to resistance in METΔEx14 for glesatinib with either the secondary L1195V mutation or in isolation (Recondo et al., 2020). However, in our assay H1094Y demonstrated slight sensitivity to glesatinib, suggesting that resistance is linked to either the exon 14 deletion isoform, the L1195V mutation, or a cellular factor not modeled well by the Ba/F3 system.

      Finally, the potential impacts and follow-ups of this excellent study could be communicated better - it is recommended that the authors better advertise this paper as a resource for the community, both as a dataset and as a proof of concept. In this realm I would encourage the authors to emphasize the multiple potential uses of this dataset by others to provide answers and insights on a variety of problems.

      Please see below

      Related to this, the decision to include the MetdelEx14 results, but not discuss them at all, is interesting; do the authors expect future analyses to lead to useful insights? Is it surprising that trends are broadly the same as in the data discussed?

      Our previous paper suggests that Ba/F3 isn’t a great model for measuring the differences between MET and METΔEx14, so we haven’t emphasized these results other than to point to our previous paper. We include the full analysis here nonetheless as a resource. The greatest differences between resistance mutant behaviors would potentially be observed in the full-length, membrane-bound MET and METΔEx14 receptor isoforms. While outside the scope of this study, there is great potential to use the resistance mutations identified in this study as a filtered group to test and map differential inhibitor sensitivities between receptor isoforms.

      And finally it could be valuable to have a small addition of introspection from the authors on how this approach could be altered and/or improved in the future to facilitate the general application of this approach for combination therapies for other targets.

      See also reviewer 2 response where we have added text.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major points of revision:

      (1) It seems like much of the structural interpretation of the inhibitor binding mode, outside of crizotinib binding, appears to come from docked models of the inhibitor to the MET kinase domain. Given the potential variability of the docked structure to the kinase domain, it would be useful for the authors to consider alternative possible binding modes that their docking pipeline may have suggested. It could also be useful to provide some degree of validation or contextualization of their docking models.

      All individual figures were very carefully inspected and are based on existing crystal structures of either the inhibitor or closely related inhibitors (ATP, 3DKC; crizotinib, 2WGJ; tepotinib, 4R1V; tivantinib, 3RHK; AMG-458, 5T3Q; NVP-BVU972, 3QTI; merestinib, 4EEV; savolitinib, 6SDE). In total, four structural interpretations were the result of docking onto reference experimental structures (capmatinib, cabozantinib, glumetinib, glesatinib). As we wrote above, different conformations and binding modes are possible; these were modeled at scale in predicted mutant structures and are already included in the ML analysis.

      (2) In the first section, the authors classify an inhibitor as Type Ia based on docking models, but mention the conflicting literature describing it as type Ib - it would be helpful to provide a contextualization of why this distinction between Ia and Ib matters, and what difference it might make. It would also be useful to know if their docking score only suggested poses compatible with Ia or if other poses were provided as well. Validation using another method might be beneficial, especially since they acknowledge the conflicting literature for classification. Or at least recontextualization that more evidence would be needed.

      Kinase inhibitors have several canonical structural definitions on which we base the classifications in this study. Specifically, type I inhibitors are classified in MET by interactions with Y1230, D1228, and K1110, in addition to their conformation in the ATP-binding site. A type I inhibitor is further classified as type Ia in MET if it leverages interactions with the solvent front and residue G1163. In the prior literature referenced, tepotinib was classified as type Ib, which would imply that it does not have solvent front interactions, like savolitinib (PDB 6SDE) or NVP-BVU972 (PDB 3QTI). However, in the tepotinib experimental structure (PDB 4R1V), we observed a greater structural resemblance to other type Ia inhibitors than to type Ib inhibitors (Figure 1 - figure supplement 1b).

      (3) The measure used to discuss resistance and sensitivity is ultimately a resistance score derived from the increase or decrease of the presence of a variant during cell growth. This is not a measure of direct binding. It would be helpful if the authors discussed alternative mechanisms through which these variants may impact resistance and/or sensitivity, such as stability, protonation effects, or kinase activity. The score itself may be convolving over all these potential mechanisms to drive GOF and LOF observed behavior.

      See the response to the public review. Indeed, our ML framework explicitly included conformational and stability effects as significant in improving predictions.

      (4) While it is promising to try and improve the predictive properties of ESM1b, it is not exactly clear why the authors considered their structural data of 11 inhibitors a sufficient dataset with which to augment the model. It would be useful for the authors to provide some additional context for why they wished to augment ESM1b in particular with their dataset, and provide any metrics indicating that their training data of 11 inhibitors provided an adequate statistical sample.

      We don’t understand what this means. Sorry!

      (5) The authors use ESM-1b to predict the fitness impact of each mutation and augment it using protein structural data of drug-target interactions. However, using an XGBoost regressor on a single set of 11 kinase-inhibitor interaction pairs is an incredibly sparse dataset to train upon. It would be useful for the authors to consider the limitations of their model, as well as its extensibility in the context of alternate binding poses, alternate conformations, or changes in protonation states of ligand or inhibitor.

      On the contrary - this is 11 chemicals across 3000 mutations. We have discussed alternative interpretations above.

      Minor points:

      (1) It would also be useful for the authors to provide more context around their choice of regressor. XGBoost is a powerful regressor but can easily overfit high dimensional data when paired with language models such as ESM-1b. This would be particularly useful since some of the features to train on were also generated using existing models such as ThermoMPNN.

      Yes - we are quite concerned about overfitting and have tried to assess overfitting by careful design of test and validation sets.

      (2) The authors also mention excluding their DMSO and AMG458 scores in the model training and testing due to overfitting issues - it would be useful to have an SI figure pointing to this data.

      No - we exclude DMSO because it is the reference (baseline) and AMG458 because it has a different binding mode. This isn’t related to overfitting.

      (3) The authors mention in their docking pipeline that 5 binding modes were used for each ligand docking, but it appears that only one binding mode is considered in the main figures. It would be useful for the authors to provide additional details about what were the other binding modes used for, how different were each binding mode, and how was the "primary" mode selected (and how much better was its score than the others).

      The reviewer misinterprets the difference between poses shown in figures, based mostly on crystal structures or carefully selected templates, and the use of docked models in feature engineering for the ML part of the study. Where crystal structures do not exist, we performed docking for capmatinib, cabozantinib, glumetinib, and glesatinib onto reference structures bound to type I (2WGJ) and type II (4EEV) inhibitors. We selected one representative binding mode based on the reference inhibitor, and while not exact, at a minimum these models provide a basis for structural interpretation.

      Reviewer #2 (Recommendations for the authors):

      My main suggestion is for the authors to add a few sentences (in non-technical language) to the results section, specifically before the results shown in Figure 3, defining gain-of-function, loss-of-function, resistance, and sensitivity. While these definitions are present in the materials and methods section, explicitly discussing them prior to the relevant results would significantly improve the overall readability of the manuscript.

      We defined “gain-of-function” and “loss-of-function” mutations as those with fitness scores statistically greater or lower than wild-type. Within the DMSO condition, gain-of-function and loss-of-function labels describe mutational perturbation to protein function, whereas within inhibitor conditions, the labels describe the difference in fitness introduced by an inhibitor.

      We have also clarified these definitions where the terms are first introduced: “As expected, the DMSO control population displayed a bimodal distribution with mutations exhibiting wild-type fitness centered around 0, with a wider distribution of mutations that exhibited loss- or gain-of-function effects, as defined by fitness scores with statistically significant lower or greater scores than wild-type, respectively.”

      Figure 7D. Please add a bit more detail to the legend on how fold change (y-axis) was calculated.

      Here, fold change represents the number of viable cells at each inhibitor concentration relative to the TKI-free control, measured with the CellTiter-Glo® Luminescent Cell Viability Assay (Promega) as an end point readout. We have updated the legend of Figure 7D with calculation details: “Dose-response for each inhibitor concentration is represented as the fraction of viable cells relative to the TKI-free control.”
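
      As a rough illustration of this normalization, the sketch below divides the viability signal at each inhibitor concentration by the signal of the TKI-free control. The function name, the units, and the numbers are hypothetical and are not data from the study.

```python
def fraction_viable(signal_by_concentration, control_signal):
    """Express each viability signal as a fraction of the TKI-free control.

    signal_by_concentration: mapping of inhibitor concentration -> raw signal
    control_signal: raw signal of the untreated (TKI-free) control
    """
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return {conc: signal / control_signal
            for conc, signal in signal_by_concentration.items()}

# Hypothetical luminescence readings (arbitrary units) at two concentrations.
example = fraction_viable({10.0: 500.0, 100.0: 250.0}, control_signal=1000.0)
```

      A value of 1.0 would correspond to no loss of viability relative to the control, and values below 1.0 to growth inhibition.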

      I must admit, I did not understand what "Specific inhibitor fitness landscapes also aid in identifying mutations with potential drug sensitivity, such as R1086 and C1091 in the MET P-loop" means. These are positions where most mutations lead to greater sensitivity to crizotinib. Is the idea that there are potentially clinically-relevant MET mutations that can be targeted over wild type with crizotinib?

      Thank you for highlighting this! The P-loop (phosphate-binding loop) is a glycine-rich structural motif conserved in kinase domains. This motif is located in the N-lobe, where its primary role is to gate ATP entry into the active site and stabilize the phosphate groups of ATP when bound. Therefore, the P-loop is a common target region for ATP-competitive inhibitor design, but also a site where resistance can emerge (Roumiantsev et al., 2002). The idea we’d like to convey is that identifying residues that offer the potential for drug stabilization, with the added benefit of a lower risk of resistance, is an attractive consideration for novel inhibitor design.

      We have added to the text: “Individual inhibitor resistance landscapes also aid in identifying target residues for novel drug design by providing insights into mutability and known resistance cases. This enables the selection of vectors for chemical elaboration with potentially lower risk of resistance development. Sites with mutational profiles such as R1086 and C1091, located in the common drug target P-loop of MET, could be likely candidates for crizotinib.”

      Reviewer #3 (Recommendations for the authors):

      (1) Suggested Improvements to the Figures:

      a)  Figure 4A - T1261 seems to be mislabeled

      b)  In Figure 3A it's suggested to highlight mutants determined to be resistance mutants by this scheme.

      c)  In Figure 3D it would be informative to highlight which of these resistance mutants have already been previously reported and which are novel to this study

      d)  Throughout figures 3A, 3D, and 4G the graphical choices on how to highlight synonymous mutations and mutations not performed in the assay need improvement.

      The Green vs Grey 'TRUE' vs 'FALSE' boxes are confusing. Just a green box indicating synonymous mutations would be sufficient. Additionally these green boxes are hard to see, and often edges of this green box are currently missing making it even more difficult to see and interpret.

      * In Figure 4A mutants do not seem to be indicated by a line or plus sign, but this is not explained in the legend or the caption. Please add.

      * In 3D and 4G it is not clear if the mutants not performed are indicated at all - perhaps they are indicated in white, making them indistinguishable from scores with 0. Please clarify.

      T1261 and G1242 are now correctly labeled.

      In text we have also highlighted reported resistance mutations for crizotinib, which are inclusive of clinical reports and in vitro characterization: “These sites, and many of the individual mutations, have been noted in prior reports, such as: D1228N/H/V/Y, Y1230C/H/N/S, G1163R.”

      We have adjusted the heatmaps to improve visual clarity. Mutations with score 0 are white, as indicated by the scale bar, and mutations uncaptured by the screen are now in light yellow. The green outline distinguishing WT synonymous mutations has also been adjusted so edges are no longer cut off. In our representations, we only distinguished mutations by the score color scale bar and WT outline. What looked like a “plus” or “line” in the original figure was only the heatmap background, which now should be resolved in the updated figure and legends for Figure 3 and Figure 4.

      (2) Some Minor Suggested Improvements to the Text:

      a)  The abbreviation CBL for 'CBL docking site' is used without being defined.

      b)  Figure 3G is referenced, but it does not exist.

      c)  In the sentence 'Beyond these well characterized sites, regions with sensitivity occurred throughout the kinase, primarily in loop-regions which have the greatest mutational tolerance in DMSO, but do not provide a growth advantage in the presence of an inhibitor (Figure 1 - Figure Supplement 1; Figure 1 - Figure Supplement 2).'. It is not clear why these supplemental figures are being referenced.

      d)  In the supplement, the section 'Enrich2 Scoring' has what seem like placeholders for citations in [brackets].

      Cbl is an E3 ubiquitin ligase that plays a role in MET regulation through engagement with exon 14, specifically at Y1003 when phosphorylated. This mode of regulation was highlighted in greater detail in our previous study. However, since Cbl was only mentioned briefly in this study, we have removed reference to it to simplify the text.

      In addition, we have removed the figure 3G reference and corrected the in-text range. We have also removed references to figure supplements where unnecessary and edited the “Enrich2 scoring” method section to now reference missing citations.

    1. eLife Assessment

      This important study demonstrates a role for MED26 in red cell formation. Linking transcriptional pausing with erythropoiesis is a key discovery. The data are solid, although there is still room for improvement. The in vivo data are limited by specificity concerns about the Cre model. Adding RNA-seq, using more erythroid markers such as band3 and a4-integrin, and orthogonal validation with an iPSC-erythropoiesis model would improve the study.

    2. Reviewer #1 (Public review):

      Summary:

      In this study from Zhu and colleagues, a clear role for MED26 in mouse and human erythropoiesis is demonstrated that is also mapped to amino acids 88-480 of the human protein. The authors also show the unique expression of MED26 in later-stage erythropoiesis and propose transcriptional pausing and condensate formation mechanisms for MED26's role in promoting erythropoiesis. Despite the authors' introductory claim that many questions regarding Pol II pausing in mammalian development remain unanswered, the importance of transcriptional pausing in erythropoiesis has actually already been demonstrated (Martell-Smart, et al. 2023, PMID: 37586368, which the authors notably did not cite in this manuscript). Here, the novelty and strength of this study is MED26 and its unique expression kinetics during erythroid development.

      Strengths:

      The widespread characterization of kinetics of mediator complex component expression throughout the erythropoietic timeline is excellent and shows the interesting divergence of MED26 expression pattern from many other mediator complex components. The genetic evidence in conditional knockout mice for erythropoiesis requiring MED26 is outstanding. These are completely new models from the investigators and are an impressive amount of work to have both EpoR-driven deletion and inducible deletion. The effect on red cell number is strong in both. The genetic over-expression experiments are also quite impressive, especially the investigators' structure-function mapping in primary cells. Overall the data is quite convincing regarding the genetic requirement for MED26. The authors should be commended for demonstrating this in multiple rigorous ways.

      Weaknesses:

      (1) The authors state that MED26 was nominated for study based on RNA-seq analysis of a prior published dataset. They do not however display any of that RNA-seq analysis with regards to Mediator complex subunits. While they do a good job showing protein-level analysis during erythropoiesis for several subunits, the RNA-seq analysis would allow them to show the developmental expression dynamics of all subunit members.

      (2) The authors use an EpoR Cre for red cell-specific MED26 deletion. However, other studies have now shown that the EpoR Cre can also lead to recombination in the macrophage lineage, which clouds some of the in vivo conclusions for erythroid specificity. That being said, the in vitro erythropoiesis experiments here are convincing that there is a major erythroid-intrinsic effect.

      (3) The donor chimerism assessment of mice transplanted with MED26 knockout cells is a bit troubling. First, there are no staining controls shown and the full gating strategy is not shown. Furthermore, the authors use the CD45.1/CD45.2 system to differentiate between donor and recipient cells in erythroblasts. However, CD45 is not expressed from the CD235a+ stage of erythropoiesis onwards, so it is unclear how the authors are detecting essentially zero CD45-negative cells in the erythroblast compartment. This is quite odd and raises questions about the results. That being said, the red cell indices in the mice are the much more convincing data.

      (4) The authors make heavy use of defining "erythroid gene" sets and "non-erythroid gene" sets, but it is unclear what those lists of genes actually are. This makes it hard to assess any claims made about erythroid and non-erythroid genes.

      (5) Overall the data regarding condensate formation is difficult to interpret and is the weakest part of this paper. It is also unclear how studies of in vitro condensate formation or studies in 293T or K562 cells can truly relate to highly specialized erythroid biology. This does not detract from the major findings regarding genetic requirements of MED26 in erythropoiesis.

      (6) For many figures, there are some panels where conclusions are drawn, but no statistical quantification of whether a difference is significant or not.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Zhu et al describes a novel role for MED26, a subunit of the Mediator complex, in erythroid development. The authors have discovered that MED26 promotes transcriptional pausing of RNA Pol II, by recruiting pausing-related factors.

      Strengths:

      This is a well-executed study. The authors have employed a range of cutting-edge and appropriate techniques to generate their data, including: CUT&Tag to profile chromatin changes and mediator complex distribution; nuclear run-on sequencing (PRO-seq) to study Pol II dynamics; knockout mice to determine the phenotype of MED26 perturbation in vivo; an ex vivo erythroid differentiation system to perform additional, important, biochemical and perturbation experiments; immunoprecipitation mass spectrometry (IP-MS); and the "optoDroplet" assay to study phase-separation and molecular condensates.

      This is a real highlight of the study. The authors have managed to generate a comprehensive picture by employing these multiple techniques. In doing so, they have also managed to provide greater molecular insight into the workings of the MEDIATOR complex, an important multi-protein complex that plays an important role in a range of biological contexts. The insights the authors have uncovered for different subunits in erythropoiesis will very likely have ramifications in many other settings, in both healthy biology and disease contexts.

      Weaknesses:

      There are almost no discernible weaknesses in the techniques used, nor the interpretation of the data. The IP-MS data was generated in HEK293 cells when it could have been performed in the human CD34+ HSPC system that they employed to generate a number of the other data. This would have been a more natural setting and would have enabled a more like-for-like comparison with the other data.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aim to explore whether other subunits besides MED1 exert specific functions during the process of terminal erythropoiesis with global gene repression, and finally they demonstrated that MED26-enriched condensates drive erythropoiesis through modulating transcription pausing.

      Strengths:

      Through both in vitro and in vivo models, the authors showed that while MED1 and MED26 co-occupy a plethora of genes important for cell survival and proliferation at the HSPC stage, MED26 preferentially marks erythroid genes and recruits pausing-related factors for cell fate specification. Gradually, MED26 becomes the dominant factor in shaping the composition of transcription condensates and transforms the chromatin towards a repressive yet permissive state, achieving global transcription repression in erythropoiesis.

      Weaknesses:

      In the in vitro model, the authors used only CD34+ cell-derived erythropoiesis for validation, which is a relatively simple system; more in vitro erythropoiesis models need to be used to strengthen the conclusion.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study from Zhu and colleagues, a clear role for MED26 in mouse and human erythropoiesis is demonstrated that is also mapped to amino acids 88-480 of the human protein. The authors also show the unique expression of MED26 in later-stage erythropoiesis and propose transcriptional pausing and condensate formation mechanisms for MED26's role in promoting erythropoiesis. Despite the authors' introductory claim that many questions regarding Pol II pausing in mammalian development remain unanswered, the importance of transcriptional pausing in erythropoiesis has actually already been demonstrated (Martell-Smart, et al. 2023, PMID: 37586368, which the authors notably did not cite in this manuscript). Here, the novelty and strength of this study is MED26 and its unique expression kinetics during erythroid development.

      Strengths:

      The widespread characterization of kinetics of mediator complex component expression throughout the erythropoietic timeline is excellent and shows the interesting divergence of MED26 expression pattern from many other mediator complex components. The genetic evidence in conditional knockout mice for erythropoiesis requiring MED26 is outstanding. These are completely new models from the investigators and are an impressive amount of work to have both EpoR-driven deletion and inducible deletion. The effect on red cell number is strong in both. The genetic over-expression experiments are also quite impressive, especially the investigators' structure-function mapping in primary cells. Overall the data is quite convincing regarding the genetic requirement for MED26. The authors should be commended for demonstrating this in multiple rigorous ways.

      Thank you for your positive feedback.

      Weaknesses:

      (1) The authors state that MED26 was nominated for study based on RNA-seq analysis of a prior published dataset. They do not however display any of that RNA-seq analysis with regards to Mediator complex subunits. While they do a good job showing protein-level analysis during erythropoiesis for several subunits, the RNA-seq analysis would allow them to show the developmental expression dynamics of all subunit members.

      Thank you for this helpful suggestion. While we did not originally nominate MED26 based on RNA-seq analysis, we have analyzed the transcript levels of Mediator complex subunits in our RNA-seq data across different stages of erythroid differentiation (Author response image 1). The results indicate that most Mediator subunits, including MED26, display decreased RNA expression over the course of differentiation, with the exception of MED25, as reported previously (Pope et al., Mol Cell Biol 2013. PMID: 23459945).

      Notably, our study is based on initial observations at the protein level, where we found that, unlike most other Mediator subunits that are downregulated during erythropoiesis, MED26 remains relatively abundant. Protein expression levels more directly reflect the combined influences of transcription, translation and degradation processes within cells, and are likely more closely related to biological functions in this context. It is possible that post-transcriptional regulation (such as m6A-mediated improvement of translational efficiency) or post-translational modifications (like escape from ubiquitination) could contribute to the sustained levels of MED26 protein, and this will be an interesting direction for future investigation.

      Author response image 1.

      Relative RNA expression of Mediator complex subunits during erythropoiesis in human CD34+ erythroid cultures. Different differentiation stages from HSPCs to late erythroblasts were identified using CD71 and CD235a markers, progressing sequentially as CD71-CD235a-, CD71+CD235a-, CD71+CD235a+, and CD71-CD235a+. Expression levels were presented as TPM (transcripts per million).

      (2) The authors use an EpoR Cre for red cell-specific MED26 deletion. However, other studies have now shown that the EpoR Cre can also lead to recombination in the macrophage lineage, which clouds some of the in vivo conclusions for erythroid specificity. That being said, the in vitro erythropoiesis experiments here are convincing that there is a major erythroid-intrinsic effect.

      Thank you for this insightful comment. We recognize that EpoR-Cre can drive recombination in both erythroid and macrophage lineages (Zhang et al., Blood 2021, PMID: 34098576). However, EpoR-Cre remains the most widely used Cre for studying erythroid lineage effects in the hematopoietic community. Numerous studies have employed EpoR-Cre for erythroid-specific gene knockout models (Pang et al, Mol Cell Biol 2021, PMID: 22566683; Santana-Codina et al., Haematologica 2019, PMID: 30630985; Xu et al., Science 2013, PMID: 21998251.).

      While a GYPA (CD235a)-Cre model with erythroid specificity has recently been developed (https://www.sciencedirect.com/science/article/pii/S0006497121029074), it has not yet been officially published. We look forward to utilizing the GYPA-Cre model for future studies. As you noted, our in vivo mouse model and primary human CD34+ erythroid differentiation system both demonstrate that MED26 is essential for erythropoiesis, suggesting that the regulatory effects of MED26 in our study are predominantly erythroid-intrinsic.

      (3) The donor chimerism assessment of mice transplanted with MED26 knockout cells is a bit troubling. First, there are no staining controls shown and the full gating strategy is not shown. Furthermore, the authors use the CD45.1/CD45.2 system to differentiate between donor and recipient cells in erythroblasts. However, CD45 is not expressed from the CD235a+ stage of erythropoiesis onwards, so it is unclear how the authors are detecting essentially zero CD45-negative cells in the erythroblast compartment. This is quite odd and raises questions about the results. That being said, the red cell indices in the mice are the much more convincing data.

      Thank you for your careful and thorough feedback. We have now included negative staining controls (Author response image 2A, top). We agree that CD45 is typically not expressed in erythroid precursors in normal development. Prior studies have characterized BFU-E and CFU-E stages as c-Kit+CD45+Ter119−CD71low and c-Kit+CD45−Ter119−CD71high cells in fetal liver (Katiyar et al, Cells 2023, PMID: 37174702).

      However, our observations indicate that erythroid surface markers differ during hematopoiesis reconstitution following bone marrow transplantation. We found that nearly all nucleated erythroid progenitors/precursors (Ter119+Hoechst+) express CD45 after hematopoiesis reconstitution (Author response image 2A, bottom).

      To validate our assay, we performed next-generation sequencing by first mixing mouse CD45.1 and CD45.2 total bone marrow cells at a 1:2 ratio. We then isolated nucleated erythroid progenitors/precursors (Ter119+Hoechst+) by FACS and sequenced the CD45 gene locus by targeted sequencing. The resulting CD45 allele distribution matched our initial mixing ratio, confirming the accuracy of our approach (Author response image 2B).
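
      The logic of this validation can be sketched as a comparison between the allele fractions expected from the mixing ratio and those observed in the sequencing reads. This is only an illustrative sketch: the function names, the read counts in the usage note, and the 5% tolerance are assumptions and do not come from the manuscript, and the ratio is taken as CD45.1:CD45.2 = 1:2 as stated in the text above.

```python
from typing import Dict


def expected_fractions(parts_cd45_1: float, parts_cd45_2: float) -> Dict[str, float]:
    """Convert a mixing ratio (e.g. 1:2) into expected allele fractions."""
    total = parts_cd45_1 + parts_cd45_2
    if total <= 0:
        raise ValueError("mixing ratio must have a positive total")
    return {"CD45.1": parts_cd45_1 / total, "CD45.2": parts_cd45_2 / total}


def observed_fractions(reads_cd45_1: int, reads_cd45_2: int) -> Dict[str, float]:
    """Convert allele-specific read counts at the Ptprc locus into fractions."""
    total = reads_cd45_1 + reads_cd45_2
    if total <= 0:
        raise ValueError("at least one read is required")
    return {"CD45.1": reads_cd45_1 / total, "CD45.2": reads_cd45_2 / total}


def matches(expected: Dict[str, float], observed: Dict[str, float],
            tolerance: float = 0.05) -> bool:
    """True if every observed fraction is within `tolerance` of the expected one.

    The tolerance is an arbitrary illustrative value, not a published threshold.
    """
    return all(abs(expected[k] - observed[k]) <= tolerance for k in expected)
```

      For example, with hypothetical read counts of 340 (CD45.1) and 660 (CD45.2), `matches(expected_fractions(1, 2), observed_fractions(340, 660))` would report agreement with the 1:2 mix.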

      Moreover, a recent study supports that reconstituted erythroid progenitors can indeed be distinguished by CD45 expression following bone marrow transplantation (He et al., Nature Aging 2024, PMID: 38632351. Extended Data Fig. 8). 

      In conclusion, our data indicate that newly formed erythroid progenitors/precursors post-transplant express CD45, enabling us to identify nucleated erythroid progenitors/precursors by Ter119+Hoechst+ and determine their origin using CD45.1 and CD45.2 markers.

      Author response image 2.

      Representative flow cytometry gating strategy of erythroid chimerism following mouse bone marrow transplantation. A. Gating strategy used in the erythroid chimerism assay. B. Targeted sequencing result of Ter119+Hoechst+ cells isolated by FACS. The cell sample was pre-mixed with 1/3 CD45.2 and 2/3 CD45.1 bone marrow cells. Ptprc is the gene locus for CD45.

      (4) The authors make heavy use of defining "erythroid gene" sets and "non-erythroid gene" sets, but it is unclear what those lists of genes actually are. This makes it hard to assess any claims made about erythroid and non-erythroid genes.

      Thank you for this helpful suggestion. We defined "erythroid genes" and "non-erythroid genes" based on RNA-seq data from Ludwig et al. (Cell Reports 2019. PMID: 31189107. Figure 2 and Table S1). Genes downregulated from stages k1 to k5 are classified as “non-erythroid genes,” while genes upregulated from stages k6 to k7 are classified as “erythroid genes.” We will add this description in the revised manuscript.
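
      The classification rule above can be written as a small lookup. This is a minimal sketch that assumes each gene carries one of the labels k1 to k7 from the cited dataset; the `classify_gene` helper is hypothetical and is not part of the manuscript.

```python
from typing import Optional

# Labels taken from the description above: k1-k5 -> non-erythroid, k6-k7 -> erythroid.
NON_ERYTHROID_LABELS = {"k1", "k2", "k3", "k4", "k5"}
ERYTHROID_LABELS = {"k6", "k7"}


def classify_gene(label: str) -> Optional[str]:
    """Map a k1-k7 label to a gene-set name; return None for unknown labels."""
    if label in NON_ERYTHROID_LABELS:
        return "non-erythroid"
    if label in ERYTHROID_LABELS:
        return "erythroid"
    return None
```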

      (5) Overall the data regarding condensate formation is difficult to interpret and is the weakest part of this paper. It is also unclear how studies of in vitro condensate formation or studies in 293T or K562 cells can truly relate to highly specialized erythroid biology. This does not detract from the major findings regarding genetic requirements of MED26 in erythropoiesis.

      Thank you for the rigorous feedback. Assessing the condensate properties of MED26 protein in primary CD34+ erythroid cells or mouse models is indeed challenging. As is common in many condensate studies, we used in vitro assays and cellular assays in HEK293T and K562 cells to examine the biophysical properties (Figure S7), condensation formation capacity (Figure 5C and Figure S7C), key phase-separation regions of MED26 protein (Figure S6), and recruitment of pausing factors (Figure 6A-B) in live cells. We then conducted functional assays to demonstrate that the phase-separation region of MED26 can promote erythroid differentiation similarly to the full-length protein in the CD34+ system and K562 cells (Figure 5A). Specifically, overexpressing the MED26 phase-separation domain accelerates erythropoiesis in primary human erythroid culture, while deleting the Intrinsically Disordered Region (IDR) impairs MED26’s ability to form condensates and recruit PAF1 in K562 cells.

      In summary, we used HEK293T cells to study the biochemical and biophysical properties of MED26, and the primary CD34+ differentiation system to examine its developmental roles. Our findings support the conclusion that MED26-associated condensate formation promotes erythropoiesis.

      (6) For many figures, there are some panels where conclusions are drawn, but no statistical quantification of whether a difference is significant or not.

      Thank you for your thorough feedback. We have checked all figures for statistical quantification and added the relevant statistical analysis methods to the corresponding figure legends (Figure 2L and Figure S4C) to clarify the significance of the observed differences. The updated information will be incorporated into the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Zhu et al describes a novel role for MED26, a subunit of the Mediator complex, in erythroid development. The authors have discovered that MED26 promotes transcriptional pausing of RNA Pol II, by recruiting pausing-related factors.

      Strengths:

      This is a well-executed study. The authors have employed a range of cutting-edge and appropriate techniques to generate their data, including: CUT&Tag to profile chromatin changes and mediator complex distribution; nuclear run-on sequencing (PRO-seq) to study Pol II dynamics; knockout mice to determine the phenotype of MED26 perturbation in vivo; an ex vivo erythroid differentiation system to perform additional, important, biochemical and perturbation experiments; immunoprecipitation mass spectrometry (IP-MS); and the "optoDroplet" assay to study phase-separation and molecular condensates.

      This is a real highlight of the study. The authors have managed to generate a comprehensive picture by employing these multiple techniques. In doing so, they have also managed to provide greater molecular insight into the workings of the MEDIATOR complex, an important multi-protein complex that plays an important role in a range of biological contexts. The insights the authors have uncovered for different subunits in erythropoiesis will very likely have ramifications in many other settings, in both healthy biology and disease contexts.

      Thank you for your thoughtful summary and encouraging feedback.

      Weaknesses:

      There are almost no discernible weaknesses in the techniques used, nor the interpretation of the data. The IP-MS data was generated in HEK293 cells when it could have been performed in the human CD34+ HSPC system that they employed to generate a number of the other data. This would have been a more natural setting and would have enabled a more like-for-like comparison with the other data.

      Thank you for your positive feedback and insightful suggestions. We will perform validation of the immunoprecipitation results in CD34+ derived erythroid cells to further confirm our findings.

      Reviewer #3 (Public review):

      Summary:

      The authors aim to explore whether other subunits besides MED1 exert specific functions during the process of terminal erythropoiesis with global gene repression, and finally they demonstrated that MED26-enriched condensates drive erythropoiesis through modulating transcription pausing.

      Strengths:

      Through both in vitro and in vivo models, the authors showed that while MED1 and MED26 co-occupy a plethora of genes important for cell survival and proliferation at the HSPC stage, MED26 preferentially marks erythroid genes and recruits pausing-related factors for cell fate specification. Gradually, MED26 becomes the dominant factor in shaping the composition of transcription condensates and transforms the chromatin towards a repressive yet permissive state, achieving global transcription repression in erythropoiesis.

      Thank you for your positive summary and feedback.

      Weaknesses:

      In the in vitro model, the author only used CD34+ cell-derived erythropoiesis as the validation, which is relatively simple, and more in vitro erythropoiesis models need to be used to strengthen the conclusion.

      Thank you for your thoughtful suggestions. We have shown that MED26 promotes erythropoiesis using the primary human CD34+ differentiation system (Figure 2 K-M and Figure S4) and have demonstrated its essential role in erythropoiesis through multiple mouse models (Figure 2A-G and Figure S1-3). Together, these in vitro and in vivo results support our conclusion that MED26 regulates erythropoiesis. However, we are open to further validating our findings with additional in vitro erythropoiesis models, such as iPSC or HUDEP erythroid differentiation systems.

    1. eLife Assessment

      The authors present findings on the role of the sirtuins SIRT1 and SIRT3 during Salmonella Typhimurium infection. This valuable study increases our understanding of the mechanisms used by this pathogen to interact with its host and may have implications for other intracellular pathogens. The reviewers disagreed on the strength of the evidence to support the claims. Although one reviewer found the strength of the evidence convincing, the other found that it was incomplete, and that the main claims are only partially supported, as can be seen from the public reviews.

    2. Reviewer #2 (Public review):

      Dipasree Hajra et al demonstrated that Salmonella was able to modulate the expression of Sirtuins (Sirt1 and Sirt3) and regulate the metabolic switch in both host and Salmonella, promoting its pathogenesis. The authors found Salmonella infection induced high levels of Sirt1 and Sirt3 in macrophages, which were skewed toward the M2 phenotype allowing Salmonella to hyper-proliferate. Mechanistically, Sirt1 and Sirt3 regulated the acetylation of HIF-1alpha and PDHA1, therefore mediating Salmonella-induced host metabolic shift in the infected macrophages. Interestingly, Sirt1 and Sirt3-driven host metabolic switch also had an effect on the metabolic profile of Salmonella. Counterintuitively, inhibition of Sirt1/3 led to increased pathogen burdens in an in vivo mouse model. Overall, this is a well-designed study.

      The revised manuscript has addressed all of the previous comments. The re-analysis of flow cytometry and WB data by authors makes the results and conclusion more complete and convincing.

    3. Reviewer #3 (Public review):

      Summary:

      In this paper Hajra et al have attempted to identify the role of Sirt1 and Sirt3 in regulating metabolic reprogramming and macrophage host defense. They have performed gene knock down experiments in RAW macrophage cell line to show that depletion of Sirt1 or Sirt3 enhances the ability of macrophages to eliminate Salmonella Typhimurium. However, in mice inhibition of Sirt1 resulted in dissemination of the bacteria but the bacterial burden was still reduced in macrophages. They suggest that the effect they have observed is due to increased inflammation and ROS production by macrophages. They also try to establish a weak link with metabolism. They present data to show that the switch in metabolism from glycolysis to fatty acid oxidation is regulated by acetylation of Hif1a, and PDHA1.

      Strengths:

      The strength of the manuscript is that the role of Sirtuins in host-pathogen interactions have not been previously explored in-depth making the study interesting. It is also interesting to see that depletion of either Sirt1 or Sirt3 result in a similar outcome.

      Weaknesses:

      The major weakness of the paper is the low quality of data, making it harder to substantiate the claims. Also, there are too many pathways and mechanisms being investigated. It would have been better if the authors had focussed on either Sirt1 or Sirt3 and elucidated how it reprograms metabolism to eventually modulate host response against Salmonella Typhimurium. Experimental evidences are also lacking to prove the proposed mechanisms. For instance they show correlative data that knock down of Sirt1 mediated shift in metabolism is due to HIF1a acetylation but this needs to be proven with further experiments.

    4. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #2 (Public review):

      Dipasree Hajra et al demonstrated that Salmonella was able to modulate the expression of Sirtuins (Sirt1 and Sirt3) and regulate the metabolic switch in both host and Salmonella, promoting its pathogenesis. The authors found Salmonella infection induced high levels of Sirt1 and Sirt3 in macrophages, which were skewed toward the M2 phenotype allowing Salmonella to hyper-proliferate. Mechanistically, Sirt1 and Sirt3 regulated the acetylation of HIF-1alpha and PDHA1, therefore mediating Salmonella-induced host metabolic shift in the infected macrophages. Interestingly, Sirt1 and Sirt3-driven host metabolic switch also had an effect on the metabolic profile of Salmonella. Counterintuitively, inhibition of Sirt1/3 led to increased pathogen burdens in an in vivo mouse model. Overall, this is a well-designed study.

      The revised manuscript has addressed all of the previous comments. The re-analysis of flow cytometry and WB data by authors makes the results and conclusion more complete and convincing.

      We are immensely grateful to the reviewer for improving the strength of the manuscript by providing insightful comments and for appreciating the work.

      Reviewer #3 (Public review):

      Summary:

      In this paper Hajra et al have attempted to identify the role of Sirt1 and Sirt3 in regulating metabolic reprogramming and macrophage host defense. They have performed gene knock down experiments in RAW macrophage cell line to show that depletion of Sirt1 or Sirt3 enhances the ability of macrophages to eliminate Salmonella Typhimurium. However, in mice inhibition of Sirt1 resulted in dissemination of the bacteria but the bacterial burden was still reduced in macrophages. They suggest that the effect they have observed is due to increased inflammation and ROS production by macrophages. They also try to establish a weak link with metabolism. They present data to show that the switch in metabolism from glycolysis to fatty acid oxidation is regulated by acetylation of Hif1a, and PDHA1.

      Strengths:

      The strength of the manuscript is that the role of Sirtuins in host-pathogen interactions have not been previously explored in-depth making the study interesting. It is also interesting to see that depletion of either Sirt1 or Sirt3 result in a similar outcome.

      Weaknesses:

      The major weakness of the paper is the low quality of data, making it harder to substantiate the claims. Also, there are too many pathways and mechanisms being investigated. It would have been better if the authors had focussed on either Sirt1 or Sirt3 and elucidated how it reprograms metabolism to eventually modulate host response against Salmonella Typhimurium. Experimental evidences are also lacking to prove the proposed mechanisms. For instance they show correlative data that knockdown of Sirt1 mediated shift in metabolism is due to HIF1a acetylation but this needs to be proven with further experiments.

      As the reviewer’s public review is unchanged from the previous version and contains no further recommendations for the authors, we stand by our earlier author response. We respect the reviewer’s opinion and thank the reviewer for the critical analysis of our work.

      ---------

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Public Review):

      Dipasree Hajra et al demonstrated that Salmonella was able to modulate the expression of Sirtuins (Sirt1 and Sirt3) and regulate the metabolic switch in both host and Salmonella, promoting its pathogenesis. The authors found Salmonella infection induced high levels of Sirt1 and Sirt3 in macrophages, which were skewed toward the M2 phenotype allowing Salmonella to hyper-proliferate. Mechanistically, Sirt1 and Sirt3 regulated the acetylation of HIF-1alpha and PDHA1, therefore mediating Salmonella-induced host metabolic shift in the infected macrophages. Interestingly, Sirt1 and Sirt3-driven host metabolic switch also had an effect on the metabolic profile of Salmonella. Counterintuitively, inhibition of Sirt1/3 led to increased pathogen burdens in an in vivo mouse model. Overall, this is a well-designed study.

      Comments on revised version:

      The authors have performed additional experiments to address the discrepancy between in vitro and in vivo data. While this offers some potential insights into the in vivo role of Sirt1/3 in different cell types and how this affects bacterial growth/dissemination, I still believe that Sirt1/3 inhibitors could have some effect on the gut microbiota contributing to increased pathogen counts. This possibility can be discussed briefly to give a better scenario of how Sirt1/3 inhibitors work in vivo. Additionally, the manuscript would improve significantly if some of the flow cytometry analysis and WB data could be better analyzed.

      We are highly grateful for your valuable and insightful comments. Thank you for appreciating the merit of our manuscript. As rightly pointed out by the reviewer, we acknowledge the probable link between sirtuins and the gut microbiota and its effect on increased bacterial loads, as indicated by previous studies (PMID: 22115311, PMID: 19228061). These reports suggested that treating rats with a low dose of the Sirt1 activator resveratrol for 25 days under 5% DSS-induced colitis altered the gut microbiota profile, with increased lactobacilli and bifidobacteria alongside a reduced abundance of enterobacteria. This is consistent with our study, in which we detected enhanced Salmonella (belonging to the Enterobacteriaceae family) loads under Sirt1/3 in vivo knockdown or inhibitor treatment in C57BL/6 mice, and a reduced burden under treatment with the Sirt1 activator SRT1720.

      As per your valid suggestion, we have discussed this possibility in our discussion section (Lines 541-548).

      We have incorporated the suggestions for the improvement in the analysis of WB data and flow cytometry.

      Reviewer #3 (Public Review):

      Summary:

      In this paper Hajra et al have attempted to identify the role of Sirt1 and Sirt3 in regulating metabolic reprogramming and macrophage host defense. They have performed gene knock down experiments in RAW macrophage cell line to show that depletion of Sirt1 or Sirt3 enhances the ability of macrophages to eliminate Salmonella Typhimurium. However, in mice inhibition of Sirt1 resulted in dissemination of the bacteria but the bacterial burden was still reduced in macrophages. They suggest that the effect they have observed is due to increased inflammation and ROS production by macrophages. They also try to establish a weak link with metabolism. They present data to show that the switch in metabolism from glycolysis to fatty acid oxidation is regulated by acetylation of Hif1a, and PDHA1.

      Strengths:

      The strength of the manuscript is that the role of Sirtuins in host-pathogen interactions has not been previously explored in-depth making the study interesting. It is also interesting to see that depletion of either Sirt1 or Sirt3 results in a similar outcome.

      Weaknesses:

      The major weakness of the paper is the low quality of data, making it harder to substantiate the claims. Also, there are too many pathways and mechanisms being investigated. It would have been better if the authors had focussed on either Sirt1 or Sirt3 and elucidated how it reprograms metabolism to eventually modulate host response against Salmonella Typhimurium. Experimental evidence is also lacking to prove the proposed mechanisms. For instance they show correlative data that knock down of Sirt1 mediated shift in metabolism is due to HIF1a acetylation but this needs to be proven with further experiments.

      We appreciate the reviewer’s critical analysis of our work. In the revised manuscript, we aimed to eliminate the low-quality data sets and have tried to substantiate them with better and conclusive ones, as directed in the recommendations for the author section. We agree with the reviewer that the inclusion of both Sirtuins 1 and 3 has resulted in too many pathways and mechanisms and focusing on one SIRT and its mechanism of metabolic reprogramming and immune modulation would have been a less complicated alternative approach. However, as rightly pointed out, our work demonstrated the shared and few overlapping roles of the two sirtuins, SIRT1 and SIRT3, together mediating the immune-metabolic switch upon Salmonella infection. As per the reviewer’s suggestion, we have performed additional experiments with HIF-1α inhibitor treatment in our revised manuscript to substantiate our correlative findings on SIRT1-mediated regulation of host glycolysis (Fig. 7G). We wanted to clarify our claim in this regard. Our results suggested that loss of SIRT1 function triggered increased host glycolysis alongside hyperacetylation of HIF-1α. HIF-1α is reported to be one of the important players in glycolysis regulation (Kierans SJ, Taylor CT. Regulation of glycolysis by the hypoxia-inducible factor (HIF): implications for cellular physiology. J Physiol. 2021;599(1):23-37. doi:10.1113/JP280572.) and additionally, SIRT1 has been shown to regulate HIF-1α acetylation status (Lim JH, Lee YM, Chun YS, Chen J, Kim JE, Park JW. Sirtuin 1 modulates cellular responses to hypoxia by deacetylating hypoxia-inducible factor 1 alpha. Mol Cell. 2010;38(6):864-878. doi:10.1016/j.molcel.2010.05.023.) Further, ectopic expression of SIRT1 has been demonstrated to reduce glycolysis by negatively regulating HIF-1α. (Wang Y, Bi Y, Chen X, et al. Histone Deacetylase SIRT1 Negatively Regulates the Differentiation of Interleukin-9-Producing CD4(+) T Cells. Immunity. 2016;44(6):1337-1349. doi:10.1016/j.immuni.2016.05.009). We have subsequently shown in Fig. 7G, that the increase in host glycolysis upon SIRT knockdown in the infected macrophages gets lowered upon HIF-1α inhibitor treatment, suggesting that one of the mechanisms of SIRT-mediated regulation of host glycolysis is via regulation of HIF-1α. However, this warrants further future mechanistic research.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) Figures 8I-S: are only viable cells used for analysis? Please provide gating strategy used for these analyses.

      (2) Many changes seen in WB seem to be marginal. Since the authors used densitometric plot to quantify the band intensities, I expect these experiments were repeated at least three times. Please indicate the number of repeats. For instance, Figures 7C, 7I (UI SCR vs UI shSIRT3), 7J, show marginal changes or no changes. What do other WB images look like? Are they more convincing than the ones currently shown? Please provide them in the response letter.

      (3) Figure 7C: label is a bit misleading. Please relabel the figure title to Acetylated HIF vs total levels

      (4) Figure 7J: which band is AcPDHA1?

      (1) We are highly apologetic for not clarifying our gating strategy for the analysis.

      We initially gated the viable splenocyte population based on Forward scatter (FSC) and Side Scatter (SSC). This gated population was further subjected to gating based on cell FSC-H (height) versus FSC-A (area). Subsequently, the population was gated as per SSC-A and GFP (expressed by intracellular bacteria) based on the autofluorescence exhibited by the uninfected control (Fig. 8I-J).
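
      The sequential gating described above can be sketched in a few lines. This is a hedged illustration only: the event fields, the numeric gate boundaries, the FSC-H/FSC-A ratio window, and the choice of the maximum control signal as the GFP cutoff are all assumptions made for the example and are not the settings used for Fig. 8I-J.

```python
from typing import Dict, List

Event = Dict[str, float]  # one cell: keys "fsc_a", "fsc_h", "ssc_a", "gfp"


def gate_scatter(events: List[Event], fsc_min: float, ssc_max: float) -> List[Event]:
    """Gate 1: keep events inside a simple FSC/SSC window (hypothetical bounds)."""
    return [e for e in events if e["fsc_a"] >= fsc_min and e["ssc_a"] <= ssc_max]


def gate_singlets(events: List[Event], low: float = 0.8, high: float = 1.2) -> List[Event]:
    """Gate 2: keep events whose FSC-H/FSC-A ratio falls in an assumed window."""
    return [e for e in events if e["fsc_a"] > 0 and low <= e["fsc_h"] / e["fsc_a"] <= high]


def percent_gfp_positive(sample: List[Event], uninfected: List[Event],
                         fsc_min: float, ssc_max: float) -> float:
    """Gate 3: percentage of gated sample events with GFP above the control.

    The cutoff is the highest GFP signal among gated uninfected events,
    an assumed stand-in for an autofluorescence-based gate.
    """
    control = gate_singlets(gate_scatter(uninfected, fsc_min, ssc_max))
    gated = gate_singlets(gate_scatter(sample, fsc_min, ssc_max))
    if not control or not gated:
        return 0.0
    cutoff = max(e["gfp"] for e in control)
    positive = sum(1 for e in gated if e["gfp"] > cutoff)
    return 100.0 * positive / len(gated)
```

      Applying the same three gates to every treatment group keeps the GFP-positive percentages comparable across conditions.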

      Author response image 1.

      UNINFECTED

      Author response image 2.

      VEHICLE CONTROL INFECTED

      Author response image 3.

      EX-527 INFECTED

      Author response image 4.

      3TYP INFECTED

      Author response image 5.

      SRT 1720 INFECTED

      For gating different cell types such as F4/80 (PE) positive population in Fig. 8K-L, the viable cell population was gated based on SSC-A versus PE-A to gate the macrophage population. These macrophage populations were gated further based on GFP (Salmonella) + population to obtain the percentage of macrophage population harboring GFP+ bacteria. Similar strategies were followed for other cell types as depicted in Fig. 8M-S, Fig. S8.

      (2) We agree with the reviewer’s concern with the marginal changes in the western blots (Figures 7C, 7I (UI SCR vs UI shSIRT3), 7J). As per the suggestions, we have provided the alternate blot images and have indicated the number of repeats in the manuscript. The alternate blot images are provided herewith:

      Author response image 6.

      Alternate blot images for Fig. 7B-C

      Author response image 7.

      Alternate blot images for Fig. 7I, J

      (3) We are highly thankful to the reviewer for this suggestion. We have relabelled Fig. 7C as acetylated HIF-1α over total HIF-1α, as suggested.

      (4) In Fig. 7J, the acetylated PDHA1 band has now been indicated, as suggested. We are extremely apologetic for the inconvenience caused.

      Author response image 8.

      Reviewer #3 (Recommendations For The Authors):

      The authors have done some work to improve the manuscript. However, the data presented lacks clarity.

      Fig 4B: I still do not see a change in Ac p65 in the less saturated blot. It looks reduced as the band is distorted. I am not sure how this could be quantified.

      Fig S2 b-actin bands are hyper saturated, and it is not possible to decipher the knockdown efficiency. It is probably better to provide a ponceau staining similar to S2C. The band intensity values are out of place.

      Fig 5F HADHA blot: Lane 1 expression appears to be significantly higher than lane 3, but the values mentioned do not match the intensity of the bands.

      It is hard to interpret the authors' claim that the shift in metabolism is HIF1a-dependent.

      Fig 7B: I would expect HIF1a acetylation to be increased in UI ShSIRT1 compared to UI SCR. The blot shows reduced HIF1a acetylation.

      Fig 7D: SIRT1 immunoprecipitates with HIF1a equally under all conditions. Is this what the authors expect? Labelling of the blots are not clear. It looks like the bottom SIRT1 blot is from Beads IgG control.

      Fig 7H: How does PDHA1 interact with SIRT3 so strongly in shSIRT3 cells (lane 2)?

      Authors have mentioned in their response that a knockdown of 40% has been achieved in the uninfected but the blot does not reflect that. SIRT3 expression seems to be more in the knockdown.

      Blots are also not labelled properly especially Input. The lanes are not marked.

      We thank the reviewer for acknowledging the improvements in the revised version and for suggesting further clarifications and improvements.

      We have tried to incorporate the specified modifications to the best of our abilities in the revised manuscript.

      We are highly apologetic for the inconclusive blot image in Figure 4B. We have provided an alternative blot image with better clarity for Fig. 4B, used for the quantification analysis.

      Author response image 9.

       

      As per the reviewer’s valuable suggestions, we have provided the Ponceau image in Fig. S2B.

      We thank the reviewer for rightly pointing out the discrepancy in the band intensity quantification in Fig. 5F. We have re-evaluated the intensities in ImageJ and have provided the correct band intensities. We are highly apologetic for the inaccuracies.

      As per the reviewer’s previous suggestion, we performed additional experiments with HIF-1α inhibitor treatment in our revised manuscript to substantiate our correlative findings on SIRT1-mediated regulation of host glycolysis (Fig. 7G). We would like to clarify our claim in this regard. Our results suggested that loss of SIRT1 function triggered increased host glycolysis alongside hyperacetylation of HIF-1α. HIF-1α is reported to be an important regulator of glycolysis (Kierans SJ, Taylor CT. Regulation of glycolysis by the hypoxia-inducible factor (HIF): implications for cellular physiology. J Physiol. 2021;599(1):23-37. doi:10.1113/JP280572), and SIRT1 has been shown to regulate HIF-1α acetylation status (Lim JH, Lee YM, Chun YS, Chen J, Kim JE, Park JW. Sirtuin 1 modulates cellular responses to hypoxia by deacetylating hypoxia-inducible factor 1alpha. Mol Cell. 2010;38(6):864-878. doi:10.1016/j.molcel.2010.05.023). Further, ectopic expression of SIRT1 has been demonstrated to reduce glycolysis by negatively regulating HIF-1α (Wang Y, Bi Y, Chen X, et al. Histone Deacetylase SIRT1 Negatively Regulates the Differentiation of Interleukin-9-Producing CD4(+) T Cells. Immunity. 2016;44(6):1337-1349. doi:10.1016/j.immuni.2016.05.009). We have subsequently shown in Fig. 7G that the increase in host glycolysis upon SIRT knockdown in the infected macrophages is lowered upon HIF-1α inhibitor treatment, suggesting that one of the mechanisms of SIRT-mediated regulation of host glycolysis is via regulation of HIF-1α. However, this warrants further mechanistic research.

      We agree with the reviewer that HIF-1α acetylation is expected to be increased in UI sh1 versus UI SCR. The apparent reduced acetylation depicted in UI sh1 in Fig. 7B could be attributed to lower HIF-1α levels in UI sh1 compared to UI SCR. Therefore, we have provided an alternate blot image that has been used for quantification in Fig. 7C (Author response image 6).

      To answer the reviewer’s question on Fig. 7D, we observed a more or less equal degree of HIF-1α immunoprecipitation upon HIF-1α pull-down in all the sample cohorts under conditions of SIRT1 inhibitor treatment. However, we observed reduced interaction of HIF-1α with SIRT1 in the infected sample upon SIRT1 inhibitor treatment.

      We thank the reviewers for suggesting improvements in the blot labelling and for raising this concern. We have corrected the blot labelling to avoid the previous confusion.

      We appreciate the reviewer’s concern and have therefore provided an alternate blot image for Fig. 7H, in which we achieved a higher SIRT3 knockdown percentage and which may address the previously stated concern.

      We apologize for the improper labelling of the Input blot with unmarked lanes. We have addressed this issue by labelling the lanes in the input section of the blots.

    1. eLife Assessment

      This important study advances our understanding of the mechanisms controlling lipid flux and ion permeation in the TMEM16 and OSCA/TMEM63 family channels. The study provides compelling new evidence indicating that side chains along the TM4/6 interface play a key role in gating lipid and ion fluxes in these channels. The authors suggest that the transmembrane channel/scramblase family proteins may have originally functioned as scramblases but lost this capacity over evolution.

    2. Reviewer #1 (Public review):

      Summary:

      TMEM16, OSCA/TMEM63, and TMC belong to a large superfamily of ion channels in which TMEM16 members are calcium-activated lipid scramblases and chloride channels, whereas OSCA/TMEM63 and TMCs are mechanically activated ion channels. In the TMEM16 family, TMEM16F is a well-characterized calcium-activated lipid scramblase that plays an important role in processes like blood coagulation, cell death signaling, and phagocytosis. In a previous study, the group demonstrated that a lysine mutation in TM4 of TMEM16A can enable the calcium-activated chloride channel to permeate phospholipids too. Based on this, they hypothesize that the energy barrier for lipid scrambling in these ion channels is low, and that modifying the hydrophobic gate region by introducing a charged side chain at the TM4/6 interface in the TMEM16 and OSCA/TMEM63 families can allow lipid scrambling. In this manuscript, using a scramblase assay based on Annexin V binding to phosphatidylserine, together with electrophysiology, the authors demonstrate that lysine mutations in TM4 of TMEM16F and TMEM16A can cause constitutive lipid scramblase activity. The authors then go on to show that analogous mutations in OSCA1.2 and TMEM63A can lead to scramblase activity. The revised version provides a thorough characterization of the residues that form the hydrophobic gate region in TM4/6 of this superfamily of channels. Their results indicate that disrupting the TM4/6 interaction can reduce the energy barrier for these channels to scramble lipids.

      Strengths:

      Overall, the authors introduce an interesting concept that this large superfamily can permeate ions and lipids.

      Weaknesses:

      None noted in the revised version.

    3. Reviewer #2 (Public review):

      This focused study by Lowry and colleagues identifies a key molecular motif that controls ion permeation vs combined ion permeation and lipid transport in three families of channel/scramblase proteins: TMEM16 channels, the plant-expressed and stress-gated cation channel OSCA, and its mammalian homolog, the mechanosensitive cation channel TMEM63. Between them, these three channels share low sequence similarity and have seemingly differing functions, as anion channels (TMEM16) or stress-activated cation channels (OSCA/TMEM63). The study finds that in all three families, mutating a single hydrophobic residue in the ion permeation pathway of the channels confers lipid transport through the pores of the channels, indicating that TMEM16 and the related OSCA and TMEM63 channels have a conserved potential for both ion and lipid permeation. The authors interpret the findings as revealing that these channel/scramblase proteins have a relatively low "energetic barrier for scramblase" activity. The experiments are done with a high level of rigor, and the revised paper is very well written and addresses the previous concerns.

    4. Reviewer #3 (Public review):

      This study was focused on the conserved mechanisms across the Transmembrane Channel/Scramblase superfamily, which includes members of the TMEM16, TMEM63/OSCA, and TMC families. In previous work, the authors have studied the role of the inner activation gate of these proteins. Here, the authors show that the introduction of mutations at the TM4-TM6 interface, which are close to the inactivation gate, can disrupt gating and confer scramblase activity to non-scramblase proteins.

      Overall, the confocal imaging experiments, patch clamping experiments, and data analysis are performed well and in line with standard methods. The molecular dynamics simulation work is focused but adds supportive evidence to their findings. Although there could have been more extensive molecular analysis to bolster the authors' arguments on the role of the TM4-TM6 interface (e.g. evaluate effects of size/hydrophobicity, double mutants, cross-linking, more in-depth simulation data), there is adequate evidence to conclude that certain residues at this interface are critical to ion conduction and phospholipid scramblase activity. The data presented add only incremental depth of knowledge for each individual channel, but together, they show this to be true for conserved TM4 residues across TMEM16F, TMEM16A, OSCA1.2, and TMEM63A proteins. This breadth of data is a major strength of this paper, and provides strong evidence for a coupled pathway for ion conduction and phospholipid transport, though the underlying biophysical mechanism is still speculative and remains to be elucidated.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors): 

      Figures 1 and 2. How do the authors know that the lysine mutations are specific to constitutive activity and not because it is causing the channel to be now voltage sensitive? 

      As shown in the revised Figs. 1b, S2a, and 3b, TMEM16F I521K/M522K, TMEM16F I521E, and TMEM16A I546K/I547K spontaneously expose PS, respectively. Neither membrane depolarization nor calcium stimulation was introduced under these conditions and the cells were grown in calcium-free media after transfection to limit calcium-dependent activation. Our new experiments further demonstrate that TMEM16F T526K (Fig. 1b) and TMEM16A E551K (Fig. 3b), which are further away from the activation gate, either exhibit strongly attenuated spontaneous lipid scrambling activity or lack it altogether. According to these results, the gain-of-function mutants (TMEM16F I521K/M522K/I521E and TMEM16A I546K/I547K) are indeed constitutively active. This constitutive scramblase activity is not due to a gain of voltage sensitivity as ion channel activity is also minimal around the resting membrane potential of a HEK cell (Fig. 1d, e and Fig. 3d, e).

      The authors see very large currents of 5-10 nA in their electrophysiology experiments in Figures 2D and 3D. I understand that Figure 2D shows whole-cell recordings, but are the authors confident that the currents they are recording from the mutants are indeed specific to TMEM16A? More importantly, in Figure 3D they see 3-5 nA currents in inside-out patches, which is huge. They have no added divalent ions in their bath solution, which could lead to larger single-channel amplitudes, but 3-5 nA seems excessive. Some control to demonstrate that these are indeed OSCA1.2 currents is important. 

      TMEM16A and TMEM16F are well-known for their high cell surface expression. Therefore, the current amplitude is usually huge even in excised inside-out or outside-out patches—please see our previous publications for details: 1) 10.1016/j.cell.2012.07.036, 2) 10.7554/eLife.02772, 3) 10.1038/s41467-019-11784-8, 4) 10.1038/s41467-019-09778-7, 5) 10.1016/j.celrep.2020.108570, 6) 10.1085/jgp.202012704, and 7) 10.1085/jgp.202313460. 

      HEK293 cells do not have endogenous TMEM16A (https://doi.org/10.1038/nature07313, 10.1016/j.cell.2008.09.003, 10.1126/science.1163518). They therefore serve as a widely used cell line for studying TMEM16A biophysics. As overexpressing the WT control barely elicited any obvious current in 0 Ca2+ (Fig. 3d), there is no doubt that the large outward-rectifying current (a hallmark of CaCC) in the revised Fig. 3d (previous Fig. 2D) was elicited from the mutant TMEM16A channels. The strong outward rectification also rules out the possibility of this being leak current.

      Regarding Fig. 4d (previous Fig. 3D), OSCA1.2 has excellent surface expression as shown in Fig. 4b. OSCA1.2 also has a much higher single-channel conductance (121.8 ± 3.4 pS, 10.7554/eLife.41844) than TMEM16A (~3-8 pS) and TMEM16F (<1 pS). Therefore, recording nA OSCA1.2 currents from excised patches is normal, particularly because OSCA1.2 current is larger at depolarized voltages than at hyperpolarized voltages (please see our explanation in the next response). As the reviewer pointed out, the lack of divalent ions in our experimental conditions may also partially contribute to the large conductance. To further verify this, we conducted mock transfection recordings (please see Author response image 1 below). WT- but not mock (GFP)-transfected cells gave rise to large currents, further supporting that the recorded current was indeed through OSCA1.2. 

      Author response image 1.

      Representative inside-out currents for mock (GFP)- and OSCA1.2 WT-transfected cells. OSCA1.2 is responsible for nA currents elicited by the pressure and voltage protocols shown.
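
      As a rough plausibility check on the conductance argument above, the short sketch below estimates how many simultaneously open channels would be needed to carry a given macroscopic current. It is only an illustration: it assumes an ohmic single-channel current (i = γ·V), and the 100 mV driving force and 3 nA target are hypothetical example inputs, not values taken from the recordings. The conductances are those quoted in the response (121.8 pS for OSCA1.2, and 8 pS as the upper end of the range quoted for TMEM16A).

```python
def single_channel_current_pA(conductance_pS, driving_force_mV):
    """Ohmic single-channel current i = gamma * V.

    pS * mV = 1e-12 S * 1e-3 V = 1e-15 A, i.e. 1e-3 pA.
    """
    return conductance_pS * driving_force_mV * 1e-3


def open_channels_needed(macroscopic_nA, conductance_pS, driving_force_mV):
    """Number of simultaneously open channels that would carry the
    macroscopic current, assuming identical, independent, ohmic channels."""
    i_pA = single_channel_current_pA(conductance_pS, driving_force_mV)
    return macroscopic_nA * 1e3 / i_pA


if __name__ == "__main__":
    # Hypothetical example inputs (not from the recordings): 3 nA at 100 mV.
    for name, gamma_pS in [("OSCA1.2", 121.8), ("TMEM16A (upper bound)", 8.0)]:
        n_open = open_channels_needed(3.0, gamma_pS, 100.0)
        print(f"{name}: ~{n_open:.0f} open channels for 3 nA at 100 mV")
```

      Under these assumptions, a few hundred open OSCA1.2 channels would suffice for a 3 nA current, whereas a channel with a conductance of a few pS would need thousands, which is consistent with the authors' point that nA currents in excised patches are plausible for a high-conductance, well-expressed channel.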

      Figure 3D and 5D. Most of the traces and current quantification is done at positive potentials and is outward current. Do the authors observe inward currents? It is difficult to judge by the figures since currents are so large. OSCA/TMEM63s are cationic channels and all published data on these channels have demonstrated robust inward currents at negative, physiologically relevant potentials. The lack of inward currents but only large outward currents suggests that these mutations could be doing something else to the channel. 

      Yes. We indeed observe inward current at negative holding potentials under pressure clamp (Author response image 2). However, mechanosensitive OSCA and TMEM63A channels are also voltage dependent. Their outward current is an order of magnitude larger at depolarized voltages (e.g., Author response image 2, also 10.7554/eLife.41844, see Fig. 1H). 

      Author response image 2.

      Voltage-dependent rectification of OSCA1.2 current. a. Representative OSCA1.2 trace (bottom) elicited by a voltage-ramp under -50 mmHg (top). b. The difference in inward and outward current amplitudes. 

      We found that quantifying the OSCA1.2 outward current has advantages over the inward current. Usually, using the gold standard pressure clamp protocol at negative holding voltages, peak inward current amplitude is quantified. However, OSCA inward current quickly inactivates (10.7554/eLife.41844, see Fig. 1C). This makes robust quantification and comparison with mutant channels difficult. Holding the membrane at a constant pressure and measuring OSCA1.2 G-V overcomes these issues associated with the classical inward current measurements. The large depolarization-driven outward current does not inactivate, and robust tail current (Response Fig. 1, 2) allows us to construct G-V relationships. We found quantifying mutants’ voltage dependence at constant pressure is more consistent than quantifying pressure dependence at constant voltage. These advantages make our new protocol preferable to the commonly used gold standard pressure clamp protocol for characterizing and comparing the gating mutations identified in this manuscript. 
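
      To illustrate how a G-V relationship yields a half-activation voltage (V50), the sketch below fits normalized conductance values to a two-state Boltzmann function, G/Gmax = 1 / (1 + exp(-(V - V50)/k)). This is an assumption for illustration only: the response does not state which function was fitted, and the function names, the logit-linearization fitting approach, and the synthetic data are all hypothetical.

```python
import math


def boltzmann(v_mV, v50_mV, slope_mV):
    """Normalized conductance G/Gmax for a two-state Boltzmann relation."""
    return 1.0 / (1.0 + math.exp(-(v_mV - v50_mV) / slope_mV))


def fit_boltzmann(voltages_mV, g_norm):
    """Estimate (V50, k) by linear regression on the logit of G/Gmax.

    logit(G) = ln(G / (1 - G)) = (V - V50) / k is linear in V,
    with slope 1/k and intercept -V50/k. Points with G <= 0 or G >= 1
    are skipped because their logit is undefined.
    """
    pts = [(v, math.log(g / (1.0 - g)))
           for v, g in zip(voltages_mV, g_norm) if 0.0 < g < 1.0]
    n = len(pts)
    mean_v = sum(v for v, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((v - mean_v) ** 2 for v, _ in pts)
    sxy = sum((v - mean_v) * (y - mean_y) for v, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_v
    k = 1.0 / slope
    v50 = -intercept * k
    return v50, k


if __name__ == "__main__":
    # Synthetic, noise-free example data (hypothetical V50 = 120 mV, k = 25 mV).
    voltages = list(range(0, 201, 20))
    g = [boltzmann(v, 120.0, 25.0) for v in voltages]
    v50, k = fit_boltzmann(voltages, g)
    print(f"V50 = {v50:.1f} mV, slope factor k = {k:.1f} mV")
```

      With real tail-current data, the conductances would first be normalized to the maximal tail current, and a nonlinear least-squares fit would usually be preferred because the logit transform amplifies noise near 0 and 1. A leftward shift of V50 in a mutant relative to WT would then indicate easier voltage-dependent opening at the same pressure.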

      Figure 3 and 5. Why are mechanically activated currents being recorded at random pressure stimuli (-50 mmHg for OSCA) and (-80 mmHg for Tmem63a)? The gold standard in the field is to run an entire pressure response curve. Given that only outward currents are observed at membrane potentials +120 mV and above at 0 mmHg, this questions whether they are indeed constitutively active. 

      As we explained in the previous response, both voltage and membrane stretch activate OSCA/TMEM63A channels. We found measuring voltage dependence under constant pressure provided more consistent quantification than the gold standard pressure response protocol. This may be due to the variability of applied membrane tension under repeated stretches versus the more consistent applied voltage. Additionally, we chose -50 mmHg and -80 mmHg to reflect the reported differences in half-maximal pressures between OSCA1.2 and TMEM63A (e.g., P50 ~55 mmHg for 1.2 and ~61 mmHg for 63A in 10.7554/eLife.41844 versus ~86 mmHg for 1.2 and -123 mmHg for 63A in 10.1016/j.neuron.2023.07.006).

      We also used higher pressure in cell-attached mode to increase TMEM63A current amplitudes, which are usually tiny. We have updated our method section (Lines 329-334) to further clarify why we used these protocols. 

      Please note that in TMEM16 proteins, ions and lipids might not always co-transport.

      This means that under certain conditions, only one type of substrate may go through. For instance, in WT TMEM16F, Ca2+ stimulation can easily trigger PS exposure at resting membrane potential. No ionic currents are elicited until strong depolarization is applied. Similarly, the TMEM16F GOF mutations spontaneously transport lipids, leading to loss of lipid asymmetry (Fig. 1b, c). However, in 0 Ca2+, these TMEM16F mutant channels still need strong depolarization for ion conduction (Fig. 1d, e). Although the detailed mechanism still needs to be further investigated, the OSCA1.2 and TMEM63A GOF mutations share similar features with TMEM16 proteins, exhibiting ion conduction under high pressures and depolarizing voltages, yet constitutively active scrambling.  

      Some clarity is needed for their choice of residues. I understand that a lot of this is also informed by the structures of these ion channels. According to the alignment shown in Supplementary Figure 1, they chose LA for OSCA1.2, which is in line with the IM (TMEM16F) and II(TMEM16A) residues but for Tmem63a they chose the hydrophobic gate residue W and S. Was the A476 tested? Also, OSCA1.2 already has a K in the hydrophobic gating residue region. How do the authors reconcile this with their model? 

      We appreciate this critical comment. We have included the characterization of TMEM63A A476K (Fig. 6, corresponding to M522 in 16F, I547 in 16A, and A439 in OSCA1.2). Interestingly, A476K transfected cells did not show obvious spontaneous PS exposure yet exhibited a modest shift in V50 comparable to W472K and S475K. These differences may reflect the high-tension activated nature of the TMEM63 proteins (10.1016/j.neuron.2023.07.006) as compared to OSCA1.2, where the corresponding mutation (A439K, Fig. 4b, c) showed very little spontaneous activity and required hypotonic stimulation to promote more robust PS exposure (Fig. 5). 

      Furthermore, as we showed in Figs. 1b-c and 3b-c, there is a lower limit (towards the C-terminus) of the TM 4 lysine mutation effect, which becomes insufficient to cause a constitutively open pore for spontaneous lipid scrambling. It is possible that TMEM63A A476K represents the lower limit of TM 4 mutations that can convert TMEM63A into a spontaneous lipid scramblase.  

      Regarding OSCA1.2 K435 and TMEM63A W472, these sites correspond to the hydrophobic gate residues on TM 4 in TMEM16F (F518, Fig. 1a) and TMEM16A (L543, Fig. 3a), so it is unsurprising to us that a lysine mutation at this site causes constitutive scramblase activity in TMEM63A (Fig. 6b, c). For OSCA1.2, it is more intriguing since this residue is already a lysine (K435). In Supplementary Fig. 5, our new experiments show that neutralizing K435 with leucine (K435L) in the background of L438K significantly attenuates spontaneous PS exposure from ~63% PS positive for L438K alone (two lysine residues) to ~31% for K435L/L438K (one lysine). On the other hand, the K435L mutation by itself is also insufficient to induce PS exposure. Therefore, the endogenous lysine at residue 435 has an additive effect on the spontaneous scramblase activity of L438K. We believe the explanation for this result lies in experiments conducted in model transmembrane helices, which have shown that stacking hydrophilic side chains within the membrane interior promotes trans-bilayer lipid flipping (see 10.1248/cpb.c22-00133). 

      These same studies also support our observation (10.1038/s41467-019-09778-7) that highly hydrophilic side chains (such as lysine or glutamic acid) accelerate trans-bilayer lipid flipping more effectively than hydrophobic side chains such as isoleucine or alanine (Author response image 3, see also 10.1021/acs.jpcb.8b00298).

      Author response image 3.

      Trans-bilayer lipid flipping rates (kflip) accelerate with increasing side chain hydropathy for a residue placed in the center of a model transmembrane helical peptide

      How do the authors know that osmotic shock is indeed activating OSCA1.2 and TMEM63A? If they can record from the channels then electrophysiology data that confirms activation of the channel in the presence of hypoosmotic shock will strengthen the osmolarity active scramblase activity demonstrated in Figure 4. So far, there is conclusive data showing that they are mechanically activated but conclusive electrophysiological data for OSCA/TMEM63 osmolarity activation is not described yet, including the reference (38) they indicate in line 132. Although osmotic shock can perturb mechanical properties of the membrane it can also activate volume-regulated anion channels, which are also present in HEK cells. 

      Thank you for raising this important question. While reference 38 (now reference 39) shows direct electrophysiological evidence of hypertonicity-induced current (e.g., Fig. 4 f, g, i, and j in 10.1038/nature13593), direct electrophysiological evidence that OSCA/TMEM63 can be activated by hypotonic stimulation is still missing. To address this question, we conducted whole-cell patch clamp experiments on mock-transfected and OSCA1.2 WT-transfected cells stimulated with 120 mOsm/kg hypotonic solution, matching the conditions used for hypotonic-induced scrambling shown in Fig. 5. As shown in Supplementary Fig. 6, our whole-cell recording detected a slowly evolving yet robust outward rectifying current in OSCA1.2-transfected cells, which was not observed in mock-transfected cells. 

      To avoid the contamination from endogenous SWELL osmo-/volume-regulated chloride channels, our new experiment used 140 mM Na gluconate to replace NaCl in both the pipette and the bath solution. Because SWELL/VRAC channels are minimally permeable to gluconate anions (e.g., 10.1007/BF00374290), we conclude that hypotonic stimulation can indeed activate OSCA1.2 albeit with perhaps lower efficiency compared to mechanical stimulation.  

      Minor comments 

      What is the timeline for the scramblase assay for all the experiments (except Figure 4)? How long is the AnnexinV incubated before imaging? 

      Thank you for pointing out that we had not provided sufficient detail here. Cells were imaged in the scramblase assay (including in Fig. 4, now revised Fig. 5) in AnnexinV-containing buffer immediately and without a formal incubation period because AnnexinV binding to exposed PS proceeds rapidly. We have included additional detail in the methods section to eliminate any confusion (Lines 310-312).

      In some places of the document, it says OSCA/TMEM63, and in other places, it is denoted as TMEM63/OSCA. The literature so far has always called the family OSCA/TMEM63- please stay consistent with the field. 

      Thank you for pointing this out. We have corrected these instances to be consistent with the field.   

      Reviewer #2 (Recommendations For The Authors): 

      (1) The authors' statement that the channel/scramblase family members have a relatively low "energetic barrier for scramblase" activity needs further support. While mutating the hydrophobic channel gate certainly could destabilize ion conduction to cause a GOF effect on channel activity, it is still not clear why scramblase activity, which is tantamount to altered permeation, happens in the mutant channels. Are permeation and channel gating (opening) coupled in these channels? If so, what is the basis for the coupling? Is scramblase activity only observed when the gating is destabilized or are they separable? 

      We appreciate these great questions. For the question about the ‘energetic barrier’ statement, please see our response to point (3) where we have carried out MD simulations of the OSCA1.2 WT and L438K mutant to provide insight into how the permeation pathway is altered by these mutations. 

      Regarding why TMEM16A can be converted into a scramblase, we use the extensively studied TMEM16 proteins as examples to improve our current understanding of OSCA/TMEM63 proteins. For further details please see our original paper (10.1038/s41467-019-09778-7) and our review (10.3389/fphys.2021.787773), which are summarized as follows: 

      (1) The “neck region”, consisting of the exofacial halves of TMs 3-6, forms the pore-gate region for both ion and lipid permeation (Author response image 4B). In the closed state, the neck region is constricted and TMs 4 and 6 interact with each other, preventing substrate permeation. The hydrophobic inner activation gate that we identified (10.1038/s41467-019-09778-7) resides right underneath the inner mouth of the neck region, controlling both ion permeation and lipid scrambling. 

      (2) Based on our functional observations and the available scramblase structures of TMEM16 proteins in multiple conformations, we proposed a clamshell-like gating model to describe TMEM16 lipid scrambling (Author response image 4D). According to this model, Ca2+-induced conformational changes weaken the TM 4/6 interface. This promotes the separation of the two transmembrane segments, analogous to the opening of a clam shell, allowing a membrane-spanning groove to facilitate permeation of the lipid headgroup.

      (3) For the CaCC, TMEM16A, Ca2+ binding dilates the pore. However, the binding energy likely cannot open the TM 4/6 interface at the neck region so, in the absence of groove formation, only Cl- ions but not lipids can permeate (pore dilation model, Author response image 4C). 

      (4) Introducing charged residues near the inner activation gate disrupts the neck region, potentially by weakening the hydrophobic interactions between TMs 4 and 6. This mutational effect results in constitutively active TMEM16F scramblases and enables spontaneous lipid permeation in the TMEM16A CaCC. 

      (5) In our revision, we tested additional mutations with different side chain properties (Supplementary Fig. 2), validating previous findings by us (10.1038/s41467-019-09778-7) and others (10.1038/s41467-022-34497-x) that gate disruption increases with the side chain hydropathy of the mutation. 

      (6) We further extended lysine mutations to two helical turns below the inner activation gate on TM 4 and identified a lower limit for mutation-induced spontaneous scramblase activity in TMEM16F and TMEM16A (Figs. 1b, c and 3b, c, respectively). Together, all these points lend additional support to our proposed gating models for TMEM16 proteins, which we postulate may also relate to the OSCA/TMEM63 family based on the evidence provided in our manuscript.

      Author response image 4.

      Model of gating (and regulatory) mechanisms in the TMEM16 family. (B) overall architecture and proposed modules, (C) pore-dilation gating model for CaCCs, (D) Clamshell gating model for CaPLSases.

      Regarding the relationship between ion and lipid permeation through TMEM16 scramblases, the following is the summary of our current understanding: 

      (1) Functionally, ion and lipid permeation are not necessarily obligatory to each other. This is evidenced by our previous biophysical characterizations of TMEM16F ion channel and lipid scramblase activities. Ca2+ can trigger TMEM16F lipid scrambling at resting membrane potentials; however, Ca2+ alone is insufficient to elicit recordable TMEM16F current. Strong membrane depolarization acting synergistically with elevated intracellular Ca2+ is required to activate ion permeation. Based on these observations, we postulate that ions and lipids may have different extracellular gates, despite sharing an inner activation gate (10.1038/s41467-019-09778-7). Ca2+ alone may sufficiently open the inner gate (and extracellular gate) for lipids, whereas depolarization is likely required to open the extracellular gate and allow ion flux. Further structure-function studies are needed to test this hypothesis. 

      (2) Structurally, the open conformations of TMEM16 scramblases such as the fungal orthologs and human TMEM16K (Supplementary Fig. 1 b-d) are wide open, which allows lipid and ion co-transport. Ion and lipid co-transport has also been demonstrated in various MD simulations (e.g., 10.7554/eLife.28671, 10.3389/fmolb.2022.903972, and 10.1038/s41467-021-22724-w).

      (3) Functionally, we (10.1085/jgp.202012704) and others (10.7554/eLife.06901.001) have performed dual recordings of channel and scramblase activities, also demonstrating that ions and lipids are co-transported simultaneously when the proteins are fully activated.

      (4) In this manuscript, we also provide multiple examples (TMEM16F in Fig. 1, TMEM16A in Fig. 3, OSCA1.2 in Fig. 4, and TMEM63A in Fig. 6) of mutations showing spontaneous phospholipid scramblase activities, yet their channel activities require strong depolarization or, in the case of TMEM63A, high pressures to be elicited.

      Together, this new evidence further supports our hypothesis that there might be multiple gates for ion and lipid permeation, in addition to the shared inner gate we previously identified. We hope these detailed explanations help convey the intricacy of these intriguing questions. Of course, future studies are needed to test our hypothesis and elucidate the complex relationship between ion and lipid permeation of these proteins. 

      (2) One weakness in the experimental approach is the very limited number of substitutions used to infer the conclusion regarding the energetic barrier and other conclusions relating to scramblase activity. Additional substitutions of charged and polar amino acids at the hydrophobic gate would be helpful in illuminating the molecular determinants of the GOF phenotype and also reveal varying patterns of lipid permeation which could be enormously informative. These additional mutations for analysis of TMEM16F and OSCA should be added to the study. 

      We appreciate these great suggestions which were shared by multiple reviewers. We have included our duplicated response below.

      “Response to reviewers 2 & 3: In our 2019 paper (10.1038/s41467-019-09778-7), we systematically tested the effect of side chain properties at the inner activation gate of TMEM16F on lipid scrambling activity (Response Fig. 6) and, since then, these results have been supplemented by others as well (10.1038/s41467-022-34497-x). In summary, mutating the inner activation gate residues to polar or charged residues generally results in constitutively activated scramblases without requiring Ca2+ (Fig 5a in 10.1038/s41467-019-09778-7). Because these residues form a hydrophobic gate, introducing smaller side chains via alanine substitution is also gain-of-function, with the Y563A mutant as well as the F518A/Y563A/I612A variant being constitutively active (Fig. 3a in 10.1038/s41467-019-09778-7). Meanwhile, mutating these gate residues to hydrophobic amino acids causes no change for I612W, a slight gain-of-function for F518W, a slight loss-of-function for F518L, and complete loss-of-function for Y563W (Fig. 4b in 10.1038/s41467-019-09778-7). These findings clearly demonstrate that the side-chain properties are critical for regulating the gate opening. Charged mutations including lysine and glutamic acid are the most effective at promoting gate opening (Fig 5a in 10.1038/s41467-019-09778-7).

      Similarly, others have observed that side chain hydropathy at the F518 site in TMEM16F correlates with shifts in the Ca2+ EC50 (Fig. 2 of 10.1038/s41467-022-34497-x). Note that this publication resolved the structure of the TMEM16F F518H mutant, revealing a previously unseen conformation that we have highlighted in Supplementary Fig. 1e and discussed in lines 235-238. Please also see our response to Reviewer #1 above, where we discuss discoveries in model transmembrane helical peptide systems showing that transbilayer lipid flipping rates correlate with side chain hydropathy (Author response image 3), distance between stacked hydropathic residues (schematic in 10.1248/cpb.c22-00133), and even helical angle between stacked side chains (not shown). 

      Following the reviewers’ suggestions, we have tested additional mutations in alternative locations and with different side chains.  

      (1) We have added data for TMEM16F I521A and I521E to demonstrate a similar effect of alternative side chains to what has previously been reported by us and others. We found that I521A failed to show spontaneous scrambling activity (Supplementary Fig. 2), yet I521E (Supplementary Fig. 2) is a constitutively active lipid scramblase, similar to I521K (Fig. 1). This further demonstrates that gate disruption correlates with the side chain hydropathy and that this site lines a critical gating interface.

      (2) We also added lysine mutations two helical turns below the conserved inner activation gate at TMEM16F T526 (Fig. 1) and TMEM16A E551 (Fig. 3). We found that there is indeed a lower limit for the observed effect in TMEM16, where lysine mutations no longer induce spontaneous lipid scrambling activity. This indicates that when the TM 4/6 interaction is weaker toward the intracellular side (Figs. 1a, 3a), the TM 4 lysine mutation loses the ability to promote lipid scrambling by disrupting the TM 4/6 interface to enable clamshell-like opening of the permeation pathway.

      (3) We added a TMEM16F lysine mutation on TM 6 at residue I611 (Fig. 2). Similar to I612K (Response Fig. 6), I611K also leads to spontaneous lipid scrambling and enhanced channel activity in the absence of calcium (Fig. 2). This shows that charged mutations along TM 6 can also promote lipid scrambling, strengthening our model that hydrophobic interactions along the TM 4/6 interface are critical for gating and lipid permeation.”

      (3) Related to the above point, it would be enormously useful to perform even limited computational modelling to support the "energetic barrier" statement. Specifically, can the authors model waters in the putative pore to examine water occupancy in the WT and mutant channels to better understand how the barrier for ions and lipids is altered in the TMEM16? 

      We appreciate this suggestion and have now conducted atomistic MD simulations of OSCA1.2 WT and the L438K mutant for ~1 μs (Supplementary Fig. 4). The simulations revealed elevated water occupancy in the pore region of the L438K mutant, likely due to a widening at the TM 4/6 interface. Conversely, the WT interface remained constricted, largely disallowing water occupancy. These computational results support our previously proposed clamshell-like gating model for TMEM16 scramblases and provide strong support that the L438K mutation disrupts the interaction at the TM 4/6 interface, in turn reducing the energetic barrier for both ion and lipid permeation.
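      To illustrate what a water-occupancy measurement of this kind involves, the following is a minimal sketch, not the analysis actually used for Supplementary Fig. 4. It assumes the pore can be approximated as a cylinder aligned with the membrane normal (z) and that each trajectory frame is available as a list of water-oxygen coordinates; the function names, the cylinder geometry, and the toy coordinates are all hypothetical.

```python
def waters_in_pore(frame, center_xy, radius, z_min, z_max):
    """Count water oxygens inside a cylinder aligned with the z axis.

    frame: iterable of (x, y, z) water-oxygen coordinates for one snapshot.
    The cylindrical pore definition is an illustrative assumption.
    """
    cx, cy = center_xy
    r2 = radius * radius
    return sum(
        1
        for x, y, z in frame
        if z_min <= z <= z_max and (x - cx) ** 2 + (y - cy) ** 2 <= r2
    )


def mean_occupancy(trajectory, center_xy, radius, z_min, z_max):
    """Average the per-frame water count over all frames of a trajectory."""
    counts = [
        waters_in_pore(frame, center_xy, radius, z_min, z_max)
        for frame in trajectory
    ]
    return sum(counts) / len(counts)


# Toy two-frame trajectory (hypothetical coordinates, arbitrary length units).
toy_trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 0.0), (0.0, 0.0, 10.0)],
    [(0.0, 0.0, 0.0)],
]
toy_mean = mean_occupancy(
    toy_trajectory, center_xy=(0.0, 0.0), radius=2.0, z_min=-5.0, z_max=5.0
)
```

      Comparing such a mean between WT and mutant trajectories, with the same pore definition applied to both, is one simple way to express a difference in pore hydration.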

      (4) I am puzzled about the ability of OSCA and the TMEM63 proteins which are cation channels to conduct negatively charged lipids. How can the pore be selective for cations and yet permeate negatively charged molecules when lipids are presented? 

      This is a great question. TMEM16 scramblases (as well as other known scramblases, such as the Xkr and Opsin families) are surprisingly non-selective among phospholipids (they transport all major phospholipid species, not just anionic lipids like PS). It is still debated whether lipid headgroups indeed insert into an open pore or hydrophilic groove (Response Fig. 5), or if they may traverse the bilayer by the so-called ‘out-of-groove’ model. Regardless of the model, the consensus is that Ca2+-induced conformational changes catalyze lipid permeation and the mutations we have introduced are designed to mimic these conformational changes by separating the TM 4/6 interface.

      Additionally, TMEM16F channel activity was first characterized as cation non-selective (10.1016/j.cell.2012.07.036), similar to OSCA/TMEM63s, which may even exhibit some chloride permeability (10.7554/eLife.41844.001). Thus, it appears that scramblase activity is agnostic to headgroup charge and compatible with both a mutant anion channel (TMEM16A) and mutant cation channels (TMEM16F, OSCA1.2, and TMEM63A); however, more detailed structural, functional, and computational studies are needed to further clarify ion and lipid co-transport mechanisms.

      (5) Do pore blockers like Gd3+ which block permeation also inhibit the scramblase activity of the mutant channels? This should be tested for the mutant channels. 

      While extracellular Gd3+ has been previously reported as an inhibitor of OSCA1.2 (10.7554/eLife.41844.001), we did not observe this effect (Author response image 5), but instead saw inhibition by intracellular Gd3+ (Author response image 6). Given this discrepancy, we did not test Gd3+ inhibition of the OSCA1.2 scramblases, but instead tested Ani9, a paralog-specific inhibitor of TMEM16A, on the TMEM16A I546K gain-of-function mutant and found it attenuated both ion channel and phospholipid scramblase activities (Supplementary Fig. 3).

      Author response image 5.

      200 µM Gd3+ext fails to inhibit OSCA1.2 currents in cell-attached patches. Pressure-elicited peak currents (n=6 each). Statistical test is an unpaired Student’s t-test.

      Author response image 6.

      200 µM Gd3+int completely inhibits OSCA1.2 currents in inside-out patches. (a) Representative traces before (black), during (red), and after (blue) Gd3+ application. (b) Representative application timecourse. (c) Quantification of peak currents (n=8 each). Statistical test is one-way ANOVA.

      Minor: 

      - Some of the current amplitudes shown in Figures 2 and 3 are enormous. Is liquid junction potential corrected in these experiments? If not, it would be preferable to correct this to avoid voltage errors. 

      Thanks for the question. The large current amplitude is due to 1) high surface expression of the proteins; 2) the large single-channel conductance of OSCA channels; and 3) the much larger current at positive voltages for OSCA channels. Our control experiment showed that WT TMEM16A at 0 Ca2+ did not give rise to any current (Fig. 3d), further demonstrating that the large current was not due to liquid junction potential. For the OSCA recordings, we also did not observe current in mock-transfected cells, further excluding the possible interference of liquid junction potential (Response Fig. 1).

      - Related, authors could consider adding some evidence using selective pharmacology to support the conclusions that the observed currents arise from TMEM or OSCA channels. 

      Thanks for the suggestion. As mentioned above, we have added experiments with Ani9, a specific inhibitor of TMEM16A, in Supplementary Fig. 3. We found that Ani9 robustly attenuated both ion channel and phospholipid scramblase activities for the TMEM16A I546K gain-of-function mutant. This is also consistent with our previous publication (10.1038/s41467-019-09778-7), where Ani9 efficiently inhibited the TMEM16A L534K mutant scramblases. Additionally, we have provided mock controls (Response Fig. 1, Fig. 6d, e) to show that the observed currents are indeed attributable to OSCA1.2 and TMEM63A.

      Reviewer #3 (Recommendations For The Authors): 

      Given that the authors postulate that the introduction of a positive charge via the lysine side chain is essential to the constitutive activity of these proteins, additional mutation controls for side chain size (e.g. glutamine/methionine) or negative charge (e.g. glutamic acid), or a different positive charge (i.e. arginine) would have strengthened their argument. To more comprehensively understand the TM4/TM6 interface, mutations at locations one turn above and one turn below could be studied until there is no phenotype. In addition, the equivalent mutations on the TM6 side should be explored to rule out the effects of conformational changes that arise from mutating TM4 and to increase the strength of evidence for the importance of side-chain interactions at the TM6 interface. 

      We appreciate these great suggestions which were shared by multiple reviewers. We have included our previous responses below.

      “Response to reviewers 2 & 3: In our 2019 paper (10.1038/s41467-019-09778-7), we systematically tested the effect of side chain properties at the inner activation gate of TMEM16F on lipid scrambling activity (Response Fig. 6) and, since then, these results have been supplemented by others as well (10.1038/s41467-022-34497-x). In summary, mutating the inner activation gate residues to polar or charged residues generally results in constitutively activated scramblases without requiring Ca2+ (Fig 5a in 10.1038/s41467-019-09778-7). Because these residues form a hydrophobic gate, introducing smaller side chains via alanine substitution is also gain-of-function, with the Y563A mutant as well as the F518A/Y563A/I612A variant being constitutively active (Fig. 3a in 10.1038/s41467-019-09778-7). Meanwhile, mutating these gate residues to hydrophobic amino acids causes no change for I612W, a slight gain-of-function for F518W, a slight loss-of-function for F518L, and complete loss-of-function for Y563W (Fig. 4b in 10.1038/s41467-019-09778-7). These findings clearly demonstrate that the side-chain properties are critical for regulating the gate opening. Charged mutations including lysine and glutamic acid are the most effective at promoting gate opening (Fig 5a in 10.1038/s41467-019-09778-7).

      Similarly, others have observed that side chain hydropathy at the F518 site in TMEM16F correlates with shifts in the Ca2+ EC50 (Fig. 2 of 10.1038/s41467-022-34497-x). Note that this publication resolved the structure of the TMEM16F F518H mutant, revealing a previously unseen conformation that we have highlighted in Supplementary Fig. 1e and discussed in lines 235-238. Please also see our response to Reviewer #1 above, where we discuss discoveries in model transmembrane helical peptide systems showing that transbilayer lipid flipping rates correlate with side chain hydropathy (Author response image 3), distance between stacked hydropathic residues (schematic in 10.1248/cpb.c22-00133), and even helical angle between stacked side chains (not shown).

      Following the reviewers’ suggestions, we have tested additional mutations in alternative locations and with different side chains.  

      (1) We have added data for TMEM16F I521A and I521E to demonstrate a similar effect of alternative side chains to what has previously been reported by us and others. We found that I521A failed to show spontaneous scrambling activity (Supplementary Fig. 2), yet I521E (Supplementary Fig. 2) is a constitutively active lipid scramblase, similar to I521K (Fig. 1). This further demonstrates that gate disruption correlates with the side chain hydropathy and that this site lines a critical gating interface.

      (2) We also added lysine mutations two helical turns below the conserved inner activation gate at TMEM16F T526 (Fig. 1) and TMEM16A E551 (Fig. 3). We found that there is indeed a lower limit for the observed effect in TMEM16, where lysine mutations no longer induce spontaneous lipid scrambling activity. This indicates that when the TM 4/6 interaction is weaker toward the intracellular side (Figs. 1a, 3a), the TM 4 lysine mutation loses the ability to promote lipid scrambling by disrupting the TM 4/6 interface to enable clamshell-like opening of the permeation pathway.

      (3) We added a TMEM16F lysine mutation on TM 6 at residue I611 (Fig. 2). Similar to I612K (Response Fig. 6), I611K also leads to spontaneous lipid scrambling and enhanced channel activity in the absence of calcium (Fig. 2). This shows that charged mutations along TM 6 can also promote lipid scrambling, strengthening our model that hydrophobic interactions along the TM 4/6 interface are critical for gating and lipid permeation.”

      The experiments for OSCA1.2 osmolarity effects on gating and scramblase in Figure 4 could be improved by adding different levels of osmolarity in addition to time in the hypotonic solution.

      We thank the reviewer for this excellent suggestion. We extensively tested this idea and found evidence (Response Fig. 10) that intermediate osmolarity (220 and 180 mOsm/kg) can also enhance the scramblase activity of the A439K mutant, albeit to a milder extent compared to 120 mOsm/kg stimulation. This suggests that swelling-induced membrane stretch may proportionally induce A439K activation and lipid scrambling. Due to the relatively mild sensitivity of OSCA to osmolarity and the variations induced by the experimental conditions, we believe it is better not to include these data to avoid overclaiming. We hope the reviewer will agree.

      Author response image 7.

      AnV intensities of WT- and A439K-transfected cells after 10 minutes of hypotonic stimulation at the listed osmolarities.

      Some confocal images appear to be rotated relative to each other (e.g. Figures 2b and 3b).

      Thank you for identifying these errors; they are corrected in the revision.

    1. eLife Assessment

      This valuable study proposes that protein secreted by colon cancer cells induces cells with Paneth-like properties that favor colon cancer metastasis. The evidence supporting the conclusions is strong but would benefit from more direct experiments to test the functional role of Paneth-like cells and to monitor metastasis from colon tumors. The work will be of interest to researchers studying colon cancer metastasis.

    2. Reviewer #1 (Public review):

      Summary:

      The authors addressed the influence of DKK2 on colorectal cancer (CRC) metastasis to the liver using an orthotopic model transferring AKP-mutant organoids into the spleens of wild-type animals. They found that DKK2 expression in tumor cells led to enhanced liver metastasis and poor survival in mice. Mechanistically, they associate Dkk2-deficiency in donor AKP tumor organoids with reduced Paneth-like cell properties, particularly Lyz1 and Lyz2, and defects in glycolysis. Quantitative gene expression analysis showed no significant changes in Hnf4a1 expression upon Dkk2 deletion. Ingenuity Pathway Analysis of RNA-Seq data and ATAC-seq data point to a Hnf4a1 motif as a potential target. They also show that HNF4a binds to the promoter region of Sox9, which leads to LYZ expression and upregulation of Paneth-like properties. By analyzing available scRNA data from human CRC data, the authors found higher expression of LYZ in metastatic and primary tumor samples compared to normal colonic tissue; reinforcing their proposed link, HNF4a was highly expressed in LYZ+ cancer cells compared to LYZ- cancer cells.

      Strengths:

      Overall, this study contributes a novel mechanistic pathway that may be related to metastatic progression in CRC.

      Weaknesses:

      The main concerns are related to incremental gains, missing in vivo support for several of their conclusions in murine models, and missing human data analyses.

      Main comments

      Novelty:

      The authors previously described the role of DKK2 in primary CRC, correlating increased DKK2 levels to higher Src phosphorylation and HNF4a1 degradation, which in turn enhances LGR5 expression and "stemness" of cancer cells, resulting in tumor progression (PMID: 33997693). A role for DKK2 in metastasis has also been previously described (sarcoma, PMID: 23204234).

      Mouse data:

      (a) The authors analyzed liver mets, but the main differences between AKP and AKP/Dkk2 KO organoids could arise during the initial tumor cell egress from the intestinal tissue (which cannot be addressed in their splenic injection model), or during pre-liver stages, such as endothelial attachment. While the analysis of liver mets is interesting, given that Paneth cells play a role in the intestinal stem cell niche, it is questionable whether a study that does not involve the intestine can appropriately address this pathway in CRC metastasis.

      (b) The overall number of Paneth cells found in the scRNA-seq analysis of liver mets was low (17 cells, Fig. 3), and assuming that these cells are driving the differences seems somewhat far-fetched.

      (c) Fig. 6 suggests a signaling cascade in which the absence of DKK2 leads to enhanced HNF4A expression, which in turn results in reduced Sox9 expression and hence reduced expression of Paneth cell properties. It is therefore crucial that the authors perform in vivo (splenic organoid injection) loss-of-function experiments, knockdown of Sox9 expression in AKP organoids, and Sox9 overexpression experiments in AKP/Dkk2 KO organoids to demonstrate Sox9 as the central downstream transcription factor regulating liver CRC metastasis.

      (d) Given the previous description of the role of DKK2 in primary CRC, it is important to define the step of liver metastasis affected by Dkk2 deficiency in the metastasis model. Does it affect extravasation, liver survival, etc.?

      Human data:

      Can the authors address whether the expression of Dkk2 changes in human CRC and whether mutations in Dkk2 are correlated with metastatic disease or CRC stage?

      Bioinformatic analysis:

      The GEO repositories are still not open (at the time of the re-review) and SRA links for raw data are still unavailable. Without access to raw data, it is not possible to verify the analyses or fully assess the results. Part of the article relies on re-analysis of public data, so the authors should make the raw data available as well, not just the count tables.

    3. Reviewer #2 (Public review):

      Summary:

      The authors propose that DKK2 is necessary for the metastasis of colon cancer organoids. They then claim that DKK2 mediates this effect by permitting the generation of lysozyme-positive Paneth-like cells within the tumor microenvironmental niche. They argue that these lysozyme-positive cells have Paneth-like properties in both mouse and human contexts. They then implicate HNF4A as the causal factor responsive to DKK2 to generate lysozyme-positive cells through Sox9.

      Strengths:

      The use of a genetically defined organoid line is state-of-the-art. The data in Figure 1 and the dependence of DKK2 for splenic injection and liver engraftment, as well as the long-term effect on animal survival, are interesting and convincing. The rescue using DKK2 administration for some of their phenotype in vitro is good. The inclusion and analysis of human data sets help explore the role of DKK2 in human cancer and help ground the overall work in a clinical context.

      Remaining Weaknesses after revision:

      (1) The authors have effectively explained the regulation of HNF4A at both mRNA and protein levels. To further strengthen their findings, I recommend using CRISPR technology to generate DKK2 and HNF4A double knockout organoids. This approach would allow the authors to investigate whether the AKP liver metastasis is restored in the double knockout condition. Such an experiment would provide more direct evidence that HNF4A protein stabilization is the crucial mechanism for liver metastasis suppression following DKK2 knockout.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors addressed the influence of DKK2 on colorectal cancer (CRC) metastasis to the liver using an orthotopic model transferring AKP-mutant organoids into the spleens of wild-type animals. They found that DKK2 expression in tumor cells led to enhanced liver metastasis and poor survival in mice. Mechanistically, they associate Dkk2-deficiency in donor AKP tumor organoids with reduced Paneth-like cell properties, particularly Lyz1 and Lyz2, and defects in glycolysis. Quantitative gene expression analysis showed no significant changes in Hnf4a1 expression upon Dkk2 deletion. Ingenuity Pathway Analysis of RNA-Seq data and ATAC-seq data point to a Hnf4a1 motif as a potential target. They also show that HNF4a binds to the promoter region of Sox9, which leads to LYZ expression and upregulation of Paneth-like properties. By analyzing available scRNA data from human CRC data, the authors found higher expression of LYZ in metastatic and primary tumor samples compared to normal colonic tissue; reinforcing their proposed link, HNF4a was highly expressed in LYZ+ cancer cells compared to LYZ- cancer cells.

      Strengths: 

      Overall, this study contributes a novel mechanistic pathway that may be related to metastatic progression in CRC. 

      Weaknesses: 

      The main concerns are related to incremental gains, missing in vivo support for several of their conclusions in murine models, and missing human data analyses. Additionally, methods and statistical analyses require further clarification. 

      Main comments: 

      (1) Novelty 

      The authors previously described the role of DKK2 in primary CRC, correlating increased DKK2 levels to higher Src phosphorylation and HNF4a1 degradation, which in turn enhances LGR5 expression and "stemness" of cancer cells, resulting in tumor progression (PMID: 33997693). A role for DKK2 in metastasis has also been previously described (sarcoma, PMID: 23204234). 

      (2) Mouse data 

      a) The authors analyzed liver mets, but the main differences between AKP and AKP/Dkk2 KO organoids could arise during the initial tumor cell egress from the intestinal tissue (which cannot be addressed in their splenic injection model), or during pre-liver stages, such as endothelial attachment. While the analysis of liver mets is interesting, given that Paneth cells play a role in the intestinal stem cell niche, it is questionable whether a study that does not involve the intestine can appropriately address this pathway in CRC metastasis.

      We value the reviewer’s comment that the splenic injection model cannot represent metastasis from primary tumors, including intravasation and extravasation. Therefore, we performed orthotopic transplantation of AKP and KO organoids directly into the colon and then tested for metastasis.

      Author response image 1.

      Primary tumor formation and liver metastasis by orthotopic transplantation of AKP or KO colon cancer organoids. 6-8 week-old male C57BL/6J mice were treated with 2.5% DSS dissolved in drinking water for 5 days, followed by regular water for 2 days to remove gut epithelium. After recovery with the regular water, the colon was flushed with 1000 μl of 0.1% BSA in PBS. Then, 200,000 dissociated organoid cells in 200 μl of 5% Matrigel and 0.1% BSA in PBS were instilled into the colonic luminal space. After infusion, the anal verge was sealed with Vaseline. 8 weeks after transplantation, the mice were sacrificed to measure primary tumor formation and liver metastasis.

      As a result, 4 out of 6 mice in the control group successfully formed colorectal primary tumors, whereas only 2 out of 6 mice showed primary tumor formation in the KO group (Author response image 1A). The size of tumors was reduced by about half (10-12 mm to 5-7 mm). Only one AKP mouse developed metastasized nodules in the liver (Author response image 1B). Next, to measure circulating tumor cells, we harvested at least 500 μl of blood from the portal vein and then analyzed tdTomato-positive tumor cells (Author response image 2). Flow cytometry analysis of PBMCs showed the presence of tdTomato-high, CD45-negative cells as well as tdTomato-mid, CD45-positive cells in 2 out of 6 AKP mice, while no tdTomato-positive cells were observed in the PBMCs of KO organoid-transplanted mice.

      Because only a limited number of mice showed primary and metastatic tumor formation, we cannot provide a statistical analysis of DKK2-mediated metastasis. However, our revised data indicate a trend that DKK2 KO reduced primary tumor formation, the number of circulating tumor cells, and liver metastasis. This trend is consistent with our previous report in the iScience paper, which showed that DKK2 KO reduced AOM/DSS-induced polyp formation by about 60%, and with the decreased metastasis in the splenic injection model system in this manuscript. Further studies are necessary to confirm this trend and to provide the underlying mechanisms of intravasation and extravasation of circulating tumor cells.

      Author response image 2.

      Flow cytometry analysis of tdTomato+ circulating colon tumor cells in PBMCs. PBMCs were harvested via the portal vein after euthanasia. CD45 and tdTomato were analyzed by flow cytometry.

      b) The overall number of Paneth cells found in the scRNA-seq analysis of liver mets was strikingly low (17 cells, Figure 3), and assuming that these cells are driving the differences seems somewhat far-fetched. Adding to this concern is inappropriate gating in the flow plot shown in Figure 6. This should be addressed experimentally and in the interpretation of data. 

      We appreciate the reviewer’s comments, which help clarify this point. Since the number of LYZ+ cells is low in our scRNA-seq analysis, we performed flow cytometry in Figure 6H, which shows a clear population expressing LYZ in the same splenic injection model of metastasis. Figure 6H is a representative image of triplicates for each group, and we performed this experiment three times independently. As suggested, we changed the graph format and updated the gating and statistical analysis in Fig. 6H and 6I. This in vivo result confirmed our in vitro data showing that DKK2 KO reduced LYZ+ cells while increasing HNF4α1 protein.

      c) Figures 3, 5, and 6 show the individual gene analyses with unclear statistical data. It seems that the p-values were not adjusted, and it is unclear how they reached significance in several graphs. Additionally, it was not stated how many animals per group and cells per animal/group were included in the analyses. 

      In Fig. 3, mouse scRNA-seq data were generated from pooled cancer samples from 5 animals per group. The Wilcoxon signed-rank test was performed for each gene and/or regulon activity. Since multiple testing adjustments were not performed, a p-value adjustment is neither needed nor applicable.

      In Fig. 5, human data were analyzed. Cells from the same sample are dependent, but differential gene expression (DEG) analysis typically calculates statistics under the assumption that they are independent. This assumption may explain the low p-values observed in our data. To address this issue, we applied pseudobulk DEG analysis to our human single-cell data. Even after correcting for statistical error, we confirmed that the genes of interest still exhibited significantly different expression patterns (Author response image 3).

      Author response image 3.

      Pseudobulk DEG analysis confirmed the differential expression of the genes of interest.
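      For readers unfamiliar with the pseudobulk approach mentioned above, the sketch below shows the core aggregation step: raw counts from all cells of one sample are summed so that each sample, rather than each cell, becomes the unit of analysis. This is a minimal illustration under stated assumptions, not the pipeline used for Author response image 3; the function names, the sample identifiers, and the toy counts are hypothetical.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum raw counts per gene across all cells of the same sample.

    cells: iterable of (sample_id, {gene: raw_count}) pairs, one per cell.
    Returns {sample_id: {gene: summed_count}}, i.e. one profile per sample.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for sample_id, counts in cells:
        for gene, n in counts.items():
            totals[sample_id][gene] += n
    return {sample: dict(genes) for sample, genes in totals.items()}


def cpm(profile):
    """Scale one pseudobulk profile to counts per million for comparison."""
    depth = sum(profile.values())
    return {gene: 1e6 * n / depth for gene, n in profile.items()}


# Toy input: three cells from two hypothetical samples.
toy_cells = [
    ("sample_1", {"LYZ": 2, "ACTB": 8}),
    ("sample_1", {"LYZ": 1}),
    ("sample_2", {"ACTB": 5}),
]
toy_profiles = pseudobulk(toy_cells)
```

      The resulting per-sample profiles can then be passed to a standard bulk differential-expression method, where the number of replicates equals the number of samples rather than the number of cells.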

      In Fig.6H-6I, the number of animals per group is provided in the figure legend.

      d) Figure 6 suggests a signaling cascade in which the absence of DKK2 leads to enhanced HNF4A expression, which in turn results in reduced Sox9 expression and hence reduced expression of Paneth cell properties. It is therefore crucial that the authors perform in vivo (splenic organoid injection) loss-of-function experiments, knockdown of Sox9 expression in AKP organoids, and Sox9 overexpression experiments in AKP/Dkk2 KO organoids to demonstrate Sox9 as the central downstream transcription factor regulating liver CRC metastasis. 

      Sox9 is a well-established marker gene for Paneth cell formation in the gut. Therefore, overexpression or knockout of the Sox9 gene would result in either an increase or decrease in Paneth cells in the organoids. We believe that the suggested experiments fall outside the scope of this manuscript. Instead, we demonstrated the change in the Paneth cell differentiation marker, Sox9, in the presence or absence of DKK2.

      e) Given the previous description of the role of DKK2 in primary CRC, it is important to define the step of liver metastasis affected by Dkk2 deficiency in the metastasis model. Does it affect extravasation, liver survival, etc.? 

      We appreciate the reviewer’s insights and perspectives. Regarding liver survival, it is well known that stem cell niche formation is a critical step for the outgrowth of metastasized cancer cells (Fumagalli et al. 2019, Cell Stem Cell). LYZ+ Paneth cells are recognized as stem cell niche cells in the intestine, and human scRNA-seq data have shown that LYZ+ cancer cells express stem cell niche factors such as Wnt and Notch ligands. To determine whether LYZ+ cancer cells act as stem cell niche cells, we performed confocal microscopy to assess whether LYZ+ cancer cells express WNT3A and DLL4 in AKP organoids (Author response image 4). The results show that LYZ labeling co-localizes with DLL4 and WNT3A expression, while the organoid reporter tdTomato is evenly distributed. Additionally, our in vitro and in vivo data indicate that DKK2 deficiency leads to a reduction of LYZ+ cancer cells, which may contribute to stem cell niche formation. Based on these findings, we propose that DKK2 is an essential factor for stem cell niche formation, which is required for cancer cell survival in the liver during the early stages of metastasis. Although our revised data confirmed the trend that DKK2 deficiency decreases liver metastasis, we have not yet determined whether DKK2 is involved in extravasation. This research topic should be addressed in future studies.

      Author response image 4.

      Confocal microscopy analysis for lysozyme (LYZ) and Paneth cell-derived stem cell niche factors, WNT3A and DLL4 in AKP colon cancer organoids.

      The method is described in the supplemental information. The antibodies used were: DLL4 (delta-like 4) Polyclonal Antibody (Invitrogen, PA5-85931), WNT3A Polyclonal Antibody (Invitrogen, PA5-102317), Goat anti-Rabbit IgG (H+L) Cross-Adsorbed Secondary Antibody, Alexa Fluor™ 488 (Invitrogen, A-11008), Anti-Lysozyme C antibody (H-10, Santa Cruz, sc-518083), Goat anti-Mouse IgM (Heavy chain) Secondary Antibody, Alexa Fluor™ 647 (Invitrogen, A-21238).

      (3) Human data 

      Can the authors address whether the expression of Dkk2 changes in human CRC and whether mutations in Dkk2 are correlated with metastatic disease or CRC stage?

      The human data were useful in identifying the presence of LYZ+ cancer cells with Paneth cell properties. However, due to the limited number of late-stage patient samples with high DKK2 expression, the results were not statistically significant. Nevertheless, the trend suggests a positive correlation between DKK2 expression and the malignant stage of CRC.

      (4) Bioinformatic analysis 

      The authors did not provide sufficient information on bioinformatic analyses. The authors did not include information about the software, cutoffs, or scripts used to make their analyses or output those figures in the manuscript, which challenges the interpretation and assessment of the results. Terms like "Quantitative gene expression analyses" (line 136) and "visualized in a Uniform Approximation and Projection" (line 178) do not explain what was inputted and which analyses were executed. There are multiple ways to align, preprocess, and visualize bulk, single-cell, ATAC, and ChIP-seq data, and depending on which was used, the results vary greatly. For example, in the single-cell data, the authors did not report how many cells were sequenced, nor how many cells remained after alignment and quality filtering (RNA count, mt count, etc.), so the result on Paneth+ to Goblet+ percent in lines 184 and 185 cannot be assessed because it depends on this information. The absence of a clustering cutoff for the single-cell data is concerning since this greatly affects the resulting cluster number (https://www.nature.com/articles/s41592-023-01933-9). The authors should provide a comprehensive explanation of all the data analyses and the steps used to obtain those results.

      We apologize for the insufficient information. Below, we provide detailed information on the data analyses, which are also available in the GEO database (Bulk RNA-seq: GSE157531, ATAC-seq: GSE157529, ChIP-seq: GSE277510). Methods are updated in the current version of supplemental information.

      (5) Clarity of methods and experimental approaches 

      The methods were incomplete and they require clarification. 

      We’ve updated our methods as requested by the reviewer.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors propose that DKK2 is necessary for the metastasis of colon cancer organoids. They then claim that DKK2 mediates this effect by permitting the generation of lysozyme-positive Paneth-like cells within the tumor microenvironmental niche. They argue that these lysozyme-positive cells have Paneth-like properties in both mouse and human contexts. They then implicate HNF4A as the causal factor responsive to DKK2 to generate lysozyme-positive cells through Sox9. 

      Strengths: 

      The use of a genetically defined organoid line is state-of-the-art. The data in Figure 1 and the dependence of DKK2 for splenic injection and liver engraftment, as well as the long-term effect on animal survival, are interesting and convincing. The rescue using DKK2 administration for some of their phenotype in vitro is good. The inclusion and analysis of human data sets help explore the role of DKK2 in human cancer and help ground the overall work in a clinical context. 

      Weaknesses: 

      In this work by Shin et al., the authors expand upon prior work regarding the role of Dickkopf-2 in colorectal cancer (CRC) progression and the necessity of a Paneth-like population in driving CRC metastasis. The general topic of metastatic requirements for colon cancer is of general interest. However, much of the work focuses on characterizing cell populations in a mouse model of hepatic outgrowth via splenic transplantation. In particular, the concept of Paneth-like cells is primarily based on transcriptional programs seen in single-cell RNA sequencing data and needs more validation. Although including human samples is important for potential generality, the strength could be improved by doing immunohistochemistry in primary and metastatic lesions for Lyz+ cancer cells. Experiments that further bolster the causal role of Paneth-like CRC cells in metastasis are needed. 

      Recommendations for the Authors:

      Reviewing Editor (Recommendations for the Authors): 

      Here we note several key concerns with regard to the main conclusions of the paper. Additional experiments to directly address these concerns would be required to substantially update the reviewer evaluation. 

      (1) Demonstration of a causal role of Paneth-like cells in CRC metastasis, for example by sorting the Paneth-like cells - either by the markers they identified in the subsequent single cell or by scatter - to establish whether the frequency of the Paneth-like cells in a culture of organoids is directly correlated with tumorigenicity and engraftment. 

      We sincerely appreciate the reviewing editor’s comment. First, as previously reported (Shin et al., iScience 2021), there is no difference in proliferation between WT and KO during in vitro organoid culture or in vivo colitis-induced tumors. However, DKK2 deficiency led to morphological changes, which we analyzed using bulk RNA-seq. As described in the manuscript, Paneth cell marker genes, such as Lysozymes and defensins, were significantly reduced in DKK2 KO AKP organoids.

      Due to the nature of these markers, it is technically challenging to isolate live LYZ+ cancer cells. To address this issue in the future, we plan to develop organoids that express a reporter gene specific for Paneth cells. In this manuscript, we demonstrated a correlation between DKK2 and the formation of LYZ+ cancer cells. In both the splenic injection model (Fig. 1) and the orthotopic transplantation model (Fig. R1-R2), we observed that transplantation of cancer organoids with reduced numbers of LYZ+ cells (KO organoids) led to decreased metastatic tumor formation. The number of LYZ+ cells in KO-transplanted mice remained low in liver metastasized tumor nodules (Fig. 6H-I6). Immunohistochemistry further confirmed that LYZ+ cancer cells were barely detectable in KO samples (Author response image 5). These data suggest that DKK2 is essential for the formation of LYZ+ cancer cells, which are necessary for outgrowth following metastasis.

      Author response image 5.

      Histology of Lysozyme-positive cells in metastasized tumor nodules in liver of colon cancer organoid transplanted mice. Immunohistochemistry of Lysozyme-positive Paneth-like cells in liver metastasized colon cancer (Upper panels, DAB staining). Identification of tumor nodules by H&E staining (lower panels, Scale bar = 100 μm). Magnified tumor nodules are shown in the 2nd and 3rd columns (Scale bar = 25 μm). Arrows indicate Lysozyme-positive Paneth-like cells in tumor epithelial cells. Infiltration of Lysozyme-positive myeloid cells is detected in both AKP and KO tumor nodules. AKP: Control colon cancer organoids carrying mutations in Apc, Kras and Tp53 genes. KO: Dkk2 knockout colon cancer organoids.

      (2) Further characterization of Lyz+/Paneth-like cells to further the authors' argument for the unique function that they have in their tumor model. Specifically, do the Paneth-like cells secrete Wnt3, EGF, Notch ligand, and DLL4 as normal Paneth cells do?

      We appreciate the reviewing editor’s comment. In response, we performed confocal microscopy analysis to examine the protein levels of LYZ, Wnt3A, and DLL4 in AKP colon cancer organoids (Author response image 4). The data presented above show that LYZ+ cancer cells express both Wnt3A and DLL4, suggesting that LYZ+ colon cancer cells may function similarly to Paneth cells, which are stem cell niche cells. Furthermore, using the Panglao database, we demonstrated that LYZ+/Paneth-like cells exhibit typical Paneth cell properties in human scRNA-seq data (Fig. 4 and Fig. 5). These findings suggest that LYZ+ colon cancer cells possess Paneth cell properties.

      (3) Experiments to test metastasis, ideally from orthotopic colonic tumors, to ensure phenotypes aren't restricted to the splenic model of hepatic colonization and outgrowth used at present. 

      We are in agreement with the reviewing editor and reviewers, which is why we conducted the orthotopic transplantation experiment. However, we encountered challenges in establishing this model effectively. After multiple trials, we observed that many mice did not form primary tumors, and the variability, particularly in metastasis, was difficult to control. Only a few AKP-transplanted mice developed liver metastasis. The representative revision data have been provided above. Nevertheless, we believe that this model needs further improvement and optimization to reliably study metastasis originating from primary tumors.

      (4) To generalize claims to human cancer, the authors should test whether loss of DKK2 impacts LYZ+ cancer cells in human organoids and affects their engraftment in immunodeficient mice compared to control. Another more correlative way to validate the LYZ+ expression in human colon cancer would be to stain for LYZ in metastatic vs. primary colon cancer, expecting metastatic lesions to be enriched for LYZ+ cells. 

      We agree with your point, and this will be addressed in future studies.

      (5) Clarifying inconsistencies regarding effect of DKK2 loss on HNF4A (Figure 1E vs Figure 6I). 

      In Figure 1E, we measured the mRNA levels of HNF4A in metastasized foci by qPCR, while in Figure 6I, we measured the protein level of HNF4A by flow cytometry. Recent studies, including our previous report, have shown that HNF4A protein levels are regulated by proteasomal degradation mediated by pSrc (Mori-Akiyama et al. 2007, Gastroenterology; Bastide et al. 2007, Journal of Cell Biology; Shin et al. 2021, iScience). Consequently, while the mRNA levels remained unchanged in Fig. 1E, we observed a reduction of HNF4A protein levels in Figure 6I.

      (6) Addressing concerns about statistics and reporting as outlined by Reviewer 1. 

      Thank you very much for your assistance in improving our manuscript. The updates have been incorporated as detailed above.

      These are the central reviewer concerns that would require additional experimentation to update the editorial summary. Other concerns should be addressed in a revision response but do not require additional experimentation. 

      Reviewer #1 (Recommendations For The Authors): 

      Specific comments: 

      • Do Dkk2-KO organoids grow normally?

      Yes, in vitro.

      Since the authors reported on the effects of Dkk2 in the induction/maintenance of the Paneth cell niche, changes in AKP organoid numbers or growth rate between Dkk2-WT and KO would be an expected outcome.

      Disruption of Paneth cell formation in normal organoids is expected to alter growth. However, DKK2 KO colon cancer organoids with mutations in the Apc, Kras, and Tp53 genes exhibit growth rates and organoid sizes similar to those of WT AKP controls. In contrast to the in vitro observations, we observed a significant reduction in metastasized tumor growth in vivo. Further analyses of factors derived from LYZ+ cancer cells will help address the discrepancy between the in vitro and in vivo effects of DKK2 loss.

      • Figure 1: 

      - Panel C: The legend should indicate what c.p. stands for.

      c.p.m. stands for counts per minute in the in vivo imaging analysis. This has been updated in the Figure legend.

      - Panel E: Please comment on the possible underlying reasons for the lack of change in HNF4a1 levels. 

      This has been updated in response to the reviewing editor’s comment (5) above.

      - Panel E: Number of mice from which isolated cancer nodules were harvested. 

      Five mice per group were used. This has been updated in the legend.

      • Figure 2: 

      - Suggestion: Panel A should be presented in Figure 1 since Dkk2 KO organoids are already used in Figure 1. 

      We included this panel in Fig. 2 to present the rescue obtained by adding recombinant DKK2 protein.

      - Panel B: Please explain why these genes are marked in blue. 

      It has been described in the legend. “Paneth cell marker genes are highlighted as blue circles (AKP=3 and KO=5 biological replicates were analyzed).”

      • Figure 3: 

      - Indicate the number of cells recovered from AKP vs. KO mice (since liver metastasis was already reduced in KO mice). This should be shown in a UMAP. 

      - Panel A: 4th line in the pathways, correct "Singel" typo. 

      We appreciate your correction. It has been fixed.

      - Panel A: There are multiple versions of PanglaoDB with different markers; a list of all that was used to determine cell type should be provided. 

      - Panel C: Bar value for the WNT pathway is not displayed, and there is no legend to indicate the direction of the analysis (that is, AKPvsKO or KOvsAKP). 

      It is KOvsAKP, described in the figure legend.

      - Panel C: Ingenuity pathway analysis is not a good tool to look at this type of result because it does not include the gene fold changes in the analysis, so it only provides a Z-score of the presence of that pathway and not the degree it is increased or fold changes - recommend substituting any type of GSEA analysis, such as fgsea.

      - Panel D: the term "Patient" to refer to mice is confusing. Use "Mice" or "Treatment" or "Condition" instead.

      Corrected

      - Panel D: Information about the number of mice per group, cells per animal (or liver let) used, and additional clarification about the statistical analysis used is required, as differences shown in this panel appear subtle given the standard variation in each group. Box plots need to show individual/raw values. 

      • Figure 4: 

      - Panel E: It would be helpful to show the cutoff lines for the Paneth cell score and Lyz expression in the graphs. 

      It has been updated in response to the reviewer’s request.

      • Figure 5: 

      - Panel B: again, information about the number of "patients" or cells used and clarification about the statistical analysis used is required as the display of data generates concerns about the distribution within groups. Box plots need to show individual/raw values

      It has been updated in response to the reviewer’s request.

      • Figure 6: 

      - Panel A: Add a legend to inform the direction of the process (e.g., red, activation, blue, repression). We noticed the Yap1 bar data had no color. Is there a reason for that? Please explain this point in the revised manuscript. 

      Red color added for the Yap1.

      - Panel A: Ingenuity pathway analysis is not a good tool to look at this type of result because it does not include the gene fold changes in the analysis, so it only provides a Z-score of the presence of that pathway and not the degree to which it is increased. I recommend substituting any type of GSEA analysis, such as fgsea.

      - Panels A&B: Again, only p-value scores were provided, while fold changes are necessary to define the ratio of presence increase of normal vs. AKP. 

      - Panel D: No raw or pre-processed ChIP-seq data was provided. Additionally, please indicate the exact genome location. It seems the image was edited from a raw image made in the UCSC Genome Browser; it should be remade with coordinates and other important information (neighboring genes, epigenetic marks, etc.).

      - Panel H/I: Flow cytometry gating is inappropriate, as it captures cells that are negative for LYZ in both AKP and KO samples, resulting in an overestimation of the number of Lyz+ cells. Gating should specifically select the very few LYZ-positive cells in the top/left quadrant.

      The updates have been made, and the statistical data have been re-analyzed.

      - Panel J: Information about the number of animals/organoids or cells used and clarification about the statistical analysis used is required, as the display of data generates concerns about the distribution within groups. Box plots need to show individual/raw values. 

      • Overall: 

      - A supplementary table with all the sequenced libraries and their depth, read length/cell count should be provided.

      All of the information is now available in the GEO database. We used previously published human epithelial datasets for human single cell analysis (Joanito*, Wirapati*, Zhao*, Nawaz* et al, Nat Genetics, 2022, PMID: 35773407).

      - The Hallmark Geneset used is very broad, and the authors should confirm the results on GO bp. 

      Using Gene Ontology biological processes (GO bp), we observed that glycolysis-related genes were enriched in our newly described cell population, although the adjusted p-value did not exceed 0.05.

      Author response image 6.

      GSEA with GOBP pathways highlighted glycoprotein processes and protein localization to the extracellular region, both of which are related to Paneth cell functions. Paneth cells secrete α-defensins, angiogenin-4, lysozyme and secretory phospholipase A2. The enriched glycoprotein process and protein localization to extracellular region reflect the characteristics of Paneth cells.
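      For readers weighing the reviewer's point that GSEA-type methods use the ranked fold changes while a presence-based pathway score does not, the following is a minimal sketch of the running-sum enrichment score from the standard preranked GSEA definition. It is an illustration only: it is neither the fgsea implementation nor the authors' analysis, it omits the permutation step needed for p-values, and the function name `enrichment_score` and its parameters are hypothetical.

```python
import numpy as np

def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted running-sum enrichment score for a preranked gene list.

    ranked_genes : gene names sorted by decreasing score (e.g. log2 fold change)
    scores       : the ranking metric, in the same order as ranked_genes
    gene_set     : set of gene names defining the pathway
    p            : weight exponent (p=1 is the commonly used weighted form)
    """
    scores = np.asarray(scores, dtype=float)
    in_set = np.array([g in gene_set for g in ranked_genes], dtype=bool)

    n_hit = int(in_set.sum())
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    # Hits step up in proportion to |score|^p; misses step down uniformly.
    weights = np.where(in_set, np.abs(scores) ** p, 0.0)
    if weights.sum() == 0:
        raise ValueError("all genes in the set have a score of zero")

    p_hit = np.cumsum(weights) / weights.sum()
    p_miss = np.cumsum(~in_set) / n_miss
    running = p_hit - p_miss

    # The score is the running-sum value with the largest absolute deviation from zero.
    return float(running[np.argmax(np.abs(running))])
```

      Because each hit is weighted by the magnitude of its ranking metric, a gene set concentrated among the strongly changed genes scores higher than one scattered through the list, which is the property the reviewer highlights.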

       

      - qPCR is not a good way to confirm sequencing results; while PCR data is pre-normalized, sequencing is normalized only after quantification, so results on 6 E and F should be shown on the sequencing data. 

      The expression level of Sox9 is relatively low. In our bulk RNA-seq data, the averages for Sox9 in AKP versus DKK2 KO are 28.2 and 25.1, respectively. While there is a similar trend, the difference is not statistically significant in this dataset, and we did not include an experimental group for reconstitution. Therefore, we conducted qPCR experiments for the reconstitution study by adding recombinant DKK2 (rmDKK2) protein to the culture. Furthermore, it is well established that Sox9 is an essential transcription factor for the formation of LYZ+ Paneth cells. Based on this, we assessed the levels of LYZ and Sox9 using qPCR and confocal microscopy in the presence or absence of DKK2.
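      As a rough guide to the size of the trend quoted above, and assuming the two averages are normalized expression values on a linear scale (the units are not stated here), the KO-to-AKP ratio works out as:

\[
\mathrm{FC}_{\mathrm{KO/AKP}} = \frac{25.1}{28.2} \approx 0.89, \qquad \log_2 \mathrm{FC}_{\mathrm{KO/AKP}} \approx -0.17
\]

      That is, a decrease of roughly 11% in the bulk data under this assumption.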

      • Edits in the text: 

      - There are several typographical errors. Specific suggestions are provided below. 

      - Line 43: "Chromatin immunoprecipitation followed by sequencing analysis," state analysis of what cells before continuing with "revealed..."

      - Line 77: Recent findings have identified 

      - Line 138: were reduced in KO tumor samples → rephrase to clarify "KO-derived liver tumors"

      - Line 167: Recombinant mouse DKK2 protein treatment in KO organoids partially rescued this effect. Add "partially" since adding rmDkk2 didn't fully restore Lyz1 and Lyz2 levels. 

      - Line 185-187: the authors should not reference Figure 6 because it has not been introduced yet. 

      - Line 198-199: The authors claimed a correlation between Dkk2 expression and Lgr5 expression; however, the graph presented in Figure 3B does not indicate this. The R-value was 0.11, which does not indicate a correlative expression between these genes. 

      - Line 232-233: the authors need to show any connection to Dkk2 gene expression in human samples in order to draw that conclusion. 

      - Line 294: expression, leading to the formation 

      - Line 347: Wnt ligand (correct Wng typo) 

      We have modified our manuscript in accordance with the reviewer’s suggestions.
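      On the Line 198-199 comment above, a quick check shows why an R-value of 0.11 is read as essentially no correlation. Assuming the reported value is a Pearson correlation coefficient (not stated in this excerpt), the fraction of variance in one gene's expression linearly explained by the other is:

\[
R^2 = 0.11^2 = 0.0121 \approx 1.2\%
\]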

      Reviewer #2 (Recommendations For The Authors): 

      Specific criticisms/suggestions: 

      Author claim 1: Dkk2 is necessary for liver metastasis of colon cancer organoids.

      This model is one of hepatic colonization and eventual outgrowth and not metastasis. Metastasis is optimally assessed using autochthonous models of cancer generation, with the concomitant intravasation, extravasation, and growth of cancer cells at the distant site. The authors should inject their various organoids in an orthotopic colonic transplantation assay, which permits the growth of tumors in the colon, and they can then identify metastasis in the liver that results from that primary cancer lesion (i.e., to better model physiologic metastasis from the colon to liver).

      The orthotopic colonic transplantation data have been provided above (Author response images 1 and 2).

      Author claim 2: DKK2 is required for the formation of lysozyme-positive cells in colon cancer. 

      It would greatly strengthen the authors' claim if supraphysiologic or very high amounts of DKK2 enhanced CRC organoid line engraftment (i.e., the specific experiment being pre-treatment with high levels of DKK2 and immediate transplantation to see the number of outgrowing clones). If DKK2 is causal for the engraftment of the tumors, increased DKK2 should enhance their capacity for engraftment.

      Paneth cells have physical properties permitting sorting and are readily identifiable on flow cytometry. The authors should demonstrate increased tumorigenicity and engraftment by sorting the Paneth-like cells-either by the markers they identified in the subsequent single cell or by scatter to establish whether the frequency of the Paneth-like cells in a culture of organoids is directly correlated with engraftment potential. 

      Further characterization of the Paneth-like cells would help further the authors' argument for the unique function that they have in their tumor model. Specifically, do the Paneth-like cells secrete Wnt3, EGF, Notch ligand, and DLL4 as normal Paneth cells do? Immunofluorescence, sorting, or western blots would all be reasonable methods to assess protein levels in the sorted population.

      This has been performed and provided above (Author response images 1 and 3).

      Author claim 3: Lysozyme (LYZ)+ cancer cells exhibit Paneth cell properties in both mouse and human systems.

      For the claim to be general to human cancer, the author should demonstrate that loss of DKK2 impacts LYZ+ cancer cells in human organoids and affects their engraftment in immunodeficient mice compared to control. Another more correlative way to validate the LYZ+ expression in human colon cancer would be to stain for LYZ in metastatic vs. primary colon cancer, expecting metastatic lesions to be enriched for LYZ+ cells. 

      The claims on the metabolic function of Paneth-like cells need more clarification. Do the cancer cells with Paneth features have a distinct metabolic profile compared to the other cell populations? The authors should address this through metabolic characterization of isolated LYZ+ cells with Seahorse or comparison of Dkk2 KO to WT organoids (i.e., +/-LYZ+ cancer cell population). 

      To address this question, we need to develop organoids with a Paneth cell reporter gene. We appreciate the reviewer’s comment, and this should be pursued in future studies.

      Author claim 4: HNF4A mediates the formation of Lysozyme (Lyz)-positive colon cancer cells by DKK2. 

      The authors implicate HNF4A and Sox9 as causal effectors of the Paneth-like cell phenotype and subsequent metastatic potential. There appears to be some discordance regarding the effect of DKK2 loss on HNF4A. In Figure 1E, the authors show that gene expression in metastatic colon cancer cells for HNF4A in DKK2 knockout vs AKP control is insignificant. However, in Figure 6I, there is a highly significant difference in the number of HNF4A positive cells, more than a 3-fold percentage difference, with a p-value of <0.0001. If there is the emergence of a rare but highly expressing HNF4A cell type that on aggregate bulk expression leads to no difference, but sorts differentially, why is it not identified in the single-cell data set? These data together are highly inconsistent with regards to the effect of DKK2 on HNF4A and require clarification. 

      Previous studies have demonstrated that HNF4A is regulated by proteasomal degradation mediated by pSrc. As a result, the mRNA level of HNF4A remains unchanged, while the protein level is significantly reduced in colon cancer cells. DKK2 KO leads to decreased Src phosphorylation, resulting in the recovery of HNF4A protein levels. This explains why HNF4A cannot be detected in scRNA-seq datasets, which measure mRNA. We have shown this in our previous report. In this manuscript, based on ChIP-seq data using an anti-HNF4A monoclonal antibody, as well as confocal microscopy and qPCR data for the Sox9 gene, we propose that HNF4A acts as a regulator of cancer cells exhibiting Paneth cell properties.

    1. eLife Assessment

      The ingenious design in this study achieved the observation of 3D cell spheroids from an additional lateral view and gained more comprehensive information than the traditional one angle of imaging. This extended the methods to investigate cell behaviors in the growth or migration of tumor organoids in a time-lapse manner and these extensions should be important to the field. The authors provide compelling evidence that the methods work as described.

    2. Reviewer #1 (Public review):

      Summary:

      The author developed a new device to overcome current limitations in the imaging process of 3D spheroidal structures. In particular, they created a system to follow in real-time tumour spheroid formation, fusion and cell migration without disrupting their integrity. The system has also been exploited to test the effects of a therapeutic agent (chemotherapy) and immune cells.

      Comments on revised version:

      The authors addressed all my concerns well. It is a wonderful design for viewing 3D cell spheroids.

    3. Reviewer #2 (Public review):

      Summary:

      The author developed a new device to overcome current limitations in the imaging process of 3D spheroidal structures. In particular, they created a system to follow in real-time tumour spheroid formation, fusion and cell migration without disrupting their integrity. The system has also been exploited to test the effects of a therapeutic agent (chemotherapy) and immune cells.

      Strengths:

      The system allows the in situ observation of the 3D structures along the 3 axes (x, y and z) without disrupting the integrity of the spheroids; in a time-lapse manner it is possible to follow the formation of the 3D structure and spheroid fusion from multiple angles, allowing a better understanding of cell aggregation/growth and the kinetics of the cells.

      Interestingly, the system allows the analysis of cell migration/escape from the 3D structure, analysing not only the morphological changes in the periphery of the spheroids but also those in the inner region, demonstrating that the proliferating cells in the periphery of the structure are more involved in the migration and dissemination process. The application of the system in the study of the effects of doxorubicin and NK cells would give new insights into the response of tumor 3D structures to killing agents.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      The ingenious design in this study achieved the observation of 3D cell spheroids from an additional lateral view and gained more comprehensive information than the traditional one angle of imaging, which extensively extended the methods to investigate cell behaviors in the growth or migration of tumor organoids in the present study. I believe that this study opens an avenue and provides an opportunity to characterize the spheroid formation dynamics from different angles, in particular side-view with high resolution, in other organoids study in the future.

      Thank you for your positive response.

      (1) Figure 1A and B, the images of "First surface mirror" are unclear. The authors should capture a single high-resolution image of the "First surface mirror". The corresponding information on the mirror should also be included in the manuscript.

      Thank you for your kind reminder. To make the content more intuitive, we have added a clear image of the first surface mirror to Fig. 8C.

      (2) The spheroid sizes in this study are 200-300 µm. Is this size a limitation of the device? And which size is optimal for the device? The size of spheroids suitable for this device should be characterized.

      Thank you very much for your question. As shown in Fig. 1D, the imaging principle indicates that the sample size is theoretically not affected by the device. For larger biological samples or samples exceeding the size of a 35 mm petri dish, a larger container and first surface mirror can be used. However, in practice, it is not recommended to use this device with laboratory microscopes for samples exceeding 4 mm in size.

      Firstly, the working distance of the microscope objective lens is limited by its factory specifications. Secondly, this device is designed to fit a 35 mm petri dish, and the first surface mirror can capture a maximum sample size of 4.5 mm. Fortunately, this size is more than sufficient for cell spheroids.

      (3) Figure 2F. The scale bar covered the image and made it unclear. It was difficult to read and evaluate the quality of the images. There also seemed to be no obvious difference between 5 cm and 15 cm. Please carefully check this data.

      Thank you very much for your question. First, we checked the image scale and coverage issues and made adjustments in the revised version. Secondly, when the light source was placed 5 cm from the sample, the sample itself appeared relatively clear, but the boundary with the background was less distinct. At a distance of 15 cm, the light source not only illuminated the sample effectively but also made the distinction between the spheroid and the background more apparent. To ensure consistency and stability in image capture, we ultimately selected a 15 cm distance between the sample and the light source for imaging.

      (4) Figure 3A. It seemed that the seeded cells were initially located as a ring with a hole in the center. Why not seed the cells evenly in the well?

      Thank you very much for your question. First, the cells were added as a suspension, naturally settling at the bottom of the well during imaging. When seeded in agarose wells, the cells spontaneously aggregated over time, as shown in sVideo4. Our previous study showed that the use of agarose wells offers high fault tolerance and efficiency in cell spheroid culture (Pan, R. et al. Biofabrication, 2024, 16, 035016).

      (5) I wonder whether this design could be extended to fluorescence imaging, and how this could be done. Please give an outlook in the discussion.

      Thank you very much for raising this key question regarding the imaging capability of this device. As shown in Author response image 1A, due to the specific nature of fluorescence imaging light sources, it is feasible to perform fluorescence imaging of cell spheroids using a microscope, including the built-in light source. Using 4′,6-diamidino-2-phenylindole (DAPI) staining, we captured fluorescence images of cell spheroids in both bottom-view and side-view modes (Author response image 1B), demonstrating that side-view observation of cell spheroids with this device is indeed feasible.

      Author response image 1.

      (A) The schematic diagram of the principle of fluorescence images of spheroids using an inverted microscope with the side-view observation petri dish/device. (B) Bottom-view and side-view images of a 3D cell spheroid. Scale bar = 500 µm.

      (6) The first sentence in the introduction. "Three-dimensional (3D) spheroids" should be "Three-dimensional (3D) tumor spheroids".

      (7) P11, Line 7, "both lethal and lethal" should be corrected.

      (8) The writing and grammar should be polished.

      Thank you very much for your suggestions to improve the quality of the article. We have made the necessary revisions in the updated version.

      Reviewer #2:

      Summary:

      The author developed a new device to overcome current limitations in the imaging process of 3D spheroidal structures. In particular, they created a system to follow in real-time tumour spheroid formation, fusion and cell migration without disrupting their integrity. The system has also been exploited to test the effects of a therapeutic agent (chemotherapy) and immune cells.

      Strengths:

      The system allows the in situ observation of the 3D structures along the 3 axes (x, y and z) without disrupting the integrity of the spheroids; in a time-lapse manner it is possible to follow the formation of the 3D structure and spheroid fusion from multiple angles, allowing a better understanding of cell aggregation/growth and the kinetics of the cells.

      Interestingly, the system allows the analysis of cell migration/escape from the 3D structure, analyzing not only the morphological changes in the periphery of the spheroids but also those in the inner region, demonstrating that the proliferating cells in the periphery of the structure are more involved in the migration and dissemination process. The application of the system in the study of the effects of doxorubicin and NK cells would give new insights into the response of tumor 3D structures to killing agents.

      We sincerely thank you for your detailed and supportive review of our manuscript. Your recognition of our system’s capabilities for in situ observation of 3D structures along multiple axes, as well as its potential applications in studying therapeutic effects, is highly encouraging. Your comments on the advantages of this system for analyzing cell migration, morphological changes, and responses to therapeutic agents are especially appreciated.

      Thank you again for your thoughtful feedback and for highlighting the contributions of our work. Your insights have been invaluable in refining the focus and clarity of our study, and we hope that our revisions meet your expectations.

    1. eLife Assessment

      This useful work reveals differential activity to food and shock outcomes in central amygdala GABAergic neurons. Evidence supports claims of unconditioned stimulus activity that changes with learning. Compelling evidence that the circular shift method rigorously identifies functional neuron types is also presented. However, the evidence regarding claims related to valence or salience signaling in these neurons is incomplete. This work will be of interest to neuroscientists studying sensory processing and learning in the amygdala.
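      For readers unfamiliar with the circular shift method mentioned in this assessment, the following is a minimal generic sketch of a circular-shift permutation test for event-locked activity. The paper's exact procedure is not described in this excerpt, so everything here is an assumption: the function name `circular_shift_pvalue`, the use of binned spike counts, the mean-in-window statistic, and the two-sided comparison against the shifted null.

```python
import numpy as np

def circular_shift_pvalue(binned_counts, event_bins, window,
                          n_shifts=1000, min_shift=1, seed=None):
    """Test whether activity after events exceeds chance by circularly shifting the trace.

    binned_counts : 1-D array of spike counts per time bin for one neuron
    event_bins    : bin indices at which the event (e.g. US delivery) occurred
    window        : number of bins after each event to average over
    Shifting the whole trace preserves its autocorrelation and firing rate
    while breaking its alignment to the events.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(binned_counts, dtype=float)
    event_bins = np.asarray(event_bins, dtype=int)
    n = counts.size
    if n <= 2 * min_shift:
        raise ValueError("trace too short for the requested minimum shift")

    # Indices of all post-event bins, wrapped at the end of the recording.
    idx = (event_bins[:, None] + np.arange(window)[None, :]) % n

    def stat(x):
        return x[idx].mean()

    observed = stat(counts)

    # Random shifts that exclude (near-)zero displacement in either direction.
    shifts = rng.integers(min_shift, n - min_shift + 1, size=n_shifts)
    null = np.array([stat(np.roll(counts, s)) for s in shifts])

    # Two-sided p-value with the +1 correction so that p is never exactly zero.
    center = null.mean()
    extreme = np.sum(np.abs(null - center) >= np.abs(observed - center))
    p_value = (1 + extreme) / (n_shifts + 1)
    return observed, p_value
```

      A neuron would then be labeled responsive when `p_value` falls below a chosen threshold; the threshold and any correction for multiple comparisons are further choices this sketch leaves open.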

    2. Reviewer #2 (Public review):

      This study presents valuable insight on how neurons within the central amygdala may broadly encode the valence of emotional stimuli. The evidence supporting most of the authors' conclusions is solid, although some of the claims should be treated with caution due to potential alternative interpretations of the data.

      In this revised manuscript the authors have addressed the reviewers' critiques in a way that acknowledges the feedback but does not fully embrace or rigorously address the reviewers' core concerns. Here are the main observations that support this impression:

      (1) The authors repeatedly acknowledge the ambiguity in defining "valence" and "salience" in the literature, but their responses don't clarify how they address these terms more rigorously. They seem to justify their operational definitions by citing previous studies but do not address how their definitions impact the clarity and robustness of their findings.

      (2) The reviewers highlighted that using stimuli from different sensory modalities without scaling them or including neutral cues limits the ability to distinguish between valence and salience. The authors acknowledge this but argue that using same-modality stimuli would not produce distinct responses. This response doesn't address the reviewers' point about how these design limitations could weaken the conclusions. They seem to rely on citations of similar experimental designs instead of addressing the core critique or proposing additional experiments.

      (3) In response to the low number of cue-responsive units and the call for more rigorous behavioral measures (like licking or orienting), the authors provide some data but emphasize statistical rigor over behavioral insights, which was questioned during the initial review. They don't propose any methodological adjustments or consider alternative explanations.

      (4) The reviewers suggested clustering or other population-level analyses to understand functional diversity within the central amygdala. The authors argue that their statistical approach was sufficient and don't believe additional clustering analyses would add value. This response seems dismissive, as they don't consider whether population-level insights might reveal patterns that single-cell responses overlook.

      Overall, while the authors have responded to each concern, their rebuttals often reference other studies to justify their choices rather than addressing the specific limitations highlighted by the reviewers.

    3. Reviewer #3 (Public review):

      Summary:

      The authors have performed endoscopic calcium recordings of individual CeA neuron responses to food and shock, as well as to cues predicting food and shock. They claim that a majority of neurons encode valence, with a substantial minority encoding salience.

      Strengths:

      The use of endoscopic imaging is valuable, as it provides the ability to resolve signals from single cells, while also being able to track these cells across time (though the latter capability was not extensively utilized). Another strength is the use of a sophisticated circular shifting analysis to avoid statistical errors caused by correlations between neighboring image pixels.

      Weaknesses:

      In the first version of this manuscript, my main critique was that the authors didn't fully test whether neurons encode valence. In their rebuttal, the authors justify their use of the terms valence and salience by citing prior works from different labs:

      (1) Li et al., 2019, doi: 10.7554/eLife.41223
      (2) Yang et al., 2023, doi: 10.1038/s41586-023-05910-2
      (3) Huang et al., 2024, doi: 10.1038/s41586-024-07819
      (4) Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031
      (5) Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006
      (6) Zhu et al., 2018, doi: 10.1126/science.aat0481
      (7) Comoli et al., 2003, doi: 10.1038/nn1113P

      Among these, items #1 and #3 primarily discuss valence, while #2, #4, #6, and #7 discuss salience, and #5 discusses both.

      Upon reviewing these references, the authors' identification of valence encoding patterns is still problematic, and indeed studies cited above show several lines of evidence for valence encoding that are absent here. For example, item #3 ranked behavioral responses to five different odors in drosophila, from most attractive to most repulsive, and saw neuronal responses correlated with the degree of attraction versus repulsion across all five odors. This is robust evidence for valence encoding that is absent here. Items #1 and #5 above are the other two valence-addressing studies cited, and although those only used one rewarding and one aversive stimulus (in rodents), both also added a neutral cue, and most critically, identified substantial subsets of neurons showing a rank-order response, e.g. either aversion > neutral > reward or aversion < neutral < reward. Again, that level of demonstration of valence encoding is not shown in the current study.

      Finally, two of the valence studies above tested responses to omission of reward/punishment, providing yet more evidence of valence encoding that is absent in the current study.

      While there is much to like about the current study, the claims of valence encoding appear hard to justify, and should be toned down.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      From the Reviewing Editor:

      Four reviewers have assessed your manuscript on valence and salience signaling in the central amygdala. There was universal agreement that the question being asked by the experiment is important. There was consensus that the neural population being examined (GABA neurons) was important and the circular shift method for identifying task-responsive neurons was rigorous. Indeed, observing valenced outcome signaling in GABA neurons would considerably increase the role of the central amygdala in valence. However, each reviewer brought up significant concerns about the design, analysis and interpretation of the results. Overall, these concerns limit the conclusions that can be drawn from the results. Addressing the concerns (described below) would work towards better answering the question at the outset of the experiment: how does the central amygdala represent salience vs valence.

      A weakness noted by all reviewers was the use of the terms 'valence' and 'salience' as well as the experimental design used to reveal these signals. The two outcomes used emphasized non-overlapping sensory modalities and produced unrelated behavioral responses. Within each modality there are no manipulations that would scale either the value of the valenced outcomes or the intensity of the salient outcomes. While the food outcomes were presented many times (20 times per session over 10 sessions of appetitive conditioning) the shock outcomes were presented many fewer times (10 times in a single session). The large difference in presentations is likely to further distinguish the two outcomes. Collectively, these experimental design decisions meant that any observed differences in central amygdala GABA neuron responding are unlikely to reflect valence, but likely to reflect one or more of the above features.

      We appreciate the reviewers’ comments regarding the experimental design. When assessing fear versus reward, we chose stimuli that elicit known behavioral responses, freezing versus consumption. The use of stimuli of the same modality is unlikely to elicit easily definable fear or reward responses or to be precisely matched for sensory intensity. For example, sweet or bitter tastes can be used, but even these activate different taste receptors and vary in the duration of the activation of taste-specific signaling (e.g. how long the taste lingers in the mouth). The approach we employed is similar to that of Yang et al., 2023 (doi: 10.1038/s41586-023-05910-2), who used water reward and shock to characterize the response profiles of somatostatin neurons of the central amygdala. Similar to what was reported by Yang and colleagues, we observed that the majority of CeA GABA neurons responded selectively to one unconditioned stimulus (~52%). We observed that 15% of neurons responded in the same direction, either activated or inhibited, to the food and shock US. These were defined as salience encoding based on the definitions of Lin and Nicolelis, 2008 (doi: 10.1016/j.neuron.2008.04.031), in which basal forebrain neurons responded similarly to reward or punishment irrespective of valence. The designation of valence encoding based on opposite responses to the food or shock is straightforward (~10% of cells); however, we agree that the designation of modality-specific encoding neurons as valence encoding is less straightforward.
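
      The sign-based scheme described in this response can be restated as a short sketch. The function name, the label strings, and the +1/0/-1 coding are illustrative assumptions and are not taken from the manuscript; the code only encodes the stated rule that same-direction responses to both USs are labeled salience and opposite-direction responses are labeled valence, and it keeps single-US responders as a separate category because their interpretation is the point under dispute.

```python
def classify_us_response(food: int, shock: int) -> str:
    """Label a neuron from the sign of its food-US and shock-US responses.

    Each argument is +1 (activated), -1 (inhibited), or 0 (no response).
    The labels are hypothetical names chosen for this sketch.
    """
    if food not in (-1, 0, 1) or shock not in (-1, 0, 1):
        raise ValueError("responses must be coded as -1, 0, or +1")
    if food == 0 and shock == 0:
        return "non-responsive"
    if food == 0 or shock == 0:
        # Responds to one US only: valence and sensory modality are confounded.
        return "single-US selective"
    if food == shock:
        return "salience-like (same direction)"
    return "valence-like (opposite direction)"


# Enumerate every sign combination to see how many responsive profiles exist.
profiles = [(f, s) for f in (-1, 0, 1) for s in (-1, 0, 1) if (f, s) != (0, 0)]
```

      Under this coding there are eight responsive sign combinations, so a scheme of this kind assigns every responsive neuron to some category by construction, which is the concern the reviewers raise about falsifiability.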

      A second weakness noted by a majority of reviewers was a lack of cue-responsive units and a lack of exploration of the diversity of response types and of the relationship between cue and outcome firing. The lack of large numbers of neurons increasing firing to one or both cues is particularly surprising given the critical contribution of central amygdala GABA neurons to the acquisition of conditioned fear (which the authors measured) as well as to conditioned orienting (which the authors did not measure). Regression-like analyses would be a straightforward means of identifying neurons varying their firing in accordance with these or other behaviors. It was also noted that appetitive behavior was not measured in a rigorous way. Instead of measuring time near hopper, measures of licking would have been better. Further, measures of orienting behaviors such as startle were missing.

      The authors also missed an opportunity for clustering-like analyses, which could have been used to reveal neurons uniquely signaling cues, outcomes or combinations of cues and outcomes. If the authors' calcium imaging approach is not able to detect expected central amygdala cue responding, might it be missing other critical aspects of responding?

      As stated in the manuscript, we were surprised by the relatively low number of cue-responsive cells; however, when using a less stringent statistical method (Figure 5 - Supplement 2), we observed that 13% of neurons responded to the food-associated cue and 23% responded to the shock-associated cue. The differences are therefore likely a reflection of the rigor of the statistical measure used to define the responsive units. The number of CS-responsive units is less than reported in the CeAl by Ciocchi et al., 2010 (doi: 10.1038/nature09559), who observed 30% activated by the CS and 25% inhibited, but is not that dissimilar from the results of Duvarci et al., 2011 (doi: 10.1523/JNEUROSCI.4985-10.2011), who observed 11% activated in the CeAl and 25% inhibited by the CS. These numbers are also consistent with previous single-cell calcium imaging of cell types in the CeA. For example, Yang et al., 2023 (doi: 10.1038/s41586-023-05910-2) observed that 13% of somatostatin neurons responded to a reward CS and 8% responded to a shock CS. Yu et al., 2017 (doi: 10.1038/s41593-017-0009-9) observed that 26.5% of PKCdelta neurons responded to the shock CS. It should also be noted that our analysis was not restricted to the CeAl. Finally, food learning was assessed in an operant chamber in freely moving mice with reward pellet delivery. Because liquids were not used for the reward US, licking is not a metric that can be used.

      All reviewers point out that the evidence for salience encoding is even more limited than the evidence for valence. Although the specific concern for each reviewer varied, they all centered on an oversimplistic definition of salience. Salience ought to scale with the absolute value and intensity of the stimulus. Salience cannot simply be responding in the same direction. Further, even though the authors observed subsets of central amygdala neurons increasing or decreasing activity to both outcomes - the outcomes can readily be distinguished based on the temporal profile of responding.

      We thank the reviewers for their comments relating to the definition of salience and valence encoding by central amygdala neurons. We have addressed each of the concerns below.

      Additional concerns are raised by each reviewer. Our consensus is that this study sought to answer an important question: whether the central amygdala signals salience or valence in cue-outcome learning. However, the experimental design, analyses, and interpretations do not permit a rigorous and definitive answer to that question. Such an answer would require additional experiments whose designs would address the significant concerns described here. Fully addressing the concerns of each reviewer would result in a re-evaluation of the findings. For example, an experimental design that better reveals valence and salience, and analyses describing the diversity of neuronal responding and its relationship to behavior, would likely make the results Important or even Fundamental.

      We appreciate the reviewers’ comments and have addressed each concern below.

      Reviewer #2 (Public review):

      In this article, Kong and authors sought to determine the encoding properties of central amygdala (CeA) neurons in response to oppositely valenced stimuli and cues predicting those stimuli. The amygdala and its subregional components have historically been understood to be regions that encode associative information, including valence stimuli. The authors performed calcium imaging of GABA-ergic CeA neurons in freely-moving mice conditioned in Pavlovian appetitive and fear paradigms, and showed that CeA neurons are responsive to both appetitive and aversive unconditioned and conditioned stimuli. They used a variant of a previously published 'circular shifting' technique (Harris, 2021), which allowed them to delineate between excited/non-responsive/inhibited neurons. While there is considerable overlap of CeA neurons responding to both unconditioned stimuli (in this case, food and shock, deemed "salience-encoding" neurons), there are considerably fewer CeA neurons that respond to both conditioned stimuli that predict the food and shock. The authors finally demonstrated that there are no differences in the order of Pavlovian paradigms (fear - shock vs. shock - fear), which is an interesting result, and convincingly presented given their counterbalanced experimental design.
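
      For readers unfamiliar with the approach, the general idea of a circular-shift test can be sketched as follows. This is a generic illustration and not the authors' variant: the function names, the window length, the number of shifts, and the two-sided threshold are all assumptions made for this example. The key property is that rotating a trace preserves its temporal structure while breaking its alignment with the event times.

```python
import random


def event_response(trace, events, window):
    """Mean post-event signal minus mean pre-event signal, averaged over events."""
    total = 0.0
    for t in events:
        post = trace[t:t + window]
        pre = trace[t - window:t]
        total += sum(post) / window - sum(pre) / window
    return total / len(events)


def circular_shift_test(trace, events, window=5, n_shifts=1000, alpha=0.05, seed=0):
    """Classify one neuron as excited, inhibited, or non-responsive.

    The null distribution is built by rotating the trace by random offsets,
    then recomputing the event-aligned response against the unchanged events.
    """
    n = len(trace)
    if any(t - window < 0 or t + window > n for t in events):
        raise ValueError("every event needs a full window on both sides")
    rng = random.Random(seed)
    observed = event_response(trace, events, window)
    null = []
    for _ in range(n_shifts):
        k = rng.randrange(window, n - window)  # skip near-zero shifts
        shifted = trace[-k:] + trace[:-k]      # rotate right by k samples
        null.append(event_response(shifted, events, window))
    frac_ge = sum(v >= observed for v in null) / n_shifts
    frac_le = sum(v <= observed for v in null) / n_shifts
    if frac_ge < alpha / 2:
        return "excited"
    if frac_le < alpha / 2:
        return "inhibited"
    return "non-responsive"
```

      Here `trace` is a list of fluorescence samples for one neuron and `events` is a list of sample indices at which the stimulus occurred. A stricter `alpha` or a larger `n_shifts` would make the classification more conservative.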

      In total, I find the presented study useful in understanding the dynamics of CeA neurons during a Pavlovian learning paradigm. There are many strengths of this study, including the important question and clear presentation, the circular shifting analysis was convincing to me, and the manuscript was well written. We hope the authors will find our comments constructive if they choose to revise their manuscript.

      While the experiments and data are of value, I do not agree with the authors' interpretation of their data, and take issue with the way they used the terms "salience" and "valence", whose operational definitions here differ from my reading of the literature (I would encourage them to check out Namburi et al., NPP, 2016). To be fair, a recent study from another group that reports experiments and findings very similar to the ones in the present study (Yang et al., 2023, describing valence coding in the CeA using a similar approach) also uses the terms valence and salience in a rather liberal way that I would also have issues with (see below). Either new experiments or revised claims would be needed here, and a more balanced discussion on this topic would be nice to see. I also felt that there were some aspects of novelty in this study that could be better highlighted (see below).

      One noteworthy point of alarm is that it seems as if two data panels including heatmaps are duplicated (perhaps that panel G of Figure 5-figure supplement 2 is a cut and paste error? It is duplicated from panel E and does not match the associated histogram).

      We thank the reviewer for their insightful comments and assessment of the manuscript.

      Major concerns:

      (1) The authors wish to make claims about salience and valence. This is my biggest gripe, so I will start here.

      (1a) Valence scales for positive and negative stimuli. As stated in Namburi et al., NPP, 2016, we operationalize "valence" as having different responses for positive and negative values and no response for stimuli that are not motivationally significant (neutral cues that do not predict an outcome). The threshold for claiming salience, which we define as scaling with the absolute value of the stimulus and not responding to a neutral stimulus (Namburi et al., NPP, 2016; Tye, Neuron, 2018; Li et al., Nature, 2022), would require the lack of response to a neutral cue.

      We appreciate the reviewer’s comment on the definitions of salience and valence and agree that there is not a consistent classification of these response types in the field. As stated above, we used the designation of salience encoding if the cells respond in the same direction to different stimuli regardless of the valence of the stimulus, similar to what was described previously (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031). Similar definitions of salience have also been reported elsewhere (for examples see: Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006, Zhu et al., 2018, doi: 10.1126/science.aat0481, and Comoli et al., 2003, doi: 10.1038/nn1113P). Per the suggestion of the reviewer, we longitudinally tracked cells on the first day of Pavlovian reward conditioning and the fear conditioning day. Although there were considerably fewer head entries on the first day of reward conditioning, we were able to identify 10 cells that were activated by both the food US and shock US. We compared the responses to the first five and last five head entries and to the first five and last five shocks. Consistent with what has been reported for salience encoding neurons in the basal forebrain (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected and decreased in later trials.

      Author response image 1.

      (1b) The other major issue is that the authors choose to make claims about the neural responses to the USs rather than the CSs. However, being shocked and receiving sucrose also would have very different sensorimotor representations, and any differences in responses could be attributed to those confounds rather than valence or salience. They could make claims regarding salience or valence with respect to the differences in the CSs but they should restrict analysis to the period prior to the US delivery.

      Perhaps the reviewer missed this, but analyses of valence and salience encoding for the different CSs are presented in Figure 5G, Figure 5 - Supplement 1 C-D, and Figure 5 - Supplement 2 N-O. Responsiveness to CSFood and CSShock was analyzed during the conditioning sessions (Figure 3E-F, Figure 4B-C, Figure 5 – Supplement 2J-O and Figure 5 – Supplement 3K-L) and during recall probe tests for both CSFood and CSShock (Figure 5 – Supplement 1C-J).

      (1c) The third obstacle to using the terms "salience" or "valence" is the lack of scaling, which is perhaps a bigger ask. At minimum either the scaling or the neutral cue would be needed to make claims about valence or salience encoding. Perhaps the authors disagree - that is fine. But they should at least acknowledge that there is literature that would say otherwise.

      (1d) In order to make claims about valence, the authors must take into account the sensory confound of the modality of the US (also mentioned in Namburi et al., 2016). The claim that these CeA neurons are indeed valence-encoding (based on their responses to the unconditioned stimuli) is confounded by the fact that the appetitive US (food) is a gustatory stimulus while the aversive US (shock) is a tactile stimulus.

      We provided the same analysis for the US and CS. The US responses were larger and more prevalent, but similar types of encoding were observed for the CS. We agree that the food reward and the shock are very different sensory modalities. As stated above, the use of stimuli of the same modality is unlikely to elicit easily definable fear or reward responses or to be precisely matched for sensory intensity. We agree that cells responding to only one stimulus are difficult to define in terms of valence encoding, as opposed to being specific for the sensory modality, and that without scaling of the stimulus it is difficult to fully address this issue. It should be noted, however, that if the cells in the CeA were exclusively tuned to stimuli of different sensory modalities, we would expect to see a similar number of cells responding to the CS tones (auditory) as respond to the food (taste) and shock (somatosensory), but we do not. Of the cells tracked longitudinally, 80% responded to the USs, with 65% of cells responding to food (activated or inhibited) and 44% responding to shock (activated or inhibited).

      (2) Many of the central findings in this manuscript have been previously described in the literature. Yang et al., 2023, for instance, shows that the CeA encodes salience (as demonstrated by the scaled responses to the increased value of unconditioned stimuli, Figure 1 j-m), and that learning amplifies responsiveness to unconditioned stimuli (Figure 2). It is nice to see a reproduction of the finding that learning amplifies CeA responses, though one study is in SST::Cre and this one in VGAT::cre - perhaps highlighting this difference could maximize the collective utility for the scientific community?

      We agree that the analysis performed here is similar to what was conducted by Yang et al., 2023, with the major difference being the types of neurons sampled. Yang et al. imaged only somatostatin neurons, whereas we recorded all GABAergic cell types within the CeA. Moreover, because we imaged from 10 mice, we sampled neurons that ostensibly covered the entire dorsal to ventral extent of the CeA (Figure 1 – Supplement 1). Remarkably, we found that the vast majority of CeA neurons (80%) are responsive to food or shock. Within this 80% there are 8 distinct response profiles, consistent with the heterogeneity of cell types within the CeA based on connectivity, electrophysiological properties, and gene expression. Moreover, we did not find any spatial distinction between food or shock responsive cells, with the responsive cell types being intermingled throughout the dorsal to ventral axis (Figure 5 – Supplement 3).

      (3) There is at least one instance of copy-paste error in the figures that raised alarm. In the supplementary information (Figure 5- figure supplement 2 E;G), the heat maps for food-responsive neurons and shock-responsive neurons are identical. While this almost certainly is a clerical error, the authors would benefit from carefully reviewing each figure to ensure that no data is incorrectly duplicated.

      We thank the reviewer for catching this error. It has been corrected.

      (4) The authors describe experiments to compare shock and reward learning; however, there are temporal differences in what they compare in Figure 5. The authors compare the 10th day of reward learning with the 1st day of fear conditioning, which effectively represent different points of learning and retrieval. At the end of reward conditioning, animals are utilizing a learned association to the cue, which demonstrates retrieval. On the day of fear conditioning, animals are still learning the cue at the beginning of the session, but they are not necessarily retrieving an association to a learned cue. The authors would benefit from recording at a later timepoint (to be consistent with reward learning- 10 days after fear conditioning), to more accurately compare these two timepoints. Or perhaps, it might be easier to just make the comparison between Day 1 of reward learning and Day 1 of fear learning, since they must already have these data.

      We agree that there are temporal differences between the food and shock US deliveries. This is likely a reflection of the fact that the shock delivery is passive and easily resolved based on the time of the US delivery, whereas the food responses are variable because they are dependent upon the consumption of the sucrose pellet. Because of these differences, the kinetics of the responses cannot be accurately compared. This is why we restricted our analysis to whether the cells were food or shock responsive. Aside from reporting the temporal differences in the signals, we did not draw major conclusions about the differences in kinetics. In our experimental design we counterbalanced the animals that received fear conditioning first then food conditioning, or food conditioning then fear conditioning, to ensure that order effects did not influence the outcome of the study. It is widely known that Pavlovian fear conditioning can facilitate the acquisition of conditioned stimulus responses with just a single day of conditioning. In contrast, Pavlovian reward conditioning generally progresses more slowly. Because of this, we restricted our analysis to the last day of reward conditioning and the first and only day of fear conditioning. However, as stated above, we compared the responses of neurons defined as salience encoding during day 1 of reward conditioning and fear conditioning. As would be predicted based on previous definitions of salience encoding (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected.

      (5) The authors make a claim of valence encoding in their title and throughout the paper, which is not possible to make given their experimental design. However, they would greatly benefit from actually using a decoder to demonstrate their encoding claim (decoding performance for shock-food versus shuffled labels) and simply make claims about decoding food-predictive cues and shock-predictive cues. Interestingly, it seems like relatively few CeA neurons actually show differential responses to the food and shock CSs, and that is interesting in itself.
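
      The decoding comparison suggested here (real labels versus shuffled labels) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the nearest-centroid classifier, the leave-one-out scheme, the function names, and the layout of `X` as one row per trial and one column per neuron, for example responses in the cue window before US delivery. It is not an analysis from the manuscript.

```python
import random


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in sorted(set(y)):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            if rows:
                # Column-wise mean of the training trials with this label.
                centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        pred = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(X[i], centroids[c])),
        )
        correct += pred == y[i]
    return correct / len(X)


def decoding_vs_shuffle(X, y, n_shuffles=200, seed=0):
    """Return (observed accuracy, permutation p-value) against shuffled labels."""
    rng = random.Random(seed)
    observed = loo_accuracy(X, y)
    labels = list(y)
    exceed = 0
    for _ in range(n_shuffles):
        rng.shuffle(labels)  # break the trial-to-label correspondence
        exceed += loo_accuracy(X, labels) >= observed
    p_value = (1 + exceed) / (1 + n_shuffles)
    return observed, p_value
```

      A result of this kind would support a statement that the population distinguishes food-predictive from shock-predictive cues, which is a weaker and more defensible claim than valence encoding.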

      As stated above, valence and salience encoding were defined similar to what has been previously reported (Li et al., 2019, doi: 10.7554/eLife.41223; Yang et al., 2023, doi: 10.1038/s41586-023-05910-2; Huang et al., 2024, doi: 10.1038/s41586-024-07819; Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113P). Interestingly, many of these studies did not vary the US intensity.

      Reviewer #3 (Public review):

      Summary:

      In their manuscript entitled Kong and colleagues investigate the role of distinct populations of neurons in the central amygdala (CeA) in encoding valence and salience during both appetitive and aversive conditioning. The study expands on the work of Yang et al. (2023), which specifically focused on somatostatin (SST) neurons of the CeA. Thus, this study broadens the scope to other neuronal subtypes, demonstrating that CeA neurons in general are predominantly tuned to valence representations rather than salience.

      We thank the reviewer for their insightful comments and assessment of the manuscript.

      Strengths:

      One of the key strengths of the study is its rigorous quantitative approach based on the "circular-shift method", which carefully assesses correlations between neural activity and behavior-related variables. The authors' findings that neuronal responses to the unconditioned stimulus (US) change with learning are consistent with previous studies (Yang et al., 2023). They also show that the encoding of positive and negative valence is not influenced by prior training order, indicating that prior experience does not affect how these neurons process valence.

      Weaknesses:

      However, there are limitations to the analysis, including the lack of population-based analyses, such as clustering approaches. The authors do not employ hierarchical clustering or other methods to extract meaning from the diversity of neuronal responses they recorded. Clustering-based approaches could provide deeper insights into how different subpopulations of neurons contribute to emotional processing. Without these methods, the study may miss patterns of functional specialization within the neuronal populations that could be crucial for understanding how valence and salience are encoded at the population level.
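
      As an illustration of the kind of population-level analysis meant here, the following is a minimal sketch of average-linkage hierarchical clustering on correlation distance between event-aligned traces. The function names, the choice of distance and linkage, and the fixed number of clusters are assumptions for this example; a real analysis would more likely use an established library and a principled way of choosing the cluster count.

```python
from statistics import mean


def correlation_distance(a, b):
    """1 - Pearson correlation between two equal-length traces."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = sum((x - ma) ** 2 for x in a) ** 0.5
    db = sum((y - mb) ** 2 for y in b) ** 0.5
    if da == 0 or db == 0:
        return 1.0  # flat trace: treat as uncorrelated
    return 1.0 - num / (da * db)


def agglomerative_clusters(traces, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""
    n = len(traces)
    dist = {
        (i, j): correlation_distance(traces[i], traces[j])
        for i in range(n)
        for j in range(i + 1, n)
    }

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        return mean(dist[min(i, j), max(i, j)] for i in c1 for j in c2)

    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        pairs = (
            (a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        a, b = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]
```

      Each element of `traces` would be one neuron's trial-averaged response, and the returned lists contain the neuron indices grouped by response shape rather than by a per-neuron significance test.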

      We appreciate the reviewer’s comments regarding clustering-based approaches. In order to classify cells as responsive to the US or CS we chose to develop a statistically rigorous method for classifying cell response types. Using this approach, we were able to define cell responses to the US and CS. Importantly, we identified 8 distinct response types to the USs. It is not clear how additional clustering analysis would improve cell classifications.

      Furthermore, while salience encoding is inferred based on responses to stimuli of opposite valence, the study does not test whether these neuronal responses scale with stimulus intensity-a hallmark of classical salience encoding. This limits the conclusions that can be drawn about salience encoding specifically.

      As stated above, we used salience classifications similar to those previously described (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113P). We agree that varying the stimulus intensity would provide a more rigorous assessment of salience encoding; however, several of the studies mentioned above classify cells as salience encoding without varying stimulus intensity. Additionally, the inclusion of recordings with varying US intensities on top of the Pavlovian reward and fear conditioning would further decrease the number of cells that can be longitudinally tracked and would likely decrease the number of cells that could be classified.

      In sum, while the study makes valuable contributions to our understanding of CeA function, the lack of clustering-based population analyses and the absence of intensity scaling in the assessment of salience encoding are notable limitations.

      Reviewer #4 (Public review):

      Summary:

      The authors have performed endoscopic calcium recordings of individual CeA neuron responses to food and shock, as well as to cues predicting food and shock. They claim that a majority of neurons encode valence, with a substantial minority encoding salience.

      Strengths:

      The use of endoscopic imaging is valuable, as it provides the ability to resolve signals from single cells, while also being able to track these cells across time. The recordings appear well-executed, and employ a sophisticated circular shifting analysis to avoid statistical errors caused by correlations between neighboring image pixels.

      Weaknesses:

      My main critique is that the authors didn't fully test whether neurons encode valence. While it is true that they found CeA neurons responding to stimuli that have positive or negative value, this by itself doesn't indicate that valence is the primary driver of neural activity. For example, they report that a majority of CeA neurons respond selectively to either the positive or negative US, and that this is evidence for "type I" valence encoding. However, it could also be the case that these neurons simply discriminate between motivationally relevant stimuli in a manner unrelated to valence per se. A simple test of this would be to check if neural responses generalize across more than one type of appetitive or aversive stimulus, but this was not done. The closest the authors came was to note that a small number of neurons respond to CS cues, of which some respond to the corresponding US in the same direction. This is relegated to the supplemental figures (3 and 4), and it is not noted whether the same-direction CS-US neurons are also valence-encoding with respect to different USs. For example, are the neurons excited by CS-food and US-food also inhibited by shock? If so, that would go a long way toward classifying at least a few neurons as truly encoding valence in a generalizable way.

      As stated above, valence and salience encoding were defined similarly to what has been previously reported (Li et al., 2019, doi: 10.7554/eLife.41223; Yang et al., 2023, doi: 10.1038/s41586-023-05910-2; Huang et al., 2024, doi: 10.1038/s41586-024-07819; Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113P). As reported in Figure 5 and Figure 5 – Supplement 3, ~29% of CeA neurons responded to both food and shock USs (15% in the same direction and 13.5% in the opposite direction). In contrast, only 6 of 303 cells responded to both the CSfood and CSshock, all in the same direction.

      A second and related critique is that, although the authors correctly point out that definitions of salience and valence are sometimes confused in the existing literature, they then go on themselves to use the terms very loosely. For example, the authors define these terms in such a way that every neuron that responds to at least one stimulus is either salience or valence-encoding. This seems far too broad, as it makes essentially unfalsifiable their assertion that the CeA encodes some mixture of salience and valence. I already noted above that simply having different responses to food and shock does not qualify as valence-encoding. It also seems to me that having same-direction responses to these two stimuli similarly does not qualify a neuron as encoding salience. Many authors define salience as being related to the ability of a stimulus to attract attention (which is itself a complex topic). However, the current paper does not acknowledge whether they are using this, or any other definition of salience, nor is this explicitly tested, e.g. by comparing neural response magnitudes to any measure of attention.

      As stated in response to reviewer 2, we longitudinally tracked cells on the first day of Pavlovian reward conditioning and on the fear conditioning day. Although there were considerably fewer head entries on the first day of reward conditioning, we were able to identify 10 cells that were activated by both the food US and shock US. We compared the responses to the first five and last five head entries, and to the first five and last five shocks. Consistent with what has been reported for salience encoding neurons in the basal forebrain (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected and decreased in later trials.

      The impression I get from the authors' data is that CeA neurons respond to motivationally relevant stimuli, but in a way that is possibly more complex than what the authors currently imply. At the same time, they appear to have collected a large and high-quality dataset that could profitably be made available for additional analyses by themselves and/or others.

      Lastly, the use of 10 daily sessions of training with 20 trials each seems rather low to me. In our hands, Pavlovian training in mice requires considerably more trials in order to effectively elicit responses to the CS. I wonder if the relatively sparse training might explain the relative lack of CS responses?

      It is possible that learning would have occurred more quickly if we had used greater than 20 trials per session. However, we routinely used 20-25 trials for Pavlovian reward conditioning (doi: 10.1073/pnas.1007827107; doi: 10.1523/JNEUROSCI.5532-12.2013; doi: 10.1016/j.neuron.2013.07.044; and doi: 10.1016/j.neuron.2019.11.024).

    1. eLife Assessment

      This is a valuable report of a spatially-extended model to study the complex interactions between immune cells, fibroblasts, and cancer cells, providing insights into how fibroblast activation can influence tumor progression. The model opens up new possibilities for studying fibroblast-driven effects in diverse settings, which is crucial for understanding potential tumor microenvironment manipulations that could enhance immunotherapy efficacy. While the results presented are convincing and follow logically from the model's assumptions, some of these assumptions, as acknowledged by the authors, may oversimplify certain aspects in light of complex experimental findings, system geometry, and general principles of active matter research. Nonetheless, the authors provide justification for their work as a meaningful step towards more comprehensive modeling approaches.

    2. Reviewer #1 (Public review):

      The authors present an important work where they model some of the complex interactions between immune cells, fibroblasts and cancer cells. The model takes into account the increased ECM production of cancer-associated fibroblasts. These fibres trap the cancer but also protect it from immune system cells. In this way, these fibroblasts' actions both promote and hinder cancer growth. By exploring different scenarios, the authors can model different cancer fates depending on the parameters regulating cancer cells, immune system cells and fibroblasts. In this way, the model explores non-trivial scenarios. An important weakness of this study is that, though it is inspired by NSCLC tumors, it is still far from modelling tumor lesions with morphologies similar to NSCLC tumors and does not explore the formation of ramified tumors. In this way, it is a general model, and it remains challenging to see how it can be adapted to simulate more realistic tumor morphologies.

      Comments on revisions:

      The authors have improved the manuscript and addressed my concerns.

    3. Reviewer #2 (Public review):

      Summary:

      The authors develop a computational model (and a simplified version thereof) to treat an extremely important issue regarding tumor growth. Specifically, it has been argued that fibroblasts have the ability to support tumor growth by creating physical conditions in the tumor microenvironment that prevent the relevant immune cells from entering into contact with, and ultimately killing, the cancer cells. This inhibition is referred to as immune exclusion. The computational approach follows standard procedures in the formulation of models for mixtures of different material species, adapted to the problem at hand by making a variety of assumptions as to the activity of different types of fibroblasts, namely "normal" versus "cancer-associated". The model itself is relatively complex, but the authors do a convincing job of analyzing possible behaviors and attempting to relate these to experimental observations.

      Strengths:

      As mentioned, the authors do an excellent job of analyzing the behavior of their model both in its full form (which includes spatial variation of the concentrations of the different cellular species) and in its simplified mean field form. The model itself is formulated based on established physical principles, although the extent to which some of these principles apply to active biological systems is perhaps debatable (see Weaknesses). The results of the model do indeed offer some significant insights into the critical factors which determine how fibroblasts might affect tumor growth; these insights could lead to new experimental ways of unraveling these complex sets of issues and enhancing immunotherapy. In this revised version, the authors have properly placed this work within the general context of other research on modeling the tumor-immune ecology.

      Weaknesses:

      Models of the form being studied here rely on a large number of assumptions regarding cellular behavior. One major issue is the degree to which close-to-equilibrium assumptions (such as the dynamics being driven by free energy minimization) can be taken as reliable predictors of the obviously active dynamics of biological cells. The authors have recognized this conceptual issue and have argued that these assumptions provide a reasonable first step for understanding the full complexity of dynamics in the tumor microenvironment.

      The problem of T cell infiltration as well as the patterning of the extracellular matrix (ECM) by fibroblasts necessarily involve understanding cell proliferation, cell motion and cell interactions due e.g. to cell signaling. There is evidence that inherently non-equilibrium interactions between the fibroblasts and the extracellular matrix can lead to patterning of the fiber network and trapping of potentially infiltrating T-cells. It is not clear to what extent this type of interaction can be captured by the approach being used here, although the authors propose that it can be mimicked by proper terms in their formulation. This to me is the primary concern that I had with this paper.

      The authors have now addressed what used to be a separate weakness concerning the assumption that fibroblasts affect T cell behavior primarily by just making a more dense ECM. Instead, the organization of the ECM (for example, its anisotropy) could be playing a much more essential role than is given credit for here. This possibility is now discussed in some detail and the authors have suggested that the introduction of a nematic order parameter field would be a useful way to treat this effect.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This is a useful report of a spatially-extended model to study the complex interactions between immune cells, fibroblasts, and cancer cells, providing insights into how fibroblast activation can influence tumor progression. The model opens up new possibilities for studying fibroblast-driven effects in diverse settings, which is crucial for understanding potential tumor microenvironment manipulations that could enhance immunotherapy efficacy. While the results presented are solid and follow logically from the model’s assumptions, some of these assumptions may require further validation, as they appear to oversimplify certain aspects in light of complex experimental findings, system geometry, and general principles of active matter research.

      We thank the editor for recognizing the usefulness of our work. This work does not aim to precisely describe the complexity of the tumor microenvironment in lung cancer, but rather to classify and rigorously calibrate a minimum number of parameters to the clinical data we collect and generate, and reproduce the global structures of the microenvironment. We identify different scenarios, and show how they depend on the local interactions within this framework. Although we started in the first version with coalescence in the main text and anisotropic geometry in the supporting information, we realized that we needed to provide more directions to better show how our model can be extended. Thus, in Section III-4 we added an analysis of a microenvironment with blood vessels, and showed how to introduce anisotropic friction as a function of fiber orientation, as well as active stress, paving the way for further studies, that would make our model more complex. However, in a first step, it is crucial to start with a limited number of parameters that can be rigorously determined, and this is how this first work was conceived.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors present an important work where they model some of the complex interactions between immune cells, fibroblasts and cancer cells. The model takes into account the increased ECM production of cancer-associated fibroblasts. These fibres trap the cancer but also protect it from immune system cells. In this way, these fibroblasts’ actions both promote and hinder cancer growth. By exploring different scenarios, the authors can model different cancer fates depending on the parameters regulating cancer cells, immune system cells and fibroblasts. In this way, the model explores non-trivial scenarios. An important weakness of this study is that, though it is inspired by NSCLC tumors, it is restricted to modelling circular tumor lesions and does not explore the formation of ramified tumors, as in NSCLC. In this way, it is only a general model and it is not clear how it can be adapted to simulate more realistic tumor morphologies.

      We thank the reviewer for highlighting the importance of our work. We acknowledge that although we provided anisotropic geometries and the study of the coalescence in the first version, more effort was needed to provide tools to extend our formalism to non-ideal cases. This is now added as Section III-4, where we analyze the impact of blood vessels, and the anisotropic friction due to the nematic order for the fibers; this nematic order can also be used to introduce active nematic stress.

      Reviewer #2 (Public review):

      Summary:

      The authors develop a computational model (and a simplified version thereof) to treat an extremely important issue regarding tumor growth. Specifically, it has been argued that fibroblasts have the ability to support tumor growth by creating physical conditions in the tumor microenvironment that prevent the relevant immune cells from entering into contact with, and ultimately killing, the cancer cells. This inhibition is referred to as immune exclusion. The computational approach follows standard procedures in the formulation of models for mixtures of different material species, adapted to the problem at hand by making a variety of assumptions as to the activity of different types of fibroblasts, namely “normal” versus “cancer-associated”. The model itself is relatively complex, but the authors do a convincing job of analyzing possible behaviors and attempting to relate these to experimental observations.

      Strengths:

      As mentioned, the authors do an excellent job of analyzing the behavior of their model both in its full form (which includes spatial variation of the concentrations of the different cellular species) and in its simplified mean field form. The model itself is formulated based on established physical principles, although the extent to which some of these principles apply to active biological systems is not clear (see Weaknesses). The results of the model do offer some significant insights into the critical factors which determine how fibroblasts might affect tumor growth; these insights could lead to new experimental ways of unraveling these complex sets of issues and enhancing immunotherapy.

      We thank the referee for this summary and for recognizing the strengths of our paper.

      Weaknesses:

      Models of the form being studied here rely on a large number of assumptions regarding cellular behavior. Some of these seemed questionable, based on what we have learned about active systems. The problem of T cell infiltration as well as the patterning of the extracellular matrix (ECM) by fibroblasts necessarily involve understanding cell motion and cell interactions due e.g. to cell signaling. Adopting an approach based purely on physical systems driven by free energies alone does not consider the special role that active processes can play, both in motility itself and in the type of self-organization that can occur due to these cell-cell interactions. This to me is the primary weakness of this paper.

      We thank the referee for this important comment, which allows us to clarify this point. Although biological materials are out of equilibrium, their behavior often resembles that dictated by thermodynamics. Hence the usefulness of constructing a free energy in terms of these variables. In a first approach to decipher the complex interactions and describe the different and sometimes non-trivial outcomes in this system that involves many components, we must start by minimizing the number of parameters and identifying those complex processes that control the evolution of the system. The free energy that we build on this biological system therefore contains out-of-equilibrium processes that can be approximated by a “close to equilibrium” description. Our approach is a classical one in statistical physics of active systems, namely the effort to construct an equivalent free energy for out-of-equilibrium systems. This allows us to gain a clearer insight into those complex processes.

      We have added a sentence in the main text, section III.1, to clarify this point:

      “Building a free-energy density for a biological material is justified, because, although biological materials are out of equilibrium, their behavior often resembles that dictated by thermodynamics. It is therefore useful to write a free energy in terms of state variables.”

      Nevertheless, we recognize that we should have provided more tools for using our formalism by making it active. This is why we introduced the nematic order in the fibers in Section III-4. This nematic order can be used to introduce active stress, and we have cited previous works by some of us see [?, ?, ?] as references for building active processes out of it.

      We must also note that cell signaling has been introduced a minima in our system for providing the cue for the arrival of T-cells and NAFs from the boundaries. However, we found that although we had evoked the other role of the chemicals in the transformation from NAFs to CAFs in the text, details were not well explained. We have therefore corrected and added some explanations in the introduction of section III, and III.1, III.2.

      A separate weakness concerns the assumption that fibroblasts affect T cell behavior primarily by just making a more dense ECM. There are a number of papers in the cancer literature (see, for some examples, Carstens, J., Correa de Sampaio, P., Yang, D. et al. Spatial computation of intratumoral T cells correlates with survival of patients with pancreatic cancer. Nat Commun 8, 15095 (2017); Sun, Xiujie, Bogang Wu, Huai-Chin Chiang, Hui Deng, Xiaowen Zhang, Wei Xiong, Junquan Liu et al. “Tumour DDR1 promotes collagen fibre alignment to instigate immune exclusion.” Nature 599, no. 7886 (2021): 673-678) that seem to indicate that density alone is not a sufficient indicator of T cell behavior. Instead, the organization of the ECM (for example, its anisotropy) could be playing a much more essential role than is given credit for here. This possibility is hinted at in the Discussion section but deserves much more emphasis.

      The referee is right in his comment, and we thank him for raising this issue. We have therefore introduced the anisotropic orientation of the fibers, which induces an anisotropic friction, in a new section III-4. In addition, the references pointed out were included in this section. However, although the anisotropy strongly influences the fate of the tumor when the fibers are oriented perpendicular to the surface of the cancer nest, it is less effective when the fibroblasts are oriented in the direction of the surface of the cancer nest. In the latter case, which is often the case before cancer cells reshape the tumor microenvironment, the matrix density should correlate with the friction.

      Finally, the mixed version of the model is, from a general perspective, not very different from many other published models treating the ecology of the tumor microenvironment (for a survey, see Arabameri A, Asemani D, Hadjati J (2018), A structural methodology for modeling immune-tumor interactions including pro-and anti-tumor factors for clinical applications. Math Biosci 304:48-61). There are even papers in this literature that specifically investigate effects due to allowing cancer cells to instigate changes in other cells from being tumor-inhibiting to tumor-promoting. This feature occurs not only for fibroblasts but also for example for macrophages which can change their polarization from M1 to M2. There needed to be some more detailed comparison with this existing literature.

      The referee is right that the first part of our approach, namely the dynamical system, may be common in this kind of system, and it needs to be mentioned. So we added the following sentence in the discussion: “This is in line with several similar mathematical models, that study through this lens the inhibition/activation of the immune system by cancer cells either by means of compartmental nonlinear models similar to our dynamical system, for instance regarding macrophage recruitment and cytokine signaling {arabameri2018structural} {li2019computational}, or mixture models {fotso2024mixture}. We combine the two approaches in order to rigorously derive the parameters of the model and gain insights from both.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors should address the following points:

      Major issues

      (1) The shape of tumors simulated differs immensely from the observed tumors in Fig. 2. Here, the tumor is constituted by irregular domains, not dissimilar from domains in phase separating mixtures. The domains simulated are circular. Since the authors are using the space dependent model to model the increase in tumor cells with time in the different scenarios (immune-desert, immune-excluded, immune inflamed), it should explain how non-spherical tumor structures can be observed in these scenarios. The authors introduce tumor coalescence in page 28, however, it is not expected that the structures observed in Fig 2 are the result from different tumors merging and coalescing, because that would result from an unlikely large number of initial mutation events in the same region of the tissue. The authors should explain what mechanisms present in the model can lead to non-spherical forms.

      We agree with the reviewer that real tumors are rarely round, contrary to what our numerics suggest. In fact, only the last figure of our paper in the supporting information was more appropriate for such a discussion. We are now adding discussions and new figures to better illustrate our spatial model, see Figure 6 and section III-4. The in situ geometry of tumors depends on the shape of the host organ, the diffusive (chemical) or advected species such as T cells and fibroblasts, and on the nutrients. Thus, in our case, only cancer cells are produced locally, but during growth the tumor is strongly constrained by the microenvironment, and thus the geometry of the domain we model in the numerics and its boundary conditions. This is also true for the chemicals responsible for growth, cellular advection and phenotypic transformation. Their concentration depends on a convection-diffusion equation and boundary conditions. For a tumor in situ, such as in the lung, the available space is a constraint that will dominate the final geometry of the tumor nests. We do not think that coalescence is controlled by mutational events, but most likely by the search for space necessary for growth. Compared to the first version, we add new figures (Figure 6) that show that the geometry of the organ, as well as the localization of blood vessels, are a cause of the irregularity of the tumor shapes. We also introduce orientational order, which as suggested in section III-4, can induce anisotropic friction and stresses, as well as anisotropic growth. We cite (Ackermann, Joseph, and Martine Ben Amar. “Onsager’s variational principle in proliferating biological tissues, in the presence of activity and anisotropy.” The European Physical Journal Plus 138.12 (2023): 1103.) where we described active stresses and coupling related to anisotropic growth.

      (2) According to the authors, the model presented in equations (1) and onwards simulates the evolution of the fraction of tumor cells in the tissue. However the fraction of tumor cells, for example, depends itself on the variation of other cell types. For example, if fibroblasts were to proliferate with rate alpha, even without tumor cells proliferating, the fraction of tumor cells in the mixture should decrease as alpha times the tumor cells fraction. These terms are missing. The equations do not describe the evolution of the cells’ fractions but of the amount of cells of each type, normalised by the total carrying capacity of non-normal cells in the tissue. The text should be rewritten accordingly.
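
      The reviewer's distinction between fractions and normalised amounts can be made explicit with a two-species toy calculation. The symbols below are hypothetical and not taken from the manuscript: $n_C$ and $n_F$ are the numbers of tumor cells and fibroblasts, with fibroblasts proliferating at rate $\alpha$ and tumor cells not proliferating at all.

```latex
\frac{dn_F}{dt} = \alpha\, n_F, \qquad \frac{dn_C}{dt} = 0, \qquad
\phi_C = \frac{n_C}{n_C + n_F}, \qquad \phi_F = \frac{n_F}{n_C + n_F}
\;\;\Longrightarrow\;\;
\frac{d\phi_C}{dt} = -\frac{n_C}{(n_C + n_F)^2}\,\frac{dn_F}{dt}
                   = -\alpha\, \phi_F\, \phi_C .
```

      In this sketch the tumor fraction decreases even though the tumor cell number is constant, at a rate proportional to $\alpha$ and to the tumor fraction itself (weighted by the fibroblast fraction). An equation for a true fraction therefore carries such a coupling term, whereas an equation for an amount normalised by a fixed reference does not.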

      We agree with the referee: our definition of cell density was not precise enough and may appear misleading. In paragraph II.1, we now explicitly introduce the term mass fraction, which is the correct physical quantity to introduce into the spatial model.

      “All these cells have the same mass density and the sum of their mass fraction satisfies the relationship S = C + T + F<sub>NA</sub> + F<sub>A</sub> = 1-N, where N is a healthy non active component as healthy cells, for example.”

      It is less intuitive than “number of cells per unit volume” but necessary for what follows (section III).

      (3) The authors start by calculating fixed points of different versions of the dynamical system without spatial dependence. They should explain what is the relevance of these fixed points: in a real situation, where the concentration of tumor fibroblasts and T-cells depend on position, in which conditions are these fixed points relevant?

      The referee is right and we will clarify this point: the dynamic analysis helps in understanding and predicting the scenario occurring in the system. After all the steps of paragraph 2.2, we are faced with 11 independent parameters for the dynamical system alone, not counting the parameters generated by the spatial modeling itself. Our estimation concerns only lung cancer. These parameters do not appear in the literature. The parameters introduced in Sec. III, which are more related to physical interactions such as friction, cell-cell adhesion, etc., can be found in the literature or can be estimated and thus measured in in vitro experiments (see Ackermann and Ben Amar, EPJP 2023, P. Benaroch, J. Nikolic et al. 2024, bioRxiv). So what are the fixed points for? They help to get the right numbers for the spatial analysis. To recover special features of cancer evolution, we need a model, but also correct estimates of the data in a code that is quite technical and heavy, with each simulation taking a certain amount of time. For users who only need rough predictions, the analysis in section 2 is sufficient.

      It is also important to note that the global result depends only on the source terms and on the boundary conditions. This can be illustrated with a simple example. Consider the governing equation for the density ρ of a component with velocity v and source term Γ:

      $$\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \, \mathbf{v}) = \Gamma$$

      Integrating the equation over a fixed volume V of surface S, with outward unit normal n, gives:

      $$\frac{d}{dt} \int_V \rho \, dV = -\oint_S \rho \, \mathbf{v} \cdot \mathbf{n} \, dS + \int_V \Gamma \, dV$$

      This integrated equation can then be approximated by the dynamical system that we write. Thus, while the dynamical system does not give any information about the local structure of the system, it may be indicative of its global outcome.
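
      The statement that the global outcome depends only on the source terms and the boundary conditions can be checked numerically on a toy problem. The sketch below is an illustration under assumptions that are not taken from the manuscript: a one-dimensional domain, a constant advection velocity, zero flux through both ends, and a prescribed Gaussian source. Whatever the local density profile becomes, the total mass changes only through the integrated source.

```python
import numpy as np

def simulate_total_mass(n_cells=200, n_steps=500, dt=1e-3, velocity=0.5):
    """Advect a density with a source term and track the total mass.

    Finite-volume upwind scheme on [0, 1]; all values are hypothetical.
    """
    dx = 1.0 / n_cells
    x = (np.arange(n_cells) + 0.5) * dx                   # cell centres
    rho = np.exp(-((x - 0.3) ** 2) / 0.005)               # initial density
    source = 0.2 * np.exp(-((x - 0.6) ** 2) / 0.01)       # prescribed source

    mass_history = [rho.sum() * dx]
    for _ in range(n_steps):
        # Upwind flux at interior faces (velocity > 0); no flux at the two ends.
        flux = np.zeros(n_cells + 1)
        flux[1:-1] = velocity * rho[:-1]
        rho = rho + dt * (-(flux[1:] - flux[:-1]) / dx + source)
        mass_history.append(rho.sum() * dx)
    return np.array(mass_history), source.sum() * dx

mass, integrated_source = simulate_total_mass()
# Global balance: total mass = initial mass + elapsed time * integrated source.
predicted_final_mass = mass[0] + 500 * 1e-3 * integrated_source
```

      Here the local profile is transported and piles up against the closed boundary, yet `mass[-1]` matches `predicted_final_mass` to round-off, which is the sense in which a space-independent dynamical system can track the global outcome without describing the local structure.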

      (4)   In page 15, the authors identify that α<sub>NA</sub> is proportional to δ𝝐<sup>4</sup>. However, in equation (7), they replace α<sub>NA</sub> by δ𝝐<sup>4</sup> without the proportionality constant. This should be corrected.

      Thank you for your remark. This typo is now corrected.

      (5) The tumor cell movement should be much slower than that of the T-cells. Here, the authors assign a similar friction coefficient for the cancer cells and T-cells, for example. However, in lung cancer tumor cells are epithelial, and adhere to each other in the tissue. Their movement is very restricted by the basement membranes and by cell-cell adhesion. Immune cells and T-cells on the other hand move rapidly throughout the stroma. It is a gross simplification to not consider the low epithelial tissue mobility in the context of lung cancer.

      It is possible to assume different friction coefficients for each phase pair. This has been done in a previous publication, Ackermann et al., Physics Reports 2021. It is also possible to play with the cell-cell adhesion in the energy density and with the diffusion coefficient introduced in the Flory-Huggins free energy. Cell-cell adhesion is taken into account in the energy, and this makes the tumor a more dense phase, while T-cells can move towards cancer cells to which they are attracted. In the last part of the paper, we show the role of an anisotropic friction due to a nematic order for activated fibroblasts and all the other cells.

      (6) What is the biological mechanism by which the T-cells form a colony with a surface tension? In the phase-field model, the authors have a surface tension assigned to the cancer cells, T-cells and fibroblasts. Can the authors justify biologically why do they consider these surface tensions?

      The fact that T-cells form a colony is due to the accumulation of T-cells at the outer boundary of the tumor, as they are attracted to it but cannot penetrate due to the strong cell-cell adhesion of the tumor cells in the nest. Adding a squared-gradient term is standard in continuous models: it limits sharp variations in cell density, which are not physical.

      Minor issues

      (a) Page 6 (end), characterisation of the fibre barrier produced by CAFs missing: what is the fibre density, how it can hinder the spread of cancer and T-cell motility? Is it so dense that it prevents ameboid movement? Can cells move through it using matrix degradation proteins?

      The fiber density corresponds to the fibrous organic extracellular matrix secreted by cancer-associated fibroblasts. In desmoplastic (highly fibrous) tumors such as PDAC or NSCLC, this extracellular matrix deposited around the tumor forms a physical barrier around the tumor nest, preventing both cell migration and capillary and immune cell penetration. In these cases, the fibrous belt actually prevents ameboid movement and cells must deform significantly to migrate. The role of this barrier was particularly demonstrated in the reference (Grout, John A., et al. “Spatial positioning and matrix programs of cancer-associated fibroblasts promote T-cell exclusion in human lung tumors.” Cancer Discovery 12.11 (2022): 2606-2625.). In later stages of cancer, the tumor may adapt and develop strategies to metastasize, such as matrix degradation. This matrix can be oriented, organized or disordered. To build a minimal model, we first considered an isotropic friction and also an anisotropic friction of the nematic belt, due to the activated fibroblasts. In the case of T-cells, as mentioned in section I.1, it is true that the biological literature also considers a phenotypic transformation of the T cells by the activated fibroblasts: this concerns their proliferative capacities, antigen recognition and also their cytotoxic function. To better document the different mechanisms, we add the following publication: Cancer associated fibroblasts-an impediment to effective anti-cancer T cell immunity, by Koppensteiner, Lilian and Mathieson, Layla and O’Connor, Richard A and Akram, Ahsan R, Frontiers in immunology (2022).

      However, our goal is to build a minimal model and to characterize and quantify the physical process in which CAFs are involved, namely their role as a physical barrier, as documented above.

      (b) Page 19 (Fig 3), in the figure legend it is written “resting fibroblasts”, should be “non-activated fibroblasts”.

      The referee is right: it will be better to write non-activated fibroblasts. This is now changed in the main text.

      (c) Page 21 (equation), what is dΩ? It is dr?

      We thank the referee for raising this point. The text was indeed ambiguous as sometimes dΩ was replaced by dr. To be clearer, all the volume elements are now denoted dV, and the surface elements of the system are denoted dS.

      In the article the units are in italic and should be in roman.

      Thank you for raising this point. It has been corrected.

      (d) Page 25 (beginning section III.3), the authors mention that the simulation is 2D, however, the simulation has radial symmetry. A 1D simulation in radial coordinates could simulate a 3D spherical system. Is the simulation of this section equivalent to a 1D radial simulation (in 2D)?

      The referee is right that in radial symmetry, a 1d equation may be written. We therefore present numerics with irregular shapes of the tumor nest in order to make the system fully 2d.
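
      For context, the reduction the referee alludes to can be written out for a generic diffusion-like term. This is a standard identity given only as an illustration; the field φ is a placeholder and is not taken from the manuscript's equations. When φ depends on the radial coordinate r alone, the Laplacian collapses to a one-dimensional operator, with a different geometric factor in 2D and in 3D:

```latex
% Illustrative only: \phi(r) is a generic radially symmetric field (assumption).
\begin{align}
  \text{2D, polar symmetry:} \quad
    \nabla^2 \phi &= \frac{1}{r}\,\frac{\partial}{\partial r}
                     \left( r\,\frac{\partial \phi}{\partial r} \right), \\
  \text{3D, spherical symmetry:} \quad
    \nabla^2 \phi &= \frac{1}{r^2}\,\frac{\partial}{\partial r}
                     \left( r^2\,\frac{\partial \phi}{\partial r} \right).
\end{align}
```

      An irregular tumor nest breaks this symmetry, so no such one-dimensional reduction is available and the problem is genuinely two-dimensional.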

      (e) Page 26 (Fig 4). Legends inside the plots of plates A, B, C and D are not clear. Colorbar range of plates A and D is different. Would facilitate if the ranges were the same.

      The referee is right: the surface plots presented in Figure 4 would be easier to compare with the same colorbar range for the legends. In fact, as the referee noted, panels A, B and C have the same legends, while panel D has a different one. This is because D represents the case of the immune-inflamed tumor, where the cancer mass fraction nearly vanishes, resulting in values that are three orders of magnitude lower than those in A, B and C. Therefore, they would disappear if the colorbar range were equal to the others. In the new version, we place more emphasis on the change of scale in the legend of Figure 4.

      (f) Page 29 (Fig 5), would facilitate if the order of immune-desert, immune-excluded, immune-inflamed was maintained throughout the document. In this figure the immune-inflamed case appears first.

      We agree with the reviewer that following the same order in which the different cases are presented throughout the manuscript would be helpful in comparing the different figures. Therefore, we have modified Figure 5.

      (g) Page 31, the authors indicate that pharmacodynamics and pharmacokinetics are highly dependent on tumour spatial structure. Can they provide examples and citations?

      In the discussion, we have added references concerning pharmacodynamics.

      (h) Page 33 (Fig Sup2), would facilitate if the order of immune-desert, immune-excluded, immune-inflamed was maintained throughout the document.

      We thank the reviewer for pointing this out. The order of the different scenarios in Fig Sup 2 has now been changed.

      Reviewer #2 (Recommendations for the authors):

      Major points

      (1) Following on from the discussion in the public review, I feel that there are a number of critical issues that need to be addressed regarding modeling assumptions. I would like to understand why the authors believe it is possible to use a free energy-driven model of the microenvironment when many of the processes relevant for their study have an undeniably ”active media” flavor.

      The referee is right that processes in biology are active processes. However, it is a classical approach to model physical interactions between biological components, especially cell adhesion, with a free energy, as they often lead to quasi-stationary equilibrium-like patterns. The free-energy approach also has the advantage of allowing complex phenomena involving many components to be derived straightforwardly. Activity can indeed be introduced in such a framework if we know that the fibroblasts transform into myo-fibroblasts; see, for example, our previous publication Ackermann and Ben Amar, EPJP 2023. However, in the interest of simplification and reduction of the number of free parameters, we have not considered further complications of the model here, as a minimal model allows the main processes that occur to be distinguished. Nevertheless, introducing activity more precisely, within the nematic approach already used for the friction, is a natural continuation of our work: see the new Section III-4, where we introduce the nematic order and indicate that active nematic stresses can be written from it.

      Next, I don’t understand the assumption that T cells do not proliferate once they detect neoantigens on the cancer cells; activation of T cells usually causes them to become more proliferative.

      We thank the referee for this question. The T-cell fraction has two origins: proliferation of T-cells in situ, in the stroma or inside the tumor nest, or external arrival from the sources, which we privilege. We recognize that a full analysis of the tumor microenvironment would require considering proliferation near the tumor, as well as many other processes, which is doable but requires more biological data. In addition, proliferation of T-cells would be equivalent to increasing the killing abilities of T-cells, and these two effects overlap in our approach.

      In order to clarify this point, we modify the following sentence in Section II.2:

      “Although proliferation of cytotoxic T-cells has been observed, we do not consider explicitly proliferation in our study as we focus on their ability to infiltrate the tumor.”

      Rather, we consider that T-cells proliferate outside the domain boundaries, so that this proliferation is included in the boundary source contributions.

      Finally, the issue of whether the density of fibers is sufficient to understand the role of fibroblasts is not at all settled. There should be a full discussion of this issue including mentioning of the Nature paper (cited in the public review) that argues that orientation (and not density) is the key to the role of fibers, as well as the earlier cited work of Kalluri and collaborators on the role of ECM density in pancreatic cancer.

      We thank the referee for this remark. As we wrote above in the response to the public review, we introduced significant additions that aim to tackle this question in the article.

      (2) The authors present a picture of a tumor cell with fibroblasts apparently arrayed circumferentially around the tumor boundary and therefore blocking infiltration. This type of tumor structure has been seen before, for example in ”On the mechanism of long-range orientational order of fibroblasts.” Proceedings of the National Academy of Sciences 114, no. 34 (2017): 8974-8979, which should be cited. More importantly, in that paper the argument is made that positive feedback between fibroblasts and ECM geometry can cause structures like this to form. If this is indeed what is occurring, this would indicate the crucial importance of a mechanism beyond what is contained in the current model. This issue should therefore be discussed within this paper. This issue is of course connected to the previous point regarding the role of ECM structure beyond density.

      We completely agree that the interplay between the fibroblast layer and the tumor shapes the tumor boundary. One of the authors has worked recently on this precise topic (Aging and freezing of active nematic dynamics of cancer-associated fibroblasts by fibronectin matrix remodeling, C Jacques, J Ackermann, S Bell, C Hallopeau, CP Gonzalez, ... bioRxiv, 2023.11.22.568216; Ordering, spontaneous flows and aging in active fluids depositing tracks, S Bell, J Ackermann, A Maitra, R Voituriez, arXiv preprint arXiv:2409.05195). Since the fibroblast layer is an active material, it contributes an anisotropic stress that can be introduced into the model. Our first strategy was to present the simplest model in order to focus on the most important interactions, such as cell-cell adhesion and cell-tissue adhesion. However, we recognize that these questions should be discussed in the text, and we discuss them in the new Section III-4.

      Minor points

      There are also a number of more minor points to consider:

      (1) Since the parameter is taken to be O(1), why exactly does it matter how the other parameters scale with it?

      It is very important to compare the orders of magnitude of the other parameters once the selected parameter of order O(1) is really the driving parameter of the coupling. It gives a first picture of the main interactions that have to be considered.

      (2) I didn’t understand the relevance of referring specifically to IL 6 among many other possibly relevant signals, as is currently done on page 7.

      This corresponds to studies aiming to correlate lung cancer risk with the concentration of interleukins, mostly IL6 and IL8 (McKeown, D. J., et al. “The relationship between circulating concentrations of C-reactive protein, inflammatory cytokines and cytokine receptors in patients with non-small-cell lung cancer.” British Journal of Cancer 91.12 (2004): 1993-1995; Brenner, Darren R., et al. “Inflammatory cytokines and lung cancer risk in 3 prospective studies.” American Journal of Epidemiology 185.2 (2017): 86-95). However, in the absence of very detailed biological information, the modeling and its results are not modified if other chemicals intervene. We slightly modified the following sentence in section I.1:

      “In particular, in the family of inflammatory proteins, also called cytokines, Interleukin-6 (IL6) and Interleukin-8 (IL8) seem, among others, to stimulate the infiltration of CD8<sup>+</sup>.”

      (3) The authors need to mention the possibility of T-cell chemotaxis to the tumor being “self-amplified” in the T cell system, as put forth in Galeano Niño, Jorge Luis, Sophie V. Pageon, Szun S. Tay, Feyza Colakoglu, Daryan Kempe, Jack Hywood, Jessica K. Mazalo et al. “Cytotoxic T cells swarm by homotypic chemokine signalling.” eLife 9 (2020): e56554. This might again reveal a needed extension of the current modelling strategy.

      We thank the referee for his/her comment on the self-amplification of the T-cell population in the stroma, and we mention the indicated reference in our paper. This auto-chemotactic process, which induces a dynamic of more efficient recruitment towards the tumor, may be important for immunotherapy. More efficient arrival of T-cells at the tumor site will lead to a better outcome for the patient, if the swarming organization is maintained in a desmoplastic nematic stroma.

      (4) It is not obvious to me that in sub figures 3F and 3H the tumor is enroute to being totally eradicated, as is stated in the text. The blue lines seemed to asymptote at non-zero population values.

      Looking at sub-figures 3F and 3H, we stated in the main text that the tumor is eradicated because the representative population approaches a fraction of 0, or at least decays to around 0 (0.01/0.05 to be more precise). This is even more evident when compared with the other cases, where the tumor mass fraction reaches values of a higher order (up to 0.6), which leads us to distinguish between these different scenarios.

      (5) The description of the interaction of cells with fibers as being increased friction might be misleading, as the real effect could be actual trapping in the network (as opposed to just slowing down the motion).

      We thank the referee for this question, as it allows us to make an important distinction. Indeed, what the referee describes seems to correspond to a discrete event, namely a cell trapped in a network. However, coarse-graining the dynamics into a continuum model leads, in our view, to an effective friction between the two phases. Moreover, we have now also introduced an anisotropic friction, which can represent trapping. The velocities are not only directed around the tumor but can also be oriented towards the tumor, so that eventually the friction along the radius mimics trapping (see Fig. 4, top). We have introduced this anisotropic friction via a nematic model; see the appendix.

    1. eLife Assessment

      In this manuscript, the authors describe the creation of a transgenic mouse expressing a reporter for Integrated Stress Response (ISR) activation in a CRE-dependent manner. Reliable tools for detecting ISR activation in situ are lacking, so this manuscript describes a potentially valuable tool that builds on and overcomes some of the limitations of a similar viral vector described by the authors in a previous publication. Solid evidence suggests that distinct populations of cells (ChAT) in the nervous system are marked by some level of ISR activation, and that the mouse could be most helpful as a screen for cell types in which the ISR is particularly active, although it would be difficult to draw conclusions from the reporter alone. Additional validations of the reporter activity in situ will further strengthen the manuscript.

    2. Reviewer #1 (Public review):

      Summary:

      The authors created a transgenic mouse line to read out integrated stress responses with single-cell resolution.

      Strengths:

      ISR plays an important role in the development, maintenance, and degeneration of the nervous system. This mouse line represents a potentially important tool to understand ISR in situ.

      Weaknesses:

      The current manuscript is clearly written. However, more validation experiments should be performed to understand the exact meaning of the fluorescence intensity of GFP and RFP channels. This is important because these results will define how this tool will be used in the future and in the field.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors create transgenic animals with a CMV promoter driving expression of their DIO-SPOTlight construct in which uORF2 and the authentic ORF of Atf4 are replaced by GFP and tdTomato respectively, such that ISR activation is predicted to diminish GFP expression and enhance RFP expression. The major experimental finding of the paper is that cholinergic neurons have the most robust activation of the reporter, consistent with and extending upon their previous work.

      Strengths:

      It is very likely that the reporter does indeed read out ISR activation at some level. It is more likely to be useful for screening and hypothesis testing than for gaining mechanistic insight because, as the authors note in the present version, ATF4 itself is but one component of ISR activation. Cells might have robust eIF2a phosphorylation but have suppressed translational regulation (for instance by regulating the expression of eIF2B). The mRNA and protein half-lives of the GFP and Tomato are likely quite different from those of the equivalent components in ATF4, which means that the reporter is likely to behave differently from ATF4 itself over time.

      Weaknesses:

      The major element that the current manuscript lacks is a detailed comparison between how the reporter behaves and how it tracks with eIF2a phosphorylation, ATF4, and the initiation of the gene expression program downstream of ATF4. While this would be difficult to do in vivo, it would seem much more feasible to isolate primary cells (neurons, fibroblasts, hepatocytes, etc.) from the animals and thoroughly characterize the kinetics of reporter-versus-ISR activation. In that way, the reader can have a better idea of how to interpret the behavior of the reporter. As it is, the authors' attempt to account for the reporter's behavior in Figure 3F is purely speculative and not backed by experiment or modeling.

    4. Reviewer #3 (Public review):

      Summary:

      The previously described reporter SPOTlight is a fluorescence-based reporter of the integrated stress response, specifically, protein synthesis initiation dynamics. In the current study from the same lab, the authors describe the creation and characterization of a transgenic mouse that expresses SPOTlight.

      Strengths:

      The previously described reporter has now been made into a Cre-dependent transgene in mice. The authors replicate previous findings from their lab that were acquired using viral vector-mediated delivery of their reporter.

      Weaknesses:

      There is not a clear advantage to having the Cre-dependent SPOTlight reporter in a transgenic mouse over using a viral vector to deliver the same Cre-dependent SPOTlight based on the experiments presented. There are potential general advantages and disadvantages to virus vs transgenic mouse but no side-by-side comparisons are performed here.

      It is not clear whether overexpressing the reporter alters basal ISR/UPR function and gene expression. The CAG is a strong promoter and overexpression of fluorescent proteins (or any protein) can potentially stress protein synthesis and processing mechanisms. The use of the animal as a reporter may be misleading if the presence of the reporter is already altering ISR/UPR.

    1. eLife Assessment

      This is an important study that reports the mechanism by which Ankle2 (LEM4 in humans) interacts with and recruits PP2A and the ER protein Vap33 to promote BAF dephosphorylation and mediate nuclear membrane reformation, using Drosophila as their model. Using Ankle2 mutants, they find that the ER protein Vap33 is key for the normal interphase localisation of Ankle2/LEM4 and also impacts on the function of Ankle2/LEM4 during mitosis. The conclusions on the subcellular localization of Ankle2 are drawn from overexpression of constructs. Overall, the authors use a variety of complementary techniques and provide convincing evidence to support the claims and advance our knowledge in the field of mitosis and nuclear envelope biology.

    2. Reviewer #1 (Public review):

      Summary:

      In organisms with an open mitosis, nuclear envelope breakdown at mitotic entry and re-assembly of the nuclear envelope at the end of mitosis are important, highly regulated processes. One key regulator of nuclear envelope re-assembly is the BAF (Barrier-to-Autointegration) protein, which contributes to cross-linking of chromosomes to the nuclear envelope. Crucially, BAF has to be in a dephosphorylated form to carry out this function, and PP2A has been shown to be the phosphatase which dephosphorylates BAF. The Ankle2/LEM4 protein has previously been identified as an important regulator of PP2A in the dephosphorylation of BAF but its precise function is not fully understood, and Li and colleagues set out to investigate the function of Ankle2/LEM4 in both Drosophila flies and Drosophila cell lines.

      Strengths:

      The authors use a combination of biochemical and imaging techniques to understand the biology of Ankle2/LEM4. On the whole the experiments are well conducted and the results look convincing. A particular strength of this manuscript is that the authors are able to study both cellular phenotypes and organismal effects of their mutants by studying both Drosophila D-mel cells and whole flies.

      The work presented in this manuscript significantly enhances our understanding of how Ankle2/LEM4 supports BAF dephosphorylation at the end of mitosis. Particularly interesting is the finding that Ankle2/LEM4 appears to be a bona fide PP2A regulatory protein in Drosophila, as well as the localisation of Ankle2/LEM4 and how this is influenced by the interaction between Ankle2 and the ER protein Vap33. It would be interesting to see, though, whether these insights are conserved in mammalian cells, e.g. does mammalian Vap33 also interact with LEM4? Is LEM4 also a part of the PP2A holoenzyme complex in mammalian cells?

      Weaknesses:

      This work is certainly impactful but more discussion and comparison of the Drosophila versus mammalian cell system would be helpful. Also, to attract the largest possible readership, the Ankle2 protein should be referred to as Ankle2/LEM4 throughout the paper to make it clear that this is the same molecule.

      A schematic model at the end of the final figure would be very useful to summarise the findings.

      Comments on revisions:

      The authors have carefully revised the manuscript and have satisfactorily addressed the issues that were raised by the reviewers.

    3. Reviewer #2 (Public review):

      The authors first identify Ankle2 as a regulatory subunit and direct interactor of PP2A, showing they interact both in vitro and in vivo to promote BAF dephosphorylation. The Ankyrin domain of Ankle2 is important for the interaction with PP2A. They then show Ankle2 also interacts with the ER protein Vap33 through FFAT motifs and that they particularly co-localize during mitosis. The recruitment of Ankle2 to Vap33 is essential for its localization to the ER and nuclear envelope membrane in telophase, whereas earlier in mitosis its recruitment to the nuclear membrane and spindle envelope relies on the C terminus but not the FFAT motifs. The molecular determinants and receptors are currently not known. The authors check the function of the PP2A recruitment to Ankle2/Vap33 in the context of embryos and show this recruitment pathway is functionally important. While the Ankle2/Vap33 interaction is dispensable in adult flies (looking at wing development), the PP2A/Ankle2 interaction is essential for correct wing and fly development. Overall, this is a very complete paper that reveals the molecular mechanism of PP2A recruitment to Ankle2 and studies both the cellular and the physiological effect of this interaction in the context of fly development.

      The paper is well-written and the narrative is well developed. The figures are of high quality, well-controlled, clearly labelled and easy to understand. They support the claims made by the authors.

      Comments on revisions:

      There are still issues with the statistics. On graphs where multiple conditions are shown, you cannot perform a T-test. You have to use other tests such as ANOVA if the data is normal, and other tests such as KS test if the data is not normally distributed.
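
      To make the statistical point concrete, here is a minimal sketch of the F statistic behind a one-way ANOVA across three conditions, written with the Python standard library only. The function name `one_way_anova_f` and the measurements are hypothetical illustrations, not data from the manuscript. Turning F into a p-value additionally requires the F distribution, which a statistics package would normally supply.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of measurements per condition.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual measurements around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for three conditions (illustration only).
control = [1.0, 2.0, 3.0]
condition_a = [2.0, 3.0, 4.0]
condition_b = [5.0, 6.0, 7.0]

f_stat, df_b, df_w = one_way_anova_f([control, condition_a, condition_b])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

      A single test across all conditions avoids the inflated false-positive rate that comes from running a separate t-test on every pair of conditions.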

    4. Reviewer #3 (Public review):

      The authors were interested in how Ankle2 regulates nuclear envelope reformation after cell division. They show that Ankle2 can bind in a PP2A complex without other known regulatory subunits of PP2A. The authors also identify a novel interaction with ER protein Vap33 that could be important for localization. This manuscript is a useful finding linking Ankle2 function during nuclear envelope reformation to the PP2A complex. The authors present solid data showing that Ankle2 can form a complex with PP2A-29B and Mts and generate a phosphoproteomic resource that is fundamentally important to understand Ankle2 biology. The caveat should be remembered that most experiments, including subcellular localization, are based on overexpression data. Keeping this in mind, the manuscript is a valuable resource.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary: 

      In organisms with open mitosis, nuclear envelope breakdown at mitotic entry and re‐assembly of the nuclear envelope at the end of mitosis are important, highly regulated processes. One key regulator of nuclear envelope re‐assembly is the BAF (Barrier‐to‐Autointegration) protein, which contributes to cross‐linking of chromosomes to the nuclear envelope. Crucially, BAF has to be in a dephosphorylated form to carry out this function, and PP2A has been shown to be the phosphatase that dephosphorylates BAF. The Ankle2/LEM4 protein has previously been identified as an important regulator of PP2A in the dephosphorylation of BAF but its precise function is not fully understood, and Li and colleagues set out to investigate the function of Ankle2/LEM4 in both Drosophila flies and Drosophila cell lines.

      Strengths: 

      The authors use a combination of biochemical and imaging techniques to understand the biology of Ankle2/LEM4. On the whole, the experiments are well conducted and the results look convincing. A particular strength of this manuscript is that the authors are able to study both cellular phenotypes and organismal effects of their mutants by studying both Drosophila D‐mel cells and whole flies.

      The work presented in this manuscript significantly enhances our understanding of how Ankle2/LEM4 supports BAF dephosphorylation at the end of mitosis. Particularly interesting is the finding that Ankle2/LEM4 appears to be a bona fide PP2A regulatory protein in Drosophila, as well as the localisation of Ankle2/LEM4 and how this is influenced by the interaction between Ankle2 and the ER protein Vap33. It would be interesting to see, though, whether these insights are conserved in mammalian cells, e.g. does mammalian Vap33 also interact with LEM4? Is LEM4 also a part of the PP2A holoenzyme complex in mammalian cells? 

      We feel that conducting experiments to test the level of conservation of our findings in mammalian cells is outside the scope of our study, and we will leave it for other labs to investigate.

      Weaknesses: 

      This work is certainly impactful but more discussion and comparison of the Drosophila versus mammalian cell system would be helpful. Also, to attract the largest possible readership, the Ankle2 protein should be referred to as Ankle2/LEM4 throughout the paper to make it clear that this is the same molecule. 

      We have reinforced our presentation and discussion of similarities and differences between Ankle2 from Drosophila vs humans where relevant throughout the Introduction and Discussion sections. Additionally, we have added the mention that Ankle2 is also called LEM4 in humans in the Abstract and Introduction. However, when referring to Drosophila Ankle2, we do not use LEM4 because it is not listed as an alternate name for this gene/protein in FlyBase.

      A schematic model at the end of the final figure would be very useful to summarise the findings.

      We have already provided a schematic model in Figure S3, where we think it is better placed.

      Reviewer #2 (Public review):

      The authors first identify Ankle2 as a regulatory subunit and direct interactor of PP2A, showing they interact both in vitro and in vivo to promote BAF dephosphorylation. The Ankyrin domain of Ankle2 is important for the interaction with PP2A. They then show Ankle2 also interacts with the ER protein Vap33 through FFAT motifs and that they particularly co‐localize during mitosis. The recruitment of Ankle2 to Vap33 is essential for its localization to the ER and nuclear envelope membrane in telophase, whereas earlier in mitosis its recruitment to the nuclear membrane and spindle envelope relies on the C terminus but not the FFAT motifs. The molecular determinants and receptors are currently not known. The authors check the function of the PP2A recruitment to Ankle2/Vap33 in the context of embryos and show this recruitment pathway is functionally important. While the Ankle2/Vap33 interaction is dispensable in adult flies (looking at wing development), the PP2A/Ankle2 interaction is essential for correct wing and fly development. Overall, this is a very complete paper that reveals the molecular mechanism of PP2A recruitment to Ankle2 and studies both the cellular and the physiological effect of this interaction in the context of fly development.

      Strengths: 

      The paper is well written and the narrative is well‐developed. The figures are of high quality, well‐controlled, clearly labelled, and easy to understand. They support the claims made by the authors.

      Weaknesses: 

      The study would benefit from being discussed in the context of what is already known about Ankle2 biology in C. elegans and human cells. It is important to highlight that the structures shown in the paper are AlphaFold models, rather than validated structures.

      We have enhanced our presentation of what is known about LEM‐4L/Ankle2 in C. elegans and humans in the Introduction, and further developed comparisons of our findings regarding Drosophila Ankle2 with these orthologs in the Results and Discussion sections. We have also specified in all sections and figure legends that the structures shown are AlphaFold3 models.

      Reviewer #3 (Public review): 

      Summary: 

      The authors were interested in how Ankle2 regulates nuclear envelope reformation after cell division. Other published manuscripts, including those from the authors, show without a doubt that Ankle2 plays a role in this critical process. However, the mechanism by which Ankle2 functions was unclear. Previous work using worms and humans (Asencio et al., 2012) established that human ANKLE2 could bind endogenous PP2A subunits. The binding was direct and was mediated through a region before and including the first ankyrin repeat in human ANKLE2. In addition to its interaction with PP2A, Asencio et al., 2012 also show that ANKLE2 regulates VRK1 kinase activity. Together PP2A and VRK1 regulate BAF phosphorylation for proper nuclear envelope reformation. Here, the authors provide more evidence for interaction with PP2A by also mapping the domain of interaction to the ankyrin repeat in Drosophila. In addition, the ankyrin repeat is essential for nuclear envelope reformation after division. They show that Ankle2 can bind in a PP2A complex without other known regulatory subunits of PP2A. The authors also identify a novel interaction with ER protein Vap33, but functional relevance for this interaction in nuclear envelope reformation is not provided in the manuscript, which the authors explicitly state. This manuscript does not comment on the activity of Ballchen/VRK1 in relation to Ankle2 loss and BAF phosphorylation or nuclear envelope reformation, even though links were previously shown by multiple studies (Asencio et al., Link et al., Apridita Sebastian et al.,). Nuclear envelope defects were rescued by the reduction of VRK1 in two of these manuscripts. It is possible that BAF phosphorylation phenotypes can be contributed by both PP2A inactivity and VRK1 overactivity due to the loss of Ankle2.

      Strengths: 

      This manuscript is a useful finding linking Ankle2 function during nuclear envelope reformation to the PP2A complex. The authors present solid data showing that Ankle2 can form a complex with PP2A‐29B and Mts and generate a phosphoproteomic resource that is fundamentally important to understanding Ankle2 biology. 

      Weaknesses: 

      However, the main findings/conclusions about subcellular localization might be incomplete since they are drawn from overexpression experiments. In addition, throughout the text, some conclusions are overstated or are not supported by data. 

      It is true that all experiments studying subcellular localization were done with tagged proteins overexpressed in flies and cell culture. Nevertheless, we show that Ankle2‐GFP is functional since it rescues phenotypes resulting from the loss of endogenous Ankle2 in both flies and cultured cells. The antibodies we generated against Ankle2 were unable to reliably detect the endogenous protein by immunofluorescence. We have now stated this caveat in our manuscript. Regarding the validity of our conclusions in relation to our data, we address each point raised by the reviewer under the Recommendations for the authors. In some cases, we have adjusted our conclusions and in other cases, we have provided additional clarification or justification. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      There are a few experimental issues that should be addressed, specific comments are listed below: 

      (1) Figure 1F: In this experiment, the authors immunoprecipitate GFP‐PP2A‐29B or PP2A‐29B‐GFP and Western blot for Ankle2 and Mts to demonstrate that both are co‐immunoprecipitated. To demonstrate that these interactions are specific, the authors should also blot for a protein that is expected to definitely NOT co‐immunoprecipitate with PP2A‐29B; e.g. tubulin.

      Our conclusion that GFP‐PP2A‐29B and PP2A‐29B‐GFP specifically interact with Ankle2 and Mts is also based on mass spectrometry analysis of the purification products from embryos and cells in culture, comparing with products of purification of GFP alone (Fig 1E‐F, S1C‐D and Tables S2, S3). The lists of identified proteins reveal that most proteins (including tubulins) are not enriched with GFP‐PP2A‐29B or PP2A‐29B‐GFP like Ankle2 and Mts are.

      (2) Figure 2A: The colour coding of the dots is not explained in the figure legend. 

      We have now added the explanation.

      (3) Figure 2B: The competition experiment is a good idea. Do the authors get the same results when they conduct the experiment the other way round, i.e. keep the concentration of Tws the same but increase the concentration of Ankle2? 

      We have tried this reverse experiment but saw little effect. The failure to observe displacement of Tws by Ankle2 in this context could be due to a higher affinity of Tws than of Ankle2 for the PP2A complex, or to the lower expression levels achieved for Ankle2 (a larger protein) relative to Tws.

      (4) Figure 5D: The hyperphosphorylation of BAF is very difficult to see, and it is impossible to tell whether the hyperphosphorylation has been rescued or not by the different Ankle2 constructs. Can the phosphorylated and the hyperphosphorylated bands be separated better? This panel needs significant improvements to support the claims in the text.

      In our opinion, the hyperphosphorylated (upper band) and unphosphorylated (lower band) forms of BAF are well resolved and readily distinguishable. The fainter band in the middle could correspond to a partially phosphorylated form of BAF but we do not venture to speculate on its precise identity nor do we need it to draw our conclusions. The important information from this blot is that the level of unphosphorylated BAF after Ankle2 RNAi increases when Ankle2WT‐GFP and Ankle2Fm+FL1‐GFP are expressed but not when Flag‐GFP or Ankle2ANK‐GFP are expressed. In these experiments, the rescue of unphosphorylated BAF is incomplete because not all cells express the GFP‐tagged protein in our non‐clonal stable cell lines.

      Reviewer #2 (Recommendations for the authors):

      (1) The AlphaFold models need to be labelled more clearly as such on the figures, to distinguish them from X‐ray crystallography structures. AlphaFold will always propose a solution, but it is not necessarily correct. 

      We have added the note “MODEL” directly in Figures 2C, 2D, 4F and S3B, in addition to the information already provided in the text and figure legends specifying that these are models generated by AlphaFold3.

      (2) Figure 4 F. Annotate the Ankle2 FL1 peptide. 

      We have indicated the amino acid residues in the figure.

      (3) Problems with the statistical tests. T‐tests cannot be used for comparing multiple groups, as this favors error propagation. 

      All of our t‐tests compare only two groups at a time, as indicated. In this regard, our labeling in Fig 5C may have been misleading. We have now changed it.
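
      For context on the reviewer's concern, the chance of at least one false positive grows with the number of pairwise tests run at a fixed threshold. The following is a minimal sketch, assuming independent tests; the threshold of 0.05 and the test counts are illustrative values, not figures taken from the manuscript.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    ALPHA = 0.05  # illustrative threshold (assumption)
    for n in (1, 3, 6):  # illustrative numbers of pairwise comparisons
        fwer = familywise_error_rate(ALPHA, n)
        print(f"{n} tests: family-wise error rate = {fwer:.3f}, "
              f"Bonferroni threshold = {bonferroni_threshold(ALPHA, n):.4f}")
```

      With a single two-group comparison the family-wise rate equals the nominal threshold, which is the situation the authors describe.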

      (4) Close‐ups of ring canal in Figure S2. In Figure S2, there seem to be lots of GFP‐Ankle2 vesicles in the cytoplasm of the oocyte. 

      We agree that the image showing Ankle2‐GFP alone in the RNAi Vap33 condition suggested a cytoplasmic granular localization of unknown nature. However, upon examination, we realized that this image did not correspond to the same z‐step as the matching merged image (which also included DNA staining). We have now replaced the image with the correct one.

      Reviewer #3 (Recommendations for the authors): 

      Be more accurate about what conclusions can be made from reported data, particularly from overexpression and deletion studies. 

      (1) The domain analysis for physical interaction is quite thorough. However, localization information is taken from overexpressed constructs. While these data show what could happen, the authors are not using endogenous levels of Ankle2 in cells or tissues that are known to require Ankle2. As a result, it is difficult to determine whether localization results are biologically meaningful. 

      We have added the following text at the end of the third Results section:

      “We were unable to examine the localization of endogenous Ankle2 because the antibodies that we generated gave inconclusive results in immunofluorescence. For the remainder of our study, we relied on the overexpression of Ankle2‐GFP, which may not perfectly reflect the localization and function of endogenous Ankle2. However, Ankle2‐GFP is functional as it can rescue phenotypes observed when endogenous Ankle2 is depleted (see below).”

      (2) The data showing that Ankle2 is a regulatory subunit of the PP2A complex also rely on in vitro binding assays in an over‐expression context. The data certainly show that Ankle2 can bind proteins in the PP2A complex when overexpressed. However, the authors could not isolate enough of the complex from the animal to test function, so Ankle2 acting as a regulatory subunit isn't functionally shown. There are other possibilities, such as Ankle2 acting as a scaffold for complex assembly.

      The competition experiments shown in Fig 2 are based on complexes assembling in cells and are not in vitro binding assays. We show 4 lines of evidence supporting the idea that Ankle2 functions as a regulatory subunit of PP2A: 1) Ankle2 interacts with the structural (PP2A‐29B) and catalytic (Mts) subunits of PP2A without any known regulatory subunit of PP2A. 2) Depletion of Ankle2 leads to the hyperphosphorylation of the known PP2A substrate BAF. 3) The PP2A regulatory subunit Tws/B55 competes with Ankle2 for formation of a complex with PP2A. 4) AlphaFold3 predicts that Ankle2 engages in a complex with PP2A at a position similar to that of known regulatory subunits of PP2A including Tws/B55, and consistent with their mutually exclusive presence in PP2A complexes. If Ankle2 acted as a scaffold for the formation of a PP2A complex containing other regulatory subunits, we would expect to detect Ankle2 and another regulatory subunit in the same complex.

      (3) Throughout the text, some conclusions are overstated or are not supported by data. Examples are below: 

      a. Page 1: "we show for the first time that Ankle2 is a regulatory subunit of PP2A"  The authors show binding and changes in BAF phosphorylation levels, but changes in PP2A activity with modulation of Ankle2 weren't shown. 

      We have replaced this phrase with this one:

      “…we provide several lines of evidence that suggest that Ankle2 is a regulatory subunit of PP2A…”

      b. Page 3: "The requirement for Ankle2 in the development of the central nervous system was initially discovered through its targeting by the microcephaly‐causing Zika virus (Shah et al., 2018)." 

      This is not the first paper showing ANKLE2 plays a role in the development of the CNS. Yamamoto et al., 2014 identified mutants in Ankle2 with defects in CNS development in flies and humans, establishing it as a human microcephaly‐causing gene. 

      We are sorry for this oversight. We have now cited this important work.

      c. Page 6: "Moreover, BAF appears to be the only obligatory substrate of Ankle2‐dependent dephosphorylation for cell proliferation as lowering the dose of the BAF kinase NHK‐1/Ballchen rescues wing development defects caused by the partial depletion of Ankle2 (Li et al., 2024)."  It is unclear why the authors conclude this since Ballchen/VRK1 can phosphorylate many things besides BAF. 

      Although the conclusion cannot be drawn categorically, it seems to be by far the most likely scenario. We agree that, in principle, other mechanisms could also account for these genetic observations, such as the dephosphorylation of another, still unidentified obligatory substrate of PP2A‐Ankle2 that would also be phosphorylated by NHK‐1/Ballchen. Nevertheless, we have also shown that expression of an unphosphorylatable mutant form of BAF rescues phenotypes observed upon loss of Ankle2 function (Li et al, 2024). We have changed our sentence as follows:

      "Moreover, BAF could be the only obligatory substrate of Ankle2‐dependent dephosphorylation for cell proliferation as lowering the dose of the BAF kinase NHK‐1/Ballchen or expression of an unphosphorylatable mutant form of BAF rescues wing development defects caused by the partial depletion of Ankle2 (Li et al., 2024).”

      d. Page 10: "These results suggest that a Vap33‐Ankle2‐PP2A complex can mediate the recruitment of a pool of PP2A at the NE."

      There is insufficient evidence to indicate that Vap33‐Ankle2‐PP2A exists in a stable state in the cell and that this complex mediates recruitment of PP2A at the NE. The images do not include Vap33, showing no evidence it is present when PP2A is at the NE and the complex could only be detected with overexpression. 

      We agree with this caveat and recognize the need to be cautious when proposing our model. In this regard, we feel that our wording is reasonable and appropriate, using “suggest” rather than “prove”, “show” or “indicate”.

      e. Page 11: "These results suggest that the interaction of Ankle2 with PP2A is essential for its function in BAF dephosphorylation and nuclear reassembly." Page 14: "these results indicate that the interaction of Ankle2 with PP2A is essential during embryo". Page 14: "These results indicate that the interaction of Ankle2 with PP2A but not with Vap33 is essential for its function during cell proliferation in imaginal wing disc development." 

      These experiments show that the ankyrin repeat in Ankle2 is necessary for these processes. They do not show that the interaction of PP2A with Ankle2 is necessary, because other factors could bind the domain. 

      We have revised the segments of the text mentioned, taking the reviewer’s legitimate concerns into consideration. We have also added the following sentence to the Discussion:

      “However, it remains formally possible that the deletion of Ankyrin repeats used to disrupt the Ankle2‐PP2A interaction abrogated another, unknown aspect of Ankle2 function.”

      f. Page 12: "Overall, we conclude that in addition to its N‐terminal PP2A‐interacting Ankyrin domain, Ankle2 requires the integrity of its C‐terminal portion for its essential function in nuclear reassembly." 

      No data were shown for differences in nuclear reassembly, only the ability of ANKLE2 truncation mutants to localize to the nuclear envelope. It isn't clear whether nuclear envelope reformation is normal in Figure S6, which the authors refer to. Lamin staining could help determine whether the C‐terminal region is important for nuclear envelope reformation. 

      Our conclusion is drawn from the results shown in Figures S4 and S5 (described in the same section), where a rescue assay in cells was performed to assess the functionality of different variants of Ankle2‐GFP when endogenous Ankle2 was depleted. In this assay, Lamin and DNA staining were used to examine nuclear reassembly (as in Figure 5). Figure S6 shows the localizations of the different variants of Ankle2‐GFP, but endogenous Ankle2 is not depleted in these cells.

      g. Page 13: "We conclude that the ability of Ankle2 to interact with PP2A is required for the timely recruitment of BAF at reassembling nuclei and ensuing NE reassembly."

      It's possible the Ankyrin domain in ANKLE2 is interacting with proteins other than PP2A to recruit BAF at reassembling nuclei, especially since ANKLE2 is found to regulate VRK1 (Link 2019) which has been found to phosphorylate BAF during the cell cycle (Molitor 2014). Additionally, the images in Figure 6A appear to show fully reassembled nuclear envelopes in all mutants by 180s. 

      This point relates to point e, raised above by this reviewer. We have re‐written the sentence as follows:

      “We conclude that the Ankyrin domain, required for the ability of Ankle2 to interact with PP2A, is necessary for the timely recruitment of BAF at reassembling nuclei and ensuing NE reassembly.”

      Please note that in this paragraph, we discuss a delay in RFP‐BAF recruitment, rather than the complete elimination of this recruitment. 

      h. Page 16: "Our unbiased phosphoproteomic analysis confirmed that BAF dephosphorylation depends on Ankle2, despite the absence of a detectable interaction between Drosophila Ankle2 and BAF, which may be due to the lack of a LEM domain in the former (Fishburn et al., 2024). Moreover, while Ankle2 was shown to bind and inhibit the BAF counteracting kinase VRK1 in humans (Asencio et al., 2012), we detected no interaction between Ankle2 and NHK‐1/Ballchen (VRK1 ortholog) in Drosophila. This suggests that the loss of Ankle2 causes BAF hyperphosphorylation by preventing PP2A‐dependent dephosphorylation rather than by preventing inhibition of NHK‐1"

      There could be transient binding between Ankle2 and Ballchen/VRK1/NHK‐1 or activity can be indirect, but that doesn't mean there is not a contribution of BAF phosphorylation by Ballchen/VRK1/NHK‐1. Genetic evidence from three model systems, including Drosophila, indicates there is a strong genetic interaction between Ankle2 and Ballchen/VRK1/NHK‐1 that includes rescue of lethality.

      We agree and we have re‐written in this way:

      “While a putative interaction between Ankle2 and NHK‐1 in Drosophila could occur transiently, thereby escaping detection, the simplest interpretation of our results is that the loss of Ankle2 causes BAF hyperphosphorylation by preventing PP2A‐dependent dephosphorylation rather than by preventing inhibition of NHK‐1.”

      We do not question the fact that Ballchen/VRK1/NHK‐1 phosphorylates BAF and genetically interacts with Ankle2. The antagonistic relationship between Ballchen/VRK1/NHK‐1 and Ankle2 observed genetically can be explained by the fact that the kinase phosphorylates BAF while PP2A‐Ankle2 dephosphorylates it, without the need to invoke an additional inhibition of the kinase by Ankle2.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The hypothesis is based on the idea that inversions capture genetic variants that have antagonistic effects on male sexual success (via some display traits) and survival of females (or both sexes) until reproduction. Furthermore, a sufficiently skewed distribution of male sexual success will tend to generate synergistic epistasis for male fitness even if the individual loci contribute to sexually selected traits in an additive way. This should favor inversions that keep these male-beneficial alleles at different loci together in cis-LD. A series of simulations is presented and shows that the scenario works at least under some conditions. While a polymorphism at a single locus with large antagonistic effects can be maintained for a certain range of parameters, a second such variant with somewhat smaller effects tends to be lost unless closely linked. It becomes much more likely for genomically distant variants that add to the antagonism to spread if they get trapped in an inversion; the model predicts this should drive accumulation of sexually antagonistic variants on the inversion versus standard haplotype, leading to the evolution of haplotypes with very strong cumulative antagonistic pleiotropic effects. This idea has some analogies with one of the predominant hypotheses for the evolution of sex chromosomes, and the authors discuss these similarities. The model is quite specific, but the basic idea is intuitive and thus should be robust to the details of the model assumptions. It makes perfect sense in the context of the geographic pattern of inversion frequencies. One prediction of the model (notably that it leads to the evolution of nearly homozygously lethal haplotypes) does not seem to reflect the reality of chromosomal inversions in Drosophila, as the authors carefully discuss, but it is the case for some other "supergenes", notably in ants. So the theoretical part is a strong novel contribution.

      We appreciate the detailed and accurate summary of our main theoretic results.
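
      The point in the reviewer's summary that skewed male success can produce synergistic epistasis from additive loci can be illustrated with a toy calculation. This sketch is not the authors' simulation: the winner-takes-all rule, the standard normal distribution of rival displays, the effect size and the number of rivals are all assumptions chosen for illustration, and the comparison is made on an additive fitness scale only.

```python
import math


def normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def win_probability(display: float, n_rivals: int) -> float:
    """Chance that a focal male outdisplays every one of n_rivals rivals,
    whose display values are drawn independently from a standard normal."""
    return normal_cdf(display) ** n_rivals


EFFECT = 1.0   # additive display effect per male-beneficial allele (assumption)
N_RIVALS = 9   # rivals competing for the same mating (assumption)

# Mating success for males carrying 0, 1 or 2 display-increasing alleles.
success = [win_probability(k * EFFECT, N_RIVALS) for k in range(3)]
first_gain = success[1] - success[0]    # gain from the first allele
second_gain = success[2] - success[1]   # gain from adding the second allele

if __name__ == "__main__":
    print("success by allele count:", [round(s, 4) for s in success])
    print("gain from first allele :", round(first_gain, 4))
    print("gain from second allele:", round(second_gain, 4))
```

      Under these assumed values the second allele adds more to mating success than the first, although the display trait itself is strictly additive. Other parameter choices, or a multiplicative fitness scale, can change this ordering.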

      To provide empirical support for this idea, the authors study the dynamics of inversions in population cages over one generation, tracking their frequencies through amplicon sequencing at three time points: (young adults), embryos and very old adult offspring of either sex (>2 months from adult emergence). Out of four inversions included in the experiment, two show patterns consistent with antagonistic effects on male sexual success (competitive paternity) and the survival of offspring, especially females, until an old age, which the authors interpret as consistent with their theory.

      As I have argued in my comments on previous versions, the experiment only addresses one of the elements of the theoretical hypothesis, namely antagonistic effects of inversions on male reproductive success and other fitness components, in particular of females. Furthermore, the design of this experiment is not ideal from the viewpoint of the biological hypothesis it is aiming to test. This is in part because, rather than testing for the effects of inversion on male reproductive success versus the key fitness components of survival to maturity and female reproductive output, it looks at the effects on male reproductive success versus survival to a rather old age of 2 months. The relevance of survival until old age to fitness under natural conditions is unclear, as the authors now acknowledge. Furthermore, up to 15% of males that may have contributed to the next generation did not survive until genotyping, and thus the difference between these males' inversion frequency and that in their offspring may be confounded by this potential survival-based sampling bias. The experiment does not test for two other key elements of the proposed theory: the assumption of frequency-dependence of selection on male sexual success, and the prediction of synergistic epistasis for male fitness among genetic variants in the inversion. To be fair, particularly testing for synergistic epistasis would be exceedingly difficult, and the authors have now included a discussion of the above caveats and limitations, making their conclusions more tentative. This is good but of course does not make these limitations of the experiment go away. These limitations mean that the paper is stronger as a theoretical than as an empirical contribution.

      We discuss the choice to focus on exploring the potential antagonistic effects of the inversion karyotype on male reproductive success and survival in our general response above. Primarily, this prediction seemed to be the most specific to the proposed model as compared to other alternate models. Still, further studies are clearly needed to elucidate the potential frequency dependence and genetic architecture of the inversions.

      Regarding the choice of age at collection, it is unknown to what degree our selected collection age of 10 weeks correlates with survival in the wild, but we feel confident that there will be some positive correlation.

      We now further clarify that across our experiments, a minimum of 5% and a mean of 9% of the males used in the parental generation died before collection. These proportions do not appear sufficient to explain the differences between paternal and embryo inversion frequencies shown in Figure 9.
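
      One way to gauge how much unsampled male mortality could shift a paternal frequency estimate is a worst-case bound. The sketch below assumes that the dead males either all lacked or all carried the inversion, which is deliberately extreme; the survivor frequency of 0.20 is a hypothetical value, while 0.09 corresponds to the mean mortality stated above.

```python
def paternal_frequency_bounds(freq_survivors: float, mortality: float):
    """Range of inversion frequency among all fathers when a fraction
    `mortality` of them died before genotyping.

    Overall frequency = (1 - m) * p_survivors + m * p_dead, with p_dead
    anywhere between 0 and 1.
    """
    low = (1.0 - mortality) * freq_survivors   # dead males all standard
    high = low + mortality                     # dead males all inverted
    return low, high


if __name__ == "__main__":
    SURVIVOR_FREQ = 0.20   # hypothetical frequency among genotyped fathers
    MEAN_MORTALITY = 0.09  # mean paternal mortality stated in the response
    low, high = paternal_frequency_bounds(SURVIVOR_FREQ, MEAN_MORTALITY)
    print(f"frequency among all fathers lies between {low:.3f} and {high:.3f}")
```

      The realistic shift is smaller than these bounds whenever the males that died were a mix of karyotypes.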

      Reviewer #2 (Public review):

      Summary:

      In their manuscript the authors address the question whether the inversion polymorphism in D. melanogaster can be explained by sexually antagonistic selection. They designed a new simulation tool to perform computer simulations, which confirmed their hypothesis. They also show a tradeoff between male reproduction and survival. Furthermore, some inversions display sex-specific survival.

      Strengths:

      It is an interesting idea on how chromosomal inversions may be maintained.

      Weaknesses:

      The authors motivate their study by the observation that inversions are maintained in D. melanogaster and because inversions are more frequent closer to the equator, the authors conclude that it is unlikely that the inversion contributes to adaptation in more stressful environments. Rather the inversion seems to be more common in habitats that are closer to the native environment of ancestral Drosophila populations.

      While I do agree with the authors that this observation is interesting, I do not think that it rules out a role in local adaptation. After all, the inversion is common in Africa, so it is perfectly conceivable that the non-inverted chromosome may have acquired a mutation contributing to the novel environment.

      Based on their hypothesis, the authors propose an alternative strategy, which could maintain the inversion in a population. They perform some computer simulations, which are in line with the predicted behavior. Finally, the authors perform experiments and interpret the results as empirical evidence for their hypothesis. While the reviewer is not fully convinced about the empirical support, the key problem is that the proposed model does not explain the patterns of clinal variation observed for inversions in D. melanogaster. According to the proposed model, the inversions should have a similar frequency along latitudinal clines. So in essence, the authors develop a complicated theory because they felt that the current models do not explain the patterns of clinal variation, but this model also fails to explain the pattern of clinal variation.

      To the contrary, in the Discussion paragraph beginning on Line 671, we explain why we would predict that a tradeoff between survival and reproduction should lead to clinal inversion frequencies. We suggest that a karyotype associated with a survival penalty should be increasingly disadvantageous in more challenging environments (such as high altitudes and latitudes for this species). Furthermore, an advantage in male reproductive competition conferred by that same haplotype may be reduced by the lower population densities that we would expect in more challenging environments (meaning that each female should encounter fewer males). Individually or jointly, these two factors predict that the equilibrium frequency of a balanced inversion polymorphism should depend on a local population’s environmental harshness and population density, with the ensuing prediction that inversion frequency should correlate with certain environmental variables.

      Reviewer #3 (Public review):

      Summary:

      In this study, McAllester and Pool develop a new model to explain the maintenance of balanced inversion polymorphism, based on (sexually) antagonistic alleles and a trade-off between male reproduction and survival (in females or both sexes). Simulations of this model support the plausibility of this mechanism. In addition, the authors use experiments on four naturally occurring inversion polymorphisms in D. melanogaster and find tentative evidence for one aspect of their theoretical model, namely the existence of the above-mentioned trade-off in two out of the four inversions.

      Strengths:

      (1) The study develops and analyzes a new (Drosophila melanogaster-inspired) model for the maintenance of balanced inversion polymorphism, combining elements of (sexually) antagonistic (pleiotropic) alleles, negative frequency-dependent selection and synergistic epistasis. Simulations of the model suggest that the hypothesized mechanism might be plausible.

      (2) The above-mentioned model assumes, as a specific example, a trade-off between male reproductive display and survival; in the second part of their study, the authors perform laboratory experiments on four common D. melanogaster inversions to study whether these polymorphisms may be subject to such a trade-off. The authors observe that two of the four inversions show suggestive evidence that is consistent with a trade-off between male reproduction and survival.

      Open issues:

      (1) A gap in the current modeling is that, while a diploid situation is being studied, the model does not investigate the effects of varying degrees of dominance. It would thus be important and interesting, as the authors mention, to fill this gap in future work.

      (2) It will also be important to further explore and corroborate the potential importance and generality of trade-offs between different fitness components in maintaining inversion polymorphisms in future work.

      We appreciate the work put into evaluating, improving, and summarizing our study. We agree that further work studying the effects of dominance and of the fitness components of the inversions is important.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      l. 354: I don't understand what the authors mean by "an antagonistic and non-antagonistic allele". If there is an antagonistic polymorphism at a locus, then both alleles have antagonistic effects; i.e., allele B increases trait 1 and reduces trait 2 relative to allele A and vice versa.

      Edited, agreed that the terminology used here was sub-optimal.

      Reviewer #2 (Recommendations for the authors):

      The motivation for their model is their claim that the clinal inversion frequencies are not compatible with local adaptation. The reviewer doubts this strong statement. Furthermore, the proposed model also fails to explain the inversion frequencies in natural populations.

      Hence, rather than building a straw man, it would be better if the authors first show their experiments and then present their model as an explanation for the empirical results. Nevertheless, it is also clear that the empirical data are not very strong and cannot be fully explained by the proposed model.

      The claim that we reject any role of local adaptation in clinal variation and selection upon inversion polymorphism does not hold up in a reading of our manuscript. We even suggest that locally varying selective pressures must be playing some role, although that does not imply that local adaptation is the ultimate driver of inversion frequencies. Indeed, we suggest that local adaptation alone is an insufficient explanation for inversion frequency clines in D. melanogaster, because (1) these frequency clines do not approach the alternate fixed genotypes predicted by local directional selection, and (2) these derived inversions tend to be more frequent in more ancestral environments (l.113-158).

      In our public review response above, and in the Discussion section of our paper, we explain why our model can predict both the clinal frequencies of many Drosophila inversions and their intermediate maximal frequencies. Of course, we do not predict that most inversions in this species should follow the specific tradeoff investigated here. In fact, we were surprised to find even two inversions that experimentally supported our predicted tradeoff. Still, it remains possible that other inversions in this species are subject to other balanced tradeoffs not investigated here, which could help explain why they rarely reach high local frequencies.

      Reviewer #3 (Recommendations for the authors):

      My previous comments have been adequately addressed.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      […]

      To provide empirical support for this idea, the authors study the dynamics of inversions in population cages over one generation, tracking their frequencies through amplicon sequencing at three time points: (young adults), embryos and very old adult offspring of either sex (>2 months from adult emergence). Out of four inversions included in the experiment, two show patterns consistent with antagonistic effects on male sexual success (competitive paternity) and the survival of offspring, especially females, until an old age, which the authors interpret as consistent with their theory.

      There are several reasons why the support from these data for the proposed theory is not watertight.

      (1) As I have already pointed out in my previous review, survival until 2 months (in fact, it is 10 weeks and so 2.3 months) of age is of little direct relevance to fitness, whether under natural conditions or under typical lab conditions.

      The authors argue this objection away with two arguments.

      First, citing Pool (2015) they claim that the average generation time (i.e. the average age at which flies reproduce) in nature is 24 days. That paper made an estimate of 14.7 generations per year under the North Carolina climate. As also stated in Pool (2015), the conditions in that locality for Drosophila reproduction and development are not suitable during three months of the year. This yields an average generation length of about 19.5 days during the 9 months during which the flies can reproduce. On the highly nutritional food used in the lab and at the optimal temperature of 25 °C, Drosophila need about 11-12 days to develop from egg to adult. Even assuming these perfect conditions, the average age (counted from adult eclosion) would be about 8 days. In practice, larval development in nature is likely longer for nutritional and temperature reasons, and thus the genomic data analyzed by Pool imply that the average adult age of reproducing flies in nature would be about 5 days, and not 24 days, let alone 10 weeks. This corresponds neatly to the 2-6 days median life expectancy of Drosophila adults in the field based on capture-recapture (e.g., Rosewell and Shorrocks 1987).

      Second, the authors also claim that survival over a period of 2 months is highly relevant because flies have to survive long periods where reproduction is not possible. However, to survive the winter flies enter a reproductive diapause, which involves profound physiological changes that indeed allow them to survive for months, remaining mostly inactive, stress resistant and hidden from predators. Flies in the authors' experiment were not diapausing, given that they were given plentiful food and kept warm. It is possible that survival to the ripe old age of 10 weeks under these conditions nonetheless correlates well with surviving diapause under harsh conditions, but if so, the authors should cite relevant data. Even then, I do not think this allows the authors to conclude that longevity is "the main selective pressure" on Drosophila (l. 936).
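
      The arithmetic behind the first argument above can be laid out explicitly. This minimal sketch takes the reviewer's stated generation length and development times as inputs and does not re-derive them; the function names are illustrative.

```python
def adult_age_at_reproduction(generation_days: float, development_days: float) -> float:
    """Mean age since eclosion at which flies reproduce, given the mean
    generation length and the egg-to-adult development time."""
    return generation_days - development_days


def implied_development_days(generation_days: float, adult_age_days: float) -> float:
    """Development time implied by a generation length and a mean adult age."""
    return generation_days - adult_age_days


if __name__ == "__main__":
    GENERATION_DAYS = 19.5  # reviewer's figure for the 9 reproductive months
    for dev in (11.0, 12.0):  # lab development time at 25 °C, per the reviewer
        age = adult_age_at_reproduction(GENERATION_DAYS, dev)
        print(f"development {dev:.0f} d -> mean adult age {age:.1f} d")
    # Development time that would correspond to the reviewer's ~5-day estimate.
    print("implied field development:",
          implied_development_days(GENERATION_DAYS, 5.0), "d")
```

      Either lab development time gives a mean adult age of roughly 8 days, and the reviewer's 5-day field estimate corresponds to a development time of about two weeks.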

      This is overall a thoughtfully presented critique and we have endeavored to improve our discussion of Pool (2015) and to clarify some of the language used about survival elsewhere. While we agree that challenges other than survival to 10 weeks are very relevant to Drosophila melanogaster, collection at 10 weeks does encompass some of these other challenges. Egg to adult viability still contributes to the frequencies of the inversions at collection and is not separable from longevity in this data. Collection at longevity was chosen in part to encompass all lifetime fitness challenges that might influence the inversion frequency at collection, albeit still within permissive laboratory conditions. Future experiments exploring specific stressors independently and beyond permissive lab conditions would generate a clearer picture.

      In addition to general edits, the specific phrase mentioned at l. 936 [now line 1003] has been revised from “In many such cases females are in reproductive diapause, and so longevity is the main selective pressure.” to “While longevity is a key selective pressure underlying overwintering, the relationship between longevity in permissive lab conditions without diapause and in natural conditions under diapause is unclear (Schmidt et al. 2005; Flatt 2020), and our experiment represents just one of many possible ways to examine tradeoffs involving survival.”

      (2) It appears that the "parental" (in fact, paternal) inversion frequency was estimated by sequencing sires that survived until the end of the two-week mating period. No information is provided on male mortality during the mating period, but substantial mortality is likely given constant courtship and mating opportunities. If so, the difference between the parental and embryo inversion frequency could reflect the differential survival of males until the point of sampling rather than / in addition to sexual selection.

      We have further clarified that when referenced as parental frequency, the frequency presented is ½ the paternal frequency as the mothers were homokaryotypic for the standard arrangement. We chose to present both due to considerations in representing the frequency change from paternal to embryo frequencies, where a hypothetical change from 0.20 frequency in fathers to 0.15 frequency in embryos represents a selective benefit (a frequency increase in the population), despite the reality that this is a decrease in allele frequency between paternal and embryo cohorts.
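
      The halving and the sign of the change can be checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, equal contribution of mothers and fathers is assumed (as the ½ factor implies), and the numbers are the hypothetical 0.20 and 0.15 from the paragraph above, not experimental data.

```python
def parental_frequency(paternal_freq: float, maternal_freq: float = 0.0) -> float:
    """Mean inversion frequency across both parents.

    Mothers were homokaryotypic for the standard arrangement, so their
    inversion frequency is 0 and the parental value is half the paternal one.
    """
    return 0.5 * (paternal_freq + maternal_freq)


paternal = 0.20  # hypothetical frequency among fathers
embryo = 0.15    # hypothetical frequency among embryos

parental = parental_frequency(paternal)
change_vs_parental = embryo - parental  # positive: gain relative to the whole parental pool
change_vs_paternal = embryo - paternal  # negative: lower than among fathers alone
```

      Under these assumptions the same embryo frequency reads as an increase against the parental pool and as a decrease against the fathers alone, which is why both values are presented.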

      We mentioned a maximum 15% paternal mortality at line 827 [now l.1056], but have now added complete data on the counts of flies in the experiment as a supplemental table (Table S1) and have added or corrected further references to this in the results and methods [lines 555, 638, 975]. It is true that this may influence the observed frequency changes to some degree, and while we adjusted our sampling method to account for the effects of this mortality on statistical power [l.1056ff], we have now edited the manuscript to better highlight potential effects of this phenomenon on the recorded frequency changes.

      It is also worth noting that, if mortality among fathers over the mating period is codirectional with mortality among aged offspring, this would bias the results against detecting an opposing antagonistic selective effect of the inversions on paternity share. This is now also mentioned in the manuscript, l.639ff.

      (3) Finally, irrespective of the above caveats, the experimental data only address one of the elements of the theoretical hypothesis, namely antagonistic effects of inversions on reproduction and survival, notably that of females. It does not test for two other key elements of the proposed theory: the assumption of frequency-dependence of selection on male sexual success, and the prediction of synergistic epistasis for male fitness among genetic variants in the inversion. To be fair, particularly testing the latter prediction would be exceedingly difficult. Nonetheless, these limitations of the experiment mean that the paper is a much stronger theoretical than empirical contribution.

      This is a fair criticism of the limitations of our results, and we now summarize such caveats more directly in the discussion summary, lines 876ff.

      Reviewer #2 (Public Review): 

      […]

      Comments on the latest version:

      I would like to give an example of the confusing terminology of the authors:

      "Additionally, fitness conveyed by an allele favoring display quality is also frequency-dependent: since mating success depends on the display qualities of other males, the relative advantage of a display trait will be diminished as more males carry it..."

      I do not understand the difference to an advantageous allele, as it increases in frequency the frequency increase of this allele decreases, but this has nothing to do with frequency dependent selection. In my opinion, the authors re-define frequency dependent selection, as for frequency dependent selection needs to change with frequency, but from their verbal description this is not clear.

      We have edited this text for greater clarity, now line 232ff. We did not seek to redefine frequency dependence, and did mean by “the relative advantage of a display trait will be diminished” that an equivalent s would diminish with frequency. We have now remedied terminological issues introduced in the prior revision with regard to frequency dependent selection.

      One example of how challenging the style of the manuscript is comes from their description of the DNA extraction procedure. In principle a straightforward method, but even here the authors provide a convoluted uninformative description of the procedure.

      We have edited for clarity the text on lines 1016-1020. Citing a published protocol and mentioning our modifications seems an appropriate trade-off between representing what was done accurately, citing the sources we relied on in doing it, and limiting the volume of information in the main text for such a straightforward and common method. 

      It is not apparent to the reviewer why the authors have not invested more effort to make their manuscript digestible.

      We have invested a great deal of effort in making this manuscript as clear as we are able to.  We regret that our writing has not been to this reviewer’s liking. We believe we have been highly responsive to all specific criticisms, including revising all passages cited as unclear. In this round, we have again scrutinized the entire manuscript for any opportunity to clarify it, and we have made further changes throughout.  Although our subject matter is conceptually nuanced, we nevertheless remain optimistic that a careful, fresh reading of our revised manuscript would yield a more favorable impression.

      Reviewer #3 (Public Review):

      […]

      Weaknesses:

      A gap in the current modeling is that, while a diploid situation is being studied, the model does not investigate the effects of varying degrees of dominance. It would be important and interesting to fill this gap in future work.

      Agreed, and now reinforced at lines 892ff.

      Comments on the latest version:

      Most of the comments which I have made in my public review have been adequately addressed.

      Some of the writing still seems somewhat verbose and perhaps not yet maximally succinct; some additional line-by-line polishing might still be helpful at this stage in terms of further improving clarity and flow (for the authors to consider and decide).

      We have made further changes and some polishing in this draft, and greatly appreciate the guidance provided in improving the draft so far. 

      Reviewer #1 (Recommendations For The Authors):

      (1) While the model results are convincing, some of the verbal interpretation is confusing. In particular, the authors state that in their model the allele favoring male display quality shows a negative frequency dependence whereas the alternative allele has a positive frequency dependence. This does not make sense to me in the context of population genetics theory. For a one-locus, two-allele model the change of allele frequency under selection depends on the fitness of the genotypes concerned relative to each other. Thus, at least under no dominance assumed in this model, if the relative fitness of AA decreases with the frequency of allele A, the relative fitness of aa must decrease with the frequency of allele a. I.e., if selection is negatively frequency dependent, then it is so for both alleles.

      This phrasing was wrong, and we have edited the relevant section.
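
      The reviewer's argument in (1) can be made explicit with the standard one-locus, two-allele selection recursion under random mating. The symbols below ($p$, $q$, $w_A$, $w_a$, $\bar{w}$) are notation assumed here for illustration and are not taken from the manuscript.

```latex
\begin{aligned}
\Delta p &= \frac{p\,q\,\bigl(w_A - w_a\bigr)}{\bar{w}}, \qquad q = 1 - p, \qquad \bar{w} = p\,w_A + q\,w_a,\\
\Delta q &= -\Delta p = \frac{p\,q\,\bigl(w_a - w_A\bigr)}{\bar{w}}.
\end{aligned}
```

      Here $w_A$ and $w_a$ are the marginal fitnesses of the two alleles. If $w_A - w_a$ declines as $p$ rises, then $w_a - w_A$ rises with $p$ and therefore declines as $q = 1 - p$ rises, so negative frequency dependence for one allele implies the same for the other.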

      (2) I am still not entirely sure that the synergistic epistasis assumed in the verbal model is actually generated in the simulations; this would be easy enough to check by extracting the mating success of males with different genotypes from the simulation output, which should be reported, e.g., as a figure supplement.

      Our new Figure S2, which depicts haplotype frequencies for a set of the simulations presented in Figure 4, should demonstrate a necessary presence of synergistic epistasis. These results further clarify that the weaker allele B is only kept when linked to A. The same fitness classes of genotype are present in the simulations with and without the inversion, so the only mechanical difference is the rate of recombination, and the only way this might change selection on the alleles is if a variant has a different fitness in one haplotype background than another – i.e. epistasis. The maintenance of haplotypes AB and ab to the exclusion of Ab and aB relies on the lesser relative fitness of Ab and aB. And since survival values are multiplicative, this additional contribution must come from the mate success of AB being disproportionately larger than Ab or aB, indicating the emergent synergistic epistasis posited by our model. We have clarified this point in the text at line 363ff.

      (3) l. 318ff: What was this set number of males? I could not find this information anywhere. Also, this model of the mating system is commonly referred to as "best of N", so the authors may want to include this label in the description.

      We indicate this detail just after the referenced line, now reworded and on l. 338-340 as “For each female’s mating competition, 100 males were sampled, though see Figure S1 for plots with varying encounter number.”  Among these edits, “one hundred” has been changed to a numeral for easier skimming, and Figure S1 is now referenced here earlier in the text. Several edits have also been made in the caption of Figures 2 and 3, and in the relevant methods section to clarify the number of encountered males simulated, mention best of N terminology, and clarify how the quality score is used in the mate competition.
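
      The "best of N" competition described in this response, and the way it can turn additive quality scores into disproportionate mating success (point 2 above), can be illustrated with a toy sketch. This is not the SAIsim implementation: the function names, sampling without replacement, deterministic choice of the top-scoring male, and the four-male population with additive scores are all assumptions made for illustration.

```python
import random


def best_of_n(qualities, n, rng):
    """Return the index of the winning male for one female.

    n males are sampled without replacement (capped at the population
    size) and the one with the highest quality score sires the offspring.
    Both choices are assumptions of this sketch.
    """
    k = min(n, len(qualities))
    encountered = rng.sample(range(len(qualities)), k)
    return max(encountered, key=lambda i: qualities[i])


def mating_share(qualities, n, females, seed=0):
    """Fraction of matings won by each male across `females` competitions."""
    rng = random.Random(seed)
    wins = [0] * len(qualities)
    for _ in range(females):
        wins[best_of_n(qualities, n, rng)] += 1
    return [w / females for w in wins]


# Hypothetical additive display scores for haplotypes ab, Ab, aB, AB.
qualities = [0.0, 1.0, 1.0, 2.0]
shares = mating_share(qualities, n=3, females=20000)
```

      In this toy case with n = 3, the AB male wins whenever he is among the three sampled males, which is expected in three of four competitions, so his share exceeds the combined share of the two single carriers even though the scores themselves are additive. Larger N sharpens this skew; the simulations in the manuscript use 100 encountered males.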

      (4) The description of the experiment is still confusing. The number of individuals of each sex entered in each mating cage is missing from the Methods (l. 914); although I did finally find it in the Results. These flies were laying over 2 weeks - does this mean that offspring from the entire period were used to obtain the embryo and aged offspring frequencies, or only from a particular egg collection? If the former, does this mean that the offspring obtained from different egg batches were aged separately? Were the offspring aged in cages or bottles, at what density? Given that only those males that survived until the end of the two-week mating period were sequenced, it is important to know what % of the initial number of males these survivors were. A substantial mortality of the parental males could bias the estimate of parental frequencies. How many parental males, embryos and aged offspring were sequenced? Were all individuals of a given cage and stage extracted and sequenced as a single pool or were there multiple pools? The description could also be structured better. For example, the food and grape agar recipes and cage construction are inserted at random points of the description of the crossing design, which does not help.

      We have now reorganized and edited these portions of the Methods text. Portions of this comment overlap with edits responding to (2) of the Public Review and below for l. 921 in Details. Offspring from different laying periods were aged in different bottles, further separated by the time at which they eclosed. They were then pooled for DNA extraction and library preparation by sex and a binary early or late eclosion time. This data was present in the “D. mel. Sample Size” column of supplemental tables S6 and S7 (now S7 and S8), but we have added and referenced a new table to specifically collate the sample sizes of different experimental stages, table S1. Now referenced at lines 555, 638, 975, 1057.

      (5) The caption of figure 9 and the discussion of its results should be clear and explicit about the fact that "adult offspring" in Fig 9A and "female" and "male" refer to adults surviving to old age (whereas "parental" in Fig 9A refers to young adults in their reproductive prime). This has consequences for the interpretation of the difference between "parental" and "adult offspring", as it combines one generation of usual selection as it occurs under the conditions of the lab culture (young adult at generation t -> young adult in generation t+1) with an additional step of selection for longevity. Thus, a marked change in allele frequency does not imply that the "parental" frequency does not represent an equilibrium frequency of the inversions under the lab culture conditions. Furthermore, it would be useful to state explicitly that Figure 9B represents the same results as figure 9A, but with the aged offspring split by sex.

      Figure caption edited to provide further clarity on the age of cohorts and presented data, along with the relevant results section (2.3) referencing this figure.

      We avoid making any statements about the equilibrium frequencies of inversions under lab conditions, and whether or not any step of our experiment reflects such equilibria, because our investigation does not rely upon or test for such conditions. Instead, our analysis focuses on whether inversions have contrasting effects (as indicated by frequency changes that are incompatible with neutral sampling) between different life history components.  Under our model, such frequency reversals might be detectable both at equilibrium balanced inversion frequencies and also at frequencies some distance away from equilibria. We have now clarified this point at l. 970-972.

      Details:

      l. 211: this should be modified as male-only costs are now included.

      Edited. “survival likelihood (of either or both sexes).”

      l. 343: misplaced period

      Edited.

      l. 814: "We confirmed model predictions...": This sounds like it refers to an empirical confirmation of a theory prediction, but I think the authors just want to say that their simulations predicted antagonistic variants can be maintained at an intermediate equilibrium frequency. So the wording should be changed to avoid ambiguity.

      Edited. Now line 869.

      l. 853: How can a genome be "empty"? Do the authors mean an absence of any polymorphism?

      Edited to: “In SAIsim, a population is instantiated as a python object, and populated with individuals which are also represented by python objects. These individuals may be instantiated using genomes specified by the user, or by default carry no genomic variation.” Lines 913ff.

      l. 853: I do not see this diagramed in Figure 5

      Apologies, fixed to Fig. 2

      l. 864: is crossing-over in the model limited to female gametogenesis (reflecting the Drosophila case) or does it occur in both sexes?

      There is a variable in the simulator to make crossover female-specific. All simulations were performed with female-only crossover. Edited for clarity. “While the simulator can allow recombination in both sexes, all simulations presented only generate crossovers and gene conversion events for female gametes, in accordance with the biology of D. melanogaster.” Lines 928-929.

      l. 906: "F2" is ambiguous; does this mean that the mix of lines was allowed to breed for two generations? Also, in other places in the manuscript these flies appear to be referred to as "parental". So do not use F2.

      Edited, F2 language removed and replaced with being allowed to breed for two generations. Now lines 967ff.

      l. 910: this is incorrect/imprecise; what can be inferred is the frequency of the inversions in male gametes that contributed to fertilization. This would correspond to the frequency in successful males only if each successful male genotype had the same paternity share.

      Edited, now “Since no inversions could be inherited through the mothers, inversion frequencies among successful male gametes could be inferred from their pooled offspring.” Now line 994.

      l. 912: "without a controlled day/night cycle" meaning what? Constant light? Constant darkness? Daylight falling through the windows?

      Edited to “Unless otherwise noted, all flies were kept in a lab space of 23°C with around a degree of temperature fluctuation and without a controlled day/night cycle. Light exposure was dependent on the varying use of the space by laboratory workers but amounted to near constant exposure to at least a minimal level of lighting, with some variable light due to indirect lighting from adjacent rooms with exterior windows.” Now lines 1007-1010.

      l. 921: I cannot parse this sentence. Were the offspring isolated as virgins?

      No, the logistics of collecting virgins would have been prohibitive, and it did not seem essential for our experiment. Hopefully the edits to this section are clearer, now lines 978ff.

    2. eLife Assessment

      This study proposes a new model that could solve some long-standing puzzles about inversion polymorphisms in Drosophila melanogaster by invoking sexual antagonism and negative frequency-dependent selection. While the idea developed here is a valuable contribution to the field, the experiment only addresses one element of the hypothesis, so that the empirical evidence in support of the model remains incomplete.

    3. Reviewer #1 (Public review):

      The hypothesis is based on the idea that inversions capture genetic variants that have antagonistic effects on male sexual success (via some display traits) and survival of females (or both sexes) until reproduction. Furthermore, a sufficiently skewed distribution of male sexual success will tend to generate synergistic epistasis for male fitness even if the individual loci contribute to sexually selected traits in an additive way. This should favor inversions that keep these male-beneficial alleles at different loci together at a cis-LD. A series of simulations are presented and show that the scenario works at least under some conditions. While a polymorphism at a single locus with large antagonistic effects can be maintained for a certain range of parameters, a second such variant with somewhat smaller effects tends to be lost unless closely linked. It becomes much more likely for genomically distant variants that add to the antagonism to spread if they get trapped in an inversion; the model predicts this should drive accumulation of sexually antagonistic variants on the inversion versus standard haplotype, leading to the evolution of haplotypes with very strong cumulative antagonistic pleiotropic effects. This idea has some analogies with one of the predominant hypotheses for the evolution of sex chromosomes, and the authors discuss these similarities. The model is quite specific, but the basic idea is intuitive and thus should be robust to the details of model assumptions. It makes perfect sense in the context of the geographic pattern of inversion frequencies. One prediction of the models (notably, that it leads to the evolution of nearly homozygously lethal haplotypes) does not seem to reflect the reality of chromosomal inversions in Drosophila, as the authors carefully discuss, but it is the case for some other "supergenes", notably in ants. So the theoretical part is a strong novel contribution.

      To provide empirical support for this idea, the authors study the dynamics of inversions in population cages over one generation, tracking their frequencies through amplicon sequencing at three time points: (young adults), embryos and very old adult offspring of either sex (>2 months from adult emergence). Out of four inversions included in the experiment, two show patterns consistent with antagonistic effects on male sexual success (competitive paternity) and the survival of offspring, especially females, until an old age, which the authors interpret as consistent with their theory.

      As I have argued in my comments on previous versions, the experiment only addresses one of the elements of the theoretical hypothesis, namely antagonistic effects of inversions on male reproductive success and other fitness components, in particular of females. Furthermore, the design of this experiment is not ideal from the viewpoint of the biological hypothesis it is aiming to test. This is in part because, rather than testing for the effects of inversion on male reproductive success versus the key fitness components of survival to maturity and female reproductive output, it looks at the effects on male reproductive success versus survival to a rather old age of 2 months. The relevance of survival until old age to fitness under natural conditions is unclear, as the authors now acknowledge. Furthermore, up to 15% of males that may have contributed to the next generation did not survive until genotyping, and thus the difference between these males' inversion frequency and that in their offspring may be confounded by this potential survival-based sampling bias. The experiment does not test for two other key elements of the proposed theory: the assumption of frequency-dependence of selection on male sexual success, and the prediction of synergistic epistasis for male fitness among genetic variants in the inversion. To be fair, particularly testing for synergistic epistasis would be exceedingly difficult, and the authors have now included a discussion of the above caveats and limitations, making their conclusions more tentative. This is good but of course does not make these limitations of the experiment go away. These limitations mean that the paper is stronger as a theoretical than as an empirical contribution.

    4. Reviewer #2 (Public review):

      Summary:

      In their manuscript the authors address the question whether the inversion polymorphism in D. melanogaster can be explained by sexually antagonistic selection. They designed a new simulation tool to perform computer simulations, which confirmed their hypothesis. They also show a tradeoff between male reproduction and survival. Furthermore, some inversions display sex-specific survival.

      Strengths:

      It is an interesting idea on how chromosomal inversions may be maintained.

      Weaknesses:

      The authors motivate their study by the observation that inversions are maintained in D. melanogaster and because inversions are more frequent closer to the equator, the authors conclude that it is unlikely that the inversion contributes to adaptation in more stressful environments. Rather the inversion seems to be more common in habitats that are closer to the native environment of ancestral Drosophila populations. While I do agree with the authors that this observation is interesting, I do not think that it rules out a role in local adaptation. After all, the inversion is common in Africa, so it is perfectly conceivable that the non-inverted chromosome may have acquired a mutation contributing to the novel environment.

      Based on their hypothesis, the authors propose an alternative strategy, which could maintain the inversion in a population. They perform some computer simulations, which are in line with the predicted behavior. Finally, the authors perform experiments and interpret the results as empirical evidence for their hypothesis. While the reviewer is not fully convinced about the empirical support, the key problem is that the proposed model does not explain the patterns of clinal variation observed for inversions in D. melanogaster. According to the proposed model, the inversions should have a similar frequency along latitudinal clines. So in essence, the authors develop a complicated theory because they felt that the current models do not explain the patterns of clinal variation, but this model also fails to explain the pattern of clinal variation.

    5. Reviewer #3 (Public review):

      Summary:

      In this study, McAllester and Pool develop a new model to explain the maintenance of balanced inversion polymorphism, based on (sexually) antagonistic alleles and a trade-off between male reproduction and survival (in females or both sexes). Simulations of this model support the plausibility of this mechanism. In addition, the authors use experiments on four naturally occurring inversion polymorphisms in D. melanogaster and find tentative evidence for one aspect of their theoretical model, namely the existence of the above-mentioned trade-off in two out of the four inversions.

      Strengths:

      (1) The study develops and analyzes a new (Drosophila melanogaster-inspired) model for the maintenance of balanced inversion polymorphism, combining elements of (sexually) antagonistically (pleiotropic) alleles, negative frequency-dependent selection and synergistic epistasis. Simulations of the model suggest that the hypothesized mechanism might be plausible.

      (2) The above-mentioned model assumes, as a specific example, a trade-off between male reproductive display and survival; in the second part of their study, the authors perform laboratory experiments on four common D. melanogaster inversions to study whether these polymorphisms may be subject to such a trade-off. The authors observe that two of the four inversions show suggestive evidence that is consistent with a trade-off between male reproduction and survival.

      Open issues:

      (1) A gap in the current modeling is that, while a diploid situation is being studied, the model does not investigate the effects of varying degrees of dominance. It would thus be important and interesting, as the authors mention, to fill this gap in future work.

      (2) It will also be important to further explore and corroborate the potential importance and generality of trade-offs between different fitness components in maintaining inversion polymorphisms in future work.

    1. eLife Assessment

      This valuable work provides novel insights into the substrate binding mechanism of a tripartite ATP-independent periplasmic (TRAP) transporter, which may be helpful for the development of specific inhibitors. The structural analysis is convincing, but additional work will be required to establish the transport mechanism as well as binding sites for all ligands. This study will be of interest to the membrane transport and bacterial biochemistry communities.

    2. Reviewer #3 (Public review):

      The manuscript by Goyal et al. reports substrate-bound and substrate-free structures of a tripartite ATP-independent periplasmic (TRAP) transporter from a previously uncharacterized homolog, F. nucleatum. This is one of the most mechanistically fascinating transporter families, by means of its QM domain (the domain reported in this manuscript) operating as a monomeric 'elevator', and its P domain functioning as a substrate-binding 'operator' that is required to deliver the substrate to the QM domain; together, this is termed an 'elevator with an operator' mechanism. Remarkably, previous structures had not demonstrated the substrate Neu5Ac bound. In addition, they confirm the previously reported Na+ binding sites, and report a new metal binding site in the transporter, which seems to be mechanistically relevant. Finally, they mutate the substrate binding site and use proteoliposomal uptake assays to show the mechanistic relevance of the proposed substrate binding residues.

      Strengths:

      The structures are of good quality, the presentation of the structural data has improved, the functional data is robust, the text is well-written, and the authors are appropriately careful with their interpretations. Determination of a substrate bound structure is an important achievement and fills an important gap in the 'elevator with an operator' mechanism.

      Weaknesses:

      Although the possibility of the third metal site is compelling, I do not feel it is appropriate to model in a publicly deposited PDB structure without directly confirming experimentally. The authors do not extensively test the binding sites due to technical limitations of producing relevant mutants; however, their model is consistent with genetic assays of previously characterized orthologs, which will be of benefit to the field.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review): 

      Summary: 

      This manuscript reports the substrate-bound structure of SiaQM from F. nucleatum, which is the membrane component of a Neu5Ac-specific Tripartite ATP-independent Periplasmic (TRAP) transporter. Until recently, there was no experimentally derived structural information regarding the membrane components of TRAP transporters, limiting our understanding of the transport mechanism. Since 2022, there have been 3 different studies reporting the structures of the membrane components of Neu5Ac-specific TRAP transporters. While it was possible to narrow down the binding site location by comparing the structures to proteins of the same fold, a structure with substrate bound has been missing. In this work, the authors report the Na+-bound state and the Na+ plus Neu5Ac state of FnSiaQM, revealing information regarding substrate coordination. In previous studies, 2 Na+ ion sites were identified. Here, the authors also tentatively assign a 3rd Na+ site. The authors reconstitute the transporter to assess the effects of mutating the binding site residues they identified in their structures. Of the 2 positions tested, only one of them appears to be critical to substrate binding.

      Strengths: 

      The main strength of this work is the capture of the substrate bound state of SiaQM, which provides insight into an important part of the transport cycle.

      Weaknesses: 

      The main weakness is the lack of experimental validation of the structural findings. The authors identified the Neu5Ac binding site, but only test 2 residues for their involvement in substrate interactions, which is quite limited. However, comparison with previous mutagenesis studies on homologues supports the location of the Neu5Ac binding site. The authors tentatively identified a 3rd Na+ binding site, which if true would be an impactful finding, but this site was not sufficiently experimentally tested for its contribution to Na+ dependent transport. This lack of experimental validation prevents the authors from unequivocally assigning this site as a Na+ binding site. However, the reporting of these new data is important as it will facilitate follow up studies by the authors or other researchers. 

      Comments on revisions: 

      Overall, the authors have done a good job of addressing the reviewers' comments. It's good to know that the authors are working on the characterisation of the potential metal binding site mutants - characterizing just a few of these will provide much-needed experimental support for this potential Na+ site. 

      The new MD simulations provide additional support for the new Na+ site and could be included.

      However, as the authors know, direct experimental characterisation of mutants is the ideal evidence of the Na+ site.

      Aside from the characterisation of mutants, which seems to be held up by technical issues, the only remaining issue is the comparison of the Na+- and Na+/Neu5Ac-bound states with ASCT2. It still does not make sense to me why the authors are not directly comparing their Na+ only and Na+/Neu5Ac states with the structures of VcINDY in the Na+-only and Na+/succinate bound states. These VcINDY structures also revealed no conformational changes in the HP loops upon binding succinate, as the authors see for SiaQM. Therefore, this comparison is very supportive. It is understood that the similarity to the DASS structure is mentioned on p.17, but it is also interesting and useful to note that TRAP and DASS transporters also share a lack of substrate-induced local conformational changes, to the extent these things have been measured.

      We acknowledge the summary weakness that experimental data to support the third Na binding site is critical.

      Based on the reviewer’s suggestion, we added the following in the main text and a supplementary figure comparing the Na ion binding sites between VcINDY and SiaQM. Page 13.

      “These two sodium ion binding sites are also conserved in the structure of VcINDY (Supplementary Figure 7) (Sauer et al., 2022). In both cases, the sodium ions are bound at the helix-loop-helix ends of HP1 and HP2. The binding sites utilize both side chains and main chain carbonyl groups. The number of main chain carbonyl interactions suggests that they are critical, and using main chain rather than side chain interactions minimizes the likelihood of point mutations affecting the binding.”

      Reviewer #3 (Public review): 

      The manuscript by Goyal et al. reports substrate-bound and substrate-free structures of a tripartite ATP-independent periplasmic (TRAP) transporter from a previously uncharacterized homolog, F. nucleatum. This is one of the most mechanistically fascinating transporter families, by means of its QM domain (the domain reported in this manuscript) operating as a monomeric 'elevator', and its P domain functioning as a substrate-binding 'operator' that is required to deliver the substrate to the QM domain; together, this is termed an 'elevator with an operator' mechanism.

      Remarkably, previous structures had not demonstrated the substrate Neu5Ac bound. In addition, they confirm the previously reported Na+ binding sites, and report a new metal binding site in the transporter, which seems to be mechanistically relevant. Finally, they mutate the substrate binding site and use proteoliposomal uptake assays to show the mechanistic relevance of the proposed substrate binding residues.

      Strengths: 

      The structures are of good quality, the presentation of the structural data has improved, the functional data is robust, the text is well-written, and the authors are appropriately careful with their interpretations. Determination of a substrate bound structure is an important achievement and fills an important gap in the 'elevator with an operator' mechanism.

      Weaknesses: 

      Although the possibility of the third metal site is compelling, I do not feel it is appropriate to model in a publicly deposited PDB structure without directly confirming experimentally. The authors do not extensively test the binding sites due to technical limitations of producing relevant mutants; however, their model is consistent with genetic assays of previously characterized orthologs, which will be of benefit to the field. Finally, some clarifications of EM processing would be useful to readers, and it would be nice to have a figure visualizing the unmodeled lipid densities - this would be important to contextualize to their proposed mechanism.

      Reviewer #3 (Recommendations for the authors): 

      I appreciate the authors' responses to our critiques; the revised manuscript is much improved and has addressed most of my concerns. I look forward to seeing their follow-up experiments testing mutational effects. I think MD simulations of ion-binding sites on their own are supportive but by themselves not sufficient to prove the existence of a functional Na+-binding site. Some clarifications in the methods/supplements would satisfy my concerns about data processing and analysis.

      - Unliganded map: were the 141,272 particles used for one class of ab initio? This is unusual, usually multiple ab initio classes are used to further eliminate junk particles. The authors themselves use 6 classes for the substrate-bound dataset.

      We classified the particles into multiple 3-D classes.  There was no improvement in statistics or maps on splitting these further.  Hence, we did not pursue that further. 

      - Substrate-bound map: how did the four 'identical' classes independently refine? Are similar Na+/substrate densities found in each separate class?

      The other classes refined to worse than 4.5 Å resolution. We stopped characterizing them past that point. We were hoping to see multiple conformations that are different, and hopefully a class where only two sodium ions could be bound. However, any interpretation at 4.5 Å would be unreliable.

      - Both maps: all ab initio classes prior to final refinement should be displayed in the supplementary workflow, this is common for EM processing diagrams.

      We agree it is common – however, unless there is a good reason to discuss the other classes, we are not convinced of the value of crowding the figures.

      - What specific refinement package and version of Phenix are the authors using? It seems unusual that it is not possible to refine without a metal in Phenix real-space refinement, I have seen many structures where there is no issue refining without critical ions/waters. The authors should double check that they are using the appropriate scattering table for cryo-EM, which should be "electron".

      Sorry for the confusion; we did not mean to say we cannot refine without a metal. If we want to add something to the density, we cannot refine it without suggesting a metal or solvent. The site without anything added will refine without any issues, but in the absence of additional verification, we cannot be sure of the identity of the ions. We are confident of the metal binding site, but not confident of the exact metal bound. We used sodium as our first hypothesis.

      We don’t think the scattering factors will help in the identification of the ions. Servalcat, as part of CCP-EM, can produce difference maps, and we believe that identification of ions will require higher resolution (<2.5 Å); at this resolution, we can say that there is a non-protein density but not more than that. We were using “electron” (which we believe is default with phenix.real_space_refine). The refinement was performed using standard protocols and appropriate scattering factors (Phenix version 1.19x), and we have previously used similar refinement protocols for other maps/models (for example, Vinothkumar KR, Arya CK, Ramanathan G, Subramanian R. 2021. Comparison of CryoEM and X-ray structures of dimethylformamidase. Progress in Biophysics and Molecular Biology, CryoEM microscopy developments and their biological applications 160:66–78. doi:10.1016/j.pbiomolbio.2020.06.008).

      To convince the reviewer of the quality of the maps, we have added figures that show the model-to-map fit of all of the main secondary structural elements in both the unliganded and the Neu5Ac bound forms.

      - I certainly understand the authors' reluctance to model the entirety of protein densities; however, I think it would be useful to highlight these densities in the global context of the protein. A common way to show this is to show the density proximal to protein chains in one color, and the remaining densities in a contrasting color (Figure 1 somewhat demonstrates this but it is difficult to tell). I think this would be a nice figure to show the presence and location of unmodeled densities.

      We have modified supplementary figure 3 to include unmodelled densities in panels G and H for both structures.

      - Small detail, "uniform" is misspelled as "unifrom" in supplementary Figure 3. 

      Thank you.  Corrected.

    1. eLife assessment

      The article has important scientific merit in the field of cardiovascular research and other fields where the design and rigor of scientific experiments are key for translation of preclinical research to clinical studies. This study holds convincing evidence that sheds light on the lack of progress in this area over the past decade, despite a substantial body of existing research. Although there is a need to re-evaluate the statistical test used, the paper's descriptive outcomes serve as a strong call to action for the wider scientific community.

    1. eLife Assessment

      This valuable work presents an interesting strategy to interfere with the HBV infectious cycle as it identifies two previously unexplored HBc-Ag binding pockets. The experimental data is compelling and opens the door to generating and testing novel anti-HBV therapies.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, the authors present an interesting strategy to interfere with the HBV life cycle: the preparation of geranyl and peptides' dimers that could impede the correct assembly of hepatitis B core protein HBc into viable capsids. These dimers are of different nature, depending on the HBc site the authors plan to target. A preliminary study with geranyl dimers (targeting a hydrophobic site of HBc) was first investigated. The second series deals with peptide-PEG linker-peptide dimers, targeting the tips of HBc dimer spikes.

      Strengths:

      This work is very well conducted, combining ITC experiments (for determination of dimers' KD), cellular effects (thanks to the grafting of previously developed dimers with polyarginine-based cell penetrating peptide) in HBV infected HEK293 cells and Cryo-EM studies.

      The findings of these research teams unambiguously demonstrated the interest of such dimeric structures in impeding the correct HBV life cycle and thus, could bring solutions in the control of its development. Ultimately, a new class of HBV Capsid Assembly Modulators could arise from this study.

      There is no doubt that this work could bring very interesting information for people working on HBV.

      Comments on revisions:

      Minor corrections have been made in this revised version of this work, according to the remarks of the reviewers.

    3. Reviewer #2 (Public review):

      Summary:

      Vladimir Khayenko et al. discovered two novel binding pockets on HBc with in vitro binding and electron microscopy experiments. While the geranyl dimer targeting a central hydrophobic pocket displayed a micromolar affinity, the P1-dimer binding to the spike tip of HBc has a nanomolar affinity. In the turbidity assay and at the cellular level, an HBc aggregation from peptide crosslinking was demonstrated.

      Strengths:

      The study identifies two previously unexplored binding pockets on HBc capsids and develops novel binders targeting these sites with promising affinities.

      Weaknesses:

      While the in vitro and cellular HBc aggregation effects are demonstrated, the antiviral potential against HBV infection is not directly evaluated in this study.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We appreciate the positive assessment and agree that the experimental data offer valuable insights into HBV capsid assembly inhibition. Based on the reviewers' suggestions, we have clarified the cryo-EM data and added structural and mechanistic details throughout the manuscript, which we believe significantly enhance its overall clarity and impact. The manuscript now better reflects a promising strategy to interfere with the HBV life cycle. We have carefully addressed all comments to improve both the clarity and quality of the manuscript.

      Response to Public Reviews

      We greatly appreciate the insightful comments and suggestions from the reviewers. Below, we provide responses to the points raised in the public reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this paper, the authors present an interesting strategy to interfere with the HBV life cycle: the preparation of geranyl and peptides' dimers that could impede the correct assembly of hepatitis B core protein HBc into viable capsids. These dimers are of different nature, depending on the HBc site the authors plan to target. A preliminary study with geranyl dimers (targeting a hydrophobic site of HBc) was first investigated. The second series deals with peptide-PEG linker-peptide dimers, targeting the tips of HBc dimer spikes.

      Strengths:

      This work is very well conducted, combining ITC experiments (for determination of dimers' KD), cellular effects (thanks to the grafting of previously developed dimers with polyarginine-based cell penetrating peptide) in HBV infected HEK293 cells and Cryo-EM studies.

      The findings of these research teams unambiguously demonstrated the interest of such dimeric structures in impeding the correct HBV life cycle and thus, could bring solutions in the control of its development. Ultimately, a new class of HBV Capsid Assembly Modulators could arise from this study.

      There is no doubt that this work could bring very interesting information for people working on HBV.

      Weaknesses:

      Some minor corrections must be made, especially for a more precise description of the strategy and the chemical structure of the designed new HBV capsid assembly modulators.

      We are grateful for the positive feedback on the experimental design, the combination of ITC, cellular effects, and Cryo-EM studies, and the potential for developing new classes of HBV Capsid Assembly Modulators (CAMs). In the revised version we have clarified the design rationale for the choice of the PEG linker length in the Supplementary Information, linking it to the structural measurements of the capsid. Chemical structures and detailed molecular formulas were added and terms have been corrected. A scrambled dimeric peptide served as a negative control, which showed no binding, confirming the specificity of our designed peptide and ruling out non-specific interactions from other elements of the molecules such as the linkers. Finally, we have revised the nomenclature for the geranyl dimers to better reflect the chemical structure. All figures, including Figure 3, have been updated to high-resolution. All mentioned typos have been corrected. Consultation dates have been added to the website references. HPLC terminology was corrected.

      Reviewer #2 (Public Review):

      Summary:

      Vladimir Khayenko et al. discovered two novel binding pockets on HBc with in vitro binding and electron microscopy experiments. While the geranyl dimer targeting a central hydrophobic pocket displayed a micromolar affinity, the P1-dimer binding to the spike tip of HBc has a nanomolar affinity. In the turbidity assay and at the cellular level, an HBc aggregation from peptide crosslinking was demonstrated.

      Strengths:

      The study identifies two previously unexplored binding pockets on HBc capsids and develops novel binders targeting these sites with promising affinities.

      Weaknesses:

      While the in vitro and cellular HBc aggregation effects are demonstrated, the antiviral potential against HBV infection is not directly evaluated in this study.

      Thank you for recognizing the innovative approach of our work and the potential for developing novel antivirals targeting HBc. We have now included additional discussion on potential future experiments aimed at evaluating the compounds' effects on cellular physiology and viral infectivity.

      Reviewer #3 (public Review):

      Summary:

      HBV is a continuing public health problem and new therapeutics would be of great value. Khayenko et al examine two sites in the HBc dimer as possible targets for new therapeutics. Older drugs that target HBc bind at a pocket between two HBc dimers. In this study Khayenko et al examine sites located in the four-helix bundle at the dimer interface.

      The first site is a pocket first identified as a Triton X-100 binding site. The authors suggest it might bind terpenes and use geraniol as an example. They also test a decyl maltose detergent and a geraniol dimer intended for bivalent binding. The KDs were all in the 100µM range. Cryo-EM shows that geraniol binds the targeted site.

      The second site is at the tip of the spike. Peptides based on a 1995 study (reference 43) were investigated. The authors test a core peptide, two longer peptides, and a dimer of the longest peptide. A deep scan of the longest monomer sequence shows the importance of a core amino acid sequence. The dimeric peptide (P1-dimer) binds almost 100-fold better than the monomer parent (P1). Cryo-EM structures confirm the binding site. The dimeric peptide caused HBc capsid aggregation. When HBc-expressing cells were treated with active peptide attached to a cell penetrating peptide, the peptide caused aggregation of HBc antigen, mirroring experiments with purified proteins.

      Strengths:

      The two sites have not been well investigated. This paper marks a start. The small collection of substrates investigated led to discovery of a dimeric peptide that leads to capsid aggregation, presumably by non-covalent crosslinking. The structures determined could be very useful for future investigations.

      Weaknesses:

      In this draft, the rationale for targets for the Triton X-100 site is not well laid out. The target molecules bind with KDs weaker than 50µM. The way the structural results are displayed, one cannot be sure of the important features of the binding site with respect to the substrate. The peptide site and substrates are better developed, but structural and mechanistic details need to be described in greater detail.

      We appreciate the reviewer’s positive comments on identifying and targeting previously unexplored sites on HBc, and the potential utility of our dimeric peptides in future studies. We have revised the Results section to better explain the rationale behind targeting the hydrophobic binding site. Additionally, the structures have been revised for clearer presentation, and we now emphasize the key features of the binding site and the role of substrate specificity.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      For clarity, the chemical structure of SLLGRM peptide, geraniol and HAP molecules must be indicated, preferably in Fig. 1 (at least in the Supplementary Information section).

      We have now included the chemical structures of the SLLGRM peptide, geraniol, and HAP molecules for clarity in Figure 1 and in the main manuscript to ensure they are easily accessible for reference and to provide further detail and context.

      In the same idea, in Fig. 1 (and in the text): The molecular formula of heteroaryldihydropyrimidine HAP must be clearly indicated, as the nature of the heteroatom (S, O, N?) in this "heteroaryl" derivative is not indicated.

      The full molecular formula of HAP ((2S)-1-[[(4R)-4-(2-chloranyl-4-fluoranyl-phenyl)-5-methoxycarbonyl-2-(1,3-thiazol-2-yl)-1,4-dihydropyrimidin-6-yl]methyl]-4,4-bis(fluoranyl)-pyrrolidine-2-carboxylic acid) is now included in the figure legend.

      "with a polyethylene glycol (PEG) linker that could bridge the distance of 38 Å between the two opposing hydrophobic pockets": what is the rationale of the design of this linker? Authors must explain briefly why/how they have chosen this linker length and nature (please indicate a reference for the appropriate choice of PEG linker). Same remarks for dimers targeting the capsid spike tips, having 50 angstroms PEG linkers. So, the choice of the linker length must be clearly explained and not be only mentioned in the sentence of the discussion part "Using our structural knowledge of the capsid, particularly the distances between the spikes".

      We have now better clarified the rationale for the design of the PEG linker length. The linker lengths were specifically chosen based on structural knowledge of the capsid, particularly the measured distances between the spike tips (60 Å) and the hydrophobic pockets (40 Å). In the Supplementary Information (Supplementary Figure 1), we now clearly explain how these measurements guided the choice of PEG linker length, allowing for optimal bridging and interaction between the binding sites. This supplementary figure now explicitly connects the design rationale to the specific structural features of the capsid.

      I do not agree with the authors when they claim a "nanomolar affinity of 312 nM". To me, a nanomolar affinity would mean a few to several tens of nM (but not three hundred) ... So, please correct to "sub-micromolar affinity of 312 nM" here and in all the other parts of the manuscript (title and caption of Figure 3..., "the peptide dimer (P1dC) with nanomolar affinity" "nanomolar levels"...).

      We thank Rev#1 for pointing this out. Since the term "nanomolar affinity" can indeed be interpreted as referring to the lower end of the nanomolar range, rather than values close to 300 nM, we have revised the manuscript to refer to "sub-micromolar affinity" where applicable. This change has been made throughout the manuscript, including the subtitles, figure captions, and text.
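      The practical difference between these affinity labels can be put on an energy scale. The following is a minimal sketch, not taken from the manuscript, that converts a dissociation constant into a standard binding free energy via ΔG° = RT ln(KD / 1 M). The temperature of 298.15 K, the 1 M standard state, and the function names are assumptions made for illustration; the 312 nM value and the roughly 100-fold gain of the dimer over the monomer are the figures quoted in the reviews above.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)


def binding_free_energy_kj(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy in kJ/mol for a dissociation constant in mol/L.

    Assumes a 1 M standard state; temp_k defaults to 25 degrees C (an assumption,
    the measurement temperature is not stated in the text).
    """
    if kd_molar <= 0:
        raise ValueError("KD must be positive")
    return R * temp_k * math.log(kd_molar) / 1000.0


def fold_gain_ddg_kj(fold: float, temp_k: float = 298.15) -> float:
    """Change in binding free energy (kJ/mol) for a fold-improvement in KD."""
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return -R * temp_k * math.log(fold) / 1000.0


if __name__ == "__main__":
    # 312 nM is the dimer affinity quoted by Reviewer #1.
    print(f"dG(312 nM)      = {binding_free_energy_kj(312e-9):.1f} kJ/mol")
    # ~100-fold is the dimer-over-monomer gain quoted by Reviewer #3.
    print(f"ddG(100-fold)   = {fold_gain_ddg_kj(100):.1f} kJ/mol")
```

      Under these assumptions, 312 nM corresponds to roughly -37 kJ/mol, and a 100-fold tighter KD to a gain of roughly 11 kJ/mol, which is the scale on which the "nanomolar" versus "sub-micromolar" wording is being argued.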

      The drug design strategy was to combine two peptides showing low affinity, attached by a PEG linker of appropriate length, and it appears obvious to me. But a control experiment is nevertheless missing: the peptide-PEG linker derivative (not the dimer peptide-PEG linker-peptide...) should have been evaluated for an unambiguous proof of concept of these dimeric peptides. In my opinion, for the publication of this work, these experiments should be provided (e.g., when describing the affinities of SLLGR dimers). I agree that Cryo-EM experiments provide evidence of the dimer binding, but the affinity values for (peptide-PEG linker) derivatives would bring an additional proof (as the flexible PEG linkers were not resolved by Cryo-EM).

      Thank you for your thoughtful comment regarding the use of a monovalent control for the peptide-PEG linker. A scrambled dimeric peptide serves as a negative control. In ITC it showed no binding at all, thereby ruling out possible nonspecific interactions mediated by the introduced PEG linker or the handle itself.

      Given the complete lack of binding with the scrambled dimeric peptide, we believe this thoroughly excludes the need for an additional monovalent control, as it provides strong evidence that the observed binding is driven specifically by the designed peptide sequence and not by the linker or other structural components. We have now made this clarification more explicit in the revised manuscript to avoid any ambiguity. We hope this addresses your concern, and we appreciate your suggestion to further strengthen the rigor of the work. Despite its identical charge, molecular weight, and atom composition, the scrambled control did not cause HBc aggregation in living cells, thus indicating sequence-specific action of the aggregating dimer.

      The nomenclature of the dimers must be modified because there is no logic between the name "long dimer" and the chemical structure. Particularly, the number of ethylene glycol motifs must be indicated: authors have to find an appropriate nomenclature indicating both the linker length and nature (small molecule or peptide) of the bivalent parts (and hence, do not mention anymore "short geranyl dimer" "long geranyl dimer").

      Thank you for your valuable suggestion regarding the nomenclature of the dimers. We agree that the terms "short geranyl dimer" and "long geranyl dimer" do not fully reflect the chemical structure of the molecules. In response, we have revised the nomenclature to provide a clearer indication of both the linker length and the nature of the bivalent parts. We now refer to the dimers as (Geranyl)<sub>2</sub>-Lys for the dimer with two geranyl groups attached to lysine and (Geranyl-PEG3)<sub>2</sub>-Lys for the dimer with a PEG3 linker (three ethylene glycol units) between the lysine amine and the geranyl groups. These revised names more accurately describe the structural differences and should avoid any ambiguity.

      Lines 198-199: "Among these, the dimerized P1 exhibited a higher occupation of the binding site, as illustrated in Supplementary Figure 9." But in Supp. Fig. 9, dimer P1dC (10) is described. As the text above is describing P1-dimer (9), the Supp. Fig. 9 must be provided, if available. If not, please modify this conclusion accordingly. In the text, when mentioning dimerized P1 peptide, authors must indicate with which compound it deals: (9) or (10)?

      Thank you for your careful reading of the manuscript and for pointing out the discrepancy. In Supplementary Figure 9, the dimer described is P1dC, not P1d. The text has been revised to clarify this. We appreciate your attention to detail.

      Please note that the graphic quality of Figure 3 is bad as it results in pixelized drawings (especially for the chemical structures).

      Thank you for your feedback regarding the quality of Figure 3. We have now updated all figures, including Figure 3, to high-resolution PNG format with 300-500 dpi to ensure optimal graphic quality. This should resolve the pixelization issue, particularly for the chemical structures.

      Minor typos: "clinical studies, a third are CAMs.[6]" "to the spike base hydrophobic pocket" "geraniol affinity to the central hydrophobic pocket, we designed"

      We have corrected the punctuation in the mentioned sentences and appreciate your careful review of the manuscript.

      Concerning the citation of a website (references 5 and 6), I guess that the consultation date should be mentioned.

      We have now updated the references accordingly, including the consultation dates.

      In the Materials and Methods part, Peptide synthesis paragraph, authors must write "semi-preparative HPLC".

      It’s now corrected to "semi-preparative HPLC".

      In the supplementary information file, 1H and 13C NMR spectrum for the small molecule "Short Geranyl Dimer (SGD)" should be provided.

      The purity and identity of this geranyl derivative were confirmed through UV detection in LC-MS and supported by the mass spectra, which provide robust and clear evidence of the compound's structure and are a well-accepted method for confirming structure in this context. While we understand the value of NMR in structural analysis, we believe that additional analytical evidence is not critical for this study.

      Reviewer #2 (Recommendations For The Authors):

      Overall, this study presents an innovative approach to target the HBV core protein and paves the way for developing new classes of antivirals with a distinct mechanism of action. The findings expand the current knowledge of druggable sites on HBc capsids and provide promising lead compounds. Future studies exploring the antiviral effects and optimizing the binders for therapeutic applications would be valuable next steps.

      We sincerely thank the reviewer for the positive assessment of our work and for highlighting its innovative approach to targeting the HBV core protein. We appreciate your recognition of the study's potential in paving the way for developing new classes of antivirals with distinct mechanisms of action. Below, we provide responses to each of the points raised.

      The significance of the central hydrophobic pocket as a target may require additional experiments for validation. Currently, the substrate binding activity is relatively low and appears to have a non-significant impact on HBc.

      We agree that the central hydrophobic pocket exhibits relatively weak binding affinity with the ligands tested in this study. However, we have provided additional structural evidence and affinity data to support its relevance as a druggable site. In recognition of the weak affinity of these small molecules, we expanded our focus to include peptide-based binders, which yielded higher affinities, particularly when dimerized.

      It might be more effective to present Figure 1B after summarizing all the results.

      We understand the reviewer’s suggestion. However, we decided to highlight and summarize the major findings early in the manuscript. We included Figure 1B at the beginning to allow readers to quickly grasp the core concepts and outcomes of our study.

      The labels for P1/P2 are presented in Figure 1A, yet their definitions are not provided until the second part of the Results section.

      We appreciate the reviewer’s observation. While we see a benefit in showing the three trackable sites on HBV early as an overview, we also agree that the early presentation of P1/P2 could lead to some confusion. To resolve this, we have revised the figure to introduce only the minimal peptide to avoid any ambiguity. The full dimer sequences and names are introduced later.

      Further investigation of the cytotoxic potential of peptide-induced HBc aggregation is necessary.

      Investigating the cytotoxicity together with infectivity is an important future direction but outside the scope of this study. We now elaborate on this point in the discussion.

      Reviewer #3 (Recommendations For The Authors):

      Two sites in the dimer interface are shown to bind ligands. It is not shown that filling these regions will change infection. The exhaustive studies by Bruss showed point mutations directly alter infection and would be of value to discuss.

      We thank Rev#3 for this very helpful comment. We now highlight how point mutations in these regions were shown to affect HBV infectivity, thereby providing a link between our findings and how ligand binding might influence the viral life cycle.

      It is not shown whether the two sites interact. Molecular dynamics by Hadden or Gumbart may be informative. The failure to look for a connection between these sites is an oversight.

      We thank Rev#3 for the insightful suggestion to explore potential interactions between the two binding sites. We acknowledge that molecular dynamics (MD) simulations, such as those performed by Gumbart et al. and Hadden et al., could indeed provide valuable insights into the structural dynamics and potential cooperativity between these sites. Indeed, molecular dynamics of the HBV capsid by Perilla and Hadden has demonstrated significant flexibility in the capsid spikes and their interactions with neighboring subunits suggesting that the dynamics of binding sites could influence ligand accessibility and potential crosstalk.

      We believe that our own previous structural studies together with data in this work provide substantial experimental evidence on this topic. In Makbul et al. 2021a (doi.org/10.3390/microorganisms9050956) we observed that peptide binding (particularly P2) did not stabilize the spikes; instead, the upper part of the spikes exhibited considerable wobbling. This variability mirrored the conformational diversity reported in MD simulations. Using local classification, we noted that the variability in the spike's upper region was greater when P2 was bound than in its absence. Additionally, in Makbul et al. 2021b (doi.org/10.3390/v13112115), we showed that peptide binding had little effect on the hydrophobic pocket beneath the mobile spike region, located in the more rigid part of the capsid. While we observed F97 in the D-monomer adopting two alternate rotamer orientations upon P2 binding this was not exclusive to P2, as similar changes were noted in the L60V mutant even without bound peptide.

      We have updated the manuscript to briefly discuss this crosstalk, which provides additional context to our findings. Interestingly, only TX100, but not geraniol, completely flipped F97 into an alternate orientation, forming a new π-π stacking interaction with the mobile region of the spike. This finding suggests that interactions within the hydrophobic pocket are transmitted to the tips of the spikes in a ligand-specific manner, supporting and refining the concept of crosstalk between binding sites that is primarily initiated from the hydrophobic pocket.

      The logic for proposing a terpene ligand is strained. Comparisons are made to HBs and the HDV delta antigen. However, HBs is myristoylated not farnesylated and delta antigen binds HBs not HBc.

      We have revised the text to clarify the rationale for testing terpenes as ligands, focusing instead on the specific properties of the hydrophobic pocket targeted by geraniol.

      The authors suggest larger terpenes as binding agents, but there does not appear to be room for a longer molecule in the binding site. The authors do not discuss whether a longer molecule could be modeled in the site based on their density.

      We appreciate this observation and agree that the potential for larger terpenes to bind this site is not obvious from the structural data presented in this work. We have now included a more detailed visualization (Fig 2D) and discussion of the hydrophobic binding pocket, based on the density observed in the presented geraniol structure and the previous Triton structure, and we discuss its implications for the binding of larger hydrophobic molecules in the site.

      The authors note that the structure could explain molecular details of this site, but these are not discussed. A more complete analysis of the geraniol protein is necessary, including an estimate of the resolution of that density.

      We agree that a more complete analysis of the hydrophobic binding site was warranted. We have now expanded the discussion of the structural details of this binding site based on the geraniol-bound structure and the density and occupancy accounted for by this ligand. These additional details (Fig 2C,D and Fig 5) should provide a clearer understanding of the binding interactions observed.

      The dimeric geraniol is marginally better binding than the monomer, two-fold, but this could be due to doubling the number of geraniols per ligand or due to an undefined interaction of the extended molecule with the surface of the capsid. A geraniol linker should be tested.

      The modest improvement in binding may indeed reflect only the doubled number of geraniols rather than linker-mediated avidity effects. Interaction of the linker with the capsid surface is ruled out by the scrambled control, which included the same linkers but did not show any capacity to bind.

      Is the enhanced binding of dimer due to bivalent binding of dimer to one capsid? Is it a chance interaction of the linker with the surface of HBc, which is easily tested? Is it an avidity effect due to aggregation of capsids?

      Thank you for this insightful question. Our data suggest that the enhanced binding is due to bivalent interactions. To address the possibility of non-specific interactions from either the handle or the linker, we included a scrambled dimeric peptide as a negative control, which showed no binding. This rules out non-specific interactions from the linker or handle. Given this, we believe an additional monovalent control is unnecessary, as the scrambled control confirms that the binding is driven by the geraniol and peptide warheads alone. We have clarified this in the revised manuscript and appreciate your suggestion to strengthen the study.

      The experimental analysis of point mutation of P1 is not analyzed beyond stating that it shows the importance of the core peptide sequence. Is there rationale for the effect of R3 to E and K10 to E mutation?

      We appreciate the reviewer's curiosity and request for a more detailed discussion of the P1 deep mutational scan data and its implications. The observed low mutation tolerance of the core peptide sequence SLLGRM with respect to HBc binding is highly consistent with our prior structural data and in-solution binding studies (https://doi.org/10.3390/microorganisms9050956), with the results from the original phage library screening (M. R. Dyson, K. Murray, Proceedings of the National Academy of Sciences 1995, 92, 2194–2198), and with the binding data presented here. Notably, the data set does not suggest that additional binding interfaces contribute to the aggregation seen with the N-terminally elongated P1 and P2 versus the non-aggregating shorter SLLGRM. While the positional scan largely aligns with the previous phage binding hierarchy and quantified ligands, we, like Reviewer #3, were previously struck by surprising affinity gains for positive-to-negative amino acid exchanges in related peptides. Specifically, “SLLGEM” has been predicted, previously and here, to show enhanced affinity over “SLLGRM”. Quantification in solution, however, could not confirm this enhanced HBV binding affinity (Makbul et al. 2021 Microorganisms). In the revised version of the manuscript we now highlight the possibly limited predictive power of this assay for positions where positively charged residues are exchanged for negatively charged residues (Figure legend of Fig 3D).

      The fluctuations in Figure 3B could be largely magnification of noise due to changing the y-axis. The fluctuations can be characterized as standard variation, excluding the injections, to allow a quantitative judgment.

      Isothermal titration calorimetry heat fluctuations without injections are now shown in the supplementary information scaled to the same y-axis (Supplementary Figure 3D). 

      Molecular graphics throughout are too small and poorly labeled.

      We have revised the molecular graphics throughout the manuscript to increase their size and improve labeling for clarity. All figures are now provided at 500 dpi.

      In Figure 2, compounds 1 and 2 are pyrophosphates. The label in the figure should be corrected.

      Thank you for pointing this out. These compounds were removed for clarity.

      In the introduction, the phrase "discontinuation frequently leads to relapse" should be changed to something less ambiguous.

      Thank you for highlighting this point regarding the phrasing in the introduction. We have revised the statement to more accurately reflect the clinical situation by specifying that stopping treatment often results in viral rebound and disease recurrence in many patients. This adjustment clarifies the intended meaning and addresses the ambiguity you identified. We hope this revision better aligns with the clinical context of HBV management and improves the overall clarity of the manuscript.

      Define "functional cure" in the introduction.

      Thank you for your suggestion to clarify the term “functional cure”. We have revised the manuscript so that, instead of “functional cure”, we describe the goal of sustained viral suppression without detectable HBV DNA and with loss of hepatitis B surface antigen (HBsAg), without the need for continuous therapy. This should provide greater clarity for readers and improve the overall comprehensibility of the introduction.

      The sentence beginning line 92 is not clear unless one has already read the paper. Figure 1 is not well described.

      Thank you for your valuable feedback regarding the clarity of this sentence and the legend of Figure 1. We have revised the text and legend to provide more context and improve the flow for readers who are unfamiliar with the specifics of the study. The revised version now clearly explains the targeted binding sites and the purpose of the bivalent binders at the beginning of the results section.

      In line 235 the meaning is not clear. What is in excess? Is there free CPP in solution? Is it the charge on the CPP?

      We have clarified the passage as requested.

      When describing peptide-induced aggregation, Figures 5 and 6, figure 1B is never referred to. Figure 1B would work better as part of Figure 6.

      We understand the reviewer’s suggestion. However, we decided to highlight and summarize the major findings and the underlying hypothesis early in the manuscript. We included Figure 1B at the beginning to allow readers to quickly grasp a core concept and outcome of our study.

      We now, however, refer to Figure 1B in the text and hope that, together with all the other changes, this has improved the clarity and quality of the manuscript.

      We appreciate your constructive feedback and the opportunity to further refine the work.

    1. eLife Assessment

      In this manuscript, the authors describe a new AlphaFold2 pipeline called PAbFold that can represent a useful tool for identifying linear antibody epitopes (B-cell epitopes) for different antigens. This information can be used in the selection of different reagents in competitive ELISA assays, which can save time and reduce costs. Because several questions about the work remain, the study is currently incomplete.

    2. Reviewer #1 (Public Review):

      Summary:

      In this manuscript, "PAbFold: Linear Antibody Epitope Prediction using AlphaFold2", the authors generate a python wrapper for the screening of antibody-peptide interactions using AlphaFold, and test the performance of AlphaFold on 3 antibody-peptide complexes. In line with previous observations regarding the ability of AlphaFold to predict antibody structures and antigen binding, the results are mixed. While the authors are able to use AlphaFold to identify and experimentally validate a previously characterized broad binding epitope with impressive precision, they are unable to consistently identify the proper binding registers for their control [Myc-tag, HA-tag] peptides. Further, it appears that the reproducibility and generality of these results are low, with new versions of AlphaFold negatively impacting the predictive power. However, if this reproducibility issue is solved, and the test set is greatly increased, this manuscript could contribute strongly towards our ability to predict antibody-antigen interactions.

      Strengths:

      Due to the high significance, but difficulty, of the prediction of antibody-antigen interactions, any attempts to break down these predictions into more tractable problems should be applauded. The authors' approach of focusing on linear epitopes (peptides) is clever, reducing some of the complexities inherent to antibody binding. Further, the ability of AlphaFold to narrow down a previously broadly identified experimental epitope is impressive. The subsequent experimental validation of this more precisely identified epitope makes for a nice data point in the assessment of AlphaFold's ability to predict antibody-antigen interactions.

      Weaknesses:

      Without a larger set of test antibody-peptide interactions, it is unclear whether or not AlphaFold can precisely identify the binding register of a given antibody to a given peptide antigen. Even within the small test set of 3 antibody-peptide complexes, performance is variable and depends upon the scFv scaffold used for unclear reasons. Lastly, the apparent poor reproducibility is concerning, and it is not clear why the results should rely so strongly on which multi-sequence alignment (MSA) version is used, when neither the antibody CDR loops nor the peptide are likely to strongly rely on these MSAs for contact prediction.

      Major Point-by-Point Comments:

      (1) The central concern for this manuscript is the apparent lack of reproducibility. The way the authors discuss the issue (lines 523-554) it sounds as though they are unable to reproduce their initial results (which are reported in the main text), even when previous versions of AlphaFold2 are used. If this is the case, it does not seem that AlphaFold can be a reliable tool for predicting antibody-peptide interactions.

      (2) Aside from the fundamental issue of reproducibility, the number of validating tests is insufficient to assess the ability of AlphaFold to predict antibody-peptide interactions. Given the authors' use of AlphaFold to identify antibody binding to a linear epitope within a whole protein (in the mBG17:SARS-CoV-2 nucleocapsid protein interaction), they should expand their test set well beyond Myc- and HA-tags using antibody-antigen interactions from existing large structural databases.

      (3) As discussed in lines 358-361, the authors are unsure if their primary control tests (antibody binding to Myc-tag and HA-tag) are included in the training data. Lines 324-330 suggest that even if the peptides are not included in the AlphaFold training data because they contain fewer than 10 amino acids, the antibody structures may very well be included, with an obvious "void" that would be best filled by a peptide. The authors must confirm that their tests are not included in the AlphaFold training data, or re-run the analysis with these templates removed.

      (4) The ability of AlphaFold to refine the linear epitope of antibody mBG17 is quite impressive and robust to the reproducibility issues the authors have run into. However, Figure 4 seems to suggest that the target epitope adopts an alpha-helical structure. This may be why the score is so high and the prediction is so robust. It would be very useful to see along with the pLDDT by residue plots a structure prediction by residue plot. This would help to see if the high confidence pLDDT is coming more from confidence in the docking of the peptide or confidence in the structure of the peptide.

      (5) Related to the above comment, pLDDT is insufficient as a metric for assessing antibody-antigen interactions. There is a chance (as is nicely shown in Figure S3C) that AlphaFold can be confident and wrong. Here we see two orange-yellow dots (fairly high confidence) that place the peptide COM far from the true binding region. While running the recommended larger validation above, the authors should also include a peptide RMSD or COM distance metric, to show that the peptide identity is confident, and the peptide placement is roughly correct. These predictions are not nearly as valuable if AlphaFold is getting the right answer for the wrong reasons (i.e. high pLDDT but peptide binding to a non-CDR loop region). Eventual users of the software will likely want to make point mutations or perturb the binding regions identified by the structural predictions (as the authors do in Figure 4).

      Comments on revisions:

      I have read the authors' responses and the revised manuscript. The authors did not sufficiently address my comments, nor the fundamental issue with the manuscript.

      By the authors' own admission, many of the results presented in the current version of the manuscript cannot be reproduced without relying on locally saved MSAs. In other words, there is almost no evidence presented that this pipeline will predict antibody-antigen interactions using currently publicly available software. This manuscript is reduced to essentially a case study (N=1) in how one might go about making such predictions coupled with pretty good experimental evidence backing up this singular prediction.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors showed the applicability and usefulness of a new AlphaFold2 pipeline called PAbFold, which can predict linear antibody epitopes (B-cell epitopes) that can be helpful for the selection of reagents to be applied in competitive ELISA assays.

      Strengths:

      The authors showed the accuracy of the pipeline in correctly identifying the binding epitope for three different antibody-antigen systems (Myc, HA, and SARS-CoV-2 nucleocapsid protein). The design of scFvs from the Fabs of the three antibodies to speed up the analysis is extremely interesting.

      Weaknesses:

      The article correctly justifies its findings and no great weaknesses are present. However, it could be useful for a broader audience to show in detail how pLDDT was calculated for both the Simple-Max approach (per-residue pLDDT) and the Consensus analysis (average pLDDT for each peptide), with associated equations.

      Comments on revisions:

      I have read the authors' responses to my comments and the revised paper. They addressed the minor comments and concerns. However, I agree with Reviewer #1 that these findings cannot be reproduced without local MSAs and this is a major issue.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public Comments):

      (1) The central concern for this manuscript is the apparent lack of reproducibility. The way the authors discuss the issue (lines 523-554) it sounds as though they are unable to reproduce their initial results (which are reported in the main text), even when previous versions of AlphaFold2 are used. If this is the case, it does not seem that AlphaFold can be a reliable tool for predicting antibody-peptide interactions.

      The driving point behind the multiple sequence alignment (MSA) discussion was indeed to point out that AlphaFold2 (AF2) performance when predicting scFv:peptide complexes is highly dependent upon the MSA. However, this dependence is a function of the MSA generation algorithm (MMseqs2, HHblits, jackhmmer, hhsearch, kalign, etc.) and the sequence databases, and less an intrinsic function of AF2. It is important to report MSA-dependent performance precisely because it results in changing capabilities with respect to peptide prediction.

      Performance also varies significantly with changes to the target peptide and the scFv framework. By reporting the varying success rates (as a function of MSA, peptide target, and framework changes) we aim to help future researchers craft modified algorithms that can achieve increased reliability at protein-peptide binding predictions. Ultimately, tracking down how MSA generation details alter results (especially when the MSAs are hundreds of sequences long) is significantly outside the scope of this paper. Our goal for this paper was to show a general method for identification of linear antibody epitopes using only sequence information, and future work by us or others should focus on optimization of the process.

      (2) Aside from the fundamental issue of reproducibility, the number of validating tests is insufficient to assess the ability of AlphaFold to predict antibody-peptide interactions. Given the authors' use of AlphaFold to identify antibody binding to a linear epitope within a whole protein (in the mBG17:SARS-Cov-2 nucleocapsid protein interaction), they should expand their test set well beyond Myc- and HA-tags using antibody-antigen interactions from existing large structural databases.

      Performing the calculations at the scale that the reviewer is requesting is not feasible at this time. We showed in this manuscript that we were able to predict 3 of 3 epitopes, including one antigen-antibody pair that has not been deposited in the PDB and has no homologs there. While we feel that N=3 is acceptable to introduce this method to the scientific community, we will consider adding more examples of success and failure in the future to optimize and refine the method as computational resources become available. Notably, future efforts that attempt high-throughput predictions of this class using existing databases should take particular care to avoid contamination.

      (3) As discussed in lines 358-361, the authors are unsure if their primary control tests (antibody binding to Myc-tag and HA-tag) are included in the training data. Lines 324-330 suggest that even if the peptides are not included in the AlphaFold training data because they contain fewer than 10 amino acids, the antibody structures may very well be included, with an obvious "void" that would be best filled by a peptide. The authors must confirm that their tests are not included in the AlphaFold training data, or re-run the analysis with these templates removed.

      First, we address the simpler question of templates.

      The reruns of AF2 with the local 2022 rebuild, which was the most reproducible method and gave results most on par with the MMseqs2 server in the fall of 2022, were run without templates. This is because the MSA was generated locally, and no templates were matched or generated locally. The only information passed was therefore the locally generated MSA and the FASTA sequences of the unchanging scFv and the varying epitope peptide. Because this performed so well despite the absence of templates, we can confidently say that the template flag is not a significant factor in how accurately PAbFold can identify the correct epitope.
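
      As an illustration of the inputs described above, the following is a minimal sketch of how a fixed scFv sequence could be paired with each sliding-window peptide to produce one FASTA record per prediction. The function names, the record naming scheme, and the scFv string are hypothetical placeholders, and the use of ":" to separate chains assumes the ColabFold-style complex input convention; this is not taken from the PAbFold code itself. The toy antigen is the 11-residue Myc sequence listed for PDB entry 2or9.

```python
def build_queries(scfv, antigen, window=10):
    """Pair the unchanging scFv with every contiguous peptide window of the antigen."""
    records = []
    for start in range(len(antigen) - window + 1):
        peptide = antigen[start:start + window]
        name = f"scfv_pep_{start + 1:03d}"          # hypothetical naming scheme
        records.append((name, f"{scfv}:{peptide}"))  # ':' as chain separator (assumed convention)
    return records


def to_fasta(records):
    """Serialize (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Placeholder scFv fragment for illustration only; a real run needs the full sequence.
SCFV_PLACEHOLDER = "EVQLVESGGGLVQPGG"
records = build_queries(SCFV_PLACEHOLDER, "EQKLISEEDLN", window=10)
fasta_text = to_fasta(records)
```

      With an 11-residue antigen and a 10-residue window, this yields two records, one per register of the peptide.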

      Second, we can partially address the question of whether the AlphaFold models had access to models suitable, in theory, for “memorization” of pertinent structural details. 

      With respect to tracking the exact role and inclusion of specific PDB entries, the AF2 paper provides the following:

      “Structures from the PDB were used for training and as templates (https://www.wwpdb.org/ftp/pdb-ftp-sites; for the associated sequence data and 40% sequence clustering see also https://ftp.wwpdb.org/pub/pdb/derived_data/ and https://cdn.rcsb.org/resources/sequence/clusters/bc-40.out). Training used a version of the PDB downloaded 28 August 2019, while the CASP14 template search used a version downloaded 14 May 2020. The template search also used the PDB70 database, downloaded 13 May 2020 (https://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/).”

      Three of these links are dead. As such, it is difficult to definitively assess the role of any particular PDB entry with respect to AF2 training/testing, or the impact of homologous training structures, given the very large number of immunoglobulin structures in the training set. That said, we can summarize information for the potentially relevant PDB entries (2or9, which is shown in Fig. 1, and 1frg), and believe it is most conservative to assume that each such entry was within the training set.

      PDB entry 2or9 (released 2008): the anti-c-myc antibody 9E10 Fab fragment in complex with an 11-amino acid synthetic epitope: EQKLISEEDLN. This crystal structure is also noteworthy for featuring a binding mode where the peptide is pinned between two Fab. The apo structure (2orb) is also in the database but lacks the peptide and a resolved structure for CDR H3.

      PDB entry 1a93 (released 1998): a c-Myc-Max leucine zipper structure, where the c-Myc epitope (in a 34-amino acid protein) adopts an alpha helical conformation completely different from the epitope captured in entry 2or9.

      PDB entries 5xcs and 5xcu (released 2017): engineered Fv-clasps (scFv alternatives) in complex with the 9-amino acid synthetic HA epitope: YPYDVPDYA.

      PDB entry 1frg (released 1994): anti-HA peptide Fab in complex with HA epitope subset Ace-DVPDYASL-NH2.

      Since the 2or9 entry has our target epitope (10 aa) embedded within an 11aa sequence, we have revised this line in the manuscript:

      “The AlphaFold2 training set was reported to exclude chains of less than 10, which would eliminate the myc and HA epitope peptides.” => “The AlphaFold2 training set was reported to exclude chains of less than 10, which would eliminate the HA epitope peptide from potential training PDB entries such as 5xcs or 5xcu.”

      It is important to note that we obtained the best prediction performance for the scFv:peptide pair that had no pertinent PDB entries (mBG17). Specifically, a protein BLAST search against the PDB using the mBG17 scFv revealed diverse homologs, but a maximum sequence identity of 89.8% for the heavy chain (to an unrelated antibody) and 93.8% for the light chain (to an unrelated antibody). Additionally, while it is possible that the AF2 models might have learned from the complex in PDB entry 2or9, Supplemental Figure 3 shows how often the peptide is “misplaced”, and the performance does not exceed the performance for mBG17.

      (4) The ability of AlphaFold to refine the linear epitope of antibody mBG17 is quite impressive and robust to the reproducibility issues the authors have run into. However, Figure 4 seems to suggest that the target epitope adopts an alpha-helical structure. This may be why the score is so high and the prediction is so robust. It would be very useful to see along with the pLDDT by residue plots a structure prediction by residue plot. This would help to see if the high confidence pLDDT is coming more from confidence in the docking of the peptide or confidence in the structure of the peptide.

      The reviewer is correct that the target mBG17 epitope adopts an alpha-helical conformation, and we concur that this likely contributes to the more reliable structure prediction performance. When we predict the structure of the epitope alone without the mBG17 scFv, AF2 confidently predicts an alpha helix with an average pLDDT of 88.2 (ranging from 74.6 to 94.4).

      Author response image 1.

      The AF2 prediction for the mBG17 epitope by itself.
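
      For readers who wish to reproduce this kind of summary, the following is a minimal sketch of extracting per-residue pLDDT from a predicted model. It relies on the fact that AF2 writes pLDDT into the B-factor column of its PDB output; the function names, the chain identifier, and the toy coordinates and scores are hypothetical and are not the values reported above.

```python
def residue_plddt(pdb_text, chain_id):
    """Collect one pLDDT value per residue from the B-factor column of ATOM records."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[21] == chain_id:
            res_seq = int(line[22:26])            # residue number, PDB columns 23-26
            scores[res_seq] = float(line[60:66])  # B-factor field, PDB columns 61-66
    return scores


def summarize(scores):
    """Return (mean, min, max) pLDDT over the collected residues."""
    values = list(scores.values())
    return sum(values) / len(values), min(values), max(values)


def atom_line(serial, res_seq, b_factor, chain="B"):
    """Build a fixed-width CA ATOM record (toy data only)."""
    return (f"ATOM  {serial:5d}  CA  ALA {chain}{res_seq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b_factor:6.2f}")


# Made-up scores for a three-residue peptide on a hypothetical chain "B".
toy_pdb = "\n".join(atom_line(i + 1, i + 1, b) for i, b in enumerate([75.0, 90.0, 93.0]))
toy_scores = residue_plddt(toy_pdb, "B")
toy_mean, toy_min, toy_max = summarize(toy_scores)
```

      Restricting the summary to the peptide chain keeps the scFv residues from dominating the average.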

      However, as one interesting point of comparison, a 10 a.a. poly-alanine peptide is also consistently folded into an alpha helix by AF2. The A<sub>10</sub> peptide is also predicted to bind among the traditional scFv CDR loops, but the pLDDT scores are very poor (Supplemental Figure 5J). We also observed the opposite case, in which a peptide has a very unstructured region in the binding domain but is nonetheless still placed confidently, as seen in Supplemental Figure 3 C&D. Therefore, while we suspect peptides with strong alpha-helical propensity are more likely to be accurately predicted, the data suggest that alpha helix adoption is neither necessary nor sufficient to reach a confident prediction.

      (5) Related to the above comment, pLDDT is insufficient as a metric for assessing antibody-antigen interactions. There is a chance (as is nicely shown in Figure S3C) that AlphaFold can be confident and wrong. Here we see two orange-yellow dots (fairly high confidence) that place the peptide COM far from the true binding region. While running the recommended larger validation above, the authors should also include a peptide RMSD or COM distance metric, to show that the peptide identity is confident, and the peptide placement is roughly correct. These predictions are not nearly as valuable if AlphaFold is getting the right answer for the wrong reasons (i.e. high pLDDT but peptide binding to a non-CDR loop region). Eventual users of the software will likely want to make point mutations or perturb the binding regions identified by the structural predictions (as the authors do in Figure 4).

      We agree with the reviewer that pLDDT is not a perfect metric, and we are following with great interest the evolving community discussion as to what metrics are most predictive of binding affinity (e.g. pAE, or pITM as a decent predictor for binding, but not affinity ranking). To our knowledge, there is not yet a consensus for the most predictive metrics for protein:protein binding nor protein:peptide binding. Intriguingly, since the antigen peptides are so small in our case, the pLDDT of the peptide residues should be mostly reporting on the confidence of the distances to neighboring protein residues.

      As to the suggestion for an RMSD or COM distance metric, we agree that these are useful, with the caveat that they require a reference structure. The goal of our method is to quickly narrow down candidate linear epitopes and thereby guide experimentalists to more efficiently determine the actual binding sequence of an antibody-antigen pair. Presumably this would not be necessary if a reference structure were known.
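
      For cases where a reference structure does exist, the two metrics can be sketched as follows. This is a minimal illustration, not part of the PAbFold pipeline: the function names are hypothetical, the "COM" is approximated by an unweighted centroid, and the predicted and reference complexes are assumed to have already been superposed on the antibody so that peptide coordinates are directly comparable.

```python
import math


def center_of_mass(coords):
    """Unweighted centroid of (x, y, z) tuples, used as a stand-in for a mass-weighted COM."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def com_distance(pred, ref):
    """Distance between predicted and reference peptide centroids."""
    return math.dist(center_of_mass(pred), center_of_mass(ref))


def rmsd(pred, ref):
    """RMSD over matched atoms; assumes both sets share one frame (no fitting is done here)."""
    if len(pred) != len(ref):
        raise ValueError("coordinate lists must have the same length")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(squared / len(pred))


# Toy coordinates: the "prediction" is the reference shifted by (0, 3, 4), i.e. 5 units.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
predicted = [(x, y + 3.0, z + 4.0) for x, y, z in reference]
toy_com = com_distance(predicted, reference)
toy_rmsd = rmsd(predicted, reference)
```

      A large COM distance with a high pLDDT would flag the "confident and wrong" placements raised by the reviewer.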

      It may also be possible to invent a method to filter unlikely binding modes that is specific to antibodies and peptide epitopes that does not require a known reference structure, but this would be an interesting problem for subsequent study.

      Reviewer 1 (Recommendations for the Authors):

      (1) "Linear epitope" should be more precisely defined in the text. It isn't clear whether the authors hope that they can use AlphaFold to predict where on a given protein antigen an antibody will bind, or which antigenic peptide the antibody will bind to. The authors discuss both problems, and there is an important distinction between the two. If the authors are only concerned with isolated antigenic peptides, rather than linear epitopes in their full length structural contexts, they should be more precise in the introduction and discussion.

      We thank the reviewer for the prompt towards higher precision. We are using the short contiguous antigen definition of “linear epitope” that depends on secondary rather than tertiary structure. The linear epitopes this paper considers are short “peptides” that form secondary structure independent of their structure in the complete folded antigen protein. We have clarified our definition of “linear epitope” in the text (lines 64-66). 

      (2) Line 101: "Not all portions of the antibody are critical". First, this is not consistent with the literature, particularly where computational biology is concerned.

      See https://pubs.acs.org/doi/10.1021/acs.jctc.7b00080 . Second, while I largely agree with what I think the authors are trying to say (that we can largely reduce the problem to the CDR loops), this is inconsistent with what the authors later find, which is that inexplicably the VH/VL scaffold used alters results strongly.

      We have adopted verbiage that should be less provocative: “Fortunately, with respect to epitope specificity, antibody constant domains are less critical than the CDR loops and the remainder of the variable domain framework regions.”

      (3) Related to the above comment, do the authors have any idea why epitope prediction performance improved for the chimeric scFvs? Is this due to some stochasticity in AlphaFold? Or is there something systematic? Expanding the test dataset would again help answer this question.

      We agree that future study with a larger test set could help address this intriguing result, for which we currently lack a conclusive explanation. Part of our motivation for this publication was to bring this unexpected result to light. Notably, these framework differences are implicated not only in driving AF2 performance, but also in changing experimental intracellular performance, as reported by our group (DOI: 10.1038/s41467-019-10846-1 ). We can generate a variety of hypotheses for this phenomenon. Just as MSA sub-sampling has been a popular approach to drive AF2 to sample alternative conformations, sequence recombination may be a generically effective way to generate usefully different binding predictions. However, it is difficult to discriminate between recombination inducing subtle structural tweaks that increase protein intracellular fitness and binding, and recombination causing changes to the MSA that affect the likelihood of sampling a good epitope binding conformation. It is also possible that the chimeras are more deftly predicted by AF2 due to differences in sequence representation during the training of the AF2 models (e.g. more exposure to models containing 15F11 or 2E2 structures). We attempted to deconvolute MSA differences by using single-sequence mode (Supplementary Figure 13) but this ablated performance.

      (4) Figure 2: The reported consensus pLDDT scores are actually quite low here, suggesting low confidence in the result. This is in strong contrast to the reported consensus scores for mBG17. Again, a larger test dataset would help set a quantitative cutoff for where to draw the line for "trustworthy" AlphaFold predictions in antibody-peptide binding applications.

      We agree that a larger dataset will be useful to begin to establish metrics and thresholds and will contribute to the aforementioned community discussion about reliable predictors of binding. Our current focus is not structure prediction per se. In the current work we are more focused on relative binding likelihood and increasing the efficiency of experimental epitope verification by flagging the most likely linear epitopes. Thus, while the pLDDT scores are low for Myc in Figure 2, it is remarkable (and worth reporting) that there is still useful signal in the relative variation in pLDDT. The utility of the signal variation is evident in the ability to short-list correct lead peptides via the two methods we demonstrate (consensus and per-residue max).
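
      To make the two ranking schemes concrete, here is a minimal sketch that follows the descriptions used in this exchange: a consensus score taken as the average pLDDT of each peptide, and a per-residue maximum taken over all sliding windows that cover a given antigen position. These definitions are an assumed reading of the review text rather than the published equations, and the function names, antigen string, and pLDDT values are hypothetical toy data.

```python
from statistics import mean


def sliding_peptides(antigen, window=10):
    """Return (start_index, peptide) for every contiguous window of the antigen."""
    return [(i, antigen[i:i + window]) for i in range(len(antigen) - window + 1)]


def consensus_scores(peptide_plddt):
    """Consensus-style score: mean pLDDT over the residues of each peptide."""
    return {start: mean(values) for start, values in peptide_plddt.items()}


def per_residue_max(peptide_plddt, antigen_length):
    """Per-residue max: highest pLDDT seen at each antigen position across all windows."""
    best = [0.0] * antigen_length
    for start, values in peptide_plddt.items():
        for offset, value in enumerate(values):
            best[start + offset] = max(best[start + offset], value)
    return best


# Toy example: 9-residue antigen, 3-residue windows, made-up pLDDT values.
antigen = "ACDEFGHIK"
windows = sliding_peptides(antigen, window=3)
toy_plddt = {start: [50.0 + 5 * start + k for k in range(3)] for start, _ in windows}

consensus = consensus_scores(toy_plddt)
residue_max = per_residue_max(toy_plddt, len(antigen))
ranked = sorted(consensus.items(), key=lambda item: item[1], reverse=True)
```

      Either output can be used to short-list windows by relative score, which is the use emphasized here, rather than by an absolute pLDDT cutoff.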

      (5) Figure 4: if the authors are going to draw conclusions from the actual structure predictions of AlphaFold (not just the pLDDT scores), the side-chain accuracy placement should be assessed in the test dataset (RMSD or COM distance).

      We agree with the reviewer that side-chain placement accuracy is important when evaluating the accuracy of AF2 structure predictions. However, here our focus was relative binding likelihood rather than structure prediction. The one case where we attempted to draw conclusions from the structure prediction was in the context of mBG17, where there is not yet an experimental reference structure. Absolutely, if we were to obtain a crystal structure for that complex, we would assess side-chain placement accuracy. 

      (6) Lines 493-508: I am not sure that this assessment for why AlphaFold has difficulty with antibody-antigen interactions is correct. If the authors' interpretation is correct (larger complicated structures are more challenging to move) then AlphaFold-Multimer (https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2.full) wouldn't perform as well as it does. Instead, the issue is likely due to the incredibly high diversity in antibody CDR loops, which reduces the ability of the AlphaFold MSA step (which the authors show is quite critical to predictions: Figure S13) to inform structure prediction. This, coupled with the importance of side chain placement in antibody and TCR interactions, which is notoriously difficult (https://elifesciences.org/articles/90681), are likely the largest source of uncertainty in antibody-antigen interaction prediction.

      We agree with the reviewer that CDR loop diversity (and the associated side-chain placement challenges) is a major barrier to successfully predicting antibody-antigen complexes. Presumably this is true for both peptide antigens and protein antigens. Indeed, the authors of AlphaFold-Multimer admit that the updated model struggles with antibody-antigen complexes, saying “As a limitation, we observe anecdotally that AlphaFold-Multimer is generally not able to predict binding of antibodies and this remains an area for future work.” The point about how loop diversity could reduce MSA quality is well taken. Thanks to the reviewer’s guidance, we have added the following where MSA sensitivity is discussed later on, in lines 570-572:

      “These challenges are presumably compounded by the incredible diversity of the CDR loops in antibodies which could decrease the useful signal from the MSA as well as drive inconsistent MSA-dependent performance”.

      With respect to lines 493-508, we have also rephrased a key sentence to try to better explain that we are comparing the often-good recognition performance for short epitopes to the never-good performance when those epitopes are embedded within larger sequences. Instead of saying, “In contrast, a larger and complicated structure may be more challenging to move during the AlphaFold2 structure prediction or recycle steps.” we now say in lines 520-522, “In contrast, embedding the epitope within a larger and more complicated structure appears to degrade the ability of AlphaFold2 to sample a comparable bound structure within the allotted recycle steps.”

      (7) Related to major comment 1: Are AlphaFold predictions deterministic? That is, if you run the same peptide through the PAbFold pipeline 20 times, will you get the same pLDDT score 20 times? The lack of reproducibility may be in part due to stochasticity in AlphaFold, which the authors could actually leverage to provide more consistent results.

      This is a good question that we addressed while dissecting the variable performance. When the random seed is fixed, AF2 returns the same prediction every time. After running this 10 times with a fixed seed, the mBG17 epitope was predicted with an average pLDDT of 88.94, with a standard deviation of 1.4 x 10<sup>-14</sup>. In contrast, when no seed is specified, AF2 did not return an *identical* result. However, the results were still remarkably consistent. Running the mBG17 epitope prediction 10 times with a different seed gave an average pLDDT of 89.24, with a standard deviation of 0.49. 

      (8) Related to major comment 2: Using, for example, this previous survey of 1833 antibody-antigen interactions (https://www.sciencedirect.com/science/article/pii/S2001037023004725), the authors could likely pull out multiple linear epitopes to test AlphaFold's performance on antibody-peptide interactions. A large number of tests are necessary for validation.

      We thank the reviewer for this report of antibody-antigen interactions and will use it as a source of complexes in a future expanded study. Given the quantity and complexity of the data that we are already providing, as well as the compute and personnel demands of the requested expansion, we must defer it to future work.

      (9) Related to major comment 3: Apologies if this is too informal for a review, but this Issue on the AlphaFold GitHub may be useful: https://github.com/googledeepmind/alphafold/issues/416 .

      We thank the reviewer for the suggestion – per our response above we have indeed run predictions with no templates. Since we are using local AlphaFold2 calculations with localcolabfold, the use or non-use of templates is fairly simple: including a “--templates” flag or not.

      (10) Related to major comment 4: I am not sure if AlphaFold outputs by-residue secondary structure prediction by default, but I know that Phyre2 does http://www.sbg.bio.ic.ac.uk/~phyre2/html/page.cgi?id=index .

      To our knowledge, AF2 does not predict secondary structure independent of the predicted tertiary structure. When we need to analyze the secondary structure, we typically run the program DSSP on the predicted tertiary structure.

      (11) The documentation for this software is incomplete. The GitHub ReadMe should include complete guidelines for users with details of expected outputs, along with a thorough step-by-step walkthrough for use.

      We thank the reviewer for pointing this out, but we feel that the level of detail we provide in the GitHub is sufficient for users to utilize the method described.

      Stylistic comments:

      (1) I do not think that the heatmaps (as in 1C, top) add much information for the reader. They are largely uniform across the y-axis (to my eyes), and the information is better conveyed by the bar and line graphs (as in 1C, middle and bottom panels).

      We thank the reviewer for this feedback but elect to leave the heatmaps in, on the premise that presenting more data is (usually) better. Including the y-axis reveals common patterns such as the lower confidence of the peptide termini, as well as the lack of some patterns that might have occurred. For example, if a subset of five contiguous residues was necessary and sufficient for local high confidence, this could be visually apparent as a “staircase” in the heat map.

      (2) A discussion of some of the shortcomings of other prediction-based software (lines 71-77) might be useful. Why are these tools less well-equipped than AlphaFold for this problem? And if they have tried to predict antibody-antigen interactions, why have they failed?

      We agree with the reviewer that a broader review of multiple methods would be interesting and useful. One challenge is that the suite of available methods is evolving rapidly, though only a subset work for multimeric systems. Some detail on deficiencies of other approaches was provided in lines 71-77 originally, although we did not go into exhaustive detail since we wanted to focus on AF2. We view using AF2 in this manner as novel and believe that providing additional options to predict antibody epitopes will be of interest to the scientific community. We also chose AF2 because we have ample experience with it and because it is software that many in the scientific community are already using and are comfortable with. Additionally, AF2 provided us with a quantification parameter (pLDDT) to assess the peptides’ binding abilities. We think a future study that compares the ability of multiple emerging tools for scFv:peptide prediction will be quite interesting.

      (3) Similar to the above comment, more discussion focused on why AlphaFold2 fails for antibodies (lines 126-128) might be useful for readers.  

      We thank the reviewer for the suggestion. The following line has been added shortly after lines 135-137:

      “Another reason for selecting AF2 is to attempt to quantify its abilities to compare simple linear epitopes, since the team behind AF-multimer reported that conformational antibody complexes were difficult to predict accurately (14).”

      Per earlier responses, we also added text that flags one particular possible reason for the general difficulty of predicting antibody-antigen complexes (the diversity of the CDR loops and associated MSA challenges).

      (4) The first two paragraphs of the results section (lines 226-254) could likely be moved to the Methods. Additionally, details of how the scores are calculated, not just how the commands are run in python, would be useful.

      Per the reviewer suggestion, we moved this section to the end of the Methods section. Also, to aid in the reader’s digestion of the analysis, the following text has been added to the Results section (lines 256-264):

      “Both the ‘Simple Max’ and ‘Consensus’ methods were calculated first by parsing every pLDDT score received by every residue in the antigen sequence sliding window output structures. From the resulting data structure, the Simple Max method simply finds the maximum pLDDT value ever seen for a single residue (across all sliding windows and AF2 models). For the Consensus method, per-residue pLDDT was first averaged across the 5 AF2 models. These averages are reported in the heatmap view, and further averaged per sliding window for the bar chart below.

      In principle, the strategy behind the Consensus method is to take into account agreement across the 5 AF2 models and provide insight into the confidence of entire epitopes (whole sliding windows of n=10 default) instead of disconnected, per-residue pLDDT maxima.” 
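      The two scoring schemes described in the quoted text can be sketched as follows. This is a minimal illustration rather than the PAbFold implementation: the nested-list layout `plddt[window][model][position]`, the function names, and the sliding-window step of 1 are all assumptions made here for clarity.

```python
from statistics import mean


def simple_max(plddt, step=1):
    """Residue-centric score: highest pLDDT ever seen for each antigen residue.

    plddt[w][m][j] is the pLDDT of position j in sliding window w for model m
    (hypothetical layout). Window w is assumed to start at residue w * step.
    """
    window_len = len(plddt[0][0])
    seq_len = (len(plddt) - 1) * step + window_len
    best = [float("-inf")] * seq_len
    for w, models in enumerate(plddt):
        for scores in models:
            for j, value in enumerate(scores):
                pos = w * step + j
                best[pos] = max(best[pos], value)
    return best


def consensus(plddt):
    """Peptide-centric score: average across models, then across each window.

    Returns the per-window, per-position averages (the heatmap view) and the
    per-window averages (the bar-chart view).
    """
    heatmap = [[mean(column) for column in zip(*models)] for models in plddt]
    window_scores = [mean(row) for row in heatmap]
    return heatmap, window_scores
```

      Simple Max may combine values from different windows and models for neighbouring residues, whereas each Consensus score summarizes one sliding window across all models.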

      (5) Figure 1 would be more useful if you could differentiate specifically how the Consensus and Simple Max scoring is different. Providing examples for how and why the top 5 peptide hits can change (quite significantly) using both methods would greatly help readers understand what is going on.

      Per the reviewer suggestion, we have added text to discuss the variable hit selection that results from the two scoring metrics. The new text (lines 264-271) adds onto the added text block immediately above:

      “Having two scoring metrics is useful because the selection of predicted hits can differ. As shown in Figure 2, part of the Myc epitope makes it into the top 5 peptides when selection is based on summing per-residue maximum pLDDT (despite there being no requirement that these values originate in the same physical prediction). In contrast, a Consensus method score more directly reports on a specific sliding window, and the strength of the highest confidence peptides is more directly revealed with superior signal to noise as shown in Figure 3. Variability in the ranking of top hits between the two methods arises from the fundamental difference in strategy (peptide-centric or residue-centric scoring) as well as close competition between the raw AF2 confidence in the known peptide and competing decoy sequences.”
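      How the two metrics can rank the same sliding windows differently can be sketched with toy numbers. The pLDDT values below are invented for illustration and are not results from the study; `rank_windows` and its arguments are hypothetical names, and a sliding-window step of 1 is assumed.

```python
def rank_windows(plddt, k=5, step=1):
    """Return the top-k window indices under both scoring strategies.

    plddt[w][m][j] is the pLDDT of position j in window w for model m
    (hypothetical layout). The residue-centric score sums the per-residue
    maxima over a window; the peptide-centric (Consensus) score averages
    one window over all models and positions.
    """
    window_len = len(plddt[0][0])

    # Highest pLDDT seen for each antigen residue, across windows and models.
    best = {}
    for w, models in enumerate(plddt):
        for scores in models:
            for j, value in enumerate(scores):
                pos = w * step + j
                best[pos] = max(best.get(pos, value), value)

    max_scores = [
        sum(best[w * step + j] for j in range(window_len))
        for w in range(len(plddt))
    ]
    consensus_scores = [
        sum(sum(scores) for scores in models) / (len(models) * window_len)
        for models in plddt
    ]

    def top(scores):
        order = sorted(range(len(scores)), key=lambda w: scores[w], reverse=True)
        return order[:k]

    return top(max_scores), top(consensus_scores)


# Invented example: window 0 is consistently confident, while windows 1 and 2
# each contain a single high value from one model.
toy = [
    [[80.0, 80.0], [80.0, 80.0]],
    [[20.0, 99.0], [20.0, 20.0]],
    [[30.0, 99.0], [30.0, 30.0]],
]
by_max, by_consensus = rank_windows(toy, k=2)
print(by_max, by_consensus)
```

      In this toy case the residue-centric ranking favours windows that borrow isolated high values, while the Consensus ranking favours the window that is confident across models.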

      (6) Hopefully the reproducibility issue is alleviated, but if not the discussion of it (lines 523-554) should be moved to the supplement or an appendix.

      The ability of the original AF2 model to predict protein-protein complexes was an emergent behavior, and then an explicit training goal for AF2.multimer. In this vein, the ability to predict scFv:peptide complexes is also an emergent capability of these models. It is our hope that by highlighting this capacity, as well as the high level of sensitivity, that this capability will be enhanced and not degraded in future models/algorithms (both general and specialized). In this regard, with an eye towards progress, we think it is actually important to put this issue in the scientific foreground rather than the background. When it comes to improving machine learning methods negative results are also exceedingly important.

      Reviewer 2 (Recommendations for the Author):

      - Line 113, page 3 - the structures of the novel scFv chimeras can be rapidly and confidently be predicted by AlphaFold2 to the structures of the novel scFv chimeras can be rapidly and confidently predicted by AlphaFold2.

      The superfluous “be” was removed from the text.

      - Line 276 and 278 page 9 - peptide sequences QKLSEEDLL and EQKLSEEDL in the text are different from the sequences reported in Figures 1 and 2 (QKLISEEDLL and EQKLISEEDL). Please check throughout the manuscript and also in the Figure caption (as in Figure 2).

      These changes were made throughout the text. 

      - I would include how you calculate the pLDDT score for both Simple Max approach and Consensus analysis.

      Good suggestion, this should be covered via the additions noted above.

    1. eLife Assessment

      The objective of this important study is to assess the study design and rigor, enhance the quality of clinical research studies, and emphasize crucial design elements in basic science research. It specifically tackles the ongoing problem of experimental design deficiencies that obstruct the effective translation of research findings into clinical applications. This paper is particularly convincing as it highlights the lack of progress in addressing these issues over the past decade, despite a substantial body of existing research. It serves as a strong call to action for the broader scientific community to improve research practices.

    2. Reviewer #1 (Public review):

      Summary:

      Rigor in the design and application of scientific experiments is an ongoing concern in preclinical (animal) research. Because findings from these studies are often used in the design of clinical (human) studies, it is critical that the results of the preclinical studies are valid and replicable. However, several recent peer-reviewed published papers have shown that some of the research results in cardiovascular research literature may not be valid because their use of key design elements is unacceptably low. The current study is designed to expand on and replicate previous preclinical studies in nine leading scientific research journals. Cardiovascular research articles that were used for examination were obtained from a PubMed Search. These articles were carefully examined for four elements that are important in the design of animal experiments: use of both biological sexes, randomization of subjects for experimental groups, blinding of the experimenters, and estimating the proper size of samples for the experimental groups. The findings of the current study indicate that the use of these four design elements in the reported research in preclinical research is unacceptably low. Therefore, the results replicate previous studies and demonstrate once again that there is an ongoing problem in the experimental design of preclinical cardiovascular research.

      Strengths:

      This study selected four important design elements for study. The descriptions in the text and figures of this paper clearly demonstrate that the rate of use of all four design elements in the examined research articles was unacceptably low. The current study is important because it replicates previous studies and continues to call attention once again to serious problems in the design of preclinical studies, and the problem does not seem to lessen over time.

      Weaknesses:

      Weaknesses from the first review were adequately addressed.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Rigor in the design and application of scientific experiments is an ongoing concern in preclinical (animal) research. Because findings from these studies are often used in the design of clinical (human) studies, it is critical that the results of the preclinical studies are valid and replicable. However, several recent peer-reviewed published papers have shown that some of the research results in cardiovascular research literature may not be valid because their use of key design elements is unacceptably low. The current study is designed to expand on and replicate previous preclinical studies in nine leading scientific research journals. Cardiovascular research articles that were used for examination were obtained from a PubMed Search. These articles were carefully examined for four elements that are important in the design of animal experiments: use of both biological sexes, randomization of subjects for experimental groups, blinding of the experimenters, and estimating the proper size of samples for the experimental groups. The findings of the current study indicate that the use of these four design elements in the reported research in preclinical research is unacceptably low. Therefore, the results replicate previous studies and demonstrate once again that there is an ongoing problem in the experimental design of preclinical cardiovascular research.

      Strengths:

      This study selected four important design elements for study. The descriptions in the text and figures of this paper clearly demonstrate that the rate of use of all four design elements in the examined research articles was unacceptably low. The current study is important because it replicates previous studies and continues to call attention once again to serious problems in the design of preclinical studies, and the problem does not seem to lessen over time.

      Weaknesses:

      The current study uses both descriptive and inferential statistics extensively in describing the results. The descriptive statistics are clear and strong, demonstrating the main point of the study, that the use of these design elements is quite low, which may invalidate many of the reported studies. In addition, inferential statistical tests were used to compare the use of the four design elements against each other and to compare some of the journals. The use of inferential statistical tests appears weak because the wrong tests may have been used in some cases. However, the overall descriptive findings are very strong and make the major points of the study.

      We sincerely appreciate the reviewer’s comments and detailed feedback and their recognition of the importance of this work in replicating previous studies and calling attention to the problems in preclinical study design. In response to the reviewer’s suggestions, we have recalculated our inferential statistics. In place of our previous inferential statistics, we have used an alternative correction calculation for p-values (Holm-Bonferroni corrections) and used median-based linear model analyses and nonparametric Kruskal-Wallis tests that are more appropriate for analyzing this dataset. Our overall trends in results remain the same.

      Reviewer #2 (Public Review):

      Summary

      This study replicates a 2017 study in which the authors reviewed papers for four key elements of rigor: inclusion of sex as a biological variable, randomization of subjects, blinding outcomes, and pre-specified sample size estimation. Here they screened 298 published papers for the four elements. Over a 10 year period, rigor (defined as including any of the 4 elements) failed to improve. They could not detect any differences across the journals they surveyed, nor across models. They focused primarily on cardiovascular disease, which both helps focus the research but limits the potential generalizability to a broader range of scientific investigation. There is no reason, however, to believe rigor is any better or worse in other fields, and hence this study is a good 'snapshot' of the progress of improving rigor over time.

      Strengths

      The authors randomly selected papers from leading journals (e.g., PNAS). Each paper was reviewed by 2 investigators. They pulled papers over a 10-year period, 2011 to 2021, and have a good sample of time over which to look for changes. The analysis followed generally accepted guidelines for a structured review.

      Weaknesses

      The authors did not use the exact same journals as they did in the 2017 study. This makes comparing the results complicated. Also, they pulled papers from 2011 to 2021, and hence cannot assess the impact of their own prior paper.

      The authors write "the proportion of studies including animals of both biological sexes generally increased between 2011 and 2021, though not significantly (R<sup>2</sup>= 0.0762, F(1,9)= 0.742, p= 0.411 (corrected p=8.2". This statement is not rigorous because the regression result is not statistically significant. Their data supports neither a claim of an increase nor a decrease over time. A similar problem repeats several times in the remainder of their results presentation.
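      As a consistency check on the quoted statistics, the F statistic of a regression can be recovered from R<sup>2</sup> and the degrees of freedom. The sketch below assumes the quoted values describe a one-predictor linear regression with 1 and 9 degrees of freedom; the function name is illustrative and does not come from the manuscript.

```python
def f_from_r_squared(r_squared, df_model, df_resid):
    """F = (R^2 / df_model) / ((1 - R^2) / df_resid)."""
    return (r_squared / df_model) / ((1.0 - r_squared) / df_resid)


# Values quoted in the review: R^2 = 0.0762 with F(1, 9).
f_stat = f_from_r_squared(0.0762, df_model=1, df_resid=9)
print(round(f_stat, 3))  # 0.742, in agreement with the quoted F(1,9)
```

      An F value this far below 1 for a single predictor is consistent with the reviewer's point that the data support no claim of change in either direction.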

      I think the Introduction and the Discussion are somewhat repetitive and the wording could be reduced.

      Impact and Context

      Lack of reproducibility remains an enormous problem in science, plaguing both basic and translational investigations. With the increased scrutiny on rigor, and requirements at NIH and other funding agencies for more rigor and transparency, one would expect to find increasing rigor, as evidenced by authors including more study design elements (SDEs) that are recommended. This review found no such change, and this is quite disheartening. The data imply that journals (editors and reviewers) will have to increase their scrutiny and the standards applied to preclinical and basic studies. This work could also serve as a call to action to investigators outside of cardiovascular science to reflect on their own experiences when planning future projects.

      We sincerely appreciate the reviewer’s insights and comments and recognition of our work contributing to the growing body of evidence on the lack of rigor in preclinical cardiovascular research study design. Regarding the weaknesses the reviewer noted: the referenced 2017 publication describes a study by Ramirez et al. that was not conducted by our group. Our study aimed to expand upon their findings by using a more recent timeframe and an alternative list of highly respected cardiovascular research journals. We have now better clarified this distinction in the manuscript. We have also addressed our phrasing regarding the lack of statistical significance in the increase of the proportion of studies including animals of both sexes from 2011-2021.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Many of the methods in this study were strong or adequate. Although the descriptive statistics appear solid, there are significant problems that need to be addressed in the selection and use of inferential statistics.

      (1) One of the design elements that was studied was sample size estimation. This is usually done by a power analysis. The authors should consider what group size for the examined journals is adequate for their statistics to be valid. Or they could report the power of their studies to achieve a given meaningful difference.

      We thank the reviewer for this excellent observation. We unfortunately failed to conduct an a priori power analysis. Previous research (Gupta, et al. 2016) suggests that power calculations should not be carried out post hoc, after the study has been conducted. We acknowledge the importance of establishing a sufficient sample size to draw sound conclusions based on an adequate effect size, and we regret that we did not carry out the appropriate estimations. We are very appreciative of the reviewer’s suggestions and aim to implement this study design element in future studies.

      Gupta KK, Attri JP, Singh A, Kaur H, Kaur G. Basic concepts for sample size calculation: Critical step for any clinical trials!. Saudi J Anaesth. 2016;10(3):328-331. doi:10.4103/1658-354X.174918

      (2) A Bonferroni correction was used extensively. Because of its use, the corrected p values often appear much too high. The Bonferroni test becomes much too conservative for more than 3 or 4 tests. I suggest using a different test for multiple comparisons.

      We thank the reviewer for their insightful suggestion. We have updated all p-values to reflect a Holm-Bonferroni correction instead. All p-values have been corrected and updated.
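      The difference between the two corrections can be illustrated with a short sketch. The p-values below are hypothetical and are not taken from the manuscript, and the function names are illustrative. Both functions cap adjusted values at 1, since an adjusted p-value above 1 has no interpretation as a probability.

```python
def bonferroni(p_values):
    """Multiply every p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def holm_bonferroni(p_values):
    """Holm step-down adjustment.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone in the sorted order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from four tests.
raw = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(raw))
print(holm_bonferroni(raw))
```

      Holm-adjusted values are never larger than the corresponding Bonferroni-adjusted values, which is why the procedure is less conservative while still controlling the family-wise error rate.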

      (3) The use of the chi-square test for categorical data is appropriate. However, the t-test and multiple regression tests are designed for continuous variables. Here, it appears that they were used for the nominal variables (Table 1). For these nominal data, other nonparametric tests should be used.

      We thank the reviewer for this valuable insight. We have updated our statistical analysis methods and now use nonparametric Kruskal-Wallis tests to analyze differences in SDE reporting across journals, instead of the chi-square test. Our reported p-values have been adjusted accordingly.

      (4) It is not clear exactly when each test is used. The stats section in Methods should better delineate when each test is used. In addition, it would be helpful to include the test used in the figure legends.

      We thank the reviewer for bringing up this important point. We have now updated the methods section to better delineate which tests were used, and also included the specific tests in the figure legends.

      (5) You will need to rewrite some sections of the text to reflect the changes due to changing your use of statistics.

      We have rewritten the sections of the text to reflect the changes in our use of statistics.

      Here are a few comments on the presentation.

      (1) Some of the figure legends are almost impossible to read. They are too congested.

      We thank the reviewer for pointing this out. We have edited the figure legends to make them more readable. We will also attach a pdf with the graphs to allow for easier formatting.

      (2) Also, is it possible to drop some of the panels in Figure 1?

      The panels in Figure 1 have been rearranged to make them more readable. We believe that each panel provides a valuable visual summary of our data that will aid readers in understanding our results.

      (3) It is not mandatory that the values on the y-axis of the graphs go up to 100% (Figs 2 and 3). Using a maximum value of 100% clumps the lines visually. I suggest a max value on the y-axis of the graph of 50% or 60%. That will spread the lines better visually so differences can better be seen.

      We thank the reviewer for considering the experience of our paper’s readers. The y-axes of Figures 2 and 3 have been truncated to 50%. The trend lines in each Figure now appear more separated and differences can better be seen.

      Reviewer #2 (Recommendations For The Authors):

      The authors did not use the exact same journals as they did in the 2017 study. This makes comparing the results complicated. Also, they pulled papers from 2011 to 2021, and hence cannot assess the impact of their own prior paper.

      We appreciate the reviewer’s concern in maintaining consistency with the paper published by Ramirez, et al. in 2017. To clarify, our efforts focused on providing a replication study that expanded upon the original Ramirez publication, with which we have no affiliation. For our study, we used different academic journals than those used by Ramirez, et al., and also a different time frame. We have updated the language in the manuscript to better clarify the purpose and parameters of our study relative to the previous, unaffiliated study.

      The authors write "the proportion of studies including animals of both biological sexes generally increased between 2011 and 2021, though not significantly (R<sup>2</sup>= 0.0762, F(1,9)= 0.742, p= 0.411 (corrected p=8.2". This statement is not rigorous because the regression result is not statistically significant. Their data supports neither a claim of an increase nor a decrease over time. A similar problem repeats several times in the remainder of their results presentation.

      Thank you for bringing this information to our attention. We agree with the concern regarding the statement, “the proportion of studies including animals of both biological sexes generally increased between 2011 and 2021, though not significantly (R<sup>2</sup>= 0.0762, F(1,9)= 0.742, p= 0.411 (corrected p=8.2.” We have rephrased the statement. Our updated Holm-Bonferroni corrected p-value is now noted in this more appropriately worded description of our results. Lastly, we have addressed the wording and redundancy seen in both the introduction and discussion and have made both more concise.

      I think the Introduction and the Discussion are somewhat repetitive and the wording could be reduced.

      We thank the reviewer for bringing this to our attention. We have addressed the redundancy across the Introduction and the Discussion. We have also altered the wording to reflect a more concise explanation of our study.

      The 'trends' are not statistically significant. A non-significant trend does not exist and no claim of a 'trend' is justified by the data.

      We thank the reviewer for this observation. We have updated the phrasing of ‘trends’ in all areas of the manuscript.

    1. eLife Assessment

      This is a valuable study that generates an inventory of accessible genomic regions bound by a transcription factor ZFHX3 within the suprachiasmatic nucleus in the hypothalamus and details the impact of its depletion on daily rhythms in behaviour and gene expression patterns. Analysis using circadian phase-estimation algorithms makes the argument that gene regulatory networks are at play and changes in gene expression of a few clock genes cannot account for the observed animal behaviour. While the transcriptome analysis is solid, the data do not currently support the mechanism of activity of the TF in rhythmic gene expression.

    2. Reviewer #1 (Public review):

      Summary:

      The authors of this article have previously shown the involvement of the transcription factor Zinc finger homeobox-3 (ZFHX3) in the function of the circadian clock and the development/differentiation of the central circadian clock in the suprachiasmatic nucleus (SCN) of the hypothalamus. Here, they show that ZFHX3 plays a critical role in the transcriptional regulation of numerous genes in the SCN. Using inducible knockout mice, they further demonstrate that the deletion of Zfhx3 induces a phase advance of the circadian clock, both at the molecular and behavioral levels.

      Strengths:

      - Inducible deletion of Zfhx3 in adults
      - Behavioral analysis
      - Properly designed and analyzed ChIP-Seq and RNA-Seq supporting the conclusion of the behavioral analysis

      Weaknesses:

      - Further characterization of the disruption of the activity of the SCN is required.
      - The description of the controls needs some clarification.

    3. Reviewer #2 (Public review):

      Summary:

      ZFHX3 is a transcription factor expressed in discrete populations of adult SCN and was shown by the authors previously to control circadian behavioral rhythms using either a dominant missense mutation in Zfhx3 or conditional null Zfhx3 mutation using the Ubc-Cre line (Wilcox et al., 2017). In the current manuscript, the authors assess the function of ZFHX3 by using a multi-omics approach including ChIPSeq in wildtype SCNs and RNAseq of SCN tissues from both wildtype and conditional null mice. RNAseq analysis showed a loss of oscillation in Bmal1 and changes in expression levels of other clock output genes. Moreover, a phase advance gene transcriptional profile using the TimeTeller algorithm suggests the presence of a regulatory network that could underlie the observed pattern of advanced activity onset in locomotor behavior in knockout mice.

      In Figure 1, the authors identified the ZFHX3-bound sites using ChIPseq and compared the loci with other histone marks that occur at promoters, TSS, enhancers and intergenic regions. The analysis broadly points to a role for ZFHX3 in transcriptional regulation. The vast majority of nearly 40000 peaks overlapped H3K4me3 and K27ac marks, active promoters which also included genes falling under the GO category circadian rhythms. However, no significant differential ZFHX3 bound peaks were detected between ZT3 and ZT15. In these experiments, it is not clear if and how the different ChIP samples (ZFHX3 and histone PTM ChIPs) were normalized/downsampled for analysis. Moreover, it seems that ZFHX3 binding or recruitment has little to do with whether the promoters are active.

      Based on an enrichment of ARNT domains next to K4Me3 and K27ac PTMs, the authors propose a model where the core-clock TFs and ZFHX3 interact. If the authors develop other assays beyond just predictions to test their hypothesis, it would strengthen the argument for a role in circadian transcription in the SCN. It would be important in this context to perform a ChIP-seq experiment for ZFHX3 in the knockout animal (described from Figure 2 onwards) to eliminate the possibility of non-specific enrichment of signal from "open chromatin". Alternatively, a ChIPseq analysis for BMAL1 or CLOCK could also strengthen this argument to identify the sites co-occupied by ZFHX3 and core-clock TFs.

      Next, they compared locomotor activity rhythms in floxed mice with or without tamoxifen treatment. As reported before in Wilcox et al 2017, the loss of ZFHX3 led to a shorter free running period and reduced amplitude and earlier onset of activity. Overall, the behavioral data in Figure 2 and supplementary figure 2 have been reported before and are not novel.

      Next, the authors performed RNAseq at 4hr intervals on wildtype and knockout animals maintained in light/dark cycles to determine the impact of loss of ZFHX3. Overall transcriptomic analysis indicated changes in gene expression in nearly 36% of expressed genes, with nearly half being upregulated while an equal fraction was downregulated. Pathways affected included mostly neuropeptide neurotransmitter pathways. Surprisingly, there was no correlation between the direction of change in expression and TF binding since nearly all the sites were bound by ZFHX3 and the active histone PTMs. The ChIP-seq experiment for ZFHX3 in the UBC-Cre+Tam mice again could help resolve the real targets of ZFHX3 and the transcriptional state in knockout animals.

      To determine the fraction of rhythmic transcripts, the authors used dryR to categorise the rhythmic transcriptome into modules that include genes that lose rhythmicity in the KO, gain rhythmicity in the KO or remain unaffected or partially affected. The analysis indicates that a large fraction of the rhythmic transcriptome is affected in the KO model. However, among core-clock genes only Bmal1 expression is affected, showing a complete loss of rhythm. The authors state a decrease in Clock mRNA expression (line 294) but the panel figure 4A does not show this data. Instead it depicts the loss in Avp expression - {{ misstated in line 321 (we noted severe loss in 24-h rhythm for crucial SCN neuropeptides such as Avp (Fig. 3a)).}}

      However, core-clock genes such as Pers and Crys show minor or no change in expression patterns while Per2 and Per3 show a ~2hr phase advance. While these could only weakly account for the behavioral phase advance, the authors used TimeTeller to assess circadian phase in wildtype and ZFHX3 deficient mice. This approach clearly indicated that while the clock is not disrupted in the knockout animals, the phase advance can be correctly predicted from a network of gene expression patterns.

      Strengths:

      The authors use a multiomic strategy in order to reveal the role of the ZFHX3 transcription factor with a combination of TF and histone PTM ChIPseq, time-resolved RNAseq from wildtype and knockout mice and modeling the transcriptomic data using TimeTeller. The RNAseq experiments are nicely controlled and the analysis of the data indicates a clear impact on gene-expression levels in the knockout mice and the presence of a regulatory network that could underlie the advanced activity onset behavior.

      Weaknesses:

      It is not clear whether ZFHX3 has a direct role in any of the processes and seems to be a general factor that marks H3K4me3 and K27ac marked chromatin. Why it would specifically impact the core-clock TTFL clock gene expression or indeed daily gene expression rhythms is not clear either. Details for treatment of different ChIP samples (ZFHX3 and histone PTM ChIPs) on data normalization for analysis are needed. The loss of complete rhythmicity of Avp and other neuropeptides or indeed other TFs could instead account for the transcriptional deregulation noted in the knockout mice.

    4. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Authors of this article have previously shown the involvement of the transcription factor Zinc finger homeobox-3 (ZFHX3) in the function of the circadian clock and the development/differentiation of the central circadian clock in the suprachiasmatic nucleus (SCN) of the hypothalamus. Here, they show that ZFHX3 plays a critical role in the transcriptional regulation of numerous genes in the SCN. Using inducible knockout mice, they further demonstrate that the deletion of Zfhx3 induces a phase advance of the circadian clock, both at the molecular and behavioral levels. 

      Strengths: 

      - Inducible deletion of Zfhx3 in adults 

      - Behavioral analysis 

      - Properly designed and analyzed ChIP-Seq and RNA-Seq supporting the conclusion of the behavioral analysis 

      Weaknesses: 

      - Further characterization of the disruption of the activity of the SCN is required. 

      (1) We thank the reviewer for their valuable input. Indeed, a comprehensive behavioral assessment of mice of this genotype was carried out in the Wilcox et al., 2017 study. In Wilcox et al., 2017, Figure 4, a 6-h phase advance (jetlag) clearly showed faster re-entrainment in ZFHX3-KO mice compared with the controls.

      - The description of the controls needs some clarification. 

      (2) We agree with the reviewer and will modify the text to clearly describe the controls wherever mentioned.

      Reviewer #2 (Public review): 

      Summary: 

      ZFHX3 is a transcription factor expressed in discrete populations of adult SCN and was shown by the authors previously to control circadian behavioral rhythms using either a dominant missense mutation in Zfhx3 or conditional null Zfhx3 mutation using the Ubc-Cre line (Wilcox et al., 2017). In the current manuscript, the authors assess the function of ZFHX3 by using a multi-omics approach including ChIPSeq in wildtype SCNs and RNAseq of SCN tissues from both wildtype and conditional null mice. RNAseq analysis showed a loss of oscillation in Bmal1 and changes in expression levels of other clock output genes. Moreover, a phase advance gene transcriptional profile using the TimeTeller algorithm suggests the presence of a regulatory network that could underlie the observed pattern of advanced activity onset in locomotor behavior in knockout mice. 

      In Figure 1, the authors identified the ZFHX3-bound sites using ChIPseq and compared the loci with other histone marks that occur at promoters, TSS, enhancers and intergenic regions. The analysis broadly points to a role for ZFHX3 in transcriptional regulation. The vast majority of nearly 40000 peaks overlapped H3K4me3 and K27ac marks, active promoters which also included genes falling under the GO category circadian rhythms. However, no significant differential ZFHX3 bound peaks were detected between ZT3 and ZT15. In these experiments, it is not clear if and how the different ChIP samples (ZFHX3 and histone PTM ChIPs) were normalized/downsampled for analysis. Moreover, it seems that ZFHX3 binding or recruitment has little to do with whether the promoters are active.

      (3) We thank the reviewer for their valuable comment. The different ChIP samples (ZFHX3 and histone PTM ChIPs) were treated in the same manner, from preprocessing (quality control by FastQC, trimming, and alignment to the mm10 genome) through peak calling using MACS2, as mentioned in Methods. The data were normalized using the bamCoverage tool, and bigwig files were generated for visual inspection in the UCSC Genome Browser. These additional details will be added to Methods. Finally, BEDTools was employed to study overlapping peaks between ZFHX3 and histone PTMs.

      We agree that, alone, the current data do not support any claim that ZFHX3 is crucial for a promoter to be active. Our data clearly suggest that the vast majority of ZFHX3 genomic binding in the SCN occurs at active promoters marked by H3K4me3 and H3K27ac, where it potentially regulates gene transcription. 
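
      As an illustration of the peak-overlap step described in response (3), the following is a minimal sketch of an interval-intersection check in plain Python. It is not the authors' pipeline (they report using BEDTools); the function name `overlapping_peaks` and all coordinates are hypothetical and chosen only to show the logic of deciding whether a ZFHX3 peak overlaps a histone-mark peak.

```python
# Illustrative sketch only: toy coordinates, not peaks from the study.
# Peaks are (chrom, start, end) tuples with half-open [start, end)
# coordinates, following the BED convention.

def overlapping_peaks(tf_peaks, mark_peaks):
    """Return the TF peaks that overlap at least one histone-mark peak."""
    # Group mark peaks by chromosome so each TF peak is only compared
    # against marks on the same chromosome.
    by_chrom = {}
    for chrom, start, end in mark_peaks:
        by_chrom.setdefault(chrom, []).append((start, end))

    hits = []
    for chrom, start, end in tf_peaks:
        for m_start, m_end in by_chrom.get(chrom, []):
            # Two half-open intervals overlap when each starts before
            # the other ends.
            if start < m_end and m_start < end:
                hits.append((chrom, start, end))
                break
    return hits


# Hypothetical example data.
zfhx3 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
h3k4me3 = [("chr1", 150, 400), ("chr2", 80, 120)]

shared = overlapping_peaks(zfhx3, h3k4me3)
fraction = len(shared) / len(zfhx3)
print(shared, round(fraction, 3))
```

      Note that under the half-open convention the chr2 peaks, which merely touch at position 80, are not counted as overlapping; whether adjacent peaks should count is a choice that would need to be stated alongside any reported overlap fraction.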

      Based on an enrichment of ARNT domains next to K4Me3 and K27ac PTMs, the authors propose a model where the core-clock TFs and ZFHX3 interact. If the authors develop other assays beyond just predictions to test their hypothesis, it would strengthen the argument for a role in circadian transcription in the SCN. It would be important in this context to perform a ChIP-seq experiment for ZFHX3 in the knockout animal (described from Figure 2 onwards) to eliminate the possibility of non-specific enrichment of signal from "open chromatin". Alternatively, a ChIPseq analysis for BMAL1 or CLOCK could also strengthen this argument to identify the sites co-occupied by ZFHX3 and core-clock TFs. 

      (4a) We agree that follow-up experiments such as BMAL1/CLOCK ChIPseq suggested by the reviewer will further confirm the proposed interaction of ZFHX3 with core-clock TFs. However, this is beyond the scope of the current study. 

      (4b) Again, conducting complementary ChIPseq in ZFHX3 knockout mice would strengthen the findings, but conducting TF-ChIPseq in a specific brain tissue such as the SCN (unlike peripheral tissues such as liver) not only requires multiple animals per sample but is also technically challenging and time-consuming to ensure specificity of the sample. For these reasons, datasets such as ours on the SCN are uncommon. Furthermore, in this particular context, we are certain that, based on the current dataset, the (narrow) ZFHX3 peaks we observed were well-defined and met the specified statistical criteria, mitigating any risk of signal arising from non-specific enrichment from open-chromatin regions. 

      Next, they compared locomotor activity rhythms in floxed mice with or without tamoxifen treatment. As reported before in Wilcox et al 2017, the loss of ZFHX3 led to a shorter free running period and reduced amplitude and earlier onset of activity. Overall, the behavioral data in Figure 2 and supplementary figure 2 have been reported before and are not novel.

      (5) We recognise that a detailed circadian behavior assessment of adult mice lacking ZFHX3 has been conducted previously by the Nolan lab (Wilcox et al., 2017). In the current study, however, we used a separate cohort of mice to focus on the behavioral advance noted in the 24-h LD cycle and generate a more refined assessment. Importantly, these mice were also used for transcriptomic studies as detailed in Figure 3, which we consider to be a positive feature of our experimental design: behavior and molecular analyses were performed on the same animals. 

      Next, the authors performed RNAseq at 4hr intervals on wildtype and knockout animals maintained in light/dark cycles to determine the impact of loss of ZFHX3. Overall transcriptomic analysis indicated changes in gene expression in nearly 36% of expressed genes, with nearly half being upregulated while an equal fraction was downregulated. Pathways affected included mostly neuropeptide neurotransmitter pathways. Surprisingly, there was no correlation between the direction of change in expression and TF binding since nearly all the sites were bound by ZFHX3 and the active histone PTMs. The ChIP-seq experiment for ZFHX3 in the UBC-Cre+Tam mice again could help resolve the real targets of ZFHX3 and the transcriptional state in knockout animals. 

      (6) We agree with the reviewer that most of the differentially expressed genes showed ZFHX3 binding at active promoter sites. That said, the current dataset is in line with recently published ZFHX3-ChIPseq data by Baca et al., 2024 [PMID: 38412861] in human neural stem cells and Hu et al., 2024 [PMID: 38871709] in human prostate cancer cells, which clearly suggest that ZFHX3 binds at active promoters and acts as a chromatin remodeller/mediator that modulates gene transcription depending on the accessory TFs assembled at target genes. Therefore, finding no correlation in the direction of change in expression is not surprising.  

      To determine the fraction of rhythmic transcripts, the authors used dryR to categorise the rhythmic transcriptome into modules that include genes that lose rhythmicity in the KO, gain rhythmicity in the KO or remain unaffected or partially affected. The analysis indicates that a large fraction of the rhythmic transcriptome is affected in the KO model. However, among core-clock genes only Bmal1 expression is affected, showing a complete loss of rhythm. The authors state a decrease in Clock mRNA expression (line 294) but the panel figure 4A does not show this data. Instead it depicts the loss in Avp expression - {{ misstated in line 321 (we noted severe loss in 24-h rhythm for crucial SCN neuropeptides such as Avp (Fig. 3a)).}} 

      (7a) Indeed, among the core-clock genes, rhythmic expression is lost after ZFHX3 knockout only for Bmal1. However, given that the mice were rhythmic (as assessed by wheel-running activity) in LD conditions, the observed 24-h gene expression rhythm in the majority of core-clock genes (Pers and Crys) is consistent with the behavioral data and points towards a molecular clock, with plausible scenarios as explained at line 439. That said, the unique and well-defined changes (amplitude and phase) demonstrated in Figure 5 highlight a model in which ZFHX3 exerts differential control: for example, Per2 shows an advance in its molecular rhythm (~2-h), whereas Cry shows no such change. This presents an opportunity to delineate further the regulation of TTFL genes. 

      (7b) Line 294 states both a loss of Bmal1 rhythm and a reduction in Clock mRNA. Figure 4a supports the former. We shall revise the text for clarity. 

      (7c) As rightly pointed out by the reviewer, line 321 refers to the loss of Avp expression, and we shall correct the typo by replacing “Figure 3a” with “Figure 4a”. Thank you.  
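
      To make concrete what a ~2-h phase advance in a 24-h expression rhythm means (point 7a), here is a minimal cosinor sketch using NumPy. It is not the dryR or TimeTeller analysis used in the manuscript; the function name `acrophase`, the sampling times and the expression values are all hypothetical, with synthetic noise-free curves generated only to show how a peak time can be estimated from samples taken at 4-h intervals.

```python
import numpy as np

def acrophase(times_h, values, period_h=24.0):
    """Estimate peak time (in hours) by least-squares cosinor regression.

    Fits y = M + a*cos(w*t) + b*sin(w*t); the peak of the fitted curve
    is at t = atan2(b, a) / w, wrapped into [0, period_h).
    """
    omega = 2 * np.pi / period_h
    t = np.asarray(times_h, dtype=float)
    design = np.column_stack(
        [np.ones_like(t), np.cos(omega * t), np.sin(omega * t)]
    )
    coeffs = np.linalg.lstsq(design, np.asarray(values, dtype=float), rcond=None)[0]
    _, a, b = coeffs
    return (np.arctan2(b, a) / omega) % period_h


# Hypothetical sampling every 4 h (assumed time points, not the study's).
zt = np.arange(2, 24, 4)

def synthetic_rhythm(peak_h):
    # Noise-free toy rhythm: mesor 10, amplitude 3, peak at peak_h.
    return 10 + 3 * np.cos(2 * np.pi * (zt - peak_h) / 24)

control_phase = acrophase(zt, synthetic_rhythm(12.0))
knockout_phase = acrophase(zt, synthetic_rhythm(10.0))

# Positive values mean the knockout rhythm peaks earlier (an advance);
# the result is wrapped into the interval [-12, 12).
advance_h = (control_phase - knockout_phase + 12) % 24 - 12
print(round(advance_h, 2))
```

      With real data, noise and the coarse 4-h sampling would put uncertainty on each estimated peak time, so a confidence interval on the phase difference would be needed before interpreting a shift of this size.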

      However, core-clock genes such as Pers and Crys show minor or no change in expression patterns while Per2 and Per3 show a ~2hr phase advance. While these could only weakly account for the behavioral phase advance, the authors used TimeTeller to assess circadian phase in wildtype and ZFHX3 deficient mice. This approach clearly indicated that while the clock is not disrupted in the knockout animals, the phase advance can be correctly predicted from a network of gene expression patterns. 

      Strengths: 

      The authors use a multiomic strategy in order to reveal the role of the ZFHX3 transcription factor with a combination of TF and histone PTM ChIPseq, time-resolved RNAseq from wildtype and knockout mice and modeling the transcriptomic data using TimeTeller. The RNAseq experiments are nicely controlled and the analysis of the data indicates a clear impact on gene-expression levels in the knockout mice and the presence of a regulatory network that could underlie the advanced activity onset behavior. 

      Weaknesses: 

      It is not clear whether ZFHX3 has a direct role in any of the processes and seems to be a general factor that marks H3K4me3 and K27ac marked chromatin. Why it would specifically impact the core-clock TTFL clock gene expression or indeed daily gene expression rhythms is not clear either. Details for treatment of different ChIP samples (ZFHX3 and histone PTM ChIPs) on data normalization for analysis are needed. The loss of complete rhythmicity of Avp and other neuropeptides or indeed other TFs could instead account for the transcriptional deregulation noted in the knockout mice.

      (8) We thank the reviewer for the constructive feedback. The current data suggest that ZFHX3 acts as a mediating factor, occupying targeted active promoter sites and regulating gene expression by partnering with other key TFs in the SCN. Please see point 7 for clarification. The binding sites of ZFHX3 clearly showed enrichment for the E-box (CACGTG) motif bound by CLOCK/BMAL1, along with binding sites for key SCN-specific TFs such as RFX (please see Supplementary Fig. 1). Our data thereby show that ZFHX3 affects both core-clock and clock output genes (at varied levels), exercising a pervasive control over the SCN transcriptome. 

      For treatment of ChIP samples, please see point 3. We followed ENCODE guidelines strictly.

    1. eLife Assessment

      This study answers an essential question about how migratory primordial germ cells mobilize based on their anterior or posterior location. Convincing data support the model that interactions of migrating mouse PGCs with their niches influence their proliferation, cytoskeletal regulation, epigenetic state and pluripotent state. Rigorous methodology and analysis of single-cell RNA sequencing of migratory primordial germ cells and surrounding somatic cells result in datasets which will be valuable to developmental biologists.