  1. Oct 2024
    1. Reviewer #2 (Public Review):

      M. El Amri et al. investigated the functions of Marcks and Marcks-like 1 (Marcksl1) during spinal cord (SC) development and regeneration in Xenopus laevis. The authors rigorously performed loss-of-function experiments using morpholino knock-down and CRISPR knock-out, combined with rescue experiments, in the developing spinal cord of embryos and in the regenerating spinal cord of tadpoles.

      For the assays in the developing spinal cord, a unilateral approach (knock-down/knock-out on only one side of the embryo) allowed the authors to assess gene function by directly comparing one side (e.g., mutated SC) with the other (e.g., wild-type SC). For the assays in the regenerating SC, the authors microinjected CRISPR reagents into 1-cell stage embryos. When the embryos (F0 crispants) reached the tadpole stage (stage 50), the SC was transected. They then assessed neurite outgrowth and progenitor cell proliferation. The validation of the phenotypes was mostly based on the quantification of immunostaining images (neurite outgrowth: acetylated tubulin; neural progenitors: sox2, sox3; proliferation: EdU, PH3), which are simple but robust enough to support their conclusions. In both SC development and regeneration, the authors found that Marcks and Marcksl1 were necessary for neurite outgrowth and neural progenitor cell proliferation.

      The authors performed rescue experiments on the morpholino knock-down and CRISPR knock-out conditions by Marcks and Marcksl1 mRNA injection for SC development and by pharmacological treatments for SC development and regeneration. The unilateral mRNA injection rescued the loss-of-function phenotype in the developing SC. To explore the signalling role of these molecules, they treated the loss-of-function animals with pharmacological reagents: S1P (PLD activator), FIPI (PLD inhibitor), NMI (PIP2 synthesis activator) and ISA-2011B (PIP2 synthesis inhibitor). The authors found that the activator treatments rescued neurite outgrowth and progenitor cell proliferation in loss-of-function conditions. From these results, the authors proposed that PIP2 and PLD are the mediators of Marcks and Marcksl1 for neurite outgrowth and progenitor cell proliferation during SC development and regeneration. The results of the rescue experiments are particularly important for assessing gene functions in loss-of-function assays; therefore, the conclusions are solid.

      In addition, they performed gain-of-function assays by unilateral Marcks or Marcksl1 mRNA injection, showing that the injected side of the SC had more neurite outgrowth and more proliferative progenitors. The conclusions are consistent with the loss-of-function phenotypes and the rescue results. Importantly, the authors linked the phenotype to functional recovery by behavioral testing, which clearly showed that crispants with SC injury swam a shorter distance than wild types with SC injury at 10 days post surgery.

      Prior to the functional assays, the authors analyzed the expression pattern of the genes by in situ hybridization and immunostaining in the developing embryo and the regenerating SC. They confirmed that protein expression was significantly reduced in the loss-of-function samples by immunostaining with the specific antibodies that they raised against Marcks and Marcksl1. Although the expression patterns during embryogenesis are mostly known from previous work, the data provided readers with appropriate information about the expression and also demonstrated the efficiency of the knock-out.

      MARCKS family genes are known to be expressed in the nervous system. However, few studies have focused on their function in nerves. This research introduces these genes as new players in SC development and regeneration. These findings could attract broader interest from researchers working on nervous system disease models and in the medical field. Although it is a typical requirement for loss-of-function assays in Xenopus laevis, I believe that the efficient knock-out of four genes by CRISPR/Cas9 resulted from the authors' dedication to designing, testing and validating the gRNAs, and is exemplary.

      Weaknesses:

      1) Why did the authors choose Marcks and Marcksl1?

      The authors mentioned that these genes were identified in a recent proteomic analysis comparing the SC of the regenerative tadpole and the non-regenerative froglet (Line (L) 54-57). However, although the proteomic analysis appears to be their own dataset, the authors did not give any details on how promising genes were selected for the functional assays in this article. The proteomic analysis must have contained other candidate genes that, based on previous studies, might be more likely to be related to SC development and regeneration, but the criteria for selecting Marcks and Marcksl1 were unclear.

      2) Gene knock-out experiments with F0 crispants

      The authors described that they designed and tested 18 sgRNAs to find the most efficient and consistent gRNA (L191-195). However, this cannot guarantee the same phenotypes in practice, due to, for example, different injection timing or different strains of Xenopus laevis. Although the authors themselves raised the concern of mosaicism (L180-181, L289-292) and the immunostaining results nicely showed uniformly reduced Marcks and Marcksl1 expression in the crispants, they did not refer to this issue explicitly.

      3) Limitations of pharmacological compound rescue

      In the methods section, the authors describe that they performed titration experiments for the drugs (L702-704), which is a minimal requirement for this type of assay. However, it is known that even a well-characterized drug can target different molecules when it is used at different concentrations (Gujral TS et al., 2014 PNAS). Therefore, it is difficult to rule out side effects and off-target effects by testing only a few compounds.

    2. Reviewer #3 (Public Review):

      El Amri et al. analyzed the function of marcks and marcksl1 in Xenopus spinal cord development and regeneration. Their study revealed that these proteins are crucial for neurite outgrowth and cell proliferation, including that of Sox2+ progenitors. Furthermore, they suggested that these genes may act through the PLD pathway. The study is well-executed with appropriate controls and validation experiments, and it is distinguished from typical regeneration research by the inclusion of behavioral assays. The manuscript is commendable for its quantifications, literature referencing, careful conclusions, and detailed methods. Conclusions are well-supported by the experiments performed in this study. Overall, this manuscript contributes to the field of spinal cord regeneration and sets a good example for future research in this area.

    1. eLife Assessment

      This important paper demonstrates that different PKA subtypes exhibit distinct subcellular localization at rest in CA1 neurons. The authors provide compelling evidence that when all tested PKA subtypes are activated by norepinephrine, catalytic subunits translocate to dendritic spines but regulatory subunits remain unmoved. Furthermore, PKA-dependent regulation of synaptic plasticity and transmission can be supported only by wildtype, dissociable PKA, but not by inseparable PKA.

    2. Reviewer #1 (Public review):

      Summary:

      This is a short self-contained study with a straightforward and interesting message. The paper focuses on settling whether PKA activation requires dissociation of the catalytic and regulatory subunits. This debate has been ongoing for ~ 30 years, with renewed interest in the question following a publication in Science, 2017 (Smith et al.). Here, Xiong et al demonstrate that fusing the R and C subunits together (in the same way as Smith et al) prevents the proper function of PKA in neurons. This provides further support for the dissociative activation model - it is imperative that researchers have clarity on this topic since it is so fundamental to building accurate models of localised cAMP signalling in all cell types. Furthermore, their experiments highlight that C subunit dissociation into spines is essential for structural LTP, which is an interesting finding in itself. They also show that preventing C subunit dissociation reduces basal AMPA receptor currents to the same extent as knocking down the C subunit. Overall, the paper will interest both cAMP researchers and scientists interested in fundamental mechanisms of synaptic regulation.

      Strengths:

      The experiments are technically challenging and well executed. Good use of control conditions, e.g., untransfected controls in Figure 4.

      Weaknesses:

      The novelty is lessened given the same team has shown dissociation of the C subunit into dendritic spines from RIIbeta subunits localised to dendritic shafts before (Tillo et al., 2017). Nevertheless, the experiments with RII-C fusion proteins are novel and an important addition.

    3. Reviewer #2 (Public review):

      Summary:

      PKA is a major signaling protein that has been long studied and is vital for synaptic plasticity. Here, the authors examine the mechanism of PKA activity and specifically focus on addressing the question of PKA dissociation as a major mode of its activation in dendritic spines. This would potentially allow us to determine the precise mechanisms of PKA activation and address how it maintains spatial and temporal signaling specificity.

      Strengths:

      The results convincingly show that PKA activity is governed by the subcellular localization in dendrites and spines and is mediated via subunit dissociation. The authors make use of organotypic hippocampal slice cultures, where they use pharmacology, glutamate uncaging, and electrophysiological recordings.

      Overall, the experiments and data presented are well executed. The experiments all show that at least in the case of synaptic activity, distribution of PKA-C to dendritic spines is necessary and sufficient for PKA-mediated functional and structural plasticity.

      The authors were able to persuasively support their claim that PKA subunit dissociation is necessary for its function and localization in dendritic spines. This conclusion is important to better understand the mechanisms of PKA activity and its role in synaptic plasticity.

      Weaknesses:

      While the experiments are indeed convincing and well executed, the data presented is similar to previously published work from the Zhong lab (Tillo et al., 2017, Zhong et al 2009). This reduces the novelty of the findings in terms of re-distribution of PKA subunits, which was already established, at least to some degree.

    4. Reviewer #3 (Public review):

      Summary:

      Xiong et al. investigated the debated mechanism of PKA activation using hippocampal CA1 neurons under pharmacological and synaptic stimulations. Examining all major PKA-R isoforms in these neurons, they found that a portion of PKA-C dissociates from PKA-R and translocates into dendritic spines following norepinephrine bath application. Additionally, their use of a non-dissociable form of PKA demonstrates its essential role in structural long-term potentiation (LTP) induced by two-photon glutamate uncaging, as well as in maintaining normal synaptic transmission, as verified by electrophysiology. This study presents a valuable finding on the activation-dependent re-distribution of PKA catalytic subunits in CA1 neurons, a process vital for synaptic functionality. The robust evidence provided by the authors makes this work particularly relevant for biologists seeking to understand PKA activation mechanisms, its downstream effects, and synaptic plasticity.

      Strengths:

      The study is methodologically robust, particularly in the application of two-photon imaging and electrophysiology. The experiments are well-designed with effective controls and a comprehensive analysis. The credibility of the data is further enhanced by the research team's previous works in related experiments. The study provides sufficient evidence to support the classical model of PKA activation via dissociation in neurons.

      Weaknesses:

      No specific weaknesses are noted in the current study; future research could provide additional insights by exploring PKA dissociation under varied physiological conditions, particularly in vivo, to further validate and expand upon these findings.

    5. Author response:

      The following is the authors’ response to the original reviews.

      New Experiments

      (1) Activation-dependent dynamics of PKA with the RIα regulatory subunit, adding to the answer to Reviewers 1 and 2. To determine the dynamics of all PKA isoforms, we have added experiments that used PKA-RIα as the regulatory subunit. We found differential translocation between PKA-C (co-expressed with PKA-RIα) and PKA-RIα (Figure 1–figure supplement 3), similar to the results when PKA-RIIα or PKA-RIβ was used.

      (2) PKA-C dynamics elicited by a low concentration of norepinephrine, addressing Reviewer 3’s comment. We have found that PKA-C (co-expressed with RIIα) exhibited similar translocation into dendritic spines in the presence of a 5x lowered concentration (2 μM) of norepinephrine, suggesting that the translocation occurs over a wide range of stimulus strengths (Figure 1-figure supplement 2).

      Reviewer #1 (Public Review):

      Summary:

      This is a short self-contained study with a straightforward and interesting message. The paper focuses on settling whether PKA activation requires dissociation of the catalytic and regulatory subunits. This debate has been ongoing for ~ 30 years, with renewed interest in the question following a publication in Science, 2017 (Smith et al.). Here, Xiong et al demonstrate that fusing the R and C subunits together (in the same way as Smith et al) prevents the proper function of PKA in neurons. This provides further support for the dissociative activation model - it is imperative that researchers have clarity on this topic since it is so fundamental to building accurate models of localised cAMP signalling in all cell types. Furthermore, their experiments highlight that C subunit dissociation into spines is essential for structural LTP, which is an interesting finding in itself. They also show that preventing C subunit dissociation reduces basal AMPA receptor currents to the same extent as knocking down the C subunit. Overall, the paper will interest both cAMP researchers and scientists interested in fundamental mechanisms of synaptic regulation.

      Strengths:

      The experiments are technically challenging and well executed. Good use of control conditions, e.g., untransfected controls in Figure 4.

      We thank the reviewer for their accurate summarization of the position of the study in the field and for the positive evaluation of our study.

      Weaknesses:

      The novelty is lessened given the same team has shown dissociation of the C subunit into dendritic spines from RIIbeta subunits localised to dendritic shafts before (Tillo et al., 2017). Nevertheless, the experiments with RII-C fusion proteins are novel and an important addition.

      We thank the reviewer for noticing our earlier work. The first part of the current work is indeed an extension of previous work, as we have articulated in the manuscript. However, this extension is important because recent studies suggested that the majority of PKA-RIIβ are axonal localized. The primary PKA subtypes in the soma and dendrite are likely PKA-RIβ or PKA-RIIα. Although it is conceivable that the results from PKA-RIIβ can be extended to the other subunits, given the current debate in the field regarding PKA dissociation (or not), it remains important to conclusively demonstrate that these other regulatory subunit types also support PKA dissociation within intact cells in response to a physiological stimulant. To complete the survey for all PKA-R isoforms, we have now added data for PKA-RIα (New Experiment #1), as they are also expressed in the brain (e.g., https://www.ncbi.nlm.nih.gov/gene/5573). Additionally, as the reviewer points out, our second part is a novel addition to the literature.

      Reviewer #2 (Public Review):

      Summary:

      PKA is a major signaling protein that has been long studied and is vital for synaptic plasticity. Here, the authors examine the mechanism of PKA activity and specifically focus on addressing the question of PKA dissociation as a major mode of its activation in dendritic spines. This would potentially allow us to determine the precise mechanisms of PKA activation and address how it maintains spatial and temporal signaling specificity.

      Strengths:

      The results convincingly show that PKA activity is governed by the subcellular localization in dendrites and spines and is mediated via subunit dissociation. The authors make use of organotypic hippocampal slice cultures, where they use pharmacology, glutamate uncaging, and electrophysiological recordings.

      Overall, the experiments and data presented are well executed. The experiments all show that at least in the case of synaptic activity, the distribution of PKA-C to dendritic spines is necessary and sufficient for PKA-mediated functional and structural plasticity.

      The authors were able to persuasively support their claim that PKA subunit dissociation is necessary for its function and localization in dendritic spines. This conclusion is important to better understand the mechanisms of PKA activity and its role in synaptic plasticity.

      We thank the reviewer for their positive evaluation of our study.

      Weaknesses:

      While the experiments are indeed convincing and well executed, the data presented is similar to previously published work from the Zhong lab (Tillo et al., 2017, Zhong et al 2009). This reduces the novelty of the findings in terms of re-distribution of PKA subunits, which was already established. A few alternative approaches for addressing this question: targeting localization of endogenous PKA, addressing its synaptic distribution, or even impairing within intact neuronal circuits, would highly strengthen their findings. This would allow us to further substantiate the synaptic localization and re-distribution mechanism of PKA as a critical regulator of synaptic structure, function, and plasticity.

      We thank the reviewer for noticing our earlier work. The first part of the current work is indeed an extension of previous work, as we have articulated in the manuscript. However, this extension is important because recent studies suggested that the majority of PKA-RIIβ are axonal localized. The primary PKA subtypes in the soma and dendrite are likely PKA-RIβ or PKA-RIIα. Although it is conceivable that the results from PKA-RIIβ can be extended to the other subunits, given the current debate in the field regarding PKA dissociation (or not), it remains important to conclusively demonstrate that these other regulatory subunit types also support PKA dissociation within intact cells in response to a physiological stimulant. To complete the survey for all PKA-R isoforms, we have now added data for PKA-RIα (New Experiment #1), as they are also expressed in the brain (e.g., https://www.ncbi.nlm.nih.gov/gene/5573). Additionally, as Reviewer 1 points out, our second part is a novel addition to the literature.

      We also thank the reviewer for suggesting the experiments to examine PKA’s synaptic localization and dynamics as a key mechanism underlying synaptic structure and function. We agree that this is a very interesting topic. At the same time, we feel that this mechanistic direction is open-ended at this time and beyond what we try to conclude within this manuscript: prevention of PKA dissociation in neurons affects synaptic function. Therefore, we will save the suggested direction for future studies. We hope the reviewer understands.

      Reviewer #3 (Public Review):

      Summary:

      Xiong et al. investigated the debated mechanism of PKA activation using hippocampal CA1 neurons under pharmacological and synaptic stimulations. Examining the two PKA major isoforms in these neurons, they found that a portion of PKA-C dissociates from PKA-R and translocates into dendritic spines following norepinephrine bath application. Additionally, their use of a non-dissociable form of PKA demonstrates its essential role in structural long-term potentiation (LTP) induced by two-photon glutamate uncaging, as well as in maintaining normal synaptic transmission, as verified by electrophysiology. This study presents a valuable finding on the activation-dependent re-distribution of PKA catalytic subunits in CA1 neurons, a process vital for synaptic functionality. The robust evidence provided by the authors makes this work particularly relevant for biologists seeking to understand PKA activation and its downstream effects essential for synaptic plasticity.

      Strengths:

      The study is methodologically robust, particularly in the application of two-photon imaging and electrophysiology. The experiments are well-designed with effective controls and a comprehensive analysis. The credibility of the data is further enhanced by the research team's previous works in related experiments. The conclusions of this paper are mostly well supported by data. The research fills a significant gap in our understanding of PKA activation mechanisms in synaptic functioning, presenting valuable insights backed by empirical evidence.

      We thank the reviewer for their positive evaluation of our study.

      Weaknesses:

      The physiological relevance of the findings regarding PKA dissociation is somewhat weakened by the use of norepinephrine (10 µM) in bath applications, which might not accurately reflect physiological conditions. Furthermore, the study does not address the impact of glutamate uncaging, a well-characterized physiologically relevant stimulation, on the redistribution of PKA catalytic subunits, leaving some questions unanswered.

      We agree with the Reviewer that testing under physiological conditions is critical, especially given the current debate in the literature. That is why we tested PKA dynamics induced by the physiological stimulant, norepinephrine. It has been suggested that, near the release site, local norepinephrine concentrations can be as high as tens of micromolar (Courtney and Ford, 2014). Based on this study, we have chosen a mid-range concentration (10 μM). At the same time, in light of the Reviewer’s suggestion, we have now also tested PKA-RIIα dissociation at a 5x lower concentration of norepinephrine (2 μM; New Experiment #2). The activation and translocation of PKA-C is also readily detectable under this condition, to a degree comparable to when 10 μM norepinephrine was used.

      Regarding the suggested glutamate uncaging experiment, it is extremely challenging because of finite signal-to-noise ratios in our experiments. From our past studies, we know that activated PKA-C can diffuse three-dimensionally, with a fraction as membrane-associated proteins and the other as cytosolic proteins. Although we have evidence that its membrane affinity allows it to become enriched in dendritic spines, it is not known whether (and it is unlikely that) activated PKA-C is selectively targeted to a particular spine. Glutamate uncaging of a single spine presumably would locally activate a small number of PKA-C molecules. It will be very difficult to trace the 3D diffusion of this small number of molecules in the presence of surrounding resting-state PKA-C molecules. Finally, we hope the reviewer agrees that, regardless of the result of the glutamate uncaging experiment, the above new experiment (New Experiment #2) already indicates that certain physiologically relevant stimuli can drive PKA-C dissociation from PKA-R and translocation to spines, supporting our conclusion.

      Reviewer #2 (Recommendations For The Authors):

      It was a pleasure reading your paper, and the results are well-executed and well-presented.

      My main and only recommendations are two ways to further expand the scope of the findings.

      First, I believe addressing the endogenous localization of PKA-C subunit before and after PKA activation would be highly important to validate these claims. Overexpression of tagged proteins often shows vastly different subcellular distribution than their endogenous counterparts. Recent technological advances with CRISPR/Cas9 gene editing (Suzuki et al Nature 2016 and Gao et al Neuron 2019 for example) which the Zhong lab recently contributed to (Zhong et al 2021 eLife) allow us to tag endogenous proteins and image them in fixed or live neurons. Any experiments targeting endogenous PKA subunits that support dissociation and synaptic localization following activation would be very informative and greatly increase the novelty and impact of their findings.

      We agree that addressing the endogenous PKA dynamics is important. However, despite recent progress, endogenous labeling using CRISPR-based methods remains challenging and requires extensive optimization. This is especially true for signaling proteins whose endogenous abundance is often low. We have tried to label PKA catalytic subunits and regulatory subunits using both the homologous recombination-based method SLENDR and our own non-homologous end joining-based method CRISPIE. We did not succeed, in part because it is very difficult to see any signal under wide-field fluorescence conditions, which makes it difficult to screen different constructs for optimizing parameters. It is also possible that, at the endogenous abundance, the label is just not bright enough to be seen. Nevertheless, for both PKA type Iβ and type IIα that we studied in this manuscript, we have correlated the measured parameters (specifically, Spine Enrichment Index or SEI) with the overexpression level (Figure 1-figure supplement 1). We found that they are not strongly correlated with the expression level under our conditions. By extrapolating to non-overexpression conditions, our conclusion remains valid.

      To overcome the inability to label endogenous PKA subunits using CRISPR-based methods, we have also attempted a conditional knock-in method called ENABLED that we previously developed to label PKA-Cα. In preliminary results, we found that endogenously labeled PKA was very dim. However, in a subset of cells that are bright enough to be quantified, the PKA catalytic subunit indeed translocated to dendritic spines upon stimulation (see Author response image 1), corroborating our results using overexpression. These results, however, are not ready to be published because characterization of the mouse line takes time and, at this moment, the signal-to-noise ratio remains low. We hope that the reviewer can understand.

      Author response image 1.

      Endogenous PKA-Cα translocates to dendritic spines upon activation.

      Second, experiments which would advance and validate these findings in vivo would be highly valuable. This could be achieved in a number of ways - one would be overexpression of tagged PKA versions and examining sub-cellular distribution before and after physiological activation in vivo. Another possibility is in vivo perturbation - one would speculate that disruption or tethering of PKA subunits to the dendrite would lead to cell-specific functional and structural impairments. This could be achieved in a similar manner to the in vitro experiments, with a PKA KO and replacement strategy of the tethered C-R plasmid, followed by structural or functional examination of neurons.

      I would like to state that these experiments are not essential in my opinion, but any improvements in one of these directions would greatly improve and extend the impact and findings of this paper.

      We thank the reviewer for the suggestion and the understanding. The suggested in vivo experiments are fascinating. However, in vivo imaging of dendritic spine morphology is already in itself challenging. The difficulty greatly increases when trying to detect partial, likely transient translocation of a signaling protein. It is also very difficult to knock down endogenous PKA while simultaneously expressing the R-C construct in a large number of cells to achieve detectable circuit or behavioral effect (and hope that compensation does not happen over weeks). We hope the reviewer agrees that these experiments would be their own project and go beyond the time and scope of the current study.

      Reviewer #3 (Recommendations For The Authors):

      Please elaborate on the methods used to visualize PKA-RIIα and PKA-RIβ subunits.

      As suggested, we have now included additional details for visualizing PKA-Rs in the text. Specifically, we write (pg. 5): “…, as visualized using expressed PKA-R-mEGFP in separate experiments (Figs. 1A-1C).”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      The authors examined the salt-dependent phase separation of the low-complexity domain of hnRNPA1 (A1-LCD). Using all-atom molecular dynamics simulations, they identified four distinct classes of salt dependence in the phase separation of intrinsically disordered proteins (IDPs), which can be predicted based on their amino acid composition. However, the simulations and analysis, in their current form, are inadequate and incomplete.

      Strengths: 

      The authors attempt to unravel the mechanistic insights into the interplay between salt and protein phase separation, which is important given the complex behavior of salt effects on this process. Their effort to correlate the influence of salt on the low-complexity domain of hnRNPA1 (A1-LCD) with a range of other proteins known to undergo salt-dependent phase separation is an interesting and valuable topic. 

      Weaknesses: 

      (1) The simulations performed are not sufficiently long (Figure 2A) to accurately comment on phase separation behavior. The simulations do not appear to have converged well, indicating that the system has not reached a steady state, rendering the analysis of the trajectories unreliable.

      We have extended the simulations for an additional 500 ns, to 1500 ns. The last 500 ns show reasonably good convergence (see Figure 2A).

      (2) The majority of the data presented shows no significant alteration with changes in salt concentration. However, the authors have based conclusions and made significant comments regarding salt activities. The absence of error bars in the data representation raises questions about its reliability. Additionally, the manuscript lacks sufficient scientific details of the calculations.  

      We have now included error bars. With the error bars, the salt dependences of all the calculated properties (except for Rg) show a clear trend. Additionally, we have expanded the descriptions of our calculations (p. 15-16).

      (3) In Figures 2B and 2C, the changes in the radius of gyration and the number of contacts do not display significant variations with changes in salt concentration. The change in the radius of gyration with salt concentration is less than 1 Å, and the number of contacts does not change by at least 1. The authors' conclusions based on these minor changes seem unfounded. 

      The variation of ~ 1 Å for the calculated Rg is similar to the counterpart for the experimental Rg. As for the number of contacts, note that this property is presented on a per-residue basis, so a value of 1 means that each residue picks up one additional contact, or each protein chain gains a total of 131 contacts, when the salt concentration is increased from 50 to 1000 mM.
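
      The per-residue arithmetic in the response above can be made explicit with a minimal sketch. The chain length of 131 residues is inferred from the response itself (one extra contact per residue equals 131 contacts per chain); the function name and the 0.5 example value are hypothetical and do not come from the manuscript.

```python
# Illustrative only: convert a per-residue change in contact number
# into a per-chain total, as described in the author response.

RESIDUES_PER_CHAIN = 131  # inferred from "1 per residue = 131 contacts per chain"


def contacts_gained_per_chain(delta_per_residue, residues=RESIDUES_PER_CHAIN):
    """Return the total contacts gained by one chain for a given per-residue change."""
    return delta_per_residue * residues


if __name__ == "__main__":
    # A change of 1 contact per residue (the case discussed in the response).
    print(contacts_gained_per_chain(1.0))
    # A hypothetical smaller change, to show that sub-unit per-residue shifts
    # still correspond to many contacts per chain.
    print(contacts_gained_per_chain(0.5))
```

      On this reading, a per-residue change that looks small on the plotted axis still corresponds to tens of contacts per chain, which is the point the authors make.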

      Reviewer #2 (Public Review): 

      This is an interesting computational study addressing how salt affects the assembly of biomolecular condensates. The simulation data are valuable as they provide a degree of atomistic details regarding how small salt ions modulate interactions among intrinsically disordered proteins with charged residues, namely via Debye-like screening that weakens the effective electrostatic interactions among the polymers, or through bridging interactions that allow interactions between like charges from different polymer chains to become effectively attractive (as illustrated, e.g., by the radial distribution functions in Supplementary Information). However, this manuscript has several shortcomings: 

      (i) Connotations of the manuscript notwithstanding, many of the authors' concepts about salt effects on biomolecular condensates have been put forth by theoretical models, at least back in 2020 and even earlier. Those earlier works afford extensive information such as considerations of salt concentrations inside and outside the condensate (tie-lines). But the authors do not appear to be aware of this body of prior works and therefore missed the opportunity to build on these previous advances and put the present work with its complementary advantages in structural details in the proper context.

      (ii) There are significant experimental findings regarding salt effects on condensate formation [which have been modeled more recently] that predate the A1-LCD system (ref.19) addressed by the present manuscript. This information should be included, e.g., in Table 1, for sound scholarship and completeness. 

      (iii) The strengths and limitations of the authors' approach vis-à-vis other theoretical approaches should be discussed with some degree of thoroughness (e.g., how the smallness of the authors' simulation system may affect the nature of the "phase transition" and the information that can be gathered regarding salt concentration inside vs. outside the "condensate" etc.). Accordingly, this manuscript should be revised to address the following. In particular, the discussion in the manuscript should be significantly expanded by including references mentioned below as well as other references pertinent to the issues raised. 

      (1) The ability to use atomistic models to address the questions at hand is a strength of the present work. However, presumably because of the computational cost of such models, the "phase-separated" "condensates" in this manuscript are extremely small (only 8 chains). An inspection of Fig.1 indicates that while the high-salt configuration (snapshot, bottom right) is more compact and droplet-like than the low-salt configuration (top right), it is not clear that the 50 mM NaCl configuration can reasonably correspond to a dilute or homogeneous phase (without phase separation) or just a condensate with a lower protein concentration because the chains are still highly associated. One may argue that they become two droplets touching each other (the chains are not fully dispersed throughout the simulation box, unlike in typical coarse-grained simulations of biomolecular phase separation). While it may not be unfair to argue from this observation that the condensed phase is less stable at low salt, this raises critical questions about the adequacy of the approach as a stand-alone source of theoretical information. Accordingly, an informative discussion of the limitation of the authors' approach and comparisons with results from complementary approaches such as analytical theories and coarse-grained molecular dynamics will be instructive, even imperative, especially since such results exist in the literature (please see below).

      We now discuss the limitations of our all-atom simulations and also other approaches (p. 13; see below).

      (2) The aforementioned limitation is reflected by the authors' choice of using Dmax as a sort of phase separation order parameter. However, no evidence was shown to indicate that Dmax exhibits a two-state-like distribution expected of phase separation. It is also not clear whether a Dmax value corresponding to the linear dimension of the simulation box was ever encountered in the authors' simulated trajectories such that the chains can be reliably considered to be essentially fully dispersed as would be expected for the dilute phase. Moreover, as the authors have noted in the second paragraph of the Results, the variation of Dmax with simulation time does not show a monotonic rank order with salt concentration. The authors' explanation is equivalent to stipulating that the simulation system has not fully equilibrated, inevitably casting doubt on at least some of the conclusions drawn from the simulation data.

      First off, with the extended simulations, the Dmax values converge to a tiered rank order, with successively decreasing values from low salt (50 mM) to intermediate salt (150 and 300 mM) to high salt (500 and 1000 mM). Secondly, as we now state (p. 13), our low-salt simulations mimic a homogeneous solution whereas our high-salt simulations mimic the dense phase of a phase-separated system. The intermediate-salt simulations also mimic the dense phase but at a somewhat lower concentration (hence the intermediate Dmax value).

      (3) With these limitations, is it realistic to estimate possible differences in salt concentration between the dilute and condensed phases in the present work? These features, including tie-lines, were shown to be amenable to analytical theory and coarse-grained molecular dynamics simulation (please see below).  

      The differences in salt effects that we report do not represent those between two phases. Rather, as explained in the preceding reply, they represent differences between a homogeneous solution at low salt and the dense phase at higher salt. We also acknowledge salt effects calculated by analytical theory and coarse-grained simulations (p. 13).

      (4) In the comparison in Fig.2B between experimental and simulated radius of gyration as a function of [NaCl], there is an outlier among the simulated radii of gyration at [NaCl] ~ 250 mM. An explanation should be offered.  

      After extending the simulations and analyzing the last 500 ns, the Rg data no longer show an outlier though still have some fluctuations from one salt concentration to another.

      (5) The phenomenon of no phase separation at zero and low salt and phase separation at higher salt has been observed for the IDP Caprin1 and several of its mutants [Wong et al., J Am Chem Soc 142, 2471–2489 (2020) [https://pubs.acs.org/doi/full/10.1021/jacs.9b12208], see especially Fig.9 of this reference]. This work should be included in the discussion and added to Table 1.

      We now have added Caprin1 to Table 1 (new ref 26) and discuss this paper (p. 13).

      (6) The authors stated in the Introduction that "A unifying understanding of how salt affects the phase separation of IDPs is still lacking". While it is definitely true that much remains to be learned about salt effects on IDP phase separation, the advances that have already been made regarding salt effects on IDP phase separation are more abundant than conveyed by this narrative. For instance, an analytical theory termed rG-RPA was put forth in 2020 to provide a uniform (unified) treatment of salt, pH, and sequence-charge-pattern effects on polyampholytes and polyelectrolytes (corresponding to the authors' low net charge and high net charge cases). This theory offers a means to predict salt-IDP tie-lines and a comprehensive account of salt effect on polyelectrolytes resulting in a lack of phase separation at extremely low salt and subsequent salt-enhanced phase separation (similar to the case the authors studied here) and in some cases re-entrant phase separation or dissolution [Lin et al., J Chem Phys 152, 045102 (2020) [https://doi.org/10.1063/1.5139661]]. This work is highly relevant and it already provided a conceptual framework for the authors' atomistic results and subsequent discussion. As such, it should definitely be a part of the authors' discussion.

      We now cite this paper (new ref 34) in Introduction (p. 4). We also discuss its results for Caprin1 (new ref 18; p. 13).

      (7) Bridging interactions by small ions resulting in effective attractive interactions among polyelectrolytes leading to their phase separation have been demonstrated computationally by Orkoulas et al., Phys Rev Lett 90, 048303 (2003) [https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.90.048303]. This result should also be included in the discussion. 

      We now cite this paper (new ref 41; p. 11).

      (8) More recently, the salt-dependent phase separations of Caprin1, its RtoK variants and phosphorylated variant (see item #5 above) were modeled (and rationalized) quite comprehensively using rG-RPA, field-theoretic simulation, and coarse-grained molecular dynamics [Lin et al., arXiv:2401.04873 [https://arxiv.org/abs/2401.04873]], providing additional data supporting a conceptual perspective put forth in Lin et al. J Chem Phys 2020 (e.g., salt-IDP tie-lines, bridging interactions, reentrance behaviors etc.) as well as in the authors' current manuscript. It will be very helpful to the readers of eLife to include this preprint in the authors' discussion, perhaps as per the authors' discretion along the manner in which other preprints are referenced and discussed in the current version of the manuscript. 

      We now cite this paper (new ref 18) and discuss it along with new ref 26 in Discussion (p. 13).

      Reviewer #3 (Public Review): 

      Summary: 

      This study investigates the salt-dependent phase separation of A1-LCD, an intrinsically disordered region of hnRNPA1 implicated in neurodegenerative diseases. The authors employ all-atom molecular dynamics (MD) simulations to elucidate the molecular mechanisms by which salt influences A1-LCD phase separation. Contrary to typical intrinsically disordered protein (IDP) behavior, A1-LCD phase separation is enhanced by NaCl concentrations above 100 mM. The authors identify two direct effects of salt: neutralization of the protein's net charge and bridging between protein chains, both promoting condensation. They also uncover an indirect effect, where high salt concentrations strengthen pi-type interactions by reducing water availability. These findings provide a detailed molecular picture of the complex interplay between electrostatic interactions, ion binding, and hydration in IDP phase separation. 

      Strengths: 

      Novel Insight: The study challenges the prevailing view that salt generally suppresses IDP phase separation, highlighting A1-LCD's unique behavior. 

      Rigorous Methodology: The authors utilize all-atom MD simulations, a powerful computational tool, to investigate the molecular details of salt-protein interactions. 

      Comprehensive Analysis: The study systematically explores a wide range of salt concentrations, revealing a nuanced picture of salt effects on phase separation. 

      Clear Presentation: The manuscript is well-written and logically structured, making the findings accessible to a broad audience. 

      Weaknesses: 

      Limited Scope: The study focuses solely on the truncated A1-LCD, omitting simulations of the full-length protein. This limitation reduces the study's comparative value, as the authors note that the full-length protein exhibits typical salt-dependent behavior. A comparative analysis would strengthen the manuscript's conclusions and broaden its impact.

      Perhaps we did not impress on the reviewer how expensive the all-atom MD simulations on A1-LCD were: the systems each contained half a million atoms and the simulations took many months to complete. That said, we agree with the reviewer that, ideally, a comparative study on a protein showing the typical screening class of salt dependence would have made our work more complete. However, we are confident of the conclusions for several reasons. First, the three salt effects – charge neutralization, bridging, and strengthening of pi-types of interactions – revealed by the all-atom simulations are physically sound and well-supported by other studies. Second, these effects led us to develop a unified picture for the salt dependence of homotypic phase separation, in the form of a predictor for the classes of salt dependence based on amino-acid composition. This predictor works well for nearly 30 proteins. Third, recent studies using analytical theory and coarse-grained simulations (new ref 18) also strongly support our conclusions.

      Reviewer #1 (Recommendations For The Authors): 

      (1) In Figure 1, the color scheme should be updated and the figure remade, as the current set of color choices makes it very difficult to distinguish the magenta spheres.  

      We have increased the sizes of ions in Figure 1 to make them distinguishable.

      (2) Within the framework of atomistic simulations, the influence of salt concentration alteration on protein conformational plasticity is worth investigating. This could be correlated (with proper details) with the effect of salt-concentration-modulated protein aggregation behavior. 

      We now use RMSF to measure conformational plasticity, which shows a clear salt-dependent trend with a 27% reduction in fluctuations from 50 mM to 1000 mM NaCl (new Fig. S1).
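
      For readers unfamiliar with the measure, the textbook definition of RMSF can be sketched as below. This illustrates the general definition only and is not the authors' analysis script: the input layout (a list of pre-aligned frames, each a list of (x, y, z) tuples) and both function names are assumptions made for illustration.

```python
import math


def rmsf(frames):
    """Per-atom root-mean-square fluctuation about the time-averaged position.

    frames: list of frames; each frame is a list of (x, y, z) tuples.
    The frames are assumed to have been superimposed beforehand.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared deviation of atom i from its average position.
        msd = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


def percent_reduction(low_salt_value, high_salt_value):
    """Relative reduction, in percent, going from low to high salt."""
    return 100.0 * (low_salt_value - high_salt_value) / low_salt_value
```

      With this definition, a 27% reduction means that the average fluctuation at 1000 mM NaCl is 0.73 times its value at 50 mM.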

      (3) The authors should mention the protein concentrations employed in the simulations and whether these are consistent with experimentally used concentrations.  

      We have mentioned the initial concentration (3.5 mM). We now further state that this concentration is maintained in the low-salt simulations, indicating absence of phase separation, but is increased to 23 mM in the high-salt simulations, indicating phase separation. The latter value is consistent with the measured concentrations in the dense phase (last two paragraphs of p. 5).
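
      The relation between chain number, volume, and molar concentration behind these numbers can be sketched as follows. This is a unit-conversion illustration only: the function names are hypothetical, and how the volume occupied by the chains was determined in the high-salt simulations is not specified here.

```python
AVOGADRO = 6.02214076e23  # particles per mole (exact SI value)
NM3_PER_LITER = 1e24      # 1 L = 1 dm^3 = (1e8 nm)^3


def volume_nm3(n_chains, conc_mM):
    """Volume (nm^3) in which n_chains molecules give the stated concentration."""
    moles = n_chains / AVOGADRO
    liters = moles / (conc_mM * 1e-3)
    return liters * NM3_PER_LITER


def concentration_mM(n_chains, vol_nm3):
    """Molar concentration (mM) of n_chains molecules in vol_nm3."""
    moles = n_chains / AVOGADRO
    liters = vol_nm3 / NM3_PER_LITER
    return moles / liters * 1e3


if __name__ == "__main__":
    v_initial = volume_nm3(8, 3.5)   # 8 chains at the initial 3.5 mM
    v_dense = volume_nm3(8, 23.0)    # 8 chains at the reported 23 mM
    print(v_initial, v_initial ** (1.0 / 3.0))  # volume and equivalent cube edge
    print(v_initial / v_dense)                  # compaction factor, 23 / 3.5
```

      Under these assumptions, eight chains at 3.5 mM correspond to a volume of roughly 3800 nm^3 (a cube about 15.6 nm on a side), and reaching 23 mM requires the same chains to occupy about 6.6 times less volume.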

      (4) It would be useful to test the salt effect for at least two extreme salt concentrations at various protein concentrations, consistent with experimental protein concentration ranges.  

      In simulation studies of short peptides (ref 37), we have shown that the initial concentration does not affect the final concentration in the dense phase, as expected for phase-separation systems. We expect that the same will be true for the A1-LCD system at intermediate and high salt where phase separation occurs. Although this expectation could be tested by simulations at a different initial protein concentration, such simulations would be expensive and unlikely to yield new physical insight.

      (5) Importantly, the simulations do not appear to have converged well enough (Figure 2A). The authors should extend the simulation trajectories to ensure the system has reached a steady state.  

      We extended the simulations for an additional 500 ns, and they now appear to have converged. In Figure 2A the Dmax values now converge to a tiered rank order, with successively decreasing values from low salt (50 mM) to intermediate salt (150 and 300 mM) to high salt (500 and 1000 mM).

      (6) The authors mention "phase separation" in the title, but with only a 1 μs simulation trajectory, it is not possible to simulate a phenomenon like phase separation accurately. Since atomistic simulations cannot realistically capture phase separation on this timescale, a coarse-grained approach is more suitable. To properly explore salt effects in the context of phase separation, long timescale simulation trajectories should be considered. Otherwise, the data remain unreliable. 

      Our all-atom simulations revealed rich salt effects that might have been missed in coarse-grained simulations. It is true that coarse-grained models allow the simulations of the phase separation process, but as we have recently demonstrated (refs 36 and 37), all-atom simulations on the μs timescale are also able to capture the spontaneous phase separation of peptides and small IDPs. A1-LCD is much larger than those systems, so we had to use a relatively small chain number (8 chains here vs 64 used in ref 37 and 16 used in ref 37). Still, we observe the condensation into a dense phase at high salt. We discuss the pros and cons of all-atom vs. coarse-grained simulations on p. 13.

      (7) In Figure 5E, the plot does not show that g(r) has reached 1. If it does, the authors should show the full curve. The same issue remains with supplementary figures 1, 2, 3, etc.  

      We now show the approach to 1 in the insets of Figs. S2, S3, S4, and 5E.

      (8) None of the data is represented with error bars. The authors should include error bars in their data representations. 

      We have now included error bars in all graphs that report average values.

      (9) The authors state that "the net charge of the system reduces to only +8 at 1000 mM NaCl (Figure 3C)" but do not explain how this was calculated. 

      We now add this explanation in methods (p. 16).

      (10) The authors mention "similar to the role played by ATP molecules in driving phase separation of positively charged IDPs." However, ATP can inhibit aggregation, and its induction of phase separation is concentration-dependent. Given ATP's large aromatic moiety, its comparison to ions is not straightforward and is more complex. This comparison is best avoided.

      In this context we are comparing the bridging capability of ATP molecules in driving phase separation of positively charged IDPs in ref 36 to the bridging capability of the ions here. In ref 36 the authors show ATP bridging interactions between protein chains similar to what we show here with ions.

      (11) Many calculations are vaguely represented. The process for calculating the number of bridging ions, for example, is not well documented. The authors should provide sufficient details to allow for the reproducibility of the data. 

      We have now expanded the methods section to include more detailed information on calculations done.

      Reviewer #3 (Recommendations For The Authors): 

      Include error bars or standard deviations for all results averaged over four replicates, particularly for the number of ions and contacts per residue. This would provide a clearer picture of the data's reliability and variability. 

      We have now included error bars in all graphs that report averaged values.

      Strengthen the support for the conclusion that "each Arg sidechain often coordinates two Cl- ions, multiple backbone carbonyls often coordinate a single Na+ ion." While Fig. 3A clearly demonstrates Arg-Cl- coordination, the Na+ coordination claim for a 131-residue protein requires further clarification. Consider including the integration profile of radial distribution functions for Na+ ions to bolster this assertion.

      We now report the number of Na+ ions that coordinate with multiple backbone carbonyls (p. 7) as well as the number of Na+ ions that bridge between A1-LCD chains via coordination with multiple backbone carbonyls (p. 9). Please note that Figure 4A right panel displays an example of Na+ coordinating with multiple backbone carbonyls.

      Address the following typographical errors in the main text:
      o Page 11, line 25: "distinct classes of sat dependence" should be "distinct classes of salt dependence"
      o Page 14, line 9: "for Cl- and 3.0 and 5.4 A" should be "for Cl- and 3.0 and 5.4 Å"
      o Page 14, line 18: "As a control, PRDFs for water were also calculated" should be "As a control, RDFs for water were also calculated" (assuming PRDF was meant to be RDF)

      We have now corrected these typos.

      Consider expanding the study to include simulations of the full-length protein to provide a more comprehensive comparison between the truncated A1-LCD and the complete protein's behavior in various salt concentrations. 

      As we explained above, even with eight chains of A1-LCD, which has 131 residues, the systems already contain half a million atoms each and the all-atom simulations took many months to complete. Full-length A1 has 314 residues so a multi-chain system would be too large to be feasible for all-atom simulations.

    2. eLife Assessment

      In this potentially important study, the authors conducted atomistic simulations to probe the salt-dependent phase separation of the low-complexity domain of hnRNPA1 (A1-LCD). The authors have identified both direct and indirect mechanisms of salt modulation, provided explanations for four distinct classes of salt dependence, and proposed a model for predicting protein properties from amino acid composition. There is a range of opinions regarding the strength of evidence, with some considering the evidence as incomplete due to the limitations in the length and statistical errors of the computationally intense atomistic MD simulations.

    3. Reviewer #1 (Public review):

      Summary:

      The authors examined the salt-dependent phase separation of the low-complexity domain of hnRNPA1 (A1-LCD). Using all-atom molecular dynamics simulations, they identified four distinct classes of salt dependence in the phase separation of intrinsically disordered proteins (IDPs), which can be predicted based on their amino acid composition. However, the simulations and analysis, in their current form, are inadequate and incomplete.

      Strengths:

      The authors attempt to unravel the mechanistic insights into the interplay between salt and protein phase separation, which is important given the complex behavior of salt effects on this process. Their effort to correlate the influence of salt on the low-complexity domain of hnRNPA1 (A1-LCD) with a range of other proteins known to undergo salt-dependent phase separation is an interesting and valuable topic.

      Weaknesses:

      Based on the reviewer's assessment of the manuscript, the following points were raised:

      (1) The simulation duration is too short to draw comprehensive conclusions about phase separation.
      (2) There are concerns regarding the convergence of the simulations, particularly as highlighted in Figure 2A.
      (3) The simulation begins with a protein concentration of 3.5 mM ("we built an 8-copy model for the dense phase (with an initial concentration of 3.5 mM)"), which is high for phase separation studies. The reviewer questions the use of the term "dense phase" and suggests that the authors conduct a clearer analysis depicting the coexistence of both the dilute and dense phases to represent a steady state. Without this, the realism of the described phenomena is doubtful. Commenting on phase separation under conditions that don't align with typical phase separation parameters is not acceptable.
      (4) The inference that "Each Arg sidechain often coordinates two Cl- ions simultaneously, but each Lys sidechain coordinates only one Cl- ion" is questioned. According to Supplementary Figure 2A, Lys seems to coordinate with Cl- ions more frequently than Arg.
      (5) The authors are requested to update the figure captions for Supplementary Figures 2 and 3, specifying which system the analyses were performed on.
      (6) It is difficult to observe a clear trend due to irregularities in the data. Although the authors have included a red dotted line in the figures, the trend is not monotonic. The reviewer expresses concerns about significant conclusions drawn from these figures (e.g., Figure 2C, Figure 5A, Supplementary Figure 1).
      (7) Given the error in the radius of gyration (Rg) calculations, the reviewer questions the validity of drawing conclusions from this data.
      (8) The pair correlation function values in Figure 5E and Supplementary Figure 4 show only minor differences, and the reviewer questions whether these differences are significant.
      (9) Previous reports suggest that, upon self-assembly, protein chains extend within the condensate, leading to a decrease in intramolecular contacts. However, the authors show an increase in intramolecular contacts with increasing salt concentration (Figure 2C), which contradicts prior studies. The reviewer advises the authors to carefully review this and provide justification.
      (10) A systematic comparison of estimated parameters with varying salt concentrations is required. Additionally, the authors should provide potential differences in salt concentrations between the dilute and condensed phases.
      (11) The reviewer finds that the majority of the data presented shows no significant alteration with changes in salt concentration, yet the authors have made strong conclusions regarding salt activity.

      The manuscript lacks sufficient scientific details of the calculations.

    4. Reviewer #2 (Public review):

      This is an interesting computational study addressing how salt affects the assembly of biomolecular condensates. The simulation data are valuable as they provide a degree of atomistic details regarding how small salt ions modulate interactions among intrinsically disordered proteins with charged residues, namely via Debye-like screening that weakens the effective electrostatic interactions among the polymers, or through bridging interactions that allow interactions between like charges from different polymer chains to become effectively attractive (as illustrated, e.g., by the radial distribution functions in Supplementary Information). However, this manuscript has several shortcomings: (i) Connotations of the manuscript notwithstanding, many of the authors' concepts about salt effects on biomolecular condensates have been put forth by theoretical models, at least back in 2020 and even earlier. Those earlier works afford extensive information such as considerations of salt concentrations inside and outside the condensate (tie-lines). But the authors do not appear to be aware of this body of prior works and therefore missed the opportunity to build on these previous advances and put the present work with its complementary advantages in structural details in the proper context. (ii) There are significant experimental findings regarding salt effects on condensate formation [which have been modeled more recently] that predate the A1-LCD system (ref.19) addressed by the present manuscript. This information should be included, e.g., in Table 1, for sound scholarship and completeness. (iii) The strengths and limitations of the authors' approach vis-à-vis other theoretical approaches should be discussed with some degree of thoroughness (e.g., how the smallness of the authors' simulation system may affect the nature of the "phase transition" and the information that can be gathered regarding salt concentration inside vs. outside the "condensate" etc.).

      Comments on revised version:

      The authors have adequately addressed my previous concerns and suggestions. The manuscript is now significantly improved. The new results and analyses provided by the authors represent a substantial advance in our understanding of the role of electrostatics in the assembly of biomolecular condensates.

    5. Reviewer #3 (Public review):

      Summary:

      This study investigates the salt-dependent phase separation of A1-LCD, an intrinsically disordered region of hnRNPA1 implicated in neurodegenerative diseases. The authors employ all-atom molecular dynamics (MD) simulations to elucidate the molecular mechanisms by which salt influences A1-LCD phase separation. Contrary to typical intrinsically disordered protein (IDP) behavior, A1-LCD phase separation is enhanced by NaCl concentrations above 100 mM. The authors identify two direct effects of salt: neutralization of the protein's net charge and bridging between protein chains, both promoting condensation. They also uncover an indirect effect, where high salt concentrations strengthen pi-type interactions by reducing water availability. These findings provide a detailed molecular picture of the complex interplay between electrostatic interactions, ion binding, and hydration in IDP phase separation.

      Strengths:

      • Novel Insight: The study challenges the prevailing view that salt generally suppresses IDP phase separation, highlighting A1-LCD's unique behavior.
      • Rigorous Methodology: The authors utilize all-atom MD simulations, a powerful computational tool, to investigate the molecular details of salt-protein interactions.
      • Comprehensive Analysis: The study systematically explores a wide range of salt concentrations, revealing a nuanced picture of salt effects on phase separation.
      • Clear Presentation: The manuscript is well-written and logically structured, making the findings accessible to a broad audience.

      Weaknesses:

      • Limited Scope: The study focuses solely on the truncated A1-LCD, omitting simulations of the full-length protein. This limitation reduces the study's comparative value, as the authors note that the full-length protein exhibits typical salt-dependent behavior. However, given the much larger size of the full-length protein, it is acceptable to omit it given the current computing resources available.

      Overall, this manuscript represents a significant contribution to the field of IDP phase separation. The authors' findings provide valuable insights into the molecular mechanisms by which salt modulates this process, with potential implications for understanding and treating neurodegenerative diseases.

    1. eLife Assessment

      This manuscript presents a valuable new quantitative crosslinking mass spectrometry approach using novel isobaric crosslinkers. The data are solid and the method has potential for a broad application in structural biology if more isobaric crosslinking channels are available and the quantitative information of the approach is exploited in more depth.

    2. Reviewer #1 (Public review):

      Summary:

      Crosslinking mass spectrometry has become an important tool in structural biology, providing information about protein complex architecture, binding sites and interfaces, and conformational changes. One key challenge of this approach represents the quantitation of crosslinking data to interrogate differential binding states and distributions of conformational states.

      Here, Luo and Ranish present a novel class of isobaric crosslinkers ("Qlinkers"), conduct proof-of-concept benchmarking experiments on known protein complexes, and show example applications on selected target proteins. The data are solid and this could well be an exciting, convincing new approach in the field if the quantitation strategy is made more comprehensive and the quantitative power of isobaric labeling is fully leveraged as outlined below. It's a promising proof-of-concept, and potentially of broad interest for structural biologists.

      Strengths:

      The authors demonstrate the synthesis, application, and quantitation of their "Q2linkers", enabling relative quantitation of two conditions against each other. In benchmarking experiments, the Q2linkers provide accurate quantitation in mixing experiments. Then the authors show applications of Q2linkers on MBP, Calmodulin, selected transcription factors, and polymerase II, investigating protein binding, complex assembly, and conformational dynamics of the respective target proteins. For known interactions, their findings are in line with previous studies, and they show some interesting data for TFIIA/TBP/TFIIB complex formation and conformational changes in pol II upon Rbp4/7 binding.

      Weaknesses:

      This is an elegant approach but the power of isobaric mass tags is not fully leveraged in the current manuscript.

      First, "only" Q2linkers are used. This means only two conditions can be compared. Theoretically, higher-plexed Qlinkers should be accessible and would also be needed to make this a competitive method against other crosslinking quantitation strategies. As it is, two conditions can still be compared relatively easily using LFQ- or stable-isotope-labeling-based approaches. A "Q5linker" would be a really useful crosslinker, which would open up comprehensive quantitative XLMS studies.

      Second, the true power of isobaric labeling, accurate quantitation across multiple samples in a single run, is not fully exploited here. The authors only show differential trends for their interaction partners or different conformational states and do not make full quantitative use of their data or conduct statistical analyses. This should be investigated in more detail, e.g. examine Qlinker quantitation of MBP incubated with different concentrations of maltose or Calmodulin incubated with different concentrations of CBPs. Does Qlinker quantitation match ratios predicted using known binding constants or conformational state populations? Is it possible to extract ratios of protein populations in different conformations, assembly, or ligand-bound states?

      With these two points addressed this approach could be an important and convincing tool for structural biologists.

      Comments on latest version:

      I raised only two points which they have not addressed: Higher multiplexing of Qlinkers (1) and experiments to assess the statistical power of their quantitation strategy (2).

      I can see that point (1) requires substantial experimental efforts and synthesis of novel Qlinkers would be months of work. This is an editorial decision if the limited quantitative power of the "2-plex" approach they have right now is sufficient to support publication in eLife. While I like the approach, I feel it falls short of its potential in its current form.

      For point (2), the authors did not do any supporting experiments. They claim "higher plex Qlinkers" would need to be available, but I suggested experiments that can be done even with Q2linkers: using one of the two channels as a reference channel (similar to the Super-SILAC strategy published in 2010 by Geiger et al., which uses an isotope-labeled channel as a stable reference between different experiments and LC-MS runs), they could do time-courses or ligand-concentration series with the other channel and then show that Qlinkers allow quantitative monitoring of the different populations (e.g. conformations or ligand-bound proteins).
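      The reference-channel idea can be sketched as follows. This is a minimal illustration, not the authors' or the reviewer's analysis: it assumes simple 1:1 binding with ligand in excess, and the function names, the Kd value, and the reporter intensities are all hypothetical.

```python
def fraction_bound(ligand_conc: float, kd: float) -> float:
    """Predicted bound fraction for simple 1:1 binding, ligand in excess (assumption)."""
    return ligand_conc / (kd + ligand_conc)


def ratio_to_reference(sample_intensity: float, reference_intensity: float) -> float:
    """Reporter-ion ratio of the variable channel against the fixed reference channel."""
    if reference_intensity <= 0:
        raise ValueError("reference channel intensity must be positive")
    return sample_intensity / reference_intensity


# Hypothetical ligand series (same units as kd) and hypothetical reporter intensities.
KD = 1.0
series = [
    # (ligand concentration, sample-channel intensity, reference-channel intensity)
    (0.1, 110.0, 1000.0),
    (1.0, 480.0, 1000.0),
    (10.0, 930.0, 1000.0),
]

for ligand, sample, reference in series:
    observed = ratio_to_reference(sample, reference)
    predicted = fraction_bound(ligand, KD)
    print(f"[L]={ligand:>5}: observed ratio={observed:.2f}, predicted bound fraction={predicted:.2f}")
```

      Because the reference channel is held constant across runs, the observed ratios from separate LC-MS runs become comparable with each other and with the predicted curve.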

      As an additional point, I was a bit surprised to read that the quantitation evaluation in Figure 1 is based on a single experiment (reviewer response document page 6, line 2 in the authors' reply). I strongly suggest that this be repeated a few times so that a proper statistical test of the experimental reproducibility of Qlinkers can be conducted.

      In summary, the authors declined to do any experimental work to address my concerns.

    3. Reviewer #2 (Public review):

      The regulation of protein function heavily relies on the dynamic changes in the shape and structure of proteins and their complexes. These changes are widespread and crucial. However, examining such alterations presents significant challenges, particularly when dealing with large protein complexes in conditions that mimic the natural cellular environment. Therefore, much emphasis has been put on developing novel methods to study protein structure, interactions, and dynamics. Crosslinking mass spectrometry (CSMS) has established itself as such a prominent tool in recent years. However, doing this in a quantitative manner to compare structural changes between conditions has proven to be challenging due to several technical difficulties during sample preparation. Luo and Ranish introduce a novel set of isobaric labeling reagents, called Qlinkers, to allow for a more straightforward and reliable way to detect structural changes between conditions by quantitative CSMS (qCSMS).

      The authors do an excellent job describing the design choices of the isobaric crosslinkers and how they have been optimized to allow for efficient intra- and inter-protein crosslinking to provide relevant structural information. Next, they do a series of experiments to provide compelling evidence that the Qlinker strategy is well suited to detect structural changes between conditions by qCSMS. First, they confirm the quantitative power of the novel-developed isobaric crosslinkers by a controlled mixing experiment. Then they show that they can indeed recover known structural changes in a set of purified proteins (complexes) - starting with single subunit proteins up to a very large 0.5 MDa multi-subunit protein complex - the polII complex.

      The authors give a very measured and fair assessment of this novel isobaric crosslinker and its potential power to contribute to the study of protein structure changes. They show that indeed their novel strategy picks up expected structural changes, changes in surface exposure of certain protein domains, changes within a single protein subunit but also changes in protein-protein interactions. However, they also point out that not all expected dynamic changes are captured and that there is still considerable room for improvement (many not limited to this crosslinker specifically but many crosslinkers used for CSMS).

      Taken together the study presents a novel set of isobaric crosslinkers that indeed open up the opportunity to provide better qCSMS data, which will enable researchers to study dynamic changes in the shape and structure of proteins and their complexes.

      Comments on latest version:

      The authors have not really addressed most of the concerns. They have added minimal discussion points to the text. This is okay from my perspective as eLife's policy is to leave it up to the authors of how strongly to consider the reviewers' comments. I should add that I do fully agree with the other reviewer that the quantitative assessment from Figure 1 should have been done in triplicates at least and that this would actually be essential.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      Crosslinking mass spectrometry has become an important tool in structural biology, providing information about protein complex architecture, binding sites and interfaces, and conformational changes. One key challenge of this approach represents the quantitation of crosslinking data to interrogate differential binding states and distributions of conformational states.

      Here, Luo and Ranish present a novel class of isobaric crosslinkers ("Qlinkers"), conduct proof-of-concept benchmarking experiments on known protein complexes, and show example applications on selected target proteins. The data are solid and this could well be an exciting, convincing new approach in the field if the quantitation strategy is made more comprehensive and the quantitative power of isobaric labeling is fully leveraged as outlined below. It's a promising proof-of-concept, and potentially of broad interest for structural biologists.

      Strengths:

      The authors demonstrate the synthesis, application, and quantitation of their "Q2linkers", enabling relative quantitation of two conditions against each other. In benchmarking experiments, the Q2linkers provide accurate quantitation in mixing experiments. Then the authors show applications of Q2linkers on MBP, Calmodulin, selected transcription factors, and polymerase II, investigating protein binding, complex assembly, and conformational dynamics of the respective target proteins. For known interactions, their findings are in line with previous studies, and they show some interesting data for TFIIA/TBP/TFIIB complex formation and conformational changes in pol II upon Rbp4/7 binding.

      Weaknesses:

      This is an elegant approach but the power of isobaric mass tags is not fully leveraged in the current manuscript.

      First, "only" Q2linkers are used. This means only two conditions can be compared. Theoretically, higher-plexed Qlinkers should be accessible and would also be needed to make this a competitive method against other crosslinking quantitation strategies. As it is, two conditions can still be compared relatively easily using LFQ - or stable-isotope-labeling based approaches. A "Q5linker" would be a really useful crosslinker, which would open up comprehensive quantitative XLMS studies.

      We agree that a multiplexed Qlinker approach would be very useful. The multiplexed Qlinkers are more difficult and more expensive to synthesize. We are currently working on different schemes for synthesizing multiplexed Qlinkers.

      Second, the true power of isobaric labeling, accurate quantitation across multiple samples in a single run, is not fully exploited here. The authors only show differential trends for their interaction partners or different conformational states and do not make full quantitative use of their data or conduct statistical analyses. This should be investigated in more detail, e.g. examine Qlinker quantitation of MBP incubated with different concentrations of maltose or Calmodulin incubated with different concentrations of CBPs. Does Qlinker quantitation match ratios predicted using known binding constants or conformational state populations? Is it possible to extract ratios of protein populations in different conformations, assembly, or ligand-bound states?

      With these two points addressed this approach could be an important and convincing tool for structural biologists.

      We agree that multiplexed Qlinkers would open the door to exciting avenues of investigation such as studying conformational state populations.  We plan to conduct the suggested experiments when multiplexed Qlinkers are available.

      Reviewer #2 (Public review):

      The regulation of protein function heavily relies on the dynamic changes in the shape and structure of proteins and their complexes. These changes are widespread and crucial. However, examining such alterations presents significant challenges, particularly when dealing with large protein complexes in conditions that mimic the natural cellular environment. Therefore, much emphasis has been put on developing novel methods to study protein structure, interactions, and dynamics. Crosslinking mass spectrometry (CSMS) has established itself as such a prominent tool in recent years. However, doing this in a quantitative manner to compare structural changes between conditions has proven to be challenging due to several technical difficulties during sample preparation. Luo and Ranish introduce a novel set of isobaric labeling reagents, called Qlinkers, to allow for a more straightforward and reliable way to detect structural changes between conditions by quantitative CSMS (qCSMS).

      The authors do an excellent job describing the design choices of the isobaric crosslinkers and how they have been optimized to allow for efficient intra- and inter-protein crosslinking to provide relevant structural information. Next, they do a series of experiments to provide compelling evidence that the Qlinker strategy is well suited to detect structural changes between conditions by qCSMS. First, they confirm the quantitative power of the novel-developed isobaric crosslinkers by a controlled mixing experiment. Then they show that they can indeed recover known structural changes in a set of purified proteins (complexes) - starting with single subunit proteins up to a very large 0.5 MDa multi-subunit protein complex - the polII complex.

      The authors give a very measured and fair assessment of this novel isobaric crosslinker and its potential power to contribute to the study of protein structure changes. They show that indeed their novel strategy picks up expected structural changes, changes in surface exposure of certain protein domains, changes within a single protein subunit but also changes in protein-protein interactions. However, they also point out that not all expected dynamic changes are captured and that there is still considerable room for improvement (many not limited to this crosslinker specifically but many crosslinkers used for CSMS).

      Taken together the study presents a novel set of isobaric crosslinkers that indeed open up the opportunity to provide better qCSMS data, which will enable researchers to study dynamic changes in the shape and structure of proteins and their complexes. However, in its current form, some aspects of the study should be expanded upon in order for the research community to assess the true power of these isobaric crosslinkers. Specifically:

      Although the authors do mention some of the current weaknesses of their isobaric crosslinkers and qCSMS in general, more detail would be extremely helpful. Throughout the article a few key numbers (or even discussions) that would allow one to better evaluate the sensitivity (and the applicability) of the method are missing. This includes:

      (1) Throughout all the performed experiments it would be helpful to provide information on how many peptides are identified per experiment and how many have actually a crosslinker attached to it.

      As the goal of the experiments is to maximize identification of crosslinked peptides which tend to have higher charge states, we targeted ions with charge states of 3+ or higher in our MS acquisition settings for CLMS, and ignored ions with 2+ charge states, which correspond to many of the normal (i.e., not crosslinked) peptides that are identified by MS. As a result, normal peptides are less likely to be identified by the MS procedure used in our CLMS experiments compared to MS settings typically used to identify normal peptides. Our settings may also fail to identify some mono-modified peptides. Like most other CLMS methods, the total number of identified crosslinked peptide spectra is usually less than 1% of the total acquired spectra and we normally expect the crosslinked species to be approximately 1% of the total peptides. 

      We added information about the number of crosslinked and monolinked peptides identified in the pol I benchmarking experiments (line 173). The number of crosslinks and monolinks identified in the pol II +/- α-amanitin experiment, the TBP/TFIIA/TFIIB experiment and the pol II +/- Rpb4/7 experiment are also provided.

      (2) Of all the potential lysines that can be modified - how many are actually modified? Do the authors have an estimate for that? It would be interesting to evaluate in a denatured sample the modification efficiency of the isobaric crosslinker (as an upper limit as here all lysines should be accessible) and then also in a native sample. For example, in the MBP experiment, the authors report the change of one mono-linked peptide in samples containing maltose relative to the one not containing maltose. The authors then give a great description of why this fits to known structural changes. What is missing here is a bit of what changes were expected overall and which ones the authors would have expected to pick up with their method and why have they not been picked up. For example, were they picked up as modified by the crosslinker but not differential? I think this is important to discuss appropriately throughout the manuscript to help the reader evaluate/estimate the potential sensitivity of the method. There are passages where the authors do an excellent job doing that - for example when they mention the missed site that they expected to see in the initial pol II experiments (lines 191 to 207). This kind of "power analysis" should be heavily discussed throughout the manuscript so that the reader is better informed of what sensitivity can be expected from applying this method.

      Regarding the Pol II complex experiment described in Figures 4 and 5, out of the 277 lysine residues in the complex, 207 were identified as monolinked residues (74.7%), and 817 crosslinked pairs out of 38,226 potential pairs (2.1%) were observed. The ability of CLMS to detect proximity/reactivity changes may be impacted by several factors including 1) the (low) abundance of crosslinked peptides in complex mixtures, 2) the presence of crosslinkable residues in close proximity with appropriate orientation, and 3) the ability to generate crosslinked peptides by enzymatic digestion that are amenable to MS analysis (i.e., the peptides have appropriate m/z’s and charge states, the peptides ionize well, the peptides produce sufficient fragment ions during MS2 analysis to allow confident identification). Future efforts to enrich crosslinked peptides prior to MS analysis may improve sensitivity.
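      The coverage figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes that "potential pairs" means every unordered pair of distinct lysines, which is an inference from the reported total rather than something the response states; the variable names are illustrative.

```python
from math import comb

# Values reported in the author response for the Pol II experiment (Figures 4 and 5).
TOTAL_LYSINES = 277
MONOLINKED_LYSINES = 207
OBSERVED_CROSSLINKED_PAIRS = 817

# Assumption: every unordered pair of distinct lysines counts as a potential crosslink.
potential_pairs = comb(TOTAL_LYSINES, 2)

monolink_coverage = 100 * MONOLINKED_LYSINES / TOTAL_LYSINES
pair_coverage = 100 * OBSERVED_CROSSLINKED_PAIRS / potential_pairs

print(potential_pairs)               # 38226
print(f"{monolink_coverage:.1f}%")   # 74.7%
print(f"{pair_coverage:.1f}%")       # 2.1%
```

      The pair count grows quadratically with the number of lysines, which is why the crosslink coverage is so much lower than the monolink coverage even though most lysines were observed as modified.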

      It is very difficult to estimate the modification efficiency of Qlinker (or many other crosslinkers) based on peptide identification results. One major reason for this is that trypsin is not able to cleave after a crosslinker-modified lysine residue.  As a result, the peptides generated after the modification reaction have different lengths, compositions, charge states, and ionization efficiencies compared to unmodified peptides. These differences make it very difficult to estimate the modification efficiencies based on the presence/absence of certain peptide ions, and/or the intensities of the modified and unmodified versions of a peptide. Also, 2+ ions which correspond to many normal (i.e., unmodified) peptides were excluded by our MS acquisition settings.

      It is also very difficult to predict which structural changes are expected and which crosslinked peptides and/or modified peptides can be observed by MS.  This is especially true when the experiment involves proteins containing unstructured regions such as the experiments involving Pol II, and TBP, TFIIA and TFIIB. Since we are at the early stages of using qCLMS to study structural changes, we are not sure which changes we can expect to observe by qCLMS. Additional applications of Qlinker-CLMS are needed to better understand the types of structural changes that can be studied using the approach.

      We hope that our discussions of some of the limitations of CLMS for detecting conformational/reactivity changes provide the reader with an understanding of the sensitivity that can be expected with the approach. At the end of the paragraph about the pol II α-amanitin experiment we say, “Unfortunately, no Q2linker-modified peptides were identified near the site where α-amanitin binds. This experiment also highlights one of the limitations of residue-specific, quantitative CLMS methods in general. Reactive residues must be available near the region of interest, and the modified peptides must be identifiable by mass spectrometry.” In the section about Rbp4/7-induced structural changes in pol II we describe the under-sampling issue. And in the last paragraph we reiterate these limitations and say, “This implies that this strategy, like all MS-based strategies, can only be used for interpretation of positively identified crosslinks or monolinks. Sensitivity and under sampling are common problems for MS analysis of complex samples.”

      (3) It would be very helpful to provide information on how much better (or not) the Qlinker approach works relative to label-free qCLMS. One is missing the reference to a potential qCLMS gold standard (data set) or if such a dataset is not readily available, maybe one of the experiments could be performed by label-free qCLMS. For example, one of the differential biosensor experiments would have been well suited.

      We agree with the reviewer that it will be very helpful to establish gold standard datasets for CLMS. As we further develop and promote this technology, we will try to establish a standardized qCLMS.

      Reviewer #1 (Recommendations for the authors):

      Only a very minor point:

      I may have missed it but it's not really clear how many independent experiments were used for the benchmarking quantitation and mixing experiments for Figure 1. What is the reproducibility across experiments on average and on a per-peptide basis?

      Otherwise, I think the approach would really benefit from at least "Q5linkers" or even "Q10linkers", if possible. And then conduct detailed quantitative studies, either using dilution series or maybe investigating the kinetics of complex formation.

      We used a sample of BSA crosslinked peptides to optimize the MS settings, establish the MS acquisition strategies and test the quantification schemes. The data in Figure 1 are based on one experiment, in which we used ~150 µg of purified pol I complexes from a 6 L culture. We added this information to the Figure 1 legend. We also provide information about the reproducibility of peptide quantification by plotting the observed and expected ratios for each monolinked and crosslinked peptide identified in all of the runs in Figure S3.

      We agree with the reviewer that the Qlinker approach would be even more attractive if multiplex Qlinker reagents were designed. The multiplexed Qlinkers are more difficult and more expensive to synthesize. We are currently working on different schemes for synthesizing multiplexed Qlinkers.

      Reviewer #2 (Recommendations for the authors):

      In addition to the public review I have the following recommendations/questions:

      (1) The first part of the results section where the synthesis of the crosslinker is explained is excellent for mass spec specialists, but problematic for general readers - either more info should be provided (e.g. b1+ ions - most readers will have no idea why that is) - or potentially it could be simplified here and the details shifted to Materials and Methods for the expert reader. The same is true below for the length of spacer arms.

      However - in general this level of detail is great - but can impact the ease of understanding for the more mass spec affine but not expert reader.

      We have added the following sentence to assist the general reader: A b1+ ion is an ion with a charge state of +1 corresponding to the first N-terminal amino acid residue after breakage of the first peptide bond (lines 126-128).

      (2) The Calmodulin experiment (lines 239 to 257) - it is a very nice result that they see the change in the crosslinked peptide between residues K78-K95, but the monolinks are not just detected as described in the text but actually go 2 fold up. This would have been actually a bit expected if the residues are now too far away to be still crosslinked that the monolinks increase. In this case, this counteraction of monolinks to crosslinked sites can also be potentially used as a "selection criteria" for interesting sites that change. Is that a possible interpretation or do the authors think that upregulation of the monolinks is a coincidence and should not be interpreted?

      We agree with the reviewer that both monolinks and crosslinks can be used as potential indicators for some changes. However, it is much more difficult to interpret the abundance information from monolinks because, unlike crosslinks, there is little associated structural/proximity information with monolinks. Because it is difficult to understand the reason(s) for changes in monolink abundance, we concentrate on changes in crosslink abundances, which provide proximity/structural information about the crosslinked residues.

      (3) Lines 267 to 274: a small thing but the structural information provided is quite dense I have to say. Maybe simplify or accompany with some supplemental figures?

      We agree that the structural information is a bit dense especially for readers who are not familiar with the pol II system.  We added a reference to Figure 3c (line 177) to help the reader follow the structural information. 

      As qCLMS is still a relatively new approach for studying conformational changes, the utility of the approach for studying different types of conformational changes is still unclear. Thus, one of the goals of the experiments is to demonstrate the types of conformational changes that can be detected by Q2linkers.  We hope that the detailed descriptions will help structural biologists understand the types of conformational changes that can be detected using Qlinkers.

      (4) Line 280: explain maybe why the sample was fractionated by SCX (I guess to separate the different complexes?).

      SCX was used to reduce the complexity of the peptide mixtures. As the samples are complex and crosslinked peptides are of low abundance compared to normal peptides, SCX can separate the peptides based on their positive charges.  Larger peptides and peptides with higher charge states, such as crosslinked peptides, tend to elute at higher salt concentration during SCX chromatography.  The use of SCX to fractionate complex peptide mixtures is described in the “General crosslinking protocol and workflow optimization” section of the Methods, and we added a sentence to explain why the sample was fractionated by SCX (lines 278-279).

      (5) Lines 354 to 357: "This suggests that the inability to identity most of these crosslinked peptides in both experiments is mainly due to under-sampling during mass spectrometry analysis of the complex samples, rather than the absence of the crosslinked peptides in one of the experiments."

      This is an extremely important point for the interpretation of missing values - have the authors tried to also collect the mass spec data with DIA which is better in recovery of the same peptide signals between different samples? I realize that these are isobaric samples so DIA measurements per se are not useful as the quantification is done on the reporter channels in the MS2, but it would at least give a better idea if the missing signals were simply not picked up for MS2 as claimed by the authors or the modified peptides are just not present. Another possibility is for the authors to at least try to use a "match between the run" function as can be done in Maxquant. One of the strengths of the method is that it is quantitative and two states are analyzed together, but as can be seen in this experiment, more than two states might want to be compared. In such cases, the under-sampling issue (if that is indeed the cause) makes interpretation of many sites hard (due to missing values) and it would be interesting if for example, an analysis approach with a "match between the runs" function could recover some of the missing values.

      We agree that undersampling/missing values is an important issue that needs to be addressed more thoroughly. This also highlights the importance of qCLMS, as conclusions about structural changes based on the presence/absence of certain crosslinked species in database search results may be misleading if the absence of a species is due to under-sampling. We have not tried to collect the data with DIA since we would lose the quantitative information. It would be interesting to see if match between runs can recover some of the missing values. While this could provide evidence to support the under-sampling hypothesis, it would not recover the quantitative information.

      We recommend performing label swap experiments and focusing downstream analysis on the crosslinks/monolinks that are identified in both experiments. Future development of multiplexed Qlinker reagents should help to alleviate under-sampling issues. See response to Reviewer #1.

      (6) Lines 375 to 393 (the whole paragraph): extremely detailed and not easy to follow. Is that level of detail necessary to drive home that point or could it be visualized in enough detail to help follow the text?

      We agree that the paragraph is quite detailed, but we feel that this level of detail is necessary to describe the types of conformational changes that can be detected by the quantitative crosslinking data, and also to illustrate the challenges of interpreting the structural basis for some crosslink abundance changes even when high resolution structural data exists.

      To make it easier to follow, we added a sentence to the legend of Figure 5b. “In the holo-pol II structure (right), Switch 5 bending pulls Rpb1:D1442 away from K15, breaking the salt bridge that is formed in the core pol II structure (left). The increase in the abundances of the Rpb1:15-Rpb6:76 and Rpb1:15-Rpb6:72 crosslinks in holo-pol II is likely attributed to the salt bridge between K15 and D1442 in core pol II which impedes the NHS ester-based reaction between the epsilon amino group of K15 and the crosslinker.”

      (7) Final paragraph in the results section - lines 397 and 398: "All of the intralinks involving Rpb4 are more abundant in holo-pol II as expected." If I understand that experiment correctly the intralinks with Rpb4 should not be present at all as Rpb4 has been deleted. Is that due to interference between the 126 and 127 channels in MS2? If so, then this also sets a bit of the upper limit of quantitative differences that can be seen. The authors should at least comment on that "limitation".

      Yes, we shouldn’t detect any Rpb4 peptides in the sample derived from the Rpb4 knockout strain. The signal from Rpb4 peptides in the ∆Rpb4 sample is likely due to co-eluting ions. To clarify, we changed the text to:

      All of the intralinks involving Rpb4 are more abundant in the holo-pol II sample (even though we don’t expect any reporter ion signal from Rpb4 peptides derived from the ∆Rpb4 pol II sample, we still observed reporter ion signals from the channel corresponding to the ∆Rpb4 sample, potentially due to the presence of low abundance, co-eluting ions) (lines 395-399).

      (8) Materials and Methods - line 690: I am probably missing something but why were two different mass additions to lysine added to the search (I would have expected only one for the crosslinker)?

      The 297 Da modification is for monolinked peptides, in which one end of the crosslinker is hydrolyzed and an 18 Da water molecule is added. The 279 Da modification is for crosslinks and sometimes for looplinks (crosslinks involving two lysine residues on the same tryptic peptide).
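      The relationship between the two search modifications is simple arithmetic, sketched below with the nominal masses quoted in the response. The dictionary layout and names are illustrative only and do not correspond to the configuration format of any particular search engine.

```python
# Nominal mass additions quoted in the author response (Da).
CROSSLINK_MASS_DA = 279  # both ends of the crosslinker reacted (crosslink or looplink)
WATER_MASS_DA = 18       # gained when the unreacted end is hydrolyzed

# A monolink is the crosslinker with one end hydrolyzed, so it carries one extra water.
monolink_mass_da = CROSSLINK_MASS_DA + WATER_MASS_DA

# Hypothetical table of variable lysine modifications for a database search.
lysine_modifications = {
    "crosslink_or_looplink": CROSSLINK_MASS_DA,
    "monolink_hydrolyzed": monolink_mass_da,
}

for name, mass in lysine_modifications.items():
    print(f"{name}: +{mass} Da on K")
```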

    1. eLife Assessment

      This useful study by Nandy and colleagues examined relationships between behavioral state, neural activity in cortical area V4, and trial-by-trial variability in the ability to detect weak visual stimuli. They present solid evidence indicating that certain changes in arousal and eye-position stability, along with patterns of synchrony in the activity of neurons in different layers of V4, can show modest correspondences to changes in the ability to correctly detect a stimulus. These findings are likely to be of interest to those who seek a deeper understanding of circuit mechanisms that underlie perception.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Nandy and colleagues examine neural, physiological and behavioral correlates of perceptual variability in monkeys performing a visual change detection task. They used a laminar probe to record from area V4 while two macaque monkeys detected a small change in stimulus orientation that occurred at a random time in one of two locations, focusing their analysis on stimulus conditions where the animal was equally likely to detect (hit) or not-detect (miss) a briefly presented orientation change (target). They discovered two behavioral and physiological measures that are significantly different between hit and miss trials - pupil size tends to be slightly larger on hits vs. misses, and monkeys are more likely to miss the target on trials in which they made a microsaccade shortly before target onset. They also examined multiple measures of neural activity across the cortical layers and found some measures that are significantly different between hits and misses.

      Strengths:

      Overall the study is well executed and the analyses are appropriate (with some possible caveats discussed below).

      Weaknesses:

      I have two remaining concerns. First, with the exception of the pre-target microsaccades, the correlates of perceptual variability (differences between hits and misses) appear to be weak and disconnected. The GLM analysis of the predictive power of trial outcome based on the behavioral and neural measures is only discussed at the end of the paper. This analysis shows that some of the measures have no significant predictive power, while others cannot be examined using the GLM analysis because these measures cannot be estimated in single trials. Given these weak and disconnected effects, my overall sense is that the current results provide a limited advance to our understanding of the neural basis of perceptual variability.

      In addition, because the authors combine data across stimulus contrasts, I am somewhat uneasy about the possible confounding effect of contrast. As expected, stimulus contrast affected the probability of hits vs. misses. Independently, contrast may have affected some of the physiological measurements. Therefore, showing that contrast is not the source of the covariations between the physiological/behavioral measurements and perception can be challenging, and I am not convinced that the authors have ruled this out as a possible confound. It is unclear why the authors had to vary contrast in the first place, and why the analyses had to be done by combining the data across contrasts or by ignoring contrast as a variable (e.g., in the GLM analysis).

    1. Author response:

      Review #1:

      Also, they observed no difference in the binding free energy of phosphatidylserine with wild TREM2-Ig and mutant TREM2-Ig, which is a bit inconsistent with the previous report with experiment studies by Journal of Biological Chemistry 293, (2018), Alzheimer's and Dementia 17, 475-488 (2021), Cell 160, 1061-1071 (2015).

      We directly note this contrast with experimental findings in the body of our work, particularly given the known limitations of free energy calculations in MD simulations, as outlined in the Limitations section. Our claim is that the loss of function in the R47H variant extends beyond decreased binding affinities and also impacts binding patterns. As stated in our manuscript: ‘Our observations for both sTREM2 and TREM2 indicate that R47H-induced dysfunction may result not only from diminished ligand binding but also an impaired ability to discriminate between different ligands in the brain, proposing a novel mechanism for loss-of-function.’

      Perhaps the authors made significant efforts to run a number of simulations for multiple models, which is nearly 17 microseconds in total; none of the simulations has been repeated independently at least a couple of times, which makes me uncomfortable to consider this finding technically true. Most of the important conclusions that authors claimed, including the opposite results from previous research, have been made on the single run, which raises the question of whether this observation can be reproduced if the simulation has been repeated independently. Although the authors stated the sampling number and length of MD simulations in the current manuscript as a limitation of this study, it must be carefully considered before concluding rather than based on a single run.

      The reviewer raises an interesting point regarding the repetition of individual simulations, a consideration we carefully evaluated during the design of this study. However, we believe our approach—running multiple independent models of the same system—offers a more rigorous methodology than simply repeating simulations of the same docked model. This strategy allows us to sample several distinct starting configurations, thereby minimizing biases introduced by docking algorithms and single-model reliance.

      In our study, we demonstrate that within the 150 ns timescale of our protein/ligand (PL) simulations, the relatively small ligands are able to move from their initial docking positions to a specific binding site. While ideally, replicates of these independent models would further strengthen the findings, this was not computationally feasible given the unprecedented total duration of our simulations. Importantly, our conclusions are seldom based on the results of a single protein/PL simulation.

      Moreover, the ergodic hypothesis suggests that over sufficiently long timescales, simulations will explore all accessible states. Additionally, we have performed several replicate simulations of our WT and R47H Ig-like domain models in solution, specifically to investigate CDR2 loop dynamics.

      In this case, since the system involves only the protein and lacks the independent replicates seen in the protein/PL simulations, these runs were chosen to effectively capture the stochastic nature of CDR2 loop movement.

      sTREM2 shows a neuroprotective effect in AD, even with the mutations with R47H, as evidenced by authors based on their simulation. sTREM2 is known to bind Aβ within the AD and reduce Aβ aggregation, whereas R47H mutant increases Aβ aggregation. I wonder why the authors did not consider Aβ as a ligand for their simulation studies. As a reader in this field, I would prefer to know the protective mechanism of sTREM2 in Aβ aggregation influenced by the stalk domain.

      Our initial approach for this study used Aβ as a ligand rather than phospholipids. However, we noted the difficulties in simulating Aβ, particularly in choosing relevant Aβ structures and oligomeric states (n-mers). We believe that phospholipids represent an equally pertinent ligand for TREM2, given its critical role in lipid sensing and metabolism. Furthermore, there is growing recognition in the AD research community of the need to move beyond Aβ and focus on other understudied pathological mechanisms.

      In a similar manner, why only one mutation is considered "R47H" for the study? There are more severe mutations reported to disrupt tethering between these CDRs, such as T66M. Although this "T66M" is not associated with AD, I guess the stalk domain protective mechanism would not be biased among different diseases. Therefore, it would be interesting to see whether the findings are true for this T66M.

      In most previous studies, the mechanism of CDR destabilization by the mutant was explored, for example through changes in secondary structure and in the residue-wise interloop interaction pattern. This is not considered in this manuscript, nor are the detailed residue-wise interactions that are changed by the mutant or that are important for "ligand binding" or the "stalk domain".

      These are both excellent points that deserve extensive investigation. While R47H is the most common and prolific mutation in literature, an extensive catalog of other mutations is important to explore. We are currently preparing two separate publications that will delve into these gaps in more detail, as addressing them was beyond the scope of the present study.

      The comparison between the wild and mutant and other different complex structures must be determined by particular statistical calculations to state the observed difference between different structures is significant. Since autocorrelation is one of the major concerns for MD simulation data for predicting statistical differences, authors can consider bootstrap calculations for predicting statistical significance.

      We are currently working to address this comment to strengthen the validity of our results and statistical conclusions in the revised manuscript.  

      Review #2:

      The authors state that reported differences in ligand binding between the TREM2 and sTREM2 remain unexplained, and the authors cite two lines of evidence. The first line of evidence, which is true, is that there are differences between lipid binding assays and lipid signaling assays. However, signaling assays do not directly measure binding. Secondly, the authors cite Kober et al 2021 as evidence that sTREM2 and TREM2 showed different affinities for Abeta1-42 in a direct binding assay. Unfortunately, when Kober et al measured the binding of sTREM2 and Ig-TREM2 to Abeta they reported statistically identical affinities (Kd = 3.8 ± 2.9 µM vs 5.1 ± 3.7 µM) and concluded that the stalk did not contribute measurably to Abeta binding.

      We appreciate the reviewer’s insight and acknowledge the need to clarify our interpretation of Kober et al. (2021). We will adjust and refocus how we reference this evidence from Kober et al. in our revised manuscript. 

      In line with these findings, our energy calculations reveal that sTREM2 exhibits weaker—but still not statistically significant—binding affinities for phospholipids compared to TREM2. These results suggest that while overall binding affinity might be similar, differences in binding patterns or specific lipid interactions could still contribute to functional differences observed between TREM2 and sTREM2.

      The authors appear to take simulations of the Ig domain (without any stalk) as a surrogate for the full-length, membrane-bound TREM2. They compare the Ig domain to a sTREM2 model that includes the stalk. While it is fully plausible that the stalk could interact with and stabilize the Ig domain, the authors need to demonstrate why the full-length TREM2 could not interact with its own stalk and why the isolated Ig domain is a suitable surrogate for this state.

      We believe that this is a major limitation of all computational work of TREM2 to-date, and of experimental work which only presents the Ig-like domain. This is extensively discussed in the limitations section of our paper. Hence, we are currently working toward a manuscript that will be the first biologically relevant model of TREM2 in a membrane and will challenge the current paradigm of using the Ig-like domain as an experimental surrogate for TREM2.

    2. eLife Assessment

      This useful manuscript addresses some key molecular mechanisms on the neuroprotective roles of soluble TREM2 in neurodegenerative diseases. The study will advance our understanding of TREM2 mutations, particularly on the damaging effect of known TREM2 mutations, and also explain why soluble TREM2 can antagonize Aβ aggregation. However, the primary experimental method, MD simulations, suffers from limited sampling, rendering the results incomplete for definite conclusions.

    3. Reviewer #1 (Public review):

      In this manuscript, Saeb et al reported the mechanistic roles of the flexible stalk domain in sTREM2 function using molecular dynamics simulations. They have reported some interesting molecular bases explaining why sTREM2 shows protective effects during AD, such as partial extracellular stalk domain promoting binding preference and stabilities of sTREM2 with its ligand even in the presence of known AD-risk mutation, R47H. Furthermore, they found that the stalk domain itself acts as the site for ligand binding by providing an "expanded surface", known as 'Expanded Surface 2' together with the Ig-like domain. Also, they observed no difference in the binding free energy of phosphatidyl-serine with wild TREM2-Ig and mutant TREM2-Ig, which is a bit inconsistent with the previous report with experiment studies by Journal of Biological Chemistry 293, (2018), Alzheimer's and Dementia 17, 475-488 (2021), Cell 160, 1061-1071 (2015).

      Perhaps the authors made significant efforts to run a number of simulations for multiple models, which is nearly 17 microseconds in total; none of the simulations has been repeated independently at least a couple of times, which makes me uncomfortable to consider this finding technically true. Most of the important conclusions that authors claimed, including the opposite results from previous research, have been made on the single run, which raises the question of whether this observation can be reproduced if the simulation has been repeated independently. Although the authors stated the sampling number and length of MD simulations in the current manuscript as a limitation of this study, it must be carefully considered before concluding rather than based on a single run.

      sTREM2 shows a neuroprotective effect in AD, even with the mutations with R47H, as evidenced by authors based on their simulation. sTREM2 is known to bind Aβ within the AD and reduce Aβ aggregation, whereas R47H mutant increases Aβ aggregation. I wonder why the authors did not consider Aβ as a ligand for their simulation studies. As a reader in this field, I would prefer to know the protective mechanism of sTREM2 in Aβ aggregation influenced by the stalk domain.

      In a similar manner, why only one mutation is considered "R47H" for the study? There are more severe mutations reported to disrupt tethering between these CDRs, such as T66M. Although this "T66M" is not associated with AD, I guess the stalk domain protective mechanism would not be biased among different diseases. Therefore, it would be interesting to see whether the findings are true for this T66M.

      In most previous studies, the mechanism of CDR destabilization by the mutant was explored, for example through changes in secondary structure and in the residue-wise interloop interaction pattern. This is not considered in this manuscript, nor are the detailed residue-wise interactions that are changed by the mutant or that are important for "ligand binding" or the "stalk domain".

      The comparison between the wild and mutant and other different complex structures must be determined by particular statistical calculations to state the observed difference between different structures is significant. Since autocorrelation is one of the major concerns for MD simulation data for predicting statistical differences, authors can consider bootstrap calculations for predicting statistical significance.
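
      As a minimal sketch of the kind of bootstrap calculation suggested here, and not the authors' analysis, the following uses a moving-block bootstrap, which resamples contiguous blocks of frames so that short-range autocorrelation is preserved within each block. Everything in it is an assumption for illustration: the synthetic AR(1) series stands in for per-frame binding energies, and the function names, block length, and number of resamples are hypothetical choices.

```python
import random
import statistics


def ar1_series(n, mean, phi, sigma, rng):
    """Synthetic autocorrelated series standing in for per-frame MD energies."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sigma)  # AR(1): each frame depends on the last
        out.append(mean + x)
    return out


def block_resample(series, block_len, rng):
    """Concatenate randomly chosen contiguous blocks up to the original length."""
    n = len(series)
    out = []
    while len(out) < n:
        start = rng.randrange(n - block_len + 1)
        out.extend(series[start:start + block_len])
    return out[:n]


def block_bootstrap_diff(a, b, block_len, n_boot=1000, seed=0):
    """Return (observed mean difference, lower, upper) with a 95% percentile interval."""
    if block_len < 1 or block_len > min(len(a), len(b)):
        raise ValueError("block_len must be between 1 and the shorter series length")
    rng = random.Random(seed)
    diffs = sorted(
        statistics.fmean(block_resample(a, block_len, rng))
        - statistics.fmean(block_resample(b, block_len, rng))
        for _ in range(n_boot)
    )
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return statistics.fmean(a) - statistics.fmean(b), lower, upper


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical "wild-type" and "mutant" energy traces (arbitrary units).
    wt = ar1_series(1000, -40.0, 0.9, 0.5, rng)
    mut = ar1_series(1000, -35.0, 0.9, 0.5, rng)
    observed, lower, upper = block_bootstrap_diff(wt, mut, block_len=50)
    print(f"difference = {observed:.2f}, 95% interval = [{lower:.2f}, {upper:.2f}]")
```

      The block length should exceed the autocorrelation time of the observable; an interval that excludes zero would then be read as a difference not attributable to sampling noise within a trajectory, although it cannot substitute for independent replicate runs.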

    4. Reviewer #2 (Public review):

      Significance:

      TREM2 is an immunomodulatory receptor expressed on myeloid cells and microglia in the brain. TREM2 consists of a single immunoglobulin (Ig) domain that leads into a flexible stalk, transmembrane helix, and short cytoplasmic tail. Extracellular proteases can cleave TREM2 in its stalk and produce a soluble TREM2 (sTREM2). TREM2 is genetically linked to Alzheimer's disease (AD), with the strongest association coming from an R47H variant in the Ig domain. Despite intense interest, the full TREM2 ligand repertoire remains elusive, and it is unclear what function sTREM2 may play in the brain. The central goal of this paper is to assess the ligand-binding role of the flexible stalk that is generated during the shedding of TREM2. To do this, the authors simulate the behavior of constructs with and without stalk. However, it is not clear why the authors chose to use the isolated Ig domain as a surrogate for full-length TREM2. Additionally, experimental binding evidence that is misrepresented by the authors contradicts the proposed role of the stalk.

      Summary and strengths:

      The authors carry out MD simulations of WT and R47H TREM2 with and without the flexible stalk. Simulations are carried out for apo TREM2 and for TREM2 in complex with various lipids. They compare results using just the Ig domain to results including the flexible stalk that is retained following cleavage to generate sTREM2. The computational methods are well-described and should be reproducible. The long simulations are a strength, as exemplified in Figure 2A where a CDR2 transition happens at ~400-600 ns. The stalk has not been resolved in structural studies, but the simulations suggest the intriguing and readily testable hypothesis that the stalk interacts with the Ig domain and thereby contributes to the stability of the Ig domain and to ligand binding. I suspect biochemists interested in TREM2 will make testing this hypothesis a high priority.

      Weaknesses:

      Unfortunately, the work suffers from two fundamental flaws.

      (1) The authors state that reported differences in ligand binding between the TREM2 and sTREM2 remain unexplained, and the authors cite two lines of evidence. The first line of evidence, which is true, is that there are differences between lipid binding assays and lipid signaling assays. However, signaling assays do not directly measure binding. Secondly, the authors cite Kober et al 2021 as evidence that sTREM2 and TREM2 showed different affinities for Abeta1-42 in a direct binding assay. Unfortunately, when Kober et al measured the binding of sTREM2 and Ig-TREM2 to Abeta they reported statistically identical affinities (Kd = 3.8 ± 2.9 µM vs 5.1 ± 3.7 µM) and concluded that the stalk did not contribute measurably to Abeta binding.

      (2) The authors appear to take simulations of the Ig domain (without any stalk) as a surrogate for the full-length, membrane-bound TREM2. They compare the Ig domain to a sTREM2 model that includes the stalk. While it is fully plausible that the stalk could interact with and stabilize the Ig domain, the authors need to demonstrate why the full-length TREM2 could not interact with its own stalk and why the isolated Ig domain is a suitable surrogate for this state.
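
      For point (1), a rough worked check shows why the two reported Kd values are statistically indistinguishable. This assumes, for illustration only, that the quoted uncertainties are independent and can be combined in quadrature; the source does not state how they were derived.

```latex
\Delta K_d = 5.1 - 3.8 = 1.3\ \mu\mathrm{M}, \qquad
\sigma_{\Delta} = \sqrt{2.9^{2} + 3.7^{2}} = \sqrt{8.41 + 13.69} \approx 4.7\ \mu\mathrm{M}, \qquad
\frac{\Delta K_d}{\sigma_{\Delta}} \approx 0.28
```

      Under that assumption the difference is well under one combined uncertainty, so these values give no support for a stalk-dependent difference in Abeta affinity.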

    1. eLife Assessment

      The study provides a valuable showcase of a workflow for large-scale characterization of drug mechanisms of action using proteomics, in which on-targets and off-targets of 166 compounds were determined by proteome solubility analysis in living cells and cell lysates. The evidence supporting the claims of the authors is solid; however, the inclusion of more replicate experiments and more statistical rigor would have strengthened the study. This will be of broad interest to medicinal chemists, toxicologists, computational biologists and biochemists.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      PPARgamma is a nuclear receptor that binds to orthosteric ligands to coordinate transcriptional programs that are critical for adipocyte biogenesis and insulin sensitivity. Consequently, it is a critical therapeutic target for many diseases, but especially diabetes. The malleable nature and promiscuity of the PPARgamma orthosteric ligand binding pocket have confounded the development of improved therapeutic modulators. Covalent inhibitors have been developed but they show unanticipated mechanisms of action depending on which orthosteric ligands are present. In this work, Shang and Kojetin present a compelling and comprehensive structural, biochemical, and biophysical analysis that shows how covalent and noncovalent ligands can co-occupy the PPARgamma ligand binding pocket to elicit distinctive preferences of coactivator and corepressor proteins. Importantly, this work shows how the covalent inhibitors GW9662 and T0070907 may be unreliable tools as pan-PPARgamma inhibitors despite their widespread use.

      Strengths:

      - Highly detailed structure and functional analyses provide a comprehensive structure-based hypothesis for the relationship between PPARgamma ligand binding domain co-occupancy and allosteric mechanisms of action.

      - Multiple orthogonal approaches are used to provide high-resolution information on ligand binding poses and protein dynamics.

      - The large number of x-ray crystal structures solved for this manuscript should be applauded along with their rigorous validation and interpretation.

      Weaknesses

      - Inclusion of statistical analysis is missing in several places in the text.

      - Functional analysis beyond coregulator binding is needed.

      We added additional statistical analyses as recommended (Source Data 1, a Microsoft Excel spreadsheet).

      Related to functional analysis, we cite studies from our previous publication (Hughes et al. Nature Communications 2014 5:3571) where we demonstrated that the covalent inhibitor ligands (GW9662 and T0070907) do not block the activity of other ligands using a PPARγ transcriptional reporter assay and gene expression analysis in 3T3-L1 preadipocytes. Our study here expands on this finding and other published studies showing the structural mechanism for the lack of blocking activity by the covalent inhibitors.

      Reviewer #2 (Public Review):

      Summary:

      The flexibility of the ligand binding domain (LBD) of NRs allows various modes of ligand binding leading to various cellular outcomes. In the case of PPARγ, it's known that two ligands can co-bind to the receptor. However, whether a covalent inhibitor functions by blocking the binding of a non-covalent ligand, or co-binds in a manner that weakens the binding of a non-covalent ligand remains unclear. In this study, the authors first used TR-FRET and NMR to demonstrate that covalent inhibitors (such as GW9662 and T0070907) weaken but do not prevent non-covalent synthetic ligands from binding, likely via an allosteric mechanism. The AF-2 helix can exchange between active and repressive conformations, and covalent inhibitors shift the conformation toward a transcriptionally repressive one to reduce the orthosteric binding of the non-covalent ligands. By co-crystal studies, the authors further reveal the structural details of various non-covalent ligand binding mechanisms in a ligand-specific manner (e.g., an alternate binding site, or a new orthosteric binding mode by altering covalent ligand binding pose).

      Strengths:

      The biochemical and biophysical evidence presented is strong and convincing.

      Weaknesses:

      However, the co-crystal studies were performed by soaking non-covalent ligands to LBD pre-crystalized with a covalent inhibitor. Since the covalent inhibitors would shift the LBD toward transcriptionally repressive conformation which reduces orthosteric binding of non-covalent ligands, if the sequence was reversed (i.e., soaking a covalent inhibitor to LBD pre-crystalized with a non-covalent ligand), would a similar conclusion be drawn? Additional discussion will broaden the implications of the conclusion.

      This is an interesting point, which we now expand upon in a new (third) paragraph of the discussion in our revised manuscript:

      “In our previous study, we observed synthetic and natural/endogenous ligand co-binding via co-crystallography where preformed crystals of PPARγ LBD bound to unsaturated fatty acids (UFAs) were soaked with a synthetic ligand, which pushed the bound UFA to an alternate site within the orthosteric ligand-binding pocket 8. In the scenario of synthetic ligand cobinding with a covalent inhibitor, it is possible that soaking a covalent inhibitor into preformed crystals where the PPARγ LBD is already bound to a non-covalent ligand may prove to be difficult. The covalent inhibitor would need to flow through solvent channels within the crystal lattice, which may not be a problem. However, upon reaching the entrance surface to the orthosteric ligand-binding pocket, it may be difficult for the covalent inhibitor to gain access to the region of the orthosteric pocket required for covalent modification as the larger non-covalent ligand could block access. This potential order of addition problem may not be a problem for studies in solution or in cells, where the non-covalent ligand can more freely exchange in and out of the orthosteric pocket and over time the covalent reaction would reach full occupancy.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      - IC50 or EC50 values are not reported for the coregulator interaction assays, R2 for fit should also be reported where Ki and IC50s are disclosed.

      We now report fitting statistics and IC50/EC50 values when possible in Figure 2B and Source Data 1 along with R2 values for the fit. We note that some data do not show complete or robust enough binding curves to faithfully fit to a dose response equation.
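
      For readers unfamiliar with these fitting statistics, a commonly used dose-response form and the definition of R2 are sketched below. This is an assumption for illustration; the response above does not state which equation was fitted. Here B and T are the bottom and top plateaus, h is the Hill slope, y_i are the measured responses, \hat{y}_i the fitted values, and \bar{y} the mean response.

```latex
y(x) = B + \frac{T - B}{1 + \left(x / \mathrm{EC}_{50}\right)^{-h}}, \qquad
R^{2} = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^{2}}{\sum_i \left(y_i - \bar{y}\right)^{2}}
```

      At x = EC50 the response is halfway between B and T; for an inhibition curve the same form applies with IC50 in place of EC50 and a negative slope. Incomplete curves that never reach a plateau leave B or T poorly constrained, which is consistent with the note that some data could not be fitted reliably.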

      -  Reporter gene or qPCR should be performed for the combinations of covalent and noncovalent ligands to show how these molecules impact transcriptional activities rather than just coregulator binding profiles.

      We previously performed a PPARγ transcriptional reporter assay and gene expression analysis in 3T3-L1 preadipocytes to demonstrate that cotreatment of a covalent inhibitor (GW9662 or T0070907) with a non-covalent ligand does not block activity of the non-covalent ligand and showed cobinding-induced activation relative to DMSO control (Hughes et al., 2014 Nature Communications). We did not specifically mention this in our original manuscript, but we now call this out in the first paragraph of the results section.

      - Inclusion of a structure figure to show the different helix 12 orientations should be included in the introduction. Likewise, how the overall structure of the LBD changes as a result of the cobinding in the discussion or a summary model would be helpful.

      Our revised manuscript includes a structure figure called out in the introduction describing the active and repressive helix 12 PPARγ LBD conformations (new Figure 1). There are no major changes to the overall structure of the LBD compared to the active conformation that crystallized, so we did not include a summary model figure but we do refer readers to our previous paper (Shang and Kojetin, Structure 2021 29(9):940-950) in the penultimate paragraph of the discussion. We also added the following sentence to the crystallography results section related to the overall LBD changes:

      “The structures show high structural similarity to the transcriptionally active LBD conformation with rmsd values ranging from 0.77–1.03Å (Supplementary Table S2)”

      A typo in paragraph 3 of the discussion says "long-live" when it should probably say "long-lived."

      We corrected this typo.

      Reviewer #2 (Recommendations For The Authors):

      It's interesting that ligand-specific binding mode of non-covalent ligands was observed. Would modifications of the chemical structure of a covalent inhibitor alter the allosteric binding behavior of non-covalent ligands in a predictive manner? If so, how can such SAR be used to guide the design of covalent inhibitors to more broadly and effectively inhibit agonists of various chemical structures? Discussion on this topic could be valuable.

      This is an interesting point, which we now discuss in the penultimate and last paragraphs of the discussion:

      “Another way to test this structural model could be through the use of covalent PPARγ inverse agonist analogs with graded activity 23, where one might posit that covalent inverse agonist analogs that shift the LBD conformational ensemble towards a fully repressive LBD conformation may better inhibit synthetic ligand cobinding.”

      “It may be possible to use the crystal structures we obtained to guide structure-informed design of covalent inhibitors that would physically block cobinding of a synthetic ligand. This could be the potential mechanism of a newer generation covalent antagonist inhibitor we developed, SR16832, that more completely inhibit alternate site ligand binding of an analog of MRL20, rosiglitazone and the UFA docosahexaenoic acid (DHA) 21 and thus may be a better choice for the field to use as a covalent ligand inhibitor of PPARγ.”

    2. eLife Assessment

      This landmark study elucidates the intricate structural mechanisms by which both covalent and non-covalent synthetic ligands can co-occupy the binding pocket of the nuclear receptor transcription factor PPARγ. Through a compelling integration of structural, biochemical, and biophysical evidence, the authors challenge the reliability of two commonly used covalent inhibitors. These findings have far-reaching implications for the broader field of nuclear receptor research. This work will be of high interest to structural biologists and biochemists exploring ligand interactions within the nuclear receptor superfamily.

    3. Reviewer #1 (Public review):

      Summary:

      PPARgamma is a nuclear receptor that binds to orthosteric ligands to coordinate transcriptional programs that are critical for adipocyte biogenesis and insulin sensitivity. Consequently, it is a critical therapeutic target for many diseases, but especially diabetes. The malleable nature and promiscuity of the PPARgamma orthosteric ligand binding pocket have confounded the development of improved therapeutic modulators. Covalent inhibitors have been developed but they show unanticipated mechanisms of action depending on which orthosteric ligands are present. In this work, Shang and Kojetin present a compelling and comprehensive structural, biochemical, and biophysical analysis that shows how covalent and noncovalent ligands can co-occupy the PPARgamma ligand binding pocket to elicit distinctive preferences of coactivator and corepressor proteins. Importantly, this work shows how the covalent inhibitors GW9662 and T0070907 may be unreliable tools as pan-PPARgamma inhibitors despite their widespread use.

      Strengths:

      - Highly detailed structure and functional analyses provide a comprehensive structure-based hypothesis for the relationship between PPARgamma ligand binding domain co-occupancy and allosteric mechanisms of action.

      - Multiple orthogonal approaches are used to provide high-resolution information on ligand binding poses and protein dynamics.

      - The large number of x-ray crystal structures solved for this manuscript should be applauded along with their rigorous validation and interpretation.

    4. Reviewer #2 (Public review):

      Summary:

      The flexibility of the ligand binding domain (LBD) of NRs allows various modes of ligand binding leading to various cellular outcomes. In the case of PPARγ, it's known that two ligands can cobind to the receptor. However, whether a covalent inhibitor functions by blocking the binding of a non-covalent ligand, or cobinds in a manner that weakens the binding of a non-covalent ligand remains unclear. In this study, the authors first used TR-FRET and NMR to demonstrate that covalent inhibitors (such as GW9662 and T0070907) weaken but do not prevent non-covalent synthetic ligands from binding, likely via an allosteric mechanism. The AF-2 helix can exchange between active and repressive conformations, and covalent inhibitors shift the conformation toward a transcriptionally repressive one to reduce orthosteric binding of the non-covalent ligands. By co-crystal studies, the authors further reveal the structural details of various non-covalent ligand binding mechanisms in a ligand-specific manner (e.g., an alternate binding site, or a new orthosteric binding mode by altering covalent ligand binding pose).

      Strengths:

      The biochemical and biophysical evidence as presented is strong and convincing.

      Additional comments:

      The co-crystal studies were performed by soaking a non-covalent ligand to LBD pre-crystalized with a covalent inhibitor. Since the covalent inhibitors would shift the LBD toward a transcriptionally repressive conformation which reduces orthosteric binding of non-covalent ligands, one might ask: if the sequence was reversed (i.e., soaking a covalent inhibitor to LBD pre-crystalized with a non-covalent ligand), would a similar conclusion be drawn? The authors have reasonably speculated that it might be difficult to soak a covalent inhibitor into preformed crystals where the PPARγ LBD is already bound to a non-covalent ligand, because the larger non-covalent ligand could block the covalent inhibitor from gaining access to the region of the orthosteric pocket required for covalent modification.

    1. eLife Assessment

      The manuscript presents an important model for the field of endosome maturation, providing perspective on the role of the deubiquitinating enzyme UPS-50/USP8 in the process. The evidence presented in the paper is clear, incorporating well-designed experiments that suggest the dual actions of UPS-50 and USP8 in the conversion of early endosomes into late endosomes. Overall, the work is convincing and centers on an intriguing subject.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript focuses on the role of the deubiquitinating enzyme UPS-50/USP8 in endosome maturation. The authors aimed to clarify how this enzyme drives the conversion of early endosomes into late endosomes. Overall, they did achieve their aims in shedding light on the precise mechanisms by which UPS-50/USP8 regulates endosome maturation. The results support their conclusions that UPS-50 acts by disassociating RABX-5 from early endosomes to deactivate RAB-5 and by recruiting SAND-1/Mon1 to activate RAB-7. This work is commendable and will have a significant impact on the field. The methods and data presented here will be useful to the community in advancing our understanding of endosome maturation and identifying potential therapeutic targets for diseases related to endosomal dysfunction. It is worth noting that further investigation is required to fully understand the complexities of endosome maturation. However, the findings presented in this manuscript provide a solid foundation for future studies.

      Strengths:

      The major strengths of this work lie in the well-designed experiments used to examine the effects of UPS-50 loss. The authors employed confocal imaging to obtain a picture of the aftermath of USP-50 loss. Their findings indicated enlarged early endosomes and MVB-like structures in cells deficient in USP-50/USP8.

      Weaknesses:

      Specifically, there is a need for further investigation to accurately characterize the anomalous structures detected in the usp-50 mutant. Also, the correlation between the presence of these abnormal structures and ESCRT-0 is yet to be addressed, and the current working model needs to be revised to prevent any confusion between enlarged early endosomes and MVBs.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors study how the deubiquitinase USP8 regulates endosome maturation in C. elegans and mammalian cells. The authors have isolated USP8 mutant alleles in C. elegans and used multiple in vivo reporter lines to demonstrate the impact of USP8 loss-of-function on endosome morphology and maturation. They show that in USP8 mutant cells, the early endosomes and MVB-like structures are enlarged while the late endosomes and lysosomal compartments are reduced. They elucidate that USP8 interacts with Rabx5, a guanine nucleotide exchange factor (GEF) for Rab5, and show that USP8 likely targets a specific lysine residue of Rabx5 to dissociate it from early endosomes. They also find that the localization of USP8 to early endosomes is disrupted in Rabx5 mutant cells. They observe that in both Rabx5 and USP8 mutant cells, the Rab7 GEF SAND-1 puncta, which likely represent late endosomes, are diminished, although Rabex5 is accumulated in USP8 mutant cells. The authors provide evidence that USP8 regulates endosomal maturation in a similar fashion in mammalian cells. Based on their observations they propose that USP8 dissociates Rabex5 from early endosomes and enhances the recruitment of SAND-1 to promote endosome maturation.

      Strengths:

      The major highlights of this study include the direct visualization of endosome dynamics in a living multi-cellular organism, C. elegans. The high-quality images provide clear in vivo evidence to support the main conclusions. The authors have generated valuable resources to study mechanisms involved in endosome dynamics regulation in both the worm and mammalian cells, which would benefit many members of the cell biology community. The work identifies a fascinating link between USP8 and the Rab5 guanine nucleotide exchange factor Rabx5, which expands the targets and modes of action of USP8. The findings make a solid contribution toward the understanding of how endosomal trafficking is controlled.

      Weaknesses:

      - The authors utilized multiple fluorescent protein reporters, including those generated by themselves, to label endosomal vesicles. Although these are routine and powerful tools for studying endosomal trafficking, these results cannot tell whether the endogenous proteins (Rab5, Rabex5, Rab7, etc.) are affected in the same fashion. Note that the authors have provided convincing evidence about the effects on Rab proteins in the revised manuscript.
      - The authors clearly demonstrated a link between USP8 and Rabx5, and they showed that cells deficient in both factors displayed similar defects in late endosomes/lysosomes. But the authors didn't confirm whether and/or to what extent USP8 regulates endosome maturation through Rabx5. Additional genetic and molecular evidence might be required to better support their working model. Note that the authors have provided convincing evidence about the role of the USP8-Rabx5 axis in the revised manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      The authors elucidated the role of USP8 in the endocytic pathway. Using C. elegans epithelial cells as a model, they observed that when USP8 function is lost, the cells have a decreased number and size of lysosomes. Since USP8 was already known to be a protein linked to ESCRT components, they looked into what role USP8 might play in connecting lysosomes and multivesicular bodies (MVB). They observed fewer ESCRT-associated vesicles but an increased number of abnormal enlarged vesicles (aberrant early endosomes) when USP8 function was lost. They showed that USP8 interacts with Rabx5 to dissociate it from early endosomes, promoting the recruitment of the Rab7 GEF SAND-1/Mon1 and the maturation of the endosomes. The authors provided evidence that USP8 regulates endosomal maturation in a similar fashion in mammalian cells.

      Strengths:

      The use of two models, C. elegans and a mammalian cell line to describe a similar mechanism.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Summary: 

      The manuscript focuses on the role of the deubiquitinating enzyme USP-50/USP8 in endosome maturation. The authors aimed to clarify how this enzyme drives the conversion of early endosomes into late endosomes. Overall, they did achieve their aims in shedding light on the precise mechanisms by which USP-50/USP8 regulates endosome maturation. The results support their conclusions that USP-50 acts by dissociating RABX-5 from early endosomes to deactivate RAB-5 and by recruiting SAND-1/Mon1 to activate RAB-7. This work is commendable and will have a significant impact on the field. The methods and data presented here will be useful to the community in advancing our understanding of endosome maturation and identifying potential therapeutic targets for diseases related to endosomal dysfunction. It is worth noting that further investigation is required to fully understand the complexities of endosome maturation. However, the findings presented in this manuscript provide a solid foundation for future studies.

      We thank this reviewer for the instructive suggestions and encouragement.

      Strengths: 

      The major strengths of this work lie in the well-designed experiments used to examine the effects of USP-50 loss. The authors employed confocal imaging to obtain a picture of the aftermath of the USP-50 loss. Their findings indicated enlarged early endosomes and MVB-like structures in cells deficient in USP-50/USP8.

      We thank this reviewer for the instructive suggestions and encouragement.

      Weaknesses: 

      Specifically, there is a need for further investigation to accurately characterize the anomalous structures detected in the usp-50 mutant. Also, the correlation between the presence of these abnormal structures and ESCRT-0 is yet to be addressed, and the current working model needs to be revised to prevent any confusion between enlarged early endosomes and MVBs. 

      Excellent suggestions. USP8 has been identified as a protein associated with ESCRT components, which are crucial for endosomal membrane deformation and scission, leading to the formation of intraluminal vesicles (ILVs) within multivesicular bodies (MVBs). In usp-50 mutants, we observed a significant reduction in the punctate signals of HGRS-1::GFP and STAM-1::GFP (Figure 1G and H; and Figure 1-figure supplement 1B), indicating a disruption in ESCRT-0 complex localization (Author response image 1). Additionally, lysosomal structures are markedly reduced in these mutants. In contrast, we found that early endosomes, as marked by FYVE, RAB-5, RABEX5, and EEA1, are significantly enlarged in usp-50 mutants. Electron microscopy (EM) imaging further revealed an increase in large cellular vesicles containing various intraluminal structures. Given the reduction in lysosomal structures and the enlargement of early endosomes in usp-50 mutants, these enlarged vesicles are likely aberrant early endosomes rather than late endosomal or lysosomal structures. To address potential confusion, we have revised the manuscript according to the reviewer's comments and updated the model to accurately reflect these observations.

      Reviewer #2 (Public Review): 

      Summary: 

      In this study, the authors study how the deubiquitinase USP8 regulates endosome maturation in C. elegans and mammalian cells. The authors have isolated USP8 mutant alleles in C. elegans and used multiple in vivo reporter lines to demonstrate the impact of USP8 loss-of-function on endosome morphology and maturation. They show that in USP8 mutant cells, the early endosomes and MVB-like structures are enlarged while the late endosomes and lysosomal compartments are reduced. They elucidate that USP8 interacts with Rabx5, a guanine nucleotide exchange factor (GEF) for Rab5, and show that USP8 likely targets a specific lysine residue of Rabx5 to dissociate it from early endosomes. They also find that the localization of USP8 to early endosomes is disrupted in Rabx5 mutant cells. They observe that in both Rabx5 and USP8 mutant cells, the Rab7 GEF SAND-1 puncta, which likely represent late endosomes, are diminished, although Rabex5 is accumulated in USP8 mutant cells. The authors provide evidence that USP8 regulates endosomal maturation in a similar fashion in mammalian cells. Based on their observations they propose that USP8 dissociates Rabex5 from early endosomes and enhances the recruitment of SAND-1 to promote endosome maturation. 

      We thank this reviewer for the instructive suggestions and encouragement.

      Strengths: 

      The major highlights of this study include the direct visualization of endosome dynamics in a living multi-cellular organism, C. elegans. The high-quality images provide clear in vivo evidence to support the main conclusions. The authors have generated valuable resources to study mechanisms involved in endosome dynamics regulation in both the worm and mammalian cells, which would benefit many members of the cell biology community. The work identifies a fascinating link between USP8 and the Rab5 guanine nucleotide exchange factor Rabx5, which expands the targets and modes of action of USP8. The findings make a solid contribution toward the understanding of how endosomal trafficking is controlled. 

      We thank this reviewer for the instructive suggestions and encouragement.

      Weaknesses: 

      - The authors utilized multiple fluorescent protein reporters, including those generated by themselves, to label endosomal vesicles. Although these are routine and powerful tools for studying endosomal trafficking, these results cannot tell whether the endogenous proteins (Rab5, Rabex5, Rab7, etc.) are affected in the same fashion. 

      Good suggestion. Indeed, to test whether the endogenous proteins (Rab5, Rabex5, Rab7, etc.) are affected in the same fashion as fluorescent protein reporters, we supplemented our approach with the utilization of endogenous markers. These markers, including Rab5, RAB-5, Rabex5, RABX-5, and EEA1 for early endosomes, as well as RAB-7, Mon1a, and Mon1b for late endosomes, were instrumental in our investigations (refer to Figure 3, Figure 6, Figure 5-figure supplement 1, Figure 5-figure supplement 2, and Figure 6-figure supplement 1). Our comprehensive analysis, employing various methodologies such as tissue-specific fused proteins, CRISPR/Cas9 knock-in, and antibody staining, consistently highlights the critical role of USP8 in early-to-late endosome conversion.

      - The authors clearly demonstrated a link between USP8 and Rabx5, and they showed that cells deficient in both factors displayed similar defects in late endosomes/lysosomes. However, the authors didn't confirm whether and/or to which extent USP8 regulates endosome maturation through Rabx5. Additional genetic and molecular evidence might be required to better support their working model. 

      Excellent point. To test whether USP-50 regulates endosome maturation through RABX-5, we performed additional genetic analyses. In rabx-5(null) mutant animals, the morphology of 2xFYVE-labeled early endosomes is comparable to that of wild-type controls (Figure 4H and I). Introducing the rabx-5(null) mutation into usp-50(xd413) backgrounds resulted in a significant suppression of the enlarged early endosome phenotype characteristic of usp-50(xd413) mutants (Figure 4H and I). These findings suggest that USP-50 may modulate the size of early endosomes through its interaction with RABX-5.

      Reviewer #3 (Public Review): 

      Summary: 

      The authors were trying to elucidate the role of USP8 in the endocytic pathway. Using C. elegans epithelial cells as a model, they observed that when USP8 function is lost, the cells have a decreased number and size of lysosomes. Since USP8 was already known to be a protein linked to ESCRT components, they looked into what role USP8 might play in connecting lysosomes and multivesicular bodies (MVB). They observed fewer ESCRT-associated vesicles but an increased number of "abnormal" enlarged vesicles when USP8 function was lost. At this specific point, it's not clear what the objective of the authors was. What would have been their hypothesis addressing whether the reduced lysosomal structures in USP8 (-) animals were linked to MVB formation? Then they observed that the abnormally enlarged vesicles, marked by the PI3P biosensor YFP-2xFYVE, are bigger but in the same number in USP8 (-) compared to wild-type animals, suggesting homotypic fusion. They confirmed this result by knocking down USP8 in a human cell line, and they observed enlarged vesicles marked by YFP-2xFYVE as well. At this point, there is quite an important issue. The use of YFP-2xFYVE to detect early endosomes requires the transfection of the cells, which has already been demonstrated to produce differences in the distribution, number, and size of PI3P-positive vesicles (doi.org/10.1080/15548627.2017.1341465). The enlarged vesicles marked by YFP-2xFYVE would not necessarily be due to the loss of USP8. In any case, it appears relatively clear that USP8 localizes to early endosomes, and the authors claim that this localization is mediated by Rabex-5 (or Rabx-5). They finally propose that USP8 dissociates Rabx-5 from early endosomes, facilitating endosome maturation. 

      Weaknesses: 

      The weaknesses of this study are, on one side, that the results are almost exclusively dependent on the overexpression of fusion proteins. While useful in the field, this strategy does not represent the optimal way to dissect a cell biology issue. On the other side, the way the authors construct the rationale for each approximation is somewhat difficult to follow. Finally, the use of two models, C. elegans and a mammalian cell line, which would strengthen the observations, contributes to the difficulty in reading the manuscript. 

      The findings are useful but do not clearly support the idea that USP8 mediates Rab5-Rab7 exchange and endosome maturation. In contrast, they appear to be incomplete and open new questions regarding the complexity of this process and the precise role of USP8 within it. 

      We thank this reviewer for the insightful comments. Fluorescence-fused proteins serve as potent tools for visualizing subcellular organelles both in vivo and in live settings. Specifically, in epidermal cells of worms, the tissue-specific expression of these fused proteins is indispensable for studying organelle dynamics within living organisms. This approach is necessitated by the inherent limitations of endogenously tagged proteins, whose fluorescence signals are often weak and unsuitable for live imaging or genetic screening purposes. Acknowledging concerns raised by the reviewer regarding potential alterations in organelle morphology due to overexpression of certain fused proteins, we supplemented our approach with the utilization of endogenous markers. These markers, including Rab5, RAB-5, Rabex5, RABX-5, and EEA1 for early endosomes, as well as RAB-7, Mon1a, and Mon1b for late endosomes, were instrumental in our investigations (refer to Figure 3, Figure 6, Figure 5-figure supplement 1, Figure 5-figure supplement 2, and Figure 6-figure supplement 1). Our comprehensive analysis, employing various methodologies such as tissue-specific fused proteins, CRISPR/Cas9 knock-in, and antibody staining, consistently highlights the critical role of USP8 in early-to-late endosome conversion. Specifically, we discovered that the recruitment of USP-50/USP8 to early endosomes depends on Rabex5. However, instead of stabilizing Rabex5, the recruitment of USP-50/USP8 leads to its dissociation from endosomes, concomitantly facilitating the recruitment of the Rab7 GEF SAND-1/Mon1. In cells with loss-of-function mutations in usp-50/usp8, we observed an enhanced RABX-5/Rabex5 signal and mis-localization of SAND-1/Mon1 proteins from endosomes. Consequently, this disruption impairs endolysosomal trafficking, resulting in the accumulation of enlarged vesicles containing various intraluminal contents and rudimentary lysosomal structures.

      Through an unbiased genetic screen, verified by cultured mammalian cell studies, we observed that loss-of-function mutations in usp-50/usp8 result in diminished lysosomes/late endosomes. Electron microscopy (EM) analysis indicated that usp-50 mutation leads to abnormally enlarged vesicles containing various intraluminal structures in worm epidermal cells. USP8 is known to regulate the endocytic trafficking and stability of numerous transmembrane proteins. Given that lysosomes receive and degrade materials generated by endocytic pathways, we hypothesized that the abnormally enlarged vesicular structures observed in usp-50 or usp8 mutant cells correspond to the enlarged vesicles coated by early endosome markers. Indeed, in the absence of usp8/usp-50, the endosomal Rab5 signal is enhanced, while early endosomes are significantly enlarged. Given that the Rab5 guanine nucleotide exchange factor (GEF), Rabex5, is essential for Rab5 activation, we further investigated its dynamics. Additional analyses conducted in both worm hypodermal cells and cultured mammalian cells revealed an increase of endosomal Rabex5 in response to usp8/usp-50 loss-of-function. Live imaging studies further demonstrated active recruitment of USP8 to newly formed Rab5-positive vesicles, aligning spatiotemporally with Rabex5 regulation. Through systematic exploration of putative USP-50 binding partners on early endosomes, we identified its interaction with Rabex5. Comprehensive genetics and biochemistry experiments demonstrated that USP8 acts through K323 site de-ubiquitination to dissociate Rabex5 from early endosomes and promotes the recruitment of the Rab7 GEF SAND-1/Mon1. In summary, our study began with an unbiased genetic screen and subsequent examination of established theories, leading to the formulation of our own hypothesis. Through multifaceted approaches, we unveiled a novel function of USP8 in early-to-late endosome conversion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Within Figures 1K-N, diverse anomalous structures were detected in the usp-50 mutant. Further scrutiny is needed to definitively characterize these structures, particularly as the images in Figures 1M and 1L exhibit notable similarities to lamellar bodies.

      We thank the reviewer for the insightful question regarding the resemblance between the vesicles observed in our study and lamellar bodies (LBs). Lamellar bodies are specialized organelles involved in lipid storage and secretion [1], prominently studied in keratinocytes of the skin and alveolar type II (ATII) epithelial cells in the lung [2]. These organelles contain not only lipids but also cell-type specific proteins and lytic enzymes. Due to their acidic pH and functional similarities, LBs are classified as lysosome-related organelles (LROs) or secretory lysosomes [3,4]. In usp-50 mutants, we observed a considerable number of abnormal vesicles, some of which contain threadlike membrane structures and exhibit morphological similarities to LBs (Figure 2O). However, further analysis with a comprehensive panel of lysosome-related markers demonstrated a significant reduction in lysosomal structures within these mutants. In contrast, vesicles marked by early endosome markers, such as FYVE, RAB-5, RABX-5, and EEA1, were notably enlarged. These results suggest that the enlarged vesicles observed in usp-50 mutants are more likely aberrant early endosomes rather than true lamellar bodies. We have revised the manuscript to reflect these findings and to clearly differentiate between these structures and lysosome-related organelles.

      (2) The correlation between the presence of these abnormal structures and ESCRT-0 remains unaddressed, thus the assertion that USP-50 regulates endolysosome trafficking in conjunction with ESCRT-0 lacks empirical support.

      We thank the reviewer for the valuable suggestions. We apologize for any confusion and appreciate the opportunity to clarify our findings. The ESCRT machinery is essential for driving endosomal membrane deformation and scission, which leads to the formation of intraluminal vesicles (ILVs) within multivesicular bodies (MVBs). Recent research has shown that the absence of ESCRT components results in a reduction of ILVs in worm gut cells [5]. In wild type animals, the ESCRT-0 components HGRS-1 and STAM-1 display a distinct punctate distribution (Figure 1G and H). However, in usp-50 mutants, the punctate signals of HGRS-1::GFP and STAM-1::GFP are significantly reduced (Figure 1G and H; and Figure 1-figure supplement 1B), indicating a role for USP-50 in stabilizing the ESCRT-0 complex. Our TEM analysis revealed an accumulation of abnormally enlarged vesicles containing intraluminal structures in usp-50 mutants. When we examined a panel of early endosome and late endosome/lysosome markers, we found that early endosomes are significantly enlarged, while late endosomal/lysosomal structures are markedly reduced in these mutants. This suggests that the abnormal structures observed in usp-50 mutants are likely enlarged early endosomes rather than classical MVBs. To further investigate whether the reduction in ESCRT components contributes to the late endosome/lysosome defects, we analyzed stam-1 mutants. In these mutants, the size of RAB-7-coated vesicles was reduced (Author response image 1C), and the lysosomal marker LAAT-1 indicated a reduction in lysosomal structures (Author response image 1B). These results highlight the importance of the ESCRT complex in late endosome/lysosome formation. However, the morphology of early endosomes, as marked by 2xFYVE, remained similar to that of wild type in stam-1 mutants (Author response image 1A). Therefore, while reduced ESCRT-0 components may contribute to the late endosome/lysosome defects observed in usp-50 mutants, the enlargement of early endosomes in these mutants may involve additional mechanisms. We have revised the manuscript to incorporate these insights and to address the reviewer's comments more comprehensively.

      Author response image 1.

      (A) Confocal fluorescence images of hypodermis expressing YFP::2xFYVE to detect EEs in L4 stage animals in wild type and stam-1(ok406) mutants. Scale bar: 5 μm. (B) Confocal fluorescence images of hypodermal cell 7 (hyp7) expressing the LAAT-1::GFP marker to highlight lysosome structures in 3-day-old adult animals. Compared to wild type, LAAT-1::GFP signal is reduced in stam-1(ok406) animals. Scale bar: 5 μm. (C) The reduction of punctate endogenous GFP::RAB-7 signals in stam-1(ok406) animals. Scale bar: 10 μm.

      (3) Endosomal dysfunction typically leads to significant alterations in the spatial arrangement of marker proteins across distinct endosomes. In the manuscript, the authors examined the distribution and morphology of early endosomes, multivesicular bodies (MVBs), late endosomes, and lysosomes in a usp-50 deficient background primarily through single-channel confocal imaging. By employing two-color images showing RAB-5 and RAB-7, in conjunction with HGRS-1, a more comprehensive picture of the aftermath of USP-50 loss can be obtained.

      Good suggestions. We have conducted a double-labeling analysis to examine the distribution of RAB-5 and RAB-7 in conjunction with HGRS-1. In wild type animals, HGRS-1 exhibits a punctate distribution that is partially co-localized with both RAB-5 and RAB-7. In contrast, in usp-50 mutants, the punctate signal of HGRS-1 is significantly reduced, along with its co-localization with RAB-5 and RAB-7 (Author response image 2). These results suggest that, in the absence of USP-50, the stabilization of ESCRT-0 components on endosomes is compromised.

      Author response image 2.

      ESCRT-0 is adjacent to both early endosomes and late endosomes. (A) Confocal fluorescence images of wild-type and usp-50(xd413) hypodermis at L4 stage co-expressing HGRS-1::GFP (hgrs-1 promoter) and endogenous wrmScarlet::RAB-5. (B) HGRS-1 and RAB-5 puncta were analyzed to produce Manders overlap coefficient M1 (HGRS-1/RAB-5) and M2 (RAB-5/HGRS-1) (N=10). (C) Confocal fluorescence images of wild-type and usp-50(xd413) hypodermis at L4 stage co-expressing HGRS-1::GFP (hgrs-1 promoter) and endogenous wrmScarlet::RAB-7. (D) HGRS-1 and RAB-7 puncta were analyzed to produce Manders overlap coefficient M1 (HGRS-1/RAB-7) and M2 (RAB-7/HGRS-1) (N=10). Scale bar: 10 μm for (A) and (C).
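
      For readers unfamiliar with the M1 and M2 values in panels (B) and (D), the following is a minimal sketch of how Manders overlap coefficients are conventionally computed from two registered channels. It is not the authors' analysis pipeline: the function name, the flat per-pixel intensity lists, the zero thresholds, and the example values are all assumptions made for illustration.

```python
def manders_coefficients(ch_a, ch_b, thr_a=0.0, thr_b=0.0):
    """Return (M1, M2) for two equally sized sequences of pixel intensities.

    M1: fraction of total channel-A intensity located in pixels where
        channel B exceeds thr_b (e.g. HGRS-1 overlapping RAB-5).
    M2: fraction of total channel-B intensity located in pixels where
        channel A exceeds thr_a (e.g. RAB-5 overlapping HGRS-1).
    """
    if len(ch_a) != len(ch_b):
        raise ValueError("channels must contain the same number of pixels")
    total_a = sum(ch_a)
    total_b = sum(ch_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each channel needs non-zero total intensity")
    # Sum each channel only over pixels where the other channel has signal.
    coloc_a = sum(a for a, b in zip(ch_a, ch_b) if b > thr_b)
    coloc_b = sum(b for a, b in zip(ch_a, ch_b) if a > thr_a)
    return coloc_a / total_a, coloc_b / total_b


# Hypothetical background-subtracted intensities for five pixels.
hgrs1 = [0, 10, 20, 0, 30]
rab5 = [5, 0, 15, 25, 10]
m1, m2 = manders_coefficients(hgrs1, rab5)
print(f"M1 = {m1:.3f}, M2 = {m2:.3f}")
```

      Because M1 and M2 are normalized by different channels, they need not be equal; a drop in both, as reported for usp-50(xd413), indicates reduced overlap from the perspective of either marker. In practice the thresholds would be set from background measurements rather than left at zero.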

      (4) The authors observed enlarged early endosomes in cells depleted of usp-50/usp8, along with enlarged MVB-like structures identified through TEM. The potential identity of these structures as the same organelle could be determined using CLEM.

      We thank the reviewer for the valuable suggestion. Our TEM analysis identified a large number of abnormally enlarged vesicles with various intraluminal structures accumulated in usp-50 mutants. As the reviewer correctly noted, CLEM (correlative light and electron microscopy) would be an ideal approach to further characterize these structures. We have been attempting to implement CLEM in C. elegans for a few years. Given that CLEM relies on fluorescence markers, in this study we focused on two tagged proteins, RAB-5 and RABX-5, which show enlargement in their vesicles in usp-50 mutants. Unfortunately, we encountered significant challenges with this approach, as the GFP-tagged RAB-5 and RABX-5 signals did not survive the electron microscopy procedure. Attempts to align EM sections with the residual GFP signal yielded results that were not convincing. Consequently, we concentrated our analysis on a panel of molecular markers, including 2xFYVE, RAB-5, RABX-5, RAB-7, and LAAT-1. These markers consistently indicated that early endosomes are specifically enlarged in usp-50 mutants, while late endosomal/lysosomal structures are notably reduced. Thus, the abnormal structures identified in usp-50 mutants via TEM are likely to be enlarged early endosomes rather than classical MVBs. We have revised the manuscript to reflect these findings and to clarify this point.

      (5) The working model depicted in Figure 6 Y (right) requires revision, as it has the potential to mislead readers into mistaking enlarged early endosomes for multivesicular bodies (MVBs).

      We thank the reviewer for the excellent suggestion. We have revised the model to clarify that it is the enlarged early endosomes, rather than MVBs, that are observed in usp-50 mutants.

      Reviewer #2 (Recommendations For The Authors):

      (1) Is there any change of Rabx5 protein level in USP8/USP50 mutant cells?

      Good question. In the absence of usp-50/usp8, we indeed observed a noticeable increase in the signal of Rabex5 on endosomes. To determine whether usp-50/usp8 affects the protein level of Rabex5, we investigated the endogenous levels of RABX-5 using the RABX-5::GFP knock-in line. Compared to wild-type controls, we found an elevated level of RABX-5::GFP knock-in protein in usp-50 mutants (Author response image 3). This suggests that USP-50 may play a role in the destabilization of RABX-5/Rabex5 in vivo.

      Author response image 3.

      The endogenous RABX-5 protein level is increased in usp-50 mutants. (A) The RABX-5::GFP KI protein level is increased in usp-50(xd413). (B) Quantification of endogenous RABX-5::GFP protein level in wild type and usp-50(xd413) mutant animals.

      (2) It is interesting that "The rabx-5(null) animals are healthy and fertile and do not display obvious morphological or behavioral defects.", which seems contrary to its role in regulating USP8 localization and endosome maturation.

      It has been previously documented that rabx-5 functions redundantly with rme-6, another RAB-5 GEF in C. elegans, to regulate RAB-5 localization in oocytes [6]. RNA interference (RNAi) targeting rabx-5 in a rme-6 mutant background results in synthetic lethality, whereas neither rabx-5 nor rme-6 single mutants are essential for worm viability. RME-6 co-localizes with clathrin-coated pits, while Rabex-5 is localized to early endosomes. Rabex-5 forms a stable complex with Rabaptin-5 and is part of a large EEA1-positive complex on early endosomes, whereas RME-6 does not interact with Rabaptin-5 (RABN-5) or EEA-1. These findings suggest that while RME-6 and RABX-5 may function redundantly, they likely play distinct roles in regulating intracellular trafficking processes. In the absence of RABX-5, USP-50 appears to lose its endosomal localization, although the size of the early endosome remains comparable to that of wild type. This observation contrasts with the phenotype associated with USP-50 loss-of-function, in which the early endosome is notably enlarged. These results suggest that residual USP-50 present in the endosomes is sufficient to maintain its role in the endocytic pathway. Conversely, the complete absence of USP-50 likely disrupts the transition of early endosomes to late endosomes, indicating a crucial role of USP-50 in this conversion process. It is also noteworthy that, although loss-of-function of rabx-5 does not result in obvious changes to early endosomes, increasing the gene expression level of rabx-5/Rabex-5 alone is sufficient to cause enlargement of early endosomes (Author response image 4). Indeed, we observed that loss-of-function mutations in usp-50/usp8 lead to abnormally enlarged early endosomes, accompanied by an enhanced signal of endosomal RABX-5. When the rabx-5(null) mutation was introduced into usp-50 mutant animals, the enlarged early endosome phenotype seen in usp-50 mutants was significantly suppressed (Figure 4H and I). This implies that maintaining a lower level of Rab5 GEF may be crucial for endolysosomal trafficking.

      (3) Does Rabx5 mutation have any impact on early endosomes?

      To address the question, we utilized the CRISPR/Cas9 technique to create a molecular null for rabx-5 (Figure 4E). In the rabx-5(null) mutant animals, we found that the 2xFYVE-labeled early endosomes are indistinguishable from wild type (Figure 4H and 4I). Given that rabx-5 functions redundantly with rme-6, another RAB-5 GEF in C. elegans, it is likely that the regulation of early endosome size involves a cooperative interaction between RABX-5 and RME-6.

      (4) The authors observed a reduction of ESCRT-0 components in USP8 mutant cells, could this contribute to the late endosome/lysosome defects?

      Good suggestion. In wild-type animals, the two ESCRT-0 components, HGRS-1 and STAM-1, exhibit a distinct punctate distribution (Figure 1G and H). However, in usp-50 mutants, the punctate signals of HGRS-1::GFP and STAM-1::GFP are significantly diminished (Figure 1G and H; and Figure 1-figure supplement 1B), which aligns with the role of USP-50 in stabilizing the ESCRT-0 complex. To investigate whether the reduction in ESCRT components might contribute to defects in late endosome/lysosome formation, we examined stam-1 mutants. In stam-1 mutants, we observed a reduction in the size of RAB-7-coated vesicles (Author response image 1). Further, when we introduced the lysosomal marker LAAT-1::GFP into stam-1 mutants, we found a substantial decrease in lysosomal structures compared to wild-type animals (Author response image 1). This suggests that the ESCRT complex is essential for proper late endosome/lysosome formation. In contrast, the morphology of early endosomes, as indicated by the 2xFYVE marker, appeared normal in stam-1 mutants, similar to wild-type animals (Author response image 1). This implies that while a reduction in ESCRT-0 components may contribute to the late endosome/lysosome defects observed in usp-50 mutants, the early endosome enlargement phenotype in usp-50 mutants may involve additional mechanisms.

      (5) Rabx5 is accumulated in USP8 mutant cells; I am very curious about the phenotype of USP8-Rabx5 double mutants. Could over-expression of Rabx5 (wild type or mutant forms) cause any defects?

      Excellent suggestions. To address the question, we employed the CRISPR/Cas9 technique to create a molecular null for rabx-5 (Figure 4E). In the rabx-5(null) mutant animals, we observed that the punctate USP-50::GFP signal became diffusely distributed (Figure 4F and G). This suggests that rabx-5 is necessary for the endosomal localization of USP-50. Interestingly, in rabx-5(null) mutant animals, the 2xFYVE-labeled early endosomes appeared similar to those in wild-type animals (Figure 4H and I). When rabx-5(null) was introduced into usp-50 mutant animals, the enlarged early endosome phenotype observed in usp-50 was significantly suppressed (Figure 4H and I). This finding indicates that usp-50 indeed functions through rabx-5 to regulate early endosome size. Additionally, we constructed strains overexpressing either wild-type or K323R mutant RABX-5. Our results showed that overexpression of wild-type RABX-5 led to early endosome enlargement (as indicated by YFP::2xFYVE labeling) (Author response image 4A, B and D). In contrast, overexpression of the K323R mutant RABX-5 did not result in noticeable early endosome enlargement (Author response image 4A, C and D). Together, these data are consistent with our model that USP-50 may regulate RABX-5 by deubiquitinating the K323 site.

      Author response image 4.

      (A-C) Over-expression of wild-type RABX-5 causes enlarged EEs (labeled by YFP::2xFYVE), while over-expression of the RABX-5(K323R) mutant form does not. (D) Quantification of the volume of individual YFP::2xFYVE vesicles. Data are presented as mean ± SEM. ****P<0.0001. ns, not significant. One-way ANOVA with Tukey’s test.

      (6) Rabx5 could be ubiquitinated at K88 and K323, and Rabx5-K323R showed different activity when compared with the wild-type protein in USP8 mutant cells. Could the authors provide evidence that USP8 could remove the ubiquitin modification from K323 in Rabx5 protein?

      We appreciate the reviewer's insightful suggestions. To explore the potential of USP-50 in removing ubiquitin modifications from lysine 323 on the RABX-5 protein, we undertook a series of experiments. Initially, we sought to determine whether USP-50 influences the ubiquitination level of RABX-5 in vivo. However, due to the low expression levels of USP-50, we encountered challenges in obtaining adequate amounts of USP-50 protein from worm lysates. To overcome this, we expressed USP-50::4xFLAG in HEK293 cells for subsequent affinity purification. Concurrently, we utilized anti-GFP agarose beads to purify RABX-5::GFP from worms expressing the rabx-5::gfp construct. We then incubated RABX-5::GFP with USP-50::4xFLAG for varying durations and performed immunoblotting with an anti-ubiquitin antibody. As shown in Author response image 5A, our results revealed a decrease in the ubiquitination level of RABX-5 in the presence of USP-50, suggesting that USP-50 directly deubiquitinates RABX-5. Previous studies have indicated that only a minor fraction of recombinant RABX-5 undergoes ubiquitination in HeLa cells, which is believed to have functional significance (7). Our findings are consistent with this observation, as only a small fraction of RABX-5 in worms is ubiquitinated. Rabex-5 is known to interact with both K63- and K48-linked poly-ubiquitin chains. To further elucidate whether USP-50 specifically targets K48- or K63-linked ubiquitination at the K323 site of RABX-5, we incubated various HA-tagged ubiquitin mutants with either wild-type or K323R mutant RABX-5 protein. Our results indicated that the K323R mutation reduces K63-linked ubiquitination of RABX-5 (Author response image 5). This experiment was repeated multiple times with consistent results.
      Additionally, while overexpression of wild-type RABX-5 led to an enlargement of early endosomes, as evidenced by YFP::2xFYVE labeling, overexpression of the K323R mutant did not produce a noticeable effect on endosome size (Author response image 4). Collectively, these findings indicate that RABX-5 is subject to ubiquitin modification in vivo and that USP-50 plays a significant role in regulating this modification at the K323 site.

      Author response image 5.

      (A) RABX-5::GFP protein was purified from worm lysates using anti-GFP antibody. FLAG-tagged USP-50 was purified from HEK293T cells using anti-FLAG antibody. Purified RABX-5::GFP was incubated with USP-50::4xFLAG for the indicated times (0, 15, 30, 60 min), followed by immunoblotting using antibodies against ubiquitin, FLAG or GFP. In the presence of USP-50::4xFLAG, the ubiquitination level of RABX-5::GFP is decreased. (B) Quantification of RABX-5::GFP ubiquitination level from three independent experiments. (C) HEK293T cells were transfected with HA-Ub or the indicated mutants and 4xFLAG-tagged RABX-5 or RABX-5 K323R mutant for 48 h. The cells were subjected to pull-down using FLAG beads, followed by immunoblotting using antibodies against HA or FLAG.

      (7) The authors described "the almost identical phenotype of usp-50/usp8 and sand-1/Mon1 mutants", found protein-protein interaction between USP8 and sand-1, and showed that sand1-GFP signal is diminished in USP8 mutant cells. These observations fit with the possibility that USP8 regulates the stability of sand-1 to promote endosomal maturation. Could this be tested and integrated into the current model?

      We are grateful for the insightful comments provided by the reviewer. Rab5, known to be activated by Rabex-5, plays a crucial role in the homotypic fusion of early endosomes. Rab5 effectors also include the Rab7 GEF SAND-1/Mon1-Ccz1 complex. Rab7 activation by the SAND-1/Mon1-Ccz1 complex is essential for the biogenesis and positioning of late endosomes (LEs) and lysosomes, and for the fusion of endosomes and autophagosomes with lysosomes. The Mon1-Ccz1 complex is able to interact with Rabex-5, causing dissociation of Rabex-5 from the membrane, which probably terminates the positive feedback loop of Rab5 activation and then promotes the recruitment and activation of Rab7 on endosomes. In our study, we identified an interaction between USP-50 and the Rab5 GEF, RABX-5. In the absence of USP-50, we observed an increased endosomal localization of RABX-5 and the formation of abnormally enlarged early endosomes. This phenotype is reminiscent of that seen in sand-1 loss-of-function mutants, which also exhibit enlarged early endosomes and a concomitant reduction in late endosomes/lysosomes. Notably, USP-50 also interacts with SAND-1, suggesting a potential role in regulating its localization. We could propose several models to elucidate how USP-50 might influence SAND-1 localization, including:

      (1) USP-50 may stabilize SAND-1 through direct de-ubiquitination.

      (2) In the absence of USP-50, the sustained presence of RABX-5 could lead to continuous Rab5 activation, which might hinder or delay the recruitment of SAND-1.

      (3) USP-50 could facilitate SAND-1 recruitment by promoting the dissociation of RABX-5.

      We are actively investigating these models in our laboratory. Due to space constraints, a more detailed exploration of how USP-50 regulates SAND-1 stability will be presented in a separate publication.

      References:

      (1) Schmitz, G., and Müller, G. (1991). Structure and function of lamellar bodies, lipid-protein complexes involved in storage and secretion of cellular lipids. J Lipid Res 32, 1539-1570.

      (2) Dietl, P., and Frick, M. (2021). Channels and Transporters of the Pulmonary Lamellar Body in Health and Disease. Cells-Basel 11. https://doi.org/10.3390/cells11010045.

      (3) Raposo, G., Marks, M.S., and Cutler, D.F. (2007). Lysosome-related organelles: driving post-Golgi compartments into specialisation. Current opinion in cell biology 19, 394-401. https://doi.org/10.1016/j.ceb.2007.05.001.

      (4) Weaver, T.E., Na, C.L., and Stahlman, M. (2002). Biogenesis of lamellar bodies, lysosome-related organelles involved in storage and secretion of pulmonary surfactant. Semin Cell Dev Biol 13, 263-270. https://doi.org/10.1016/s1084952102000551.

      (5) Ott, D.P., Desai, S., Solinger, J.A., Kaech, A., and Spang, A. (2024). Coordination between ESCRT function and Rab conversion during endosome maturation. bioRxiv, 2024.05.14.594104. https://doi.org/10.1101/2024.05.14.594104.

      (6) Sato, M., Sato, K., Fonarev, P., Huang, C.J., Liou, W., and Grant, B.D. (2005). Caenorhabditis elegans RME-6 is a novel regulator of RAB-5 at the clathrin-coated pit. Nature cell biology 7, 559-569. https://doi.org/10.1038/ncb1261.

      (7) Mattera, R., Tsai, Y.C., Weissman, A.M., and Bonifacino, J.S. (2006). The Rab5 guanine nucleotide exchange factor Rabex-5 binds ubiquitin (Ub) and functions as a Ub ligase through an atypical Ub-interacting motif and a zinc finger domain. The Journal of biological chemistry 281, 6874-6883. https://doi.org/10.1074/jbc.M509939200.

    1. eLife Assessment

      This study demonstrates mRNA-specific regulation of translation by subunits of the eukaryotic initiation factor complex 3 (eIF3) using convincing methods, data, and analyses. The investigations have generated important information that will be of interest to biologists studying translation regulation. However, the physiological significance of the gene expression changes that were observed is not clear.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Herrmannová et al. explore changes in translation upon individual depletion of three subunits of the eIF3 complex (d, e and h) in mammalian cells. The authors provide a detailed analysis of regulated transcripts, followed by validation by RT-qPCR and/or Western blot of targets of interest, as well as GO and KEGG pathway analysis. The authors confirm prior observations that eIF3, despite being a general translation initiation factor, functions in mRNA-specific regulation, and that eIF3 is important for translation re-initiation. They show that global effects of eIF3e and eIF3d depletion on translation and cell growth are concordant. Their results support and extend previous reports suggesting that both factors control translation of 5'TOP mRNAs. Interestingly, they identify MAPK pathway components as a group of targets coordinately regulated by eIF3 d/e. The authors also discuss discrepancies with other reports analyzing eIF3e function.

      Strengths:

      Altogether, a solid analysis of eIF3 d/e/h-mediated translation regulation of specific transcripts. The data will be useful for scientists working in the Translation field.

      Weaknesses:

      The authors could have explored in more detail some of their novel observations, as well as their impact on cell behavior.

      The manuscript has improved with the new corrections. I appreciate the authors' attention to the minor comments, which have been fully resolved. The authors have not, however, provided additional experimental evidence that uORF-mediated translation of Raf-1 mRNA depends on an intact eIF3 complex, nor have they addressed the consequences of such regulation for cell physiology. While I understand that this is a subject of follow-up research, the authors could have at least included their explanations/speculations regarding major comments 2-4, which in my opinion could have been useful for the reader.

    3. Reviewer #2 (Public review):

      Summary:

      mRNA translation regulation permits cells to rapidly adapt to diverse stimuli by fine-tuning gene expression. Specifically, the 13-subunit eukaryotic initiation factor 3 (eIF3) complex is critical for translation initiation as it aids in 48S PIC assembly to allow for ribosome scanning. In addition, eIF3 has been shown to drive transcript-specific translation by binding mRNA 5' cap structures through the eIF3d subunit. Dysregulation of eIF3 has been implicated in oncogenesis; however, the precise eIF3 subunit contributions are unclear. Here, Herrmannová et al. aim to investigate how eIF3 subcomplexes, generated by knock down (KD) of either eIF3e, eIF3d or eIF3h, affect the global translatome. Using Ribo-seq and RNA-seq, the authors identified a large number of genes that exhibit altered translation efficiency upon eIF3d/e KD, while translation defects upon eIF3h KD were mild. eIF3d/e KD share multiple dysregulated transcripts, perhaps due to both subcomplexes lacking eIF3d. Both eIF3d/e KD increase translation efficiency (TE) of transcripts encoding lysosomal, ER and ribosomal proteins, suggesting a role of eIF3 in ribosome biogenesis and protein quality control. Many transcripts encoding ribosomal proteins harbor a TOP motif, and eIF3d KD and eIF3e KD cells exhibit a striking induction of these TOP-modified transcripts. On the other hand, eIF3d KD and eIF3e KD lead to a reduction of MAPK/ERK pathway proteins. Despite this downregulation, eIF3d KD and eIF3e KD activate MAPK/ERK signaling, as ERK1/2 and c-Jun phosphorylation was induced. Finally, in all three knockdowns, MDM2 and ATF4 protein levels are reduced. This is notable because MDM2 and ATF4 both contain short uORFs upstream of the start codon, and further supports a role of eIF3 in reinitiation. Altogether, Herrmannová et al. have gained key insights into precise eIF3-mediated translational control as it relates to key signaling pathways implicated in cancer.

      Strengths:

      The authors have provided a comprehensive set of data to analyze RNA and ribosome footprinting upon perturbation of eIF3d, eIF3e, and eIF3h. As described above in the summary, these data present many interesting starting points to understand additional roles of the eIF3 complex and specific subunits in translational control.

      Weaknesses:

      - The differences between eIF3e and eIF3d knockdown are difficult to reconcile, especially since eIF3e knockdown leads to reduction in eIF3d levels.
      - The paper would be strengthened by experiments directly testing what RNA determinants allow for transcript-specific translation regulation by the eIF3 complex. This would allow the paper to be less descriptive.
      - The paper would have more biological relevance if eIF3 subunits were perturbed to mimic naturally occurring situations where eIF3 is dysregulated. For example, eIF3e is aberrantly upregulated in certain cancers, and therefore an overexpression and profiling experiment would have been more relevant than a knockdown experiment.

      The first review is unchanged as no additional experiments were provided to address the first review.

    4. Reviewer #3 (Public review):

      Summary:

      In this article, Herrmannová et al. catalog the changes in ribosome association with mRNAs when the multisubunit eukaryotic translation initiation factor 3 is disrupted by knocking down individual subunits. They find that RNAs relying on TOP motifs for translation, such as ribosomal protein RNAs, and RNAs encoding modification enzymes in the ER and components of the lysosome are upregulated. In contrast, proteins encoding components of MAP kinase cascades are downregulated when subunits of eIF3 are knocked down, but retain elevated levels of activity.

      Strengths:

      The authors use ribosome profiling of well-characterized mutants lacking subunits of eIF3 and assess the changes in translation that take place. They supplement the ribosome association studies with western blotting to determine protein level changes of affected transcripts. They analyze which transcripts undergo translation changes, which is important for understanding more broadly how translation initiation factor levels affect cancer cell translatomes. Changes observed by both ribosome profiling and western blotting support their claims that eIF3 functions in mRNA-specific control of translation.

      Weaknesses:

      (1) The paper would be strengthened if there were a clear model tying the various effects together or linking individual subunit knockdown to cancerous phenotypes. It is noted that the authors plan to address such outcomes of eIF3 dysregulation in future work, which will be of interest.

      (2) The paper could also be strengthened if some of the experiments were performed in at least one other cell type to determine whether changes observed are general or cell-type specific. The authors discuss this issue and provide a literature citation to support a more general mechanism.

    1. eLife Assessment

      The study is considered important with solid evidence that demonstrates the impact of plasma membrane nano-domains and protein interactions in the plant defence response to viruses. It includes a molecular understanding of the role of a calcium dependent kinase (CPK3) and a remorin protein in the cell-to-cell spread of viruses and cytoskeletal dynamics demonstrating, conclusively, the role of CPK3 with multiple lines of evidence. The work opens avenues to investigate different viruses and other plasma membrane proteins to gain a fuller picture of the involvement of plasmodesmata and other nanodomains in virus spreading.

    2. Reviewer #1 (Public review):

      Summary:

      How plants perceive their environment and signal during growth and development is of fundamental importance for plant biology. Over the last few decades, nano domain organisation of proteins localised within the plasma membrane has emerged as a way of organising proteins involved in signal pathways. Here, the authors addressed how a non-surface localised signal (viral infection) was resisted by PM localised signalling proteins and the effect of nano domain organisation during this process. This is valuable work as it describes how an intracellular process affects signalling at the PM, where most previous work has focused on the other way round, PM signalling affecting downstream responses in the plant. They identify CPK3 as a specific calcium dependent protein kinase which is important for inhibiting viral spread. The authors then go on to show that CPK3 diffusion in the membrane is reduced after viral infection and study the interaction between CPK3 and the remorins, which are a group of scaffold proteins important in nano domain organisation. The authors conclude that there is an interdependence between CPK3 and remorins to control their dynamics during viral infection in plants.

      Strengths:

      The dissection of which CPK was involved in the viral propagation was masterful and very conclusive. Identifying CPK3 through knockout time course monitoring of viral movement was very convincing. The inclusion of overexpression, constitutively active and point mutation non-functioning lines further added to that.

      Weaknesses:

      I would like to thank the researchers for including some additional work suggested in the previous round of peer review. However, I still have concerns over this work, which are twofold.

      (1) Firstly, the imaging described and shown is not sufficient to support the claims made. The PM-localised and non-PM-localised forms look similar, and no PM stain or marker construct was used to support the distinction. In addition, the quality of much of the confocal-based imaging (including the new figure on colocalisation) is simply not sufficient. The images are too noisy and no clear conclusions can be drawn. As noted previously, the system on which these data were collected has an Airyscan detector capable of 120 nm resolution, and as such NDs can be resolved. The sptPALM data conclusions are nice and fit the narrative. The inclusion of sptPALM movies is useful for the reader, and reporting track numbers is highly beneficial. However, the movies do not show a high signal-to-noise ratio compared to other work in the field (see work from Alex Martiniere), and the mEOS particles are only just observable over the detector noise in some videos. As such, I worry about the quality of the data on which the analysis is based. In addition, in some of the videos the conversion laser seems too high, as it is difficult to separate some of the single particles as they emerge, which would again hinder the analysis.

      (2) Secondly, remorins are involved in a lot of nano domain controlled processes at the PM. The authors have not conclusively demonstrated that during viral infection the remorin effects seen are solely due to its interaction with CPK3. The sptPALM imaging of REM1.2 in a cpk3 knockout line goes part way to solve this, and the inclusion of CPK3-CA also strengthens the authors' claims. But to propose a kiss and go model, bearing in mind the differences in diffusion between CPK3 and REM3 and the differential changes to diffusion between the two proteins after PlAMV infection, without two colour imaging of both proteins at the same time, the claims are much stronger than the evidence. Negative control experiments are required here utilising other PM localised proteins which have no role during viral infection (such as Lti6B).

      Overall, I think this work has the potential to be a very strong manuscript but additional evidence supporting interaction claims would significantly strengthen the work and make it exceptional.

    3. Reviewer #3 (Public review):

      Summary:

      This study examined the role that the activation and plasma membrane localisation of a calcium dependent protein kinase (CPK3) plays in plant defence against viruses.

      The authors clearly demonstrate that the ability to hamper the cell-to-cell spread of the virus PlAMV is not common to other CPKs which have roles in defence against different types of pathogens, but appears to be specific to CPK3 in Arabidopsis. Further, they show that lateral diffusion of CPK3 in the plasma membrane is reduced upon PlAMV infection, with CPK3 likely present in nano-domains. This stabilisation, however, depends on one of its phosphorylation substrates, the Remorin scaffold protein REM1-2. However, when REM1-2 lateral diffusion was tracked, it showed an increase in movement in response to PlAMV infection. These contrary responses to PlAMV infection were further demonstrated to be interdependent, which led the authors to propose a model in which activated CPK3 is stabilised in nano-domains in part by its interaction with REM1.2, which it binds and phosphorylates, allowing REM1-2 to diffuse more dynamically within the membrane.

      The likely impact of this work is that it will lead to closer examination of the formation of nano-domains in the plasma membrane and dissection of their role in immunity to viruses, as well as further investigation into the specific mechanisms by which CPK3 and REM1-2 inhibit the cell-to-cell spread of viruses, including examination of their roles in cytoskeletal dynamics.

      Strengths:

      The paper provided compelling evidence about the roles of CPK3 and REM1-2 through a combination of logical reverse genetics experiments and advanced microscopy techniques, particularly in single particle tracking.

      Weaknesses:

      There is limited discussion or exploration of the role that CPK3 has in cytoskeletal organisation and whether this may play a role in the plant's defence against viral propagation. Further, although the authors show that there is no accumulation of CPK3/REM1.2 at plasmodesmata, it would be interesting to investigate whether the demonstrated reduction of viral propagation is due to changes in PD permeability.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      How plants perceive their environment and signal during growth and development is of fundamental importance for plant biology. Over the last few decades, nano domain organisation of proteins localised within the plasma membrane has emerged as a way of organising proteins involved in signal pathways. Here, the authors addressed how a non-surface localised signal (viral infection) was resisted by PM localised signalling proteins and the effect of nano domain organisation during this process. This is valuable work as it describes how an intracellular process affects signalling at the PM, where most previous work has focused on the other way round, PM signalling affecting downstream responses in the plant. They identify CPK3 as a specific calcium dependent protein kinase which is important for inhibiting viral spread. The authors then go on to show that CPK3 diffusion in the membrane is reduced after viral infection and study the interaction between CPK3 and the remorins, which are a group of scaffold proteins important in nano domain organisation. The authors conclude that there is an interdependence between CPK3 and remorins to control their dynamics during viral infection in plants.

      Strengths:

      The dissection of which CPK was involved in the viral propagation was masterful and very conclusive. Identifying CPK3 through knockout time course monitoring of viral movement was very convincing. The inclusion of overexpression, constitutively active and point mutation non-functioning lines further added to that.

      Weaknesses:

      My main concerns with the work are twofold.

      (1) Firstly, the imaging described and shown is not sufficient to support the claims made. The PM-localised and non-PM-localised forms look similar, and no PM stain or marker construct was used to support the distinction. The sptPALM data conclusions are nice and fit the narrative. However, no raw data or movie is shown, only representative tracks. Therefore, the data quality cannot be verified and, in addition, the number of single particle events visualised per experiment is not reported; only the number of cells imaged is reported. Therefore, it is impossible for the reader to appreciate the number of single molecule behaviours obtained and hence the quality of the data.

      (2) Secondly, remorins are involved in a lot of nanodomain controlled processes at the PM. The authors have not conclusively demonstrated that during viral infection the remorin effects seen are solely due to its interaction with CPK3. The sptPALM imaging of REM1.2 in a cpk3 knockout line goes part way to solve this but more evidence would strengthen it in my opinion. How do we know that during viral infection the entire PM protein dynamics and organisation are not altered? Or that CPK3 and REM are not at very distant ends of a signalling cascade? Negative control experiments are required here utilising other PM localised proteins which have no role during viral infection. In addition, if the interaction is specific, the transiently expressed CPK3-CA construct (shown to form nano domains) should be expressed with REM1.2-mEOS to show that the alterations in single particle behaviour occur due to specific activations of CPK3 and REM1.2 in the absence of PlAMV viral infection and are not an artefact of whole PM changes in dynamics during viral infection.

      In addition, displaying more information throughout the manuscript (such as raw particle tracking movies and numbers of tracks measured) on the already generated data would strengthen the manuscript further.

      Overall, I think this work has the potential to be a very strong manuscript but additional reporting of methods and data are required and additional lines of evidence supporting interaction claims would significantly strengthen the work and make it exceptional.

      Reviewer #2 (Public Review):

      Summary:

      The paper provides evidence that CPK3 plays a role in plant virus infection, and reports that viral infection is accompanied by changes in the dynamics of CPK3 and REM1.2, the phosphorylation substrate of CPK3, in the plasma membrane. In addition, the dynamics of the two proteins in the PM are shown to be interdependent.

      Strengths:

      The paper contains novel, important information.

      Weaknesses:

      The interpretation of some experimental data is not justified, and the proposed model is not fully based on the available data.

      Reviewer #3 (Public Review):

      Summary:

      This study examined the role that the activation and plasma membrane localisation of a calcium dependent protein kinase (CPK3) plays in plant defence against viruses.

      The authors clearly demonstrate that the ability to hamper the cell-to-cell spread of the virus PlAMV is not common to other CPKs which have roles in defence against different types of pathogens, but appears to be specific to CPK3 in Arabidopsis. Further, they show that lateral diffusion of CPK3 in the plasma membrane is reduced upon PlAMV infection, with CPK3 likely present in nano-domains. This stabilisation, however, depends on one of its phosphorylation substrates, the Remorin scaffold protein REM1-2. However, when REM1-2 lateral diffusion was tracked, it showed an increase in movement in response to PlAMV infection. These contrary responses to PlAMV infection were further demonstrated to be interdependent, which led the authors to propose a model in which activated CPK3 is stabilised in nano-domains in part by its interaction with REM1.2, which it binds and phosphorylates, allowing REM1-2 to diffuse more dynamically within the membrane.

      The likely impact of this work is that it will lead to closer examination of the formation of nano-domains in the plasma membrane and dissection of their role in immunity to viruses, as well as further investigation into the specific mechanisms by which CPK3 and REM1-2 inhibit the cell-to-cell spread of viruses.

      Strengths:

      The paper provided compelling evidence about the roles of CPK3 and REM1-2 through a combination of logical reverse genetics experiments and advanced microscopy techniques, particularly in single particle tracking.

      Weaknesses:

      There is a lack of evidence for the downstream pathways, specifically whether the role that CPK3 has in cytoskeletal organisation may play a role in the plant's defence against viral propagation. Also, there is limited discussion about the localisation of the nano-domains and whether there is any overlap with plasmodesmata, which, as plant viruses utilise PD to move from cell to cell, seems an obvious avenue to investigate.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Viral spread work in CPK mutants with time courses is beautiful!

      Regarding my public points on my issues with the imaging:

      - Figure 2A shows 'PM' localisation of CPK3 and 'non-PM' imaging of CPK3-G2A. The images are near identical both showing cell outlines and cytoplasmic strands. Here a PM marker (such as Lti6B) tagged with a different fluorophore or PM stain should be used in conjunction with surface views (such as in Figure 2C) to show it really is at the PM and the G2A line is not.

      Impaired membrane localization of CPK3-G2A is documented in Mehlmer et al., 2010 using microsomal fractionation. Although the main purpose of Figure 2A is to show correct expression of the constructs in the lines used for PlAMV propagation (Figure 2B), we replaced the images with wider-view pictures that are more representative of the subcellular localization of CPK3 and CPK3-G2A.

      - Regarding Figure 2C, this is extremely noisy and PM heterogeneity is barely observable over the noise from the system (looking at the edges of surface imaged). You mention low resolution was an issue. I notice from the methods you have taken confocal images on a Zeiss 880 with Airyscan. These images must be confocal, but if imaged with Airyscan the PM heterogeneity would be much clearer (see work from John Runions lab).

      Indeed, these are tangential-view images obtained with a Zeiss 880 with Airyscan. Based on tessellation analysis (Figure 2H-J), CPK3 is rather homogeneously distributed and forms NDs of around 70 nm in diameter. Objects of this size cannot be resolved using pixel reassignment methods such as Airyscan. Note also that the AtREMs in our study are less heterogeneously distributed than what was described in the literature for StREM1.3.

      - Regarding all sptPALM data. At least an example real data image and video is required otherwise the data can’t be assessed. The work of Alex Martiniere (sptPALM) or Alex Jonson (TIRF) all show raw data so the reader can appreciate the quality of the data. In addition, number of events (particles tracked) has to be shown in the figure legend, not just number of cells otherwise was one track performed per cell? Or 10,000? Obviously where the N sits in this range gives the reader more or less confidence of the data.

      We agree and we added example videos of sptPALM experiments in the supplementary data, we also indicated the number of tracked particles in the figure legends.

      - On a slight technical aside, how do you know the cells being imaged for sptPALM with PlAMV are actually infected with the virus? In Fig 2C you use a GFP-tagged version but in sptPALM you use a non-tagged one. I think a sentence in methods on this would help clarify.

      PlAMV-GFP was used for the sptPALM experiments and cell infection was assessed during the PALM experiment. This is now specified in the corresponding figures and methods.

      - I also have a concern over some of the representative images showing the same things between different figures. Your clustering data in 3F looks very convincing. However, in Figure 2H the mock and PlAMV-GFP look very similar. How is Figure 3F so different for the same experiment? Especially considering the scale bars are the same for both figures. Same for CPK3-mRFP1.2 in Fig 2C and 3A, the same thing is being imaged, at the same scale (scale bars same size) but the images are extremely different.

      Figure 2 data were generated using CPK3 stably expressed in A. thaliana, while Figure 3 data were obtained upon transient over-expression of CPK3 in N. benthamiana. We do not have a clear explanation for such a difference in CPK3 PM behavior; it could lie in a different PM composition or actin organization between the two species. This point is now addressed in the discussion.

      - Line 193&194 - you state that the CA CPK3 is reminiscent of the CPK3 upon PlAMV expression. I don't agree: while CPK3CA is less mobile (2D), the MSD shows it is in between CPK3 and CPK3 + PlAMV. Therefore, can’t the opposite also be true, namely that overall the behaviour of CPK3-CA is reminiscent of WT CPK3? I think this needs rewording.

      We agree and we reworded that part.

      - Line 651 - what numerical aperture are you using for the lens during confocal microscopy. This is fundamentally important information directly related to the reproducibility of the work. You report it for the sptPALM.

      The numerical aperture is now indicated in the methods.

      Regarding my bigger point about specific interactions between CPK3 and remorin during viral infection to strengthen your claim the following need doing. I am not suggesting you do all of these but at least two would significantly enhance the paper.

      (1) Image a non-related PM protein during viral infection using sptPALM and demonstrate that its behaviour is not altered (such as Lti6B). This would show the effects on remorin behaviour are specific to CPK3 and not a wholesale PM alteration in dynamics due to viral infection.

      (2) Two colour SPT imaging of CPK3 and REM1.2. You show in absence of proteins (knockouts effect on each other) but your only interaction data is from a kinase assay (where CPK1 and 2 also interact, even though they are not localised at the same place) and colocalisation data (see below). A two colour SPT imaging experiment showing interaction and clustering of CPK3 and REM1.2 with each other and the change in their behaviours when viral infected and simultaneously imaged would address all of my concerns.

      - On another note, the co-localisation data (fig 5 sup 4) needs additional analysis. I would expect most PM proteins to show the results you show as the data is very noisy. In order to improve I would zoom in to fill the field of view and then determine correlation and also when one image is rotated 90 degrees (as described in Jarsch et al., plant cell) to enhance this work.

      (3) In the absence of viral infection, but presence of CPK3-CA, is REM1.2 behaviour in the PM, as measured by sptPALM, altered? If so, then the interaction is specific, changes in remorin dynamics are not due to wholesale PM changes during viral infection, and the manuscript would be substantially strengthened.

      (4) Building on from 3), if you have a CPK3 mutated with both CPK3-CA and G2A this would be constitutively active and non-PM localised and as such should not affect Remorin behaviour if your model is true, this would strengthen the case significantly but I appreciate is highly artificial and would need to be done transiently.

      Regarding the first point, since the role of PM proteins involved in potexvirus infection is barely assessed, picking a non-related PM protein might be tricky. The data obtained with mEOS3.2-REM1.2 expressed in cpk3 null-mutant point towards a specific role of CPK3 in PlAMV-induced REM1.2 diffusion and not a general alteration of PM protein behavior.

      Regarding the second point, we already reported the in vivo interaction between AtCPK3CA and AtREM1.2/AtREM1.3 by BiFC in N. benthamiana (Perraki et al., 2018), and AtCPK3 was shown to co-IP with AtREM1.2 (Abel et al., 2021). While we agree on the relevance of doing dual color sptPALM with CPK3 and REM1.2, it is so far technically challenging and we would not be able to implement this in a timely manner. For the colocalization, although the whole cell is displayed in the figure, the analysis was performed on ROIs to fill the field of analysis.

      We agree with the relevance of adding the colocalization analysis of randomized images (mTagBFP2 channel rotated 90 degrees), this is now added to Figure 5 – supplement figure 5.
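
      The rotated-channel control described above can be sketched as follows. This is a minimal illustration on synthetic data, assuming a Pearson correlation as the colocalization metric (the metric actually used is not stated in this response); the function names and the noise level are hypothetical.

```python
import random
from math import sqrt


def pearson(image_a, image_b):
    """Pearson correlation between two equally sized 2D images (lists of rows)."""
    xs = [v for row in image_a for v in row]
    ys = [v for row in image_b for v in row]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rotate90(image):
    """Rotate a square image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


# Synthetic square ROI: channel B is channel A plus independent noise,
# standing in for two partially colocalized fluorescence channels.
rng = random.Random(0)
size = 32
channel_a = [[rng.random() for _ in range(size)] for _ in range(size)]
channel_b = [[v + 0.3 * rng.random() for v in row] for row in channel_a]

r_observed = pearson(channel_a, channel_b)
# Control: rotating one channel breaks pixel-wise correspondence,
# so any remaining correlation reflects chance overlap only.
r_rotated = pearson(channel_a, rotate90(channel_b))

print(f"observed r = {r_observed:.3f}, rotated control r = {r_rotated:.3f}")
```

      In this sketch, a genuine colocalization signal shows up as an observed coefficient well above the rotated control; for noisy images with no real correspondence the two values would be similar.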

      Finally, for the third and fourth points, spt-PALM analysis of REM1.2 in presence of CPK3-CA and CPK3-CA-G2A was performed (Figure 5 - figure supplement 4). The results suggest a specific role of CPK3-CA in REM1.2 diffusion.

      Minor points:

      Line 59 - form, I think you mean from.

      Line 63 - Reference needed after latter.

      Line 68 - Reference required after viral infection.

      Line 85 - Propose not proposed.

      Line 156 - Allowed us to not allows to.

      Line 204 - add we previously 'demonstrated'

      Line 622 and 623 - You say lines obtained from Thomas Ott. This is very odd phrasing considering he is an author. I appreciate citing the work producing the lines but maybe reword this

      These points were corrected, thank you.

      Reviewer #2 (Recommendations For The Authors):

      The paper provides evidence that CPK3 plays a role in plant virus infection, and reports that viral infection is accompanied by changes in the dynamics of CPK3 and REM1.2, the phosphorylation substrate of CPK3, in the plasma membrane. In addition, the dynamics of the two proteins in the PM are shown to be interdependent. The paper contains novel, important information that can undoubtedly be published in eLife. However, I have some concerns that should be addressed before it can be accepted for publication.

      Major concerns

      When the authors say that CPK3 plays a role in viral propagation, it should be clarified what is meant by 'propagation': replication of the viral genome, its cell-to-cell transport, or long-distance transport via the phloem. By default the readers will tend to assume the former meaning. In my opinion, the term 'propagation' is misleading and should be avoided.

      We purposely chose the term “propagation” because it encompasses both replication and cell-to-cell movement. Nevertheless, we previously showed that group 1 StREM1.3 does not alter PVX replication (Raffaele et al., 2009, The Plant Cell). In this paper, as we do not investigate the role of AtREM1.2 or AtCPK3 in the replication of the viral PlAMV genome, we cannot state that these proteins are strictly involved in cell-to-cell movement of the virus.

      The authors show that viral infection is associated with decreased diffusion of CPK3 and increased diffusion of REM1.2 in the PM. However, it remains unclear whether these changes are related to partial resistance to viral infection involving CPK3 and REM1.2, or whether they are simply a consequence of viral infection that may lead to altered PM properties and altered dynamics of PM-associated proteins. Therefore, the model presented in Fig. 6 appears to be entirely speculative, as it postulates that changes in CPK3 and REM1.2 dynamics are the cause of suppressed virus 'propagation'. In addition, the model implies that a decrease in CPK3 mobility leads to activation of its kinase activity. This view is not supported by experimental data (see my next comment). The model should be deleted (both as the figure and its description in the Discussion) or substantially reworked so that it finally relies on existing data.

      For the first point, the results obtained from the additional experiments proposed by reviewer #1 support the hypothesis of a direct impact of CPK3 on REM1.2 diffusion (Figure 5 - figure supplement 4).

      We agree with the second point and reworked the model to remove the link between CPK3 activation and its increased diffusion.

      The statement that 'changes in CPK3 dynamics upon PlAMV infection are linked to its activation' (line 194) is based on flawed logic, and the conclusion in this section of Results ('changes in CPK3 dynamics upon PlAMV infection are linked to its activation') is incorrect, as it is not supported by experimental data. In fact, the authors show that CPK3 dynamics and clustering upon viral infection is somewhat reminiscent of the behavior of a CPK3 deletion mutant, which is a constitutively active protein kinase. However, this partial similarity cannot be taken as evidence that CPK3 dynamics upon PlAMV infection are related to its activation. Furthermore, the authors emphasize the similarity of the mutant and CPK3 in infected cells without taking into account a drastic difference in their localization (Fig. 3A, middle and right panels) showing that the reduced dynamics of the compared proteins may have different causes. I suggest the removal of the section 'CPK3 activation leads to its confinement in PM ND' from the paper, as the results included in this section are not directly related to other data presented.

      The PM lateral organization of PM-bound CPKs in their native or constitutively active form, as well as the role of lipids in such a phenomenon, has never been shown before. We believe that this section contains relevant information for the community. We kept the section but reworded it to temper the correlation made between CPK3 PM organization upon viral infection and its activation.

      Line 270 - 'group 1 REMs might play a role in CPK3 domain stabilization upon viral infection'. This is an overstatement. The size of the CPK3-containing NDs may have no correlation with their stability.

      We reworded the sentence.

      Minor points

      Line 204 - we previously that

      Line 234 and hereafter - "the D" sounds strange. Suggest using "the diffusion coefficient".

      This was reworded.

      Reviewer #3 (Recommendations For The Authors):

      The authors have previously demonstrated that there was an increase in REM1.2 localisation to plasmodesmata under viral challenge. It would be useful to see if there was any co-localisation of REM1.2 and CPK3 with plasmodesmata in response to PlAMV and how this is affected in the mutants. This could be carried out relatively simply using aniline blue.

      These experiments were added to the supplementary data (Figure 2 – figure supplement 2 and Figure 4 – figure supplement 4). No enrichment of CPK3 or REM1.2 at plasmodesmata could be observed upon PlAMV infection.

      Fig 3 supplementary figure 2 would be better incorporated into the main body of Figure 3 as this underpins discussion on the involvement of lipids such as sterols in the formation of nanodomains.

      We moved Figure 3 – Supplementary figure 2 to the main body of Figure 3.

      Minor corrections:

      Whilst the paper is generally well written there are a number of grammatical errors:

      Line 1 & 2: Title doesn't quite read correctly, suggest a rewording for clarity.

      L31: Insert "a" after only

      L33: Replace "are playing" with "play"

      L34: Begin sentence "Viruses are intracellular pathogens and as such the role..."

      L59: replace "form" with "from"

      L63: Insert "was demonstrated" after REM1.2)

      L85: Replace "proposed" with "propose"

      L86: replace "encouraging to explore" with "which will encourage further exploration of "

      L129: replace "we'll focus on" with "we concentrated on"

      L131: insert "an" before ATP

      L138: change "among" to "amongst"

      L156: change "allows to analyse" to "allows the analysis of"

      L204: Insert "showed" after previously.

      L232: "root seedlings" should this be the roots of seedlings?

      L235: insert "to" after "as"

      L280: insert "a" after "only"

      L281: change "to play" to "as playing"; change CA to superscript

      L307: Insert "was" after "transcription"

      L320: change "display" to "displaying"

      L321: change "form" to "forms"

      L340: "hampering" should come before viral

      L365: insert "us" after "allow"

      Thank you, these were corrected.

    1. eLife Assessment

      This valuable work describes results from a set of simulation and empirical studies of a set-up assessing exploratory behavior in a potentially rewarding environment that contains danger. The core idea is that an instrumental agent can be helped to be both effective and safe, thus avoiding excessive danger, during exploratory behavior, if its influence is flexibly gated by an independent Pavlovian fear learning system. The conclusion that safe, but effective exploration can be achieved based on a flexibly weighted combination of a Pavlovian and an instrumental agent is solid.

    2. Reviewer #1 (Public review):

      Summary:

      This paper provides a computational model of a synthetic task in which an agent needs to find a trajectory to a rewarding goal in a 2D-grid world, in which certain grid blocks incur a punishment. In a completely unrelated setup without explicit rewards, they then provide a model that explains data from an approach-avoidance experiment in which an agent needs to decide whether to approach or withdraw from, a jellyfish, in order to avoid a pain stimulus, with no explicit rewards. Both models include components that are labelled as Pavlovian; hence the authors argue that their data show that the brain uses a Pavlovian fear system in complex navigational and approach-avoid decisions.

      In the first setup, they simulate a model in which a component they label as Pavlovian learns about punishment in each grid block, whereas a Q-learner learns about the optimal path to the goal, using a scalar loss function for rewards and punishments. Pavlovian and Q-learning components are then weighed at each step to produce an action. Unsurprisingly, the authors find that including the Pavlovian component in the model reduces the cumulative punishment incurred, and this increases as the weight of the Pavlovian system increases. The paper does not explore to what extent increasing the punishment loss (while keeping reward loss constant) would lead to the same outcomes with a simpler model architecture, so any claim that the Pavlovian component is required for such a result is not justified by the modelling.

      In the second setup, an agent learns about punishments alone. "Pavlovian biases" have previously been demonstrated in this task (i.e. an overavoidance when the correct decision is to approach). The authors explore several models (all of which are dissimilar to the ones used in the first setup) to account for the Pavlovian biases.

      Strengths:

      Overall, the modelling exercises are interesting and relevant and incrementally expand the space of existing models.

      Weaknesses:

      I find the conclusions misleading, as they are not supported by the data.

      First, the similarity between the models used in the two setups appears to be more semantic than computational or biological. So it is unclear to me how the results can be integrated.

      Secondly, the authors do not show "a computational advantage to maintaining a specific fear memory during exploratory decision-making" (as they claim in the abstract). Making such a claim would require showing an advantage in the first place. For the first setup, the simulation results will likely be replicated by a simple Q-learning model when scaling up the loss incurred for punishments, in which case the more complex model architecture would not confer an advantage. The second setup, in contrast, is so excessively artificial that even if a particular model conferred an advantage here, this is highly unlikely to translate into any real-world advantage for a biological agent. The experimental setup was developed to demonstrate the existence of Pavlovian biases, but it is not designed to conclusively investigate how they come about. In a nutshell, who in their right mind would touch a stinging jellyfish 88 times in a short period of time, as the subjects do on average in this task? Furthermore, in which real-life environment does withdrawal from a jellyfish lead to a sting, as in this task?

      Crucially, simplistic models such as the present ones can easily solve specifically designed lab tasks with low dimensionality but they will fail in higher-dimensional settings. Biological behaviour in the face of threat is utterly complex and goes far beyond simplistic fight-flight-freeze distinctions (Evans et al., 2019). It would take a leap of faith to assume that human decision-making can be broken down into oversimplified sub-tasks of this sort (and if that were the case, this would require a meta-controller arbitrating the systems for all the sub-tasks, and this meta-controller would then struggle with the dimensionality j).

      On the face of it, the VR task provides higher "ecological validity" than previous screen-based tasks. However, in fact, it is only the visual stimulation that differs from a standard screen-based task, whereas the action space is exactly the same. As such, the benefit of VR does not become apparent, and its full potential is foregone.

      If the authors are convinced that their model can - then data from naturalistic approach-avoidance VR tasks is publicly available, e.g. (Sporrer et al., 2023), so this should be rather easy to prove or disprove. In summary, I am doubtful that the models have any relevance for real-life human decision-making.

      Finally, the authors seem to make much broader claims that their models can solve safety-efficiency dilemmas. However, a combination of a Pavlovian bias and an instrumental learner (study 1) via a fixed linear weighting does not seem to be "safe" in any strict sense. This will lead to the agent making decisions leading to death when the promised reward is large enough (outside perhaps a very specific region of the parameter space). Would it not be more helpful to prune the decision tree according to a fixed threshold (Huys et al., 2012)? So, in a way, the model is useful for avoiding cumulatively excessive pain but not instantaneous destruction. As such, it is not clear what real-life situation is modelled here.

      A final caveat regarding Study 1 is the use of a PH associability term as a surrogate for uncertainty. The authors argue that this term provides a good fit to fear-conditioned SCR but that is only true in comparison to simpler RW-type models. Literature using a broader model space suggests that a formal account of uncertainty could fit this conditioned response even better (Tzovara et al., 2018).
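
      For readers unfamiliar with the term, the idea of an associability signal serving as a surrogate for uncertainty can be sketched as below. This is a generic hybrid Pearce-Hall update, assumed here for illustration only; it is not taken from the manuscript, and the parameter names `kappa` and `eta` and all numerical values are hypothetical.

```python
def pearce_hall_step(value, associability, outcome, kappa=0.5, eta=0.3):
    """One trial of a generic hybrid Pearce-Hall update (illustrative assumption).

    The value moves toward the outcome at a rate scaled by the current
    associability; the associability then tracks a running average of the
    absolute prediction error, so it is high when outcomes are surprising.
    """
    delta = outcome - value
    new_value = value + kappa * associability * delta
    new_associability = (1 - eta) * associability + eta * abs(delta)
    return new_value, new_associability


# A cue is followed by the same punishment (-1) on every trial.
value, associability = 0.0, 1.0
for _ in range(50):
    value, associability = pearce_hall_step(value, associability, outcome=-1.0)
stable_value, stable_associability = value, associability

# The punishment is then unexpectedly omitted once.
_, surprised_associability = pearce_hall_step(value, associability, outcome=0.0)

print(f"after stable training: value = {stable_value:.3f}, "
      f"associability = {stable_associability:.3f}")
print(f"after an omitted punishment: associability = {surprised_associability:.3f}")
```

      In this sketch the associability falls as the punishment becomes predictable and rises again after a surprising outcome, which is the sense in which it can stand in for uncertainty; a formal account of uncertainty would instead track a full belief distribution.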

    3. Reviewer #2 (Public review):

      Summary:

      The authors tested the efficiency of a model combining Pavlovian fear valuation and instrumental valuation. This model is amenable to many behavioral decision and learning setups - some of which have been or will be designed to test differences in patients with mental disorders (e.g., anxiety disorder, OCD, etc.).

      Strengths:

      (1) Simplicity of the model which can at the same time model rather complex environments.

      (2) Introduction of a flexible omega parameter.

      (3) Direct application to a rather advanced VR task.

      (4) The paper is extremely well written. It was a joy to read.

      Weaknesses:

      Almost none! In very few cases, the explanations could be a bit better.

    4. Reviewer #3 (Public review):

      Summary:

      This paper aims to address the problem of exploring potentially rewarding environments that contain danger, based on the assumption that an independent Pavlovian fear learning system can help guide an agent during exploratory behaviour such that it avoids severe danger. This is important given that otherwise later gains seem to outweigh early threats, and agents may end up putting themselves in danger when it is advisable not to do so.

      The authors develop a computational model of exploratory behaviour that accounts for both instrumental and Pavlovian influences, combining the two according to uncertainty in the rewards. The result is that Pavlovian avoidance has a greater influence when the agent is uncertain about rewards.

      Strengths:

      The study does a thorough job of testing this model using both simulations and data from human participants performing an avoidance task. Simulations demonstrate that the model can produce "safe" behaviour, where the agent may not necessarily achieve the highest possible reward but ensures that losses are limited. Interestingly, the model appears to describe human avoidance behaviour in a task that tests for Pavlovian avoidance influences better than a model that doesn't adapt the balance between Pavlovian and instrumental based on uncertainty. The methods are robust, and generally, there is little to criticise about the study.

      Weaknesses:

      The extent of the testing in human participants is fairly limited but goes far enough to demonstrate that the model can account for human behaviour in an exemplar task. There are, however, some elements of the model that are unrealistic (for example, the fact that pre-training is required to select actions with a Pavlovian bias would require the agent to explore the environment initially and encounter a vast amount of danger in order to learn how to avoid the danger later). The description of the models is also a little difficult to parse.

    5. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This paper provides a computational model of a synthetic task in which an agent needs to find a trajectory to a rewarding goal in a 2D-grid world, in which certain grid blocks incur a punishment. In a completely unrelated setup without explicit rewards, they then provide a model that explains data from an approach-avoidance experiment in which an agent needs to decide whether to approach or withdraw from, a jellyfish, in order to avoid a pain stimulus, with no explicit rewards. Both models include components that are labelled as Pavlovian; hence the authors argue that their data show that the brain uses a Pavlovian fear system in complex navigational and approach-avoid decisions. 

      We thank the reviewer for their thoughtful comments. To clarify, the grid-world setup was used as a didactic tool/testbed to understand the interaction between Pavlovian and instrumental systems (lines 80-81) [Dayan et al., 2006], specifically in the context of safe exploration and learning. It helps us delineate the Pavlovian contributions during learning, which is key to understanding the safety-efficiency dilemma we highlight. This approach generates a hypothesis about outcome uncertainty-based arbitration between these systems, which we then test in the approach-withdrawal VR experiment based on foundational studies studying Pavlovian biases [Guitart-Masip et al., 2012, Cavanagh et al., 2013].

      Although the VR task does not explicitly involve rewards, it provides a specific test of our hypothesis regarding flexible Pavlovian fear bias, similar to how others have tested flexible Pavlovian reward bias without involving punishments (e.g., Dorfman & Gershman, 2019). Both the simulation and VR experiment models are derived from the same theoretical framework and maintain algebraic mapping, differing only in task-specific adaptations (e.g., differing in action sets and temporal difference learning for multi-step decisions in the grid world vs. Rescorla-Wagner rule for single-step decisions in the VR task). This is also true for Dayan et al. [2006] who bridge Pavlovian bias in a Go-No Go task (negative auto-maintenance pecking task) and a grid world task. Therefore, we respectfully disagree that the two setups are completely unrelated and that both models include components merely labelled as Pavlovian.
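
      The relationship between the two learning rules mentioned above can be illustrated with a minimal sketch. It uses the textbook forms of the Rescorla-Wagner and temporal difference updates, which are assumed here rather than copied from the manuscript; the function and parameter names are hypothetical.

```python
def rescorla_wagner_update(value, outcome, alpha):
    """Single-step rule: move the value toward the observed outcome."""
    return value + alpha * (outcome - value)


def td_update(value, next_value, outcome, alpha, gamma):
    """Multi-step TD(0) rule: the target also includes the discounted
    value of the next state."""
    return value + alpha * (outcome + gamma * next_value - value)


# In a single-step task there is no successor state to bootstrap from,
# which corresponds to gamma = 0 (or a next-state value of 0).
rw_value = rescorla_wagner_update(value=0.0, outcome=-1.0, alpha=0.5)
td_single_step = td_update(value=0.0, next_value=5.0, outcome=-1.0,
                           alpha=0.5, gamma=0.0)

# In a multi-step task the next state's value contributes to the target.
td_multi_step = td_update(value=0.0, next_value=5.0, outcome=-1.0,
                          alpha=0.5, gamma=0.9)

print(rw_value, td_single_step, td_multi_step)
```

      Under these assumptions the Rescorla-Wagner rule is the special case of the TD rule without a successor state, which is one way to read the statement that the two models differ only in task-specific adaptations.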

      We will rephrase parts of the manuscript, particularly in the Methods and Discussion, to prevent our main message from being misconveyed and to clarify that our main focus is on Pavlovian fear bias in safe exploration and learning (as also summarised by reviewers #2 and #3), rather than on its role in complex navigational decisions. We also acknowledge the need for future work to capture more sophisticated safe behaviours, such as escapes and sophisticated planning, which span different aspects of the threat-imminence continuum [Mobbs et al., 2020], and we will highlight these as avenues for future research.

      In the first setup, they simulate a model in which a component they label as Pavlovian learns about punishment in each grid block, whereas a Q-learner learns about the optimal path to the goal, using a scalar loss function for rewards and punishments. Pavlovian and Q-learning components are then weighed at each step to produce an action. Unsurprisingly, the authors find that including the Pavlovian component in the model reduces the cumulative punishment incurred, and this increases as the weight of the Pavlovian system increases. The paper does not explore to what extent increasing the punishment loss (while keeping reward loss constant) would lead to the same outcomes with a simpler model architecture, so any claim that the Pavlovian component is required for such a result is not justified by the modelling. 

      Thank you for this comment. We acknowledge that our paper does not compare the Pavlovian fear system to a purely instrumental system with varying punishment sensitivity. Instead, our model assumes the coexistence of these two systems and demonstrates the emergent safety-efficiency trade-off from their interaction. It is possible that similar behaviours could be modelled using an instrumental system alone. In light of the reviewer’s comment, we will soften our claims regarding the necessity of the Pavlovian system, despite its known existence.

      We also encourage the reviewer to consider the Pavlovian system as a biologically plausible implementation of punishment sensitivity. Unlike punishment sensitivity (scaling of the punishments), which has not been robustly mapped to neural substrates in fMRI studies, the neural substrates for the Pavlovian fear system (e.g., the limbic loop) are well known (see Supplementary Fig. 16).

      Additionally, we point out that varying reward sensitivities while keeping punishment sensitivity constant allows our PAL agent to differentiate from an instrumental agent that combines reward and punishment into a single feedback signal. As highlighted in lines 136-140 and the T-maze experiment (Fig. 3 A, B, C), the Pavlovian system maintains fear responses even under high reward conditions, guiding withdrawal behaviour when necessary (e.g., ω = 0.9 or 1), which is not possible with a purely instrumental model if the punishment sensitivities are fixed. This is a fundamental point.
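
      The role of ω can be illustrated with a minimal sketch, assuming (this form is not spelled out in this response) that action preferences are a linear mix of instrumental Q-values and a Pavlovian bias, passed through a softmax. The names `combined_preferences` and `approach_probability`, and all numerical values, are hypothetical.

```python
from math import exp


def softmax(preferences):
    """Convert action preferences into choice probabilities."""
    top = max(preferences.values())
    weights = {a: exp(p - top) for a, p in preferences.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def combined_preferences(q_values, pavlovian_bias, omega):
    """Assumed linear arbitration: omega weights the Pavlovian bias,
    (1 - omega) weights the instrumental values."""
    return {a: (1 - omega) * q_values[a] + omega * pavlovian_bias[a]
            for a in q_values}


def approach_probability(omega, reward=10.0):
    # Approaching yields a large net instrumental value, but the Pavlovian
    # system has learned that approach is followed by punishment.
    q_values = {"approach": reward, "withdraw": 0.0}
    pavlovian_bias = {"approach": -2.0, "withdraw": 0.0}
    prefs = combined_preferences(q_values, pavlovian_bias, omega)
    return softmax(prefs)["approach"]


for omega in (0.0, 0.5, 0.9, 1.0):
    print(f"omega = {omega}: P(approach) = {approach_probability(omega):.3f}")
```

      Under these assumptions, at ω = 1 the reward magnitude drops out of the preferences entirely, so withdrawal is favoured however large the reward. For any fixed ω < 1, a sufficiently large reward can still outweigh the Pavlovian bias, which is the concern about fixed linear weighting raised in the review.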

      We will revise our discussion and results sections to reflect these clarifications.

      In the second setup, an agent learns about punishments alone. "Pavlovian biases" have previously been demonstrated in this task (i.e. an overavoidance when the correct decision is to approach). The authors explore several models (all of which are dissimilar to the ones used in the first setup) to account for the Pavlovian biases. 

      Thank you. We respectfully disagree with the statement that the models used in the experimental setup are dissimilar to the ones used in the first setup. Due to differences in the nature of the task setup, the action set differs, but the model equations and the theory are the same and align closely, as described in our response above. The only additional difference is the use of a baseline bias in the human experiments and the RLDDM model, where we also model reaction times with drift rates, a behaviour that is not often simulated in grid-world simulations. We will improve our Methods section to ensure that model similarity is highlighted.

      Strengths: 

      Overall, the modelling exercises are interesting and relevant and incrementally expand the space of existing models. 

      We thank reviewer #1 for acknowledging the relevance of our models in advancing the field. We would like to further highlight that, to the best of our knowledge, this is the first time reaction times in Pavlovian-Instrumental arbitration tasks have been modelled using RLDDM, which adds a novel dimension to our approach.

      Weaknesses: 

      I find the conclusions misleading, as they are not supported by the data. 

      First, the similarity between the models used in the two setups appears to be more semantic than computational or biological. So it is unclear to me how the results can be integrated. 

      We acknowledge the dissimilarity between the task setups (grid-world vs. approach-withdrawal). However, we believe these setups are computationally similar and may be biologically related, as suggested by prior work like Dayan et al. [2006], which integrates Go-No Go and grid-world tasks. Just as that work bridged findings in the appetitive domain, we aim to integrate our findings in the aversive domain. We will provide a more integrated interpretation in the discussion section of the revised manuscript.

      Dayan, P., Niv, Y., Seymour, B., & Daw, N. D. (2006). The misbehavior of value and the discipline of the will. Neural Networks, 19(8), 1153-1160.

      Secondly, the authors do not show "a computational advantage to maintaining a specific fear memory during exploratory decision-making" (as they claim in the abstract). Making such a claim would require showing an advantage in the first place. For the first setup, the simulation results will likely be replicated by a simple Q-learning model when scaling up the loss incurred for punishments, in which case the more complex model architecture would not confer an advantage. The second setup, in contrast, is so excessively artificial that even if a particular model conferred an advantage here, this is highly unlikely to translate into any real-world advantage for a biological agent. The experimental setup was developed to demonstrate the existence of Pavlovian biases, but it is not designed to conclusively investigate how they come about. In a nutshell, who in their right mind would touch a stinging jellyfish 88 times in a short period of time, as the subjects do on average in this task? Furthermore, in which real-life environment does withdrawal from a jellyfish lead to a sting, as in this task? 

      Thank you for your feedback. As mentioned above, we invite the reviewer to think of Pavlovian fear systems as one way in which the brain might implement punishment sensitivity. Secondly, such a system provides a separate punishment memory that cannot be overwritten by higher rewards (see also Elfwing and Seymour, 2017, and Wang et al., 2021).

      Elfwing, S., & Seymour, B. (2017, September). Parallel reward and punishment control in humans and robots: Safe reinforcement learning using the MaxPain algorithm. In 2017 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob) (pp. 140-147). IEEE. 

      Wang, J., Elfwing, S., & Uchibe, E. (2021). Modular deep reinforcement learning from reward and punishment for robot navigation. Neural Networks, 135, 115-126.

      The simulation setups such as the following grid-worlds are common test-beds for algorithms in reinforcement learning [Sutton and Barto, 2018].

      Any experimental setup faces a trade-off between a constrained experiment designed to test and model a specific effect and a less constrained exploratory experiment that is more difficult to model. Here we chose the former, building upon previous foundational experiments on Pavlovian bias in humans [Guitart-Masip et al., 2012, Cavanagh et al., 2013]. The condition where withdrawal from a jellyfish leads to a sting, though less realistic, was included to balance the four cue-outcome conditions. Overall, the task was designed to isolate, to the best of our ability, the effect we wanted to test: Pavlovian fear bias in choices and reaction times. In a free operant task, it is quite likely that other components not included in our model would compete for control.

      Crucially, simplistic models such as the present ones can easily solve specifically designed lab tasks with low dimensionality but they will fail in higher-dimensional settings. Biological behaviour in the face of threat is utterly complex and goes far beyond simplistic fight-flight-freeze distinctions (Evans et al., 2019). It would take a leap of faith to assume that human decision-making can be broken down into oversimplified sub-tasks of this sort (and if that were the case, this would require a meta-controller arbitrating the systems for all the sub-tasks, and this meta-controller would then struggle with the dimensionality).

      We agree that safe behaviours, such as escapes, involve more sophisticated computations. We do not propose Pavlovian fear bias as the sole computation for safe behaviour, but rather as one of many possible contributors. Given that the Pavlovian withdrawal bias is known to exist, we simply study its possible contribution. We will include in our discussion that such behaviours likely occupy different parts of the threat-imminence continuum [Mobbs et al., 2020].

      Dean Mobbs, Drew B Headley, Weilun Ding, and Peter Dayan. Space, time, and fear: survival computations along defensive circuits. Trends in cognitive sciences, 24(3):228–241, 2020.

      On the face of it, the VR task provides higher "ecological validity" than previous screen-based tasks. However, in fact, it is only the visual stimulation that differs from a standard screen-based task, whereas the action space is exactly the same. As such, the benefit of VR does not become apparent, and its full potential is foregone. 

      We thank the reviewer for their comment. We selected the action space to build on existing models [Guitart-Masip et al., 2012, Cavanagh et al., 2013] that capture Pavlovian biases, and we also wanted to minimize participant movement for EEG data collection. Unfortunately, despite restricting movement to just the arm, the EEG data were still too noisy to yield any substantial results. We will explore more free-operant paradigms in future work.

      On the difference between VR and lab-based tasks, we note the reviewer's point. Note, however, that desktop monitor-based tasks lack sensorimotor congruency between the action and the outcome. It is also arguable that the background context is important in fear conditioning, as it may help set the tone of the fear system and make aversive components easier to distinguish.

      If the authors are convinced that their model can - then data from naturalistic approach-avoidance VR tasks is publicly available, e.g. (Sporrer et al., 2023), so this should be rather easy to prove or disprove. In summary, I am doubtful that the models have any relevance for real-life human decision-making. 

      We thank the reviewers for their thoughtful inputs. We do not claim our model is the best fit for all naturalistic VR tasks, as these require multiple systems across the threat-imminence continuum [Mobbs et al., 2020] and are beyond the scope of the current work. However, we believe our findings on outcome-uncertainty-based arbitration of Pavlovian bias could inform future studies and may be relevant for testing differences in patients with mental disorders, as noted by reviewer #2. More generally, most well-controlled laboratory tasks need to bridge a sizeable gap to applicability in real-life naturalistic behaviour, although the principle of using carefully designed tasks to isolate individual factors is well established.

      Finally, the authors seem to make much broader claims that their models can solve safety-efficiency dilemmas. However, a combination of a Pavlovian bias and an instrumental learner (study 1) via a fixed linear weighting does not seem to be "safe" in any strict sense. This will lead to the agent making decisions leading to death when the promised reward is large enough (outside perhaps a very specific region of the parameter space). Would it not be more helpful to prune the decision tree according to a fixed threshold (Huys et al., 2012)? So, in a way, the model is useful for avoiding cumulatively excessive pain but not instantaneous destruction. As such, it is not clear what real-life situation is modelled here. 

      We thank the reviewer for their comments and ideas. In our discussion lines 257-264, we discuss other works which identify similar safety-efficiency dilemmas, in different models. Here, we simply focus on the safety-efficiency trade-off arising from the interactions between Pavlovian and instrumental systems. It is important to note that the computational argument for the modular system with separate rewards and punishments explicitly protects (up to a point, of course) against large rewards leading to death because the Pavlovian fear response is not over-written by successful avoidance in recent experience. Note also that in animals, reward utility curves are typically convex. We will clarify this in the discussion section.
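
      The trade-off discussed here can be made concrete with a minimal sketch. It assumes that action propensities are a fixed linear mixture of an instrumental value and a Pavlovian (punishment-only) value, weighted by omega; the function names, the mixture form, and the numbers are illustrative assumptions rather than the manuscript's exact model. The sketch shows that, for any omega below 1, some finite instrumental value makes the risky action preferred, and that this threshold grows as omega increases.

```python
def mixed_propensity(q_instr: float, q_pav: float, omega: float) -> float:
    """Fixed linear mixture of instrumental and Pavlovian values (assumed form)."""
    return (1.0 - omega) * q_instr + omega * q_pav


def reward_threshold(q_pav_risky: float, omega: float, safe_propensity: float = 0.0) -> float:
    """Instrumental value at which the risky action ties with a safe alternative.

    Solves (1 - omega) * q + omega * q_pav_risky = safe_propensity for q.
    With omega == 1 the Pavlovian term fully dominates and no finite value suffices.
    """
    if omega >= 1.0:
        return float("inf")
    return (safe_propensity - omega * q_pav_risky) / (1.0 - omega)


# Hypothetical example: a Pavlovian punishment value of -10 on the risky action.
for omega in (0.1, 0.5, 0.9):
    print(omega, reward_threshold(q_pav_risky=-10.0, omega=omega))
```

      Under these assumptions, a larger omega raises the reward needed to override the Pavlovian term but never makes the risky action strictly unavailable, which is the sense in which the protection holds only up to a point.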

      We completely agree that in certain scenarios, pruning decision trees could be more effective, especially with a model-based instrumental agent. Here we utilise a model-free instrumental agent, which leads to a simpler model - which is appreciated by some readers such as reviewer #2. Future work can incorporate model-based methods.

      A final caveat regarding Study 1 is the use of a PH associability term as a surrogate for uncertainty. The authors argue that this term provides a good fit to fear-conditioned SCR but that is only true in comparison to simpler RW-type models. Literature using a broader model space suggests that a formal account of uncertainty could fit this conditioned response even better (Tzovara et al., 2018). 

      We thank the reviewer for bringing this to our notice. We will discuss Tzovara et al., 2018 in our discussion in our revised manuscript.
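
      For readers unfamiliar with the term, the following is a minimal sketch of a Pearce-Hall-style associability update in one common hybrid form, in which associability tracks the recent magnitude of prediction errors and scales the learning rate. The parameter names (kappa, eta), their values, and the exact update order are assumptions for illustration; the manuscript's own equations are not reproduced here.

```python
def pearce_hall_update(value, associability, outcome, kappa=0.5, eta=0.3):
    """One trial of a hybrid Pearce-Hall update (illustrative form).

    delta          : prediction error on this trial
    value          : updated with a learning rate scaled by current associability
    associability  : moves toward the absolute prediction error
    """
    delta = outcome - value
    new_value = value + kappa * associability * delta
    new_associability = (1.0 - eta) * associability + eta * abs(delta)
    return new_value, new_associability


def run_trials(outcomes, value=0.0, associability=1.0, **params):
    """Apply the update over a sequence of outcomes and return the per-trial trace."""
    trace = []
    for outcome in outcomes:
        value, associability = pearce_hall_update(value, associability, outcome, **params)
        trace.append((value, associability))
    return trace


# Hypothetical example: a fully predictable outcome, then one surprising omission.
trace = run_trials([1.0] * 50)
stable_value, stable_assoc = trace[-1]
_, surprised_assoc = pearce_hall_update(stable_value, stable_assoc, outcome=0.0)
```

      In this sketch associability falls while outcomes are predictable and rises after a surprising outcome, which is why it can serve as a rough proxy for outcome uncertainty; a formal uncertainty estimate of the kind the reviewer mentions would replace this heuristic.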

      Reviewer #2 (Public review): 

      Summary: 

      The authors tested the efficiency of a model combining Pavlovian fear valuation and instrumental valuation. This model is amenable to many behavioral decision and learning setups - some of which have been or will be designed to test differences in patients with mental disorders (e.g., anxiety disorder, OCD, etc.). 

      Strengths: 

      (1) Simplicity of the model which can at the same time model rather complex environments. 

      (2) Introduction of a flexible omega parameter. 

      (3) Direct application to a rather advanced VR task. 

      (4) The paper is extremely well written. It was a joy to read. 

      Weaknesses: 

      Almost none! In very few cases, the explanations could be a bit better. 

      We thank reviewer #2 for their positive feedback and thoughtful recommendations. We will ensure that, in our revision, we clarify the explanations in the few instances where they may not be sufficiently detailed, as noted.

      Reviewer #3 (Public review): 

      Summary: 

      This paper aims to address the problem of exploring potentially rewarding environments that contain danger, based on the assumption that an independent Pavlovian fear learning system can help guide an agent during exploratory behaviour such that it avoids severe danger. This is important given that otherwise later gains seem to outweigh early threats, and agents may end up putting themselves in danger when it is advisable not to do so.

      The authors develop a computational model of exploratory behaviour that accounts for both instrumental and Pavlovian influences, combining the two according to uncertainty in the rewards. The result is that Pavlovian avoidance has a greater influence when the agent is uncertain about rewards. 

      Strengths: 

      The study does a thorough job of testing this model using both simulations and data from human participants performing an avoidance task. Simulations demonstrate that the model can produce "safe" behaviour, where the agent may not necessarily achieve the highest possible reward but ensures that losses are limited. Interestingly, the model appears to describe human avoidance behaviour in a task that tests for Pavlovian avoidance influences better than a model that doesn't adapt the balance between Pavlovian and instrumental based on uncertainty. The methods are robust, and generally, there is little to criticise about the study. 

      Weaknesses: 

      The extent of the testing in human participants is fairly limited but goes far enough to demonstrate that the model can account for human behaviour in an exemplar task. There are, however, some elements of the model that are unrealistic (for example, the fact that pre-training is required to select actions with a Pavlovian bias would require the agent to explore the environment initially and encounter a vast amount of danger in order to learn how to avoid the danger later). The description of the models is also a little difficult to parse. 

      We thank reviewer #3 for their thoughtful feedback and useful recommendations, which we will take into account while revising the manuscript.

      We acknowledge the complexity of specifying Pavlovian bias in the grid world and appreciate the opportunity to elaborate on how this bias is modelled. In the human experiment, the withdrawal action is straightforwardly biased, as noted, while in the grid world, we assume a hardwired encoding of withdrawal actions for each state/grid. This innate encoding of withdrawal actions could be represented in the dPAG [Kim et al., 2013]. We implement this bias using pre-training, which we assume would be a product of evolution. Alternatively, this could be interpreted as deriving from an appropriate value initialization where the gradient over initialized values determines the action bias. Such aversive value initialization, driving avoidance of novel and threatening stimuli, has been observed in the tail of the striatum in mice, which is hypothesized to function as a Pavlovian fear/threat learning system [Menegas et al., 2018].

      Additionally, we explored the possibility of learning the action bias on the fly by tracking additional punishment Q-values instead of pre-training, which produced similar cumulative pain and step plots. While this approach is redundant, and likely not how the brain operates, it demonstrates an alternative algorithm.

      We thank the reviewer for pointing out these potentially unrealistic elements, and we will revise the manuscript to clarify and incorporate these explanations and improve the model descriptions.

      Eun Joo Kim, Omer Horovitz, Blake A Pellman, Lancy Mimi Tan, Qiuling Li, Gal Richter-Levin, and Jeansok J Kim. Dorsal periaqueductal gray-amygdala pathway conveys both innate and learned fear responses in rats. Proceedings of the National Academy of Sciences, 110(36):14795–14800, 2013

      William Menegas, Korleki Akiti, Ryunosuke Amo, Naoshige Uchida, and Mitsuko Watabe-Uchida. Dopamine neurons projecting to the posterior striatum reinforce avoidance of threatening stimuli. Nature neuroscience, 21(10): 1421–1430, 2018

    1. eLife Assessment

      The authors identify a novel relationship between exosome secretion and filopodia formation that has implications for cancer cell metastasis and neuronal synapse formation. Further, they identify the exosomal cargo, THSD7A, as a regulator of this process. The data presented is convincing, and represents an important advancement in our understanding of how these two biological processes are linked and play roles in regulating cell migration and cell-cell communication.

    2. Reviewer #1 (Public review):

      Summary:

      The study significantly advances our understanding of how exosomes regulate filopodia formation. Filopodia play crucial roles in cell movement, polarization, directional sensing, and neuronal synapse formation. McAtee et al. demonstrated that exosomes, particularly those enriched with the protein THSD7A, play a pivotal role in promoting filopodia formation through Cdc42 in cancer cells and neurons. This discovery unveils a new extracellular mechanism through which cells can control their cytoskeletal dynamics and interaction with their surroundings. The study employs a combination of rescue experiments, live-cell imaging, cell culture, and proteomic analyses to thoroughly investigate the role of exosomes and THSD7A in filopodia formation in cancer cells and neurons. These findings offer valuable insights into fundamental biological processes of cell movement and communication and have potential implications for understanding cancer metastasis and neuronal development.

      Weaknesses:

      The conclusions of this study are in most cases supported by data, but some aspects of data analysis need to be better clarified and elaborated. Some conclusions need to be better stated and according to the data observed.

    3. Reviewer #2 (Public review):

      Summary:

      The authors show that small EVs trigger the formation of filopodia in both cancer cells and neurons. They go on to show that two cargo proteins, endoglin, and THSD7A, are important for this process. This possibly occurs by activating the Rho-family GTPase CDC42.

      Strengths:

      The EV work is quite strong and convincing. The proteomics work is well executed and carefully analyzed. I was particularly impressed with the chick metastasis assay that added strong evidence of in vivo relevance.

      Weaknesses:

      The weakest part of the paper is the Cdc42 work at the end of the paper. It is incomplete and not terribly convincing. This part of the paper needs to be improved significantly.

    4. Reviewer #3 (Public review):

      Summary:

      The authors identify a novel relationship between exosome secretion and filopodia formation in cancer cells and neurons. They observe that multivesicular endosomes (MVE)-plasma membrane (PM) fusion is associated with filopodia formation in HT1080 cells and that MVEs are present in filopodia in primary neurons. Using overexpression and knockdown (KD) of Rab27/HRS in HT1080 cells, melanoma cells, and/or primary rat neurons, they found that decreasing exosome secretion reduces filopodia formation, while Rab27 overexpression leads to the opposite result. Furthermore, the decreased filopodia formation is rescued in the Rab27a/HRS KD melanoma cells by the addition of small extracellular vesicles (EVs) but not large EVs purified from control cells. The authors identify endoglin as a protein unique to small EVs secreted by cancer cells when compared to large EVs. KD of endoglin reduces filopodia formation and this is rescued by the addition of small EVs from control cells and not by small EVs from endoglin KD cells. Based on the role of filopodia in cancer metastasis, the authors then investigate the role of endoglin in cancer cell metastasis using a chick embryo model. They find that injection of endoglin KD HT1080 cells into chick embryos gives rise to less metastasis compared to control cells - a phenotype that is rescued by the co-injection of small EVs from control cells. Using quantitative mass spectrometry analysis, they find that thrombospondin type 1 domain containing 7a protein (THSD7A) is downregulated in small EVs from endoglin KD melanoma cells compared to those from control cells. They also report that THSD7A is more abundant in endoglin KD cell lysate compared to control HT1080 cells and less abundant in small EVs from endoglin KD cells compared to control cells, indicating a trafficking defect. 
      Indeed, using immunofluorescence microscopy, the authors observe THSD7A-mScarlet accumulation in CD63-positive structures in endoglin KD HT1080 cells, compared to control cells. Finally, the authors determine that exosome-secreted THSD7A induces filopodia formation in a Cdc42-dependent mechanism.

      Strengths:

      (1) While exosomes are known to play a role in cell migration and autocrine signaling, the relationship between exosome secretion and the formation of filopodia is novel.

      (2) The authors identify an exosomal cargo protein, THSD7A, which is essential for regulating this function.

      (3) The data presented provide strong evidence of a role for endoglin in the trafficking of THSD7A in exosomes.

      (4) The authors associate this process with functional significance in cancer cell metastasis and neurological synapse formation, both of which involve the formation of filopodia.

      (5) The data are presented clearly, and their interpretation appropriately explains the context and significance of the findings.

      Weaknesses:

      (1) A better characterization of the nature of the small EV population is missing:

      It is unclear why the authors chose to proceed to quantitative mass spectrometry with the bands in the Coomassie from size-separated EV samples, as there are other bands present in the small EV lane but not the large EV lane. This is important to clarify because it underlies how they were able to identify THSD7A as a unique regulator of exosome-mediated filopodia formation. Is there a reason why the total sample fractions were not compared? This would provide valuable information on the nature of the small and large EV populations.

      (2) Data analysis and quantification should be performed with increased rigor:

      a) Figure 1C - The optical and temporal resolution are insufficient to conclusively characterize the association between exosome secretion and filopodia. Specifically, the 10-second interval used in the image acquisitions is too close to the reported 20-second median time between exosome secretion and filopodia formation. Intervals of 2-5 seconds should be used to validate this. It would also be important to correlate the percentage of filopodia events that co-occur with exosome secretion. Is this a phenomenon that occurs with most or only a small number of filopodia? Additionally, resolution with typical confocal microscopy is subpar for these analyses. TIRF microscopy would offer increased resolution to parse out secretion events. As the TIRF objective is listed in the Methods section, figure legends should mention which images were acquired using TIRF microscopy.

      b) Figure 2 - It would be important to perform further analysis to concretely determine the relationship between exosome secretion and filopodia stability. Are secretion events correlated with the stability of filopodia? Is there a positive feedback loop that causes further filopodia stability and length with increased secretion? Furthermore, is there an association between the proximity of secretion with stability? Quantification of filopodia more objectively (# of filopodia/cell) would be helpful.

      c) Figure 6 - Why use different gel conditions to detect THSD7A in small EVs from B16F1 cells vs HT1080 and neurons? Why are there two bands for THSD7A in panels C and E? It is difficult to appreciate the KD efficiency in E. The absence of a signal for THSD7A in the HT1080 shEng small EVs that show a signal for endoglin is surprising. The authors should provide rigorous quantification of the westerns from several independent experimental repeats.

      (3) The study lacks data on the cellular distribution of endoglin and THSD7A:

      a) Figure 6 - Is THSD7A expected to be present in the nucleus, as shown in panel D? (Label D is missing in the figure.) It is not clear if this is observed in neurons. A western blot of endogenous THSD7A on cell fractions would clarify this. The authors should further characterize the cellular distribution of THSD7A in both cell types. Similarly, the cellular distribution of endoglin in the cancer cells should be provided. This would help validate the proposed model in Figure 8.

      b) Figure 7 - Although the western blot provides convincing evidence for the role of endoglin in THSD7A trafficking, the microscopy data lack resolution as well as key analyses. While differences between shSCR and shEng cells are clear visually, the insets appear to be zoomed digitally, which decreases resolution and interferes with interpretation. It would be crucial to show the colocalization of endoglin and THSD7A within CD63-positive MVE structures. What are the structures in Figure 7E shSCR zoom1? It would be important to rule out that these are migrasomes using TSPAN4 staining. More information on how the analysis was conducted is needed (i.e. how extracellular areas were chosen and whether the images are representative of the larger population). A widefield image of shSCR and shEng cells and DAPI or HOECHST staining in the higher magnification images should be provided. Additionally, the authors should quantify the colocalization of external CD63 and mScarlet signals from many independently acquired images (as they did for the internal signals in panel F). Is there no external THSD7A signal in the shEng cells?

    5. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      The study significantly advances our understanding of how exosomes regulate filopodia formation. Filopodia play crucial roles in cell movement, polarization, directional sensing, and neuronal synapse formation. McAtee et al. demonstrated that exosomes, particularly those enriched with the protein THSD7A, play a pivotal role in promoting filopodia formation through Cdc42 in cancer cells and neurons. This discovery unveils a new extracellular mechanism through which cells can control their cytoskeletal dynamics and interaction with their surroundings. The study employs a combination of rescue experiments, live-cell imaging, cell culture, and proteomic analyses to thoroughly investigate the role of exosomes and THSD7A in filopodia formation in cancer cells and neurons. These findings offer valuable insights into fundamental biological processes of cell movement and communication and have potential implications for understanding cancer metastasis and neuronal development. 

      Weaknesses: 

      The conclusions of this study are in most cases supported by data, but some aspects of data analysis need to be better clarified and elaborated. Some conclusions need to be better stated and according to the data observed. 

      We appreciate the reviewer's recognition of the impact of our study.  We will address the concerns about data analysis and statement of our conclusions in our full response to reviewers.

      Reviewer #2 (Public review): 

      Summary: 

      The authors show that small EVs trigger the formation of filopodia in both cancer cells and neurons. They go on to show that two cargo proteins, endoglin, and THSD7A, are important for this process. This possibly occurs by activating the Rho-family GTPase CDC42. 

      Strengths: 

      The EV work is quite strong and convincing. The proteomics work is well executed and carefully analyzed. I was particularly impressed with the chick metastasis assay that added strong evidence of in vivo relevance. 

      Weaknesses: 

      The weakest part of the paper is the Cdc42 work at the end of the paper. It is incomplete and not terribly convincing. This part of the paper needs to be improved significantly.

      We appreciate the reviewer's recognition of the impact of our study.  Indeed, more work needs to be done to clarify the role of Cdc42 in the induction of filopodia by exosome-associated THSD7A.  We anticipate that this will be a separate manuscript, delving in-depth into how exosome-associated THSD7A interacts with recipient cells to activate Cdc42 and carrying out a variety of assays for Cdc42 activation.

      Reviewer #3 (Public review): 

      Summary: 

      The authors identify a novel relationship between exosome secretion and filopodia formation in cancer cells and neurons. They observe that multivesicular endosomes (MVE)-plasma membrane (PM) fusion is associated with filopodia formation in HT1080 cells and that MVEs are present in filopodia in primary neurons. Using overexpression and knockdown (KD) of Rab27/HRS in HT1080 cells, melanoma cells, and/or primary rat neurons, they found that decreasing exosome secretion reduces filopodia formation, while Rab27 overexpression leads to the opposite result. Furthermore, the decreased filopodia formation is rescued in the Rab27a/HRS KD melanoma cells by the addition of small extracellular vesicles (EVs) but not large EVs purified from control cells. The authors identify endoglin as a protein unique to small EVs secreted by cancer cells when compared to large EVs. KD of endoglin reduces filopodia formation and this is rescued by the addition of small EVs from control cells and not by small EVs from endoglin KD cells. Based on the role of filopodia in cancer metastasis, the authors then investigate the role of endoglin in cancer cell metastasis using a chick embryo model. They find that injection of endoglin KD HT1080 cells into chick embryos gives rise to less metastasis compared to control cells - a phenotype that is rescued by the co-injection of small EVs from control cells. Using quantitative mass spectrometry analysis, they find that thrombospondin type 1 domain containing 7a protein (THSD7A) is downregulated in small EVs from endoglin KD melanoma cells compared to those from control cells. They also report that THSD7A is more abundant in endoglin KD cell lysate compared to control HT1080 cells and less abundant in small EVs from endoglin KD cells compared to control cells, indicating a trafficking defect. 
      Indeed, using immunofluorescence microscopy, the authors observe THSD7A-mScarlet accumulation in CD63-positive structures in endoglin KD HT1080 cells, compared to control cells. Finally, the authors determine that exosome-secreted THSD7A induces filopodia formation in a Cdc42-dependent mechanism.

      Strengths: 

      (1) While exosomes are known to play a role in cell migration and autocrine signaling, the relationship between exosome secretion and the formation of filopodia is novel. 

      (2) The authors identify an exosomal cargo protein, THSD7A, which is essential for regulating this function. 

      (3) The data presented provide strong evidence of a role for endoglin in the trafficking of THSD7A in exosomes. 

      (4) The authors associate this process with functional significance in cancer cell metastasis and neurological synapse formation, both of which involve the formation of filopodia. 

      (5) The data are presented clearly, and their interpretation appropriately explains the context and significance of the findings. 

      Weaknesses: 

      (1) A better characterization of the nature of the small EV population is missing: 

      It is unclear why the authors chose to proceed to quantitative mass spectrometry with the bands in the Coomassie from size-separated EV samples, as there are other bands present in the small EV lane but not the large EV lane. This is important to clarify because it underlies how they were able to identify THSD7A as a unique regulator of exosome-mediated filopodia formation. Is there a reason why the total sample fractions were not compared? This would provide valuable information on the nature of the small and large EV populations. 

      We would like to clarify that there are two sets of proteomics data in the manuscript. The first was comparing bands from a Coomassie gel from two samples: small EVs and large EVs from B16F1 cells. In this proteomics experiment, we identified endoglin as present in small EVs, but not large EVs. For this experiment, we only sent 4 bands from the small EV lane, chosen based on their obvious banding pattern difference on the Coomassie gel.

      In the second proteomics experiment, we used quantitative iTRAQ proteomics to compare small EVs purified from B16F1 control (shScr) and endoglin KD (shEng1 and shEng2) cell lines. In this experiment, we sent total protein extracted from small EV samples for analysis. So, these samples included the entire EV content, not just selected bands from a gel. In this experiment, we identified THSD7A as reduced in the shEng small EVs.

      (2) Data analysis and quantification should be performed with increased rigor: 

      a) Figure 1C - The optical and temporal resolution are insufficient to conclusively characterize the association between exosome secretion and filopodia. Specifically, the 10-second interval used in the image acquisitions is too close to the reported 20-second median time between exosome secretion and filopodia formation. Intervals of 2-5 seconds should be used to validate this. It would also be important to correlate the percentage of filopodia events that co-occur with exosome secretion. Is this a phenomenon that occurs with most or only a small number of filopodia? Additionally, resolution with typical confocal microscopy is subpar for these analyses. TIRF microscopy would offer increased resolution to parse out secretion events. As the TIRF objective is listed in the Methods section, figure legends should mention which images were acquired using TIRF microscopy.

      We acknowledge that the frame rate naturally limits our estimates of the timing of filopodia formation after exosome secretion. We set out to show a relationship between exosome secretion and filopodia formation, based on their proximity in timing. While our data set shows a median time interval of 20 seconds, the true median could be between 10-30 seconds, based on our frame rate.  Regardless of the exact timing, our data show that exosome secretion is rapidly followed by filopodia formation events.
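
      The 10-30 second range follows from simple interval arithmetic, sketched below under the assumption that each event is time-stamped at the first frame in which it is visible, so that it actually occurred at some point within the preceding frame interval. The function name and this time-stamping assumption are illustrative.

```python
def true_delay_bounds(observed_delay_s: float, frame_interval_s: float):
    """Bounds on the true delay between two events sampled at a fixed frame interval.

    Each event is assumed to have occurred within one frame interval before the
    frame in which it was first seen, so the true delay can differ from the
    observed delay by up to one frame interval in either direction.
    """
    lower = max(0.0, observed_delay_s - frame_interval_s)
    upper = observed_delay_s + frame_interval_s
    return lower, upper


# Observed median delay of 20 s with images acquired every 10 s.
lower, upper = true_delay_bounds(observed_delay_s=20.0, frame_interval_s=10.0)
print(f"true delay between {lower:.0f} and {upper:.0f} s")
```

      Because every individual delay is bounded in this way, the median of the true delays is bounded by the same range; a shorter frame interval would narrow it proportionally.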

      To address the question of the percentage of filopodia events that are preceded by exosome secretion, the reviewer is correct in stating that we might need TIRF microscopy to get an accurate calculation of this number.  Nonetheless, we will review our live imaging data for this experiment to determine if this calculation is possible. Again, we will be limited by the frame rate we used to capture the images, so we could possibly be missing secretion events taking place between the 10 second time intervals.  Regardless, for the secretion events that we visualized, we always observed subsequent filopodia formation.

      No TIRF imaging was used in this manuscript.  A TIRF objective was used for selected neuron imaging (see methods); however, it was used for spinning disk confocal microscopy, not for TIRF imaging.  We will clarify this in the methods.

      b) Figure 2 - It would be important to perform further analysis to concretely determine the relationship between exosome secretion and filopodia stability. Are secretion events correlated with the stability of filopodia? Is there a positive feedback loop that causes further filopodia stability and length with increased secretion? Furthermore, is there an association between the proximity of secretion with stability? Quantification of filopodia more objectively (# of filopodia/cell) would be helpful. 

      Our data show that manipulation of general exosome secretion, via Hrs knockdown, affects both de novo filopodia formation and filopodia stability (Fig 2g,h). Interestingly, knockdown of endoglin only affects de novo filopodia formation, while filopodia stability is unaffected (Fig 4g,h). These results suggest that filopodia stability is dependent upon exosome cargoes besides endoglin/THSD7A. Such cargoes might include other extracellular matrix molecules, such as fibronectin. We previously showed that exosomes promote nascent cell adhesion and rapid cell migration, through exosome-bound fibronectin (Sung et al., Nature Communications, 6:7164, 2015). We also previously found that inhibition of exosome secretion affects the persistence of invadopodia, which are filopodia-dependent structures (Hoshino et al., Cell Reports, 5:1159-1168, 2013). We agree that this is an interesting research direction, and perhaps future work could focus on exosomal factors that are responsible for filopodia persistence.

      With regard to the way we plotted the filopodia data, we plotted the cancer cell data as filopodia per cell area so that it matched the neuron data, which was plotted as filopodia per 100 µm of dendrite length. Since the neurons cannot be imaged as a whole cell, the quantification is based on the length of the dendrite in the image. We found that graphing the cancer cell data as filopodia per cell gave similar results to filopodia per cell area, as there were no significant differences in cell area between conditions and experiments. We plan to include a new supplementary figure showing the data in Figure 2 plotted as filopodia per cell to show that this quantification gives the same results.

      c) Figure 6 - Why use different gel conditions to detect THSD7A in small EVs from B16F1 cells vs HT1080 and neurons? Why are there two bands for THSD7A in panels C and E? It is difficult to appreciate the KD efficiency in E. The absence of a signal for THSD7A in the HT1080 shEng small EVs that show a signal for endoglin is surprising. The authors should provide rigorous quantification of the westerns from several independent experimental repeats. 

      Detection of THSD7A via Western blot was, unfortunately, not straightforward. Due to the large size (~260 kDa) of THSD7A, its low level of expression in cancer cells, as well as the inconsistency of commercially available THSD7A antibodies, we had to troubleshoot multiple conditions. We found that it was much easier to detect THSD7A in the human fibrosarcoma cell line HT1080 than in the mouse B16F1 cells, both in the cell lysates and in the small EVs. We were usually unable to detect THSD7A using these same conditions for the mouse melanoma B16F1 samples, but were successful using native gel conditions. We also detected THSD7A in rat primary neuron samples. All these samples were from different source organisms (human, mouse, rat) and from either cell lysates or extracellular vesicles, further complicating the analyses. Expression and maturation of THSD7A in these different cell types and compartments could involve different post-translational modifications, such as glycosylation, thus requiring different methods to detect THSD7A on Western blots and leading to different banding patterns. Based on our THSD7A trafficking data, we believe that in control cells, most of the THSD7A is getting trafficked and secreted via small EVs. As you can see in Figure 7A, the band for THSD7A in the shScr cell lysate is relatively light and also shows a double band similar to Figure 6E (both HT1080 samples).

      With regard to the level of knockdown of THSD7A in the Western blot shown in Figure 6E, the normalized level is quantitated below the bands.  If you compare that quantitation to the filopodia phenotypes in the same panel, they are quite concordant.  Figures 7B and 7C show quantification of triplicate Western blots, highlighting the significant accumulation of THSD7A in shEng cell lysates, as well as significant small EV secretion of THSD7A in control and WT rescued conditions.

      (3) The study lacks data on the cellular distribution of endoglin and THSD7A: 

      a) Figure 6 - Is THSD7A expected to be present in the nucleus, as shown in panel D (label D is missing in the figure)? It is not clear if this is observed in neurons. A Western blot of endogenous THSD7A on cell fractions would clarify this. The authors should further characterize the cellular distribution of THSD7A in both cell types. Similarly, the cellular distribution of endoglin in the cancer cells should be provided. This would help validate the proposed model in Figure 8.

      The image in Figure 6D shows an HT1080 cell stained with phalloidin-Alexa Fluor 488 to visualize F-actin with or without expression of THSD7A-mScarlet. In order to fully visualize the thin filopodia protrusions, the plane of focus for the images in this panel was purposely set at the bottom of the cell, where the cell is attached to the coverslip glass. Thus, we interpret the red signal across the cell body as THSD7A-mScarlet expression on the plasma membrane underneath the cell, not in the nucleus. The neuron images only include the dendrite portion of the neurons; therefore, there is no nucleus present in the neuronal images.

      b) Figure 7 - Although the western blot provides convincing evidence for the role of endoglin in THSD7A trafficking, the microscopy data lack resolution as well as key analyses. While differences between shSCR and shEng cells are clear visually, the insets appear to be zoomed digitally, which decreases resolution and interferes with interpretation. It would be crucial to show the colocalization of endoglin and THSD7A within CD63-positive MVE structures. What are the structures in Figure 7E shSCR zoom1? It would be important to rule out that these are migrasomes using TSPAN4 staining. More information on how the analysis was conducted is needed (i.e. how extracellular areas were chosen and whether the images are representative of the larger population). A widefield image of shSCR and shEng cells and DAPI or Hoechst staining in the higher magnification images should be provided. Additionally, the authors should quantify the colocalization of external CD63 and mScarlet signals from many independently acquired images (as they did for the internal signals in panel F). Is there no external THSD7A signal in the shEng cells?

      The images for Figure 7E were taken with high resolution on a confocal microscope.  Insets for Figure 7E were zoomed in so that readers could see the tiny structures.  Zoom 1 in Figure 7E shows areas of extracellular deposition. In these areas, we can see small punctate depositions that are positive for CD63 and/or THSD7A-mScarlet. Our interpretation of this staining is that the cells are secreting heterogeneous small EVs that are then attached to the glass coverslip. The images and zooms in Fig 7E were chosen to be representative and indeed reveal that there is more extracellular deposition of THSD7A-mScarlet outside the control shScr cells compared to the shEng cells, consistent with more export of THSD7A into small EVs from shScr cells when compared to those of shEng cells (Fig 7A,B). However, we did not quantify this difference, as these experiments were conducted with transient transfection of THSD7A-mScarlet and it is challenging to determine which cell the extracellular THSD7A-mScarlet came from, complicating any quantitative analysis on a per-cell basis.  Quantification of internal THSD7A localization is much more straightforward in this experimental regime.  Indeed, in Figure 7F we assessed internal colocalization of THSD7A-mScarlet and CD63, which we obtained by choosing only cells that were visually positive for THSD7A-mScarlet in each transient transfection and omitting all extracellular signals. Quantifying the extracellular colocalization of THSD7A and CD63 could certainly be a future direction for this project and would require establishing cells that stably express THSD7A-mScarlet.

    1. eLife Assessment

      This valuable study investigates how the size of an LLM may influence its ability to model the human neural response to language recorded by ECoG. Overall, solid evidence is provided that larger language models can better predict the human ECoG response. Further discussion would be beneficial as to how the results can inform us about the brain or LLMs, especially about the new message that can be learned from this ECoG study beyond previous fMRI studies on the same topic. This study will be of interest to both neuroscientists and psychologists who work on language comprehension and computer scientists working on LLMs.

    2. Reviewer #1 (Public review):

      Summary:

      The authors perform an analysis of the relationship between the size of an LLM and the predictive performance of an ECoG encoding model made using the representations from that LLM. They find a logarithmic relationship between model size and prediction performance, consistent with previous findings in fMRI. They additionally observe that as the model size increases, the location of the "peak" encoding performance typically moves further back into the model in terms of percent layer depth, an interesting result worthy of further analysis into these representations.
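
      A logarithmic relationship of this kind is usually checked by regressing encoding performance on the logarithm of the parameter count. The sketch below shows that fit with ordinary least squares; the function name is hypothetical, and the numbers in the usage lines are made-up placeholders chosen to lie exactly on a line, not values from the manuscript.

```python
import math


def fit_log_linear(param_counts, scores):
    """Least-squares fit of: score = intercept + slope * log10(params)."""
    xs = [math.log10(n) for n in param_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Placeholder inputs only (NOT data from the study): three model sizes
# and three hypothetical maximum encoding correlations.
intercept, slope = fit_log_linear([1e8, 1e9, 1e10], [0.10, 0.12, 0.14])
print(round(slope, 6))  # gain in correlation per tenfold increase in size
```

      The slope reads directly as the change in encoding performance per tenfold increase in model size, which makes results comparable across model families.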

      Strengths:

      The evidence is quite convincing, consistent across model families, and complementary to other work in this field. This sort of analysis for ECoG is needed and supports the decade-long enduring trend of the "virtuous cycle" between neuroscience and AI research, where more powerful AI models have consistently yielded more effective predictions of responses in the brain. The lag analysis showing that optimal lags do not change with model size is a nice result using the higher temporal resolution of ECoG compared to other methods like fMRI.

      Weaknesses:

      I would have liked to have seen the data scaling trends explored a bit too, as this is somewhat analogous to the main scaling results. While better performance with more data might be unsurprising, showing good data scaling would be a strong and useful justification for additional data collection in the field, especially given the extremely limited amount of existing language ECoG data. I realize that the data here is somewhat limited (only 30 minutes per subject), but authors could still in principle train models on subsets of this data.

      Separately, it would be nice to have better justification of some of these trends, in particular the peak layerwise encoding performance trend and the overall upside-down U-trend of encoding performance across layers more generally. There is clearly something very fundamental going on here, about the nature of abstraction patterns in LLMs and in the brain, and this result points to that. I don't see the lack of justification here as a critical issue, but the paper would certainly be better with some theoretical explanation for why this might be the case.

      Lastly, I would have wanted to see a similar analysis here done for audio encoding models using Whisper or WavLM as this is the modality where you might see real differences between ECoG and other slower scanning approaches. Again, I do not see this omission as a fundamental issue, but it does seem like the sort of analysis for which the higher temporal resolution of ECoG might grant some deeper insight.

    3. Reviewer #2 (Public review):

      Summary:

      This paper investigates whether large language models (LLMs) of increasing size more accurately align with brain activity during naturalistic language comprehension. The authors extracted word embeddings from LLMs for each word in a 30-minute story and regressed them against electrocorticography (ECoG) activity time-locked to each word as participants listened to the story. The findings reveal that larger LLMs more effectively predict ECoG activity, reflecting the scaling laws observed in other natural language processing tasks.

      Strengths:

      (1) The study compared model activity with ECoG recordings, which offer much better temporal resolution than other neuroimaging methods, allowing for the examination of model encoding performance across various lags relative to word onset.

      (2) The range of LLMs tested is comprehensive, spanning from 82 million to 70 billion parameters. This serves as a valuable reference for researchers selecting LLMs for brain encoding and decoding studies.

      (3) The regression methods used are well-established in prior research, and the results demonstrate a convincing scaling law for the brain encoding ability of LLMs. The consistency of these results after PCA dimensionality reduction further supports the claim.

      Weaknesses:

      (1) Some claims of the paper are less convincing. The authors suggested that "scaling could be a property that the human brain, similar to LLMs, can utilize to enhance performance"; however, many other animals have brains with more neurons than the human brain, making it unlikely that simple scaling alone leads to better language performance. Additionally, the authors claim that their results show that "larger models better predict the structure of natural language." However, it remains unclear to what extent the embeddings of LLMs capture the "structure" of language better than the lexical semantics of language.

      (2) The study lacks control LLMs with randomly initialized weights and control regressors, such as word frequency and phonetic features of speech, making it unclear what the baseline is for the model-brain correlation.

      (3) The finding that peak encoding performance tends to occur in relatively earlier layers in larger models is somewhat surprising and requires further explanation. Since more layers mean more parameters, if the later layers diverge from language processing in the brain, it raises the question of what aspects of the larger models make them more brain-like.

    4. Reviewer #3 (Public review):

      This manuscript studies the connection between neural activity collected through electrocorticography and hidden vector representations from autoregressive language models, with the specific aim of studying the influence of language model size on this connection. Neural activity was measured from subjects who listened to a segment from a podcast, and the representations from language models were calculated using the written transcription as the input text. The ability of vector representations to predict neural activity was evaluated using 10-fold cross-validation with ridge regression models.
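
      For readers unfamiliar with this evaluation scheme, the sketch below shows the general shape of a ridge-regression encoding model scored by 10-fold cross-validated correlation. It is a minimal pure-Python illustration, not the authors' pipeline: the function names, the fixed penalty `alpha`, the contiguous folds, the absence of an intercept (inputs are assumed mean-centered) and the synthetic data are all assumptions made here.

```python
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge_fit(X, y, alpha):
    """Closed-form ridge weights: (X^T X + alpha * I)^-1 X^T y."""
    p = len(X[0])
    gram = [[sum(row[i] * row[j] for row in X) + (alpha if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(p)]
    return solve_linear(gram, xty)


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def cross_validated_correlation(X, y, alpha, n_folds=10):
    """Mean held-out correlation over contiguous folds."""
    n = len(y)
    scores = []
    for k in range(n_folds):
        start, stop = k * n // n_folds, (k + 1) * n // n_folds
        w = ridge_fit(X[:start] + X[stop:], y[:start] + y[stop:], alpha)
        pred = [sum(wi * xi for wi, xi in zip(w, row)) for row in X[start:stop]]
        scores.append(pearson(pred, y[start:stop]))
    return sum(scores) / n_folds


# Synthetic stand-ins: 200 "words", 3-dimensional "embeddings", and a
# noisy linear "neural response" for a single electrode at a single lag.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [0.5 * r[0] - 1.0 * r[1] + 2.0 * r[2] + rng.gauss(0, 0.1) for r in X]
print(cross_validated_correlation(X, y, alpha=1.0))
```

      In a real analysis the embeddings have hundreds to thousands of dimensions, the procedure is repeated per electrode and per lag, and the penalty is typically tuned rather than fixed.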

      The main results are that (as well summarized in section headings):

      (1) Larger models predict neural activity better.

      (2) The ability of language model representations to predict neural activity differs across electrodes and brain regions.

      (3) The layer that best predicts neural activity differs according to model size, with the "SMALL" model showing a correspondence between layer number and the language processing hierarchy.

      (4) There seems to be a similar relationship between the time lag and the ability of language model representations to predict neural activity across models.

      Strengths:

      (1) The experimental and modeling protocols generally seem solid, which yielded results that answer the authors' primary research question.

      (2) Electrocorticography data is especially hard to collect, so these results make a nice addition to recent functional magnetic resonance imaging studies.

      Weaknesses:

      (1) The interpretation of some results seems unjustified, although this may just be a presentational issue.

      a) Figure 2B: The authors interpret the results as "a plateau in the maximal encoding performance," when some readers might interpret this rather as a decline after 13 billion parameters. Can this be further supported by a significance test like that shown in Figure 4B?

      b) Figure S1A: It looks like the drop in PCA max correlation is larger for larger models, which may suggest to some readers that the same trend observed for ridge max correlation may not hold, contra the authors' claim that all results replicate. Why not include a similar figure as Figure 2B as part of Figure S1?

      (2) Discussion of what might be driving the main result about the influence of model size appears to be missing (cf. the authors aim to provide an explanation of what seems to drive the influence of the layer location in Paragraph 3 of the Discussion section). What explanations have been proposed in the previous functional magnetic resonance imaging studies? Do those explanations also hold in the context of this study?

      (3) The GloVe-based selection of language-sensitive electrodes (at least to me) isn't explained/motivated clearly enough (I think a more detailed explanation should be included in the Materials and Methods section). If the electrodes are selected based on GloVe embeddings, then isn't the main experiment just showing that representations from larger language models track more closely with GloVe embeddings? What justifies this methodology?

      (4) (Minor weakness) The main experiments are largely replications of previous functional magnetic resonance imaging studies, with the exception of the one lag-based analysis. Is there anything else that the electrocorticography data can reveal that functional magnetic resonance imaging data can't?

    5. Author response:

      We thank the reviewers for their thoughtful feedback and valuable comments. We plan to fully address their concerns by including the following experiments and analyses:

      Reviewer 1 suggested exploring data scaling trends for encoding models, as successful scaling would justify larger datasets for language ECoG studies. To estimate scaling effects, we will develop encoding models on subsets of our data.

      Reviewer 2 expressed uncertainty about the baseline for model-brain correlation and recommended adding control LLMs with randomly initialized weights. In response, we will generate embeddings using untrained LLMs to establish a more robust baseline for encoding results.

      Reviewer 2 also proposed incorporating control regressors such as word frequency and phonetic features of speech. We will re-run our modeling analysis using control regressors for word frequency, 8 syntactic features (e.g., part of speech, dependency, prefix/suffix), and 3 phonetic features (e.g., phonemes, place/manner of articulation) to assess how much these features contribute to encoding performance.

      Reviewer 3 raised concerns that the “plateau in maximal encoding performance” was actually a decline for the largest models. We will add significance tests in Figure 2B to clarify this issue.

      Reviewer 3 also noted that in Supplementary Figure 1A, the decline in encoding performance was more pronounced when using PCA to reduce embedding dimensionality, in contrast to the trend observed when using ridge regression. To address this, we will attempt to replicate the observed scaling trends in Figure 2B using PCA combined with OLS.

      Additionally, we will provide a point-by-point response and revise the manuscript with updated analyses and figures in the near future.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The Notch signaling pathway plays an important role in many developmental and disease processes. Although it is well studied, many puzzling aspects remain. One is the fact that, as well as activating the receptor through trans-activation, the transmembrane ligands can interact with receptors present in the same cell. These cis-interactions are usually inhibitory, but in some cases, as in the assays used here, they may also be activating. With a total of 6 ligands and 4 receptors, there is potentially a wide array of possible outcomes when different combinations are co-expressed in vivo. Here the authors set out to make a systematic analysis of the qualitative and quantitative differences in the signaling output from different receptor-ligand combinations, generating sets of "signaling" (ligand-expressing) and "receiving" (receptor +/- ligand-expressing) cells.

      The readout of pathway activity is transcriptional, relying on the fusion of Gal4 in the intracellular part of the receptor. Positive ligand interactions result in the proteolytic release of Gal4, which turns on the expression of H2B-citrine. As indicators of expression levels, the ligands and receptors are linked via T2A to H2B-mCherry and H2B-mTurquoise, respectively. The authors also manipulate the expression of the glycosyltransferase Lunatic Fringe (Lfng), which modifies the EGF repeats in the extracellular domains, impacting their interactions. The testing of multiple ligand-receptor combinations at varying expression levels is a tour de force, with over 50 stable cell lines generated, and yields valuable insights, although as a whole the results are quite complex.

      Strengths:

      Taking a reductionist approach to testing systematically differences in the signaling strength, binding strength, and cis-interactions from the different ligands in the context of the Notch1 and Notch 2 receptors (they justify well the choice of players to test via this approach) produces a baseline understanding of the different properties and leads to some unexpected and interesting findings. Notably:

      - Jag1 ligand expressing cells failed to activate Notch1 receptor although were capable of activating Notch2. Conversely, Jag2 cells elicited the strongest activation of both receptors. The results with Jag1 are surprising also because it exhibits some of the strongest binding to plate-bound ligands. The failure to activate Notch1 has major functional significance and it will be important in the future to understand the mechanistic basis.

      - Jagged ligands have the strongest cis-inhibitory effects and the receptors differ in their sensitivity to cis-inhibition by Dll ligands. These observations are in keeping with earlier in vivo and cell culture studies. More referencing of those would better place the work in context but it nicely supports and extends previous studies that were conducted in different ways.

      - Responses to most trans-activating ligands showed a degree of ultrasensitivity but this was not the case for cis-interactions where effects were more linear. This has implications for the way the two mechanisms operate and for how the signaling levels will be impacted by ligand expression levels.

      - Qualitatively similar results are obtained in a second cell line, suggesting they reflect fundamental properties of the ligands/receptors.
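
      The contrast between ultrasensitive and more linear responses noted in the list above is commonly expressed with a Hill function, where a Hill coefficient greater than 1 indicates ultrasensitivity. The sketch below is a generic illustration of that idea; the function names are hypothetical, and whether the authors quantify ultrasensitivity in exactly this way is an assumption.

```python
def hill(x, k, n):
    """Fraction of maximal response at input x (half-maximal input k,
    Hill coefficient n)."""
    return x ** n / (k ** n + x ** n)


def ec90_over_ec10(n):
    """Fold-change in input needed to go from 10% to 90% of maximum.

    Solving hill(x) = f gives x = k * (f / (1 - f)) ** (1 / n), so the
    ratio of the 90% and 10% inputs is 81 ** (1 / n), independent of k.
    """
    return 81.0 ** (1.0 / n)


print(ec90_over_ec10(1))  # 81.0: graded, near-linear at low input
print(ec90_over_ec10(2))  # 9.0: steeper, switch-like response
```

      A smaller fold-change means that a modest change in ligand expression moves the output across most of its range, which is why the two regimes respond so differently to changes in ligand levels.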

      We appreciate the positive and constructive feedback.

      Weaknesses:

      One weakness is that the methods used to quantify the expression of ligands and receptors rely on the co-translation of tagged nuclear H2B proteins. These may not accurately capture surface levels/correctly modified transmembrane proteins. In general, the multiple conditions tested partly compensate for the concerns - for example, as Jag1 cells do activate Notch2 even if they do not activate Notch1 some Jag1 must be getting to the surface. But even with Notch2, Jag1 activities are on the lower side, making it important to clarify, especially given the different outcomes with the plated ligands. Similarly, is the fact that all ligands "signalled strongest to Notch2" an inherent property or due to differences in surface levels of Notch 2 compared to Notch1? The results would be considerably strengthened by calibration of the ligand/receptor levels (and ideally their sub-cellular localizations). Assessing the membrane protein levels would be relatively straightforward to perform on some of the basic conditions because their ligand constructs contain Flag tags, making it plausible to relate surface protein to H2B, and there are antibodies available for Notch1 and Notch2.

      We agree that mCherry fluorescence does not provide a direct readout of active surface ligand levels. As the reviewer points out, the ability of Jag1 to activate Notch2 demonstrates that expressed Jag1 is competent for signaling. Further, in some cases, Jag1-Notch2 activation can be comparable to Dll1-Notch2 activation (Figure 2A). Following the reviewer’s suggestion, we performed a Western blot for multiple expression levels for each of three surface ligands (Dll1, Dll4, Jag1) (Figure 2—figure supplement 2). This blot revealed a signal for surface expression of Jag1. Interpretation is complicated by the expected dependence of the efficiency of surface protein purification on the number of primary amines in the protein, which varies among these ligands, and qualitatively correlates with the staining intensity. While this makes quantitative interpretation difficult, this result further supports the notion that Jag1 is present on the cell surface. Finally, we note that high signaling activity need not, in general, directly correlate with surface expression levels. In fact, one study showed an example in which increased ligand activity occurred with decreased basal ligand surface levels (Antfolk et al., 2017). While one would ideally like to know all parameters of the system, including surface protein levels, rates of recycling, etc., the perspective taken here is that the net effect of these many post-translational processing steps can be subsumed into the overall relationship between the expression of the protein (which, in our case, is read out by the co-translational reporter) and its activity, which is relevant for the behavior of developmental circuits, among other systems. To address this comment, we now explicitly mention the limitation of mCherry as a proxy for surface protein, and add a reference to previous work highlighting the relationship between surface levels and ligand activity.

      In terms of the dependence of signaling on Notch levels, the metric of signaling activity used here is explicitly normalized by the mTurquoise co-translational reporter of Notch expression to account for differences in receptor expression across receiver clones. We have added a new figure to show the variation in expression (Figure 1—figure supplement 1A) and to demonstrate this normalization (Figure 1—figure supplement 5). Having said that, as the reviewer correctly points out, we cannot directly address the dependence on surface receptor levels with mTurquoise alone. To address this comment, we have added a figure that shows cotranslational and surface receptor expression for a subset of our receiver clones (Figure 1—figure supplement 1B). Although antibody binding strengths may vary, it appears unlikely that higher surface levels could explain most ligands’ preferential activation of Notch2 over Notch1, since Notch2 levels were lower than Notch1 levels in both surface expression and cotranslational expression.
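
      The normalization described above amounts to dividing the signaling reporter by the receptor's co-translational reporter. The sketch below illustrates the idea only; the function name, the optional background subtraction and the example numbers are assumptions made here and are not taken from the authors' analysis code.

```python
def normalized_activity(signal_reporter, receptor_reporter, background=0.0):
    """Signaling output per unit of receptor co-translational reporter.

    signal_reporter:   e.g. mean H2B-citrine fluorescence of a clone
    receptor_reporter: e.g. mean mTurquoise fluorescence of that clone
    background:        reporter level without ligand (assumed optional)
    """
    if receptor_reporter <= 0:
        raise ValueError("receptor reporter level must be positive")
    return (signal_reporter - background) / receptor_reporter


# Two hypothetical clones with the same raw output but different
# receptor expression end up with different normalized activities.
print(normalized_activity(200.0, 100.0))  # 2.0
print(normalized_activity(200.0, 400.0))  # 0.5
```

      As the response itself notes, this corrects for expression of the receptor transcript, not for the amount of receptor that actually reaches the cell surface.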

      Cis-activation as a mode of signaling has only emerged from these synthetic cell culture assays raising questions about its physiological relevance. Cis-activation is only seen at the higher ligand (Dll1, Dll4) levels, how physiological are the expression levels of the ligands/receptors in these assays? Is it likely that this would make a major contribution in vivo? Is it possible that the cells convert themselves into "signaling" and "receiving" sub-populations within the culture by post-translational mechanism? Again some analysis of the ligand/receptors in the cultures would be a valuable addition to show whether or not there are major heterogeneities.

      The cis-activation results in this paper are, as the reviewer points out, conducted in synthetic cell culture assays. Cis-activation is observed across a large dynamic range of ligand expression, possibly including non-physiologically high levels. However, our previous work (Nandagopal et al., eLife 2019) showed that cis-activation does not require over-expression, as it occurred in unmodified Caco-2 and NMuMG cells with their endogenous ligand and receptor expression levels. As shown here in Figure 4B, cis-activation for Notch2 increases monotonically and is substantial even at intermediate ligand concentrations. In other cases, cis-activation is maximal at intermediate concentrations. We agree that the in vivo role remains unclear, and is difficult to determine due to the typical close contacts among cells in tissues. Therefore, these assays do not speak to in vivo relevance. Note that we can, however, rule out the possibility of trans signaling between well-mixed cell populations at these densities (Figure 4A).

      It is hard to appreciate how much cell-to-cell variability in the "output" there is. For example, low "outputs" could arise from fewer cells becoming activated or from all cells being activated less. As presented, only the latter is considered. That may be already evident in their data, but not easy for the reader to distinguish from the way they are presented. For example, in many of the graphs, data have been processed through multiple steps of normalization. Some discussion/consideration of this point is needed.

      We agree that in different experiments changes in a mean response can reflect changes in fraction of activated cells, or level of activation or some combination of both. In this work, most assays were conducted by flow cytometry, which provides a full distribution of cellular responses. We provided distributions for some experiments in the supplementary figures (i.e., Figure 4—figure supplement 1, and Figure 5—figure supplement 4). The sheer number of experiments and samples prevents us from displaying all underlying histograms. Therefore, we have provided all flow data sets in an extensive archive that is publicly available on data.caltech.edu (https://doi.org/10.22002/gjjkn-wrj28).
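
      The distinction raised here, between fewer cells responding and all cells responding less, can be made explicit by splitting a population mean into the fraction of activated cells and their mean level. The sketch below uses a hypothetical activation threshold and invented per-cell values purely to illustrate the decomposition.

```python
def summarize_population(values, threshold):
    """Split a per-cell distribution into fraction and level of activation.

    A cell counts as "activated" if its reporter value exceeds the
    threshold (the threshold is an assumption of this sketch).
    """
    active = [v for v in values if v > threshold]
    return {
        "mean": sum(values) / len(values),
        "fraction_active": len(active) / len(values),
        "mean_active": sum(active) / len(active) if active else 0.0,
    }


# Two invented populations with the same mean but different structure
uniform = [50.0] * 10                # every cell moderately active
bimodal = [100.0] * 5 + [0.0] * 5    # half the cells strongly active
print(summarize_population(uniform, threshold=10.0))
print(summarize_population(bimodal, threshold=10.0))
```

      Both populations have a mean of 50, so only the full distributions, such as the deposited flow cytometry data, can separate the two scenarios.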

      Impact:

      Overall, cataloging the outcomes from the different ligand-receptor combinations, both in cis and trans, yields a valuable baseline for those investigating their functional roles in different contexts. There is still a long way to go before it will be possible to make a predictive model for outcomes based on expression levels, but this work gives an idea about the landscape and the complexities. This is especially important now that signaling relationships are frequently hypothesized based on single-cell transcriptomic data. The results presented here demonstrate that the relationships are not straightforward when multiple players are involved.

      We appreciate this concise impact summary, and agree with its conclusions.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors extend their previous studies on trans-activation, cis-inhibition (PMID: 25255098), and cis-activation (PMID: 30628888) of the Notch pathway. Here they create a large number of cell lines using CHO-K1 and C2C12 cells expressing either Notch1-Gal4 or Notch2-Gal4 receptors which express a fluorescent protein upon receptor activation (receiver cells). For cis-inhibition and cis-activation assays, these cells were engineered to express one of the four canonical Notch ligands (Dll1, Dll4, Jag1, Jag2) under tetracycline control. Some of the receiver cells were also transfected with a Lunatic fringe (Lfng) plasmid to produce cells with a range of Lfng expression levels. Sender cells expressing each of the canonical ligands were also produced. Cells were mixed in a variety of co-culture assays to highlight trans-activation, cis-activation, and cis-inhibition. All four ligands were able to trans-activate Notch1 and Notch2, except that Jag1 did not trans-activate Notch1. Lfng enhanced trans-activation of both Notch receptors by Dll1 and Dll4, and inhibited Notch1 activation by Jag2 and Notch2 activation by both Jag1 and Jag2. Cis-expression of all four ligands was predominantly inhibitory, but Dll1 and Dll4 showed strong cis-activation of Notch2. Interestingly, cis-ligands preferentially inhibited trans-activation by the same ligand, with varying effects on other trans-ligands.

      Strengths:

      This represents the most comprehensive and rigorous analysis of the effects of canonical ligands on cis- and trans-activation, and cis-inhibition, of Notch1 and Notch2 in the presence or absence of Lfng so far. Studying cis-inhibition and cis-activation is difficult in vivo due to the presence of multiple Notch ligands and receptors (and Fringes) that often occur in single cells. The methods described here are a step towards generating cells expressing more complex arrays of ligands, receptors, and Fringes to better mimic in vivo effects on Notch function.

      In addition, the fact that their transactivation results with most ligands on Notch1 and 2 in the presence or absence of Lfng were largely consistent with previous publications provides confidence that the authors' assays are working properly.

      We appreciate the thoughtful comments and feedback.

      Weaknesses:

      It was unusual that the engineered CHO cells expressing Notch1-Gal4 were not activated at all by co-culture with Jag1-expressing CHO cells. Many previous reports have shown that Jag1 can activate Notch1 in co-culture assays, including when Notch1 was expressed in CHO cells. Interestingly, when the authors used Jag1-Fc in a plate coating assay, it did activate Notch1 and could be inhibited by the expression of Lfng.

      In our assays, we do in fact also see some signaling of Jag1 to Notch1, especially when dLfng is coexpressed (Figure 2—figure supplement 4, formerly Figure 2—figure supplement 3). While these levels are lower than those observed for other ligand-receptor combinations, they are significantly elevated compared to baseline. In specific natural contexts, it will be important to determine whether the weak but non-zero Jag1-Notch1 signaling acts negatively to suppress signaling from other ligands, or provides weak but potentially functionally important levels of signaling. Evidence for both modes exists in the literature. To address this, we have expanded the discussion of Jag1-Notch1 signaling and added references to other work on Jag1-Notch1 signaling to the Discussion section.

      The cell surface level of the ligands was determined by flow cytometry of a co-translated fluorescent protein. Some calibration of the actual cell surface levels with the fluorescent protein would strengthen the results.

      This issue was also raised by Reviewers #1 and #3. Please see responses to Reviewer #1, above.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript reports a comprehensive analysis of Notch-Delta/Jagged signaling inclusive of the human Notch1 and Notch2 receptors and DLL1, DLL4, JAG1, and JAG2 ligands. Measurements encompassed signaling activity for ligand trans-activation, cis-activation, cis-inhibition, and activity modulation by Lfng. The most striking observations of the study are that JAG1 has no detectable activity as a Notch1 ligand when presented on a cell (though it does have activity when immobilized on a surface), even though it is an effective cis-inhibitor of Notch1 signaling by other ligands, and that DLL1 and DLL4 exhibit cis-activating activity for Notch1 and especially for Notch2. Notwithstanding the artificiality of the system and some of its shortcomings, the results should nevertheless be a valuable resource for the Notch signaling community.

      Strengths:

      (1)  The work is systematic and comprehensive, addressing questions that are of importance to the community of researchers investigating mammalian Notch proteins, their activation by ligands, and the modulation of ligand activity by LFng.

      (2)  A quantitative and thorough analysis of the data is presented.

      Weaknesses:

      (1) The manuscript is primarily descriptive and does not delve into the underlying, mechanistic origin or source of the different ligand activities.

      We agree that the goals of this paper were largely to discover the range of signaling modes that occur. A mechanistic analysis would be beyond the scope of this work, but we agree it is an important next step.

      (2) The amount of ligand or receptor expressed is inferred from the flow cytometry signal of a co-translated fluorescent protein-histone fusion, and is not directly measured. The work would be more compelling if the amount of ligand present on the cell surface were directly measured with anti-ligand antibodies, rather than inferred from measurements of the fluorescent protein-histone fusion.

      This issue was also raised by Reviewers #1 and #2. Please see responses to Reviewer #1, above.

      (3) It would be helpful to see plots of the raw activity data before transformation and normalization, because the plots present data after several processing steps, and it is not clear how the processed data relate to the original values determined in each measurement.

      We included examples showing how raw data are processed in Figure 4—figure supplement 1 and Figure 5—figure supplement 4. The sheer number of experiments precludes including similar figures for all data sets. However, all raw and processed data and the data analysis code are publicly available at https://doi.org/10.22002/gjjkn-wrj28.

      (4) The authors use sparse plating of engineered cells with parental cells (expressing no ligand or receptor) to measure cis-activation. However, the cells divide within the culture period of 22-24 h and can potentially trans-activate each other.

      If measured cis-activation signal arises solely from trans-activation, then the measured cis-activation signal per cell should increase with cell density, since trans-activation per cell does depend on cell density (Figure 4A). However, for the strongest cis-activators (Dll1- and Dll4-Notch2), signaling magnitude is similar when these cells are cultured sparsely or at confluence, which would otherwise allow efficient trans signaling (Figure 5A). Thus, for Dll1- and Dll4-Notch2 receivers, total signaling strength per cell depends little or not at all on the opportunity to signal intercellularly. Moreover, cis-activation signal for the Dll1- and Dll4-Notch2 combinations exceeded the maximum trans-signaling levels we could achieve for the same receivers when cis-ligand was suppressed (Figure 4B). These results argue that cis interactions dominate signaling in this context. However, we have not ruled out the possibility that trans-signaling between sister cells after division contributes to the comparatively weak cis-activation observed for Notch1 receivers.

      Reviewer #1 (Recommendations For The Authors):

      As outlined in the public review, there is a question of whether the nuclear H2B accurately reflects the surface levels of the transmembrane proteins (ligand and receptor). Clearly, it would not be feasible to check levels in all of the experimental conditions, but some baseline conditions should be analyzed.

      We addressed this above.

      Reviewer #2 (Recommendations For The Authors):

      (1)  As mentioned above, it was unusual that Jag1 did not activate Notch1 in co-culture assays, but did activate Notch1 in plate-coating assays. The authors should add some text to the Discussion to explain why they think this is happening in their engineered cells. One possibility is that the CHO cells express Manic fringe (Mfng) which is known to reduce Jag1-Notch1 activation. Data for Mfng levels in CHO cells were not included in Supplemental Table 2. Knocking down all three Fringes in CHO cells might increase Jag1-Notch1 activation.

      This is already addressed in a sentence in the results: “Strikingly, while Jag1 sender cells failed to activate Notch1 receivers above background (Figure 2D), plate-bound Jag1-ext-Fc activated Notch1 only ~3-fold less efficiently than it activated Notch2 (Figure 3B-D). This suggests that the natural endocytic activation mechanism, or potential differences in tertiary structure between the expressed and recombinant Jag1 extracellular domains, could play roles in preventing Jag1-Notch1 signaling in coculture.” Regarding the point about Mfng, we added a note to Supplementary Table about other CHO-K1 expression data.

      (2) Figure 1-supplemental figure 1: Both the Notch1-Jag1 and Notch1-Jag2 cells show high expression of Jag1 in low 4epi, but any higher concentration reduces to control levels. How much of a problem is this for interpreting your data?

      This was not the ideal behavior, but by binning cells by co-translational reporters for ligand expression, we were able to obtain enough cells in intermediate bins. (Note: Figure 1—figure supplement 1 is now Figure 1—figure supplement 2.)

      (3)  Figure 1C legend: Are these stably-expressing cells or Tet-off cells? Please state in legend.

      The figure legend has been updated.

      (4)  Figure 1E: How long is the knockdown of Rfng and Lfng effective? Does it affect the expression of Lfng later?

      siRNA effects generally last for at least 72-96 hours, so we do not anticipate this being an issue.

      (5) Page 9: "Lfng significantly decreased trans-activation of both receptors by Jag1 (>2.5-fold)". If there is no Jag1-Notch1 activation, how can Lfng decrease trans-activation?

      We added a note in the main text to clarify that while Jag1-Notch1 signaling is relatively low, it can still be detectably decreased.

      (6) Figure 4A legend: Please define what "2.5k ea senders and Rec" means. In the text, it says "To focus on cis-interactions alone, we then cultured receiver cells at low density, amid an excess of wildtype CHO-K1 cells" (page 14).

      This was clarified in the text.

      (7)  Page 14: "By contrast, Notch2 was cis-activated by both Dll1 and Dll4, to levels exceeding those produced by trans-activation by high-Dll1 senders (Figure 4B, lower left)." Where is the trans-activation data? 4B, lower right?

      We updated this reference in the main text.

      (8)  Page 16: "For Notch2-Dll1 and Notch2-Dll4, single cell reporter activities correlated with cis-ligand expression, regardless of whether cells were pre-induced at a high or low culture density (Figure 4D)." It appears that Notch2-Dll1 has lower Notch activation at sparse culture than confluent.

      We agree that the level of signaling is lower in sparse compared to confluent culture on average. This is explained by the sensitivity of the Tet-OFF promoter to culture density (Figure 4—figure supplement 2). However, the key point of this experiment is the positive correlation, which is consistent with cis-activation, and inconsistent with the pre-generation of NEXT hypothesis diagrammed in Figure 4C, which would not be expected to produce such a correlation.

      (9a) For the creation of the C2C12-Nkd cells: Has genomic sequencing been done to confirm editing of Notch2 and Jag1 loci?

      We confirmed the knockdown but did not do genomic sequencing.

      (9b) The gel in Figure 7-Supplement 1C is not adequate for showing loss of Jag1. It should be repeated.

      In this case, we have only the single gel. We added a note in the figure legend that no duplicate was performed.

      (10) Figure 7A: Which Fringes are expressed in C2C12 cells? You should provide a rationale for knocking down just Rfng.

      Figure 7—figure supplement 1A shows the levels of expression in C2C12. Note that Mfng is not highlighted because its levels were undetectable.

      (11) Figure 7-Supplement 1D: This is confusing. Notch2 levels are not reduced in the left panel, and Notch1 and Notch2 levels are not reduced in the right panel?

      C2C12-Nkd cells exhibit reduced levels of Notch1 and Notch3. This can be seen in Figure 7—figure supplement 1A. Panel D presents the results of additional siRNA knockdown, performed to prevent subsequent up-regulation of Notch1 and Notch3 during the assay. These knockdown results were variable, as shown. The Notch2 siRNA knockdown was not essential for these experiments, but was performed despite very low levels of Notch2 to begin with. In the revision, we have added this note to the Methods.

      Reviewer #3 (Recommendations For The Authors):

      (1) The results section of the manuscript is very dense and difficult to follow, as are the figure legends.

      We appreciate the criticism, and regret that it is not easier to read in its current form.

      (2) The authors could emphasize areas of concordance with published results (where available) to place their artificial, engineered system into a better biological context. Are there any examples of studies in whole organisms where cis-activation plays a role?

      We are not aware of examples of cis-activation in whole organisms at this point.

      (3) How do the authors rationalize the different responses of Notch1 to cell-presented Jag1 as opposed to immobilized Jag1, where its signal strength is second in rank order on a molar basis?

      This comment was addressed above in response to the first recommendation from Reviewer #2.

      It is also difficult to understand Figure 2—figure supplement 3B, in which it appears that Jag1 induces a Notch1 reporter response when LFng is knocked down (dLfng), and how those data relate to the inactive response to Jag1 shown in the main figures.

      The issue here is a difference of normalization. Figure 2A in the main text is normalized to the sender expression level, i.e. relative signaling strength. By contrast, Figure 2—figure supplement 4B (previously Figure 2—figure supplement 3B) shows absolute signaling activity, which can appear higher because it does not normalize for ligand expression. For Jag1-Notch1 signaling in particular, substantial signaling required very high levels of Jag1. We have added a new figure to demonstrate these two types of normalization (Figure 2—figure supplement 1A).

      See Author response image 1 below for a direct comparison of these two normalization modes using data from both Figure 2A and Figure 2—figure supplement 4B. Note how the Jag1-Notch1 signaling activities that are nonzero in the top plot go to zero in the bottom plot as a result of normalizing the values to ligand expression.

      Author response image 1.

      Comparison of normalization modes in Figure 2A and Figure 2—figure supplement 4B (formerly 3B).

      Normalized trans-activation signaling activities for different ligand-receptor combinations (with dLfng only), either with further normalization to ligand expression (bottom row) or without further normalization (top row). Normalized signaling activity is defined as reporter activity (mCitrine, A.U.) divided by cotranslational receptor expression (mTurq2, A.U.), normalized to the strongest biological replicate-averaged signaling activity across all ligand-receptor-Lfng combinations in this experiment. Saturated data points, defined here as those with normalized signaling activity over 0.75 in both dLfng and Lfng conditions, were excluded. Colors indicate the identity of the trans-ligand expressed by cocultured sender cells. Error bars denote bootstrapped 95% confidence intervals (Methods), in this case sampled from the number of biological replicates given in the legend—n1 (for Notch1) or n2 (for Notch2). See Methods and Figure 2A caption for more details. Note that the only difference between this figure and the new Figure 2—figure supplement 1A is that this figure additionally includes the Jag1-high data from Figure 2—figure supplement 4B.
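
      The two normalization modes described in this caption can be illustrated with a minimal Python sketch. It is not the authors' analysis code (which is available at the DOI cited in their response); all function names, condition labels, and numbers below are hypothetical and chosen only to show how a weak ligand expressed at high levels can give a clearly nonzero absolute activity that becomes close to zero once it is divided by ligand expression. The bootstrap helper is a simple percentile bootstrap, assumed here for illustration.

```python
import random
from statistics import mean


def normalized_activity(reporter, receptor):
    """Reporter fluorescence divided by cotranslational receptor fluorescence (A.U.)."""
    return reporter / receptor


def scale_to_strongest(condition_means):
    """Rescale replicate-averaged activities so the strongest condition equals 1."""
    top = max(condition_means.values())
    return {name: value / top for name, value in condition_means.items()}


def is_saturated(activity_dlfng, activity_lfng, threshold=0.75):
    """Flag a point as saturated when it exceeds the threshold in both conditions."""
    return activity_dlfng > threshold and activity_lfng > threshold


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of the replicates."""
    rng = random.Random(seed)
    means = sorted(
        mean(rng.choices(values, k=len(values))) for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical replicates: (reporter A.U., receptor A.U.) per biological replicate.
replicates = {
    "strong": [(800.0, 1000.0), (900.0, 1000.0)],
    "weak": [(150.0, 1000.0), (190.0, 1000.0)],
}
# Hypothetical sender ligand expression, in arbitrary relative units.
ligand_expression = {"strong": 1.0, "weak": 20.0}

condition_means = {
    name: mean(normalized_activity(r, c) for r, c in reps)
    for name, reps in replicates.items()
}

# Mode 1: absolute signaling activity, scaled to the strongest condition.
scaled = scale_to_strongest(condition_means)

# Mode 2: the same values further divided by ligand expression.
relative = {name: scaled[name] / ligand_expression[name] for name in scaled}

print(scaled)
print(relative)
```

      In this toy example the "weak" condition retains a sizeable fraction of the maximal absolute activity, but its per-ligand activity is much smaller, which mirrors the difference between the top and bottom rows described above.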

    2. eLife Assessment

      This valuable study significantly enhances our understanding of how various ligands and receptors interact within the Notch signaling pathway. By developing novel cell-based assay systems, the authors systematically analyzed the effects of different ligand-receptor combinations on pathway activation. The convincing data reveal intriguing and unexpected differences and provide a foundation for interpreting Notch signalling in both normal and disease-related contexts.

    3. Reviewer #1 (Public review):

      Summary:

      The Notch signaling pathway plays important roles in many developmental and disease processes. Although it is well studied, many puzzling aspects remain. One is the fact that, as well as activating the receptor through trans-activation, the transmembrane ligands can interact with receptors present in the same cell. These cis-interactions are usually inhibitory, but in some cases, as in the assays used here, they may also be activating. With a total of 6 ligands and 4 receptors, there is potentially a wide array of possible outcomes when different combinations are co-expressed in vivo. Here the authors set out to make a systematic analysis of the qualitative and quantitative differences in the signaling output from different receptor-ligand combinations, generating sets of "signaling" (ligand-expressing) and "receiving" (receptor +/- ligand-expressing) cells.

      The readout of pathway activity is transcriptional, relying on the fusion of GAL4 in the intracellular part of the receptor. Positive ligand interactions result in proteolytic release of Gal4 that turns on expression of H2B-citrine. As an indicator of ligand and receptor expression levels, they are linked via TA to H2B mCherry and H2B mTurq expression respectively. The authors also manipulate expression of the glycosyltransferase Lunatic-Fringe (LFng) that modifies the EGF repeats in the extracellular domains impacting on their interactions. The testing of multiple ligand receptor combinations at varying expression levels is a tour de force, with over 50 stable cell lines generated, and yields valuable insights although as a whole, the results are quite complex.

      Strengths:

      Taking a reductionist approach to test systematically differences in the signaling strength, binding strength and cis-interactions from the different ligands in the context of the Notch1 and Notch2 receptors (they justify well their choice of players to test via this approach) produces a baseline understanding of the different properties and leads to some unexpected and interesting findings. Notably:
      - Jag1 ligand-expressing cells failed to activate the Notch1 receptor although they were capable of activating Notch2. Conversely, Jag2 cells elicited the strongest activation of both receptors. The results with Jag1 are surprising also because it exhibits some of the strongest binding to plate-bound ligands. The failure to activate Notch1 has major functional significance and it will be important in future to understand the mechanistic basis.
      - Jagged ligands have the strongest cis-inhibitory effects and the receptors differ in their sensitivity to cis-inhibition by Dll ligands. These observations are in keeping with earlier in vivo and cell culture studies. More referencing of those would better place the work in context, but it nicely supports and extends previous studies that were conducted in different ways.
      - Responses to most trans-activating ligands showed a degree of ultrasensitivity, but this was not the case for cis-interactions, where effects were more linear. This has implications for the way the two mechanisms operate and for how the signaling levels will be impacted by ligand expression levels.
      - Qualitatively similar results are obtained in a second cell line, suggesting they reflect fundamental properties of the ligands/receptors.

      Weaknesses:

      One weakness is that the methods used to quantify the expression of ligands and receptors rely on co-translation of tagged nuclear H2B proteins. These may not accurately capture surface levels or correctly modified transmembrane proteins. In general, the multiple conditions tested partly compensate for the concerns: for example, as Jag1 cells do activate Notch2 even if they do not activate Notch1, some Jag1 must be getting to the surface. But even with Notch2, Jag1 activities are on the lower side, making it important to clarify, especially given the different outcomes with the plated ligands. Similarly, is the fact that all ligands "signalled strongest to Notch2" an inherent property, or is it due to differences in surface levels of Notch2 compared to Notch1? The results would be considerably strengthened by calibration of the ligand/receptor levels (and ideally their sub-cellular localizations). Assessing the membrane protein levels would be relatively straightforward to perform on some of the basic conditions because their ligand constructs contain Flag tags, making it plausible to relate surface protein to H2B, and there are antibodies available for Notch1 and Notch2.

      In the revised version this has been addressed to some extent. A figure showing the relationship between co-translated mTurquoise and surface receptor expression for some clones (Figure 1-figure supplement 1B) goes some way to address the concern that differences between Notch1 and Notch2 could be due to the receptor levels. The data analyzing surface ligand levels (a Western blot for biotinylated surface proteins) are more equivocal, as the levels detected vary substantially between Dll1 and Dll4 (the latter barely detectable). But as a signal for surface expression of Jag1 was obtained, this rules out one concern, that this ligand was failing to reach the surface. A discussion of the caveats of the approach is warranted, to make clear the limitations.

      Cis-activation as a mode of signaling has only emerged from these synthetic cell culture assays, raising questions about its physiological relevance. Cis-activation is only seen at the higher ligand (Dll1, Dll4) levels; how physiological are the expression levels of the ligands/receptors in these assays? Is it likely that this would make a major contribution in vivo? Is it possible that the cells convert themselves into "signaling" and "receiving" sub-populations within the culture by post-translational mechanisms? Again, some analysis of the ligands/receptors in the cultures would be a valuable addition to show whether or not there are major heterogeneities.

      It is hard to appreciate how much cell-to-cell variability in the "output" there is. For example, low "outputs" could arise from fewer cells becoming activated or from all cells being activated less. As presented, only the latter is considered. That may already be evident in their data, but it is not easy for the reader to distinguish from the way they are presented. For example, in many of the graphs, data have been processed through multiple steps of normalization. Some discussion/consideration of this point is needed.
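
      The distinction drawn here can be made concrete with a small Python sketch. The two populations below are invented for illustration (they are not data from the manuscript), and the activation threshold is an arbitrary assumption: both populations have the same mean reporter level, yet in one every cell is weakly active and in the other only a quarter of the cells respond.

```python
from statistics import mean


def summarize(cells, threshold):
    """Report the population mean, the fraction of active cells, and their mean level."""
    active = [level for level in cells if level > threshold]
    return {
        "mean": mean(cells),
        "fraction_active": len(active) / len(cells),
        "mean_active": mean(active) if active else 0.0,
    }


# Hypothetical reporter levels (arbitrary units) for 100 cells each.
graded = [0.25] * 100                 # every cell activated weakly
bimodal = [1.0] * 25 + [0.0] * 75     # a quarter of cells activated strongly

# Arbitrary threshold separating "active" from "inactive" cells.
graded_summary = summarize(graded, threshold=0.1)
bimodal_summary = summarize(bimodal, threshold=0.1)

print(graded_summary)
print(bimodal_summary)
```

      A population average alone cannot separate these two scenarios, whereas single-cell summaries such as the fraction of active cells can.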

      Impact:

      Overall, cataloguing of the outcomes from the different ligand-receptor combinations, both in cis and trans, yields a valuable baseline for those investigating their functional roles in different contexts. There is still a long way to go before it will be possible to make a predictive model for outcomes based on expression levels, but this work gives an idea about the landscape and the complexities. This is especially important now that signaling relationships are frequently hypothesised based on single cell transcriptomic data. The results presented here demonstrate that the relationships are not straightforward when multiple players are involved.

    4. Reviewer #2 (Public review):

      Summary:

      In this manuscript the authors extend their previous studies on trans-activation, cis-inhibition (PMID: 25255098) and cis-activation (PMID: 30628888) of the Notch pathway. Here they create a large number of cell lines using CHO-K1 and C2C12 cells expressing either Notch1-Gal4 or Notch2-Gal4 receptors which express a fluorescent protein upon receptor activation (receiver cells). For cis-inhibition and cis-activation assays, these cells were engineered to express one of the four canonical Notch ligands (Dll1, Dll4, Jag1, Jag2) under tetracycline control. Some of the receiver cells were also transfected with a Lunatic fringe (Lfng) plasmid to produce cells with a range of Lfng expression levels. Sender cells expressing all of the canonical ligands were also produced. Cells were mixed in a variety of co-culture assays to highlight trans-activation, cis-activation, and cis-inhibition. All four ligands were able to trans-activate Notch1 and Notch2, although Jag1 transactivated Notch1 weakly. Lfng enhanced trans-activation of both Notch receptors by Dll1 and Dll4, and inhibited activation of both receptors by Jag1 and Jag2. Cis-expression of all four ligands was predominantly inhibitory, but Dll1 and Dll4 showed strong cis-activation of Notch2. Interestingly, cis-ligands preferentially inhibited trans-activation by the same ligand, with varying effects on other trans-ligands.

      Strengths:

      This represents the most comprehensive and rigorous analysis of the effects of canonical ligands on cis- and trans-activation, and cis-inhibition, of Notch1 and Notch2 in the presence or absence of Lfng so far. Studying cis-inhibition and cis-activation is difficult in vivo due to the presence of multiple Notch ligands and receptors (and Fringes) that often occur in single cells. The methods described here are a step towards generating cells expressing more complex arrays of ligands, receptors and Fringes to better mimic in vivo effects on Notch function.

      In addition, the fact that their transactivation results with most ligands on Notch1 and 2 in the presence or absence of Lfng were largely consistent with previous publications provides confidence that the authors' assays are working properly.

      Weaknesses:

      In the original version, there was a major concern about quantifying the amount of Notch receptors and ligands on the cell surface (especially Jag1) based on total fluorescence. The authors have added data to demonstrate that most of the receptors and ligands are on the cell surface, allaying most of these concerns.

    1. eLife Assessment

      This important study uses in vitro and in vivo methods to identify HpARI proteins from H. polygyrus as modulators of the host immune system. The data from comprehensive approaches for investigating differential roles of HpARI proteins are convincing. This paper is relevant to those who investigate host-pathogen interactions at the systems and molecular levels.

    2. Reviewer #1 (Public Review):

      Colomb et al. have further explored the mechanisms of action of a family of three immunomodulatory proteins produced by the murine gastrointestinal nematode parasite Heligmosomoides polygyrus bakeri. The family of HpARI proteins binds to the alarmin interleukin 33 and, depending on the family member, exhibits differential activities, either suppressive or enhancing. The present work extends previous studies by this group showing the binding of DNA by members of this family through a complement control protein (CCP1) domain. Moreover, they identify two members of the family that bind via this domain in a non-specific manner to the extracellular matrix molecule heparan sulphate through a basic charged patch in CCP1. The authors thus propose that binding to DNA or heparan sulphate extends the suppressive action of these two parasite molecules, whereas the third family member does not bind and consequently has a shorter half-life and may function via diffusion.

    3. Reviewer #2 (Public Review):

      Colomb et al. investigated here the heparin-binding activity of the HpARI family proteins from H. polygyrus. HpARIs bind to IL-33, a pleiotropic cytokine, and modulate its activities. HpARI1/2 have suppressive functions, while HpARI3 can enhance the interaction between IL-33 and its receptor. This study builds upon their previous observation that HpARI2 binds DNA via its CCP1 domain. Here, the authors tested the CCP1 domain of HpARIs in binding heparan sulfate, an important component of the extracellular matrix, and found that HpARI1/2 bind heparan sulfate, but HpARI3 cannot, which is related to their half-lives in vivo.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      Colomb et al. have further explored the mechanisms of action of a family of three immunomodulatory proteins produced by the murine gastrointestinal nematode parasite Heligmosomoides polygyrus bakeri. The family of HpARI proteins binds to the alarmin interleukin 33 and, depending on the family member, exhibits differential activities, either suppressive or enhancing. The present work extends previous studies by this group showing the binding of DNA by members of this family through a complement control protein (CCP1) domain. Moreover, they identify two members of the family that bind via this domain in a non-specific manner to the extracellular matrix molecule heparan sulphate through a basic charged patch in CCP1. The authors thus propose that binding to DNA or heparan sulphate extends the suppressive action of these two parasite molecules, whereas the third family member does not bind and consequently has a shorter half-life and may function via diffusion.

      Strengths: 

      A strength of the work is the multifaceted approach to examining and testing their hypotheses, using a well-established and well-defined family of immunomodulatory molecules using multiple approaches including an in vivo setting. 

      Weaknesses: 

      There are a few weaknesses of the approach. Perhaps some discussion and speculation as to how these three family members might operate in concert during Heligmosomoides polygyrus bakeri infection would help place the biology of these molecules in context for the reader, e.g. when and where they are produced. 

      We agree that the roles of these proteins during infection require further study and are not fully elucidated here. We have added further discussion to the manuscript on their potential roles during infection (track changes manuscript, lines 277 – 283).

      Reviewer #2 (Public Review): 

      Summary: 

      Colomb et al. investigated here the heparin-binding activity of the HpARI family proteins from H. polygyrus. HpARIs bind to IL-33, a pleiotropic cytokine, and modulate its activities. HpARI1/2 have suppressive functions, while HpARI3 can enhance the interaction between IL-33 and its receptor. This study builds upon their previous observation that HpARI2 binds DNA via its CCP1 domain. Here, the authors tested the CCP1 domain of HpARIs in binding heparan sulfate, an important component of the extracellular matrix, and found that HpARI1/2 bind heparan sulfate, but HpARI3 cannot, which is related to their half-lives in vivo.

      Strengths: 

      The authors use a comprehensive multidisciplinary approach to assess the binding and their effects in vivo, coupled with molecular modeling. 

      Weaknesses: 

      (1) Figure 1C should include Western. 

      We apologise for this oversight, and now include an uncropped western blot image as Figure 1-figure supplement 1.

      (2) Figure 1E: Why does HpARI1 stop binding DNA at 50%? 

      It is currently unclear why HpARI1 does not bind to all DNA in the EMSA assay; however, this was our repeated finding. With our revised findings we can now state definitively that HpARI1 has a lower affinity for HS compared to HpARI2, and in each of our assays (EMSA, Fig 1D-E; size exclusion chromatography, Fig 4A; HS-bead pull-down, Fig 4B; lung cell surface binding, Fig 4C; and ITC, Fig 4D) HpARI1 always shows a weaker response compared to HpARI2. We hypothesise that HpARI1 binds more weakly to DNA/HS to allow it to diffuse further from the site of deposition, but we have yet to demonstrate this during infection. We add further discussion of this point (track changes manuscript, lines 262 – 266).

      (3) ITC binding experiment with HpARI1? Also, the ITC results from HpARI2 do not seem to saturate, thus it is difficult to really determine the affinity. 

      We have now included HpARI1-HS ITC, and re-ran the HpARI2 experiment to saturation (Fig 4D-E).

      (4) It would be helpful to add docking results from HpARI1. 

      We have now included HpARI1-HS docking, in Figure 5B.

      (5) Some conclusions are speculative and need to remain in the Discussion. e.g.: a) That HpARI3 may be able to diffuse farther 

      We have rewritten these points to remove the speculation on localisation from the abstract (lines 18-19) and introduction (line 78).

      b) That DNA/HS may trap HpARI1/2 at the infection site. 

      Likewise, these points have been rewritten in the abstract and introduction as above, and we have made it clearer that this is a model that we are proposing in the discussion (lines 277-283).

      Reviewer #1 (Recommendations For The Authors): 

      The paper is well-written and the data well-presented. I have one small comment that the authors may like to consider. In the discussion, second paragraph, line 17, perhaps, "evolved" rather than "developed". 

      Thank you for this suggestion, we have made this change (line 248).

    1. eLife Assessment

      This is an important study on the damage-induced checkpoint maintenance and termination in budding yeast that provides novel and convincing evidence for a role of the spindle assembly checkpoint and mitotic exit network in halting the cell cycle after prolonged arrest in response to irreparable DNA double strand breaks (DSBs). The study identifies particular components from these checkpoints that are specifically required for the establishment and/or the maintenance of a cell cycle block triggered by such DSBs. The authors propose an interesting model for how these different checkpoints intersect and crosstalk for timely resumption of cell cycling even without repairing DNA damage that has been revised by addressing the bulk of the reviewers' comments to the first version of the manuscript.

    2. Reviewer #1 (Public review):

      Summary:

      In their manuscript, Zhou et al. analyze the factors controlling the activation and maintenance of a sustained cell cycle block in response to persistent DNA DSBs. By conditionally depleting components of the DDC using auxin-inducible degrons, the authors verified that some of them are only required for the activation (e.g., Dun1) or the maintenance (e.g., Chk1) of the DSB-dependent cell cycle arrest, while others such as Ddc2, Rad24, Rad9 or Rad53 are required for both processes. Notably, they further show that after a prolonged arrest (>24 h) in a strain carrying two DSBs, the DDC becomes dispensable and the mitotic block is then maintained by SAC proteins such as Mad1, Mad2 or the mitotic exit network (MEN) component Bub2.

      Strengths:

      The manuscript dissects the specific roles of different components of the DDC and the SAC during the induction of a cell cycle arrest induced by DNA damage, as well as their contribution to the short-term and long-term maintenance of a DNA DSB-induced mitotic block. Overall, the experiments are well described and properly executed, and the data in the manuscript are clearly presented. The conclusions drawn are generally well supported by the experimental data. Their observations contribute to drawing a clearer picture of the relative contribution of these factors to the maintenance of genome stability in cells exposed to permanent DNA damage.

      Weaknesses:

      The main weakness of the study is that it is fundamentally based on the use of the auxin-inducible degron (AID) strategy to deplete proteins. This widely used method allows an efficient depletion of proteins in the cell. However, the drawback is that a tag is added to the protein, which can affect the functionality of the targeted protein or modify its capacity to interact with others. In fact, three of the proteins that are depleted using the AID systems are shown to be clearly hypomorphic, and hence their capacity to induce a strong checkpoint response might be compromised. A corroboration of at least some of the results using an alternative manner to eliminate the proteins would help to strengthen the conclusions of the manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript analyzes and attempts to discriminate genetic requirements for DNA damage-induced cell cycle checkpoint induction, maintenance, and adaptation in budding yeast bearing one or two unrepairable DNA double strand breaks using auxin-induced degradation (AID) of key DNA damage response (DDR) factors. The study paid particular attention to solving a puzzle regarding how yeasts bearing two unrepaired DNA breaks fail to engage in "adaptation" whereas those with a single unrepairable break eventually resume cell cycling after a prolonged (up to 12 h) G2 arrest.

      The key findings are: 1. Genetic requirements for the entry and the maintenance of DDC are separable. For instance, Dun1 is partially required for the entry but not the DDC maintenance whereas Chk1 is only required for maintenance. 2. Cells with two unrepairable breaks respond to DDR only up to a certain time (~12-15 h post damage) and beyond this point, depend on spindle assembly checkpoint (SAC) and mitotic exit network (MEN) to halt cell cycling. 3. The authors also propose an interesting concept that the location of DNA breaks and their distance to centromeres are important factors dictating the effect of SAC/MEN on the duration of cell cycle arrest after prolonged arrest (and cells become "deaf" to persistent arrest signals) and yeast's adaptability following DNA damage. The results provide the most compelling evidence to date on the role of SAC/MEN in DNA damage response and cell cycle arrest, although its impact might be limited to a handful of model systems due to the vastly different centromeric elements and far larger chromosome sizes in metazoan cells. Although the study briefly discusses the basis of the transitions between entry, maintenance, and adaptation (e.g., changes in centromeric architecture), it does not offer detailed explanations or a testable hypothesis on this topic.

      Overall, the conclusions of the study are well supported by an elegant set of genetic experiments that employ multiple readouts of the effect of DDC factor depletion on checkpoint integrity and cell cycle status. Although the study measures only Rad53 phosphorylation as the primary metric to assess checkpoint status, it successfully demonstrates how the signaling is modified through the different stages and that cells eventually become recalcitrant to DDC signaling after a prolonged arrest. The results are clear, rigorously tested, and carefully interpreted, with good discussion of the possible limitations. The revision provided detailed responses to the reviewers' comments and addressed a few key concerns, one of which, raised by all the reviewers, was the full functionality of AID-tagged DDC factors; this was addressed by simply expressing excess Rad9-AID to restore a more normal-looking checkpoint response. It will be interesting to see whether the excess expression of other DDC factors could overcome suboptimal checkpoints in cells after 24 h post damage.

    4. Reviewer #3 (Public review):

      Summary:

      The DNA damage checkpoint (DDC) inhibits the metaphase-anaphase transition to repair various types of DNA damage, including DNA double strand breaks (DSBs). One irreparable DSB can maintain the DDC for 12-15 hours in yeast, after which the cells resume the cell cycle. If there are two DSBs, the DDC is maintained for at least 24 hours. In this study, the authors take advantage of this tighter DDC to investigate whether the best-known proteins involved in establishing the DDC are also responsible for its long-term maintenance during irreparable DSBs. They do this by cleverly degrading such proteins after DSB formation. They show that most, but not all, DDC proteins maintain the cell cycle block. Interestingly, DDC proteins become dispensable after 15 hours and the block is then maintained by spindle assembly checkpoint (SAC) proteins.

      Strengths:

      The authors have engineered a tight yeast system to study DDC shutdown after irreparable DSBs and used it to address whether checkpoint proteins (DDC and SAC) contribute to the long-term maintenance of DSB-mediated G2/M block. The different roles of Ddc2, Chk1 and Dun1 are interesting, while the fact that SAC overtakes DDC after 15 hours is intriguing and highlights how DSBs near and far from centromeres can have a profound impact on cell adaptation to DSBs. In their revision, the authors have now improved the Rad9-AID methodology to place Rad9 in the context of DDC adaptation, as well as widening the association between adaptation and proximity to centromeres.

      Weaknesses:

      Some of the results they present essentially confirm their own previous findings, albeit with a tighter strain design for long-term arrest. Conclusions about the maintenance of G2/M in several mutant combinations could have been strengthened by adding simple microscopy experiments with DAPI staining. No clear mechanism for how depletion of Bub2, but not Bfa1, can relieve the G2/M (metaphase) block is given.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      To help support the conclusions drawn by the authors more strongly, I am including a series of concerns regarding the manuscript, as well as some suggestions that could be useful to address these issues:

      (1) The main results of this study derive from the use of auxin-inducible degron (AID)-tagged proteins. Despite the great advantages of the AID strategy to conditionally deplete proteins, the AID tag can affect the normal function of a protein. In fact, some of the AID-labeled DDC components generated in this work are shown to be hypomorphic. Hence, the manuscript would have benefited from the additional confirmation of some of the observations using a different way to eliminate the proteins (e.g., temperature-sensitive mutants).

      Most ts mutants are also hypomorphic; hence we don’t see much advantage to their use. The addition of the AID to these proteins alone does not interfere with the ability to sustain checkpoint arrest as demonstrated in Figure S1. Instead, we found that by overexpressing Rad9-AID we could demonstrate that inactivating Rad9 after 15 h behaved the same way as the inactivation of Ddc2, significantly strengthening our finding that the DDC checkpoint becomes dispensable while the SAC takes over.

      (2) In cells depleted of Rad53-AID, the deletion of CHK1 stimulates an earlier release from a mitotic arrest induced by two DSBs (Figures 2D and 3C). Likewise, the authors claim that a faster escape from the cell cycle block can also be observed when upstream factors such as Ddc2, Rad9, or Rad24 are depleted in the absence of CHK1 (Figures 2A-C and Figures 3D-F). However, this earlier release from the cell cycle arrest, if at all, is only slightly noticeable in a Rad9-AID background (Figures 2B and 3E). In this sense, it is also worth pointing out that Rad9-AID chk1Δ (Figure 3E) and Rad24-AID chk1Δ (Figure 3F) cells were only evaluated up to 7 h, while in all other instances, cells were followed for 9 h, which hinders a fair assessment of the differences in the release from the cell cycle arrest.

      As noted above, we have now been able to examine Rad9 over the long time frame.

      (3) Although only 25% of the cells depleted for Dun1 remained in G2/M arrest 7 h following the induction of two DSBs, it is shocking that Rad53 was nonetheless still phosphorylated after the cells had escaped the cell cycle blockage (Figure 4A).

      This persistence of Rad53 phosphorylation is also seen with the inactivation of Mad2, allowing escape in spite of continued Rad53 phosphorylation.

      (4) Generation of Rad9-AID2 and Rad24-AID2 strains did not fully restore the function of these proteins, since most cells had adapted 24 h after induction of two DSBs (Figure S1C). Nonetheless, Rad9-AID2 and Rad24-AID2 are still likely more stable than their AID counterparts, and hence the authors could have instead used the AID2 proteins for the experiments in Figure 2 to better evaluate the role of Rad9 and Rad24 in the maintenance of the DDC-dependent arrest.

      We note again that we have found a way to study Rad9 up to 24 h. 

      (5) Deletion of BFA1 has been shown to promote the escape from a cell cycle arrest triggered by telomere uncapping (Wang et al. 2000, Hu et al. 2001, Valerio-Santiago et al. 2013). Likewise, while cells carrying the cdc5-T238A allele cannot adapt to a checkpoint arrest induced by one irreparable DSB, BFA1 deletion rescues the adaptation defect of this mutant CDC5 allele (Rawal et al., 2016). The authors show, using AID-degrons of Bfa1 and Bub2, that only Bub2, but not Bfa1, is required to maintain a prolonged cell cycle arrest after the induction of two DSBs. To reinforce this point, and as shown for mad2Δ cells (Figure S6A), the authors could perform a complete time course using both the Bfa1-AID and a bfa1Δ mutant to demonstrate that they do indeed show the same behavior in terms of the adaptation to a two DSB-induced cell cycle arrest.

      We thank the reviewer for noting these other instances where bfa1Δ promoted an escape from arrest. We tested a 2-DSB bfa1 deletion; the data have been added to Figure S9E-F. We did not observe a difference in the percentage of cells escaping arrest between the 2-DSB bfa1 deletion and the 2-DSB BFA1-AID strains.

      (6) Bypass or adaptation of a checkpoint-induced cell cycle arrest in S. cerevisiae often leads to cells entering a new cell cycle without doing cytokinesis and, hence, to the accumulation of rebudded cells. However, the experiments shown in the manuscript only account for G1 or budded cells with either one or two nuclei. Do any of the mutants show cytokinesis problems and subsequent rebudding of the cells? If so, this should have been also noted and quantified in the corresponding assays.

      In the cases we have studied we have not seen instances where the cells re-bud without completing mitosis (at least as assessed by the formation of budded cells with two distinct DAPI staining masses). In the morphological assays we have done, we score the continuation of the cell cycle by the appearance of multiple buds, G1, and small budded cells. In our adaptation assays when cells escaped G2/M arrest they formed microcolonies indicating no short-term deficiency in cell division.

      (7) The location of the DSB relative to the centromere of a chromosome seems to be a factor that determines the capacity of the SAC to sustain a prolonged cell cycle arrest. The authors discuss the possibility that the DSB could somehow affect the structure of the kinetochore. Did they evaluate whether Mad1 or Mad2 were more actively recruited to kinetochores in those strains that more strongly trigger the SAC after induction of the DSBs?

      We have not attempted to follow Mad1/2 recruitment. ChIP-seq could be used to monitor Mad1/2 localization at the 16 centromeres in response to DSBs and the spread of γ-H2AX across the centromere. Our previous data showed that γ-H2AX could spread across the centromere region and could create a change that would be detected by Mad1/2. This change does not, however, affect the mitotic behavior of a strain in which the H2A genes have been modified to the possibly phosphomimetic H2A-S129E allele.

      (8) The authors could speculate in the discussion about the reasons that could explain why the DDC is required for the maintenance of checkpoint arrest at early stages but then becomes dispensable for the preservation of a prolonged DNA DSB-induced cell cycle arrest, which is instead sustained at later stages by the SAC.

      Our suggestion is that cells would have adapted, but modification of the centromere region engages SAC.

      Finally, some minor issues are:

      (1) The lines in the graphs that display the results from adaptation assays (e.g., Figures 1B and 1E) or cell and nuclear morphology (e.g., Figures 1D and 1G) are too thick. This makes it sometimes difficult to distinguish the actual percentages of cells in each category, particularly in the experiments monitoring nuclear division.

      Fixed

      (2) While both the adaptation assay and the analysis of nuclear division in Figures 1E and 1G, respectively, show a complete DDC-dependent arrest at 4h, the Western blot in Figure 1F suggests that Rad53 is not phosphorylated at that time point. Do these figures represent independent experiments? Ideally, the analysis of cell budding and nuclear division, which is performed in liquid cultures, and the Western blot displaying Rad53 phosphorylation should correspond to the same experiment.

      Cell budding in liquid cultures and adaptation assays were performed in triplicate with 3 biological replicates and the collective results are shown in each graph showing the percentage of large-budded cells. Western blot samples were collected in each liquid culture experiment. The western blot in 1G is a representative western blot.

      (3) It is somewhat confusing that the blots for the proteins are not displayed in the same order in Figures 2A (Rad53 at the top) and 2B or 2C (Rad53 in the middle).

      Fixed. We place Rad53, the relevant protein, at the top.

      Reviewer #2 (Recommendations For The Authors):

      (1) Yeast with two breaks respond to the DNA damage checkpoint (DDC) until sometime 4-15 h post DNA damage. Since auxin-induced degradation does not completely deplete all the tagged proteins in cells, the results should be considered more carefully, and whether checkpoint entry or maintenance depends on a given target protein should not be inferred solely from its ability to induce Rad53 phosphorylation. It is theoretically possible that checkpoint maintenance requires only a modest amount of checkpoint factors, especially because the experiments involve the induction of one or two DSBs. The low levels of DDC factors may be insufficient for Rad53 activation but could still be effective for cell cycle arrest. Indeed, the Haber group showed that the mating type switch did not induce Rad53 phosphorylation but still invoked a detectable DNA damage response. To test such possibilities, the authors might consider employing yet another marker for DDC such as H2A or Chk1 phosphorylation besides Rad53 autophosphorylation. Alternatively, the authors might check if auxin-induced depletion also disrupts break-induced foci formation for checkpoint maintenance or their enrichment at DNA breaks using ChIP assays at various points post-damage.

      DAPI staining of Ddc2-AID cells shows that when IAA is added 4 h after DSB induction (Figure S3A), cells escape G2/M arrest as evidenced by the increase in large-budded cells with 2 DAPI signals, small budded cells, and G1 cells. Overexpression of Ddc2 can sustain the checkpoint past 24 h, but without SAC proteins like Mad2 they will eventually adapt (Figure S6B).

      That Rad9-AID or Rad24-AID in the absence of added auxin (but in the presence of TIR1) is unable to sustain arrest suggests to us that low levels of Rad9 or Rad24 are not sufficient to maintain arrest.  As the reviewer notes, normal MAT switching doesn’t cause Rad53 phosphorylation or arrest, though early damage-induced events such as H2A phosphorylation do occur.  But our point is that Rad9 or Ddc2 is needed to maintain arrest only up to a certain point, after which they become superfluous and a different checkpoint arrest is imposed. At that point apparently a low level of these proteins plays no obvious role.

      (2) It is interesting that DDC no longer responds to the damage signaling after 15 h of DSB-induced prolonged checkpoint arrest after two DNA double-strand breaks. Is this also applicable to other adaptation mutants? The results might improve the broad impact of the current conclusions. It is also possible that the transition from DDC to SAC depends simply on changes in signaling, or is due in part to molecular changes in the status of DNA breaks or their flanking regions. Indeed, the proposed model suggests that the spreading of H2A phosphorylation to centromeric regions induces SAC and thus mitotic arrest. The authors could measure H2A phosphorylation near the centromere using ChIP assays at various intervals post-DNA damage. It would be particularly interesting if depletion of Ddc2 at 15 h post DNA damage does not alter the level of H2A phosphorylation at or near the centromere.

      Our previous data have suggested that the involvement of the SAC in prolonging DSB-induced arrest involved post-translational modification of centromeric chromatin such as the Mec1- and Tel1-dependent phosphorylation of the histone H2A (Dotiwala). In budding yeast there is also a similar DSB-induced modification of histone H2B (Lee et al.). To ask if there is an intrinsic activation of the SAC if the regions around centromeres were modified by checkpoint kinase phosphorylation, we examined cell cycle progression in strains in which histone H2A or histone H2B was mutated to their putative phosphomimetic forms (H2A-S129E and H2B-T129E).  As shown in Figure S11, there was no effect on the growth rate of these strains, or of the double mutant, suggesting that cells did not experience a delay in entering mitosis because of these modifications. We note that although histone H2A-S129E is recognized by an antibody specific for the phosphorylation of histone H2A-S129, the mutation to S129E may not be fully phosphomimetic. 

      (3) It is puzzling why Rad9-AID or Rad24-AID are proficient for DDC establishment but cannot sustain permanent arrest in the two break cells. It appears Rad53 phosphorylation for DDC is weaker in cells expressing Rad9-AID or Rad24-AID according to Fig.2B and C even though their protein level before IAA treatment is still robust. This might also explain why the results of depleting Rad53 and Rad9 are very different. It also raises concern if the effect of Rad24 depletion on checkpoint maintenance is in part due to the weaker checkpoint establishment. It might be necessary to use the AID2 system to redo Rad24 depletion to exclude such a possibility.

      We believe that the AID mutants are very sensitive to the low level of IAA present in yeast.  The instability of the protein is entirely dependent on the TIR1 SCF factor, so the proteins themselves are not intrinsically defective; they are just subject to degradation.  Overexpressing Rad9 allowed us to evaluate its role at late time points. 

      (4) It is intriguing that the switch from DDC to SAC might take place at around 12 h when yeasts with a single unrepairable break ignore DDC and resume cell cycling (so-called "adaptation"). Since 4h and 15h are far apart and the transition point from DDC to SAC likely takes place between these two points, it will be very helpful to analyze and compare cell cycle exit after 24 h by treating IAA at multiple points between 4-15h.

      When we add IAA to Mad2-AID and Mad1-AID 4 h after DSB induction, cells remain arrested for up to 12 h after DSB induction. At 15 h cells begin to exit checkpoint arrest, indicating that the handoff of checkpoint arrest must occur between 12 and 15 h after DSB induction. If we degraded DNA damage checkpoint proteins at any point before Mad2, Mad1, and Bub2 begin to contribute to checkpoint arrest, arrested cells would likely adapt in a similar manner to when IAA was added 4 h after DSB induction.

      (5) Some of the Western blot quality is poor. For instance, in Figure 6C, Mad1-AID level after IAA addition is not compelling especially because the TIR level (the loading control) is also very low.

      In Figure 6C, while the relative levels of TIR1 are similar in the IAA treated and untreated samples, there is no detectable amount of Mad1-AID in the IAA treated samples, indicating that Mad1-AID was successfully degraded with the AID system.

      (6) Fig. 8 is complex. It might be helpful to define the different types of arrows in the figure. The legend also has a spelling error, Rad23 should be Rad24.

      We’ve defined what each arrow means in the legend and corrected the spelling error in the figure legend.

      Reviewer #3 (Recommendations For The Authors):

      Major concerns:

      Much of the manuscript states that two unrepairable DSBs lead to a long and severe G2/M arrest. Two main cytological approaches are used to make this statement: bud size and number on plates after micromanipulation (microcolony assay), and cell and nuclear morphology in liquid cultures. While the latter gives a clear pattern that can be assigned to a G2/M block as expected by DDC, i.e. metaphase-like mononucleated cells with large buds, the former can only tell whether cells eventually reach a second S phase (large budded cells on the plate can be in a proper G2/M arrest, but can also be in an anaphase block or even in the ensuing G1). The authors always performed the microcolony assay, but there are several cases where the much more informative budding/DAPI assay is missing. These include Dun1-AID and others, but more importantly chk1Δ and its combinations with DDC proteins. Incidentally, for the microcolony assay, it is more accurate to label the y-axis of the corresponding graphs (and in the figure legends and main text) with something like "large budded cells"; "G2/M arrested cells" is misleading.

      Figures have been updated to more accurately reflect what we are measuring.

      The results obtained with the Bfa1/Bub2 partner are intriguing. These two proteins form a complex whose canonical function is to prevent exit from mitosis until the spindle is properly aligned, acting in a distinct subpathway within the SAC that blocks MEN rather than anaphase onset. The data presented by the authors suggest that, on the one hand, both SAC subpathways work together to block the cell cycle. However, why does canonical SAC (Mad1/Mad2) inactivation not lead to a transition from G2/M (metaphase-like) arrested cells to anaphase-like arrest maintained by Bfa1-Bub2? Since Bfa1-Bub2 is a target of DDC, is it possible that DDC knockdown also inactivates this checkpoint, allowing adaptation? On the other hand, can the authors provide more data to confirm and strengthen their claim of a Bfa1-independent Bub2 role in prolonged arrest? Perhaps long-term protein localization and PTM changes. Bub2-independent roles for Bfa1 have been reported, but not vice versa, to the best of my knowledge.

      In the mitotic exit network Bfa1/Bub2 prime activation of the pathway by bringing Tem1 to spindle pole bodies. Phosphorylation of Bfa1 causes Tem1 to be released and phosphorylate Cdc5 to trigger exit by MEN. It has been shown that DNA damage, in a cdc13-1 ts mutant, phosphorylates Bfa1 in a Rad53 and Dun1 dependent manner. This phosphorylation of Bfa1 could release Tem1 and prime cells to exit checkpoint arrest when cells pass through anaphase. Looking at Tem1 localization to spindle pole bodies and interactions with Bfa1/Bub2 in response to DNA damage might give insight into why cells don’t experience an anaphase-like arrest when they are released by either deactivation of the DNA damage checkpoint or SAC.

      We have previously shown that a deletion of bub2 in a 1-DSB background shortens DSB-induced checkpoint arrest. Deletion of bfa1 in a 2-DSB background showed ~70-80% of cells stuck in a large-budded state as measured through an adaptation assay tracking the morphology of G1 cells on a YP-Gal plate and DAPI staining. Deletion or degradation of bfa1 might not release cells from arrest because Mad2/Mad1 prevent cells from transitioning into anaphase. Our DAPI data for Bub2-AID show an increase in cells with 2 DAPI signals (transition into anaphase) and small budded cells, indicating that degradation of Bub2 is releasing cells into anaphase and allowing cells to complete mitosis.

      Further suggestions:

      It would be richer if authors could provide more than one experimental replicate in some panels (e.g., S1A,B; S4A; and S6B).

      S1C confirms that Rad9-AID and Rad24-AID will adapt by 24 h even with the point mutant TIR1(F74G) which has lower basal degradation than TIR1. S4A has been updated with additional experimental replicates. The 48 h timepoint after DSB induction was to show the importance of Mad2 even when Ddc2 is overexpressed.

      Figure 1: Rearrange figure panels when they are first mentioned in the text. For example, it makes more sense to have the plate adaptation assay as panel B for both 1-DSB and 2-DSB strains, budding plus DAPI as panel C, and Rad53 as panel D.

      These figures have been rearranged in the order that they are mentioned in the paper.

      Figure 5: Correct Ph-5-IAA in the Rad53 WBs (it should be 5-Ph-IAA).

      This has been corrected.

      Figure S2: The straight line under the "+IAA" text box is misleading. I think it should also cover the "-2" time point, right? Also, check the figure legend. Information is missing and does not correspond to the figure layout.

      This has been corrected.

      Figure S3: Perhaps "Cell cycle profile as determined by budding and DAPI staining" is a better and more accurate legend title.

      The legend title has been updated to “Cell cycle profile as determined by budding and DAPI staining in Ddc2-AID and Rad53-AID mutants ± IAA 4 h after galactose.”

      Figure S5: Detection of both Rad53 and Ddc2 in the same blot could lead to misinterpretation as hyperphosphorylated Rad53 appears to coincide with Ddc2 migration.

      Figure S5A-B are representative western blots where Rad53 was probed to show activation of the DNA damage checkpoint by Rad53 phosphorylation. When measuring the relative abundance of Ddc2 we did not probe all blots for Rad53.

      Table S1: Include the post-hoc test used for comparisons after ANOVA.

      A Sidak post-hoc test was used in PRISM for the one-way ANOVA test. PRISM listed the Sidak post-hoc test as the recommended test to correct for multiple comparisons. A column has been added to S. Table 1 to show which post-hoc test was used.

      Page 10, line 4: The putative additive effect of chk1 knockout with Dun1 depletion should also be compared to chk1 alone (in Figure 3A).

      We address the additive effect of chk1 knockout with Dun1-AID depletion in a later section on Page 11, line 6. Since we had not explored possible effects from downstream targets of Rad53 for prolonging checkpoint arrest when Rad53 was depleted, we did not mention the effect of the chk1 knockout on Dun1 depletion.

      Page 14, second paragraph, line 4: "Figure 6A-D", is it not?

      Figure S6A is measuring checkpoint arrest in a deletion of mad2 in a 2-DSB strain. Figure 6A-D shows how degradation of Mad2-AID and Mad1-AID after the handoff of arrest causes cells to exit the checkpoint in a Rad53 independent manner.

    1. eLife Assessment

      This important study examines the role of Microrchidia (MORC) proteins in the human malaria parasite Plasmodium falciparum. Solid experimental results, including genome editing and chromatin profiling methods (ChIP-seq and Hi-C), provide a comprehensive picture of the critical role MORC plays in shaping parasite chromatin. Depletion of MORC results in a lethal collapse of heterochromatin and parasite death, nominating the factor as a new target of antimalarial therapies.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigated the function of Microrchidia (MORC) proteins in the human malaria parasite Plasmodium falciparum. Recognizing MORC's implication in DNA compaction and gene silencing across diverse species, the study aimed to explore the influence of PfMORC on transcriptional regulation, life cycle progression and survival of the malaria parasite. Depletion of PfMORC leads to the collapse of heterochromatin and thus to the killing of the parasite. The potential regulatory role of PfMORC in the survival of the parasite suggests that it may be central to the development of new antimalarial strategies.

      Strengths:

      The application of the cutting-edge CRISPR/Cas9 genome editing tool, combined with other molecular and genomic approaches, provides a robust methodology. Comprehensive ChIP-seq experiments indicate PfMORC's interaction with sub-telomeric areas and genes tied to antigenic variation, suggesting its pivotal role in stage transition. The incorporation of Hi-C studies is noteworthy, enabling the visualization of changes in chromatin conformation in response to PfMORC knockdown.

      Weaknesses:

      Although disruption of PfMORC affects chromatin architecture and stage-specific gene expression, determining a direct cause-effect relationship requires further investigation. Furthermore, while numerous interacting partners have been identified, their validation is critical and understanding their role in directing MORC to its targets or in influencing the chromatin compaction activities of MORC is essential for further clarification. In addition, the authors should adjust their conclusions in the manuscript to more accurately represent the multifaceted functions of MORC in the parasite.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: The authors investigated the function of Microrchidia (MORC) proteins in the human malaria parasite Plasmodium falciparum. Recognizing MORC's implication in DNA compaction and gene silencing across diverse species, the study aimed to explore the influence of PfMORC on transcriptional regulation, life cycle progression and survival of the malaria parasite. Depletion of PfMORC leads to the collapse of heterochromatin and thus to the killing of the parasite. The potential regulatory role of PfMORC in the survival of the parasite suggests that it may be central to the development of new antimalarial strategies.

      Strengths: The application of the cutting-edge CRISPR/Cas9 genome editing tool, combined with other molecular and genomic approaches, provides a robust methodology. Comprehensive ChIP-seq experiments indicate PfMORC's interaction with sub-telomeric areas and genes tied to antigenic variation, suggesting its pivotal role in stage transition. The incorporation of Hi-C studies is noteworthy, enabling the visualization of changes in chromatin conformation in response to PfMORC knockdown.

      We greatly appreciate the overall positive feedback and recognition of our efforts. Our application of CRISPR/Cas9 genome editing tools coupled with complementary cellular and functional approaches sheds light on the importance of _Pf_MORC in maintaining chromatin structural integrity in the parasite and highlights this protein as a promising target for novel therapeutic intervention.

      Weaknesses: Although disruption of PfMORC affects chromatin architecture and stage-specific gene expression, determining a direct cause-effect relationship requires further investigation.

      Our conclusions were made on the basis of multiple, unbiased molecular and functional genomic assays that point to the relevance of the _Pf_MORC protein in maintaining the parasite’s chromatin landscape. Although we do not claim to have precise evidence on the step-by-step pathway in which _Pf_MORC is involved, we bring forth first-hand evidence of its role in heterochromatin binding, gene regulation and its association with major TFs as well as chromatin remodeling and modifying enzymes. We however agree with the comment regarding the lack of direct effects of _Pf_MORC KD and have since provided additional evidence by performing ChIP-seq experiments against H3K9me3 and H3K9ac during KD. Our new results are presented in Fig. 5. We showed that the level of H3K9me3 decreased significantly during _Pf_MORC KD.

      Furthermore, while numerous interacting partners have been identified, their validation is critical and understanding their role in directing MORC to its targets or in influencing the chromatin compaction activities of MORC is essential for further clarification. In addition, the authors should adjust their conclusions in the manuscript to more accurately represent the multifaceted functions of MORC in the parasite.

      Validation of the identified interacting partners is indeed critical and essential to understanding their role in directing MORC to its targets. Our protein pull-down experiments have been done using several biological replicates. Several of the interacting partners have also been identified and published by other labs and collaborators. To confirm our results, we completed a direct comparison of our work with previously published work. Results have now been incorporated into the revised manuscript to confirm the identified interacting partners and the accuracy of the data we obtained in our experiment. Molecular validation of novel proteins identified in our protein pull-down requires generation of tagged lines and may take a few more years but will be submitted for publication in a follow-up manuscript.

      Reviewer #2 (Public Review):

      Summary: This paper, titled "Regulation of Chromatin Accessibility and Transcriptional Repression by PfMORC Protein in Plasmodium falciparum," delves into the PfMORC protein's role during the intra-erythrocytic cycle of the malaria parasite, P. falciparum. Le Roch et al. examined PfMORC's interactions with proteins, its genomic distribution in different parasite life stages (rings, trophozoites, schizonts), and the transcriptome's response to PfMORC depletion. They conducted a chromatin conformation capture on PfMORC-depleted parasites and observed significant alterations. Furthermore, they demonstrated that PfMORC depletion is lethal to the parasite.

      Strengths: This study significantly advances our understanding of PfMORC's role in establishing heterochromatin. The direct consequences of the PfMORC depletion are addressed using chromatin conformation capture.

      We appreciate the Reviewer’s comments and reflection on the importance of our work.

      Weaknesses: The study only partially addressed the direct effects of PfMORC depletion on other heterochromatin markers.

      Here again, we agree with the reviewer’s comment and have performed additional experiments to delve deeper into the multifaceted roles of _Pf_MORC. We have performed additional ChIP-sequencing analysis on _Pf_MORC depleted conditions focusing on known heterochromatin and euchromatin markers H3K9me3 and H3K9ac respectively. We hope our new results presented in figure 5 will shed light on the more direct implications of _Pf_MORC on heterochromatin and gene silencing.

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analyses.

      • Why does MORC, which was used in the pull-down, seem to be only minimally enriched in the volcano plot, while a series of proteins (marked in red) and AP2 (highlighted in green) are enriched with log2 fold changes exceeding 15?

      We apologize for the confusion. MORC was detected with the highest number of peptides (97 and 113) and spectra (1041 and 1177), confirming the efficiency of our pull-down. However, considering the relatively large size of the MORC protein (295 kDa) and its weak detection in the control (5 and 7 peptides; 16 and 43 spectra), the Log2 FoldChange and Z-statistic after normalization are minimal compared to smaller proteins that were not identified in the control samples.
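      The arithmetic behind this point can be sketched as follows. This is only an illustration, not the authors' normalization pipeline: the function name, the pseudocount of 0.01 and the spectral counts of the "partner" protein are all assumptions, and only the MORC spectral counts come from the response above. It shows how a bait with some background in the control ends up with a smaller log2 fold change than a protein with far fewer spectra that is absent from the control.

```python
import math


def log2_fold_change(bait_counts, control_counts, pseudocount=0.01):
    """log2 ratio of mean spectral counts.

    The pseudocount is an assumed value that only avoids division by zero;
    it is not taken from the manuscript.
    """
    bait_mean = sum(bait_counts) / len(bait_counts)
    control_mean = sum(control_counts) / len(control_counts)
    return math.log2((bait_mean + pseudocount) / (control_mean + pseudocount))


# Spectral counts for MORC quoted in the response (two replicates each).
morc_fc = log2_fold_change([1041, 1177], [16, 43])

# Hypothetical co-purified protein: far fewer spectra, none in the control.
partner_fc = log2_fold_change([40, 52], [0, 0])

print(f"MORC log2FC: {morc_fc:.2f}")
print(f"Hypothetical partner log2FC: {partner_fc:.2f}")
```

      With any small pseudocount, the protein that is absent from the control receives the larger fold change, even though MORC has more than twenty times as many spectra in the pull-down.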

      Additionally, can you explain why these proteins appear to be enriched at the same fold? 

      We can postulate that these proteins form a complex with a ratio of 1:1. Two of these three proteins are described to interact with MORC in several publications, supporting a strong interaction between them.

      Variations in the interactome could result from the washing buffer's stringency.

      We agree that the IP conditions could affect the detection of the interactome as well as the parasite stage used. As indicated below, the overlap with previous publications and the presence of AP2 TFs and chromatin remodelers strongly support our results.

      It would be highly appropriate for the authors, similar to the co-submitted article (Maneesh Kumar Singh et al.), to present their mass spectrometry data in relation to previous purifications in Plasmodium (Bryant et al. 2020; Subudhi et al. 2023; Hillier et al. 2019) and also in Toxoplasma (Farhat et al. 2020). It would be good if authors could also put their results into perspective in light of the following pre-prints:

      We agree with the reviewer’s comment. In this revised manuscript, we compared our IP-MS data to previously published manuscripts. Key proteins including the AP2-P (PF3D7_1107800) and HDAC1 were indeed identified in several experiments, validating our initial findings of the formation of large complexes with MORC. However, it’s important to highlight that the MORC protein was not used as the bait protein in previously published papers, and thus some discrepancies can be observed.

      Given the tendency of MORCs to form multiple complexes with AP2 factors, have you explored whether specific AP2s are conserved between Plasmodium and Toxoplasma, within the phylum?

      P. falciparum encodes 27 putative AP2s, while T. gondii has over 60 AP2s, making direct comparison challenging. Some Plasmodium AP2s have multiple counterparts in T. gondii and typically conservation is limited to the AP2 binding domains. Attempts to identify sequence homology among AP2s and the regions of conservation have been performed (PMID: 30959972, PMID: 16040597). Although this information would provide interesting insight, we believe exploring this topic at this time would diverge from our primary objectives. It would be more appropriate to address this in future studies.

      Could this conservation be identified either through phylogenetic means or by using tools such as AlphaFold, especially considering not just the AP2 domains but also any existing ACDC domains?

      Although this may reveal important information regarding the association between MORC proteins and AP2 domains, we believe investigating the conservation between AP2 across apicomplexan parasites may prove too challenging and is beyond the scope of this work.

      Most of the genes are depicted without their immediate surroundings (Fig. 2d and Fig S2c, d). For instance, the promoter region of AP2g is not shown (Fig. 2d). It is therefore very challenging to determine the presence or absence of MORC upstream or downstream; considering that this factor, which can create DNA loop protrusions, might bind at a distance from the genes in question.

      All gene coverage plots, including AP2-G, show 500 bp up- and downstream of the displayed gene. We have modified our figure legends to make sure that this information is provided.

      Upon examining Figure S3, it is evident that the authors have indicated a decline in PfMORC expression, represented as percentages over two unique time frames. The methodology behind this quantification remains ambiguous. It's essential for the authors to specify whether normalization was done using a loading control. As a benchmark, Singh et al. (2021) in their Figure 4 transparently used GAPDH as a loading control and included an untreated sample in their western blot analysis.

      We thank the Reviewer for bringing this to our attention. Our initial quantification was performed using ImageJ. To address the Reviewer’s comment, we have repeated the experiment. Our quantitative analysis was performed with Bio-Rad ImageLab software using aldolase expression as a loading control (50% of the MORC loading). This information has now been incorporated into the supplementary figures (Figure S3).
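      As a generic illustration of quantification against a loading control, the calculation can be sketched as below. This is a minimal sketch and not the ImageLab procedure itself: the function names and all band intensities are hypothetical, and the only element taken from the response is the use of a loading control (aldolase) to normalize the target band.

```python
def normalized_signal(target_intensity, loading_intensity):
    """Target band intensity divided by the loading-control band intensity."""
    if loading_intensity <= 0:
        raise ValueError("loading-control intensity must be positive")
    return target_intensity / loading_intensity


def percent_remaining(kd_target, kd_loading, ctrl_target, ctrl_loading):
    """Normalized target signal in the knockdown lane as a percentage of the control lane."""
    kd = normalized_signal(kd_target, kd_loading)
    ctrl = normalized_signal(ctrl_target, ctrl_loading)
    return 100.0 * kd / ctrl


# Hypothetical band intensities (arbitrary densitometry units).
remaining = percent_remaining(
    kd_target=3000, kd_loading=9000,      # knockdown lane: target, loading control
    ctrl_target=10000, ctrl_loading=10000,  # untreated lane: target, loading control
)
print(f"Target protein remaining after knockdown: {remaining:.1f}%")
```

      Dividing by the loading control first means that a lane loaded with less total protein is not mistaken for a stronger knockdown.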

      There's a striking observation that, despite significant degradation of PfMORC (as depicted in Figures S1 and S3), only the upper band in the western blot diminishes. This inconsistency needs addressing, as it can raise questions about the interpretation of the results.

      We agree with the reviewer's comment. We experienced some challenges when performing a Western blot on such a large protein (295 kDa). Our initial attempts required long exposure that may have highlighted non-specific signals of smaller proteins. To address the reviewer’s comment, we have performed the experiment one more time and made necessary changes to our WB protocol. Our new result better reflects the expected downregulation of _Pf_MORC. These changes have been incorporated into our manuscript and Fig S3.

      Recommendations for improving the writing and presentation.

      MORC KD quantification and consistency with previous findings (Figure S3): When comparing their results with those from another study (Singh et al. 2021), it's critical to ensure that the experimental conditions, especially the methodology for KD and the quantification of protein levels, are similar. If not, a direct comparison might be misleading.

      We greatly appreciate the suggestions and have made efforts to redesign the MORC KD quantifications according to the reviewer’s recommendations.

      While the manuscript mentions the level of KD, it does not delve into the functional consequences of such a decrease in protein levels. It would be of interest to understand how this level of KD affects the parasite's biology, especially in the context of the paper's main findings.

      We have addressed this question by looking at the changes in chromatin structure in WT versus KD parasites upon aTc removal. We have also validated this initial result by designing an additional ChIP-seq experiment against histone marks in WT versus KD parasites upon aTc removal. Our findings showed a significant decrease in H3K9me3 coverage in heterochromatin regions, specifically in genes associated with antigenic variation and invasion. These findings suggest that PfMORC at least partially regulates gene silencing and chromatin arrangement. The manuscript has been edited accordingly.

      Concluding page 5, the authors present an interpretation of their findings that suggests a multi-faceted role of PfMORC in regulating stage-specific gene families, particularly the gametocyte-related genes and merozoite surface proteins. While the narrative they present is intriguing, several concerns arise:

      Over-reliance on correlation: The authors draw a direct line between the levels of PfMORC binding and the function of these genes in the parasite's life cycle. However, a mere correlation between PfMORC binding and stage-specific gene activity does not necessarily imply causation. They would need to provide experimental evidence showing that manipulation of PfMORC levels directly impacts these genes' expression.

      We agree with the reviewer's comment. We have however partially addressed this issue by comparing our ChIP-seq, RNA-seq and Hi-C experiments. We concluded that several of the transcriptional changes observed were due to an indirect effect of PfMORC KD and were most likely induced by a cell cycle arrest and partial collapse of the chromatin structure. The collapse of the heterochromatin structure was validated using our Hi-C experiment. To address the reviewer's additional concerns, we have included additional ChIP-seq experiments targeting histone marks to confirm our initial hypothesis. The results of this additional experiment have been incorporated in the revised version of the manuscript.

      Ambiguity surrounding "low levels" and "high levels": The terms "low levels" and "high levels" of PfMORC binding are qualitative and could be subject to interpretation. Without quantification or a clear benchmark, these descriptions remain vague.

      We agree with the reviewers that the terms "low levels" and "high levels" of PfMORC binding are qualitative and could be subject to interpretation. We have however quantified our change in DNA binding using normalized reads (RPKM). In trophozoite and schizont stages, most of the genes contain a mean of <0.5 RPKM normalized reads per nucleotide of _Pf_MORC binding within their promoter region, whereas antigenic gene families such as _var_ and _rifin_ contain ~1.5 and 0.5 normalized reads, respectively (Fig. 2b). Similar results are also obtained for the gametocyte-specific transcription factor AP2-G, which contains levels of _Pf_MORC binding similar to what is observed in _var_ genes (Fig. 2c and S2c, d).
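      For readers unfamiliar with the unit, the standard RPKM definition (reads per kilobase of region per million mapped reads) can be sketched as follows. The function name and the example numbers are hypothetical, and the response does not state exactly how the per-nucleotide values in Fig. 2b were derived, so this shows only the general normalization and not the authors' pipeline.

```python
def rpkm(reads_in_region, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads (standard definition)."""
    if region_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("region length and library size must be positive")
    return reads_in_region * 1e9 / (region_length_bp * total_mapped_reads)


# Hypothetical promoter window: 150 reads over 1,000 bp in a library of 10 million mapped reads.
promoter_rpkm = rpkm(reads_in_region=150, region_length_bp=1_000, total_mapped_reads=10_000_000)
print(f"Promoter RPKM: {promoter_rpkm:.1f}")
```

      Because the value is scaled by both region length and library size, binding levels can be compared across genes of different lengths and across samples sequenced to different depths.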

      Shift in Binding Sites: The observed minor switch in PfMORC binding sites from gene bodies to intergenic and promoter regions is mentioned, but without context on how these shifts impact gene expression or any comparative analysis with other proteins showing similar shifts. The claim that this shift implicates PfMORC as an "insulator" is a leap without direct evidence.

      We apologize for the confusion. We have compared our ChIP-seq with RNA-seq results at different time points of the cell cycle and demonstrated that the shift observed has an effect on gene expression. We have edited the manuscript to clarify these results.

      Overextension of PfMORC's Role: The authors suggest that PfMORC moves to the regulatory regions around the TSS to guide RNA Polymerase and transcription factors. This is a substantial claim and would require additional experiments to validate. Simply observing binding in a region is insufficient to assign a specific functional role, especially one as critical as guiding RNA Polymerase. Historically, the MORC family has been primarily linked with gene silencing across Apicomplexan, plants, and metazoans. On page 7, the authors noted a minimal overlap between the ChIP-seq and RNA-seq signals (Fig. 4e). They also acknowledged that the pronounced gene expression shifts at schizont stages result from a combination of direct and indirect impacts of PfMORC degradation, which could cause cell cycle arrest and potential heterochromatin disintegration, rather than just decreased PfMORC binding. Therefore, the authors should adjust their conclusions in the manuscript to more accurately represent the multifaceted functions of MORC in the parasite.

      We agree with the reviewer's comment and have edited the manuscript accordingly.  

      DISCUSSION:

      The authors concluded that "Using a combination of ChIP-seq, protein knock down, RNA-seq and Hi-C experiments, we have demonstrated that the MORC protein is essential for the tight regulation of gene expression through chromatin compaction, preventing access to gene promoters from TFs and the general transcriptional machinery in a stage specific manner."

      Again, the assertion that MORC protein is essential for tight regulation of gene expression, based purely on correlational data (e.g., ChIP-seq showing binding doesn't prove functionality), assumes causality which might not be fully substantiated. The phrase "preventing access to gene promoters from TFs and the general transcriptional machinery in a stage-specific manner" needs also validation. Asserting that MORC is essential for this function might oversimplify the process and overlook other critical contributors.

      We agree with the reviewer’s comments and the conclusion has since been edited accordingly.

      The discussion is quite poor. It would be pertinent to put MORC in perspective within the broader picture of regulatory mechanisms of chromatin state at telomeres and var genes. For instance, how do SIR2 and HDAC1 (associated with MORC) divide the task of deacetylation? Or the contribution of HP1 and other non-coding RNAs.

      We agree with the reviewer’s suggestion. However, in order to put MORC in perspective within a broader picture, we would need to measure changes in localization of several molecular components regulating heterochromatin in WT versus KD condition. This will require access to several molecular tools and specific antibodies that we do not currently have. We have addressed these issues in our discussion.  

      Minor corrections to the text and figures.

      Figure 1d: Could you provide the ID for each AP2 directly on the volcano plot? While some IDs are referenced in the manuscript, visual representation in the plot would facilitate a clearer understanding of their enrichment levels.

      IDs for unknown AP2 proteins have been added to the volcano plot.

      I recommend presenting Figure S2b as a panel within a primary figure. This change would offer readers a more quantitative understanding of the distinct differences between developmental stages. Notably, there seems to be a limited number of genes in common when considering the total, and there is an apparent lack of enrichment in the ring stage.

      This has been done.

      The captions are very minimally detailed. An effort must be made to better describe the panels as well as which statistical tests were used. 

      We have improved the figure legends and added the number of biological replicates as well as the statistical test used in each figure legend.

      Figure 1A: The protein diagram with its domains does not take scale into account.

      The figure has been modified.

      Reviewer #2 (Recommendations For The Authors):

      (1) The study lacks a direct link between PfMORC's inferred function and the state of heterochromatin in the genome post-depletion.

      We agree with the reviewer's comment and have included additional ChIP-seq experiments to measure changes in histone marks in the PfMORC-depleted parasite line. We show a significant decrease in histone H3K9me3 marks under PfMORC KD conditions.

      Conducting ChIP-seq on well-known heterochromatin markers such as H3K9me3, HP1, or H3K36me2/3 could shed light on the consequences of PfMORC depletion on global heterochromatin and its boundaries.

      With no access to an anti-HP1 antibody with reasonable affinity, we have not been able to study the impact of MORC KD on HP1 but have successfully observed the impact on H3K9me3 marks. These results have been added to the revised manuscript (Fig. 5).

      (2) The authors should conduct a more comprehensive analysis of PfMORC's genomic localization, comparing it to ApiAP2 binding (interacting proteins) and histone modifications. This would provide valuable insights.

      We have performed a more comprehensive genome-wide analysis of MORC binding through ChIP-seq on WT and MORC-KD conditions. Our results show that _Pf_MORC localizes to heterochromatin with significant overlap with H3K9-trimethylation (H3K9me3) marks, at or near _var_ gene regions. When _Pf_MORC was downregulated, H3K9me3 was detected at a lower level, supporting a possible role of _Pf_MORC in gene repression. Regarding the comparison with AP2 binding, our proteomics datasets have shown extensive MORC binding with several AP2 proteins.

      (3) RNA-seq data reveals that only a few genes are affected after 24 hours of PfMORC depletion, with an equivalent number of up-regulated and down-regulated genes. The reasons behind down-regulation resulting from a heterochromatin marker depletion are not clearly established.

      We agree with the reviewer’s comment. At this stage (24 hours), _Pf_MORC depletion is limited and the effects at the transcriptional level are quite restricted. Furthermore, the down-regulated genes most likely reflect an indirect effect of a cell cycle arrest. We have edited the manuscript to address this comment.

      The relationship between this data and the partial depletion of PfMORC needs further discussion.

      We agree with the reviewers and have improved our discussion in the revised version of the manuscript.

      (4) The authors did not compare their ChIP-seq data with the genes found downregulated in the RNA-seq data. Examining the correlation between these datasets would enhance the study.

      We apologize for the confusion. We have compared ChIP-seq and RNA-seq data and identified a very limited number of overlapping genes indicating that most of the changes observed in gene expression are in fact most likely indirect due to a cell cycle arrest and a collapse of the chromatin. We have edited the manuscript to clarify this issue.

      (5) The discussion section is relatively concise and does not fully address the complexity of the data, warranting further exploration.

      We have improved the discussion section in the revised version of the manuscript.

    1. eLife Assessment

      This valuable study focuses on the regulation of Notch signaling during the immune response in Drosophila. The authors provide solid evidence in support of roles for Su(H) and Pkc53E-induced phosphorylation in Drosophila immunity. The work will be of interest to colleagues in immunity and receptor signaling.

    2. Reviewer #1 (Public review):

      The authors previously showed in cell culture that Su(H), the transcription factor mediating Notch pathway activity in Drosophila, was phosphorylated on S269 and they found that a phospho-deficient Su(H) allele behaves as a moderate gain of Notch activity in flies, notably during blood cell development. Since downregulation of Notch signaling is important for the production of specialized blood cell types (lamellocytes) in response to wasp parasitism, the authors hypothesized that Su(H) phosphorylation might be involved in this cellular immune response.

      Consistent with their hypothesis, the authors now show that Su(H)S269A knock-in flies display a reduced response to wasp parasitism and that Su(H) is phosphorylated upon infestation. Using in vitro kinase assays and a genetic screen, they identify the PKCa family member Pkc53E as the putative kinase involved in Su(H) phosphorylation and they show that Pkc53E can bind Su(H). They further show that Pkc53E deficit or its knock-down in larval blood cells results in similar blood cell phenotypes as Su(H)S269A and their epistatic analyses indicate that Pkc53E acts upstream of Su(H). Finally, they show that Pkc53E mutants also display a compromised immune response to wasp parasitism.

      Strengths

      The manuscript is well presented and the experiments are sound, with a good combination of genetic and biochemical approaches and several clear phenotypes backing the main conclusions. Notably Su(H)S269A mutation strongly reduces lamellocyte production. Moreover, the epistatic data are convincing, notably concerning the relationship between Notch/Su(H) and Pkc53E for crystal cell production.

      Even though it is not fully established, the overall model is credible and interesting. In addition, it opens further avenues of research to study the activation of Pkc in response to an immune challenge.

      Weaknesses

      Apparently, the hypothesis that Pkc53E is required for Su(H) phosphorylation in vivo could not be directly tested due to the lack of an appropriate tool (the specificity and sensitivity of the current anti-pS269 antibody was insufficient).

      Also, the poor immune response of Pkc53E mutant might rather be linked to their constitutively reduced circulating blood cell number than to a deficit in Notch/Su(H) down-regulation following wasp infestation.

    3. Reviewer #2 (Public review):

      The current draft by Deichsel et al. describes the role of Pkc53E in the phosphorylation of Su(H) to down-regulate its transcriptional activity and mount a successful immune response upon parasitic wasp infection. Overall, I find the study interesting and relevant; the identification of Pkc53E as the kinase phosphorylating Su(H) is especially nice. The central idea, linking Pkc53E-mediated phosphorylation of Su(H) to the modulation of Notch activity needed to mount a robust immune response, is now well addressed in its entirety and I find the paper very interesting indeed.

      Comments on revised version:

      The authors have addressed all pending concerns and I have no further comments. I compliment the authors on their wonderful piece of work.

    4. Reviewer #3 (Public review):

      Deichsel et al. provide important and valuable insights into how Notch signaling is shut down in response to parasitic wasp infestation in order to suppress crystal cell fate and favor lamellocyte production. The study shows that the CSL transcription factor Su(H) is phosphorylated at S269 in response to parasitic wasp infestation and that this inhibitory phosphorylation is critical for shutting down Notch. The authors go on to perform a screen for kinases responsible for this phosphorylation and have identified Pkc53E as the specific kinase acting on Su(H) at S269. Using analysis of mutants, RNAi and biochemistry-based approaches the authors convincingly show how Pkc53E-Su(H) interaction is critical for remodeling hematopoiesis upon wasp challenge. I find the study interesting, and the data presented support the overall conclusions made by the authors. The authors have addressed all my comments satisfactorily in the revised submission.

      Strengths:

      The manuscript is well presented, and the conclusions made are backed by genetic, biochemical and molecular biology-based approaches. Overall, the authors convincingly demonstrate how Pkc53E-mediated phosphorylation of Su(H) shuts down Notch signaling during wasp infestation in Drosophila.

      Weaknesses:

      The exact molecular trigger for activation of Pkc53E is still uncharacterized and it would be interesting to know how Pkc53E gets activated during wasp infestation and whether Pkc53E gets activated turning down Notch in other stress induced scenarios.

      The authors have addressed comments satisfactorily. Overall, I think the findings are interesting and would be useful to the field of developmental biology and immunology and address an important gap in the field. The most significant conclusion from the work is how Notch acts as a molecular switch during parasitic wasp infestation.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors previously showed in cell culture that Su(H), the transcription factor mediating Notch pathway activity, was phosphorylated on S269 and they found that a phospho-deficient Su(H) allele behaves as a moderate gain of Notch activity in flies, notably during blood cell development. Since a downregulation of Notch signaling was proposed to be important for the production of a specialized blood cell type (lamellocytes) in response to wasp parasitism, the authors hypothesized that Su(H) phosphorylation might be involved in this cellular immune response.

      Consistent with their hypothesis, the authors show that Su(H)S269A knock-in flies display a reduced response to wasp parasitism and that Su(H) is phosphorylated upon infestation. Using in vitro kinase assays and a genetic screen, they identify the PKCa family member Pkc53E as the putative kinase involved in Su(H) phosphorylation and they show that Pkc53E can bind Su(H). They further show that Pkc53E deficit or its knock-down in larval blood cells results in similar blood cell phenotypes as Su(H)S269A, including a reduced response to wasp parasitism, and their epistatic analyses indicate that Pkc53E acts upstream of Su(H).

      Strengths

      The manuscript is well presented and the experiments are sound, with a good combination of genetic and biochemical approaches and several clear phenotypes which back the main conclusions. Notably Su(H)S269A mutation or Pkc53E deficiency strongly reduces lamellocyte production and the epistatic data are convincing.

      Weaknesses

      The phenotypic analysis of larval blood cells remains rather superficial. Looking at melanized cells is a crude surrogate to quantify crystal cell numbers as it is biased toward sessile cells (with specific location) and does not bring information concerning the percentage of blood cells differentiated along this lineage.

      In Su(H)S269A knock-in or Pkc53E zygotic mutants, the increase in crystal cells in uninfected conditions and the decreased capacity to induce lamellocytes following infection could have many origins which are not investigated. For instance, premature blood cell differentiation could promote crystal cell differentiation and reduce the pool of lamellocyte progenitors. These mutations could also affect the development and function of the posterior signaling center in the lymph gland, which plays a key role in lamellocyte induction.

      Similarly, the mild decrease in resistance to wasp infestation (Fig. 2A) could reflect a constitutive reduction in blood cell numbers in Su(H)S269A larvae rather than a defective down-regulation of Notch activity.

      We fully agree with the reviewer that sessile crystal cell counts are a coarse approach to capture hemocytes. However, they allowed the screening of numerous genotypes in the course of our kinase candidate screen. We recorded the hemocyte numbers in the various genetic backgrounds and with regard to wasp infestation. There was no significant difference between Su(H)S269A and Su(H)gwt control, independent of infection. This is in agreement with earlier observations of unchanged plasmatocyte numbers in N or Su(H) mutants compared to the wild type (Duvic et al., 2002). We noted, however, a small drop in hemocyte numbers in Su(H)S269D and a strong one in Pkc53EΔ28 mutants in both conditions relative to control. Presumably, Pkc53E has a more general role in blood cell development, which we have not further analysed. The results were included in new Figure 1_S1 and Figure 9_S1 supplements. Based on the link between hemocyte numbers and wasp resistance (e.g. McGonigle et al., 2017), we cannot exclude that the lowered resistance of Pkc53EΔ28 mutants regarding wasp attacks is partly due to reduced hemocyte numbers, although we did not see significant differences among Su(H)S269A, Pkc53EΔ28 and the double mutant. We have included this notion in the text.

      Lamellocytes arise in response to external challenges like parasitoid wasp infestation by trans-differentiation from larval plasmatocytes, and by maturation of lamellocyte precursors in the lymph gland, yet barely in the Su(H)S269A and Pkc53ED28 mutants.

      We find it hard to envisage, however, that a premature differentiation of plasmatocytes into crystal cells in our case could deplete the pool of lamellocyte progenitors in the hemolymph (is there a precedent?). Crystal cells make up about 5% of the hemocyte pool; they are increased at most 2-fold in the Su(H)S269A and Pkc53E mutants. Even if these extra crystal cells (now ~10%) had arisen by premature differentiation, there should still be enough plasmatocytes (~80%) remaining with the potential to further divide and transdifferentiate into lamellocytes.

      Indeed, we cannot exclude an effect of the Su(H)S269A mutant on the development and function of the posterior signaling center (PSC) of the lymph gland. We noted, however, a slight but significant enlargement of the PSC in the Su(H)S269A mutant, which to our understanding cannot explain the reduced lamellocyte numbers.

      Whereas the authors also present targeted knock-down/inhibition of Pkc53E suggesting that this enzyme is required in blood cells to control crystal cell fate (Fig. 6), it is somewhat misleading to use lz-GAL4 as a driver in the lymph gland and hml-GAL4 in circulating hemocytes as these two drivers do not target the same blood cell populations/steps in the crystal cell development process.

      We fully agree with the reviewer that the two driver lines target different blood cell populations/steps in hematopoiesis. The hml-Gal4 driver is regarded as pan-hemocyte, common to both plasmatocytes and pre-crystal cells (e.g. Tattikota et al., 2020). It has been reported to drive specifically within differentiated hemocytes prior to or at the stage of crystal cell commitment (Mukherjee et al., 2011). Hence, hml-Gal4 appeared suitable to hit sessile and circulating hemocytes prior to final differentiation into crystal cells or lamellocytes, respectively.

      In the lymph gland, however, hml is expressed within the cortical zone, where it appears specific to the plasmatocyte lineage, and is not present in the crystal cell precursors (Blanco-Obregon et al., 2020). In contrast, lz-Gal4 is specific to the differentiating crystal cells in both lineages, i.e. in circulating and sessile hemocytes and in the lymph gland. Hence, we chose lz-Gal4 instead of hml-Gal4 at the risk of driving markedly later in the course of crystal cell differentiation. We included the reasoning in the text. Overall, we feel that this choice does not limit our conclusions.

      In addition, the authors do not present evidence that Pkc53E function (and Su(H) phosphorylation) is required specifically in blood cells to promote lamellocyte production in response to infestation.

      We have tried to address this interesting question by several means. Firstly, we show that Pkc53E is indeed expressed in the various cell types of larval hemocytes, shown in a new Figure 8 and Figure 8_S1 supplement. That is, Pkc53E has the potential to promote lamellocyte formation. Moreover, RNAi-mediated downregulation of Pkc53E within hemocytes affected crystal cell formation similar to the Pkc53EΔ28 mutant, in agreement with a specific requirement within blood cells (Figure 6). Finally, we show a major drop in Notch target gene transcription (NRE-GFP) in response to wasp infestation within isolated hemocytes from Su(H)gwt in contrast to Su(H)S269A larvae (see new Figure 1G). These data show that Su(H)-mediated Notch activity must be downregulated in hemocytes prior to lamellocyte formation, in agreement with our hypothesis.

      Finally, the conclusion that Pkc53E is (directly) responsible for Su(H) phosphorylation needs to be strengthened. Most importantly, the authors do not demonstrate that Pkc53E is required for Su(H) phosphorylation in vivo (i.e. that Su(H) is not phosphorylated in the absence of Pkc53E following infestation).

      We would very much like to show the respective results. Unfortunately, the low affinity of our pS269 antibody does not allow any in situ or in vivo experiments. We very much hope to obtain a more specific phosphoS269-Su(H) antibody that allows further in situ studies and lets us show, for example, co-localization with Pkc53E.

      In addition, the in vitro kinase assays with bacterially purified Pkc53E (in the presence of PMA or using an activated variant of Pkc53E) only reveal a weak activity on a Su(H) peptide encompassing S269 (Fig. 4).

      The reviewer correctly notes the poor activity of our purified Pkc53EEDDD kinase. This low activity also holds true for the standard peptide (PS), which in fact is even less well accepted than the Swt substrate. Indeed, the commercially available PKCα is an order of magnitude more active. Whether this reflects the poor quality of our isolated protein compared to the commercial PKCα, or whether it reflects a true biochemical property of Pkc53E remains to be shown in the future. We noted this observation in the manuscript.

      Moreover, while the authors show a coIP between an overexpressed Pkc53E and endogenous Su(H) (Fig. 7) (in the absence of infestation), it has recently been reported that Pkc53E is a cytoplasmic protein in the eye (Shieh et al. 2023), calling for a direct assessment of Pkc53E expression and localization in larval blood cells under normal conditions and upon infestation.

      Indeed, it is interesting that a Pkc53E-GFP fusion protein is cytoplasmic in the eye. The construct reported by Shieh et al. however, i.e. the B-isoform, is preferentially expressed in photoreceptors, where it regulates the de-polymerization of the actin cytoskeleton.

      Due to the eye-specific expression, we unfortunately cannot use the Pkc53E-B-GFP construct to test for Pkc53E’s distribution in other tissues.

      As this construct is of little use for studying hematopoiesis, we have instead used Pkc53E-GFP (BL59413) derived from a protein trap: again, GFP is primarily seen in the cytoplasm of hemocytes, including lamellocytes of infected larvae. However, in a small number of hemocytes, GFP appears to be also nuclear (Fig. 8A), leaving the possibility that activated Pkc53E may localize to the nucleus, eventually phosphorylating Su(H) and downregulating Notch activity. As Su(H) enters the nucleus piggy-back with NICD, however, phosphorylation may as well occur at the membrane or within the cytoplasm. We note, however, that these hypotheses require a much more detailed analysis.

      Furthermore, the effect of the PKCα agonist PMA on Su(H)-induced reporter gene expression in cell culture and crystal cell number in vivo is somewhat consistent with the authors' hypothesis, but some controls are missing (notably western blots to show that PMA/Staurosporine treatment does not affect Su(H)-VP16 level) and it is unclear why STAU treatment alone promotes Su(H)-VP16 activity (in their previous reports, the authors found no difference between Su(H)S269A-VP16 and Su(H)-VP16) or why PMA treatment still has a strong impact on crystal cell number in Su(H)S269A larvae.

      We have added a Western blot showing that the treatment does not affect Su(H)-VP16 expression levels (Figure 5_supplement 1). As STAU is a general kinase inhibitor, it may obviate any inhibitory phosphorylation of Su(H)-VP16 in the HeLa cells, e.g. that by Akt1, CAMK2D or S6K which pilot T271, phosphorylation of which is expected to affect the DNA-binding of Su(H) as well (Figure 3_supplement 2). Moreover, in the previous report, we used different constructs with regard to the promoter, and we used RBPJ instead of Su(H), which may explain some of the discrepancies. As PMA is not specific to just Pkc53E, the altered crystal cell numbers may result from the influence on other kinases involved in blood cell homeostasis, as predicted by our genetic screen (Figure 3_supplement 1).

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should provide a more elaborate examination of larval blood cell types and blood cell counts under normal conditions and following infestation in the different zygotic mutants as well as upon Pkc53 knock-down. A thorough examination of PSC integrity should be performed and the maintenance of core blood cell progenitors examined. The authors should also clarify when after infestation the LG and larval bleeds are analyzed.

      - a more elaborate examination of larval blood cell types:

      - examination of larval blood cell counts under normal conditions: hemocyte # in gwt, SA, SD, & Pkc

      - examination of larval blood cell counts after infestation: hemocyte # in gwt, SA, SD, & Pkc

      - thorough examination of PSC integrity: in gwt, SA, SD, & Pkc

      - thorough examination of blood cell progenitors: in gwt, SA, SD, & Pkc

      - clarify timing

      Hemocyte numbers of the various genotypes and conditions were recorded and are presented in Figure 1_S1 and Figure 9_S1. Timing was elaborated in the text and the Methods section.

      (2) The authors should clarify why they use lz-GAL4 or hml-GAL4 and what we can infer from using these different drivers.

      See above. The reasoning was included in the text.

      (3) The percentage of hatching of Su(H)S269A and Su(H)gwt flies in the absence of infestation should also be scored; a small decrease in Su(H)S269A viability might explain the observed differences in survival to wasp infestation. Absolute blood cell numbers (in the absence of infestation) have also been correlated with survival to infection and should be checked.

      Percentage of the emerging flies and hemocyte numbers in the absence of infestation were recorded and included in Figure 2, Figure 1_S1, Figure 9_S1.

      (4) Whereas the impact of Su(H)S269A or Pkc53E mutation on lamellocytes production is clear, there is still a substantial reduction in crystal cell production following infestation. So I wouldn't conclude that the Su(H) larvae are "unable" to detect this immune challenge or respond to it (line 116).

      Thank you for the hint, we corrected the text.

      (5) The expression and localization of Pkc53E in larval blood cells should be investigated, for instance using the Pkc53E-GFP line recently published by Shieh et al. (or at least at the RNA level).

      Firstly, we confirmed expression of Pkc53E in hemocytes by RT-PCR (Figure 8_S1 supplement). Secondly, expression of Pkc53E-GFP was monitored in hemocytes (Figure 8). To this end, we used the protein trap (BL59413), since the one published by Shieh et al., 2023 is restricted to photoreceptors.

      (6) It would be interesting to test the anti-pS269 antibody in immunostaining (using Su(H)S269A as negative control).

      Unfortunately, the pS269 antiserum does not work in situ at all.

      (7) The authors must perform a western blot with anti-pS269 in Pkc53E mutant to show that Su(H) is not phosphorylated anymore after wasp infestation.

      The blot gives a negative result.

      (8) It is surprising that no signal is seen in the absence of infestation with anti-pS269: the fact that Su(H)S269A have more crystal cells suggest that there is a constitutive level of phosphorylation of Su(H).

      We fully agree: In the ideal world, we would expect a low level of S269 phosphorylation in the wild type as well. However, given the lousy specificity of our antibody, we were happy to see phospho-Su(H) in infected larvae. We are currently working hard to get a better antibody. 

      (9) The authors should check Su(H)-VP16 levels and phosphorylation status after PMA and/or staurosporine treatment. Some clarifications are also needed to explain the impact of PMA in Su(H)S269A larvae (this clearly suggests that PKC has other substrates implicated in crystal cell development).

      Su(H)-VP16 expression levels were monitored by Western blot and were not altered conspicuously (Figure 5_1 supplement). Presumably, Pkc53E is not the only kinase involved in Su(H) phosphorylation or the transduction of stress signals. Moreover, PMA may have a more general effect on larval development and hematopoiesis affecting both genotypes. We included this reasoning in the text.

      (10) Concerning the writing, the authors forgot to mention and discuss the work of Cattenoz et al. (EMBO J 2020). The presentation of the screen for kinase candidates could be streamlined and better illustrated (notably supplement table 4, which would be easier to grasp as a figure/graph). The discussion could be shortened (notably the part on T cells), and I don't really understand lines 374-376 (why is it consistent?).

      We are sorry for omitting Cattenoz et al. 2020, which we have now included. We fully agree that this paper is of utmost importance to our work. We streamlined the screen and included a new figure in addition to table 4 summarizing the results graphically (Figure 3_S1 supplement). We cut on the T cell part and omitted the strange lines.

      Reviewer #2 (Public Review):

      Summary:

      The current draft by Deichsel et al., entitled "Inhibition of Notch activity by phosphorylation of CSL in response to parasitization in Drosophila", describes the role of Pkc53E in the phosphorylation of Su(H) to downregulate its transcriptional activity to mount a successful immune response upon parasitic wasp infection. Overall, I find the study interesting and relevant; especially the identification of Pkc53E in phosphorylation of Su(H) is very nice. However, I have a number of concerns with the manuscript which are central to the idea that links the phosphorylation of Su(H) via Pkc53E to its modulation of Notch activity. I list them one by one below.

      Strengths:

      I find the study interesting and relevant especially because of the following:

      (1) The identification of Pkc53E in phosphorylation of Su(H) is very interesting.

      (2) The role of this interaction in modulating Notch signaling and thereafter its requirement in mounting a strong immune response to wasp infection is also another strong highlight of this study.

      Weaknesses:

      (1) Epistatic interaction with Notch is needed: In the entire draft, the authors claim that the role of Pkc53E in the phosphorylation of Su(H) is downstream of Notch activity. Given the paper title also invokes Notch, I would suggest the authors show this in a direct epistatic interaction using a Notch condition. If loss of Notch function makes many more lamellocytes and GOF makes fewer, then would modulating Pkc53E (and Su(H)) in this background manifest any change? In homeostasis as well, given that gain of Notch function leads to increased crystal cells, the same genetic combinations in homeostasis will be nice to see.

      While I understand that Su(H) functions downstream of Notch, it is now increasingly evident that Su(H) also functions independently of Notch. An epistatic relationship between Notch and Pkc will clarify if this phosphorylation event of Su(H) via Pkc is part of the canonical interaction being proposed in the manuscript and not a non-canonical/Notch pathway independent role of Su(H).

      This is important, as I worry that in the current state, while the data are all discussed in light of Notch activity, any direct data to show this affirmatively is missing. In our hands we do find Notch independent Su(H) function in immune cells, hence this is a suggestion that stems from our own personal experience.

      The role of Notch in Drosophila hematopoiesis, notably during crystal cell development in both hematopoietic compartments, is well established; likewise the role of Su(H) as integral signal transducer in this context (e.g. Duvic et al., 2002). Not only does Notch activity promote crystal cell fate by upregulating target genes, at the same time it prevents adoption of the alternative plasmatocyte fate (e.g. Terriente-Felix et al., 2013). We could confirm the downregulation of Notch target gene expression in response to wasp infestation by qRT-PCR, which was discovered earlier by Small et al. (2014). This is clearly in favor of a repression of Notch activity rather than a relief of inhibition by Su(H). A ligand-independent activation of Notch signaling has been uncovered in the context of crystal cell maintenance in the lymph gland involving Sima/Hif-α, including Su(H) as transcriptional mediator (Mukherjee et al., 2011). However, we are unaware of a respective Su(H) activity independent of Notch.

      Certainly, Su(H) acts independently of Notch in terms of gene repression. Here, Su(H) forms a repressor complex together with H and co-repressors Groucho and CtBP to silence Notch target genes. Accordingly, loss of Su(H) or H may induce the upregulation of respective gene expression independent of Notch activity. This has been demonstrated, for example, during wing and heart development (Klein et al., 2000; Kölzer, Klein, 2006; Panta et al., 2020). Moreover, during axis formation of the early embryo, global repression is brought about by Su(H) and relieved by activated Notch (Koromila, Stathopoulos, 2019). In all these instances, Su(H) is thought to act as a molecular switch, and the activation of Notch causes a strong expression of the respective genes. Likewise, the loss of DNA-binding resulting from the phosphorylation of Su(H) allows the upregulation of repressed Notch target genes in wing imaginal discs, e.g. dpn, as we have demonstrated before with overexpression and clonal analyses (Nagel et al. 2017; Frankenreiter et al., 2021). However, H does not contribute to crystal cell homeostasis, i.e. de-repression of Notch target genes does not appear to be a major driver in this context, calling for additional mechanisms to downregulate Notch activity. Our work provides evidence that these inhibitory mechanisms involve the phosphorylation of Su(H) by Pkc53E. Formally, we cannot exclude alternative mechanisms. Hence, we have tried to avoid the direct link between Su(H) phosphorylation and the inhibition of Notch activity throughout the text, including the title. Moreover, we have discussed the possible consequences of Su(H) lack of DNA binding, interfering either with the activation of Notch target genes or abrogating their repression.

      In addition, we have performed new experiments addressing the epistasis between Notch and Su(H) during crystal cell formation (Figure 1_supplement 1). To this end, we knocked down Notch activity in hemocytes by RNAi (hml::N-RNAi) in the Su(H)gwt and Su(H)S269A background, respectively. Indeed, Notch downregulation strongly impairs crystal cell development independent of the genetic background, as expected if Notch were epistatic to Su(H). We attribute the slightly elevated crystal cell numbers observed in the Su(H)S269A background to the increase in the embryonic precursors (see Fig. 4; Frankenreiter et al. 2021). Of note, the Notch gain of function allele Ncos479 also displayed a similar increase in embryonic crystal cell precursors as well as in crystal cells within the lymph gland (Frankenreiter et al. 2021).

      (2) Temporal regulation of Notch activity in response to wasp-infection and its overlapping dynamics of Su(H) phosphorylation via Pkc is needed:

      First, I suggest the authors to show how Notch activity post infection in a time course dependent manner is altered. A RT-PCR profile of Notch target genes in hemocytes from infected animals at 6, 12, 24, 48 HPI, to gauge an understanding of dynamics in Notch activity will set the tone for when and how it is being modulated. In parallel, this response in phospho mutant of Su(H) will be good to see and will support the requirement for phosphorylation of Su(H) to manifest a strong immune response.

      Indeed, it would be extremely nice to follow the entire process in every detail, ideally at the cellular level. The challenge, however, is quantities. The mRNA isolated from hemocytes could barely be quantified, although the subsequent Ct values were acceptable. We quantified NRE-GFP expression, introduced into Su(H)gwt and Su(H)S269A, as well as atilla expression. We were able to generate data for two time slots, 0-6 h and 24-30 h post infection. The data are provided in the extended Figure 1G, and show a strong drop of NRE-GFP in the infected Su(H)gwt control compared to the uninfected animals, whereas expression in Su(H)S269A plateaus at around 60%-70% of the infected Su(H)gwt control. Atilla expression jumps up in the control, but stays low in Su(H)S269A hemocytes.

      Second, the dynamics of phosphorylation in a time course experiment are missing. While the increased phosphorylation of Su(H) in response to wasp infestation shown in Fig. 2B uses whole animals, this implies a global down-regulation of Su(H)/Notch activity. The authors need to show this response specifically in immune cells. The reader is left to the assumption that this is also true in immune cells. Given the authors have a good antibody, characterizing the same in circulating immune cells in response to infection will be needed. A time course of the phosphorylation state at 6, 12, 24, 48 HPI, to gauge an understanding of these dynamics, is needed.

      We really would love to do these experiments. Unfortunately, our pS269 antibody is rather lousy. It does not allow detection of Su(H) protein in tissue or cells, nor does it work on protein extracts in Westerns or for IP. Hence, we have no way so far to demonstrate cell or tissue specificity of Su(H) phosphorylation. So far, we were lucky to detect mCherry-tagged Su(H) proteins pulled down in rather large amounts with the highly specific nanobodies. We have tried very hard to repeat the experiment with hemolymph and lymph glands only, but we have failed so far. Hence, we have to state that our antibody is suitable neither for in vivo analyses nor for the detection of phospho-Su(H) at lower levels.

      The authors suggest, this mechanism may be a quick way to down-regulate Notch, hence a side by side comparison of the dynamics of Notch down-regulation (such as by doing RT-PCR of Notch target genes following different time point post infection) alongside the levels of pS269 will strengthen the central point being proposed.

      We fully agree and hope to address these issues in the future by improving our tools.

      Last, in Fig. 7 the authors show co-immunoprecipitation of Pkc53EHA with Su(H)gwt-mCh protein from Hml-gal4 hemocytes. I understand this is in homeostasis, but since this interaction is proposed to be sensitive to infection, a Co-IP of the two in immune cells upon infection should be incorporated to strengthen their point.

      We do not fully agree with the reviewer. Although we also think that the interaction between Pkc53E and Su(H) might occur more frequently upon infection, we propose that this is a transient process occurring in several but not all hemocytes at a given time. Moreover, in the described experiment, Pkc53E-HA was expressed in hemocytes via the UAS/Gal4 system. We cannot exclude that this approach causes an overexpression. Hence, we would not expect considerable differences between unchallenged and infested animals.

      (3) In Fig. 5B, the authors show the change in crystal cell numbers as a read out of PMA-induced activation of Pkc53E and subsequent inhibition of Su(H) transcriptional activity. I would suggest the authors use more direct measures of this read out. RT-PCR of Su(H) target genes in circulating immune cells will strengthen this point. Formation of crystal cells is not limited to Notch, and I am not convinced that this treatment or these conditions have no other effects on immune cells; for instance, any impact on Hif expression may also lead to a lowering of CC numbers. Hence, the authors need to strengthen this point by showing that the effects are direct to Notch and Su(H) and not non-specific to any other pathway also shown to be important for CC development.

      We agree with the Reviewer that the rather general influence of PMA on PKCs might present a systemic stress to the animal. For example, we observed a slight drop of crystal cell numbers also in Su(H)S269A, suggesting other kinases apart from Pkc53E were affected that are involved in crystal cell homeostasis. We have included this notion in the text. To provide more conclusive evidence we also fed Staurosporine to the larvae which reversed the PMA effect. In addition, we assayed the expression of NRE-GFP in hemocytes of infected animals by qRT-PCR, and observed a strong drop in the infected versus uninfected control but less so in Su(H)S269A. The new data are provided in extended Figures 1G and 5B.

      (4) In addition to the above mentioned points, the data needs to be strengthened to further support the main conclusions of the manuscript. I would suggest the authors present the infection response with details on the timing of the immune response. Characterization of the immune responses at respective time points (as above or at least 24 and 48 HPI, as norms in the field) will be important. Also, any change in overall cell numbers, other immune cells, plasmatocytes or CC post infection is missing and is needed to present the specificity of the impact. The addition of these will present the data with more rigor in their analysis.

      Total hemocyte numbers of the various genotypes, i.e. control, Su(H)S269A, Su(H)S269D, and Pkc53ED28 were included before and after wasp infestation in supplemental Figures 1_S1 and 9_S1. 

      (5) Finally, what is the view of the authors on what leads to activation of Pkc53E, any upstream input is not presented. It will be good to see if wasp infection leads to increased Pkc53 kinase activity.

      The analysis of the full process is an ongoing project. We propose that ROS is produced upon the wasp's sting, triggering the subsequent cascade of events. These have to end with activation of Pkc53E in the presumptive pre-lamellocyte pool of both lineages, i.e. in plasmatocytes of the hemolymph, presumably in the sessile compartment (Tattikota et al., 2021), and at the same time in the lymph gland cortex harboring the LM precursors (Blanco-Obregon et al., 2020). One of the known upstream kinases, Pdk1, has a similar impact on crystal cell development as Pkc53E, making its involvement likely. Moreover, we think that other PKCs influence the process as well.

      Without a good read out, e.g. a functional pSu(H) antiserum working in situ or a Pkc-activity reporter, it will be quite difficult to follow up this question. However, we already know that Pkc53E is expressed in hemocytes of all types independent of wasp infestation, in agreement with a role during lamellocyte differentiation. We hope to unravel the process in more detail in the future.

      Overall, I think the findings in the current state are interesting and fill an important gap, but the authors will need to strengthen the point with more detailed analysis that includes generating new data and also presenting the current data with more rigor in their approach. The data have to showcase the relationship with Notch pathway modulation upon phosphorylation of CSL in a much more comprehensive way, both in homeostasis and in response to infection which is entirely missing in the current draft.

      Reviewer #3 (Public Review):

      Deichsel et al. provide important and valuable insights into how Notch signalling is shut down in response to parasitic wasp infestation in order to suppress crystal cell fate and favour lamellocyte production. The study shows that the CSL transcription factor Su(H) is phosphorylated at S269 in response to parasitic wasp infestation and that this inhibitory phosphorylation is critical for shutting down Notch. The authors go on to perform a screen for kinases responsible for this phosphorylation and have identified Pkc53E as the specific kinase acting on Su(H) at S269. Using analysis of mutants, RNAi and biochemistry-based approaches the authors convincingly show how the Pkc53E-Su(H) interaction is critical for remodelling hematopoiesis upon wasp challenge. The data presented support the overall conclusions made by the authors. There are a few points below that need to be addressed by the authors to strengthen the conclusions:

      (1) The authors should check melanized crystal cells in Su(H)gwt and Su(H)S269A in the presence of PMA and Staurosporine.

      Thank you for the suggestion. We included the results of PMA + Staurosporine feeding into an extended Fig. 5B; they match those from the HeLa cells. Unfortunately, Staurosporine alone was lethal for the larvae at various concentrations, presumably owing to the overarching inhibition of kinase activity. This global effect also explains the high crystal cell numbers in the control fed with PMA + STAU compared to the untreated animals, as the downregulation of many kinases results in higher crystal cell numbers, a fact uncovered in our genetic screen.

      (2) Data for number of dead pupae, flies eclosed, wasps emerged post infestation should be monitored for the following genotypes and should be included:

      Pkc53EΔ28, Su(H)S269A, Pkc53EΔ28 Su(H)S269A, Su(H)S269D, Su(H)S269D Pkc53EΔ28

      We extended the data with and without infection. The respective data are shown in a new Fig. 9 and an extended Fig. 2, except for the Su(H)S269D allele. Su(H)S269D is larval lethal, i.e. dies too early for wasp development, and hence could not be included in the assay. Overall, Pkc53EΔ28 matched Su(H)S269A.

      (3) The exact molecular trigger for activation of Pkc53E upon wasp infestation is not clear.

      Indeed, and we would love to know! Perhaps, the generation of Ca2+ by the wasp’s breach of the larval cuticle results in Pkc53E activation. The generation of ROS could be involved as well. At this point, we can only speculate. We hope to be able in the future to obtain direct experimental evidence for one or the other hypothesis.

      (4) The authors should check if activating ROS alone or induction of Calcium pulses/DUOX activation can mimic this condition and can trigger activation of Pkc53E and thereby cause phosphorylation of Su(H) at S269

      The reviewer’s suggestions open up a new field of investigation and are hence beyond the scope of this article. However, we want to pursue the research in this direction, although we realize that counting crystal cells is too coarse to give more than a first impression, and that lamellocytes may already form upon breaching the larval cuticle. A major challenge will be the direct measurement of Pkc53E activation. To date, we have no tools for this, but ideally we would like to have a direct biochemical readout. Although we have been unsuccessful in the past, we want to develop a strong and specific phospho-S269 antibody that also works in situ. Alternatively, we are considering developing a PS-phosphorylation reporter to allow these questions to be addressed properly.

      (5) Does Pkc53E get activated during sterile inflammation?

      We are in the process of addressing this issue, but we feel that this topic is beyond the scope of this paper. Our preliminary experiments, however, support the notion of a phospho-dependent regulation of Su(H) also in this context.

      Reviewer #3 (Recommendations For The Authors):

      The authors provide a graphical representation of major phenotypes that form the basis of their investigation and conclusions but have not supplemented the quantitation with images that represent these phenotypes. The authors need to include the following data to strengthen their conclusions:

      (1) The authors should include representative images for each of the genotypes/conditions (in presence and absence of wasp infestation) based on which corresponding plots have been made in Figure 1. Please include this for both circulating lamellocytes in the hemolymph and in the lymph glands since this is one of the main figures presenting the key findings.

      The data have been included in Figure 1-S2 supplement.

      (2) Please include representative images of LG with Hnt staining and corresponding images for melanization for each of the genotypes used in the plots in Figure 6A and B.

      The data have been included in Figure 6-S2 supplement.

      (3) Representative images for each of the genotypes in Figure 7A & B should be included (circulating crystal cells and lymph gland crystal cell numbers).

      Representative images for each of the genotypes for Fig. 7A have been included in Figure 7-S1 and for the old Fig. 7B in Figure 9-S2 supplement, respectively.

    1. eLife Assessment

      This important study provides a new perspective on how human immunity shapes the antigenic evolution of pathogens. By combining theory and simulation the authors make a compelling case for the importance of eco-evolutionary interactions in population-level virus-host dynamics, which arise due to coupling between the dynamics of immune memories and viral variants. Although the work does not propose improved data-driven viral forecasting methods, it makes a conceptual contribution that advances the field's understanding of this problem's intrinsic difficulty.

    2. Reviewer #1 (Public review):

      In this work, the authors study the dynamics of fast-adapting pathogens under immune pressure in a host population with prior immunity. In an immunologically diverse population, an antigenically escaping variant can perform a partial sweep, as opposed to a sweep in a homogeneous population. In a certain parameter regime, the frequency dynamics can be mapped onto a random walk with zero mean, which is reminiscent of neutral dynamics, albeit with differences in higher order moments. Next, they develop a simplified effective model of time dependent selection with expiring fitness advantage, and posit that the resulting partial sweep dynamics could explain the behaviour of influenza trajectories empirically found in earlier work (Barrat-Charlaix et al. Molecular Biology and Evolution, 2021). Finally, the authors put forward an interesting hypothesis: the mode of evolution is connected to the age of a lineage since ingression into the human population. A mode of meandering frequency trajectories and delayed fixation has indeed been observed in one of the long-established subtypes of human influenza, albeit so far only over a limited period from 2013 to 2020. The paper is overall interesting and well-written.

      In the revised version, the authors have addressed questions on the role of clonal interference by new simulations in the SI, clarified the connection between the SIR model and vanishing-fitness models, and placed their analysis into the broader context of consumer resource dynamics.

      However, the general conclusion, as stated in the abstract, that variant trajectories become unpredictable as a consequence of the SIR dynamics remains somewhat misleading. Two aspects contribute to this problem. (1) The empirical observation of “quasi-neutrality”, i.e. the absence of a net frequency increase inferred as an average of many trajectories at intermediate frequencies, does not imply that individual trajectories are neutral (i.e., fully stochastic and unpredictable) over the time span of observation. Rather, it just says that some have a positive and some have a negative selection coefficient over that time span. (2) As stated by the authors, the observation of average quasi-neutrality is indeed incompatible with the travelling wave model, where initially successful new variants are assumed to retain a fixed, positive selection coefficient from origination to fixation. This observation also limits predictions by extrapolation, where a positive selection coefficient inferred at small frequency is assumed to remain the same at later times and higher frequencies. However, predictions derived from Gog and Grenfell's multi-strain SIR model, as used by several authors, do not make the assumption of fixed selection coefficients and incorporate trajectory-specific, time-dependent expiration effects into their model predictions. This distinction remains blurred throughout the text of the paper.

    3. Reviewer #3 (Public review):

      In this work the authors present a multi-strain SIR model in which viruses circulate in a heterogeneous population with different groups characterized by different cross-immunity structures. They reformulate the qualitative features of these SIR dynamics as a random walk characterized by new variants saturating at intermediate frequencies. Then they recast their microscopic description to an effective formalism in which viral strains lose fitness independently from one another. They study several features of this process numerically and analytically, such as the average variants frequency, the probability of fixation, and the coalescent time. They compare qualitatively the dynamics of this model to variants dynamics in RNA viruses such as flu and SARS-CoV-2.

      The idea that vanishing fitness mechanisms that produce partial sweeps may explain important features of flu evolution is very interesting. Its simplicity and potential generality make it a powerful framework. As noted by the authors, this may have important implications for predictability of virus evolution and such a framework may be beneficial when trying to build predictive models for vaccine design. The vanishing fitness model is well analyzed and produces interesting structures in the strains coalescent. Even though the comparison with data is largely qualitative, this formalism would be helpful when developing more accurate microscopic ingredients that could reproduce viral dynamics quantitatively. This general framework has the potential to be more universal than human RNA viruses, in situations where invading mutants would saturate at intermediate frequencies.

      The qualitative connection between the coarse-grained features of these vanishing fitness dynamics and structured SIR processes offers additional intuition relevant to host-pathogens interactions, although as noted by the authors other ecological processes could drive similar evolutionary patterns. The additions in the revised manuscript, substantiating more thoroughly the connection between the SIR and the vanishing fitness description, are important to better appreciate the scope of the work.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Response to reviewers

      We thank the Editor and the Reviewers for their constructive review. In light of this feedback, we have made a number of changes and additions to the manuscript that we think improve the presentation and hopefully address the majority of the reviewers' concerns.

      Main changes:

      •   We added a new SI section (B1) with a population dynamics simulation in the high clonal interference regime and without expiring fitness (see R1: (1)).

      •   We added a new SI section (A9) with the derivation of the equilibrium state of our SIR model in the case of 𝑀 immune groups and in the limit 𝜀 → 0 (see R1: (5)).

      •   The text of the section Abstraction as “expiring” fitness advantage has been modified.

      •   We added a new SI section (A4) describing the links between parameters of the “expiring fitness” and SIR models.

      All three reviewers had concerns about the relation between our SIR model and the “expiring fitness” model, that we hope will be addressed by the last two items listed above. In particular, we would like to underline the following points:

      •   The goal of our SIR model is to give a mechanistic explanation of partial sweeps using traditional epidemiological models. While ecological models (e.g. consumer resource) can give rise to the same phenomenology, we believe that in the context of host-pathogen interaction it is relevant to explicitly show that SIR models can result in partial sweeps.

      •   The expiring fitness model is mainly an effective model: it reproduces some qualitative features of the SIR but does not quantitatively match all aspects of the frequency dynamics in SIR models.

      •   It is possible to link the parameters of the SIR (𝛼,𝛾,𝑏,𝑓) and expiring fitness (𝑠,𝑥,𝜈) models at the beginning of the invasion of the variant (new SI section A4). However, the two models also differ in significant ways (the SIR model can for example oscillate, while the effective model can not). The correspondence of quantities like the initial invasion rate and the ‘expiration rate’ of fitness effects is thus only expected to hold for some time after the emergence of a novel variant.

      Public reviews:

      Reviewer 1:

      Summary In this work, the authors study the dynamics of fast-adapting pathogens under immune pressure in a host population with prior immunity. In an immunologically diverse population, an antigenically escaping variant can perform a partial sweep, as opposed to a sweep in a homogeneous population. In a certain parameter regime, the frequency dynamics can be mapped onto a random walk with zero mean, which is reminiscent of neutral dynamics, albeit with differences in higher order moments. Next, they develop a simplified effective model of time dependent selection with expiring fitness advantage, and posit that the resulting partial sweep dynamics could explain the behaviour of influenza trajectories empirically found in earlier work (Barrat-Charlaix et al. Molecular Biology and Evolution, 2021). Finally, the authors put forward an interesting hypothesis: the mode of evolution is connected to the age of a lineage since ingression into the human population. A mode of meandering frequency trajectories and delayed fixation has indeed been observed in one of the long-established subtypes of human influenza, albeit so far only over a limited period from 2013 to 2020. The paper is overall interesting and well-written. Some aspects, detailed below, are not yet fully convincing and should be treated in a substantial revision.

      We thank the reviewer for their constructive criticism. The deep split in the A/H3N2 HA segment from 2013 to 2020 is indeed one of the more striking examples of such meandering frequency dynamics in otherwise rapidly adapting populations. But the up and down of H1N1pdm clade 5a.2a.1 in recent years might be a more recent example. We argue that such meandering dynamics might be a common contributor to seasonal influenza dynamics, even if it only spans 3-6 years.

      (1) The quasi-neutral behaviour of amino acid changes above a certain frequency (reported in Fig. 3), which is the main overlap between influenza data and the authors’ model, is not a specific property of that model. Rather, it is a generic property of travelling wave models and more broadly, of evolution under clonal interference (Rice et al. Genetics 2015, Schiffels et al. Genetics 2011). The authors should discuss in more detail the relation to this broader class of models with emergent neutrality. Moreover, the authors’ simulations of the model dynamics are performed up to the onset of clonal interference 𝜌/𝑠0 = 1 (see Fig. 4). Additional simulations more deeply in the regime of clonal interference (e.g. 𝜌/𝑠0 = 5) show more clearly the behaviour in this regime.

      We agree with the reviewer that we did not discuss in detail the effects of clonal interference on quasi-neutrality and predictability. As suggested, we conducted additional simulations of our population model in the regime of high clonal interference (𝜌/𝑠0 ≫ 1) and without expiring fitness effects. The results are shown in a new section of the supplementary information. These simulations show, as expected, that increasing clonal interference tends to decrease predictability: the fixation probability of an adaptive mutation found at frequency 𝑥 moves closer to 𝑥 as 𝜌 increases. However, even in a case of strong interference 𝜌/𝑠0 = 32, 𝑝fix remains significantly different from the neutral expectation. We conclude from this that while it is true that dynamics tend to quasi-neutrality in the case of strong interference, this effect alone is unlikely to explain observations of H3N2 influenza dynamics. In our previous publication (Barrat-Charlaix et al, MBE, 2021) we have also investigated the effect of epistatic interactions between mutations, alongside strong clonal interference. We concluded that, while most of these processes make evolution less predictable and push 𝑝fix towards the diagonal, it is hard to reproduce the empirical observations with realistic parameters. The “expiring fitness” model, however, produces this quite readily.
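
      The neutral benchmark mentioned here, namely that a variant observed at frequency 𝑥 eventually fixes with probability 𝑥, can be checked with a minimal neutral Wright-Fisher sketch. The population size, starting frequency, number of replicates and the function name are illustrative assumptions and are not taken from the manuscript or its code.

```python
import random


def neutral_fixation_probability(pop_size, x0, replicates, seed=0):
    """Estimate the fixation probability of a neutral allele starting at frequency x0."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        count = round(x0 * pop_size)
        # Resample the whole population each generation until loss or fixation.
        while 0 < count < pop_size:
            freq = count / pop_size
            count = sum(rng.random() < freq for _ in range(pop_size))
        fixed += count == pop_size
    return fixed / replicates


# Hypothetical parameters, chosen only so that the sketch runs quickly.
p_fix = neutral_fixation_probability(pop_size=30, x0=0.3, replicates=1000)
print(f"x0 = 0.30, estimated p_fix = {p_fix:.3f}")
```

      In a plot of 𝑝fix against 𝑥, neutral dynamics fall on the diagonal, so deviations from the diagonal are what signal predictability.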

      But there are qualitative differences between quasi-neutrality in traveling wave models and the expiring fitness model. In the traveling wave, a genotype carrying an adaptive mutation is always fitter than if it didn’t carry the mutation. Quasi-neutrality emerges from the accumulation of fitness variation at other loci and the fact that the coalescence time is not much bigger than the inverse selection coefficient of the mutation. In the expiring fitness model, the selective effect of the mutation itself goes away with time. We now discuss the literature on quasi-neutrality and cite Rice et al. 2015 and Schiffels et al. 2011.

      In this context, I also note that the modelling results of this paper, in particular the stalling of frequency increase and the decrease in the number of fixations, are very similar to established results obtained from similar dynamical assumptions in the broader context of consumer resource models; see, e.g., Good et al. PNAS 2018. The authors should place their model in this broader context.

      We thank the reviewer for pointing out the link between consumer resource models and our work. We further strengthened our discussion of the similarity of the phenomenology to models typically used in ecology and made an effort to highlight the link between consumer-resource models and ours in the introduction and in the part on the SIR model.

      (2) The main conceptual problem of this paper is the inference of generic non-predictability from the quasi-neutral behaviour of influenza changes. There is no question that new mutations limit the range of predictions, this problem being most important in lineages with diverse immune groups such as influenza A(H3N2). However, inferring generic non-predictability from quasi-neutrality is logically problematic because predictability refers to individual trajectories, while quasi-neutrality is a property obtained by averaging over many trajectories (Fig. 3). Given an SIR dynamical model for trajectories, as employed here and elsewhere in the literature, the up and down of individual trajectories may be predictable for a while even though allele frequencies do not increase on average. The authors should discuss this point more carefully.

      We agree with the reviewer that the deterministic SIR model is of course predictable. Similarly, a partial sweep is predictable. But we argue that expiring fitness makes evolution less predictable in two ways: (i) When a new adaptive mutation emerges and rises in frequency, we typically don’t know how rapidly its fitness effect is ‘expiring’. Thus even if we can measure its instantaneous growth rate accurately, we can’t predict its fate far into the future. (ii) Compared to the situation where fitness effects are not expiring, time to fixation is longer and there are more opportunities for novel mutations to emerge and change the course of the trajectory. We have tried to make this point clearer in the manuscript.
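
      Point (i) can be illustrated with a deterministic sketch. It assumes, purely for illustration, that the selection coefficient decays exponentially, s(t) = s0·exp(−νt), and that the variant frequency follows dx/dt = s(t)·x·(1−x); the expiring fitness model of the manuscript may differ in detail. Under these assumptions logit x(t) = logit x0 + (s0/ν)·(1 − exp(−νt)), so the frequency saturates at an intermediate value set by s0/ν. All numerical values below are hypothetical.

```python
import math


def expiring_sweep_frequency(x0, s0, nu, t):
    """Frequency at time t under dx/dt = s0 * exp(-nu * t) * x * (1 - x)."""
    logit0 = math.log(x0 / (1.0 - x0))
    # Integrated selection up to time t; it is bounded above by s0 / nu.
    integrated = s0 * (1.0 - math.exp(-nu * t)) / nu
    return 1.0 / (1.0 + math.exp(-(logit0 + integrated)))


x0, s0 = 0.01, 0.05  # hypothetical initial frequency and initial growth rate

# Same initial growth rate, two different (unknown to the observer) expiry rates.
early_fast = expiring_sweep_frequency(x0, s0, nu=0.02, t=10)
early_slow = expiring_sweep_frequency(x0, s0, nu=0.01, t=10)
final_fast = expiring_sweep_frequency(x0, s0, nu=0.02, t=2000)
final_slow = expiring_sweep_frequency(x0, s0, nu=0.01, t=2000)

print(f"t = 10:   {early_fast:.4f} vs {early_slow:.4f}")
print(f"t = 2000: {final_fast:.3f} vs {final_slow:.3f}")
```

      The two trajectories are nearly indistinguishable early on but settle at very different frequencies, which is the sense in which an accurately measured initial growth rate does not determine the long-term fate.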

      (3) To analyze predictability and population dynamics (section 5), the authors use a Wright-Fisher model with expiring fitness dynamics. While here the two sources of the emerging neutrality are easily tuneable (expiring fitness and clonal interference), the connection of this model to the SIR model needs to be substantiated: what is the starting selection 𝑠0 as a function of the SIR parameters (𝑓,𝑏,𝑀,𝜀), the selection decay 𝜈 = 𝜈(𝑓,𝑏,𝑀,𝜀,𝛾)? This would enable the comparison of the partial sweep timing in both models and corroborate the mapping of the SIR onto the simplified W-F model. In addition, the authors’ point would be strengthened if the SIR partial sweeps in Fig.1 and Fig.2 were obtained for a combination of parameters that results in a realistic timescale of partial sweeps.

      We added a new section to the SI (A4) that relates the parameters of the SIR and expiring fitness models. In particular, we compute the initial growth rate 𝑠0 and a proxy for the fitness expiry rate 𝜈 as a function of the SIR parameters 𝛼,𝛾,𝑓,𝑏,𝑀, at the instant where the variant is introduced. The initial growth rate depends primarily on the degree of immune escape 𝑓, while the expiration rate 𝜈 is related to incidence 𝐼wt + 𝐼𝑚. However, as both models have fundamentally different dynamics, these relations are only valid on time scales shorter than potential oscillations of the SIR model. Beyond that, the connection between the models is mostly qualitative: both rely on the fact that growth rate of a strain diminishes when the strain becomes more frequent, and give rise to partial sweeps.

      In Figure 1, the time it takes a partial sweep to finish is roughly 100–200 generations (bottom right panel). If we consider H3N2 influenza and take one generation to be one week, this corresponds to a sweep time of 2 to 4 years, which is slightly slower but roughly in line with observations for selective sweeps. This time is harder to define if oscillatory dynamics takes place (middle right panel), but the time from the introduction of the mutant to the peak frequency is again about 4 years. The other parameters of the model correspond to a waning time of 200 weeks and immune escape on the order of 20-30% change in susceptibility.
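
      The conversion behind these time scales is simple arithmetic, sketched below. The one-week generation time is the value stated above; the helper name and the number of weeks per year are illustrative choices.

```python
WEEKS_PER_YEAR = 365.25 / 7  # about 52.18 weeks


def generations_to_years(generations, weeks_per_generation=1.0):
    """Convert a number of generations to years for a given generation time."""
    return generations * weeks_per_generation / WEEKS_PER_YEAR


sweep_years = [generations_to_years(g) for g in (100, 200)]
waning_years = generations_to_years(200)  # waning time of 200 weeks

print(f"sweep time: {sweep_years[0]:.1f} to {sweep_years[1]:.1f} years")
print(f"waning time: {waning_years:.1f} years")
```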

      Reviewer 2:

      Summary

      This work addresses a puzzling finding in the viral forecasting literature: high-frequency viral variants evince signatures of neutral dynamics, despite strong evidence for adaptive antigenic evolution. The authors explicitly model interactions between the dynamics of viral adaptations and of the environment of host immune memory, making a solid theoretical and simulation-based case for the essential role of host-pathogen eco-evolutionary dynamics. While the work does not directly address improved data-driven viral forecasting, it makes a valuable conceptual contribution to the key dynamical ingredients (and perhaps intrinsic limitations) of such efforts.

      Strengths

      This paper follows up on previous work from these authors and others concerning the problem of predicting future viral variant frequency from variant trajectory (or phylogenetic tree) data, and a model of evolving fitness. This is a problem of high impact: if such predictions are reliable, they empower vaccine design and immunization strategies. A key feature of this previous work is a “traveling fitness wave” picture, in which absolute fitnesses of genotypes degrade at a fixed rate due to an advancing external field, or “degradation of the environment”. The authors have contributed to these modeling efforts, as well as to work that critically evaluates fitness prediction (references 11 and 12). A key point of that prior work was the finding that fitness metrics performed no better than a baseline neutral model estimate (Hamming distance to a consensus nucleotide sequence). Indeed, the apparent good performance of their well-adopted “local branching index” (LBI) was found to be an artifact of its tendency to function as a proxy for the neutral predictor. A commendable strength of this line of work is the scrutiny and critique the authors apply to their own previous projects. The current manuscript follows with a theory and simulation treatment of model elaborations that may explain previous difficulties, as well as point to the intrinsic hardness of the viral forecasting inference problem.

      This work abandons the mathematical expedience of traveling fitness waves in favor of explicitly coupled eco-evolutionary dynamics. The authors develop a multi-compartment susceptible/infected model of the host population, with variant cross-immunity parameters, immune waning, and infectious contact among compartments, alongside the viral growth dynamics. Studying the invasion of adaptive variants in this setting, they discover dynamics that differ qualitatively from the fitness wave setting: instead of a succession of adaptive fixations, invading variants have a characteristic “expiring fitness”: as the immune memories of the host population reconfigure in response to an adaptive variant, the fitness advantage transitions to quasi-neutral behavior. Although their minimal model is not designed for inference, the authors have shown how an elaboration of host immunity dynamics can reproduce a transition to neutral dynamics. This is a valuable contribution that clarifies previously puzzling findings and may facilitate future elaborations for fitness inference methods.

      The authors provide open access to their modeling and simulation code, facilitating future applications of their ideas or critiques of their conclusions.

      We thank the reviewer for their summary, assessment, and constructive critique.

      (1) The current modeling work does not make direct contact with data. I was hoping to see a more direct application of the model to a data-driven prediction problem. In the end, although the results are compelling as is, this disconnect leaves me wondering if the proposed model captures the phenomena in detail, beyond the qualitative phenomenology of expiring fitness. I would imagine that some data is available about cross-immunity between strains of influenza and SARS-CoV-2, so hopefully some validation of these mechanisms would be possible.

      We agree with the reviewer that quantitatively confronting our model with data would be very interesting. Unfortunately, most available serological data for influenza and SARS-CoV-2 is obtained using post-infection sera from previously naive animal models. To test our model, we would require human serology data, ideally demographically resolved, and a way to link serology to transmission dynamics. Furthermore, our model is mostly an explanation for qualitative features of variant dynamics and their apparent lack of predictability. We therefore considered that quantitative validation using data is out of scope of this work.

      (2) After developing the SIR model, the authors introduce an effective “expiring fitness” model that avoids the oscillatory behavior of the SIR model. I hoped this could be motivated more directly, perhaps as a limit of the SIR model with many immune groups. As is, the expiring fitness model seems to lose the eco-evolutionary interpretability of the SIR model, retreating to a more phenomenological approach. In particular, it’s not clear how the fitness decay parameter 𝜈 and the initial fitness advantage 𝑠0 relate to the key ecological parameters: the strain cross-immunity and immune group interaction matrices.

      The expiring fitness model emerges as a limiting case, at least qualitatively, of the SIR model when the growth rate of the new variant is small compared to the waning rate and the SIR model does not oscillate. This can be readily achieved with many immune groups, which reconciles the large effect of many escape mutations and the lack of oscillation by confining the escape to some fraction of the population. Beyond that, the expiring fitness model is mainly an effective model that allows us to study the consequences of partial sweeps on predictability on long timescales. As stated in the “Main changes” section at the start of this reply, we added an SI section which links parameters of the two models. However, we underline the fact that beyond the phenomenon of partial sweeps, the dynamics of the two are different.

      Reviewer 3:

      Summary

      In this work the authors start by presenting a multi-strain SIR model in which viruses circulate in a heterogeneous population with different groups characterized by different cross-immunity structures. They argue that this model can be reformulated as a random walk characterized by new variants saturating at intermediate frequencies. Then they recast their microscopic description to an effective formalism in which viral strains lose fitness independently from one another. They study several features of this process numerically and analytically, such as the average variants frequency, the probability of fixation, and the coalescent time. They compare qualitatively the dynamics of this model to variants dynamics in RNA viruses such as flu and SARS-CoV-2.

      Strengths

      The idea that vanishing fitness mechanisms that produce partial sweeps may explain important features of flu evolution is very interesting. Its simplicity and potential generality make it a powerful framework. As noted by the authors, this may have important implications for predictability of virus evolution and such a framework may be beneficial when trying to build predictive models for vaccine design. The vanishing fitness model is well analyzed and produces interesting structures in the strains coalescent. Even though the comparison with data is largely qualitative, this formalism would be helpful when developing more accurate microscopic ingredients that could reproduce viral dynamics quantitatively. This general framework has the potential to be more universal than human RNA viruses, in situations where invading mutants would saturate at intermediate frequencies.

      We thank the reviewer for their positive remarks and constructive criticism below.

      Weaknesses

      The authors build the narrative around a multi-strain SIR model in which viruses circulate in a heterogeneous population, but the connection of this model to the rest of the paper is not well supported by the analysis. When presenting the random walk coarse-grained description in section 3 of the Results, there is no quantitative relation between the random walk ingredients, importantly 𝑃(𝛽), and the SIR model, just a qualitative reasoning that strains would initially grow exponentially and saturate at intermediate frequencies. So essentially any other microscopic description with these two features would give rise to the same random walk.

      As also highlighted in the response to other reviewers, we now discuss how the parameters of the SIR model are related to the initial growth rate and the ‘expiration’ rate of the effective model. While the phenomenology of the SIR model is of course richer, this correspondence describes its overdamped limit qualitatively well.

      Currently it’s unclear whether the specific choices for population heterogeneity and cross-immunity structure in the SIR model matter for the main results of the paper. In section 2, it seems that the main effects of these ingredients are reduced oscillations in variant frequencies and a rescaled initial growth rate. But ultimately a homogeneous population would also produce steady state coexistence between strains, and oscillation amplitude likely depends on parameter choices. Thus a homogeneous population may lead to a similar coarse-grained random walk.

      The reviewer is correct that the primary effect of using many immune groups is to slow down the increase of novel variants, which in turn dampens the oscillations. Having multiple immune groups widens the parameter space in which partial sweeps without dramatic oscillations are observed. For slow sweeps, similar dynamics are observed in a homogeneous population.

      Similarly, it’s unclear how the SIR model relates to the vanishing fitness framework, other than on a qualitative level given by the fact that both descriptions produce variants saturating at intermediate frequencies. Other microscopic ingredients may lead to a similar description, yet with quantitative differences.

      Both of these points were also raised by other reviewers and we agree that it is worth discussing them at greater length. We now discuss how the parameters of the ‘expiring fitness’ model relate to those of the SIR. We also discuss how other models such as ecological models give rise to similar coarse grained models.

      At the same time, from the current analysis the reader cannot appreciate the impact of such a mean field approximation where strains lose fitness independently from one another, and under what conditions such assumption may be valid.

      In the SIR model, the rate at which strains lose fitness does depend on the precise state of the host population through the quantities 𝑆𝑚 and 𝑆wt, which is apparent in equation (A27) of the new SI section. The fact that a new variant shifts the equilibrium frequencies of previous strains in a proportional way is valid if the “antigenic space” is of very high dimension, as explained in section Change in frequency when adding subsequent strains of the SI. It would indeed be interesting to explore relaxations of this assumption by considering a larger class of cross immunity matrices 𝐾. However, in the expiring fitness model, the fact that strains lose fitness independently from each other is a necessary simplification.

      In summary, the central and most thoroughly supported results in this paper refer to a vanishing fitness model for human RNA viruses. The current narrative, built around the SIR model as a general work on host-pathogen eco-evolution in the abstract, introduction, discussion and even title, does not seem to match the key results and may mislead readers. The SIR description rather seems one of the several possible models, featuring a negative frequency dependent selection, that would produce coarse-grained dynamics qualitatively similar to the vanishing fitness description analyzed here.

      We have revised the text throughout to make the connections between the different parts of the manuscript, in particular the SIR model and the expiring fitness model, clearer. We agree that the phenomenology of the expiring fitness model is more general than the case of human RNA viruses described by the SIR model, but we think this generality is an attractive feature of the coarse-graining, not a shortcoming. Indeed, other settings with negative frequency dependent selection, or ecosystems that adapt on appropriate time scales, generate similar dynamics.

      Recommendations for the authors:

      Reviewer 1:

      (4) Line 74: what does fitness mean?

      Many population dynamics models, including ones used for viral forecasting, attach a scalar fitness to each strain. The growth rate of each strain is then computed by subtracting the average population fitness from the strain’s fitness. In this sentence, fitness is intended in this way.
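
      As a minimal sketch of this convention, the relation can be written as follows. The notation is assumed here for illustration and is not taken from the manuscript: 𝑥ᵢ is the frequency of strain 𝑖, 𝑓ᵢ its scalar fitness, and the bar denotes the frequency-weighted population average.

```latex
\[
  \frac{1}{x_i}\frac{\mathrm{d}x_i}{\mathrm{d}t} = f_i - \bar{f},
  \qquad
  \bar{f} = \sum_j x_j f_j .
\]
```

      Under this convention a strain grows in frequency only while its fitness exceeds the population mean.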

      (5) Fig. 1: The equilibrium frequency in the middle and bottom rows is hardly smaller than the equilibrium frequency in the top row for one immune group. This is surprising since for M=10, the variant escapes in only 1/10th of the population, which naively should impact the equilibrium frequency more strongly. Could the authors comment on this?

      This is indeed non-trivial, and a hand-waving argument can be made by considering the extreme case 𝜀 = 0. The variant is then completely neutral for the immune groups 𝑖 > 1, and would be at equilibrium at any frequency in these immune groups. Its equilibrium frequency is then only determined by group 1, which is the only one breaking degeneracy. For 𝜀 > 0 but small, we naturally expect a small deviation from the 𝜀 = 0 case and thus 𝛽 should only change slightly.

      A more rigorous argument with a mathematical proof in the case 𝜀 = 0 is now given in section A4 of the supplementary information.

      (6) Fig. 1: In the caption, it is stated that the simulations are performed with 𝜀 = 0.99. Is this a typo? It seems that it should be 𝜀 = 0.01, as in and just below equation (7).

      This was indeed a typo. It is now fixed.

      (7) Fig. 3: The data analysis should be improved. In order to link the average frequency trajectories to standard population genetics of conditional fixation probabilities, the focal time should always be the time where the trajectory crosses the threshold frequency for the first time. Plotting some trajectories from a later time onwards, on their downward path destined to loss, introduces a systematic bias towards negative clonal interference (for these trajectories, the time between the first and the second crossing of the threshold frequency is simply omitted). The focal time of first crossing of the threshold frequency can easily be obtained, e.g., by linear interpolation of the trajectory between subsequent time points of frequency evaluation. In light of the modified procedure, the statements on the inertia of the trajectories after crossing 𝑥⋆ (line 356) should be re-examined.

      The way we process the data is already in line with the suggestions of the reviewer. In particular, we use as focal time the first time at which a trajectory is found in the threshold frequency bin. Trajectories that are never seen in the bin because of limited time-resolution are simply ignored.

      In Fig. 3, there are no trajectories that are on their downward path at the focal time and when crossing the threshold frequency. Our other work on the predictability of flu, Barrat-Charlaix et al. (2021), has a similar figure, which may have created confusion.
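
      The linear-interpolation procedure suggested by the reviewer can be sketched as below. This is an illustration only, not the binning procedure used for Fig. 3; the function name `first_crossing_time` and its arguments are hypothetical, and the sketch assumes a trajectory sampled at increasing time points.

```python
from typing import Optional, Sequence


def first_crossing_time(
    times: Sequence[float],
    freqs: Sequence[float],
    threshold: float,
) -> Optional[float]:
    """Interpolated time of the first upward crossing of `threshold`.

    Hypothetical helper (not from the manuscript). Returns None when the
    trajectory never rises through the threshold between two consecutive
    samples, including the case where it is first observed already above
    the threshold.
    """
    if len(times) != len(freqs):
        raise ValueError("times and freqs must have the same length")
    for k in range(1, len(freqs)):
        lo, hi = freqs[k - 1], freqs[k]
        # Upward crossing between sample k-1 and sample k (so hi > lo).
        if lo < threshold <= hi:
            frac = (threshold - lo) / (hi - lo)
            return times[k - 1] + frac * (times[k] - times[k - 1])
    return None


if __name__ == "__main__":
    # Toy trajectory: rises through 0.3 between t = 1 and t = 2, then declines.
    print(first_crossing_time([0, 1, 2, 3], [0.1, 0.2, 0.4, 0.3], 0.3))
```

      Because only the first upward crossing is returned, a trajectory that later falls back through the threshold is never assigned a focal time on its downward path.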

      (8) Fig. 4: authors write 𝛼/ 𝑠0 in the figure, but should be 𝜈/ 𝑠0.

      Fixed.

      (9) Line 420: authors refer to the blue curve in panel B as the case with strong interference. However, strong interference is for higher 𝜌/ 𝑠0, that is panel D (see point 1).

      Fixed.

      (10) Line 477: typo “there will a variety of mutations”.

      Fixed.

      Reviewer 2:

      Should 𝛼 be 𝜈 in Figure 4 legends?

      Thank you very much for spotting this error. We fixed it.

      Equations 4-5 could be further simplified.

      We factorised the 𝐼 term in equation 4. In equation 5, we preferred to keep the 1− 𝛿/ 𝛼 term as this quantity appears in different calculations concerning the model. For instance, 𝑆 = 𝛿/ 𝛼 at equilibrium.

      The sentence before equation 8 references 𝑃𝛽(𝛽), but this wasn’t previously introduced.

      We now introduce 𝑃𝑏𝜂 at the beginning of the section Ultimate fate of the variant.

      In the last paragraph of page 12, “monotonously” maybe should be “monotonically”.

      Fixed.

      For the supplement section B, you might want a more descriptive title than “other”.

      We renamed this section to Expiring fitness model and random walk.

      Reviewer 3:

      To expand on my previous comments, my main concerns regard the connection of section 2 and the SIR model with the rest of the paper.

      In the first paragraph of page 9 the authors argue that a stochastic version of the SIR model would lead to different fixation dynamics in homogeneous vs heterogeneous populations due to the oscillations. This paragraph is quite speculative, some numerical simulations would be necessary to quantitatively address to what extent these two scenarios actually differ in a stochastic setting, and how that depends on parameters.

      Likewise, the connection between the SIR model, the random walk coarse-grained description and the vanishing fitness model can be investigated through numerical simulations of a stochastic SIR given the chosen population and cross-immunity structures with i.e. 10-20 strains. This would allow for a direct comparison of individual strain dynamics rather than the frequency averages, as well as other scalar properties such as higher moments, coalescent, and fixation probability once reaching a given frequency. It would also be possible to characterize numerically the SIR P(beta) bridging the gap with the random walk description. It’s not obvious to me that the SIR P(beta) would not depend on the population size in the presence of birth-death stochasticity, potentially changing the moments scalings. I appreciate that such simulations may be computationally expensive, but similar numerical studies have been performed in previous phylodynamics works so it shouldn’t be out of reach.

      As an alternative, the authors should consider re-centering the narrative directly on the random walk of the vanishing fitness model, mentioning the SIR more briefly as a possible qualitative way to get there. Either way the authors should comment on other ways in which this coarse-grained dynamics could arise.

      In the vanishing fitness model, where variants fitnesses are independent, is an infinite dimensional antigenic space implicitly assumed? If that’s the case, it should be explained in the main text.

      A long simulation of the SIR model would indeed be interesting, but is numerically demanding and our current simulation framework doesn’t scale well for many strains and susceptibilities. We thus refrained from adding extensive simulations.

      In Figure 2B of the main text, the simulation with 7 strains illustrates the qualitative match between the expiring fitness and the SIR model. However, it is clearly not long enough to discuss statistical properties of the corresponding random walk. Furthermore, we do not expect the individual strain dynamics of the SIR and expiring fitness models to match. The latter depends on few parameters (𝛼, 𝑠0), while the former depends on the full state of the host population and of the previous variants.

      In the section linking the parameters of the two models, we now discuss the distribution 𝑃(𝛽) of the SIR model for two strains and a specific choice of distribution for the cross immunity 𝑏 and 𝑓.

      Minor comments:

      There is some back and forth in the writing. For instance, when introducing the model, 𝐶𝑖𝑗 is first defined as 1/ 𝑀, then a few paragraphs later the authors introduce that in another limit 𝐶𝑖𝑖 is just much higher than any 𝐶𝑖𝑗, and finally they specify that the former is the fast mixing scenario.

      Another example is in section 2, in the first paragraph they put forward that heterogeneity and crossimmunity have different impacts on the dynamics, but the meaning attributed to these different ingredients becomes clear only a while later after the homogeneous population analysis. Uniforming the writing would make it easier for the reader to follow the authors’ train of thought.

      We removed the paragraph below Equation (1) mentioning the 𝐶𝑖𝑗 = 1/ 𝑀 case, which we hope will linearize the writing.

      When mentioning geographical structure, why would geography affect how immunity sees pairs of viral strains (differences in 𝐾)?

      Geographic structure could influence cross-immunity because of exposure histories of hosts. For instance in the case of influenza, different geographical regions do not have the same dominating strains in each season, and hosts from different regions may thus build up different immunity.

      In the current narrative there are some speculations about non-scalar fitness, especially in section 2. The heterogeneity in this section does not seem so strong to produce a disordered landscape that defies the notion of scalar fitness in the same way some complex ecological systems do. A more parsimonious explanation for the coexistence dynamics observed here may be a negative frequency dependent selection.

      Our language here was not very precise and we agree that the phenomenology we describe is related to that of frequency dependent selection (mediated via immunity of the host population, which integrates past frequencies). Traveling wave models typically use fitness functions that are independent of the population distribution and only account for the evolution via an increasing average fitness. We have made the discussion more accurate by stating that we consider a case where fitness depends explicitly on present and past population composition, which includes the case of negative frequency dependent selection.

      I don’t understand the comparison with genetic drift (typo here, draft) in the last paragraph of section 3 given that there is no stochasticity in growth death dynamics.

      We compare the random walk to genetic drift because of the expression of the second moment of the step size. The genetic draft has the same functional form. If one defines the effective population size as in the text, the drift due to random sampling of alleles (neutral drift) and the changes in strain frequency in our model have the same first and second moments. The stochasticity here does not come from the dynamics, which are indeed deterministic, but from the appearance of new mutations (variants) on backgrounds that are randomly sampled in the population. This latter property is shared with genetic draft.
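
      For reference, the standard Wright-Fisher moments for neutral drift are sketched below, in notation assumed here rather than taken from the manuscript: 𝑥 is the allele (or strain) frequency, Δ𝑥 its change over one generation, and 𝑁ₑ the effective population size. The definition of 𝑁ₑ used for the expiring fitness model is the one given in the main text and is not reproduced here.

```latex
\[
  \langle \Delta x \rangle = 0,
  \qquad
  \langle (\Delta x)^2 \rangle = \frac{x(1-x)}{N_e} .
\]
```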

      In the vanishing fitness model, I think the reader would benefit from having 𝑃(𝑠) in the main text, and it should be made more clear what simulations assume what different choice of 𝑃(𝑠).

      We added the expression of 𝑃(𝑠) in the main text. Simulations use the value 𝑠0 = 0.03, which we added in the caption of Figure 4.

      When comparing the model and data, is the point that COVID is not reproduced due to clonal interference? It seems from the plot that flu has clonal interference as well though. Why is that negligible?

      A similar point has been raised by the first reviewer (see R1-(1)). Clonal interference is not negligible, but we find it to be insufficient to explain the observations made for H3N2 influenza, namely the lack of inertia of frequency trajectories or the probability of fixation. This is shown in the new section (B1) of the SI. Both SARS-CoV-2 and H3N2 influenza experience clonal interference, but the former is more predictable than the latter. Our point is that expiring fitness effects should be stronger in influenza because of the higher immune heterogeneity of the host population, making it less predictable than SARS-CoV-2.

      Does the fixation probability as a function of frequency threshold match the flu data for some parameters sets?

      For H3N2 influenza, the fixation probability is found to be equal to the threshold frequency (see Barrat-Charlaix MBE 2021, also indirectly visible from Fig. 3). In Figure 4, we obtain that either a high expiry rate or intermediate expiry rates and clonal interference regimes match this observation.

      It would be instructive to see examples of the individual variant dynamics of the vanishing fitness model compared to the presented data.

      We added an extra SI figure (S7) showing 10 randomly selected trajectories of individual variants in the case of H3N2/HA influenza and for the expiring fitness model with different parameter choices.

      Figure 4E has no colorbar label. The reader shouldn’t have to look for what that means in the bottom of the SIs. In panels A and B the label should be 𝜈, not 𝛼. Same thing in most equations of page 42.

      We added the colorbar label to the figure and also updated the caption: a darker color corresponds to a higher probability of sweeps to overlap. We fixed the 𝜈 – 𝛼 confusion in the SI and in the caption of the figure.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations For The Authors:

      Reviewer #1:

      ●      It might help the reader if you make it explicit that mDES allows you to create an approximate amalgam of different kinds of experiences by assuming that, across individuals, there is a general consensus of experiences at particular points in the movie. Whether this assumption is an accurate reflection of the way in which each individual's brain responds is an important, testable prediction that could be discussed/examined in different projects. For instance, in other projects there are clear idiosyncratic responses to the same naturalistic stimuli: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8064646/.

      Thank you, this is an excellent point. We have included this article in our revision and expanded on the introduction to emphasize how this study relates to our work. Additionally, we have included an additional figure that helps illustrate how mDES can be used to evaluate the idiosyncrasy for each respective thought component to visually display the variance across moments in the film:

      Page 6-7 [137-148] In our study, we used multi-dimensional experience sampling (mDES) to describe ongoing thought patterns during the movie-watching experience [8]. mDES is an experience sampling method that identifies different features of thought by probing participants about multiple dimensions of their experiences. mDES can provide a description of a person’s thoughts, generating reliable thought patterns across laboratory cognitive tasks [22, 32, 33] and in daily life [34, 35], and is sensitive to accompanying changes in brain activity [24, 36]. Studies that use mDES to describe experience ask participants to provide experiential reports by answering a set of questions about different features of their thought on a continuous scale from 1 (Not at all) to 10 (Completely) [24, 32-41]. Each question describes a different feature of experience such as if their thoughts are oriented in the future or the past, about oneself or other people, deliberate or intrusive in nature, and more (See methods for a full list of questions used in the current study).

      ●      A cartoon describing the mDES technique could be helpful for uninitiated readers.

      Thank you for your suggestion, we have added an additional figure (Figure 3) that illustrates the process of mDES in the laboratory during this experiment, clarifying that participants answer mDES items using a slider to indicate their score (rather than expressing it verbally).

      ●      Did the authors check for any measures of reliability across mDES estimates other than split-half reliability? For instance, the authors could demonstrate construct validity by showing that engagement with certain features of the thought-sampling space aligned with specific points in the movies. If so, the start of the Results section would be a great place to demonstrate the reliability of the approach. For instance, did any two participants sample the same 15-second window of time in a particular stimulus? If so, you could compare their experience samples to determine whether the method was extensible across subjects.

      This is a great point, thank you very much for highlighting this. We have eight individuals at each time point in our analysis, which is probably not enough to calculate meaningful reliability measures. However, we have added a time series analysis of experience in each clip to our revision (Figure 3). In these time plots, it is possible to see clear moments in the film in which scores do not straddle 0 (using 95% CI), and often, these persist across successive moments (Figure 3; see time-series plot four for the clearest example). When the confidence intervals of a sampling epoch do not overlap with zero, this suggests a high degree of agreement in thought content across participants. At the same time, our analysis shows that individual differences do exist since the relative presence of each component for each participant was linked to objective measures of movie watching (in this case, comprehension). In this revision we have specifically addressed this question by conducting ANOVAs to determine how scores on each component change across the clip (see also supplementary table 11). This additional analysis shows that mDES effectively captures shared aspects of movie-watching and is also sensitive to individual variation (since it can describe individual differences).

      Page 15 [304-323]: Next, we examined how each pattern of thought changes across each movie clip. For this analysis, we conducted a separate ANOVA for each film clip for the four components (see Table 1 and Figure 3). Clear dynamic changes were observed in several components for different films. We analyzed these data using an Analysis of Variance (ANOVA) in which the time in each clip was the explanatory variable of interest. This identified significant changes in “Episodic Social Cognition” scores across Little Miss Sunshine, F(1, 712) = 10.80, p = .001, η2 = .03, and Citizenfour, F(1, 712) = 5.23, p = .023, η2 = .02. There were also significant changes in “Verbal Detail” scores across Little Miss Sunshine, F(1, 712) = 31.79, p <.001, η2 = .09. Lastly, there were significant changes in “Sensory Engagement” scores for both Citizenfour, F(1, 712) = 6.22, p = .013, η2 = .02, and 500 Days of Summer, F(1, 706) = 80.41, p <.001, η2 = .18. These time series are plotted in Figure 3 and highlight how mDES can capture the dynamics of different types of experience across the three movie clips. Moreover, in several of these time series plots, it is clear that thought patterns reported extend beyond adjacent time periods (e.g. scores above zero between time periods 150 to 400 for Sensory Engagement in 500 Days of Summer and for time periods between 175 and 225 for Verbal Detail in Little Miss Sunshine). It is important to note that no participant completed experience sampling reports during adjacent sampling points (see Supplementary Figure 7), so the length of these intervals indicates agreement in how specific scenes within a film were experienced and conserved across different individuals. Notably, the component with the least evidence for temporal dynamics was “Intrusive Distraction.”
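
      The agreement criterion described above (a 95% confidence interval of component scores that does not overlap zero within a sampling epoch) can be sketched as follows. This is a minimal illustration, not the analysis code used for Figure 3: the function names are hypothetical, a t-based interval is assumed, and the default critical value applies only to epochs with exactly eight reports.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple

# Two-sided 95% critical value of Student's t for 7 degrees of freedom,
# i.e. for n = 8 reports per epoch. Pass a different value for other n.
T_CRIT_DF7 = 2.365


def ci95(scores: Sequence[float], t_crit: float = T_CRIT_DF7) -> Tuple[float, float]:
    """Return the (lower, upper) bounds of a t-based 95% CI for the mean score."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to estimate a CI")
    centre = mean(scores)
    half_width = t_crit * stdev(scores) / sqrt(n)
    return centre - half_width, centre + half_width


def excludes_zero(scores: Sequence[float], t_crit: float = T_CRIT_DF7) -> bool:
    """True when the 95% CI lies entirely above or entirely below zero."""
    lower, upper = ci95(scores, t_crit)
    return lower > 0 or upper < 0


if __name__ == "__main__":
    # Made-up component scores from eight participants in one epoch.
    epoch_scores = [0.5, 0.6, 0.4, 0.7, 0.5, 0.6, 0.55, 0.45]
    print(ci95(epoch_scores), excludes_zero(epoch_scores))
```

      Applying `excludes_zero` to every 15-second epoch in turn would flag the runs of consecutive epochs in which participants' reports agree in sign.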

      ●      P10: "Generation of the thought-space" - how stable are these word clouds to individual subjects? If there are subject-specific differences, are there ways to account for this with some form of normalization?

      Thank you for bringing up this point. Our current goal was to show how the average experience of one group of participants relates to the brain activity of a second group. In this regard it is important to seek the patterns of similarity across individuals in how they experience the film. However, as is normal in our studies using mDES, we can also use the variation from the mean to predict other cognitive measures and, in this way, account for the variability that individuals have in their movie-watching experience. In other words, the word clouds reflect the mean of a particular dimension, so when an individual score is close to 0, their thought content does not align with this dimension. Deviating scores, however, whether positive or negative, indicate that this dimension provides meaningful information about the individual's experience. Evidence of the meaningful nature of this variation can be seen in the links between the reported thoughts and the individuals’ comprehension (e.g. individuals whose thoughts do not contain strong evidence of “Intrusive Distraction”, or in other words, a negative score, tended to do better on comprehension tests of information in the movies they watched).

      ●      P11: "Variation in thought patterns" - can the authors use a null model here to demonstrate that the associations they've observed would occur above chance levels (e.g., for a comparison of time series with similar temporal autocorrelation but non-preserved semantic structure)? Further, were there any pre-defined hypotheses over whether any of the three different movies would engage any of the 4 observed dimensions?

      This is a great point. We chose to sample from three distinctly different films to help us understand if mDES was sensitive to different semantic and affective features of films. Our analysis, therefore, shows that at a broad level, mDES is able to discriminate between films, highlighting its broad sensitivity to variation in semantic or affective content. Armed with this knowledge, researchers in the future could derive mechanistic insights into how the semantic features may influence the mDES data. For example, future studies could ask participants to watch movies in a scrambled order to understand how varying the structure of semantics or information breaks the mapping between brains and ongoing experience. In this revision we have amended the text to reflect this possibility:

      Page 34 [674-679]. Our analysis shows that mDES is able to discriminate between films, highlighting its broad sensitivity to variation in semantic or affective content. Armed with this knowledge, we propose that in the future, researchers could derive mechanistic insights into how the semantic features may influence the mDES data. For example, it may be possible to ask participants to watch movies in a scrambled order to understand how the structure of semantic or information influences the mapping between brains and ongoing experience as measured by mDES.

      ●      P14: "Brain - Thought Mappings: Voxel-space Analysis" - this is a cool analysis, and a nice validation of the authors' approach. I would personally love to see some form of reliability analysis on these approaches - e.g., do the same locations in the cerebral cortex align with the four features in all three movies? Across subjects?

      This is another great point, and we thank you for your enthusiasm. The data we have has only sampled mDES during a relatively short period of brain activity which we suspect would make an individual-by-individual analysis underpowered. In the future, however, it may be possible to adopt a precision mapping approach in which we sample mDES during longer periods of movie watching and identify how group-level mappings of experience relate to brain activity within a single subject. To reflect this possibility, we have amended the text in this revision in the following way:

      Page 34-35 [672-687]: In addition, our study is correlational in nature, and in the future, it could be useful to generate a more mechanistic understanding of how brain activity maps onto the participants' experience. Our analysis shows that mDES is able to discriminate between films, highlighting its broad sensitivity to variation in semantic or affective content. Armed with this knowledge, we propose that in the future, researchers could derive mechanistic insights into how the semantic features may influence the mDES data. For example, it may be possible to ask participants to watch movies in a scrambled order to understand how the structure of semantic or information influences the mapping between brains and ongoing experience as measured by mDES. Finally, our study focused on mapping group-level patterns of experience onto group-level descriptions of brain activity. In the future, it may be possible to adopt a “precision-mapping” approach by measuring longer periods of experience using mDES and determining how the neural correlates of experience vary across individuals who watched the same movies while brain activity was collected [1]. In the future, we anticipate that the ease with which our method can be applied to different groups of individuals and different types of media will make it possible to build a more comprehensive and culturally inclusive understanding of the links between brain activity and movie-watching experience.

      Reviewer #2:

      (1) The three-dimensional scatter plot in Figure 2 does not represent "Intrusive Distraction." Would it make sense to color-code dots by this important dimension?

      Thank you for this suggestion. Although it could be possible to indicate the location of each film in all four dimensions, we were worried that this would make the already complex 3-D space confusing to a naive reader. In this case, we prefer to provide this information in the form of bar graphs, as we did in the previous submission.

      (2) The coloring of neural activation patterns in Figure 3 is not distinct enough between the different dimensions of thought. Please reconsider color intensities or coding. The same applies to the left panel in Figure 4.

      Thanks for this comment; we found it quite difficult to find a colour mapping that allows us to show the distinction between four states in a simple manner, yet we believe it is valuable to show all of the results on a similar brain. Nonetheless, to provide a more fine-grained viewing of our results in this revision we have provided a supplementary figure (Supplementary Figure 6) that shows each of the observed patterns of activity in isolation.

      (3) The new method (mDES) is mentioned too often without explanation, making it hard to follow without referring to the methods section. It would be helpful to state prominently that participants rated their thoughts on different dimensions instead of verbalizing them.

      Thank you for this point, we have adjusted the Introduction to clarify and expand on the mDES method. We have also included an example of the mDES method in an additional figure that we have now included to visually express how participants respond to mDES probes (Figure 3).

      Page 6-7 [136-148]: In our study, we used multi-dimensional experience sampling (mDES) to describe ongoing thought patterns during the movie-watching experience [2]. mDES is an experience sampling method that identifies different features of thought by probing participants about multiple dimensions of their experiences. mDES can provide a description of a person’s thoughts, generating reliable thought patterns across laboratory cognitive tasks [3-5] and in daily life [6, 7], and is sensitive to accompanying changes in brain activity when reports are gained during scanning [8, 9]. Studies that use mDES to describe experience ask participants to provide experiential reports by answering a set of questions about different features of their thought on a continuous scale from 1 (Not at all) to 10 (Completely) [3, 5-14]. Each question describes a different feature of experience, such as if their thoughts are oriented in the future or the past, about oneself or other people, deliberate or intrusive in nature, and more (See Methods for a full list of questions used in the current study).

      Author response image 1.

      (4) Reporting of single-movie thought patterns seems quite extensive. Could this be condensed in the main text?

      Thank you for this point, upon re-visiting the manuscript, we have adjusted the text to be more concise.

      Reviewer #3:

      ●      This is a very elegant experiment and seems like a very promising approach. The text is currently hard to read.

      Thank you for this point, we have since revisited the text and adjusted the manuscript to be more concise and add more clarity.

      ●      The introduction (+ analysis goals) fails to explain the basic aspects of the analysis and dataset. It is not clear how many participants and datapoints were used to establish the group-level thought patterns, nor is it entirely clear that the fMRI data is a separate existing dataset. Some terms are introduced and highlighted and never revisited (e.g decoupled states and the role of the DMN).

      Thank you for this critique, we have since adjusted the introduction to clearly explain the difference between Sample 1 and Sample 2 and further clarify that the fMRI data is an entirely separate, independent sample compared to the laboratory mDES sample:

      Page 7-8 [158-174]: Thus, to overcome this obstacle, we developed a novel methodological approach using two independent samples of participants. In the current study, one set of 120 participants was probed with mDES five times across the three ten-minute movie clips (11 minutes total, no sampling in the first minute). We used a jittered sampling technique where probes were delivered at different intervals across the film for different people depending on the condition they were assigned. Probe orders were also counterbalanced to minimize the systematic impact of prior and later probes at any given sampling moment. We used these data to construct a precise description of the dynamics of experience for every 15 seconds of three ten-minute movie clips. These data were then combined with fMRI data from a different sample of 44 participants who had already watched these clips without experience sampling [15]. By combining data from two different groups of participants, our method allows us to describe the time series of different experiential states (as defined by mDES) and relate these to the time series of brain activity in another set of participants who watched the same films with no interruptions. In this way, our study set out to explicitly understand how the patterns of thoughts that dominate different moments in a film in one group of participants relate to the brain activity at these time points in a second set of participants and, therefore, better understand the contribution of different neural systems to the movie-watching experience.

      Page 8-9 [177-188] The goal of our study, therefore, was to understand the association between patterns of brain activity over time during movie clips in one group of participants and the patterns of thought that participants reported at the corresponding moment in a different set of participants (see Figure 1). This can be conceptualized as identifying the mapping between two multi-dimensional spaces, one reflecting the time series of brain activity and the other describing the time series of ongoing experience (see Figure 1 right-hand panel). In our study, we selected three 11-minute clips from movies (Citizenfour, Little Miss Sunshine and 500 Days of Summer) for which recordings of brain data in fMRI already existed (n = 44) [15] (Figure 1, Sample 1). A second set of participants (n = 120) viewed the same movie clips, providing intermittent reports on their thought patterns using mDES (Figure 1, Sample 2). Our goal was to understand the mapping between the patterns of brain activity at each moment of the film and the reports of ongoing thought recorded at the same point in the movies.

      ●      It is unclear what the utility of the method is - is it meant to be done in fMRI studies on the same participants? Or is the idea to use one sample to model another?

      Great point, thank you for highlighting this important question. This paper aimed to interrogate the relationship between experience and neural states while preserving the novelty of movie-watching. Although it could be done in the same sample, it may be difficult to collect frequent reports of experience without interrupting the dynamics of the brain. However, in the future it could be possible to collect mDES and brain activity in the same individuals while they watch movies. For example, in our prior studies (e.g. [9]) we combined mDES with openly available data on brain activity during tasks. In the future, this online method could also be applied during movie watching to identify direct mapping between brain activity and films. However, this online approach would make it very expensive to produce the time series of experience across each clip given that it would require a large number of participants (e.g. 200 as we used in our current study). The following has been included in our manuscript:

      Page 7 [149-159] One challenge that arises when attempting to map the dynamics of thought onto brain activity during movie watching is accounting for the inherently disruptive nature of experience sampling: to measure experience with sufficient frequency to map the dynamics of thoughts during movies would disrupt the natural dynamics of the brain and would also alter the viewer’s experience (for example, by pausing the film at a moment of suspense). Therefore, if we periodically interrupt viewers to acquire a description of their thoughts while recording brain activity, this could impact capturing important dynamic features of the brain. On the other hand, if we measured fMRI activity continuously over movie-watching (as is usually the case), we would lack the capacity to directly relate brain signals to the corresponding experiential states. Thus, to overcome this obstacle, we developed a novel methodological approach using two independent samples of participants.

      ●      The conclusions currently read as somewhat trivial (e.g "Our study, therefore, establishes both sensory and association cortex as core features of the movie-watching experience", "Our study supports the hypothesis that perceptual coupling between the brain and external input is a core feature of how we make sense of events in movies").

      Thank you for this comment. In this revision we have attempted to extend the theoretical significance of our work in the discussion (for example, in contrasting the links between Intrusive distraction and the other components). To this end we have amended the text in this revision by including the following sections:

      Page 33-35 [654-687]: Importantly, our study provides a novel method for answering these questions and others regarding the brain basis of experiences during films that can be applied simply and cost-effectively. As we have shown, mDES can be combined with existing brain activity, allowing information about both brain activity and experience to be determined at a relatively low cost. For example, the cost-effective nature of our paradigm makes it an ideal way to explore the relationship between cognition and neural activity during movie-watching across different genres of film. In neuroimaging, conclusions are often made using one film in naturalistic paradigm studies [16]. Although the current study only used three movie clips, restraining our ability to form strong conclusions regarding how different patterns of thought relate to specific genres of film, in the future, it will be possible to map cognition across a more extensive set of movies and discern whether there are specific types of experience that different genres of films engage. One of the major strengths of our approach, therefore, is the ability to map thoughts across groups of participants across a wide range of movies at a relatively low cost.

      Nonetheless, this paradigm is not without limitations. This is the first study, as far as we know, that attempts to compare experiential reports in one sample of participants with brain activity in a second set of participants, and while the utility of this method enables us to understand the relationship between thought and brain activity during movies, it will be important to extend our analysis to mDES data during movie watching while brain activity is recorded. In addition, our study is correlational in nature, and in the future, it could be useful to generate a more mechanistic understanding of how brain activity maps onto the participants’ experience. Our analysis shows that mDES is able to discriminate between films, highlighting its broad sensitivity to variation in semantic or affective content. Armed with this knowledge, we propose that in the future, researchers could derive mechanistic insights into how the semantic features may influence the mDES data. For example, it may be possible to ask participants to watch movies in a scrambled order to understand how the structure of semantic or information influences the mapping between brains and ongoing experience as measured by mDES. Finally, our study focused on mapping group-level patterns of experience onto group-level descriptions of brain activity. In the future it may be possible to adopt a “precision-mapping” approach by measuring longer periods of experience using mDES and determining how the neural correlates of experience vary across individuals who watched the same movies while brain activity was collected [1]. In the future, we anticipate that the ease with which our method can be applied to different groups of individuals and different types of media will make it possible to build a more comprehensive and culturally inclusive understanding of the links between brain activity and movie-watching experience.

      ●      The beginning of the discussion is very clear and explains the study very well. Some of it could be brought up in the intro/analysis goal sections.

      Thank you for this comment, this is an excellent idea. We have revisited the introduction and analysis goals section to mirror this clarity across the manuscript.

      ●      The different components are very interesting, and not entirely clear. Some examples in the text could help. Especially regarding your thought that verbal components would refer to a "decoupled" mental verbal analysis participants might be performing in their thoughts.

      Thank you for this point. We would prefer not to elaborate on this point since, at present, it would simply be conjecture based on our correlational design. However, we have included a section in the discussion which explains how, in principle, we would draw more mechanistic conclusions (for example, by shuffling the order of scenes in a movie as suggested by another reviewer). In the current revision, we have amended the text in the following way:

      Page 34 [674-679]: Our analysis shows that mDES is able to discriminate between films, highlighting its broad sensitivity to variation in semantic or affective content. Armed with this knowledge, we propose that in the future, researchers could derive mechanistic insights into how the semantic features may influence the mDES data. For example, it may be possible to ask participants to watch movies in a scrambled order to understand how the structure of semantic or information influences the mapping between brains and ongoing experience as measured by mDES.

      ●      The reference to using neurosynth as performing a meta-analysis seems a little stretched.

      We have adjusted the manuscript to remove ‘meta-analysis’ when referring to the analysis computed with neurosynth. Thank you for bringing this to our attention.

      ●      State-space is defined as brain-space in the methods.

      Thank you, we have since updated this.

      ●      It could be useful to remind the reader what thought and brain spaces are at the top of the state-space results section.

      This is an excellent point, and it has since been updated to remind the reader of thought- and brain-space. Thank you for this comment.

      Page 24 [458-467]: Our next analysis used a “state-space” approach to determine how brain activity at each moment in the film predicted the patterns of thoughts reported at these moments (for prior examples in the domain of tasks, see [12, 17]; see Methods). In this analysis, we used the coordinates of the group average of each TR in the “brain-space” and the coordinates of each experience sampling moment in the “thought-space.” To clarify, the location of a moment in a film in “brain-space” is calculated by projecting the grand mean of brain activity for each volume of each film against the first five dimensions of brain activity from a decomposition of the Human Connectome Project (HCP) resting state data, referred to as Gradients 1-5. “Thought-space” is the decomposition of mDES items to create thought pattern components, referred to as “Episodic Knowledge”, “Intrusive Distraction”, “Verbal Detail” and “Sensory Engagement.”
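
      As an illustration only, the “brain-space” projection described above can be sketched in a few lines. This is not the authors' code: the passage does not state how the projection is computed, so the sketch assumes a spatial Pearson correlation between each group-averaged volume and each gradient map. The function name `project_to_brain_space` and the synthetic arrays are hypothetical.

```python
import numpy as np

def project_to_brain_space(volumes, gradients):
    """Return a (n_volumes, n_gradients) array of spatial Pearson correlations.

    volumes:   (n_volumes, n_voxels) group-averaged activity, one row per TR
    gradients: (n_gradients, n_voxels) gradient maps (e.g. Gradients 1-5)
    Assumption: "projection" is taken to mean spatial correlation.
    """
    # Centre each map across voxels, then scale to unit length, so that
    # the dot product of two rows equals their Pearson correlation.
    v = volumes - volumes.mean(axis=1, keepdims=True)
    g = gradients - gradients.mean(axis=1, keepdims=True)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    g = g / np.linalg.norm(g, axis=1, keepdims=True)
    return v @ g.T

# Synthetic stand-ins for real data: 5 gradient maps over 1000 voxels.
rng = np.random.default_rng(0)
gradients = rng.standard_normal((5, 1000))
# Three "volumes": one identical to Gradient 1, one the inverse of
# Gradient 4, and one unrelated to any gradient.
volumes = np.vstack([gradients[0], -gradients[3], rng.standard_normal(1000)])
coords = project_to_brain_space(volumes, gradients)
```

      Each row of `coords` is the five-dimensional location of one volume. A volume identical to a gradient map sits at +1 on that dimension, and its inverse sits at -1.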

      ●      DF missing from the t-test for episodic knowledge/grad 4.

      Thank you for catching this, the degrees of freedom have since been included in this revision.

      Page 24 [474-476]: First, we found a significant main effect of Gradient 4 (DAN to Visual), which predicted the similarity of answers to the “Episodic Knowledge” component, t(2046) = 2.17, p = .013, η2 = .01.

      Public Reviews:

      Reviewer #1:

      ●      The lack of direct interrogation of individual differences/reliability of the mDES scores warrants some pause.

      Our study's goal was to understand how group-level patterns of thought in one group of participants relate to brain activity in a different group of participants. To this end, we decomposed trial-level mDES data to show dimensions that are common across individuals, which demonstrated excellent split-half reliability. Then we used these data in two complementary ways. First, we established that these ratings reliably distinguished between the different films (showing that our approach is sensitive to manipulations of semantic and affective features in a film) and that these group-level patterns were also able to predict patterns of brain activity in a different group of participants (suggesting that mDES dimensions are also sensitive to broad differences in how brain activity emerges during movie watching). Second, we established that variation across individuals in their mDES scores predicted their comprehension of information from the films. This establishes that when applied to movie-watching, mDES is sensitive to individual differences in the movie-watching experience (as determined by an individual's comprehension). Given the success of this study and the relative ease with which mDES can be performed, it will be possible in the future to conduct mDES studies that home in on the common and distinct features of the movie-watching experience.
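
      For readers unfamiliar with split-half reliability of a decomposition, a minimal sketch follows. It is not the authors' procedure, which is not described in this response. The sketch assumes a plain PCA on z-scored item ratings, a random split of observations into two halves, and the absolute correlation between matching component loadings as the reliability index. All function names and the synthetic ratings are hypothetical.

```python
import numpy as np

def pca_loadings(data, n_components):
    """Loadings (items x components) from a PCA of z-scored item ratings."""
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return vt[:n_components].T

def split_half_reliability(data, n_components, rng):
    """Correlate component loadings obtained from two random halves."""
    order = rng.permutation(len(data))
    half_a, half_b = data[order[::2]], data[order[1::2]]
    load_a = pca_loadings(half_a, n_components)
    load_b = pca_loadings(half_b, n_components)
    # Absolute value, because the sign of a principal component is arbitrary.
    return [abs(np.corrcoef(load_a[:, k], load_b[:, k])[0, 1])
            for k in range(n_components)]

# Synthetic ratings: 400 probes x 16 items driven by two latent dimensions,
# the first loading on 10 items and the second on the remaining 6.
rng = np.random.default_rng(0)
n_obs = 400
factors = rng.standard_normal((n_obs, 2))
pattern = np.zeros((2, 16))
pattern[0, :10] = 1.0
pattern[1, 10:] = 1.0
ratings = factors @ pattern + 0.5 * rng.standard_normal((n_obs, 16))
reliability = split_half_reliability(ratings, 2, rng)
```

      Values of `reliability` close to 1 mean that both halves recover essentially the same components. This sketch matches components by their order, which only works when they are well separated in explained variance; components with similar variance would need an explicit matching step.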

      Reviewer #2:

      (1) The dimensions of thought seem to distinguish between sensory and executive processing states. However, it is unclear if this effect primarily pertains to thinking. I could imagine highly intrusive distractions in movie segments to correlate with stagnating plot development, little change in scenery, or incomprehensible events. Put differently, it may primarily be the properties of the movies that evoke different processing modes, but these properties are not accounted for. For example, I'm wondering whether a simple measure of engagement with stimulus materials could explain the effects just as much. How can the effects of thinking be distinguished from the perceptual and semantic properties of the movie, as well as attentional effects? Is the measure used here capturing thought processes beyond what other factors could explain?

      Our study used mDES to identify four distinct components of experience, each of which had distinct behavioural and neural correlates and relationships to comprehension. Together this makes it unlikely that a single measure of engagement would be able to capture the range of effects we observed in our study. For example, “Intrusive Distraction” was associated with regions of association cortex, while the other three components highlighted regions of sensory cortex. Behaviorally, we found that some components had a common effect on comprehension (e.g. “Intrusive distraction” was related to worse comprehension across all films), while others were linked to clear benefits to comprehension in specific films (e.g. “Episodic Knowledge” was associated with better comprehension in only one of the films). Given the complex nature of these effects, it would be difficult for a single metric of engagement to explain this pattern of results, and even if it did, this could be misleading because our analysis implies that they are better explained by a model of movie-watching experience in which there are several relatively orthogonal dimensions upon which our experience can vary.

      At the same time, we also found that films vary in the general types of experience they can engender. For example, Citizenfour was high on “Intrusive Distraction” and participants performed relatively low on comprehension. This shows that manipulations of the semantic and affective content of films also have implications for the movie-watching experience. This pattern is consistent with laboratory studies that applied mDES during tasks and found that different tasks evoke different types of experience (for example, patterns of ‘intrusive’ thoughts were common in movie clips that were suspenseful [18]). At the same time, in the same study, patterns of intrusive thought across the tasks were also associated with trait levels of dysphoria reported by participants. Other studies using mDES in daily life have shown that the data can be described by multiple dimensions and that each of these types of thought is more prevalent in certain activities than others [19]. For example, in daily life, patterns of ‘intrusive distraction’ thoughts were more prevalent when individuals were engaged in activities that were relatively unengaging (such as resting). Collectively, therefore, studies using mDES suggest that it is likely that human thought is multidimensional in nature and that these dimensions vary in a complex way in terms of (a) the contexts that promote them, and (b) how they are impacted by features of the individual (whether they be traits like anxiety or depression or memory for information in a film).

      (2) I'm skeptical about taking human thought ratings at face value. Intrusive distraction might imply disengagement from stimulus materials, but it could also be an intended effect of the movie to trigger higher-level, abstract thinking. Can a label like intrusive distraction be misleading without considering the actual thought and movie content?

      Our method uses a data-driven approach to identify the dimensions that best describe the range of answers that our participants provided to describe their experience. We use these dimensions to understand how these patterns of thought emerge in different contexts and how they vary across individuals (in this case, in different movies, but in other studies, laboratory tasks [3, 8, 9, 12, 20-22] or activities in daily life [6, 7]). These context relationships help constrain interpretations of what the components mean. For example, “Intrusive Distraction” scores were highest in the film with the most real-world significance for the participants (Citizenfour) and were associated with worse comprehension. In daily life, however, patterns of “Intrusive Distraction” thoughts tend to occur when individuals engage in non-demanding activities, like resting. Psychological perspectives on spontaneously arising thoughts accommodate both observations: there is evidence that such thoughts occur in non-demanding tasks with no semantic content (when there is almost no external stimulus to explain the occurrence of the experience, see [23]); however, other studies have shown that specific cues in the environment can also cue the experience (see [23]). Consistent with this perspective, and our current data, patterns of ‘Intrusive Distraction’ thought are likely to arise for multiple reasons, some of which are more intrinsic in nature (the general association with poor comprehension across all films) and others which are extrinsic in nature (the elevation of intrusive distraction in Citizenfour).

      It is also important to note that our data-driven approach also found patterns of experience that provide more information about the content of participants' experience. For example, the dimension of “Episodic Knowledge” is characterized by thoughts based on prior knowledge, involving the past, and concerning oneself, and was most prevalent in the romance film (500 Days of Summer). Likewise, “Sensory Engagement” was associated with experiences related to sensory input and positive emotionality, occurred more during the romance movie (500 Days of Summer) than in the documentary (Citizenfour), and was linked to increased brain activity across the sensory systems. This shows that mDES can also provide information about the content of that experience, and discriminate between different sources of experience. In the future, it will be possible to improve the level of detail regarding the content of experiences by changing the questions used to interrogate experience.

      (3) A jittered sampling approach is used to acquire thought ratings every 15 seconds. Are ratings for the same time point averaged across participants? If so, how consistent are ratings among participants? High consistency would suggest thoughts are mainly stimulus-evoked. Low consistency would question the validity of applying ratings from one (group of) participant(s) to brain-related analyses of another participant.

      In this experiment, we sampled experience every 15 seconds in each clip, and in each sampling epoch, we gained mDES responses from eight participants. Furthermore, no participant was sampled at an adjacent time point, as our approach jittered probes approximately 2 minutes apart (see Supplementary Figure 7). To illustrate the consistency of mDES data, we have included an additional figure (Figure 3) highlighting how experience varies over time in each clip. It is evident from these plots that there are distinct moments in which group-averaged reported thoughts across participants are stable and that these can extend across adjacent sampling points (i.e. when the confidence intervals of the score at a timepoint do not overlap with zero). Therefore, in some cases, adjacent sampling points, consisting of different sets of eight participants, describe their experiences as having similar positions on the same mDES dimension. This suggests that there is agreement among individuals regarding how they experienced a specific moment in a film, and in some cases, this agreement was apparent in successive sets of eight participants. Together, our findings indicate a conservation of agreement across participants that spans multiple moments in a film. A clear example of agreement on experience across multiple sets of eight participants can be seen between 150-400 seconds in the clip from 500 Days of Summer for the dimension of “Sensory Engagement” (time series plot 4 in Figure 3).
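
      The jittered design can be made concrete with a small scheduling sketch. It is a hypothetical reconstruction, not the authors' protocol: it assumes an 11-minute clip with no probes in the first minute, five probes per participant per clip spaced exactly 120 seconds apart, and jitter conditions staggered in 15-second steps. The constants and the function `probe_times` are illustrative names.

```python
FIRST_PROBE_S = 60     # assumption: sampling starts after the first minute
EPOCH_S = 15           # one sampling epoch every 15 seconds
PROBE_GAP_S = 120      # assumption: exact 2-minute gap between a participant's probes
PROBES_PER_CLIP = 5    # assumption: five probes per participant per clip

def probe_times(condition):
    """Probe times in seconds for one jitter condition, numbered from 0."""
    offset = FIRST_PROBE_S + condition * EPOCH_S
    return [offset + i * PROBE_GAP_S for i in range(PROBES_PER_CLIP)]

# Staggering conditions by one epoch each fills every 15-second slot.
n_conditions = PROBE_GAP_S // EPOCH_S
schedule = {c: probe_times(c) for c in range(n_conditions)}

# Every epoch that receives a probe under some condition, in time order.
all_epochs = sorted(t for times in schedule.values() for t in times)
```

      Under these assumptions each condition samples a different set of epochs, the conditions together cover every 15-second epoch of the sampled ten minutes exactly once, and no participant is probed at two adjacent epochs. The number of participants per epoch then depends only on how many participants are assigned to each condition.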

      (4) Using three different movies to conclude that different genres evoke different thought patterns (e.g., line 277) seems like an overinterpretation with only one instance per genre.

      We found that mDES was able to distinguish between each film on at least one dimension of experience. In other words, information encoded in the mDES dimensions was sensitive to variation in semantic and affective experiences in the different movie clips. This provides evidence that is necessary but not sufficient to conclude that we can distinguish different genres of films (i.e. if we could not distinguish between films, then we would not be able to distinguish genres). However, it is correct that, to begin answering the broader question about experiences in different genres, it would be necessary to map cognition across a larger set of movies, ideally with multiple examples of each genre.

      (5) I see no indication that results were cross-validated, and no effect sizes are reported, leaving the robustness and strength of effects unknown.

      Thank you for drawing this to our attention. We have re-run the LMMs and ANOVA models to include partial eta-squared values to clarify the strength of the effects in each of our reported outcomes.
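
      For reference, partial eta-squared is the share of variance attributable to an effect once other effects are set aside: SS_effect / (SS_effect + SS_error). The sketch below shows this standard definition and the equivalent conversion from an F statistic. It is a generic illustration with made-up numbers, not a reproduction of the values reported in the manuscript, and the function names are hypothetical. For linear mixed models the conversion from test statistics is only an approximation.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Partial eta-squared from sums of squares."""
    return ss_effect / (ss_effect + ss_error)

def partial_eta_squared_from_f(f_value, df_effect, df_error):
    """Partial eta-squared from an F statistic and its degrees of freedom.

    For a t statistic on a single contrast, pass f_value = t ** 2 and
    df_effect = 1.
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Hypothetical example: an effect with SS = 10 on 1 df, error SS = 90 on 90 df.
ss_effect, ss_error, df_effect, df_error = 10.0, 90.0, 1, 90
f_value = (ss_effect / df_effect) / (ss_error / df_error)

eta_from_ss = partial_eta_squared(ss_effect, ss_error)
eta_from_f = partial_eta_squared_from_f(f_value, df_effect, df_error)
```

      Both routes give the same value, which is a useful consistency check when only test statistics are reported.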

      Reviewer #3:

      ●      What are the considerations for treating high-order thought patterns that occur during film viewing as stable enough to be used across participants? What would be the limitations of this method? (Do all people reading this paper think comparable thoughts reading through the sections?)

      It is likely, based on our study, that films can evoke both stereotyped thought patterns (i.e. thoughts that many people will share) and others that are individualistic. It is clear that, in principle, mDES is capable of capturing empirical information on both stereotypical thoughts and idiosyncratic thoughts. For example, clear differences in experiences across films and, in particular, during specific periods within a film, show that movie-watching can evoke broadly similar thought patterns in different groups of participants (see Figure 3 right-hand panel). On the other hand, the association between comprehension and the different mDES components indicates that certain individuals respond to the same film clip in different ways and that these differences are rooted in objective information (i.e. their memory of an event in a film clip). A clear example of these more idiosyncratic features of movie watching experience can be seen in the association between “Episodic Knowledge” and comprehension. We found that “Episodic Knowledge” was generally high in the romance clip from 500 Days of Summer but was especially high for individuals who performed the best, indicating they remembered the most information. Thus good comprehenders responded to the 500 Days of Summer clip with responses that had more evidence of “Episodic Knowledge”. In the future, since the mDES approach can account for both stereotyped and idiosyncratic features of experience, it will be an important tool in understanding the common and distinct features that movie watching experiences can have, especially given the cost-effective manner with which these studies can be run.

      ●      How does this approach differ from collaborative filtering, (for example as presented in Chang et al., 2021)?

      Our approach is very similar to collaborative filtering, since it uses something akin to crowd-sourcing as a tool for understanding brain activity. One of its strengths is its generalizability: mDES is not limited to movie-watching and can be used to understand cognition more broadly. We can use the same mDES method to sample cognition in multiple situations in daily life [6, 19], while performing tasks in the behavioural lab [18, 24], and while brain activity is being acquired [8, 25, 26]. In principle, therefore, we can use mDES to understand cognition in different contexts in a common analytic space (see [27] for an example of how this could work).

      Page 5 [106-110]: In our study, we acquired experiential data in one group of participants while watching a movie clip and used these data to understand brain activity recorded in a second set of participants who watched the same clip and for whom no experiential data was recorded. This approach is similar to what is known as “collaborative filtering” [28].

      ●      In conclusion, this study tackles a highly interesting subject and does it creatively and expertly. It fails to discuss and establish the utility and appropriateness of its proposed method.

      Thank you very much for your feedback and critique. In our revision and our responses to these questions, we provided more information about the method's robustness, utility, and application to understanding cognition.

      References

      (1) Gordon, E.M., et al., Precision Functional Mapping of Individual Human Brains. Neuron, 2017. 95(4): p. 791-807.e7.

      (2) Smallwood, J., et al., The neural correlates of ongoing conscious thought. iScience, 2021. 24(3).

      (3) Konu, D., et al., Exploring patterns of ongoing thought under naturalistic and conventional task-based conditions. Consciousness and Cognition, 2021. 93.

      (4) Smallwood, J., et al., The default mode network in cognition: a topographical perspective. Nature Reviews Neuroscience, 2021. 22(8): p. 503-513.

      (5) Turnbull, A., et al., Age-related changes in ongoing thought relate to external context and individual cognition. Consciousness and Cognition, 2021. 96: p. 103226.

      (6) McKeown, B., et al., The impact of social isolation and changes in work patterns on ongoing thought during the first COVID-19 lockdown in the United Kingdom. Proceedings of the National Academy of Sciences, 2021. 118(40): p. e2102565118.

      (7) Mulholland, B., et al., Patterns of ongoing thought in the real world. Consciousness and Cognition, 2023. 114: p. 103530.

      (8) Konu, D., et al., A role for the ventromedial prefrontal cortex in self-generated episodic social cognition. NeuroImage, 2020. 218: p. 116977.

      (9) Turnbull, A., et al., Left dorsolateral prefrontal cortex supports context-dependent prioritisation of off-task thought. Nature Communications, 2019. 10.

      (10) Ho, N.S.P., et al., Facing up to the wandering mind: Patterns of off-task laboratory thought are associated with stronger neural recruitment of right fusiform cortex while processing facial stimuli. NeuroImage, 2020. 214: p. 116765.

      (11) Karapanagiotidis, T., et al., Tracking thoughts: Exploring the neural architecture of mental time travel during mind-wandering. NeuroImage, 2017. 147: p. 272-281.

      (12) McKeown, B., et al., Experience sampling reveals the role that covert goal states play in task-relevant behavior. Scientific Reports, 2023. 13(1): p. 21710.

      (13) Vatansever, D., et al., Distinct patterns of thought mediate the link between brain functional connectomes and well-being. Network Neuroscience, 2020. 4(3): p. 637-657.

      (14) Wang, H.-T., et al., Dimensions of Experience: Exploring the Heterogeneity of the Wandering Mind. Psychological Science, 2017. 29(1): p. 56-71.

      (15) Aliko, S., et al., A naturalistic neuroimaging database for understanding the brain using ecological stimuli. Scientific Data, 2020. 7(1).

      (16) Yang, E., et al., The default network dominates neural responses to evolving movie stories. Nature Communications, 2023. 14(1): p. 4197.

      (17) Turnbull, A., et al., Reductions in task positive neural systems occur with the passage of time and are associated with changes in ongoing thought. Scientific Reports, 2020. 10(1): p. 9912.

      (18) Konu, D., et al., Exploring patterns of ongoing thought under naturalistic and conventional task-based conditions. Consciousness and Cognition, 2021. 93: p. 103139.

      (19) Mulholland, B., et al., Patterns of ongoing thought in the real world. Consciousness and Cognition, 2023. 114: p. 103530.

      (20) Christoff, K., et al., Experience sampling during fMRI reveals default network and executive system contributions to mind wandering. Proceedings of the National Academy of Sciences, 2009. 106(21): p. 8719-24.

      (21) Zhang, M., et al., Perceptual coupling and decoupling of the default mode network during mind-wandering and reading. eLife, 2022. 11: p. e74011.

      (22) Zhang, M.C., et al., Distinct individual differences in default mode network connectivity relate to off-task thought and text memory during reading. Scientific Reports, 2019. 9.

      (23) Smallwood, J. and J.W. Schooler, The science of mind wandering: Empirically navigating the stream of consciousness. Annual Review of Psychology, 2015. 66(1): p. 487-518.

      (24) Turnbull, A., et al., The ebb and flow of attention: Between-subject variation in intrinsic connectivity and cognition associated with the dynamics of ongoing experience. NeuroImage, 2019. 185: p. 286-299.

      (25) Turnbull, A., et al., Left dorsolateral prefrontal cortex supports context-dependent prioritisation of off-task thought. Nature Communications, 2019. 10(1): p. 3816.

      (26) McKeown, B., et al., Experience sampling reveals the role that covert goal states play in task-relevant behavior. Scientific Reports, 2023. 13(1): p. 21710.

      (27) Chitiz, L., et al., Mapping cognition across lab and daily life using experience-sampling. 2023.

      (28) Chang, L.J., et al., Endogenous variation in ventromedial prefrontal cortex state dynamics during naturalistic viewing reflects affective experience. Science Advances, 2021. 7(17): p. eabf7129.

    2. eLife Assessment

      This study presents a valuable methodological advancement in quantifying thoughts over time. A novel multi-dimensional experience-sampling approach is presented, identifying data-driven patterns that the authors use to interrogate fMRI data collected during naturalistic movie-watching. The experimentation is inventive and the analyses carried out are convincing.

    3. Reviewer #1 (Public review):

      Summary:

      The authors used a novel multi-dimensional experience sampling (mDES) approach to identify data-driven patterns of experience samples that they use to interrogate fMRI data collected during naturalistic movie-watching. They identify a set of multi-sensory features of a set of movies that delineate low-dimensional gradients of BOLD fMRI signal patterns that have previously been linked to fundamental axes of cortical organization.

      Strengths:

      * The novel solution to challenges associated with experience sampling offers potential access to aspects of experience that have been challenging to assess.

      Weaknesses:

      * The lack of direct interrogation of individual differences/reliability of the mDES scores warrants some pause.

    4. Reviewer #2 (Public review):

      Summary:

      The present study explores how thoughts map onto brain activity, a notoriously challenging question because of the dynamic, subjective, and abstract nature of thoughts. To tackle this question, the authors collected continuous thought ratings from participants watching a movie, and additionally made use of an open-source fMRI dataset recorded during movie watching as well as five established gradients of brain variation as identified in resting state data. Using a voxel-space approach, the results show that episodic knowledge, verbal detail, and sensory engagement of thoughts commonly modulate visual and auditory cortex, while intrusive distraction modulates the frontoparietal network. Additionally, sensory engagement mapped onto a gradient from primary to association cortex, while episodic knowledge mapped onto a gradient from the dorsal attention network to visual cortex. Building on the association between behavioral performance and neural activation, the authors conclude that sensory coupling to external input and frontoparietal executive control are key to comprehension in naturalistic settings.

      The manuscript stands out for its methodological advancements in quantifying thoughts over time and its aim to study the implementation of thoughts in the brain during naturalistic movie watching. However, the conceptualization of thoughts remains vague, limiting the study's insights into brain function.

      Strengths:

      (1) The study raises a question that has been difficult to study in naturalistic settings so far but is key to understanding human cognition, namely how thoughts map onto brain activation.

      (2) The thought ratings introduce a novel method for continuously tracking thoughts, promising utility beyond this study.

      (3) The authors used diverse data types, metrics, and analyses to substantiate the effects of thinking from multiple perspectives.

      Weaknesses:

      (1) The distinction between thinking and stimulus processing (in the sense of detecting and assigning meaning to features, modulated by factors such as attention) remains unclear. Is "thinking" a form of conscious access or a reportable read-out from sensory and higher-level stimulus processing? Or does it simply refer to the method used here to identify different processing states?

      (2) The dimensions of thought appear to be directly linked to brain areas traditionally associated with core faculties of perception and cognition. For example, superior temporal cortex codes for speech information, which is also where thought reports on verbal detail localize in this study. This raises the question of whether the present study truly captures mechanisms specific to thinking and distinct from processing, especially given that individual variations in reports were not considered and movie-specific features were not controlled for.

    5. Reviewer #3 (Public review):

      This study attempted to investigate the relations between processing in the human brain during movie watching and corresponding thought processes. This is a highly interesting question, as movie watching presents a semi-constrained task, combining naturally occurring thoughts and common processing of sensory inputs across participants. This task is inherently difficult because in order to know what participants are thinking at any given moment, one has to interrupt the same thought process which is the object of study.

      This study attempts to deal with this issue by aggregating staggered experience sampling data across participants in one behavioral study and using the population level thought patterns to model brain activity in different participants in an open access fMRI dataset.

      The behavioral data come from 120 participants who watched three 11-minute movie clips. Participants responded to the mDES questionnaire (16 visual scales characterizing ongoing thought) five times, two minutes apart, in each clip. The 16 items are first reduced to 4 factors using PCA, and their levels are compared across the different movies. The factors are "episodic knowledge", "intrusive distraction", "verbal detail", and "sensory engagement". The factors differ between the clips; distraction is negatively correlated with movie comprehension, whereas sensory engagement is positively correlated with it.

      The components are aggregated across participants (transforming single subject mDES answers into PCA space and concatenating responses of different participants) and are used as regressors in a GLM analysis. This analysis identifies brain regions corresponding to the components. The resulting brain maps reveal activations that are consistent with the proposed mental processes (e.g. negative loading for intrusion in frontoparietal network, positive loadings for visual and auditory cortices for sensory engagement).

      Then, the coordinates for brain regions which were significant for more than one component are entered into a paper search in neurosynth. It is not clear what this analysis demonstrates beyond the fact that sensory engagement contained both visual and auditory components.

      The next analysis projected group-averaged brain activation onto gradients (based on previous work) and used gradient timecourses to predict the behavioral report timecourses. This revealed that high activations in gradient 1 (sensory→association) predicted high sensory engagement, and that "episodic knowledge" thought patterns were predicted by increased visual cortex activations. Then, permutation tests were performed to see whether these thought pattern related activations corresponded to well defined regions on a given cluster.

      This paper is framed as presenting a new paradigm, but it does little to discuss what purpose this paradigm serves, what its limitations are, and how it should have been tested. The novelty appears to be in using experience sampling from one sample to model the responses of a second sample.

      What are the considerations for treating high-order thought patterns that occur during film viewing as stable enough to use across participants? What would be the limitations of this method? (Do all people reading this paper think comparable thoughts reading through the sections?) This is briefly discussed in the revised manuscript and generally treated as an opportunity rather than as a limitation.

      In conclusion, this study tackles a highly interesting subject and does it creatively and expertly. It fails to discuss and establish the utility and appropriateness of its proposed method.

    1. eLife Assessment

      This study presents an important finding on the involvement of a Caspase 3-dependent pathway in the elimination of synapses for retinogeniculate circuit refinement and eye-specific territory segregation. This work fits well with the concept of "synaptosis" which has been proposed in the past but lacked in vivo support. Despite its elegant design and many strengths, the evidence supporting the claims of the authors is incomplete, particularly regarding whether Caspase-3 expression can really be isolated to synapses vs locally dying cells, whether microglia direct or instruct synapse elimination, and whether astrocytes are also involved. The work will be of interest to investigators studying cell death pathways, neurodevelopment, and neurodegenerative disease.

    2. Reviewer #1 (Public Review):

      In this manuscript, the authors study the effects of synaptic activity on the process of eye-specific segregation, focusing on the role of caspase 3, classically associated with apoptosis. The method for synaptic silencing is elegant and requires intrauterine injection of a tetanus toxin light chain into the eye. The authors report that this silencing leads to increased caspase 3 in the contralateral eye (Figure 1) and demonstrate evidence of punctate caspase 3 that does not overlap neuronal markers like map2. However, the quantifications showing increased caspase 3 in the silenced eye (done at P5) are complicated by overlap with the signal from entire dying cells in the thalamus. The authors also show that global caspase 3 deficiency impairs the process of eye-specific segregation and circuit refinement (Figures 3-4).

      The authors also report that "synapse weakening-induced caspase-3 activation determines the specificity of synapse elimination mediated by microglia but not astrocytes" (abstract). They report that microglia engulf fewer RGC axon terminals in caspase 3 deficient animals (Figure 5), and that this preferentially occurs in silenced terminals, but this preferential effect is lost in caspase 3 knockouts. Based on this, the authors conclude that caspase 3 directs microglia to eliminate weaker synapses. However, a much simpler and critical experiment that the authors did not perform is to eliminate microglia and show that the caspase 3 dependent effects go away. Without this experiment, there is no reason to assume that microglia are directing synaptic elimination.

      Finally, the authors also report that caspase 3 deficiency alters synapse loss in 6-month-old female APP/PS1 mice, but this is not really related to the rest of the paper.

    3. Reviewer #2 (Public Review):

      Summary:

      This manuscript by Yu et al. demonstrates that activation of caspase-3 is essential for synapse elimination by microglia, but not by astrocytes. This study also reveals that caspase-3 activation-mediated synapse elimination is required for retinogeniculate circuit refinement and eye-specific territory segregation in dLGN in an activity-dependent manner. Inhibition of synaptic activity increases caspase-3 activation and microglial phagocytosis, while caspase-3 deficiency blocks microglia-mediated synapse elimination and circuit refinement in the dLGN. The authors further demonstrate that caspase-3 activation mediates synapse loss in AD: loss of caspase-3 prevented synapse loss in AD mice. Overall, this study reveals that caspase-3 activation is an important mechanism underlying the selectivity of microglia-mediated synapse elimination during brain development and in neurodegenerative diseases.

      Strengths:

      A previous study (Gyorffy B. et al., PNAS 2018) has shown that caspase-3 signal correlates with C1q tagging of synapses (mostly using in vitro approaches), which suggests that caspase-3 would be an underlying mechanism of microglial selection of synapses for removal. The current study provides direct in vivo evidence demonstrating that caspase-3 activation is essential for microglial elimination of synapses in both brain development and neurodegeneration.

      The paper is well-organized and easy to read. The schematic drawings are helpful for understanding the experimental designs and purposes.

      Weaknesses:

      It seems that astrocytes contain large amounts of engulfed materials from ipsilateral and contralateral axon terminals (Figure S11B) and that caspase-3 deficiency also decreased the volume of engulfed materials by astrocytes (Figures S11C, D). So the possibility that astrocyte-mediated synapse elimination contributes to circuit refinement in dLGN cannot be excluded.

      Does blocking single or dual inactivation of synapse activity (using TeTxLC) increase microglial or astrocytic engulfment of synaptic materials (of one or both sides) in dLGN?

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment 

      This study presents an important finding on the involvement of a Caspase 3-dependent pathway in the elimination of synapses for retinogeniculate circuit refinement and eye-specific territory segregation. This work fits well with the concept of "synaptosis" which has been proposed in the past but lacked in vivo support. Despite its elegant design and many strengths, the evidence supporting the claims of the authors is incomplete, particularly regarding whether Caspase-3 expression can really be isolated to synapses vs locally dying cells, whether microglia direct or instruct synapse elimination, and whether astrocytes are also involved. The work will be of interest to investigators studying cell death pathways, neurodevelopment, and neurodegenerative disease.

      Regarding significance:

      This study provides in vivo evidence that caspase-3 is important for synapse elimination in the visual pathway (Figure 3 and 4) and corroborates the previously proposed but not yet validated “synaptosis” hypothesis. But more significantly, we show that caspase-3 is activated in dLGN relay neurons in response to synapse inactivation (Figure 1) when synaptic competition is present (Figure 2), and that caspase-3 is important for efficient elimination of weakened synapses by microglia (Figure 5 and 6). We consider the causal link between synapse weakening/inactivation and caspase-3 activation to be the most important finding of this study and believe it is an error to not include this aspect of the study in the assessment. The mechanism by which neuronal activity influences synapse elimination is a fundamental question in neuroscience, and our study presents a significant advancement in understanding this problem.

      Regarding strength of evidence:

      We do not agree with the assessment that our evidence should be broadly labeled as “incomplete”. In fact, we argue that many concerns raised by the reviewers are not focused on the main claims made in this study.

      (1) Regarding whether caspase-3 activation (not “expression”, which is the term used in the assessment) is isolated to synapses or occurs in entire cells, we show in Figure 1 that both types of signals can be present. The main concern of the reviewers seems to be that activated caspase-3 signals in apoptotic dLGN relay neurons are irrelevant to our analysis and confound interpretation. We argue that this is not the case.

      In Figure 1, we have two sets of controls demonstrating that the observed apoptosis of dLGN relay neurons occurs specifically in response to synapse inactivation. For each animal that received TeTxLC injection in the right eye, activated caspase-3 signal is compared between the left dLGN, where most of the inactivated synapses are located, and the right dLGN, where the minority of the inactivated synapses are located (between Figure 1B and 1C, also between the first and second group of Figure 1E). We observed apoptotic neurons in the left dLGN with more inactivated synapses but not in the right dLGN with fewer inactivated synapses. The second control is between TeTxLC-injected animals (Figure 1B) and mock-injected animals (Figure 1D). We observed apoptotic relay neurons in the dLGN of TeTxLC-injected animals (Figure 1B) but not mock-injected animals (Figure 1D). Both these controls show that the observed apoptosis of dLGN relay neurons is caused by synapse inactivation.

      In addition, in our synapse inactivation experiment (Figure 1), AAV-hSyn-TeTxLC is injected into the right eye and expressed only in RGCs, not in dLGN relay neurons. Since dLGN relay neurons in this experiment do not receive a perturbation that is independent of synaptic transmission, we conclude that their apoptosis occurs through synapse-dependent mechanisms.

      Furthermore, if the apoptotic neurons are confounding the analysis (as implied by reviewers and editors) and do not occur through synapse-dependent mechanisms, then inhibiting both eyes with TeTxLC (Figure 2C, rightmost group) should cause high levels of caspase-3 activation, like that in the single-inhibition condition. Instead, we observe the opposite (Figure 2C, middle group) – overall caspase-3 activity goes down significantly in the dual-inhibition condition and is closer to the unperturbed condition, which can be explained by a loss of interaction between “strong” and “weak” synapses. Taken together, our data demonstrate that apoptosis of relay neurons in Figure 1 occurs specifically in response to synapse inactivation through synapse-dependent mechanisms, and the activated caspase-3 signal in the neurons should be included in our analysis.

      Why does synaptic caspase-3 activation manifest in different forms: puncta, “blobs”, and cells? This is not surprising when considering the mechanisms that neurons must utilize to spatially confine caspase-3 activation and the nature of the apoptotic signaling cascade. On one hand, it has been proposed that caspase-3 activity in dendrites can be locally confined by proteasomal degradation of cleaved caspase-3 (Erturk et al., DOI: 10.1523/JNEUROSCI.3121-13.2014). On the other hand, caspase-3 activation is known to trigger explosive feedback amplification of apoptotic signaling events (McComb et al., DOI: 10.1126/sciadv.aau9433). For caspase-3 activation to remain localized to dendrites, the negative regulation must outweigh the positive feedback amplification. By expressing TeTxLC in RGCs of one eye, we create a strong perturbation that silences a large fraction of the synapses in the retinogeniculate pathway, which likely shifts the balance between positive and negative regulation of caspase-3 activity in some relay neurons. To be more specific, if a given dLGN relay neuron receives too many inactivated synapses, which is likely the case in our perturbation, caspase-3 activity that is initially localized can overwhelm the physiological negative regulation mechanisms that act to spatially confine it, resulting in whole-cell apoptosis. In fact, previous in vitro evidence (Erturk et al., DOI: 10.1523/JNEUROSCI.3121-13.2014) demonstrated that, while caspase-3 activation in a single distal dendrite can be locally contained, activating apoptosis signaling in dendrites proximal to the cell body can result in whole-cell apoptosis. Similarly, a few inactivated retinogeniculate synapses can elicit locally contained caspase-3 activity in dLGN relay neurons, but a large number of inactivated synapses on a single relay neuron may trigger sufficient caspase-3 activity to cause whole-cell apoptosis. We discussed how to interpret synapse inactivation-induced apoptosis in dLGN relay neurons both in the main text and in the discussion (lines 123-132 and lines 411-421).

      (2) Regarding microglia, we did not claim that “microglia direct or instruct synapse elimination”. Our main claim is that caspase-3 activation is important for efficient elimination of weakened synapses by microglia. This claim emphasizes a regulatory role for caspase-3 activation in microglia-mediated synapse elimination, but not a regulatory role of microglia in synapse elimination. To be more specific, our data suggest that lack of synaptic activity induces caspase-3 activity, and caspase-3 activity in turn influences which synapses are preferentially eliminated by microglia. Therefore, the elimination specificity is fundamentally determined (i.e. instructed) by neuronal activity, not by microglia. We also did not presume the manner in which microglia engage in synapse elimination. We specifically address this point in the discussion at lines 458 through 465, where we acknowledge that microglia may indirectly mediate synapse elimination by engulfing shed neuronal material. In our title and text, we use the phrase “microglia-mediated synapse elimination”, which is not the same as microglia-instructed synapse elimination and does not presume any instructive/directive role of microglia.

      (3) Regarding whether astrocytes are involved, we did not challenge the notion that astrocytes play important roles in synapse elimination. Rather, our claim is that, unlike what we observed with microglia, the amount of synaptic material engulfed by astrocytes does not robustly depend on whether caspase-3 is present. We acknowledge that there might be a caspase-3 dependent phenotype that we were unable to detect (line 309-310), and that it is plausible that astrocytes mediate activity-dependent synapse elimination through other caspase-3-independent mechanisms. This claim is not central to our study, and we would like to qualify the statements in the manuscript. We will remove the phrase “but not astrocytes” in line 18 of the abstract.

      In summary, using a state-of-the-art method to inactivate retinogeniculate synapses, we discovered a causal link between synapse weakening/inactivation and caspase-3 activation. Coupled with well-established in vivo assays (e.g., segregation analysis, electrophysiology, and engulfment analysis) that are used in many landmark studies we cite, we provide solid evidence supporting our claim that “caspase-3 is essential for synapse elimination driven by both spontaneous and experience-dependent neural activity”, and that “synapse weakening-induced caspase-3 activation determines the specificity of synapse elimination mediated by microglia”.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In this manuscript, the authors study the effects of synaptic activity on the process of eye-specific segregation, focusing on the role of caspase 3, classically associated with apoptosis. The method for synaptic silencing is elegant and requires intrauterine injection of a tetanus toxin light chain into the eye. The authors report that this silencing leads to increased caspase 3 in the contralateral eye (Figure 1) and demonstrate evidence of punctate caspase 3 that does not overlap neuronal markers like map2. However, the quantifications showing increased caspase 3 in the silenced eye (done at P5) are complicated by overlap with the signal from entire dying cells in the thalamus. The authors also show that global caspase 3 deficiency impairs the process of eye-specific segregation and circuit refinement (Figures 3-4). 

      The reviewer states: “this silencing leads to increased caspase 3 in the contralateral eye”. We observed increased caspase-3 activity, not protein levels, in the contralateral dLGN, not eye.

      The reviewer states: “and demonstrate evidence of punctate caspase 3 that does not overlap neuronal markers like map2”. This is not accurate. We show that the punctate active caspase-3 signals overlap with the dendritic marker MAP2 (Figure S4A).

      The reviewer states: “, the quantifications showing increased caspase 3 [activity] in the silenced [dLGN] (done at P5) are complicated by overlap with the signal from entire dying cells in the thalamus”. This is not accurate. The apoptotic neurons we observed are relay neurons located in the dLGN (confirmed by their morphology and positive staining of NeuN – Figure S4B-C), not “cells” of unknown lineage (as suggested by the reviewer) in the general “thalamus” area (as suggested by the reviewer). If the dying cells were non-neuronal cells, that would indeed confound our quantification and conclusions, but that is not the case.

      We argue that the active caspase-3 signals in apoptotic dLGN relay neurons are not a confounding factor but a bona fide response to synaptic silencing and therefore should be included in the quantification. We have two sets of controls (please also see the general response above): one is between the strongly inactivated dLGN and the weakly inactivated dLGN in each TeTxLC-injected animal, and the other is between the dLGN of TeTxLC-injected animals and that of mock-injected animals. In both controls, only the dLGN receiving strong synapse inactivation has these apoptotic dLGN relay neurons, demonstrating that these cells occur as a consequence of synapse inactivation. It is also unlikely that our perturbation is causing cell death through a non-synaptic mechanism. As mock injections do not cause apoptosis in dLGN neurons, this phenomenon is not related to surgical damage. TeTxLC is injected into the eyes and only expressed in presynaptic RGCs, not in postsynaptic relay neurons, so this phenomenon is also unlikely to be caused by TeTxLC-related toxicity. Furthermore, if apoptosis of dLGN relay neurons is not related to synapse inactivation, then when TeTxLC is injected into both eyes, one would expect to see either the same number of apoptotic relay neurons or more, but we instead observed a reduction in dLGN neuron apoptosis, suggesting a synapse-related mechanism must be responsible. Considering the above, apoptosis of relay neurons in TeTxLC-inactivated dLGN is causally linked to synapse inactivation, and active caspase-3 signals in these neurons are true signals that should be included in the quantification.

      The authors also report that "synapse weakening-induced caspase-3 activation determines the specificity of synapse elimination mediated by microglia but not astrocytes" (abstract). They report that microglia engulf fewer RGC axon terminals in caspase 3 deficient animals (Figure 5), and that this preferentially occurs in silenced terminals, but this preferential effect is lost in caspase 3 knockouts. Based on this, the authors conclude that caspase 3 directs microglia to eliminate weaker synapses. However, a much simpler and critical experiment that the authors did not perform is to eliminate microglia and show that the caspase 3 dependent effects go away. Without this experiment, there is no reason to assume that microglia are directing synaptic elimination. 

      The reviewer states: “microglia engulf fewer RGC axon terminals in caspase 3 deficient animals (Figure 5), and that this preferentially occurs in silenced terminals, but this preferential effect is lost in caspase 3 knockouts”. We are not sure what the reviewer means by “this preferentially occurs in silenced terminals”. Our results show that microglia preferentially engulf silenced terminals, and such preference is lost in caspase-3 deficient mice (Figure 6).

      We do not understand the experiment the reviewer suggested: “eliminate microglia and show that the caspase 3 dependent effects go away”. To quantify caspase-3 dependent engulfment of synaptic material by microglia or preferential engulfment of silenced terminals by microglia, microglia must be present in the tissue sample. If we eliminate microglia, neither of these measurements can be made. What could be measured if microglia are eliminated is the refinement of the retinogeniculate pathway. This experiment would test whether microglia are required for caspase-3 dependent phenotypes. This is not a claim made in the manuscript. Instead, we claimed caspase-3 is required for microglia to preferentially eliminate weak synapses.

      We did not claim that “microglia are directing synaptic elimination”. Our claim is that synapse inactivation induces caspase-3 activity, and this caspase-3 activity in turn determines the substrate preference of microglia-mediated synapse elimination. Based on this model, it is the neuronal activity that fundamentally directs synapse elimination. Throughout the manuscript, we used the term “microglia-mediated synapse elimination”. This terminology does not assume a directive/instructive role of microglia in synapse elimination and only describes the observed engulfment of synaptic material by microglia. We also did not assume how microglia engage in synapse elimination. We acknowledge in the discussion (lines 458 through 465) that microglia may mediate synapse elimination in an indirect, passive way by engulfing shed neuronal material. This topic is a matter of debate in the field (Eyo et al., DOI: 10.1126/science.adh7906).

      Finally, the authors also report that caspase 3 deficiency alters synapse loss in 6-month-old female APP/PS1 mice, but this is not really related to the rest of the paper. 

      We respectfully disagree that Figure 7 is not related to the rest of the paper. Many genes involved in postnatal synapse elimination, such as C1q and C3, have been implicated in neurodegeneration. It is therefore natural and important to ask whether the function of caspase-3 in regulating synaptic homeostasis extends to neurodegenerative diseases in adult animals. The answer to this question may have broad therapeutic impacts.

      Reviewer #2 (Public Review): 

      Summary: 

      This manuscript by Yu et al. demonstrates that activation of caspase-3 is essential for synapse elimination by microglia, but not by astrocytes. This study also reveals that caspase-3 activation-mediated synapse elimination is required for retinogeniculate circuit refinement and eye-specific territory segregation in dLGN in an activity-dependent manner. Inhibition of synaptic activity increases caspase-3 activation and microglial phagocytosis, while caspase-3 deficiency blocks microglia-mediated synapse elimination and circuit refinement in the dLGN. The authors further demonstrate that caspase-3 activation mediates synapse loss in AD: loss of caspase-3 prevented synapse loss in AD mice. Overall, this study reveals that caspase-3 activation is an important mechanism underlying the selectivity of microglia-mediated synapse elimination during brain development and in neurodegenerative diseases.

      Strengths: 

      A previous study (Gyorffy B. et al., PNAS 2018) has shown that caspase-3 signal correlates with C1q tagging of synapses (mostly using in vitro approaches), which suggests that caspase-3 would be an underlying mechanism of microglial selection of synapses for removal. The current study provides direct in vivo evidence demonstrating that caspase-3 activation is essential for microglial elimination of synapses in both brain development and neurodegeneration.

      The paper is well-organized and easy to read. The schematic drawings are helpful for understanding the experimental designs and purposes. 

      Weaknesses: 

      It seems that astrocytes contain large amounts of engulfed materials from ipsilateral and contralateral axon terminals (Figure S11B) and that caspase-3 deficiency also decreased the volume of engulfed materials by astrocytes (Figures S11C, D). So the possibility that astrocyte-mediated synapse elimination contributes to circuit refinement in dLGN cannot be excluded.

      The experiments presented in Figure S11 aim to determine whether astrocyte-mediated synapse elimination depends on caspase-3 signaling. We do not claim that astrocytes are unimportant for synapse elimination or circuit refinement. We did observe a small decrease in synaptic material engulfed by astrocytes when caspase-3 is deficient, and we acknowledged that there could be defects that we were not able to detect (lines 309-310). The claim that caspase-3 does not regulate astrocyte-mediated synapse elimination is not a central claim of the manuscript and we will qualify our statements in the text. We will remove the phrase “but not astrocytes” in the abstract (line 18).

      Does blocking single or dual inactivation of synapse activity (using TeTxLC) increase microglial or astrocytic engulfment of synaptic materials (of one or both sides) in dLGN? 

      We assume that by “blocking single or dual inactivation of synapse activity”, the reviewer refers to inactivating retinogeniculate synapses from one or both eyes.

      We showed that inactivating retinogeniculate synapses from one eye (single inactivation) increases microglia-mediated engulfment of presynaptic terminals of inactivated synapses (Figure 6). We did not measure microglia-mediated engulfment of synaptic material while inactivating retinogeniculate synapses from both eyes (dual inactivation). However, based on the total active caspase-3 signal (Figure 2) in the dual inactivation scenario, we do not expect to see an increase in engulfment of synaptic material.

      We did not measure astrocyte-mediated engulfment with single or dual inactivation, as we did not see a robust caspase-3 dependent phenotype in astrocyte-mediated engulfment.

    1. eLife Assessment

      This valuable paper provides an unbiased landscape for the cerebellar cortical outputs to the brainstem nuclei. By conducting anatomical and physiological analyses of the axonal terminals of Purkinje cells, the data provide convincing evidence that Purkinje cells innervate brainstem nuclei directly. The results show that in addition to previously known inputs to vestibular and parabrachial nuclei, Purkinje cells synapse onto the pontine central grey nucleus but have little effect on the locus coeruleus and mesencephalic trigeminal neurons.

    2. Reviewer #1 (Public review):

      Summary:

      This paper is an incremental follow-up to the authors' recent paper which showed that Purkinje cells make inhibitory synapses onto brainstem neurons in the parabrachial nucleus which project directly to the forebrain. In that earlier paper, the authors used a mouse line that expresses the presynaptic marker synaptophysin in Purkinje cells to identify Purkinje cell terminals in the brainstem, and they observed labeled puncta not only in the vestibular and parabrachial nuclei, as expected, but also in neighboring dorsal brainstem nuclei, prominently the central pontine grey. The present study, motivated by the lack of thorough characterization of PC projections to the brainstem, uses the same mouse line to anatomically map the density of Purkinje cell synapses in dorsal brainstem nuclei, and a PC-specific channelrhodopsin mouse line to electrophysiologically assess their strength. The main findings are (1) the density of Purkinje cell synapses is highest in vestibular and parabrachial nuclei and correlates with the magnitude of evoked inhibitory synaptic currents, and (2) Purkinje cells also synapse in the central pontine grey nucleus but not in the locus coeruleus or mesencephalic nucleus.

      Strengths:

      The complementary use of anatomical and electrophysiological methods to survey the distribution and efficacy of Purkinje cell synapses on brainstem neurons in mouse lines that express markers and light-sensitive opsins specifically in Purkinje cells is the major strength of this study. By systematically mapping presynaptic terminals and light-evoked inhibitory postsynaptic currents in the dorsal brainstem, the authors provide convincing evidence that Purkinje cells do synapse directly onto pontine central grey and nearby neurons but do not synapse onto trigeminal motor or locus coeruleus neurons. Their results also confirm previously documented heterogeneity of Purkinje cell inputs to the vestibular nucleus and parabrachial neurons.

      Weaknesses:

      Although the study provides strong evidence that Purkinje cells do not make extensive synapses onto LC neurons, which is a helpful caveat given previous reports to the contrary, it falls short of providing the comprehensive characterization of Purkinje cell brainstem synapses which seemed to be the primary motivation of the study. The main information provided is a regional assessment of PC density and efficacy, which seems of limited utility given that we are not informed about the different sources of PC inputs, variations in the sizes of PC terminals, the subcellular location of synaptic terminals, or the anatomical and physiological heterogeneity of postsynaptic cell types. The title of this paper would be more accurate if "characterization" were replaced by "survey".

      Several of the study's conclusions are quite general and have already been made for vestibular nuclei, including the suggestions in the Abstract, Results, and Discussion that PCs selectively influence brainstem subregions and that PCs target cell types with specific behavioral roles.

    3. Reviewer #2 (Public review):

      Summary:

      While it is often assumed that the cerebellar cortex connects, via its sole output neuron, the Purkinje cell, exclusively to the cerebellar nuclei, axonal projections of the Purkinje cells to dorsal brainstem regions have been well documented. This paper provides comprehensive mapping and quantification of such extracerebellar projections of the Purkinje cells, most of which are confirmed with electrophysiology in slice preparation. A notable methodological strength of this work is the use of highly Purkinje cell-specific transgenic strategies, enabling selective and unbiased visualization of Purkinje terminals in the brainstem. By utilizing these selective mouse lines, the study offers compelling evidence challenging the general assumption that Purkinje cell targets are limited to the cerebellar nuclei. While the individual connections presented are not entirely novel, this paper provides a thorough and unambiguous demonstration of their collective significance. Regarding another major claim of this paper, "characterization of direct Purkinje cell outputs (Title)", however, the depth of electrophysiological analysis is limited to the presence/absence of physiological Purkinje input to postsynaptic brainstem neurons whose known cell types are mostly blinded. Overall, the conceptual advance is largely confirmatory or incremental, although it would be useful for the field to have the comprehensive landscape presented.

      Strengths:

      (1) Unsupervised comprehensive mapping and quantification of the Purkinje terminals in the dorsal brainstem are enabled, for the first time, by using the current state-of-the-art mouse lines, BAC-Pcp2-Cre and synaptophysin-tdTomato reporter (Ai34).

      (2) Combinatorial quantification with vGAT puncta and synaptophysin-tdTomato labeled Purkinje terminals clarifies the anatomical significance of the Purkinje terminals as an inhibitory source in each dorsal brainstem region.

      (3) Electrophysiological confirmation of the presence of physiological Purkinje synaptic input to 7 out of 9 dorsal brainstem regions identified.

      (4) Pan-Purkinje ChR2 reporter provides solid electrophysiological evidence to help understand the possible influence of the Purkinje cells onto LC.

      Weaknesses:

      (1) The present paper is largely confirmatory of what is presented in a previous paper published by the authors' group (Chen et al., 2023, Nat Neurosci). In this preceding paper, the authors' group used AAV1-mediated anterograde transsynaptic strategy to identify postsynaptic neurons of the Purkinje cells. The experiments performed in the present paper are, by nature, complementary to the AAV1 tracing which can also infect retrogradely and thus is not able to demonstrate the direction of synaptic connections between reciprocally connected regions. Anatomical findings are all consistent with the preceding paper. The likely absence of robust physiological connections from the Purkinje to LC has also been evidenced in the preceding paper by examining c-Fos response to Purkinje terminal photoinhibition at the PBN/LC region.

      (2) Although the authors appear to assume uniform cell type and postsynaptic response in each of the dorsal brainstem nuclei (as noted in the Discussion, "PCs likely function similarly to their inputs to the cerebellar nuclei, where a very brief pause in firing can lead to large and rapid elevations in target cell firing"), we know that the responses to the Purkinje cell input are cell type dependent, which vary in neurotransmitter, output targets, somata size, and distribution, in the cerebellar and vestibular nuclei (Shin et al., 2011, J Neurosci; Najac and Raman, 2015, J Neurosci; Özcan et al., 2020, J Neurosci). This consideration impacts the interpretation of two key findings: (a) "Large ... PC-IPSCs are preferentially observed in subregions with the highest densities of PC synapses (Abstract)". For example, we know that the terminal sparse regions reported in the present paper do contain Floccular Targeted Neurons that are sparse yet have dense somatic terminals with profound postinhibitory rebound (Shin et al.). Despite their sparsity, these postsynaptic neurons play a distinct and critical role in proper vestibuloocular reflex. Therefore, associating broad synaptic density with "PC preferential" targets, as written in the Abstract, may not fully capture the behavioral significance of Purkinje extracerebellar projections. (b) "We conclude ... only a small fraction of cell. This suggests that PCs target cell types with specific behavioral roles (Abstract, the last sentence)". Prior research has already established that "PCs target cell types with specific behavioral roles in brainstem regions". Also, whether 23 % (for PCG), for example, is "a small fraction" would be subjective: it might represent a numerically small but functionally important cell type population. 
      The physiological characterization provided in the present cell type-blind analysis could, from a functional perspective, even be decremental when compared to existing cell type-specific analyses of the Purkinje cell inputs in the literature.

      (3) The quantification analyses used to draw conclusions about
      (a) the significance of PC terminals among all GABAergic terminals and
      (b) the fractions of electrophysiologically responsive postsynaptic brainstem neurons may have potential sampling considerations:
      (b.i) this study appears to have selected subregions from each brainstem nucleus for quantification (Figure 2). However, the criteria for selecting these subregions are not explicitly detailed, which could affect the interpretation of the results.
      (b.ii) the mapping of recorded cells (Figure 3) seems to show a higher concentration in terminal-rich regions of the vestibular nuclei.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Chen and colleagues explores the connections from cerebellar Purkinje cells to various brainstem nuclei. They combine two methods - presynaptic puncta labeling as putative presynaptic markers, and optogenetics, to test the anatomical projections and functional connectivity from Purkinje cells onto a variety of brainstem nuclei. Overall, their study provides an atlas of sorts of Purkinje cell connectivity to the brainstem, which includes a critical analysis of some of their own data from another publication. Overall, the value of this work is to both provide neural substrates by which Purkinje cells may influence the brainstem and subsequent brain regions independent of the deep cerebellar nuclei and also, to provide a critical analysis of viral-based methods to explore neuronal connectivity.

      Strengths:

      The strengths lie in the simplicity of the study, the number of cells patched, and the relationship between the presence of putative presynaptic puncta and electrophysiological results. This type of study is important and should provide a foundation for future work exploring cerebellar inputs and outputs. Overall, I think that the critique of viral-based methods to define connectivity, and a more holistic assessment of what connectivity is and how it should be defined is timely and warranted, as I think this is under-appreciated by many groups and overall, there is a good deal of research being published that does not properly consider the issues that this manuscript raises about what viral-based connectivity maps do and do not tell us.

      Weaknesses:

      While I overall liked the manuscript, I do have a few concerns that relate to interpretation of results, and discussion of technological limitations. The main concerns I have relate to the techniques that the authors use, and an insufficient discussion of their limitations. The authors use a Cre-dependent mouse line that expresses a synaptophysin-tdTomato marker, which the authors confidently state is a marker of synapses. This is misleading. Synaptophysin is a vesicle marker, and as such, labels axons, where vesicles are present in transit, and likely cell bodies where the protein is being produced. As such, the presence of tdTomato should not be interpreted definitively as the presence of a synapse. The use of vGAT as a marker, while this helps to constrain the selection of putative pre-synaptic sites, is also a vesicle marker and will likely suffer the same limitations (though in this case, the expression is endogenous and not driven by the ROSA locus). A more conservative interpretation of the data would be that the authors are assessing putative pre-synaptic sites with their analysis. This interpretation is wholly consistent with their findings showing the presence of tdTomato in some regions but only sparse connectivity - this would be expected in the event that axons are passing through. If the authors wish to strongly assert that they are specifically assessing synapses, a marker better restricted to synapses and not vesicles may be more appropriate.

      Similarly, while optogenetics/slice electrophysiology remains the state of the art for assessing connectivity between cell populations, it is not without limitations. For example, connections that are not contained within the thickness of the slice (here, 200 um, which is not particularly thick for slice ephys preps) will not be detected. As such, the absence of connections is harder to interpret than the presence of connections. Slices were only made in the coronal plane, which means that if there is a particular topology to certain connections that is orthogonal to that plane, those connections may be under-represented. As such, all connectivity analyses likely are under-representations of the actual connectivity that exists in the intact brain. Therefore, perhaps the authors should consider revising their assessments of connections, or lack thereof, of Purkinje cells to e.g., LC cells. While their data do make a compelling case that the connections between Purkinje cells and LC cells are not particularly strong or numerous, especially compared to other nearby brainstem nuclei, their analyses do indicate that at least some such connections do exist. Thus, rather than saying that the viral methods such as rabies virus are not accurate reflections of connectivity - perhaps a more circumspect argument would be that the quantitative connectivity maps reported by other groups using rabies virus do not always reflect connectivity defined by other means e.g., functional connections with optogenetics. In some cases, the authors do suggest this (e.g., "Together, these findings indicate that reliance on anatomical tracing experiments alone is insufficient to establish the presence and importance of a synaptic connection"), but in other cases, they are more dismissive of viral tracing results (e.g., "it further suggests that these neurons project to the cerebellum and were not retrogradely labeled").
      Furthermore, some statements are a bit misleading e.g., mentioning that rabies methods are critically dependent on starter cell identity immediately following the citation of studies mapping inputs onto LC cells. While in general, this claim has merit, the studies cited (19-21) use Dbh-Cre to define LC-NE cells which does have good fidelity to the cells of interest in the LC. Therefore, rewording this section in order to raise these issues generally without proximity to the citations in the previous sentence may maintain the authors' intention without suggesting that perhaps the rabies studies from LC-NE cells that identified inputs from Purkinje cells were inaccurate due to poor fidelity of the Cre line. Overall, this manuscript would certainly not be the first report indicating that the rabies virus does not provide a quantitative map of input connections. In my opinion, this is still under-appreciated by the broad community and should be explicitly discussed. Thus, an acknowledgment of previous literature on this topic and how their work contributes to that argument is warranted.

    1. eLife Assessment

      This important study reports that the neurohormone, bursicon, and its receptor, play a role in the seasonal polyphenism of the bug Cacopsylla chinensis. Low temperature activates the bursicon signaling pathway during the transition from the summer to the winter form, affecting cuticle pigment and thickness as well as chitin content. The solid experiments reveal how bursicon signaling, which is modulated by the microRNA miR-6012, regulates features of polyphenism related to the exoskeleton, although it is less clear what the upstream regulatory events are.

    2. Reviewer #1 (Public Review):

      Bursicon is a key hormone regulating cuticle tanning in insects. While the molecular mechanisms of its function are rather well studied--especially in the model insect Drosophila melanogaster, its effects and functions in different tissues are less well understood. Here, the authors show that bursicon and its receptor play a role in regulating aspects of the seasonal polyphenism of Cacopsylla chinensis. They found that low temperature treatment activated the bursicon signaling pathway during the transition from summer form to winter form and affected cuticle pigment, chitin content, and cuticle thickness. In addition, the authors show that miR-6012 targets the bursicon receptor, CcBurs-R, thereby modulating the function of the bursicon signaling pathway in the seasonal polyphenism of C. chinensis. This discovery expands our knowledge of the roles of the neuropeptide bursicon in arthropod biology.

      Reviewer comments on revised version

      (a) Major concerns
      (1) The revision did not respond to the major concern regarding the threshold response that defines polyphenism. Therefore, it still falls short of the claims made, since the claims were not revised either. Specifically, the authors now include a time series of tanning at two different temperatures, demonstrating the time points at which the induced tanning proceeds (Fig. S1). However, the appropriate response to that comment would have temperatures on the x-axis, not time. Intermediate temperatures are needed to test whether the induction is a threshold response or simply a continuous norm of reaction.
      (2) The authors also did not respond to the major comment regarding environmental induction of miR-6012 expression. Rather, Fig. 5E shows a time series under two temperatures, similar to the tanning time series. To test whether its induction is a threshold response (again, what defines polyphenism), a series of induction conditions is needed. Fig. 5E simply shows changes in expression over time under one induction temperature (25°C).
      (3) Although the manuscript title has been changed, little to nothing else in the revised text addresses the concern that this study is about tanning in psyllids, not seasonal polyphenism. The other traits making up the polyphenism, as well as their threshold response, were not measured.

      In summary, this revision failed to address most of the chief concerns of the review summary. This manuscript should be reframed as a study of tanning in a species other than Drosophila, and any claims about polyphenism (that is, an environmentally induced threshold trait) still need to be tested.

      Regarding the other concerns raised by the reviewers:

      (4) Issues related to the assignment of the receptor used as a bursicon receptor were satisfactorily addressed.
      (5) Experiments regarding the timing of cuticle production presented in Supplementary Figure 1 are valuable, although there are still some inaccuracies: i) the layering of the cuticle is not given accurately as there are more than the 3 layers indicated in the manuscript; ii) the reduced endocuticle in all relevant dsRNA cases suggests a massive molting defect that may underline the involvement of bursicon in molting in general, potentially masking its effect on morph transition. In other words, the phenotype is too strong to allow for the interpretation of its function with respect to morph transition. It would have been necessary to apply different concentrations of dsRNA in order to address this point. iii) The developmental timing at 10°C vs. 25°C seems to be similar, although duration would be expected to be longer at 10°C; iv) It would have been nice to see the days of development also for dsRNA injected animals.
      (6) Another unresolved point regards the source and target tissue of bursicon signaling. Admittedly, this problem is difficult to solve in a small insect species.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Review:

      Summary:

      Bursicon is a key hormone regulating cuticle tanning in insects. While the molecular mechanisms of its function are rather well studied--especially in the model insect Drosophila melanogaster, its effects and functions in different tissues are less well understood. Here, the authors show that bursicon and its receptor play a role in regulating aspects of the seasonal polyphenism of Cacopsylla chinensis. They found that low temperature treatment activated the bursicon signaling pathway during the transition from summer form to winter form and affected cuticle pigment, chitin content, and cuticle thickness. In addition, the authors show that miR-6012 targets the bursicon receptor, CcBurs-R, thereby modulating the function of the bursicon signaling pathway in the seasonal polyphenism of C. chinensis. This discovery expands our knowledge of the roles of the neuropeptide bursicon in arthropod biology.

      However, the study falls short of its claim that it reveals the molecular mechanisms of a seasonal polyphenism. While cuticle tanning is an important part of the pear psyllid polyphenism, it is not the equivalent of it. First, there are other traits that distinguish between the two morphs, such as ovarian diapause (Oldfield, 1970), and the role of bursicon signaling in regulating these aspects of polyphenism was not measured. Thus, the phenotype in pear psyllids, whereby knockdown of bursicon reduces cuticle tanning, seems to simply demonstrate the phenotypes of Drosophila mutants for bursicon receptor (Loveall and Deitcher, 2010, BMC Dev Biol) in another species (Fig. 2I, 4H). Second, the study fails to address the threshold nature of cuticular tanning in this species, although it is the threshold response (specifically, to temperature and photoperiod) that distinguishes this trait as a part of a polyphenism. Whereas miR-6012 was found to regulate bursicon expression, no evidence is provided that this microRNA either responds to or initiates a threshold response to temperature. In principle, miR-6012 could regulate bursicon whether or not it is part of a polyphenism. Thus, the impact of this work would be significantly increased if it could distinguish between seasonal changes of the cuticle and a bona fide reflection of polyphenism.

      Thanks for your valuable suggestion. We concur with the reviewer’s comment that cuticle tanning does not equate to the C. chinensis polyphenism. To better reflect the core focus of our research, we have revised the title to "Neuropeptide Bursicon and its receptor mediated the transition from summer-form to winter-form of Cacopsylla chinensis".

      In response to the reviewer's inquiry regarding the threshold nature of cuticular tanning in C. chinensis, we have included a detailed analysis of the phenotypic changes (including nymph phenotypes, cuticle pigment absorbance, and cuticle thickness) during the transition from summer-form to winter-form in C. chinensis at distinct time intervals (3, 6, 9, 12, 15 days) under different temperature conditions (10°C and 25°C). As shown in Figure S1, under 25°C conditions nymphs exhibit a light yellow and transparent coloration at 3, 6, and 9 days, while nymphs at 12 and 15 days display shades of yellow-green or blue-yellow. Under 10°C conditions, the abdomen end turns black at 3, 6, and 9 days. By 12 days, numerous light black stripes appear on the chest and abdomen of nymphs at 10°C. At 15 days, nymphs exhibit an overall black-brown appearance, featuring dark brown stripes on the left and right sides of each chest and abdominal section. Furthermore, the end of the abdomen and back display a large black-brown coloration at 10°C (Figure S1A). The UV absorbance of the total pigment extraction at a 300 nm wavelength markedly increases following 10°C exposure for 6, 9, 12, and 15 days compared to the 25°C treatment group (Figure S1B). Cuticle thicknesses also increased following 10°C exposure for 6, 9, 12, and 15 days compared to the 25°C treatment group (Figure S1C). The detailed results (L122-143), materials and methods (L647-652), and discussion (L319-322) have been added in our revised manuscript.

      Regarding the response of miR-6012 to temperature, we had already determined its expression at 3, 6, 10 days under different temperatures in the previous Figure 5E. We have now included additional time intervals (9, 12, 15 days) in the updated Figure 5E. Our results indicate a significant decrease in the expression levels of miR-6012 after 10°C treatment for 3, 6, 9, 12, 15 days compared to the 25°C treatment group. Detailed information regarding this has been integrated into the Materials and Methods (Line 608-610) of our revised manuscript.

      Strengths:

      This study convincingly identifies homologs of the genes encoding the bursicon subunits and its receptor, showing an alignment with those of another psyllid as well as more distant species. It also demonstrates that the stage- and tissue-specific levels of bursicon follow the expected patterns, as informed by other insect models, thus validating the identity of these genes in this species. They provide strong evidence that the expression of bursicon and its receptor depend on temperature, thereby showing that this trait is regulated through both parts of the signaling mechanism.

      Several parallel measurements of the phenotype were performed to show the effects of this hormone, its receptor, and an upstream regulator (miR-6012), on cuticle deposition and pigmentation (if not polyphenism per se, as claimed). Specifically, chitin staining and TEM of the cuticle qualitatively show differences between controls and knockdowns, and this is supported by some statistical tests of quantitative measurements (although see comments below). Thus, this study provides strong evidence that bursicon and its receptor play an important role in cuticle deposition and pigmentation in this psyllid.

      The study identified four miRNAs which might affect bursicon due to sequence motifs. By manipulating levels of synthetic miRNA agonists, the study successfully identified one of them (miR-6012) to cause a cuticle phenotype. Moreover, this miRNA was localized (by FISH) to the cuticle, body-wide. To our knowledge, this is the first demonstrated function for this miRNA, and this study provides a good example of using a gene of known function as an entry point to discovering others influencing a trait. Thus, this finding reveals another level of regulation of cuticle formation in insects.

      Weaknesses:

      (1) The introduction to this manuscript does not accurately reflect progress in the field of mechanisms underlying polyphenism (e.g., line 60). There are several models for polyphenism that have been used to uncover molecular mechanisms in at least some detail, and this includes seasonal polyphenisms in Hemiptera. Therefore, the justification for this study cannot be predicated on a lack of knowledge, nor is the present study original or unique in this line of research (e.g., as reviewed by Zhang et al. 2019; DOI: 10.1146/annurev-ento-011118-112448). The authors are apparently aware of this, because they even provide other examples (lines 104-108); thus the introduction seems misleading as framed.

      Thanks for your excellent suggestion. We have added the paper of Zhang et al. 2019 recommended by the reviewer (DOI: 10.1146/annurev-ento-011118-112448) in Line 57 of our revised manuscript. The statement has been revised to “However, the specific molecular mechanism underling temperature-dependent polyphenism still require further clarification” in Line 60-61 of our revised manuscript.

      (2) The data in Figure 2H show "percent of transition." However, the images in 2I show insects with tanned cuticle (control) vs. those without (knockdown). Yet, based on the description of the Methods provided, there appears to be no distinction between "percent of transition" and "percent with tanning defects". This is an important distinction to make if the authors are going to interpret cuticle defects as a defect in the polyphenism. Furthermore, there is no mention of intermediate phenotypes. The data in 2H are binned as either present or absent, and these are the phenotypes shown in 2I. Was the phenotype really an all-or-nothing response? Instead of binning, which masks any quantitative differences in the tanning phenotypes, the authors should objectively quantify the degree of tanning and plot that. This would show if and to what degree intermediate tanning phenotypes occurred, which would test how bursicon affects the threshold response. This comment also applies to the data in Figures 4G and 6G. Since cuticle tanning is present in more insects than just those with seasonal polyphenism, showing how this responds as a threshold is needed to make claims about polyphenism.

      We appreciate your insightful comments. As shown in Figure 1 of our published paper (Zhang et al., 2013; doi.org/10.7554/eLife.88744.3) and Figure 2C-2I of the current manuscript, the transition from summer-form to winter-form entails not only external cuticular tanning but also alterations in internal cuticular chitin levels and cuticle thickness. While external cuticular tanning serves as a prominent and easily observable indicator of this transition, it is crucial to acknowledge that internal changes also play a significant role and should be taken into consideration. Therefore, we propose that the term "percent of transition" may be more suitable than "percent with tanning defects" to describe this process accurately.

      In order to provide a more visually comprehensive understanding of the phenotypic changes during the transition from summer-form to winter-form, we have included images at different time points (3, 6, 9, 12, 15 days) under different temperature conditions in Figure S1A of our revised manuscript. Specifically, under the 10°C condition, nymphs exhibit abdomen tanning after 6 and 9 days of treatment, while the thorax remains untanned. By days 12 to 15, both the abdomen and thorax of the nymphs show tanning, resulting in the majority of summer-form nymphs transitioning into winter-form, as depicted in Figure 2I for comparison. This observation indicates the presence of a critical threshold for cuticle tanning of C. chinensis following exposure to 10°C. Nymphs that did not undergo the transition to winter-form succumbed to the cold, highlighting the absence of intermediate phenotypes at 12-15 days under the 10°C condition. The UV absorbance of the total pigment extraction at a 300 nm wavelength markedly increases following 10°C exposure for 6, 9, 12, and 15 days compared to the 25°C treatment group (Figure S1B). Additionally, cuticle thickness shows an increase following 10°C exposure for 6, 9, 12, and 15 days compared to the 25°C treatment group (Figure S1C). These results highlight the relationship between the threshold of cuticular tanning and the transition process. The detailed description and information have been added in Results (L122-143), Materials and Methods (L647-652), and Discussion (L319-322) of our manuscript.

      (3) This study also does not test the threshold response of cuticle phenotypes to levels of bursicon, its receptor, or miR-6012. Hormone thresholds are the most widespread and, in most systems where polyphenism has been studied, the defining characteristic of a polyphenism (e.g., Nijhout, 2003, Evol Dev). Quantitative (not binned) measurements of a polyphenism marker (e.g., chitin) should be demonstrated to result as a threshold titer (or in the case of the receptor, expression level) to distinguish defects in polyphenism from those of its component trait.

      Thanks for your valuable feedback. We have supplemented additional data on the phenotypes (Figure S1A), cuticle pigment absorbance (Figure S1B), cuticle thickness (Figure S1C), expression levels of bursicon (Figure 1E and 1F), its receptors (Figure 3G), and miR-6012 (Figure 5E) corresponding to nymphs treated over different time periods (3, 6, 9, 12, 15 days) under both 10°C and 25°C conditions in our revised manuscript.

      While all these identified markers exhibit a strong correlation with the transition from summer-form to winter-form, it is important to note that they are not suitable as definitive thresholds due to the nature of relative gene expression quantification and chitin content assessment, rather than absolute quantitation. Further, given that tanning hormones are neuropeptides present in trace amounts in insects, unlike steroid hormones, determining their titers poses a considerable challenge.

      (4) Cuticle issue:

      (a) Unlike Fig. 6D and F, Figs. 2D and F do not correspond to each other. Especially the lack and reduction of chitin in ds-a+b! By fluorescence microscopy there is hardly any signal, whereas by TEM there is a decent cuticle. Additionally, the dsGFP control cuticle in 2D is cut obliquely with a thick and a thin chitin layer. This is misleading.

      Thanks for your insightful feedback. We have replaced the previous WGA chitin staining images in the dsCcbursα+β treatment of Figure 2D with new representative images aligning with Figure 2F. Furthermore, the presence of both thin and thick chitin layers observed in the dsEGFP treatment of Figure 2D could potentially be ascribed to the chitin content in the insect midgut or fat body as previously discussed (Zhu et al., 2016). It is notable that during the process of cuticle staining, the chitin located in the midgut and fat body of C. chinensis may exhibit green fluorescence, leading to the appearance of a thin chitin layer. A detailed analysis and elucidation of these observations have been added in the discussion section (Lines 347-352) of our revised manuscript.

      Zhu KY, Merzendorfer H, Zhang W, Zhang J, Muthukrishnan S. Biosynthesis, Turnover, and Functions of Chitin in Insects. Annu Rev Entomol. 2016;61:177-196. doi:10.1146/annurev-ento-010715-023933.

      (b) In Figs. 2F and 4F, the endocuticle appears to be missing, a portion of the procuticle that is produced post-molting. As tanning is also occurring post-molting, there seems to be a general problem with cuticle differentiation at this time point. This may be a timing issue. Please clarify.

      Thank you for your suggestion. The insect cuticle typically comprises three distinct layers (endocuticle, exocuticle, and epicuticle), with the thickness of each layer varying among different insect species. Cuticle differentiation is closely linked to the molting cycle of insects (Mrak et al., 2017). In our study, nymphal cuticles exhibited normal differentiation patterns, characterized by a thin epicuticle and comparable widths of the endocuticle and exocuticle following dsEGFP treatment, as illustrated in Figure 2F and 4F. Conversely, nymphs treated with dsCcBurs-α, dsCcBurs-β, and dsCcburs-R displayed impaired development, manifesting only the exocuticle without a discernible endocuticle layer. These findings suggest that bursicon genes and their receptor play a pivotal role in regulating insect cuticle development (Costa et al., 2016). We have added some discussion about these results in Lines 356-367 of our revised manuscript.

      Mrak, P., Bogataj, U., Štrus, J., & Žnidaršič, N. (2017). Cuticle morphogenesis in crustacean embryonic and postembryonic stages. Arthropod structure & development, 46(1), 77–95. https://doi.org/10.1016/j.asd.2016.11.001

      Costa, C. P., Elias-Neto, M., Falcon, T., Dallacqua, R. P., Martins, J. R., & Bitondi, M. (2016). RNAi-mediated functional analysis of Bursicon genes related to adult cuticle formation and tanning in the Honeybee, Apis mellifera. PloS one, 11(12), e0167421. https://doi.org/10.1371/journal.pone.0167421

      (c) To provide background information, it would be useful to analyze cuticle formation in the summer and winter morphs of controls separately by light and electron microscopy. More baseline data on these two morphs is needed.

      Thanks for your valuable feedback. To provide more background information about cuticle formation, we have added the results for nymph phenotypes, cuticle pigment absorbance, and cuticle thickness at distinct time intervals (3, 6, 9, 12, 15 days) under temperatures of 10°C and 25°C in Figure S1 of our revised manuscript. We hope these results help readers better understand the baseline data on these two morphs.

      (d) For the TEM study, it is not clear whether the same part of the insect's thorax is being sectioned each time, or if that matters. There is not an obvious difference in the number of cuticular layers, but only the relative widths of those layers, so it is difficult to know how comparable those images are. This raises two questions that the authors should clarify. First, is it possible that certain parts of the thoracic cuticle, such as those closer to the intersegmental membrane, are naturally thinner than other parts of the body? Second, is the tanning phenotype based on the thickness or on the number of chitin layers, or both? The data shown later in Figure 4I, J convincingly shows that the biosynthesis pathway for chitin is repressed, but any clarification of what this might mean for deposition of chitin would help to understand the phenotypes reported. Also, more details on how the data in Fig. 2G were collected would be helpful. This also goes for the data in Fig. 4 (bursicon receptor knockdowns).

      Thanks for your great comment. The TEM investigation followed a standardized protocol described previously (Zhang et al., 2023). Initially, insect heads were uniformly excised and then fixed in 4% paraformaldehyde. Subsequently, a consistent cutting and staining procedure was executed at a uniform distance above the insect's thorax. The dorsal region of the thorax was specifically chosen for subsequent fluorescence imaging or transmission electron microscopy assessments, with the specific objective of quantifying cuticle thickness. To measure cuticle thickness, the built-in measuring ruler of the software was used to select the top and bottom of the cuticle on the same horizontal line. The cuticle of each nymph was measured at two nearby locations, and six nymphs were used for each sample. Nine values were then randomly selected and plotted. The related description has been added in the Materials and Methods (Lines 660-668) of our revised manuscript.

      Zhang, S.D., Li, J.Y., Zhang, D.Y., Zhang, Z.X., Meng, S.L., Li, Z., & Liu, X.X. (2023). MiR-252 targeting temperature receptor CcTRPM to mediate the transition from summer-form to winter-form of Cacopsylla chinensis. eLife, 12. https://doi.org/10.7554/eLife.88744
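
      The sampling scheme described in this response (six nymphs, two nearby measurements each, nine values drawn at random for plotting) can be sketched as follows. This is a minimal illustration only: the thickness values are made-up placeholders, and the function name `sample_thickness` and the fixed random seed are assumptions, not part of the authors' protocol.

```python
import random

def sample_thickness(measurements, n_plot=9, seed=0):
    """Pool per-nymph measurements and draw n_plot values without replacement."""
    pooled = [value for nymph in measurements for value in nymph]
    if n_plot > len(pooled):
        raise ValueError("cannot draw more values than were measured")
    rng = random.Random(seed)  # fixed seed (assumption) so the draw is reproducible
    return rng.sample(pooled, n_plot)

# Hypothetical thickness readings: 6 nymphs x 2 nearby locations each.
example = [(2.1, 2.3), (1.9, 2.0), (2.4, 2.2), (2.0, 2.1), (2.5, 2.6), (1.8, 1.9)]
selected = sample_thickness(example)
print(len(selected), selected)
```

      Recording the seed, or simply plotting all twelve values, would make the choice of plotted points easy to reproduce.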

      (5) Tissue issue:

      The timed experiments shown in all figures were done in whole animals. However, we know from Drosophila that Bursicon activity is complex in different tissues. There is, thus, the possibility, that the effects detected on different days in whole animals are misleading because different tissues--especially the brain and the epidermis, may respond differentially to the challenge and mask each other's responses. The animal is small, so the extraction from single tissue may be difficult. However, this important issue needs to be addressed.

      Thanks for your excellent suggestion. We express our heartfelt appreciation to the reviewer for their valuable input regarding the challenges involved in dissecting various tissue sections from the diminutive early instar nymphs of C. chinensis. In light of the metamorphic transition of C. chinensis across developmental stages, this study concentrated on examining the extensive phenotypic alterations. Consequently, intact samples of C. chinensis were specifically chosen for qPCR analysis. The related descriptions have been added in the Materials and Methods (Line 513, 517, 553, 555, and 613) and Discussion (Line 327-329) of our revised manuscript.

      (6) No specific information is provided regarding the procedure followed for the rescue experiments with burs-α and burs-β (How were they done? Which concentrations were applied? What were the effects?). These important details should appear in the Materials and Methods and the Results sections.

      Thanks for your excellent suggestion. For the rescue experiments, the dsRNA of CcBurs-R and proteins of burs α-α, burs β-β homodimers, or burs α-β heterodimer (200 ng/μL) were fed together. The concentration of heterodimer protein of CcBurs-α+β was 200 ng/μL. The heterodimer protein of CcBurs-α+β fully rescued the effect of RNAi-mediated knockdown on CcBurs-R expression, while α+α or β+β homodimers did not (Figure 3F). Feeding the α+β heterodimer protein fully rescued the defect in the transition percent and morphological phenotype after CcBurs-R knockdown (Figure 4G-4H). We have added the detailed methods of the rescue experiments and the specific concentrations in the Materials and Methods (Line 561-563) and Results (Line 263) of our revised manuscript.

      (7) Pigmentation

      (a) The protocol used to assess pigmentation needs to be validated. In particular, the following details are needed: Were all pigments extracted? Were pigments modified during extraction? Were the values measured consistent with values obtained, for instance, by light microscopy (which should be done)?

      Thanks for your excellent comment. Our pigment extraction protocol followed that detailed for Bombyx mori: the cuticles were pulverized in liquid nitrogen and then dissolved in 30 milliliters of acidified methanol (Futahashi et al., 2012; Osanai-Futahashi et al., 2012). Thus, all cuticle pigments were dissected and treated with acidified methanol. Pigments were not modified during extraction. The detailed description has been integrated into the Materials and Methods (Line 630-633) of our revised manuscript.

      Futahashi, R., Kurita, R., Mano, H., & Fukatsu, T. (2012). Redox alters yellow dragonflies into red. Proceedings of the National Academy of Sciences of the United States of America, 109(31), 12626–12631. https://doi.org/10.1073/pnas.1207114109

      Osanai-Futahashi, M., Tatematsu, K. I., Yamamoto, K., Narukawa, J., Uchino, K., Kayukawa, T., Shinoda, T., Banno, Y., Tamura, T., & Sezutsu, H. (2012). Identification of the Bombyx red egg gene reveals involvement of a novel transporter family gene in late steps of the insect ommochrome biosynthesis pathway. The Journal of biological chemistry, 287(21), 17706–17714. https://doi.org/10.1074/jbc.M111.321331

      (b) In addition, pigmentation occurs post-molting; thus, the results could reflect indirect actions of bursicon signaling on pigmentation. The levels of expression of downstream pigmentation genes (ebony, lactase, etc) should be measured and compared in molting summer vs. winter morphs.

      Thanks for your valuable suggestion. Actually, we have already studied the function of some downstream pigmentation genes, including ebony, Lactase, Tyrosine hydroxylase, Dopa decarboxylase, and Acetyltransferase. The variations in the expression patterns of these genes are closely tied to the molting dynamics of nymphs undergoing transitions between summer-form and winter-form. These findings will be presented in another manuscript currently being prepared for submission; thus, the detailed outcomes are not suitable for inclusion in the current manuscript.

      (8) L236: "while the heterodimer protein of CcBurs α+β could fully rescue the effect of CcBurs-R knockdown on the transition percent (Figure 4G 4H)". This result seems contradictory. If CcBurs-R is the receptor of bursicon, the heterodimer protein of CcBurs α+β should not be able to rescue the effect of CcBurs-R knockdown insects. How can a neuropeptide protein rescue the effect when its receptor is not there! If these results are valid, then the CcBurs-R would not be the (sole) receptor for CcBurs α+β heterodimer. This is a critical issue for this manuscript and needs to be addressed (also in L337 in Discussion).

      Thanks for your insightful suggestion. Following the administration of dsCcBurs-R to C. chinensis, the expression of CcBurs-R exhibited a reduction of approximately 66-82% as depicted in Figure 4A, rather than complete suppression. Activation of endogenous CcBurs-R through feeding of the α+β heterodimer protein results in an increase in CcBurs-R expression, with the effectiveness of the rescue contingent upon the dosage of the α+β heterodimer protein. Consequently, the capacity of the α+β heterodimer protein to effectively mitigate the impacts of CcBurs-R knockdown on the conversion rate is clearly demonstrated. We have added additional discussion in Line 396-403 of our revised manuscript.

      (9) Fig. 5D needs improvement (the magnification is poor) and further explanation and discussion. mi6012 and CcBurs-R seem to be expressed in complementary tissues--do we see internal tissues also (see problem under point 2)? Again, the magnification is not high enough to understand and appreciate the relationships discussed.

      Thanks for your valuable suggestion. In order to enhance the resolution of the magnified images, we conducted FISH co-localization of miR-6012 and CcBurs-R in 3rd instar nymphs and obtained detailed zoomed-in images. As shown in the magnified view of Figure 5D, miR-6012 and CcBurs-R appear to exhibit complementary expression patterns in tissues. During the FISH assays, epidermis transparency of C. chinensis was achieved via decolorization treatment. Noteworthy observations from Figure 3G and Figure 5E reveal an inverse correlation in the expression profiles of CcBurs-R and miR-6012. Consequently, the FISH results distinctly highlight a significant disparity in the expression levels of CcBurs-R and miR-6012 within the same tissue. We have added related explanation and discussion in Line 291-293 of our revised manuscript.

      (10) The schematic in Fig. 7 is a useful summary, but there is a part of the logic that is unsupported by the data, specifically in terms of environmental influence on cuticle formation (i.e., plasticity). What is the evidence that lower temperatures influence expression of miR-6012? The study measures its expression over life stages, whether with an agonist or not, over a single temperature. Measuring levels of expression under summer form-inducing temperature is necessary to test the dependence of miR-6012 expression on temperature. Otherwise, this result cannot be interpreted as polyphenism control, but rather the control of a specific trait.

      Thanks for your great suggestion. We actually assessed miR-6012 expression at specific time intervals (3, 6, 9, 12, 15 days) under temperatures of 10°C and 25°C. As depicted in Figure 5E, the expression levels of miR-6012 were notably reduced at 10°C compared to 25°C. Additionally, the evaluation of agomir-6012 expression level of C. chinensis under 25°C conditions at various time points (3, 6, 9, 12, 15 days) revealed no significant changes. Hence, we suggest that the impact of miR-6012 on the seasonal morphological transition is influenced by temperature.

      Recommendations for the authors:

      The authors report a novel role of Bursicon and its receptor in regulating the seasonal polyphenism of Cacopsylla chinensis. They found that low temperature treatment (10°C) activated the Bursicon signaling pathway during the transition from summer-form to winter-form, which influences cuticle pigment content, cuticle chitin content, and cuticle thickness. Moreover, the authors identified miR-6012 and show that it targets CcBurs-R, thereby modulating the function of Bursicon signaling pathway in the seasonal polyphenism of C. chinensis. This discovery expands our knowledge of multiple roles of neuropeptide bursicon action in arthropod biology. However, the manuscript does have several major weaknesses, described under "Public review", which the authors need to address.

      Major issues:

      (1) L152-154 Fig S2E and S2F: Bursicon has been shown to be expressed in the CNS in a specific set of neurons. For example, in the larval CNS of Manduca sexta, bursicon expression is restricted to the subesophageal ganglion (SG), thoracic ganglia, and first abdominal ganglion. Pharate pupae and pharate adults show expression of this heterodimer in all ganglia. In Drosophila larvae, expression of a bursicon heterodimer is confined to abdominal ganglia. The additional neurons in the ventral nerve cord express only burs. In pharate adults, bursicon is produced by neurons in the SG and abdominal ganglia. I am wondering where bursicon subunits are expressed in the C. chinensis CNS? Since the authors have the antibodies, it would be useful to include immunocytochemical staining of bursicon alpha and beta in the CNS. The qPCR results from head or other tissues (Fig S2E and S2F) are not the most informative way to document localization of gene expression. Regarding the qPCR results, they show that the cuticle and the fat body express CcBurs-α and CcBurs-β. Can the authors confirm these unexpected results independently?

      Thanks for your insightful comment. In this study, we did not directly use antibodies targeting bursicon subunits; instead, the bursicon subunits along with a histidine tag were integrated into the expression vector pcDNA3.1 using homologous recombination. The experimental procedures were executed as follows: initially, the histidine tag was fused to the pcDNA3.1-mCherry vector through homologous recombination to generate the recombinant plasmid pcDNA3.1-his-mCherry. Subsequently, the amino acid sequences of the two bursicon subunits were introduced into the pcDNA3.1-his-mCherry vector via homologous recombination to produce the recombinant plasmids pcDNA3.1-CcBurs-α-his-mCherry and pcDNA3.1-CcBurs-β-his-mCherry. Finally, the P2A sequence was incorporated into the vector using reverse PCR to yield the recombinant plasmids pcDNA3.1-CcBurs-α-his-P2A-mCherry and pcDNA3.1-CcBurs-β-his-P2A-mCherry. Consequently, the bursicon subunits were expressed as fusion proteins carrying the histidine tag. Western blot analysis was conducted using antibodies targeting the histidine tag, enabling the detection of histidine-tag expression, which corresponds to the expression of the bursicon subunits. However, these antibodies are not suitable for in vivo immunocytochemical staining of bursicon alpha and beta in the CNS.

      Due to the diminutive size of the C. chinensis nymphs, dissection of the central nervous system (CNS) was unfeasible, precluding specific assessment of bursicon expression in the CNS. Prior literature has documented the expression of bursicon subunits in the epidermis and fat body of C. chinensis. Studies suggest that bursicon subunits not only play a role in the melanization and sclerotization processes of insect epidermis but also have significant roles in insect immunity (An et al., 2012). The presence of bursicon subunits in the epidermis, gut, and fat body of C. chinensis may indicate their crucial roles in the immune functions of these tissues. Further investigation is required to elucidate the specific immune functions they perform, hinting at the potential expression of these bursicon subunits in these two tissues.

      An, S., Dong, S., Wang, Q., Li, S., Gilbert, L. I., Stanley, D., & Song, Q. (2012). Insect neuropeptide bursicon homodimers induce innate immune and stress genes during molting by activating the NF-κB transcription factor Relish. PloS one, 7(3), e34510. https://doi.org/10.1371/journal.pone.0034510

      (2) L222: "CcBurs-R is the Bursicon receptor of C. chinensis". Is this statement supported by affinity binding assay results?

      Thanks for your excellent suggestion. We employed a fluorescence-based assay to quantify calcium ion concentrations and investigate the binding affinities of bursicon heterodimers and homodimers to the bursicon receptor across varying concentrations. Our findings suggest that activation of the receptor by the burs α-β heterodimer leads to significant alterations in intracellular calcium ion levels, whereas stimulation with burs α-α and burs β-β homodimers, in conjunction with Adipokinetic hormone (AKH), maintains consistent intracellular calcium ion levels. Consequently, this research definitively identifies CcBurs-R as the bursicon receptor. For further details, please refer to the Materials and Methods (Lines 493-504), Results (Lines 231-239), and Discussion (Lines 377-384) of our revised manuscript.

      (3) L245 Figure 4I-4J: Since knockdown of bursicon and its receptor causes a decrease in pigment accumulation in the cuticle, it would be useful to examine 1-2 rate-limiting enzyme-encoding genes in the bursicon-regulated cuticle darkening process if possible (as was done for genes involved in cuticle thickening).

      Thanks for your excellent comment. Following the further study, a thorough analysis was conducted to evaluate the impact of bursicon and its receptor on the expression levels of Lactase, Tyrosine hydroxylase, Dopa decarboxylase, Acetyltransferase, and the effects of RNA interference targeting these genes on the seasonal morphological transition. The findings underscored their role in the bursicon-mediated cuticle darkening process. However, as this section is slated for inclusion in an upcoming manuscript intended for submission, it is deemed unsuitable for incorporation into the current manuscript.

      Minor issues:

      (1) L75 "stronger resistance (Ge et al., 2019; Tougeron et al., 2021)". Stronger resistance to what? Stronger resistance to environmental stress or weather condition? Please clarify.

      Thanks for your excellent suggestion. We have changed the statement to “stronger resistance to weather condition” in Line 75 of our revised manuscript.

      (2) L132 Figure 1A and 1B: Bursicon sequence was first identified and functionally characterized in Drosophila melanogaster: is there any reason why Drosophila bursicon sequences were not included in the comparison?

      Thanks for your excellent comment. We have added the sequence of Burs-α and Burs-β of D. melanogaster in the sequence alignment results of Figure 1A and 1B of our revised manuscript.

      (3) Although the authors clearly identify and validate the function for the bursicon genes and its receptor's, there is no mention of whether duplicates of this gene are also present in the pear psyllid. This has been known to happen in otherwise conserved hormone pathways (e.g., insulin receptor in some insects), so a formal check of this should be done.

      Thanks for your excellent comment. As shown in Figure S2A-S2B and 3B, there are two bursicon subunit genes and only one bursicon receptor gene in our selected insect species, for example Drosophila melanogaster, Diaphorina citri, Bemisia tabaci, Nilaparvata lugens, and Sogatella furcifera. In our transcriptome database of C. chinensis, we likewise identified only two bursicon subunit genes and one bursicon receptor gene.

      (4) Line 41: Here, as in the title, "fascinating" is a subjective judgement that does not improve a study's presentation.

      Thanks for your great comment. We have changed "fascinating" to "transformation" in Line 41 and also revised the title of our revised manuscript.

      (5) Line 44: What makes some fields "cutting-edge" and others not?

      Thanks for your excellent suggestion. The expression of "in cutting-edge fields" has been deleted in Line 44 of our revised manuscript.

      (6) Line 97: This is a peculiar choice of reference for the concept of slower development in cold temperatures. The concept of degree-days and growth rates is old and widespread in entomology.

      Thanks for your insightful comment. The reference of Nyamaukondiwa et al., 2011 in Line 95 has been deleted in our revised manuscript.

      (7) Lines 149-150: What justifies the assumption that higher levels of expression mean a more important role? This gene might be just as necessary for development of the summer form, even if expressed at lower levels.

      Thanks for your excellent suggestion. This sentence has been revised to “Increased gene expression levels may potentially contribute to the transition from summer-form to winter-form in C. chinensis.” in Line 168-169 of our revised manuscript.

      (8) The blue arrow in Fig. 7 is confusing.

      Thanks for your excellent suggestion. In Figure 7, the blue arrow represents the down-regulated expression of miR-6012. We have added a description about the blue arrow in Figure 7 of our revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Over the last decade, numerous studies have identified adaptation signals in modern humans driven by genomic variants introgressed from archaic hominins such as Neanderthals and Denisovans. One of the most classic signals comes from a beneficial haplotype in the EPAS1 gene in Tibetans that is evidently of Denisovan origin and facilitated high altitude adaptation (HAA). Given that HAA is a complex trait with numerous underlying genetic contributions, in this paper Ferraretti et al. asked whether additional HAA-related genes may also exhibit a signature of adaptive introgression. Specifically, the authors considered that if such a signature exists, they most likely are only mild signals from polygenic selection, or soft sweeps on standing archaic variation, in contrast to a strong and nearly complete selection signal like in the EPAS1. Therefore, they leveraged two methods, including a composite likelihood method for detecting adaptive introgression and a biological network-based method for detecting polygenic selection, and identified two additional genes that harbor plausible signatures of adaptive introgression for HAA.

      Strengths: 

      The study is well motivated by an important question, which is, whether archaic introgression can drive polygenic adaptation via multiple small effect contributions in genes underlying different biological pathways regulating a complex trait (such as HAA). This is a valid question and the influence of archaic introgression on polygenic adaptation has not been thoroughly explored by previous studies.

      The authors reexamined previously published high-altitude Tibetan whole genome data and applied a couple of the recently developed methods for detecting adaptive introgression and polygenic selection. 

      Weaknesses: 

      My main concern with this paper is that I am not too convinced that the reported genomic regions putatively under polygenic selection are indeed of archaic origin. Other than some straightforward population structure characterizations, the authors mainly did two analyses with regard to the identification of adaptive introgression: First, they used one composite likelihood-based method, the VolcanoFinder, to detect the plausible archaic adaptive introgression and found two candidate genes (EP300 and NOS2). Next, they attempted to validate the identified signal using another method that detects polygenic selection based on biological network enrichments for archaic variants.

      In general, I don't see in the manuscript that the choice of methods here is well justified. VolcanoFinder is one among the several commonly used methods for detecting adaptive introgression (eg. the D, RD, U, and Q statistics, genomatnn, maldapt etc.). Even if the selection was mild and incomplete, some of these other methods should be able to recapitulate and validate the results, which are currently missing in this paper. Besides, some of the recent papers that studied the distribution of archaic ancestry in Tibetans don't seem to report archaic segments in the two gene regions. Taken together, these points leave me unsure about the presence of archaic introgression, in contrast to just selection on ancestral variation.

      Furthermore, the authors tried to validate the results by using signet, a method that detects enrichments of alleles under selection in a set of biological networks related to the trait. However, the authors did not provide sufficient description on how they defined archaic alleles when scoring the genes in the network. In fact, reading from the method description, they seemed to only have considered alleles shared between Tibetans and Denisovans, but not necessarily exclusively shared between them. If the alleles used for scoring the networks in Signet are also found in other populations such as Han Chinese or Africans, then that would make a substantial difference in the result, leading to potential false positives.

      Overall, given the evidence provided by this article, I am not sure they are adequate to suggest archaic adaptive introgression. I recommend additional analyses for the authors to consider for rigorously testing their hypothesis. Please see the details in my review to the authors. 

      Reviewer #2 (Public Review):

      Ferraretti et al. identify adaptively introgressed genes using VolcanoFinder and then identify pathways enriched for adaptively introgressed genes. They also use Signet to identify pathways that are enriched for Denisovan alleles. The authors find that angiogenesis and nitric oxide induction are enriched for archaic introgression.

      Strengths: 

      Most papers that have studied the genetic basis of high altitude (HA) adaptation in Tibet have highly emphasized the role of a few genes (e.g. EPAS1, EGLN1), and in this paper, the authors look for more subtle signals in other genes (e.g EP300, NOS2) to investigate how archaic introgression may be enriched at the pathway level.

      Looking into the biological functions enriched for Denisovan introgression in Tibetans is important for characterizing the impact of Denisovan introgression.

      Weaknesses: 

      The manuscript lacks details or justification about how/why some of the analyses were performed. Below are some examples where the authors could provide additional details.

      The authors made specific choices in their window analysis. These choices are not justified or there is no comment as to how results might change if these choices were perturbed. For example, in the methods, the authors write "Then, the genome was divided into 200 kb windows with an overlap of 50 kb and for each of them we calculated the ratio between the number of significant SNVs and the total number of variants." 

      Additional information is needed for clarity. For example, "we considered only protein-protein interactions showing confidence scores ≥ 0.7 and the obtained protein frameworks were integrated using information available in the literature regarding the functional role of the related genes and their possible involvement in high-altitude adaptation." What do the confidence scores mean? Why 0.7?

      In the method section (Identifying gene networks enriched for Denisovan-like derived alleles), the authors write "To validate VolcanoFinder results by using an independent approach". Does this mean that for signet the authors do not use the regions identified as adaptively introgressed using volcanofinder? I thought in the original signet paper, the authors used a summary describing the amount of introgression of a given region.

      Later, the authors write "To do so, we first compared the Tibetan and Denisovan genomes to assess which SNVs were present in both modern and archaic sequences. These loci were further compared with the ancestral reconstructed reference human genome sequence (1000 Genomes Project Consortium et al., 2015) to discard those presenting an ancestral state (i.e., that we have in common with several primate species)." It is not clear why the authors are citing the 1000 genomes project. Are they comparing with the reference human genome reference or with all populations in the 1000 genomes project? Also, are the authors allowing derived alleles that are shared with Africans? Typically, populations from Africa are used as controls since the Denisovan introgression occurred in Eurasia.

      The methods section for Figures 4B, 4C, and 4D is a little hard to understand. What is the x-axis on these plots? Is it the number of pairwise differences to Denisovan? The caption is not clear here. The authors mention that "Conversely, for non-introgressed loci (e.g., EGLN1), we might expect a remarkably different pattern of haplotypes distribution, with almost all haplotype classes presenting a larger proportion of non-Tibetan haplotypes rather than Tibetan ones." There is clearly structure in EGLN1. There is a group of non-Tibetan haplotypes that are closer to Denisovan and a group of Tibetan haplotypes that are distant from Denisovan...How do the authors interpret this? 

      In the original signet paper (Gouy and Excoffier 2017), they apply signet to data from Tibetans. Zhang et al. PNAS (2021) also applied it to Tibetans. It would be helpful to highlight how the approach here is different. 

      We thank the Reviewers for having appreciated the rationale of our study and for having identified potential issues that deserve to be addressed in order to better focus on robust results specifically supported by multiple approaches.

      First, we agree with the Reviewers that clarification and justification for the methodologies adopted in the present study should be deepened with respect to what was done in the original version of the manuscript, with the purpose of making it more intelligible for a broad range of scientists. As reported thoroughly in the revised version of the text, the VolcanoFinder algorithm, which we used as the primary method to discover new candidate genomic regions affected by events of adaptive introgression, was chosen among several approaches developed to detect signatures ascribable to such an evolutionary process for the following reasons: i) VolcanoFinder is one of the few methods that can jointly test events of both archaic introgression and adaptive evolution (e.g., the D statistic cannot formally test for the action of natural selection, having been developed to provide genome-wide estimates of allele sharing between archaic and modern groups rather than to identify specific genomic regions enriched for introgressed alleles); ii) the model tested by the VolcanoFinder algorithm differs remarkably from those considered by other methods typically used to test for adaptive introgression, such as the RD, U and Q statistics, which are aimed at identifying chromosomal segments showing low divergence with respect to a specific archaic sequence and/or enriched in alleles uniquely shared between the admixed group and the source population, as well as characterized by a frequency above a certain threshold in the population under study, thus being useful especially to test an evolutionary scenario conforming to that expected when adaptation was mediated by strong selective sweeps rather than weak polygenic mechanisms (see answer to comment #1 of Reviewer #1 for further details); iii) VolcanoFinder requires less computational effort than other algorithms, such as genomatnn and Maladapt, which also need to be trained on large genomic simulations built specifically to reflect the evolutionary history of the population under study, thus increasing the possibility of introducing bias in the obtained results if the information that guides simulation approaches is not accurate.

      Despite that, we agree with Reviewer #2 that some criteria formerly implemented during the filtering of VolcanoFinder results (e.g., normalization of LR scores, use of a sliding windows approach, and implementation of enrichment analysis based on specific confidence scores) might introduce erratic changes, which depend on the thresholds adopted, in the list of the genomic regions considered as the most likely candidates to have experienced adaptive introgression. To avoid this issue, and to adhere more strictly to the VolcanoFinder pipeline of analyses developed by Setter et al. 2020, in the revised version of the manuscript we have opted to use raw LR scores and to shortlist the most significant results by focusing on loci showing values falling in the top 5% of the genomic distribution obtained for such a statistic (see Materials and methods for details). 
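
      As an illustration only, the top 5% shortlisting of raw LR scores described above could be sketched as follows. This is not the authors' pipeline: the function names, the variant identifiers and the toy scores are hypothetical, and the rule used for ties (keeping every variant at or above the cutoff) is an assumption.

```python
from typing import Dict, List


def top_fraction_threshold(scores: List[float], fraction: float = 0.05) -> float:
    """Return the smallest score that still belongs to the top `fraction` of values."""
    if not scores:
        raise ValueError("scores must not be empty")
    ranked = sorted(scores, reverse=True)
    # Number of values to keep; at least one so the shortlist is never empty.
    n_keep = max(1, int(len(ranked) * fraction))
    return ranked[n_keep - 1]


def shortlist_top_lr(lr_by_variant: Dict[str, float], fraction: float = 0.05) -> Dict[str, float]:
    """Keep variants whose raw LR score falls in the top `fraction` of the distribution.

    Assumption: ties at the cutoff are all retained, so the shortlist can be
    slightly larger than `fraction` of the input.
    """
    threshold = top_fraction_threshold(list(lr_by_variant.values()), fraction)
    return {variant: lr for variant, lr in lr_by_variant.items() if lr >= threshold}


# Toy genome-wide distribution: 100 hypothetical variants with LR scores 1..100.
lr_scores = {f"snv{i}": float(i) for i in range(1, 101)}
shortlisted = shortlist_top_lr(lr_scores)
print(sorted(shortlisted, key=lr_scores.get))
```

      Because the cutoff is derived from the empirical genome-wide distribution, no normalization or sliding window is involved, which is the property the revised procedure is meant to preserve.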

      Moreover, to further reduce the use of potentially arbitrary filtering thresholds, we decided not to implement functional enrichment analysis to prioritize results from the VolcanoFinder method. Although a STRING confidence score (i.e., the approximate probability that a predicted interaction exists between two proteins belonging to the same functional pathway according to information stored in the KEGG database) above 0.7 is generally considered a high confidence score (string-db.org, Szklarczyk et al. 2014), we replaced such a prioritization criterion by considering as the most robust candidates for adaptive introgression only those genomic regions that turned out to be supported by all the approaches used (i.e., VolcanoFinder, Signet, LASSI and Haplostrips analyses).

      According to the Reviewers’ comments on the use of the Signet algorithm, we realized that the rationale behind such a validation approach was not well described in the original version of the manuscript. First and foremost, we would like to clarify that in the present study we did not use this method to test for the action of natural selection (as it was formerly used by Gouy et al. 2017), but specifically to identify genomic regions putatively affected by archaic introgression. For this purpose, we followed the approach described by Gouy and Excoffier 2020 by searching for significant networks of genes presenting archaic-derived variants observable in the considered Tibetan populations but not in an outgroup population of African ancestry. Accordingly, we used the Signet method as an independent approach to obtain a first validation of introgressed (but not necessarily adaptive) loci pointed out by VolcanoFinder results.

      In detail, in response to the question by Reviewer #2 about which genomic regions have been considered in the Signet analysis, it is necessary to clarify that to obtain the input score associated with each gene along the genome, as required by the algorithm, we calculated average frequency values per gene by considering all the archaic-derived alleles included in the Tibetan dataset but not in the outgroup one. Therefore, we did not take into account only those loci identified as significant by VolcanoFinder analysis, but we performed an independent genome scan. Then, we crosschecked significant results from VolcanoFinder and Signet approaches and we shortlisted the genomic regions supported by both. This approach thus differs from that of Zhang et al. 2021, in which the input scores per gene were obtained by considering only those loci previously pointed out by another method as putatively introgressed. Moreover, as mentioned in the previous paragraph, our approach differs also from that implemented by Gouy et al. 2017, in which the input scores assigned to each gene were represented by the variants showing the smallest P-value associated with a selection statistic, being thus informative about putative adaptive events but not introgression ones.
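
      A minimal sketch of the per-gene input score described above (the average Tibetan frequency of the retained archaic-derived alleles within each gene) is given below. The gene labels, frequencies and function name are hypothetical, and the sketch assumes that each allele has already been assigned to a single gene.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def gene_scores(alleles: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average the frequency of archaic-derived alleles within each gene.

    `alleles` yields (gene, frequency) pairs, one per retained allele.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for gene, freq in alleles:
        totals[gene] += freq
        counts[gene] += 1
    return {gene: totals[gene] / counts[gene] for gene in totals}


# Toy input: hypothetical genes and Tibetan allele frequencies.
toy_alleles = [("GENE_A", 0.10), ("GENE_A", 0.30), ("GENE_B", 0.50)]
scores = gene_scores(toy_alleles)
print(scores)
```

      Scores of this kind are computed for every gene along the genome, independently of any other scan, which is what makes the subsequent crosscheck with VolcanoFinder results informative.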

      However, as correctly pointed out by both the Reviewers, we formerly performed Signet analysis by considering derived alleles shared between Tibetans and the Denisovan species, without filtering out those alleles that are observed also in other modern human populations. We agree with the Reviewers that this approach cannot rule out the possibility of retaining false positive results ascribable to ancestral polymorphisms rather than introgressed alleles. According to the Reviewers’ suggestion, we thus repeated the Signet analysis by removing derived alleles observed also in an outgroup population of African ancestry (i.e., Yoruba), by assuming that only Eurasian H. sapiens populations experienced Denisovan admixture. In detail, we considered only those alleles that: i) were shared between Tibetans and Denisovan (i.e., Denisovan-like alleles); ii) were assumed to be derived according to the comparison with the ancestral reconstructed reference human genome sequence; iii) were completely absent (i.e., showed a frequency equal to zero) in the Yoruba population sequenced by the 1000 Genomes Project. Although the comment of Reviewer #1 seems to propose the possible use of Han Chinese as a further control population, we decided not to filter out Denisovan-like derived alleles present also in this human group because evidence collected so far suggests that Denisovan introgression in the gene pool of East Asian ancestors predated the split between low-altitude and high-altitude populations (Lu et al. 2016; Hu et al. 2017) and, as mentioned before, we aimed at using the Signet algorithm to validate introgression events rather than adaptive ones (see the answer to comment #6 of Reviewer #1 for further details).
      Moreover, we would like to remark that we decided to maintain the Signet analysis as a validation method in the revised version of the manuscript because: i) comments from both the Reviewers converge in suggesting how to effectively improve this approach, and ii) it represents a method that goes beyond the simple identification of single putative introgressed alleles, by instead enabling us to point out those biological functions that might have been collectively shaped by gene flow from Denisovans.

      In addition to validating genomic regions putatively affected by archaic introgression by crosschecking results from the VolcanoFinder and Signet analyses, according to the suggestion by Reviewer #1 we implemented a further validation procedure aimed at formally testing for the adaptive evolution of the identified candidate introgressed loci. For this purpose, we applied the LASSI likelihood haplotype-based method (Harris & DeGiorgio 2020) to Tibetan whole genome data. Notably, we chose this approach mainly for the following reasons: i) it is able to detect and distinguish genomic regions that have experienced different types of selective events (i.e., strong and weak ones); ii) it has been demonstrated to have increased power in identifying them with respect to other selection statistics (e.g., H12 and nSL) (Harris & DeGiorgio 2020). Again, we performed an independent genome scan using the LASSI algorithm and then we crosschecked the obtained significant results with those previously supported by VolcanoFinder and Signet approaches in order to shortlist genomic regions that have plausibly experienced both archaic introgression and adaptive evolution.

      Moreover, we maintained a final validation step represented by Haplostrips analysis, which was instead specifically performed on chromosomal segments supported by results from all of the VolcanoFinder, Signet, and LASSI approaches. This enabled us to assess the similarity between Denisovan haplotypes and those observed in Tibetans (i.e., the population under study in which archaic alleles might have played an adaptive role in response to high-altitude selective pressures), Han Chinese (i.e., a sister group whose common ancestors with Tibetans have experienced Denisovan admixture, but have then evolved at low altitude), and Yoruba (i.e., an outgroup that is assumed to have not received gene flow from Denisovans).
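
      The crosschecking logic outlined in the preceding paragraphs, in which each method is run as an independent scan and only regions supported by every method are retained, can be sketched as a simple set intersection. The region labels below are placeholders rather than results from the study, and the function name is hypothetical.

```python
from typing import Dict, Set


def supported_by_all(results: Dict[str, Set[str]]) -> Set[str]:
    """Return the regions reported as significant by every method in `results`."""
    if not results:
        return set()
    return set.intersection(*results.values())


# Placeholder outputs of three independent genome scans (not real findings).
significant = {
    "VolcanoFinder": {"region_1", "region_2", "region_3"},
    "Signet": {"region_2", "region_3", "region_4"},
    "LASSI": {"region_3", "region_2", "region_5"},
}

# Regions passing this step would then be the ones examined with Haplostrips.
candidates = supported_by_all(significant)
print(sorted(candidates))
```

      Requiring agreement across all scans replaces a tunable score threshold with a fixed rule, at the cost of discarding regions supported by only some of the methods.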

      In conclusion, we believe that the substantial changes incorporated in the manuscript according to the Reviewers’ suggestions strongly improved the study by enabling us to focus on more solid results with respect to those formerly presented. Interestingly, although the single candidate loci supported by all the approaches now implemented for validating the obtained results have attained higher prioritization with respect to previous ones (which are supported by some but not all the adopted methods), angiogenesis still stands out as one of the main biological functions that have been shaped by events of adaptive introgression in human groups of Tibetan ancestry. This provides new evidence for the contribution of introgressed Denisovan alleles other than the EPAS1 ones in modulating the complex adaptive responses evolved by Himalayan populations to cope with selective pressures imposed by high altitudes.

      Responses to Recommendations For The Authors:

      Reviewer #1:

      The authors mainly relied on one method, VolcanoFinder (VF), to detect adaptive introgression signals. As one of the recently developed methods, VF indeed demonstrated statistical power at detecting mild selection on archaic variants, as well as detecting soft sweeps on standing variations. However, compared to other commonly used methods for detecting adaptive introgression, such as the U and Q stats (Racimo et al. 2017), genomatnn (Gower et al. 2021), or MaLAdapt (Zhang et al. 2023), VF doesn't seem to have better power at capturing mild and incomplete sweeps. And it makes me wonder about the justification for choosing VF over other methods here, which is not clearly explained in the manuscript. If these adaptive introgression candidates are legitimate, even if the signals are mild, at least some of the other methods should be able to recapitulate the signature (even if they don't necessarily make it through the genome-wide significance thresholds). I would be more convinced about the archaic origin of these regions if the authors could validate their reported findings using some of the aforementioned other methods.

      According to the Reviewer’s suggestion, in the revised version of the manuscript we have expanded the considerations concerning the rationale that guided the choice of the adopted methods. In particular, in the Materials and methods section (see page 12) we have specified the reasons for having used the VolcanoFinder algorithm.

      First, it represents one of the few approaches that relies on a model able to test jointly the occurrence of archaic introgression and the adaptive evolution of the genomic regions affected by archaic gene flow, without the need for considering the putative source of introgression. This was a relevant aspect for us, because we planned to adopt at least two main independent (and possibly quite different in terms of the underlying approaches) methods to validate the identified candidate introgressed loci and the other algorithm we used (i.e., Signet) was explicitly based on the comparison of modern data with the archaic sequence. Accordingly, the model tested by VolcanoFinder differs from those considered by the RD, U and Q statistics. In fact, the RD statistic is aimed at identifying regions of the genome with low divergence with respect to a given archaic reference, while the U/Q statistics can detect those chromosomal segments enriched in alleles that are i) uniquely shared between the admixed group (e.g., Tibetans) and the source population (e.g., Denisovans), and ii) that present a frequency above a specific threshold in the admixed population (Racimo et al. 2016). For instance, all the loci considered as likely involved in adaptive introgression events by Racimo et al. 2016 presented remarkable frequencies, with most of them showing values above 50%. That being so, we decided not to implement these methods because we believe that they are more suitable for the detection of adaptive introgression events involving few variants with a strong effect on the phenotype, which entail a substantial increase in frequency in the population subjected to the selective pressure (i.e., cases such as that of EPAS1), while it appears challenging to choose an arbitrary frequency threshold appropriate for the detection of weak and/or polygenic selective events.
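
      To make the dependence on a frequency threshold concrete, the sketch below counts sites in the spirit of the U statistic and takes a high quantile of target frequencies in the spirit of Q95. It is a simplified reading of these statistics rather than a reference implementation: the parameter names (`w` for the outgroup cutoff, `x` for the target cutoff, `y` for the archaic frequency), the nearest-rank quantile rule and the toy sites are all assumptions.

```python
import math
from typing import List, Optional, Tuple

# Each site is (outgroup_freq, target_freq, archaic_freq) for the derived allele.
Site = Tuple[float, float, float]


def u_statistic(sites: List[Site], w: float, x: float, y: float) -> int:
    """Count sites with outgroup frequency < w, target frequency > x, archaic frequency == y."""
    return sum(1 for out_f, tgt_f, arc_f in sites if out_f < w and tgt_f > x and arc_f == y)


def q95_statistic(sites: List[Site], w: float, y: float) -> Optional[float]:
    """95th percentile (nearest-rank) of target frequencies at uniquely shared sites."""
    shared = sorted(tgt_f for out_f, tgt_f, arc_f in sites if out_f < w and arc_f == y)
    if not shared:
        return None
    rank = math.ceil(0.95 * len(shared))
    return shared[rank - 1]


toy_sites = [
    (0.0, 0.6, 1.0),  # uniquely shared and common in the target group
    (0.0, 0.2, 1.0),  # uniquely shared but at low frequency in the target group
    (0.3, 0.9, 1.0),  # also present in the outgroup
    (0.0, 0.8, 0.0),  # not carried by the archaic genome
]
print(u_statistic(toy_sites, w=0.01, x=0.5, y=1.0))
print(u_statistic(toy_sites, w=0.01, x=0.1, y=1.0))
```

      In this toy example the second site is counted only when the target cutoff `x` is lowered, which illustrates why a fixed high cutoff favours strong sweeps over weak or polygenic signals.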

      As regards the possible use of Maladapt or genomatnn approaches as validation methods, we believe that they rely on more demanding computational efforts with respect to the Signet algorithm and, above all, they have the disadvantage of requiring to be trained on simulated genomic data. This makes them more prone to the potential bias introduced in the obtained results by simulations that do not carefully reflect the evolutionary history of the population under study.

      Overall, we do not agree with the Reviewer’s statement that we mainly relied on a single method to detect adaptive introgression signals because, as mentioned above, the Signet algorithm was specifically used to identify genomic regions putatively affected by introgression. This method relies on assumptions very similar to those described above for the U/Q statistics (e.g., it considers alleles uniquely shared between Tibetans and Denisovans), but avoids the necessity to select a frequency threshold to shortlist the most likely adaptive introgressed loci. In addition, according to another suggestion by the Reviewer we have now implemented a further approach to provide evidence for the adaptive evolution of the candidate introgressed loci (see response to comment #3).

      As regards the use of Signet, based on comments from both the Reviewers we realized that the rationale behind such a validation approach was not well described in the original version of the manuscript. First and foremost, we would like to clarify that in the present study we did not use this method to test for the action of natural selection (as it was formerly used by Gouy et al. 2017), but specifically to identify genomic regions putatively affected by archaic introgression. For this purpose, we followed the approach described by Gouy and Excoffier (2020) by searching for significant networks of genes presenting archaic-derived variants observable in the considered Tibetan populations. That being so, we used the Signet method as an independent approach to obtain a first validation of VolcanoFinder results. However, by following suggestions from both the Reviewers, we modified the criteria adopted to filter for archaic-derived variants, by excluding those alleles in common between Denisovan and the Yoruba outgroup population (see response to comment #6 for further information regarding this aspect).

      To sum up, we think that the combination of VolcanoFinder and Signet+LASSI approaches offered a good compromise between the computational efforts required to shortlist the most robust candidates of adaptive introgressed loci and the types of model tested (i.e., models that do not discard a priori genomic signatures ascribable to weak and/or polygenic selective events). Moreover, we would like to remark that we decided to maintain the Signet method as a validation approach in the revised version of the manuscript because: i) comments from both the Reviewers converge in suggesting how to effectively improve this approach, and ii) it represents a method that can be used both to perform single-locus validation analysis and to search for those biological functions that have been collectively much more impacted by archaic introgression, allowing us to test a more realistic approximation of the polygenic model of adaptation involving introgressed alleles. In fact, although the single candidate loci supported by all the approaches now implemented for validating the obtained results (see responses to comments #3 and #7 for further details) have attained higher prioritization with respect to previous ones (i.e., EP300 and NOS2, which are now supported by some but not all the adopted methods), angiogenesis still stands out as one of the main biological functions that have been shaped by events of adaptive introgression in the ancestors of Tibetan populations.

      Besides, I am a little surprised to see that in Supplementary Figure 2, VF didn't seem to capture more significant LR values in the EPAS1 region (positive control of adaptive introgression) than in the negative control EGLN1 region. The author explained this as the selection on EPAS1 region is "not soft enough", which I find a bit confusing. If there is no major difference in significant values between the positive and negative controls, how would the authors be convinced the significant values they detected in their two genes are true positives? I would like to see more discussion and justification of the VF results and interpretations.

      In the light of this observation by the Reviewer, and according to the overall comment of Reviewer #2 on the procedures implemented for filtering VolcanoFinder results, we realized that both normalization of LR scores and the use of a sliding windows approach might introduce erratic changes, which depend on the thresholds adopted, in the list of the genomic regions considered as the most likely candidates to have experienced adaptive introgression. To avoid this issue, and to adhere more strictly to the VolcanoFinder pipeline of analyses developed by Setter et al. 2020, in the revised version of the manuscript we have opted to use raw LR scores and to shortlist the most significant results by focusing on loci showing values falling in the top 5% of the genomic distribution obtained for such a statistic (see Materials and methods, page 13 lines 4-16 for further details).

      By following this approach, we indeed observed a pattern clearer than that previously described, in which the distribution of LR scores in the EPAS1 genomic region is remarkably different with respect to that obtained for the EGLN1 gene (Figure 2 – figure supplement 1). In more detail, we identified a total of 19 EPAS1 variants showing scores within the top 5% of LR values, in contrast to only three EGLN1 SNVs. Moreover, LR values were collectively more aggregated in the EPAS1 genomic region and showed a higher average value with respect to what was observed for EGLN1. We reported LR values, as well as -log(α) scores calculated for these control genes, in Supplement tables 3 and 4.

      Nevertheless, we agree with the Reviewer that results pointed out by VolcanoFinder need to be confirmed by additional methods, which is what we have done to define both new candidate adaptive introgressed loci and the considered positive/negative controls. In fact, validation analyses performed to confirm signatures of both archaic introgression and adaptive evolution (i.e., Signet, LASSI and Haplostrips) converged in indicating that Tibetan variability at the EGLN1 gene does not seem to have been shaped by archaic introgression events but only by the action of natural selection (see Results, page 5 lines 3-9, page 6 lines 23-25, page 7 lines 29-36; Discussion page 14 lines 33-36; Figure 2 – figure supplement 1B and Figure 4 – figure supplement 1B, 3B and 3D), also according to what was previously proposed (Hu et al., 2017). On the other hand, results from all validation analyses confirmed adaptive introgression signatures at the EPAS1 genomic region (see Results page 4 lines 32-37, page 5 lines 1-2 and 30-34, page 6 lines 23-29; Figure 3A, 3B and Figure 4 – figure supplement 1A, 3A and 3C).

      Finally, as already reported in the former version of the manuscript, our choice of considering EPAS1 and EGLN1 respectively as positive and negative controls for adaptive introgression was guided by previous evidence suggesting these loci as targets of natural selection in high-altitude Himalayan populations (Yang et al., 2017; Liu et al., 2022), although only EPAS1 was proved to have been involved also in an adaptive introgression event (Huerta-Sanchez et al., 2014; Hu et al., 2017). 

      With that being said, I suggest the authors try to first validate the signal of positive selection in the two gene regions using methods such as H2/H1 (Garud et al. 2015), iHS (Voight et al. 2006) etc. that have demonstrated power and success at detecting mild sweeps and soft sweeps, regardless of if these are adaptive introgression.

      According to the Reviewer’s suggestion, we validated the new candidate adaptive introgressed loci by using also a method to formally test for the action of natural selection. In particular, we decided to use the LASSI (Likelihood-based Approach for Selective Sweep Inference) algorithm developed by Harris & DeGiorgio (2020) mainly for the following reasons: i) it is able to identify both strong and weak genomic signatures of positive selection similarly to other approaches, but additionally it can distinguish these signals by explicitly classifying genomic windows affected by hard or soft selective sweeps; ii) when applied to simulated data generated under different demographic models and by setting a range of different values for the parameters that describe a selective event (e.g., the time at which the beneficial mutation arose, the selection coefficient s), it has been proved to have increased power with respect to traditional selection scans, such as nSL, H2/H1 and H12 (see Harris & DeGiorgio 2020 for further details).

      According to such an approach, we were able to recapitulate signatures of natural selection previously observed in Tibetans for both EPAS1 and EGLN1 (Figure 4 – figure supplement 1 and 3C – 3D).  We also obtained comparable patterns for our previous candidate adaptive introgressed loci (i.e., EP300 and NOS2), as well as for the new ones that have been instead prioritized in the revised version of the manuscript according to consistent results also from VolcanoFinder, Signet and Haplostrips analyses (see Results, page 6 lines 30-35; Figure 4C, 4D, Figure 4 – figure supplement 2C and 2D).    

      With regard to the plausible archaic origin of the haplotypes under selection in these gene regions, my concern comes from the fact that other recent studies characterizing the archaic ancestry landscape in Tibetans and East Asians (eg. SPrime reports from Browning et al. 2018, as well as ArchaicSeeker reports from Yuan et al. 2021) didn't report archaic segments in regions overlapping with EP300 and NOS2. So how would the authors explain the discrepancy here, that adaptive introgression is detected yet there is little evidence of archaic segments in the regions? 

      We thank the Reviewer for the comment and the references provided. However, we read the suggested articles and in neither of them does it seem that genomes from individuals of Tibetan ancestry have been analysed. Moreover, in the study by Yuan et al. 2021 we were not able to find any table or supplementary table reporting the genomic segments showing signatures of Denisovan-like introgression in East Asian groups, with only findings from enrichment analyses performed on significant results being described for the Papuan population. Anyway, as reported below in the response to comment #5, in line with what was observed by the Reviewer concerning the original version of the manuscript, according to the additional validation analyses implemented during this revision EP300 and NOS2 received lower prioritization with respect to other loci showing more robust signatures supporting introgression of Denisovan alleles in the gene pool of Tibetan ancestors (i.e., TBC1D1, PRKAG2, KRAS and RASGRF2). Three out of four of these genes are in accordance also with previously published results supporting introgression of Denisovan alleles in the ancestors of present-day Han Chinese (Browning et al. 2018) or directly in the Tibetan genomes (Hu et al. 2017) (see Results, page 5 lines 10-21 and Supplement table 5). Despite that, the reason why not all the candidate adaptive introgression regions detected by our analyses are found among results from Browning et al. 2018 may be that in Han Chinese this archaic variation could have evolved neutrally after the introgression events, thus preventing the identification of chromosomal segments enriched in putative archaic introgressed variants according to VolcanoFinder and LASSI approaches (which consider also the impact of natural selection). In fact, the Sprime method implemented by Browning et al. 2018 focuses only on introgression events rather than adaptive introgression ones. For instance, the Denisovan-like regions identified with Sprime in Han Chinese by such a study do not comprise the EPAS1 region at all.

      Additionally, looking at Figure 4 and Supplementary Figure 4, the authors showed haplotype comparisons between Tibetans, Denisovan, and Han Chinese for EP300 and NOS2 regions. However, in both figures, there are about equal number of Tibetans and Han Chinese that harbor the haplotype with somewhat close distance to the Denisovan genotype. And this closest haplotype is not even that similar to the Denisovan. So how would the authors rule out the possibility that instead of adaptive introgression, the selection was acting on just an ancestral modern human haplotype?

      We agree with the Reviewer that, according to the analyses presented in the original version of the manuscript, haplotype patterns observed at EP300 and NOS2 loci by means of the Haplostrips approach cannot rule out the possibility that their adaptive evolution involved ancestral modern human haplotypes. In fact, after the modifications implemented in the adopted pipeline of analyses based on the Reviewers’ suggestions, their role in modulating complex adaptations to high altitudes was confirmed also by results obtained with the LASSI algorithm (in addition to results from previous studies: Bigham et al., 2010; Zheng et al., 2017; Deng et al., 2019; X. Zhang et al., 2020), but their putative archaic origin received lower prioritization with respect to other loci, being not confirmed by all the analyses performed.

      Furthermore, I have a question about how exactly the authors scored the genes in their network analysis using Signet. The manuscript mentioned they were looking for enrichment of archaic-like derived alleles, and in the methods section, they mentioned they used SNPs that are present in both Denisovan and Tibetan genomes but are not in the chimp ancestral allele state. But are these "derived" alleles also present in Han Chinese or Africans? If so, what are the frequencies? And if the authors didn't use derived alleles exclusively shared between Tibetans and Denisovans, that may lead to false positives of the enrichment analysis, as the result would not be able to rule out the selection on ancestral modern human variation.

      As mentioned in the response to comment #1, by following the suggestions of both the Reviewers we have modified the criteria adopted for filtering archaic derived variants exclusively shared between Denisovans and Tibetans. In particular, we retained as input for Signet analysis only those alleles that i) were shared between Tibetans and Denisovan (i.e., Denisovan-like alleles), ii) were in their derived state, and iii) were completely absent (i.e., showed a frequency equal to zero) in the Yoruba population sequenced by the 1000 Genomes Project and used here as an outgroup by assuming that only Eurasian H. sapiens populations experienced Denisovan admixture. We instead decided not to filter out potential Denisovan-like derived alleles present also in the Han Chinese population because multiple lines of evidence agree in indicating that gene flow from Denisovans occurred in the ancestral East Asian gene pool no sooner than 48–46 thousand years ago (Teixeira et al. 2019; Zhang et al. 2021; Yuan et al. 2021), thus predating the split between low-altitude and high-altitude groups, which occurred approximately 15 thousand years ago (Lu et al. 2016; Hu et al. 2017). In fact, traces of such an archaic gene flow are still detectable in the genomes of several low-altitude populations of East Asian ancestry (Yuan et al. 2021).
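
      The three retention criteria listed above can be expressed as a small filter. The sketch below is illustrative only: the `Site` record, its field names and the toy sites are hypothetical, and the Denisovan genotype is reduced to a single allele per site for simplicity.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Site:
    site_id: str
    ancestral: str                  # allele in the reconstructed ancestral sequence
    denisovan: str                  # Denisovan allele (simplified to one allele per site)
    tibetan_freq: Dict[str, float]  # allele -> frequency in Tibetans
    yoruba_freq: Dict[str, float]   # allele -> frequency in the Yoruba outgroup


def denisovan_like_derived(sites: List[Site]) -> List[str]:
    """Return IDs of sites whose Denisovan allele passes all three criteria."""
    kept = []
    for site in sites:
        allele = site.denisovan
        shared = site.tibetan_freq.get(allele, 0.0) > 0.0            # i) shared with Tibetans
        derived = allele != site.ancestral                           # ii) derived state
        absent_in_outgroup = site.yoruba_freq.get(allele, 0.0) == 0.0  # iii) absent in Yoruba
        if shared and derived and absent_in_outgroup:
            kept.append(site.site_id)
    return kept


toy_sites = [
    Site("s1", "A", "G", {"A": 0.7, "G": 0.3}, {"A": 1.0}),            # passes all criteria
    Site("s2", "A", "G", {"A": 0.7, "G": 0.3}, {"A": 0.9, "G": 0.1}),  # present in Yoruba
    Site("s3", "A", "A", {"A": 1.0}, {"A": 1.0}),                      # ancestral allele
    Site("s4", "C", "T", {"C": 1.0}, {"C": 1.0}),                      # not found in Tibetans
]
retained = denisovan_like_derived(toy_sites)
print(retained)
```

      No Han Chinese frequencies enter the filter, in keeping with the decision explained above not to exclude alleles that are also present in that population.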

      Concerning the above, I would also suggest the authors replot their Figure 4 and Figure S4 by adding the African population (eg. YRI) in the plot, and examine the genetic distance among the modern human haplotypes, in contrast to their distance to Denisovan.

      According to the Reviewer’s suggestion, after having identified new candidate adaptive introgressed loci according to the revised pipeline of analyses, we ran the Haplostrips algorithm by including in the dataset 27 individuals (i.e., 54 haplotypes) from the Yoruba population sequenced by the 1000 Genomes Project (Figure 4A, 4B, Figure 4 – figure supplement 2A, 2B, 3A).

      Reviewer #2:

      In the methods the authors write "Since composite likelihood statistics are not associated with pvalues, we implemented multiple procedures to filter SNVs according to the significance of their LR values." What does significance mean here?

      After modifications applied to the adopted pipeline of analyses according to the Reviewers’ suggestions (see responses to public reviews and to comments #1, #3, #6, #7 of Reviewer #1), new candidate adaptive introgressed loci have been identified specifically by focusing on variants showing LR values falling in the top 5% of the genomic distribution obtained for such a statistic in order to adhere more strictly to the VolcanoFinder approach developed by Setter et al. 2020. Therefore, the related sentence in the materials and methods section was modified accordingly.

      Signet should be cited the first time it appears in the manuscript. The citation in the references is wrong. It lists R. Nielsen as the last author, but R. Nielsen is not an author of this paper.

      We thank the Reviewer for the comment. We have now mentioned the article by Gouy and Excoffier (2020) in the Results section where the Signet algorithm was first described and we have corrected the related reference.

      I could not find Figure 5 which is cited in the methods in the main text. I assume the authors mean Supplementary Figure 5, but the supplementary files have Figure 4.

      We thank the Reviewer for the comment. We have checked and modified figures included in the article and in the supplementary files to fix this issue.

      I didn't see a table with the genes identified as adaptatively introgressed with VolcanoFinder. This would be useful as I believe this is the first time VolcanoFinder is being used on Tibetan data?

      According to the Reviewer’s suggestion, we have reported in Supplement table 2 all the variants showing LR scores falling in the top 5% of the genomic distribution obtained for such a statistic, along with the associated α parameters computed by the VolcanoFinder algorithm.

      It is easier for the reviewer if lines have numbers.

      According to the Reviewer’s suggestion, we have included line numbers in the revised version of the manuscript.

    2. eLife Assessment

      This study presents valuable findings on what networks of genes were impacted by introgression from Denisovans, to identify the biological functions involved in high-altitude adaptation in Tibet. This study applies solid and previously validated methodology to identify genes with signatures of both introgression and positive selection. This paper would be of interest to population geneticists, anthropologists, and scientists studying the genetic basis underlying high-altitude adaptation.

    3. Reviewer #1 (Public review):

      Over the last decade, numerous studies have identified adaptation signals in modern humans driven by genomic variants introgressed from archaic hominins such as Neanderthals and Denisovans. One of the most classic signals comes from a beneficial haplotype in the EPAS1 gene in Tibetans that is evidently of Denisovan origin and facilitated high altitude adaptation (HAA). Given that HAA is a complex trait with numerous underlying genetic contributions, in this paper Ferraretti et al. asked whether Denisovan introgression facilitated HAA in other ways by contributing to additional HAA-related genetic variants. Specifically, the authors considered that if such signature exists, they most likely are only mild signals from polygenic selection, or soft sweeps on standing archaic variation, in contrast to a strong and nearly complete selection signal like the EPAS1. They leveraged a few recently developed methods, including a composite likelihood method for detecting adaptive introgression and a biological network-based method for detecting polygenic selection, and identified compelling evidence of additional genes that exhibit Denisovan-like adaptive introgression signature and contributed to the polygenic adaptation at high altitude in Tibetans.

      Strength:

      The study is well motivated by an important question, which is, whether archaic introgression can drive polygenic adaptation via multiple small effect contributions in genes underlying different biological pathways regulating a complex trait (such as HAA). This is a valid question and the influence of archaic introgression on polygenic adaptation has not been thoroughly explored by previous studies.

      The authors reexamined previously published high-altitude Tibetan whole genome data and detected new evidence of adaptive introgression and polygenic selection. Specifically, by applying VolcanoFinder, they confirmed previously identified adaptive introgression alleles such as EPAS1 and PPARA. By applying signet, they identified subsets of biological pathways enriched for archaic variants that contributed to HAA polygenic selection. They also leveraged additional methods such as LASSI and haplotype plotting to help confirm the signature of natural selection on their newly discovered adaptive introgression candidate genes.

      Weakness:

      The manuscript also improved substantially since the initial review, and the new candidate genes presented here now harbor compelling and convincing evidence of both adaptive introgression and HAA polygenic selection. There are no notable weaknesses in the revised manuscript and updated results.

    4. Reviewer #2 (Public review):

      Summary:

      In Ferraretti et al., the authors identify adaptively introgressed genes using VolcanoFinder and then identify pathways enriched for adaptively introgressed genes. They use signet to identify pathways that are enriched for Denisovan alleles. The authors find that angiogenesis is one of the biological functions enriched for introgression.

      Strengths:

      Most papers that have studied the genetic basis of high altitude (HA) adaptation in Tibet have highly emphasized the role of a few genes (e.g. EPAS1, EGLN1), and in this paper the authors look for more subtle signals of selection in other genes to investigate how archaic introgression may be enriched at the pathway level. A couple of methods are used to confirm the consistency of the results.

      Looking into the biological functions enriched for Denisovan introgression in Tibetans is important for characterizing the impact of Denisovan introgression in facilitating high altitude adaptations.

      Weaknesses:

      I thank the authors for providing an improved version of their manuscript.

    1. eLife Assessment

      ProtSSN is a valuable approach that generates protein embeddings by integrating sequence and structural information, demonstrating improved prediction of mutation effects on thermostability compared to competing models. The evidence supporting the authors' claims is solid, with well-executed comparisons. This work will be of particular interest to researchers in bioinformatics and structural biology, especially those focused on protein function and stability.

    2. Reviewer #1 (Public review):

      After revisions:

      My concerns have been addressed.

      Prior to revisions:

      Summary:

      The authors introduce a denoising-style model that incorporates both structure and primary-sequence embeddings to generate richer embeddings of peptides. My understanding is that the authors use ESM for the primary sequence embeddings, take resolved structures (or use structural predictions from AlphaFold when they're not available), then develop an architecture to combine these two with a loss that seems reminiscent of diffusion models or masked language model approaches. The embeddings can be viewed as ensemble-style embedding of the two levels of sequence information, or with AlphaFold, an ensemble of two methods (ESM+AlphaFold). The authors also gather external datasets to evaluate their approach and compare it to previous approaches. The approach seems promising, and appears to out-compete previous methods at several tasks. Nonetheless, I have strong concerns about a lack of verbosity as well as exclusion of relevant methods and references.

      Advances:

      I appreciate the breadth of the analysis and comparisons to other methods. The authors separate tasks, models, and sizes of models in an intuitive, easy-to-read fashion that I find valuable for selecting a method for embedding peptides. Moreover, the authors gather two datasets for evaluating embeddings' utility for predicting thermostability. Overall, the work should be helpful for the field as more groups choose methods/pretraining strategies amenable to their goals, and can do so in an evidence-guided manner.

      Considerations:

      Primarily, a majority of the results and conclusions (e.g., Table 3) are reached using data and methods from ProteinGym, yet the best-performing methods on ProteinGym are excluded from the paper (e.g., EVE-based models and GEMME). In the ProteinGym database, these methods outperform ProtSSN models. Moreover, these models were published over a year---or even 4 years in the case of GEMME---before ProtSSN, and I do not see justification for their exclusion in the text.

      Secondly, related to comparison of other models, there is no section in the methods about how other models were used, or how their scores were computed. When comparing these models, I think it's crucial that there are explicit derivations or explanations for the exact task used for scoring each method. In other words, if the pre-training is indeed the important advance of the paper, the paper needs to show this more explicitly by explaining exactly which components of the model (and previous models) are used for evaluation. Are the authors extracting the final hidden layer representations of the model, treating these as features, then using these features in a regression task to predict fitness/thermostability/DDG etc.? How are the model embeddings of other methods being used, since, for example, many of these methods output a k-dimensional embedding of a given sequence, rather than one single score that can be correlated with some fitness/functional metric. Summarily, I think the text is lacking an explicit mention of how these embeddings are being summarized or used, as well as how this compares to the model presented.

      I think the above issues can mainly be addressed by considering and incorporating points from Li et al. 2024[1] and potentially Tang & Koo 2024[2]. Li et al.[1] make extremely explicit the use of pretraining for downstream prediction tasks. Moreover, they benchmark pretraining strategies explicitly on thermostability (one of the main considerations in the submitted manuscript), yet there is no mention of this work nor the dataset used (FLIP (Dallago et al., 2021)) in this current work. I think a reference and discussion of [1] is critical, and I would also like to see comparisons in line with [1], as [1] is very clear about what features from pretraining are used, and how. If the comparisons with previous methods were done in this fashion, this level of detail needs to be included in the text.

      To conclude, I think the manuscript would benefit substantially from a more thorough comparison of previous methods. Maybe one way of doing this is following [1] or [2], and using the final embeddings of each method for a variety of regression tasks---to really make clear where these methods are performing relative to one another. I think a more thorough methods section detailing how previous methods did their scoring is also important. Lastly, TranceptEVE (or a model comparable to it) and GEMME should also be mentioned in these results, or at the bare minimum, be given justification for their absence.

      [1] Feature Reuse and Scaling: Understanding Transfer Learning with Protein Language Models. Francesca-Zhoufan Li, Ava P. Amini, Yisong Yue, Kevin K. Yang, Alex X. Lu. bioRxiv 2024.02.05.578959; doi: https://doi.org/10.1101/2024.02.05.578959

      [2] Evaluating the representational power of pre-trained DNA language models for regulatory genomics. Ziqi Tang, Peter K Koo. bioRxiv 2024.02.29.582810; doi: https://doi.org/10.1101/2024.02.29.582810

    3. Reviewer #2 (Public review):

      Summary:

      To design proteins and predict disease, we want to predict the effects of mutations on the function of a protein. To make these predictions, biologists have long turned to statistical models that learn patterns that are conserved across evolution. There is potential to improve our predictions however by incorporating structure. In this paper the authors build a denoising auto-encoder model that incorporates sequence and structure to predict mutation effects. The model is trained to predict the sequence of a protein given its perturbed sequence and structure. The authors demonstrate that this model is able to predict the effects of mutations better than sequence-only models.

      As well, the authors curate a set of assays measuring the effect of mutations on thermostability. They demonstrate their model also predicts the effects of these mutations better than previous models and make this benchmark available for the community.

      Strengths:

      The authors describe a method that makes accurate mutation effect predictions by informing its predictions with structure.

      Weaknesses:

      In the review period, the authors included a previous method, SaProt, that similarly uses protein structure to predict the effects of mutations, in their evaluations. They see that SaProt performs similarly to their method.

      Readers should note that methods labelled as "few-shot" in comparisons do not make use of experimental labels, but rather use sequences inferred as homologous; these sequences are also often available even if the protein has never been experimentally tested.

      ProteinGym is largely made of deep mutational scans, which measure the effect of every mutation on a protein. These new benchmarks contain on average measurements of less than a percent of all possible point mutations of their respective proteins. It is unclear what sorts of protein regions these mutations are more likely to lie in; therefore it is challenging to make conclusions about what a model has necessarily learned based on its score on this benchmark. For example, several assays in this new benchmark seem to be similar to each other, such as four assays on ubiquitin performed in pH 2.25 to pH 3.0.

      The authors state that their new benchmarks are potentially more useful than those of ProteinGym, citing Frazer 2021; readers should be aware that the mutations from the latter source are actually mutations whose impact on human health has been determined through multiple sources, including population genetics, clinical evidence and some experiments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Response to Reviewer 1

      Summary:

      The authors introduce a denoising-style model that incorporates both structure and primary-sequence embeddings to generate richer embeddings of peptides. My understanding is that the authors use ESM for the primary sequence embeddings, take resolved structures (or use structural predictions from AlphaFold when they're not available), and then develop an architecture to combine these two with a loss that seems reminiscent of diffusion models or masked language model approaches. The embeddings can be viewed as ensemble-style embedding of the two levels of sequence information, or with AlphaFold, an ensemble of two methods (ESM+AlphaFold). The authors also gather external datasets to evaluate their approach and compare it to previous approaches. The approach seems promising and appears to out-compete previous methods at several tasks. Nonetheless, I have strong concerns about a lack of verbosity as well as the exclusion of relevant methods and references.

      Thank you for the comprehensive summary. We respond point by point to the concerns listed in the review below and have modified our manuscript accordingly.

      Advances:

      I appreciate the breadth of the analysis and comparisons to other methods. The authors separate tasks, models, and sizes of models in an intuitive, easy-to-read fashion that I find valuable for selecting a method for embedding peptides. Moreover, the authors gather two datasets for evaluating embeddings' utility for predicting thermostability. Overall, the work should be helpful for the field as more groups choose methods/pretraining strategies amenable to their goals, and can do so in an evidence-guided manner.

      Thank you for recognizing the strength of our work in terms of the notable contributions, the solid analysis, and the clear presentation.

      Considerations:

      (1) Primarily, a majority of the results and conclusions (e.g., Table 3) are reached using data and methods from ProteinGym, yet the best-performing methods on ProteinGym are excluded from the paper (e.g., EVE-based models and GEMME). In the ProteinGym database, these methods outperform ProtSSN models. Moreover, these models were published over a year---or even 4 years in the case of GEMME---before ProtSSN, and I do not see justification for their exclusion in the text.

      We decided to exclude the listed methods from the primary table as they are all MSA-based methods, which are considered few-shot methods in deep learning (Rao et al., ICML, 2021). In contrast, the proposed ProtSSN is a zero-shot method that makes inferences based on less information than few-shot methods. Moreover, it is possible for MSA-based methods to query aligned sequences based on predictions. For instance, Tranception (Notin et al., ICML, 2022) selects the model with the optimal proportions of logits and retrieval results according to the average correlation score on ProteinGym (Table 10, Notin et al., 2022).

      With this in mind, we only included zero-shot deep learning methods in Table 3, which require no more than the sequence and structure of the underlying wild-type protein when scoring the mutants. In the revision, we have added the performance of SaProt to Table 3, and the performance of GEMME, TranceptEVE, and SaProt to Table 5. Furthermore, we have released the model's performance on the public leaderboard of ProteinGym v1 at proteingym.org.

      (2) Secondly, related to the comparison of other models, there is no section in the methods about how other models were used, or how their scores were computed. When comparing these models, I think it's crucial that there are explicit derivations or explanations for the exact task used for scoring each method. In other words, if the pre-training is indeed an important advance of the paper, the paper needs to show this more explicitly by explaining exactly which components of the model (and previous models) are used for evaluation. Are the authors extracting the final hidden layer representations of the model, treating these as features, and then using these features in a regression task to predict fitness/thermostability/DDG etc.? How are the model embeddings of other methods being used, since, for example, many of these methods output a k-dimensional embedding of a given sequence, rather than one single score that can be correlated with some fitness/functional metric? Summarily, I think the text lacks an explicit mention of how these embeddings are being summarized or used, as well as how this compares to the model presented.

      Thank you for the suggestion. Below we address the questions in three points. 

      (1) The task and the scoring for each method. We followed your suggestion and added a new paragraph titled “Scoring Function” on page 9 to provide a detailed explanation of the scoring functions used by other deep learning zero-shot methods.

      (2) The importance of individual pre-training modules. The complete architecture of the proposed ProtSSN model is introduced on pages 7-8. Empirically, the influence of each pre-training module on the overall performance has been examined through ablation studies on page 12. In summary, the optimal performance is achieved by combining all the individual modules and designs.

      (3) The input of fitness scoring. For a zero-shot prediction task, the final score for a mutant is calculated with one of two widely used functions: the log-odds ratio (for encoder models, including ours) or the log-likelihood (for autoregressive models or inverse folding models). In the revision, we explicitly define these functions in sections “Inferencing” (page 7) and “Scoring Function” (page 9).
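
      To make the two scoring functions concrete, the following is a minimal sketch rather than the ProtSSN implementation. It assumes the model output is available as a NumPy array of per-position logits over the token vocabulary; the function names and the convention of summing over mutated sites are assumptions for illustration.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def log_odds_score(wt_logits, wt_tokens, mutations):
    """Log-odds ratio for encoder-style models (hypothetical helper).

    wt_logits: (L, V) logits from one pass over the wild-type protein.
    wt_tokens: length-L wild-type token indices.
    mutations: list of (position, mutant_token) pairs; scores are summed
               over mutated sites (an assumed convention).
    """
    log_probs = log_softmax(np.asarray(wt_logits, dtype=float))
    score = 0.0
    for pos, mutant in mutations:
        score += log_probs[pos, mutant] - log_probs[pos, wt_tokens[pos]]
    return float(score)

def log_likelihood_score(step_logits, tokens):
    """Sequence log-likelihood for autoregressive or inverse-folding models.

    step_logits: (L, V) logits, row i being the prediction for token i.
    """
    log_probs = log_softmax(np.asarray(step_logits, dtype=float))
    tokens = np.asarray(tokens)
    return float(log_probs[np.arange(len(tokens)), tokens].sum())

# Toy example with a 3-token vocabulary (made-up numbers).
toy_logits = np.log(np.array([[0.5, 0.25, 0.25], [0.1, 0.1, 0.8]]))
toy_wt = [0, 2]
single_mutant = log_odds_score(toy_logits, toy_wt, [(0, 1)])
wt_likelihood = log_likelihood_score(toy_logits, toy_wt)
```

      A negative log-odds score means the model assigns the mutant residue a lower probability than the wild-type residue at that site.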

      (3) I think the above issues can mainly be addressed by considering and incorporating points from Li et al. 2024[1] and potentially Tang & Koo 2024[2]. Li et al.[1] make extremely explicit the use of pretraining for downstream prediction tasks. Moreover, they benchmark pretraining strategies explicitly on thermostability (one of the main considerations in the submitted manuscript), yet there is no mention of this work nor the dataset used (FLIP (Dallago et al., 2021)) in this current work. I think a reference and discussion of [1] is critical, and I would also like to see comparisons in line with [1], as [1] is very clear about what features from pretraining are used, and how. If the comparisons with previous methods were done in this fashion, this level of detail needs to be included in the text.

      The initial version did not include an explicit comparison with the mentioned reference due to the difference in the learning task. In particular, [1] formulates a supervised learning task on predicting the continuous scores of mutants of specific proteins. In comparison, we make zero-shot predictions, where the model is trained in a self-supervised learning manner that requires no labels from experiments. In the revision, we added a discussion of this point in “Discussion and Conclusion” (lines 476-484).

      Recommendations For The Authors:

      Comment 1

      I found the methods lacking in the sense that there is never a simple, explicit statement about what is the exact input and output of the model. What are the components of the input that are required by the user (to generate) or supply to the model? Are these inputs different at training vs inference time? The loss function seems like it's trying to de-noise a modified sequence, can you make this more explicit, i.e. exactly what values/objects are being compared in the loss?

      We have added a more detailed description in the "Model Pipeline" section (page 7), which explains the distinct input requirements for training and inference, as well as the formulation of the employed loss function. To summarize:

      (1) Both sequence and structure information are used in training and inference. Specifically, structure information is represented as a 3D graph with coordinates, while sequence information consists of AA-wise hidden representations encoded by ESM2-650M. During inference, instead of encoding each mutant individually, the model encodes the WT protein and uses the output probability scores relevant to the mutant to calculate the fitness score. This is a standard operation in many zero-shot fitness prediction models, commonly referred to as the log-odds-ratio.

      (2) The loss function compares the differences between the noisy input sequence and the output (recovered) AA sequence. Noise is added to the input sequences, and the model is trained to denoise them (see “Ablation Study” for the different types of noise we tested). This approach is similar to a one-step diffusion process or BERT-style token permutation. The model learns to recover the probability of each node (AA) being one of 33 tokens. A cross-entropy loss is then applied to compare this distribution with the ground-truth (unpermuted) AA sequence, aiming to minimize the difference.

      To better present the workflow, we revised the manuscript accordingly.
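
      The training objective in point (2) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: uniform random substitution is only one possible noise type, `corrupt` and `cross_entropy` are hypothetical helper names, and the logits would in practice come from the model applied to the noisy sequence and the structure graph.

```python
import numpy as np

NUM_TOKENS = 33  # vocabulary size stated in the response

def corrupt(tokens, noise_rate, rng):
    """Replace a random fraction of residues with uniformly drawn tokens.

    Uniform substitution is an assumption; other noise types are possible.
    Returns the noisy sequence and the boolean mask of perturbed positions.
    """
    tokens = np.asarray(tokens)
    mask = rng.random(tokens.shape) < noise_rate
    random_tokens = rng.integers(0, NUM_TOKENS, size=tokens.shape)
    return np.where(mask, random_tokens, tokens), mask

def cross_entropy(logits, targets):
    """Mean cross-entropy between predicted token logits and the clean sequence."""
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets)
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())

rng = np.random.default_rng(0)
clean = np.arange(20) % NUM_TOKENS           # stand-in for a clean AA sequence
noisy, perturbed = corrupt(clean, 0.15, rng)  # input given to the model
uninformed = np.zeros((20, NUM_TOKENS))       # stand-in logits: uniform prediction
loss = cross_entropy(uninformed, clean)       # compared against the clean sequence
```

      With uniform logits the loss equals log(33), the value an uninformed predictor would obtain; training pushes it below this baseline.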

      Comment 2

      Related to the above, I'm not exactly sure where the structural/tertiary structure information comes from. In the methods, they don't state exactly whether the 3D coordinates are given in the CATH repository or where exactly they come from. In the results section they mention using AlphaFold to obtain coordinates for a specific task---is the use of AlphaFold limited only to these tasks/this is to show robustness whether using AlphaFold or realized coordinates?

      The 3D coordinates of all proteins in the training set are derived from the crystal structures in CATH v4.3.0 to ensure a high-quality input dataset (see "Training Setup," Page 8). However, during the inference phase, we used predicted structures from AlphaFold2 and ESMFold as substitutes. This approach enhances the generalizability of our method, as in real-world scenarios, the crystal structure of the template protein to be engineered is not always available. The associated descriptions can be found in “Training Setup” (lines 271-272) and “Folding Methods” (lines 429-435).

      Comment 3

      Lines 142+144 missing reference "Section establishes", "provided in Section ."

      199 "see Section " missing reference

      214 missing "Section"

      Thank you for pointing this out. We have fixed all missing references in the revision.

      Comment 4

      Table 2 - seems inconsistent to mention the number of parameters in the first 2 methods, then not in the others (though I see in Table 3 this is included, so maybe should just be omitted in Table 2).

      In Table 2, we present the zero-shot methods used as baselines. Since many methods have different versions due to varying hyperparameter settings, we decided to list the number of parameters in the following tables.

      We have double-checked both Table 3 and Table 5 and confirm that there is no inconsistency in the reported number of parameters. One potential explanation for the observed difference in the comment could be due to the differences in the number of parameters between single and ensemble methods. The ensemble method averages the predictions of multiple models, and we sum the total number of parameters across all models involved. For example, RITA-ensemble has 2210M parameters, derived from the sum of four individual models with 30M, 300M, 680M, and 1200M parameters.

      Comment 5

      In general, I found using the word "type" instead of "residue" a bit unnatural. As far as I can tell, the norm in the field is to say "amino acid" or "residue" rather than "type". This somewhat confused me when trying to understand the methods section, especially when talking about injecting noise (I figured "type" may refer to evolutionarily-close, or physicochemically-close residues). Maybe it's not necessary to change this in every instance, but something to consider in terms of ease of reading.

      Thank you for your suggestion. The term "type" is a common expression in the NLP field, similar to "class". To avoid confusing biologists, we have revised the manuscript accordingly.

      Comment 6

      197 should this read "based on the kNN "algorithm"" (word missing) or maybe "based on "its" kNN"?

      We have corrected the typo accordingly. It now reads “the 𝑘-nearest neighbor algorithm (𝑘NN)” (line 198).

      Comment 7

      200 weights of dimension 93, where does this number come from?

      The edge features are derived by Zhou et al., 2024. We have updated the reference in the manuscript for clarity (lines 201-202).

      Comment 8

      210-212 "representations of the noisy AA sequence are encoded from the noisy input" what is the "noisy AA sequence?" might be helpful to exactly define what is "noisy input" or "noisy AA sequence". This sentence could potentially be worded to make it clearer, e.g. "we take the modified input sequence and embed it using [xyz]."

      We have revised the text accordingly; see lines 211-212 of the revised manuscript.

      Comment 9

      In Table 3

      Formatting, DTm (million), (million) should be under "# Params" likely?

      Also for DDG this is reported on only a few hundred mutations, it might be worth plotting the confidence intervals over the Spearman correlation (e.g. by bootstrapping the correlation coefficient).

      We followed the suggestion and added “million” under the "# Params". We have added the bootstrapped results for DDG and DTm to Table 6. For each dataset, we randomly sampled 50% of the data for ten independent runs. ProtSSN achieves the top performance with a considerably small variance.
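
      The resampling procedure described above can be sketched as follows. This is a minimal illustration, not the authors' evaluation script: sampling without replacement, the fixed seed, and the helper names are assumptions, and Spearman correlation is computed directly from average ranks.

```python
import numpy as np

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tie = values == v
        ranks[tie] = ranks[tie].mean()
    return ranks

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def subsampled_spearman(pred, label, fraction=0.5, runs=10, seed=0):
    """Mean and standard deviation of Spearman correlation over random subsets.

    Each run draws `fraction` of the data without replacement (an assumption).
    """
    pred, label = np.asarray(pred), np.asarray(label)
    rng = np.random.default_rng(seed)
    n = max(2, int(round(fraction * len(pred))))
    scores = []
    for _ in range(runs):
        idx = rng.choice(len(pred), size=n, replace=False)
        scores.append(spearman(pred[idx], label[idx]))
    return float(np.mean(scores)), float(np.std(scores))

# Toy example with made-up, perfectly monotonic data.
toy_pred = np.arange(100.0)
toy_label = toy_pred ** 2
mean_rho, std_rho = subsampled_spearman(toy_pred, toy_label)
```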

      Comment 10

      The paragraph in lines 319 to lines 328 I feel may lack sufficient evidence.

      "While sequence-based analysis cannot entirely replace the role of structure-based analysis, compared to a fully structure-based deep learning method, a protein language model is more likely to capture sufficient information from sequences by increasing the model scale, i.e., the number of trainable parameters."

      This claim is made without a citation, such as [1]. Increasing the scale of the model doesn't always align with improving out-of-sample/generalization performance. I don't feel fully convinced by the claim that worse prediction is ameliorated by increasing the number of parameters. In Table 3 the performance is not monotonic with (nor scales with) the number of parameters, even within a model. See ProGen2 Expression scores, or ESM-2 Stability scores, as a function of their model sizes. In [1], the authors discuss whether pretraining strategies are aligned with specific tasks. I think rewording this paragraph and mentioning this paper is important. Figure 3 shows that maybe there's some evidence for this but I don't feel entirely convinced by the plot.

      We agree that increasing the number of learnable parameters does not always result in better performance in downstream tasks. However, what we intended to convey is that language models typically need to scale up in size to capture the interactions among residues, while structure-based models can achieve this more efficiently with lower computational costs. We have rephrased this paragraph in the paper to clarify our point in lines 340-342.

      Comment 11

      Line 327 related to my major comment, " a comprehensive framework, such as ProtSSN, exhibits the best performance." Refers to performance on ProteinGym, yet the best-performing methods on ProteinGym are excluded from the comparison.

      The primary comparisons were conducted using zero-shot models for fairness, meaning that the baseline models were not trained on MSA and did not use test performance to tune their hyperparameters. It's also worth noting that SaProt (the current SOTA model) had not been updated on the leaderboard at the time of submitting this paper. In the revised manuscript, we have included GEMME and TranceptEVE in Table 5 and SaProt in Tables 3, 5, and 6. While ProtSSN does not achieve SOTA performance in every individual task, our key argument in the analysis is to highlight the overall advantage of hybrid encoders compared to single sequence-based or structure-based models. We have made this statement clearer in the revised manuscript (line 349).

      Comment 12

      Line 347, line abruptly ends "equivariance when embedding protein geometry significantly." (?).

      We have fixed the typo (lines 372-373).

      Comment 13

      Figure 3 I think can be made clearer. Instead of using True/false maybe be more explicit. For example in 3b, say something like "One-hot encoded" or "ESM-2 embedded".

      The labels were set to True/False with the title of the subfigures so that they can be colored consistently.

      Following the suggestion, we have updated the captions in the revised manuscript for clarity.

      Comment 14

      Lines 381-382 "average sequential embedding of all other Glycines" is to say that the score is taken as the average score in which Glycine is substituted at every other position in the peptide? Somewhat confused by the language "average sequential embedding" and think rephrasing could be done to make things clearer.

      We have revised the related text accordingly for a clearer presentation (lines 406-413).

      Comment 15

      Table 5, and in mentions to VEP, if ProtSSN is leveraging AlphaFold for its structural information, I disagree that ProtSSN is not an MSA method, and I find it unfair to place ProtSSN in the "non-MSA" categories. If this isn't the case, then maybe making clearer the inputs etc. in the Methods will help.

      We respectfully disagree with classifying a protein encoding method based solely on its input structure. While AF2 leverages MSA sequences to predict protein structures, this information is not used in our model, and our model is not exclusive to AF2-predicted structures. When applicable, the model can encode structures derived from experimental data or other folding methods. For example, in the manuscript, we compared the performance of ProtSSN using proteins folded by both AF2 and ESMFold.

      However, we would like to emphasize that comparing the sensitivity of an encoding method across different structures or conformations is not the primary focus of our work. In contrast, some methods explicitly use MSA during model training. For instance, MSA-Transformer encodes MSA information directly into the protein embedding, and Tranception-retrieval utilizes different sets of MSA hyperparameters depending on the validation set's performance.

      To avoid further confusion, we have revised the terms "MSA methods" and "non-MSA methods" in the manuscript to "zero-shot methods" and "few-shot methods."

      Comment 16

      In Table 3 they're highlighted as the best, yet on ProteinGym there are several EVE models that do better, as well as GEMME, which are not referenced.

      The comparison in Table 3 focuses on zero-shot methods, whereas GEMME and EVE are few-shot models. Since these methods have different input requirements, directly comparing them could lead to unfair conclusions. For this reason, we reserved the comparisons with these few-shot models for Table 5, where we aim to provide a more comprehensive evaluation of all available methods.

      Response to Reviewer 2

      Summary:

      To design proteins and predict disease, we want to predict the effects of mutations on the function of a protein. To make these predictions, biologists have long turned to statistical models that learn patterns that are conserved across evolution. There is potential to improve our predictions however by incorporating structure. In this paper, the authors build a denoising auto-encoder model that incorporates sequence and structure to predict mutation effects. The model is trained to predict the sequence of a protein given its perturbed sequence and structure. The authors demonstrate that this model is able to predict the effects of mutations better than sequence-only models.

      Thank you for your thorough review and clear summary of our work. Below, we provide a detailed, point-by-point response to each of your questions and concerns.

      Strengths:

      The authors describe a method that makes accurate mutation effect predictions by informing its predictions with structure.

      Thank you for your clear summary of our highlights.

      Weaknesses:

      Comment 1

      It is unclear how this model compares to other methods of incorporating structure into models of biological sequences, most notably SaProt.

      (https://www.biorxiv.org/content/10.1101/2023.10.01.560349v1.full.pdf).

      In the revised manuscript, we have updated the performance results for SaProt's single models (both masked and unmasked versions with the pLDDT score) as well as the ensemble models. These updates are reflected in Tables 3, 5, and 6.

      Comment 2

      ProteinGym is largely made of deep mutational scans, which measure the effect of every mutation on a protein. These new benchmarks contain on average measurements of less than a percent of all possible point mutations of their respective proteins. It is unclear what sorts of protein regions these mutations are more likely to lie in; therefore it is challenging to make conclusions about what a model has necessarily learned based on its score on this benchmark. For example, several assays in this new benchmark seem to be similar to each other, such as four assays on ubiquitin performed at pH 2.25 to pH 3.0.

      We agree that both DTm and DDG are smaller datasets, making them less comprehensive than ProteinGym. However, we believe DTm and DDG provide valuable supplementary insights for the following reasons:

      (1) These two datasets are low-throughput and manually curated. Compared to datasets from high-throughput experiments like ProteinGym, they contain fewer errors from experimental sources and data processing, offering cleaner and more reliable data.

      (2) Environmental factors are crucial for the function and properties of enzymes, which is a significant concern for many biologists when discussing enzymatic functions. Existing benchmarks like ProteinGym tend to simplify these factors and focus more on global protein characteristics (e.g., AA sequence), overlooking the influence of environmental conditions.

      (3) While low-throughput datasets like DTm and DDG do not cover all AA positions or perform extensive saturation mutagenesis, these experiments often target mutations at sites with higher potential for positive outcomes, guided by prior knowledge. As a result, the positive-to-negative ratio is more meaningful than random mutagenesis datasets, making these benchmarks more relevant for evaluating model performance.

      We would like to emphasize that DTm and DDG are designed to complement existing benchmarks rather than replace ProteinGym. They address different scales and levels of detail in fitness prediction, and their inclusion allows for a more comprehensive evaluation of deep learning models.

      Recommendations For The Authors:

      Comment 1

      I recommend including SaProt in your benchmarks.

      In the revision, we added comparisons with SaProt in all the Tables (3, 5 and 6). 

      Comment 2

      I also recommend investigating and giving a description of the bias in these new datasets.

      The bias of the new benchmarks can be seen in Table 1, where the mutants are distributed evenly across different pH levels.

      In the revision, we added a discussion regarding the new datasets in “Discussion and Conclusion” (lines 496-504 of the revised version).

      Comment 3

      I also recommend reporting the model's ability to predict disease using ClinVar -- this experiment is conspicuously absent.

      Following the suggestion, we retrieved 2,525 samples from the ClinVar dataset available on ProteinGym’s website. Since the official source did not provide corresponding structure files, we performed the following three steps:

      (1) We retrieved the UniProt IDs for the sequences from the UniProt website and downloaded the corresponding AlphaFold2 structures for 2,302 samples.

      (2) For the remaining proteins, we used ColabFold 1.5.5 to perform structure prediction.

      (3) Among these, 12 proteins were too long to be folded by ColabFold, for which we used the AlphaFold3 server for prediction.
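      The three steps above amount to a fallback rule for choosing a structure source. The following is a minimal sketch of that rule only, not the authors' pipeline: the `Protein` record, the function name, and the length cutoff are hypothetical, since the response does not state the exact limit at which ColabFold could no longer fold a sequence.

```python
from dataclasses import dataclass

# Hypothetical cutoff for illustration; the response does not give the real limit.
ASSUMED_COLABFOLD_MAX_LENGTH = 2000


@dataclass
class Protein:
    name: str
    length: int           # number of residues
    has_af2_model: bool   # True if a precomputed AlphaFold2 structure was downloadable


def choose_structure_source(protein: Protein,
                            max_len: int = ASSUMED_COLABFOLD_MAX_LENGTH) -> str:
    """Return which of the three sources described above would supply the structure."""
    if protein.has_af2_model:
        return "alphafold2_download"   # step (1): precomputed structure
    if protein.length <= max_len:
        return "colabfold"             # step (2): predict locally
    return "alphafold3_server"         # step (3): too long for ColabFold


examples = [
    Protein("protein_a", 350, True),
    Protein("protein_b", 900, False),
    Protein("protein_c", 5200, False),
]
sources = [choose_structure_source(p) for p in examples]
```

      Keeping the rule in one function makes it easy to record, for each sample, which source produced its structure.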

      All processed structural data can be found at https://huggingface.co/datasets/tyang816/ClinVar_PDB. Our test results are provided in the following table. ProtSSN achieves the top performance over baseline methods.

      Author response table 1.

    1. eLife assessment

      This fundamental work provides creative and thoughtful analysis of rodent foraging behavior and its dependence on body reference frame-based vs world reference frame-based cues. Compelling evidence demonstrates that a robust map, capable of supporting taking novel shortcuts, is learned primarily if not exclusively based on self-motion cues, which has rarely if ever been reported outside of the human literature. The work, which will be of interest to a broad audience of neuroscientists, provides a rich discussion about the role of the hippocampus in supporting the behavior that should guide future neurophysiological investigations.

    1. eLife Assessment

      This fundamental work provides creative and thoughtful analysis of rodent foraging behavior and its dependence on body reference frame-based vs world reference frame-based cues. Compelling evidence demonstrates that a robust map, capable of supporting taking novel shortcuts, can be learned primarily if not exclusively based on self-motion cues, which has rarely if ever been reported outside of the human literature. The work, which will be of interest to a broad audience of neuroscientists, provides a rich discussion about the role of the hippocampus in supporting the behavior that should guide future neurophysiological investigations.

    2. Reviewer #1 (Public review):

      Assessment:

      This fundamental work advances our understanding of navigation and path integration in mammals by using a clever behavioral paradigm. The paper provides compelling evidence that mice are able to create and use a cognitive map to find "short cuts" in an environment, using only the location of rewards relative to the point of entry to the environment and path integration, and need not rely on visual landmarks.

      Summary:

      The authors have designed a novel experimental apparatus called the 'Hidden Food Maze (HFM)' and a beautiful suite of behavioral experiments using this apparatus to investigate the interplay between allothetic and idiothetic cues in navigation. The results presented provide a clear demonstration of the central claim of the paper, namely that mice only need a fixed start location and path integration to develop a cognitive map. The experiments and analyses conducted to test the main claim of the paper -- that the animals have formed a cognitive map -- are conclusive and include many thoughtfully designed control experiments to eliminate alternatives.

      Strengths:

      The 90 degree rotationally symmetric design and use of 4 distal landmarks and 4 quadrants with their corresponding rotationally equivalent locations (REL) lends itself to teasing apart the influence of path integration and landmark-based navigation in a clever way. The authors use a complete set of experiments and associated controls to show that mice can use a start location and path integration to develop a cognitive map and generate shortcut routes to new locations.
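      To make the idea of rotationally equivalent locations concrete, here is a minimal sketch that rotates a target position about the arena center in 90 degree steps. The coordinate frame, the function name, and the example target are assumptions made for illustration; they are not taken from the paper.

```python
def rotationally_equivalent_locations(x, y, cx=0.0, cy=0.0):
    """Return the three positions obtained by rotating (x, y) about the
    arena center (cx, cy) by 90, 180, and 270 degrees counterclockwise."""
    dx, dy = x - cx, y - cy
    rels = []
    for _ in range(3):
        dx, dy = -dy, dx              # one 90 degree counterclockwise rotation
        rels.append((cx + dx, cy + dy))
    return rels


# Hypothetical target one unit to the right of the arena center.
rels = rotationally_equivalent_locations(1.0, 0.0)
```

      In an arena with this symmetry, a mouse that entered from a rotated start position and relied only on path integration from the entrance would be expected to search at the correspondingly rotated location, whereas a mouse relying on distal landmarks would return to the original target.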

      Weaknesses:

      There were no major weaknesses identified that were not addressed during revisions.

    3. Reviewer #3 (Public review):

      Summary:

      How is it that animals find learned food locations in their daily life? Do they use landmarks to home in on these learned locations or do they learn a path based on self-motion (turn left, take ten steps forward, turn right, etc.)? This study carefully examines this question in a well-designed behavioral apparatus. A key finding is that to support the observed behavior in the hidden food arena, mice appear to not use the distal cues that are present in the environment for performing this task. Removal of such cues did not change the learning rate, for example. In a clever analysis of whether the resulting cognitive map based on self-motion cues could allow a mouse to take a shortcut, it was found that indeed it could. The work nicely shows the evolution of the rodent's learning of the task, and the role of active sensing in the targeted reduction of uncertainty of food location proximal to its expected location.

      Strengths:

      A convincing demonstration that mice can synthesize a cognitive map for the finding of a static reward using body frame-based cues. Showing that uncertainty of final target location is resolved by an active sensing process of probing holes proximal to the expected location. Showing that changing the position of entry into the arena rotates the anticipated location of the reward in a manner consistent with failure to use distal cues.

      Weaknesses:

      The Reviewing Editor felt that previously identified weaknesses from Reviewer #3 were adequately addressed in the final manuscript.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      I have added a paragraph that addresses the issue of how landmarks might be used and why they are not. The suggestions made in the "Weaknesses" paragraph were concise and excellent, and I have directly incorporated them into my revised manuscript. This text appears on Page 21 and is shown below. I hope that this is what the editors and reviewers were looking for.

      The requested revision is the second paragraph.

      The first paragraph was not written in response to reviews but was inspired by a recent paper by Madhav et al (2024) - https://doi.org/10.1038/s41593-024-01681-9. I had already requested to add this reference and was encouraged to do so by the Editors. The Madhav et al paper was very surprising in that it showed that path integration is not constant but that its "gain" can be recalibrated by self-motion signals. I wondered whether this unexpected capacity extended to path integration also recalibrating the cognitive map and thereby generating the shortcutting behavior we observe. I suggested that, at an abstract level, this would correspond to "coordinate transformation" of the cognitive map. I realize that this is entirely speculative. If the Editors feel that it does not add much to the manuscript and that the speculation goes too far, I will remove the first paragraph and re-submit.

      Added text. P21 and just before the heading: "Implications for theories of hippocampal representations of spatial maps". There were no other changes made in the paper.

      "Path integration uses self-motion signals to update the animal's estimated location on its internal cognitive map. Path integration gain has been shown to be plastic and regulated by landmarks (52). Remarkably, a recent study has revealed that path integration gain can also be directly recalibrated by self-motion signals alone (53), albeit not as effectively as by landmarks (52, 53). An interesting question for future research is whether self-motion signals can also recalibrate the coordinates of a cognitive map. From this perspective, the Target B to Target A shortcut requires a transformation of the cognitive map coordinates so that the start point is now Target B.

      Extensive research has shown that external cues can control hippocampal neuron place fields (11, 12, 54) and the gain of the path integrator (52), making the failure of mice in our study to use such cues puzzling. The failure to use landmarks may be related to our task being low stakes and our pretraining procedure teaching the mouse that such cues are not necessary. Our results may not generalize to more natural conditions where many reliable prominent cues are available, and where there is urgency to find food or water while avoiding predation (55). Under these more naturalistic conditions the use of distal cues to rapidly find a food reward is more likely to be observed."

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors aimed to elucidate the cytological mechanisms by which conjugated linoleic acids (CLAs) influence intramuscular fat deposition and muscle fiber transformation in pig models. Utilizing single-nucleus RNA sequencing (snRNA-seq), the study explores how CLA supplementation alters cell populations, muscle fiber types, and adipocyte differentiation pathways in pig skeletal muscles.

      Thanks!

      Strengths:

      Innovative approach: The use of snRNA-seq provides a high-resolution insight into the cellular heterogeneity of pig skeletal muscle, enhancing our understanding of the intricate cellular dynamics influenced by nutritional regulation strategy.

      Robust validation: The study utilizes multiple pig models, including Heigai and Laiwu pigs, to validate the differentiation trajectories of adipocytes and the effects of CLA on muscle fiber type transformation. The reproducibility of these findings across different (nutritional vs genetic) models enhances the reliability of the results.

      Advanced data analysis: The integration of pseudotemporal trajectory analysis and cell-cell communication analysis allows for a comprehensive understanding of the functional implications of the cellular changes observed.

      Practical relevance: The findings have significant implications for improving meat quality, which is valuable for both the agricultural and food industry.

      Thanks!

      Weaknesses:

      Model generalizability: While pigs are excellent models for human physiology, the translation of these findings to human health, especially in diverse populations, needs careful consideration.

      Thanks!

      Reviewer #2 (Public Review):

      Summary:

      This study comprehensively presents data from single nuclei sequencing of Heigai pig skeletal muscle in response to conjugated linoleic acid supplementation. The authors identify changes in myofiber type and adipocyte subpopulations induced by linoleic acid at depth previously unobserved. The authors show that linoleic acid supplementation decreased the total myofiber count, specifically reducing type II muscle fiber types (IIB), myotendinous junctions, and neuromuscular junctions, whereas type I muscle fibers are increased. Moreover, the authors identify changes in adipocyte pools, specifically in a population marked by SCD1/DGAT2. To validate the skeletal muscle remodeling in response to linoleic acid supplementation, the authors compare transcriptomics data from Laiwu pigs, a model of high intramuscular fat, to Heigai pigs. The results verify changes in adipocyte subpopulations when pigs have higher intramuscular fat, either genetically or diet-induced. Targeted examination using cell-cell communication network analysis revealed associations with high intramuscular fat with fibro-adipogenic progenitors (FAPs).  The authors then conclude that conjugated linoleic acid induces FAPs towards adipogenic commitment. Specifically, they show that linoleic acid stimulates FAPs to become SCD1/DGAT2+ adipocytes via JNK signaling. The authors conclude that their findings demonstrate the effects of conjugated linoleic acid on skeletal muscle fat formation in pigs, which could serve as a model for studying human skeletal muscle diseases.

      Thanks!

      Strengths:

      The comprehensive data analysis provides information on conjugated linoleic acid effects on pig skeletal muscle and organ function. The notion that linoleic acid induces skeletal muscle composition and fat accumulation is considered a strength and demonstrates the effect of dietary interactions on organ remodeling. This could have implications for the pig farming industry to promote muscle marbling. Additionally, these data may inform the remodeling of human skeletal muscle under dietary behaviors, such as elimination and supplementation diets and chronic overnutrition of nutrient-poor diets. However, the biggest strength resides in thorough data collection at the single nuclei level, which was extrapolated to other types of Chinese pigs.

      Thanks!

      Weaknesses:

      While the authors generated a sizeable comprehensive dataset, cellular and molecular validation needed to be improved. For example, the single nuclei data suggest changes in myofiber type after linoleic acid supplementation, yet these data are not validated by other methodologies. Similarly, the authors suggest that linoleic acid alters adipocyte populations, FAPs, and preadipocytes; however, no cellular and molecular analysis was performed to reveal if these trajectories indeed apply. Attempts to identify JNK signaling pathways appear superficial and do not delve deeper into mechanistic action or transcriptional regulation. Notably, a variety of single cell studies have been performed on mouse/human skeletal muscle and adipose tissues. Yet, the authors need to discuss how the populations they have identified support the existing literature on cell-type populations in skeletal muscle. Moreover, the authors nicely incorporate the two pig models into their results, but the authors only examine one muscle group. It would be interesting to know whether other muscle groups respond similarly or differently to linoleic acid supplementation. Further, it was unclear whether Heigai and Laiwu pigs were both fed conjugated linoleic acid or whether the comparison was between linoleic acid-fed Heigai pigs and Laiwu pigs (as a model of high intramuscular fat). With this in mind, the authors do not discuss how their results could be implicated in human and pig nutrition, such as desirability and cost-effectiveness for pig farmers and human diets high in linoleic acid. Notably, while single nuclei data is comprehensive, there needs to be a statement on data deposition and code availability, allowing others access to these datasets. Moreover, the experimental designs do not denote the conjugated linoleic acid supplementation duration. Several immunostainings performed could be quantified to validate statements. This reviewer also found the Nile Red staining hard to interpret visually, and it did not appear to support the conclusions convincingly. Within Figure 7, several letters (assuming they represent statistical significance) are present on the graphs but are not denoted within the figure legend.

      Thanks for your suggestions! We have revised our manuscript accordingly.

      For changes in myofiber type, we performed qPCR to verify the changes of muscle fiber type related gene expression after CLA treatment (Figure 2E); for changes of adipocyte and preadipocyte populations, we also performed immunofluorescence staining, qPCR, and western blotting in LDM tissues and FAPs to verify the alterations of cell types after feeding with CLA (Figure 3D, 3E, 6G, 7C, and 7D). Hence, we think these cellular and molecular results could support our conclusions.

      We selected the JNK signaling pathway based on the snRNA-seq dataset and verified it with an activator in an in vitro experiment. However, we did not explore the mechanism of action, and the downstream transcriptional regulators remain to be investigated. We have added these points to the discussion part (lines 443-448).

      We have added a comparison between different cell-type populations in skeletal muscles (lines 362-368 and 385-390).

      Changes in myofiber type in Laiwu pigs were discussed in our previous study (Wang et al., 2023). Interestingly, in this study we also found that in high IMF content Laiwu pigs, the percentage of type IIa myofibers tended to increase (29.37% vs. 23.95%) while the percentage of type IIb myofibers tended to decrease (38.56% vs. 43.75%). We have added this point to the discussion part (lines 392-395).

      We have supplied information on the treatment in the materials and methods part (lines 469-478). We also added a discussion of the significance of our study for human and pig nutrition in the discussion part (lines 375-376 and 446-447).

      Our data will be made available on reasonable request (line 574-576).

      We have supplied the CLA supplementation duration in the materials and methods part (line 465).

      Porcine FAPs have few lipid droplets, and we improved the image quality (Figure 7A). In Figure 7, the Nile Red staining could be quantified, and we have included the quantification of Oil Red O staining (Figure 7B and 7J). We also added the statistical significance to the figure legend.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for Improved or Additional Experiments, Data, or Analyses

      Cross-species analysis: To strengthen the generalizability of the results, it would be beneficial to include a comparative analysis with other species, such as human, bovine, or rodent models, using publicly available snRNA-seq datasets.

      Thanks! Our previous study compared the conserved and unique signatures in fatty skeletal muscles between different species (Wang, Zhou, Wang, & Shan, 2024). Here we mainly focused on the mechanism by which CLAs regulate intramuscular fat deposition. However, there is still a gap in the available snRNA-seq or scRNA-seq datasets on the effects of CLAs on fat deposition in muscles of other species, including human, bovine, or rodent models. Hence, we only analyzed the regulatory mechanisms by which CLAs influence intramuscular fat deposition in pigs.

      Functional link: the authors should discuss in the manuscript how the muscles differ in terms of texture, flavor, aroma, etc. before and after CLA administration or between Heigai and Laiwu to provide context and help readers better understand how the observed high-resolution cellular changes relate to these functional properties of meat.

      Thanks! We have added these in the introduction part (line 90-98).

      Improve figures: some figures, particularly those involving Oil Red O and Nile Red, could be improved by including higher magnification images to assess the organization of lipid droplets of individual adipocytes (Figure 7A, I, and K).

      Thanks! Porcine FAPs have few lipid droplets, and we improved the image quality (Figure 7A).

      Reviewer #2 (Recommendations For The Authors):

      All of my comments are above. However, I would recommend improving the writing as several areas throughout the results needed clarity.

      Thanks! We have revised our manuscript carefully after accepting your revisions.

      Wang, L., Zhao, X., Liu, S., You, W., Huang, Y., Zhou, Y., . . . Shan, T. (2023). Single-nucleus and bulk RNA sequencing reveal cellular and transcriptional mechanisms underlying lipid dynamics in high marbled pork. NPJ Sci Food 7: 23. https://doi.org/10.1038/s41538-023-00203-4

      Wang, L., Zhou, Y., Wang, Y., & Shan, T. (2024). Integrative cross-species analysis reveals conserved and unique signatures in fatty skeletal muscles. Sci Data 11: 290. https://doi.org/10.1038/s41597-024-03114-5

    2. eLife Assessment

      This revised study provides valuable information on the single nucleus RNA sequencing transcriptome, pathways, and cell types in pig skeletal muscle in response to conjugated linoleic acid (CLA) supplementation. Based on the comprehensive data analyses, the data are considered compelling and provide new insight into the mechanisms underlying intramuscular fat deposition and muscle fiber remodeling. The revised study clarifies major aspects of its methodology and analysis, addresses previous reviewer concerns, and contributes significantly to the understanding of nutritional strategies for fat infiltration in pig muscle.

    3. Reviewer #1 (Public review):

      In this revised manuscript, the authors aim to elucidate the cytological mechanisms by which conjugated linoleic acids (CLAs) influence intramuscular fat deposition and muscle fiber transformation in pig models. They have utilized single-nucleus RNA sequencing (snRNA-seq) to explore the effects of CLA supplementation on cell populations, muscle fiber types, and adipocyte differentiation pathways in pig skeletal muscles. Notably, the authors have made significant efforts in addressing the previous concerns raised by the reviewers, clarifying key aspects of their methodology and data analysis.

      Strengths:

      (1) Thorough validation of key findings: The authors have addressed the need for further validation by including qPCR, immunofluorescence staining, and western blotting to verify changes in muscle fiber types and adipocyte populations, which strengthens their conclusions.

      (2) Improved figure presentation: The authors have enhanced figure quality, particularly for the Oil Red O and Nile Red staining images, which now better depict the organization of lipid droplets (Figure 7A). Statistical significance markers have also been clarified (Figure 7I and 7K).

      Weaknesses:

      (1) Cross-species analysis and generalizability of the results: Although the authors could not perform a comparative analysis across species due to data limitations, they acknowledged this gap and focused on analyzing regulatory mechanisms specific to pigs. Their explanation is reasonable given the current availability of snRNA-seq datasets on muscle fat deposition in other species such as human and mouse.

      (2) Mechanistic depth in JNK signaling pathway: While the inclusion of additional experiments is a positive step, the exploration of the JNK signaling pathway could still benefit from deeper analysis of downstream transcriptional regulators. The current discussion acknowledges this limitation, but future studies should aim to address this gap fully.

      (3) Limited exploration of other muscle groups: The authors did not expand their analysis to additional muscle groups, leaving some uncertainty regarding whether other muscle groups might respond differently to CLA supplementation. Further studies in this direction could enhance the understanding of muscle fiber dynamics across the organism.

    4. Reviewer #2 (Public review):

      Summary:

      This study comprehensively presents data from single nuclei sequencing of Heigai pig skeletal muscle in response to conjugated linoleic acid supplementation. The authors identify changes in myofiber type and adipocyte subpopulations induced by linoleic acid at depth previously unobserved. The authors show that linoleic acid supplementation decreased the total myofiber count, specifically reducing type II muscle fiber types (IIB), myotendinous junctions, and neuromuscular junctions, whereas type I muscle fibers are increased. Moreover, the authors identify changes in adipocyte pools, specifically in a population marked by SCD1/DGAT2. To validate the skeletal muscle remodeling in response to linoleic acid supplementation, the authors compare transcriptomics data from Laiwu pigs, a model of high intramuscular fat, to Heigai pigs. The results verify changes in adipocyte subpopulations when pigs have higher intramuscular fat, either genetically or diet-induced. Targeted examination using cell-cell communication network analysis revealed associations with high intramuscular fat with fibro-adipogenic progenitors (FAPs).  The authors then conclude that conjugated linoleic acid induces FAPs towards adipogenic commitment. Specifically, they show that linoleic acid stimulates FAPs to become SCD1/DGAT2+ adipocytes via JNK signaling. The authors conclude that their findings demonstrate the effects of conjugated linoleic acid on skeletal muscle fat formation in pigs, which could serve as a model for studying human skeletal muscle diseases.

      Strengths:

      The comprehensive data analysis provides information on conjugated linoleic acid effects on pig skeletal muscle and organ function. The notion that linoleic acid induces skeletal muscle composition and fat accumulation is considered a strength and demonstrates the effect of dietary interactions on organ remodeling. This could have implications for the pig farming industry to promote muscle marbling. Additionally, these data may inform the remodeling of human skeletal muscle under dietary behaviors, such as elimination and supplementation diets and chronic overnutrition of nutrient-poor diets. However, the biggest strength resides in thorough data collection at the single nuclei level, which was extrapolated to other types of Chinese pigs.

      Weaknesses:

      Although the authors compiled a substantial and comprehensive dataset, the scope of cellular and molecular-level validation still needs to be expanded. For instance, the single nuclei data suggest changes in myofiber type after linoleic acid supplementation, but these findings need more thorough validation. Further histological and physiological assessments are necessary to address fiber types and oxidative potential. Similarly, the authors propose that linoleic acid alters adipocyte populations, FAPs, and preadipocytes; however, there are limited cellular and molecular analyses to confirm these findings. The identified JNK signaling pathways require additional follow-ups on the molecular mechanism or transcriptional regulation. However, these issues are discussed as potential areas for future exploration. While various individual studies have been conducted on mouse/human skeletal muscle and adipose tissues, these have only been briefly discussed, and further investigation is warranted. Additionally, the authors incorporate two pig models into their results, but they only examine one muscle group. Exploring whether other muscle groups respond similarly or differently to linoleic acid supplementation would be valuable. Furthermore, the authors should discuss how their results translate to human and pig nutrition, such as the desirability and cost-effectiveness for pig farmers and human diets high in linoleic acid. Notably, while the single nuclei data is comprehensive, there needs to be a statement on data deposition and code availability, allowing others access to these datasets.

    1. eLife Assessment

      This valuable study addresses potential roles of the master regulator of X chromosome inactivation, the Xist long non-coding RNA, in autosomal gene regulation. Using data from mouse cells, the authors propose that Xist can coat specific autosomal promoters, which in turn leads to the attenuation of their transcriptional activity, complementing recently published results from humans. While the evidence from individual genes is suggestive, shortcomings in the data and statistical analyses leave the evidence currently incomplete. The work would be of interest to anyone studying gene regulation and noncoding RNA.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Yao S. and colleagues aims to monitor the potential autosomal regulatory role of the master regulator of X chromosome inactivation, the Xist long non-coding RNA. It has recently become apparent that in the human system, Xist RNA can not only spread in cis on the future inactive X chromosome but also reach some autosomal regions where it recruits transcriptional repression and Polycomb marking. Previous work has also reported that Xist RNA can show a diffused signal in some biological contexts in FISH experiments.

      In this study, the authors investigate whether Xist represses autosomal loci in differentiating female mouse embryonic stem cells (ESCs) and somatic mouse embryonic fibroblasts (MEFs). They perform a time course of ESC differentiation followed by Capture Hybridization of Associated RNA Targets (CHART) on both female and male ESCs, as well as pulldowns with sense oligos for Xist. The authors also examine transcriptional activity through RNA-seq and integrate this data with prior ChIP-seq experiments. Additional experiments were conducted in MEFs and Xist-ΔB repeat mutants, the latter of which fail to recruit Polycomb repressors.

      Based on this experimental design, the authors make several bold claims:

      (1) Xist binds to about a hundred specific autosomal regions.
      (2) This binding is specific to promoter regions rather than broad spreading.
      (3) Xist autosomal signal is inversely correlated with PRC1/2 marks but positively correlated with transcription.
      (4) Xist targeting results in the attenuation of transcription at autosomal regions.
      (5) The B-repeat region is important for autosomal Xist binding and gene repression.
      (6) Xist binding to autosomal regions also occurs in somatic cells but does not lead to gene repression.

      Together, these claims suggest that Xist might play a role in modulating the expression of autosomal genes in specific developmental and cellular contexts in mice.

      Strengths:

      This paper deals with an interesting hypothesis that Xist ncRNA can also function at autosomal loci.

      Weaknesses:

      The claims reported in this paper are largely unsubstantiated by the data, with multiple misinterpretations, lacking controls, and inadequate statistics. Fundamental flaws in the experimental design/analysis preclude the validity of the findings. Major concerns are listed below:

      (1) The entire paper is based on the CHART observation that Xist is specifically targeted to autosomal promoters. Overall, the data analysis is flawed and does not support such conclusions. Importantly the sense WT and the 0h controls are not used, nor are the biological replicates. Data is typically visualized without quantification, and when quantified, control loci/gene sets are erroneously selected. Firstly, CHART validation on the X in FigS1 is misleading and not based on any quantifications (e.g., see the scale on Kdm6a (0-190) compared to Cdkl5 (0-40)). If scaled appropriately, there is Xist signal on the escapee. All X-linked loci should have been quantified and classified based on escape status; sense control should also be quantified, and biological replicates should be shown separately. Secondly, and most importantly, Figure 1 does not convincingly show specific Xist autosomal binding. Panel A quantification is on extremely variable y-scales and actually shows that Xist is recruited globally to nearly all autosomal genes, likely indicating an unspecific signal. Again, the sense and 0h controls should have been quantified along with biological replicates. Upon inspecting genome browser tracks of all regions reported in the manuscript (Rbm14, Srp9, Brf1, Cand2, Thra, Kmt2c, Kmt2e, Stau2, and Bcl7b), the signal is unspecific on all sites with the possible exception of Kmt2e. On all other loci, there is either a strong signal in the 0h ESC controls or more signal in some of the sense controls. This implies that peak calling is picking up false positive regions. How many peaks would have been picked up if the sense or the 0h controls were used for peak calling? It is likely that there would be a lot since there are also possible "peaks" (e.g., Fzd9) in control tracks. Further inspection of the data was not possible as the authors did not provide access to the raw fastq files. 
      When inspecting results from past published experiments (Engreitz et al., 2013), the reported regions were not bound by Xist. Thirdly, contrary to the authors' claim, deleting the B repeat does not lead to a loss of autosomal signal. Indeed, comparing Fig1A and Fig2B side by side clearly shows no difference in the autosomal signal, likely because the autosomal signal is CHART background. Properly quantifying the signal with separate replicates as well as the sense and 0h controls is vital. Overall, current data together with published results indicate that CHART peak calling on autosomes is due to technical noise or artefacts.

      (2) The RNA-seq analysis is also flawed and precludes strong statements. Firstly, the analysis frequently lacks statistical analysis (Fig3B, FigS2B-C) and is often based on visualizations (Fig 3D-G) without quantifications. Day 4 B-repeat deletion does not lead to a significant change in the expression of genes close to Xist signal (Fig3H, d14 does not fully show). Secondly, for all transcriptional analysis, it is important to show autosomal non-target genes, which is not always done. Indeed, both males and B repeat deletion will lead to transcriptional changes on autosomes as a secondary effect from different X inactivation status. The control set, if used, is inappropriate as it compares one randomly selected set of ~100 genes. This introduces sampling error and compares different classes of genes. Since Xist signal targets more active genes, it is important to always compare autosomal target genes to all other autosomal genes with similar basal expression patterns.

      (3) The ChIP-seq analysis also has some problems. The authors claim that there is no positive correlation between genes close to Xist autosomal binding (10kb) compared to those 50kb away (Fig 3C, S2D); however, this analysis is based entirely on metagene visualization. Signal within the Xist binding sites should be quantified (not genes close by) and compared to other types of genomic loci and promoters. Focusing on the 50kb group only as controls is misleading. Secondly, the authors only look at PRC mark signal upon differentiation; what about the 0h timepoint, i.e., is there pre-marking? Most worryingly, the data analysis is not consistent between figures (see Fig3C vs 5H-I). In Fig5, the group of Xist targets was chosen as those within 100kb of Xist binding, which would encompass all the control regions from Fig3C. In this analysis, the authors report that there is Xist-dependent H3K27me3 deposition, and in fact, here the Xist autosomal targets have more of it than the controls. Overall, all of this analysis is misleading, and clear conclusions cannot be made.

      All in all, because the fundamental observation is not robust (see point 1), all subsequent analyses are also affected. There are also multiple other inconsistencies within the analysis; however, they have not been included here for brevity.

    3. Reviewer #2 (Public review):

      Summary:

      To follow up on recent reports of Xist-autosome interaction, the authors examine female (and male transgenic) mESCs and MEFs by CHARTseq. Upon finding that only 10% of reads map to X, they sought to identify reproducible alternative sites of Xist binding; they identify ~100 autosomal Xist-binding sites and show a transient impact on expression.

      Strengths:

      The authors address a topical and interesting question with a series of models including developmental timepoints and utilize unbiased approaches (CHARTseq, RNAseq). For the CHARTseq they have controls of both sense probes and male cells; and indeed do detect considerable background with their controls. The use of deletions emphasizes that intact functional Xist is involved. The use of 'metagene' plots provides a visual summation of genic impact.

      Weaknesses:

      Overall, the result presentation has many 'sample' gene presentations (in contrast to the stronger 'metagene' summation of all genes). The manuscript often relies on discussion of prior X chromosomal studies, while the data generated would allow assessment of the X within this study to confirm concordance with prior results using the current methodology/cell lines. Many of the 'follow-up' analyses are in fact reprocessing and comparison of published datasets. The figure legends are limited, and sample size and/or source of control is not always clear. While similar numbers of autosomal Xist-binding sites were often observed, the presented data did not clarify how many were consistent across time-points/cell types. While there were multiple time points/lines assessed, only 2 replicates were generally done.

      Aim achievement:

      The authors do identify autosomal sites with enrichment of chromatin marks and evidence of silencing. More details regarding sample size and controls (both treatment, and most importantly choice of 'non-targets' - discussed in comments to authors) are required to determine if the results support the conclusions.

      Specific scenarios for which I am concerned about the strength of evidence underlying the conclusion:

      I found the conclusion "Thus, RepB is required not only for Xist to localize to the X-chromosome but also for its localization to the ~100 autosomal genes" (p5) in contrast to the statement 2 lines prior: "A similar number of Xist peaks across autosomes in ΔRepB cells was observed and the autosomal targets remained similar". Some quantitative statistics would assist in determining impact, both on autosomes and also X; perhaps similar to the quintile analysis done for expression.

      It is stated that there is a significant suppression of X-linked genes with the autosomal transgenes; however, only an example is shown in Figure 4B. To support this statement, a full X chromosomal geneset should be shown in panels F and G, which should also list the number of replicates. As these are hybrid cells, perhaps allelic suppression could be monitored? Is Med14 usually subject to X inactivation in the Ctrl cells, and is the expression reduced from both X chromosomes or preferentially the active (or inactive) X chromosome?

      The expression change for autosomes after transgene induction is barely significant; and it was not clear what was used as the Ctrl? This is a critical comparator as doxycycline alone can change expression patterns.

      In the discussion there is the statement: "Genetic analysis coupled to transcriptomic analysis showed that Xist down-regulates the target autosomal genes without silencing them. This effect leads to clear sex difference - where female cells express the ~100 or so autosomal genes at a lower level than male cells (Figure 7H)." This sweeping statement fails to include that in MEFs there is no significant expression difference, in transgenics only borderline significance, and at d14 no significant expression difference. The down-regulation overall seems to be transient during development while targeting is ongoing?

      Finally, I would have liked to see discussion of the consistency of the identified genes to support the conclusion that the autosomal sites are not merely the results of Xist diffusion.

      The impact of Xist on autosomes is important for consideration of impact of changes in Xist expression with disease (notably cancers). Knowing the targets (if consistent) would enable assessment of such impact.

    4. Reviewer #3 (Public review):

      Summary:

      Yao et al use CHART to identify chromatin associated with Xist in female mouse ESCs, and, as control, male ESCs at various timepoints of differentiation. Besides binding of Xist to X chromosome regions they found significant binding to autosomes, concentrating mostly on promoter regions of around 100 autosomal genes, as elucidated by MACS. The authors went on to show that the RepB repeat is mostly responsible for these autosomal interactions using a female ESC line in which RepB is deleted. Evidence is provided that Xist interacts with active autosomal genes containing lower coverage of repressive marks H3K27me3 and H2AK119ub and that RepB dependent Xist binding leads to dampening of expression, but not silencing of autosomal genes. These results were confirmed by overexpression studies using transgenic ESCs with doxycycline-inducible Xist as well as via a small molecule inhibitor of Xist (X1), inducing/inhibiting the dampening of autosomal genes, respectively. Finally, using MEFs and Xist mutants RepB or RepE the authors provide evidence that Xist is bound to autosomal genes in cells after the XCI process but appears not to affect gene expression. The data presented appear generally clear and consistent and indicate some differences between human and mouse autosomal regulation by Xist.

      Strengths:

      Regulation of autosomal gene expression by Xist is a "big deal" as misregulation of this lncRNA causes developmental defects and human disease. Moreover, this finding may explain sex-specific developmental differences between the sexes. The results in this manuscript identify specific mouse autosomal genes bound by Xist and decipher critical Xist regions that mediate this binding and gene dampening. The methods used in this study are appropriate, and the overall data presented appear convincing and are consistent, indicating some differences between human and mouse autosomal regulation by Xist.

      Weaknesses:

      (1) The figure legends and/or descriptions of data are often very short lacking detail, and this unnecessarily impedes the reading of the manuscript, in particular the figures would benefit not only from more detailed descriptions/explanations of what has been done but also what is shown. This will facilitate the reading and overall comprehension by the reader. One out of many examples: In Fig S1B in the CHART data at d4 and d7 there is not only signal in female WT Xist antisense but also in female sense control. For a reader that is not an expert in XCI it would be helpful to point out in the legend that this signal corresponds to the lncRNA Tsix (I suppose), that is transcribed on the other strand.

      (2) Different scales are used in the lower panels of Figures 1A and 2A, which makes it difficult to directly compare signals between the different differentiation stages.

      (3) In this study some of the findings on mouse cells contrast previously published results in human ESCs: 1) Xist binding occurs preferentially to promoters in mice, not in human. 2) Binding of Xist is mostly detected in polycomb-depleted regions in mice but there is a positive correlation between Xist RNA and PRC2 marks in human ESCs. These differences are surprising but may be very interesting and relevant. While I am aware that this might be a difficult task, it would be helpful to experimentally address this issue in order to distinguish whether species specific and/or methodological differences between the studies are responsible for these differences.

    1. Joint Public Review:

      Summary:

      The authors present an intriguing investigation into the pathogenesis of Pol III variants associated with neurodegeneration. They established an inducible mouse model to overcome developmental lethality, administering 5 doses of tamoxifen to initiate the knock-in of the mutant allele. Subsequent behavioral assessments and histological analyses revealed potential neurological deficits. Robust analyses of the tRNA transcriptome, conducted via northern blotting and RNA sequencing, suggested a selective deleterious effect of the variant on the cerebrum, in contrast to the cerebellum and non-cerebral tissues. Through this work, the authors identified molecular changes caused by Pol III mutations, particularly in the tRNA transcriptome, and demonstrated its relative progression and selectivity in brain tissue. Overall, this study provides valuable insights into the neurological manifestations of certain genetic disorders and sheds light on transcripts/products that are constitutively expressed in various tissues.

      Strengths:

      The authors utilize an innovative mouse model to constitutively knock in the gene, enhancing the study's robustness. Behavioral data collection using a spectrometer reduces experimenter bias and effectively complements the neurological disorder manifestations. Transcriptome analyses are extensive and informative, covering various tissue types and identifying stress response elements and mitochondrial transcriptome patterns. Additionally, metabolic studies involving pancreatic activity and glucose consumption were conducted to eliminate potential glucose dysfunction, strengthening the histological analyses.

      Comments on revised version from expert Editor #1:

      The authors in the revised manuscript have effectively responded to all of the comments and suggestions raised by both reviewers. Overall, I find the revised version to be an important contribution to the field and the strength of evidence supporting the work's claims to be compelling.

      Comments on revised version from expert Editor #2:

      The authors have responded constructively to all the comments in the first round of reviews and clarified many issues in the manuscript. The current report represents a significant advance.

      Comments on revised version from Reviewer #2:

      The authors should include their clarifications of all concerns raised by reviewer #2 (mentioned in the previous weaknesses) in the main text. They should consider including point #2 to point #10 in the main text (discussion section). They should highlight the limitations of this study in the discussion.

      Also, they should clearly state that deciphering brain area specific behavioural deficits is beyond the scope of the manuscript with appropriate justification mentioned in the rebuttal letter.

      I still do not agree with the authors' statement of "brain region-specific sensitivities to a defect in Pol III transcription". The changes are global and also not restricted to the brain. The authors may consider restating this sentence. It is obvious that transcription defects related to tRNA production will lead to alterations in whole-body physiology.

    2. eLife Assessment

      This study provides important insights into the mechanistic basis of neurological manifestations of RNA polymerase III-related disease by creating a mutant mouse to dissect transcriptional changes. The data provide compelling evidence for disease progression initiated by a global reduction in tRNA levels leading to integrated stress and innate immune responses and neuronal loss. The work will be of interest to those engaged in the study of chromosome biology, developmental biology and neurodegeneration.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Moir, Merheb et al. present an intriguing investigation into the pathogenesis of Pol III variants associated with neurodegeneration. They established an inducible mouse model to overcome developmental lethality, administering 5 doses of tamoxifen to initiate the knock-in of the mutant allele. Subsequent behavioral assessments and histological analyses revealed potential neurological deficits. Robust analyses of the tRNA transcriptome, conducted via northern blotting and RNA sequencing, suggested a selective deleterious effect of the variant on the cerebrum, in contrast to the cerebellum and non-cerebral tissues. Through this work, the authors identified molecular changes caused by Pol III mutations, particularly in the tRNA transcriptome, and demonstrated its relative progression and selectivity in brain tissue. Overall, this study provides valuable insights into the neurological manifestations of certain genetic disorders and sheds light on transcripts/products that are constitutively expressed in various tissues.

      Strengths:

      The authors utilize an innovative mouse model to constitutively knock in the gene, enhancing the study's robustness. Behavioral data collection using a spectrometer reduces experimenter bias and effectively complements the neurological disorder manifestations. Transcriptome analyses are extensive and informative, covering various tissue types and identifying stress response elements and mitochondrial transcriptome patterns. Additionally, metabolic studies involving pancreatic activity and glucose consumption were conducted to eliminate potential glucose dysfunction, strengthening the histological analyses.

      Weaknesses:

      The study could have explored identifying the extent of changes in the tRNA transcriptome among different cell types in the cerebrum. Although the authors attempted to show the temporal progression of tRNA transcriptome changes between P42 and P75 mice, the causal link was not established. A subsequent rescue experiment in the future could address this gap.

      Nonetheless, the claims and conclusions are supported by the presented data.

      We thank Reviewer 1 for their thoughtful review and commentary. We appreciate the reviewer’s finding that our “claims and conclusions are supported by the presented data.”

      We note that our findings on the temporal progression of transcriptional changes between P42 and P75 apply to both the Pol II and Pol III transcriptomes. Importantly, in the case of Pol III, only precursor and mature tRNAs are affected at P42 whereas at P75, numerous other Pol III transcripts are also changed. We therefore attribute the changes in tRNA as being causal in disease initiation since this is the earliest direct consequence of the Polr3a mutation.

      To expand on the evidence demonstrating the progressive nature of Polr3-related disease in our mouse model, the revised manuscript includes new immunofluorescence data showing no change in microglial cell density in the cerebral cortex or the striatum at an early stage in the disease (Supplementary Fig. S6F, G).  This is in striking contrast to the findings at later times (P75) where the number of microglia increased significantly in the Polr3a mutant and exhibit an activated morphology (Fig. 4G,H).   

      We agree with the reviewer that it will be interesting in the future to assess the impact of the Polr3a mutation in different neural cell types and to explore opportunities for suppressing disease phenotypes. 

      Reviewer #2 (Public Review):

      Summary:

      The study "Molecular basis of neurodegeneration in a mouse model of Polr3 related disease" by Moir et al. showed how RNA Pol III mutation affects production, maturation and transport of tRNAs. Furthermore, their study suggested that RNA Pol III mutation leads to behavioural deficits that are commonly observed in neurodegeneration. Although this study used a mouse model to establish these aspects, the study seems to lack a clear direction and mechanism as to how the altered level of tRNA affects locomotor behaviour. They should have used a conditional mouse to delete the gene in a specific brain area to test their hypothesis. Otherwise, this study shows a more generalized developmental effect rather than a specific function of altered tRNA level. This is very evident from their bulk RNA sequencing study. This study provides some discrete information rather than a coherent story. My enthusiasm for publication of this article in eLife is dampened considering the following reasons mentioned in the weaknesses.

      Reviewer 2’s summary contains two misstatements: 

      Moir et al. showed how RNA Pol III mutation affects production, maturation and transport of tRNAs.

      Our experiments document the effect of a neurodegenerative disease-causing mutation in RNA polymerase III on the Pol III transcriptome with a particular focus on the tRNAome (i.e. the mature tRNA population). Experiments on the maturation and transport of tRNA were not performed as there was no indication that these processes might be negatively impacted at the earliest time point (P42). Additional comments about tRNA maturation and export are provided under points 8 and 9 (see below). 

      The study seems to lack a clear direction and mechanism as to how the altered level of tRNA affects locomotor behaviour.

      This comment misstates the purpose of our study while overlooking the important results. As stated in the abstract, our goal was to develop “a postnatal whole-body mouse model expressing pathogenic Polr3a mutations to examine the molecular mechanisms by which reduced Pol III transcription results primarily in central nervous system phenotypes.”

      Accordingly, our work provides the first molecular analysis of RNA polymerase III transcription in an animal model of Polr3-related disease. The novelty and importance of the findings, as stated in the abstract, include the discovery that a global reduction in tRNA levels (and not other Pol III transcripts) at an early stage in the disease precedes the frank induction of integrated stress and innate immune responses, activation of microglia and neuronal loss at later times. These later events readily account for the observed neurobehavioral deficits that collectively include risk assessment, locomotor, exploratory and grooming behaviors. 

      Strengths:

      The study created a mouse model to investigate the role of RNA Pol III transcription. Furthermore, the study provided RNA-seq analysis of the mutant mice and highlighted the expression of specific transcripts affected by the RNA Pol III mutation.

      Weaknesses:

      (1) The abstract is not clearly written. It is hard to interpret what is the objective of the study and why they are important to investigate. For example: "The molecular basis of disease pathogenesis is unknown." Which disease? 4H leukodystrophy? All neurodegenerative disease?

      We have modified the abstract to more clearly frame the objective of the study and its importance as reflected in the title “Molecular basis of neurodegeneration in a mouse model of Polr3-related disease”. We hope the reviewer will agree that the fourth sentence of the abstract, unchanged from the initial submission, clearly outlines the objective of the study.  

      (2) How are cerebral pathology and exocrine pancreatic atrophy related? How do altered tRNA levels connect these two axes?

      It is not known how cerebral pathology and exocrine pancreatic atrophy are related beyond their shared Pol III dysfunction in our mouse model of Polr3-related disease. We anticipate that altered tRNA levels connect these two axes. Indeed, the pancreas and the brain are both known to be highly sensitive to perturbations affecting translation (Costa-Mattioli and Walter, 2020 Science doi: 10.1126/science.aat5314). Changes to the tRNA population in the cerebrum and cerebellum of Polr3a mutant mice were extensively documented in the manuscript (e.g. Figs. 3, 5 and 6).  We also found reduced tRNA levels in the pancreas of the mutant mice but did not report these findings due to the absence of a stable reference transcript in total RNA from the atrophied pancreatic tissue, even at the earliest time point examined (P42). 

      (3) The authors mentioned that the previously observed reduction in mature tRNA levels was also recapitulated in their study. Why is this study novel, then?

      Our study reports the novel finding that a pathogenic Polr3a mutation causes a global reduction in the steady state levels of mature tRNAs, i.e. the levels of all tRNA decoders were reduced with the vast majority of these reaching statistical significance (Fig. 6D and 6F). In the introduction we refer to several studies that examined the effect of pathogenic Polr3 mutations on the levels of Pol III-derived transcripts. We noted that these studies examined only a small number of Pol III transcripts in CRISPR-Cas9 engineered cell lines, patient-derived fibroblasts and patient blood. Thus, no study until now has tested for or reported a global defect in the abundance of mature tRNAs in any model of Polr3-related disease. Moreover, no previous study of Polr3-related disease has analyzed Pol III transcript levels in the brain or in any other tissue.

      (4) It is very intuitive that deficit in Pol III transcription would severely affect protein synthesis in all brain areas as well as other organs. Hence, growth defect observed in Polr3a mutant mice is not very specific rather a general phenomenon.

      While we agree with the simple assumption that a “deficit in Pol III transcription likely would affect protein synthesis in all brain areas as well as other organs”, this turned out not to be the case. In fact, a novel finding of our study is that not all Polr3a mutant tissues show a translation stress response despite reduced Pol III transcription and reduced mature tRNA levels. This implies that in some tissues the reduction in tRNA levels caused by the Polr3a mutation is not sufficient to affect protein synthesis, at least to a point where the Integrated Stress Response is induced. The underlying basis for the growth deficit has not been defined in this work. However, we noted in the discussion that a growth defect was previously seen in mice where expression of the Polr3a mutation was restricted to the Olig2 lineage.  In the present postnatal whole-body inducible model, we anticipate that the diminished growth of the mice results from a combination of hormonal and nutritional deficits caused by cerebral and pancreatic dysfunction.

      (5) Authors observed specific myelination defect in cortex and hippocampus but not in cerebellum. This is an interesting observation. It is important to find the link between tRNA removal and myelin depletion in hippocampus or cortex? Why is myelination not affected in cerebellum?

      We agree that the specific myelin defect observed in the cortex and hippocampus, but not the cerebellum, is an interesting observation. Pol III dysfunction in this model and reduced tRNA levels are common to both cerebra and cerebella, yet the pathological consequences differ between these regions.  While we do not know why this is the case, the cells that oligodendrocytes support in these regions are functionally different. We suggest in the discussion that subtle defects in oligodendrocyte function in the cerebellum may be uncovered using more sensitive or specific assays than the ones we have employed to date.  In addition, consistent with our findings in other tissues where Pol III transcription and tRNA levels are reduced but phenotypes are lacking, we suggest that oligodendrocytes in the cerebellum may have a different minimum threshold for Pol III activity than in other regions of the brain. 

      (6) How was the locomotor activity measured? The detailed description is missing. Also, locomotion is primarily cerebellum dependent. There is no change in term of growth rate and myelination in cerebellar neurons. I do not understand why locomotor activity was measured.

      We used a behavioral spectrometer with video tracking and pattern-recognition software to quantify ~20 home cage-like behaviors, including locomotor activity, as part of our phenotypic characterization of the mice. This experimenter-unbiased approach reported several metrics of locomotion, specifically, total Track length (the total distance traveled in the instrument), Center Track length and the time spent running (Run Sum) and standing still (Still Sum) in a longitudinal study (Figs. 2A-C and Supplemental Fig. S3A-C). The Materials and Methods section on mouse behavior has been amended to provide a detailed description of these experiments. 

      locomotion is primarily cerebellum dependent

      While we agree that the cerebellum plays a critical role in balance and locomotion, regions of the cerebrum that are affected in our mice, including the primary motor cortex and the basal ganglia (Fig. 4), also have important roles in locomotor activity and control. 

      (7) The correlation between behavioural changes and RNA-seq data is missing. A number of transcripts are affected, and most are very general factors for cellular metabolism. Most of them are RNA Pol II transcribed. How does a Pol III mutation influence RNA Pol II driven transcription? I did not find differential expression of any specific transcripts associated with behavioural changes. What is the motivation for transcriptomics analysis? None of these transcripts are very specific for myelination. It is rather a general cellular metabolism effect that indirectly influences myelination.

      The differentially expressed mRNAs identified in our RNAseq analysis at P75 reflect both direct and secondary consequences of dysfunctional Pol III transcription on Pol II transcription. These effects can be achieved by multiple mechanisms. Induction of the Integrated Stress Response (ISR) due to insufficient tRNA can be considered a direct consequence of diminished Pol III transcription on Pol II transcription. An example of a secondary response is the activation of microglia and the innate immune response (which is known to accompany prolonged activation of the ISR), and the loss of neurons and oligodendrocytes. These changes are documented in Figs. 3 and 4. Importantly, loss of neurons, activated microglia and reduced oligodendrocyte numbers are each readily reconciled with changes in behavior.  

      None of these transcripts are very specific for myelination 

      The RNAseq data at P75 indicate only a modest reduction in oligodendrocyte-specific gene expression (as defined by single-cell RNAseq studies of purified cell populations, Mackenzie et al., 2018 Sci. Rep. doi: 10.1038/s41598-018-27293-5). Despite this, some oligodendrocyte-specific transcripts with well-known roles in myelination were down-regulated in the Polr3a mutant (e.g. Plp1, Mog and Mobp). In addition, steroid synthesis pathway transcripts involved in the production of cholesterol, an abundant and essential component of myelin, were also down-regulated (Supplementary Fig. S4E).

      (8) What genes identified by transcriptomics analysis regulates maturation of tRNA? Authors should at least perform RNAi study to identify possible factor and analyze their importance in maturation of tRNA.

      Of the many proteins involved in the maturation of tRNA (Phizicky and Hopper, 2023 RNA doi: 10.1261/rna.079620.123), RNAseq analysis at P75 identified only amino-acyl tRNA synthetases as being differentially-expressed (fold change >1.5, p adj. < 0.05, Table S1). These genes are canonical indicators of the ATF4-dependent Integrated Stress Response and their upregulation is widely interpreted as an attempt to restore efficient translation. In addition, our analysis of Pol III transcripts at P75 identified a reduction in the level of RppH1 (Fig. 3C), the RNA component of RNase P, which removes the 5’ leader of precursor tRNAs.  However, at P42, there was no effect on RppH1 abundance, or the expression of amino-acyl tRNA synthetase genes (Fig. 5C and Table S3).  Thus, an RNAi study to identify and analyze a possible factor involved in the maturation of tRNA is neither warranted nor relevant to the current body of work.
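
      The differential-expression criteria quoted above (fold change >1.5, p adj. < 0.05) can be illustrated with a minimal sketch. The gene names and values below are hypothetical placeholders, not data from Table S1, and treating down-regulation symmetrically (fold change below 1/1.5) is an assumption that the response does not state explicitly.

```python
import math

def is_differentially_expressed(fold_change, p_adj, fc_cutoff=1.5, p_cutoff=0.05):
    """Return True when a gene passes both cutoffs.

    fold_change is on a linear scale (mutant / wild type). Working in log2
    space makes the cutoff symmetric, so 1.5-fold up and 1.5-fold down are
    treated alike (an assumption of this sketch).
    """
    if fold_change <= 0:
        raise ValueError("fold change must be positive on a linear scale")
    passes_fold_change = abs(math.log2(fold_change)) > math.log2(fc_cutoff)
    return passes_fold_change and p_adj < p_cutoff

# Hypothetical rows: (gene name, fold change, adjusted p-value)
genes = [
    ("gene_A", 2.10, 0.001),   # up-regulated and significant
    ("gene_B", 1.20, 0.0001),  # significant but below the fold-change cutoff
    ("gene_C", 0.50, 0.03),    # two-fold down-regulated and significant
    ("gene_D", 3.00, 0.20),    # large change but not significant
]

hits = [name for name, fc, p in genes if is_differentially_expressed(fc, p)]
print(hits)
```

      Both conditions must hold at once, which is why a gene with a large fold change but a weak adjusted p-value is excluded.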

      (9) What factors are influencing tRNA transport to cytoplasm? It may be possible that Polr3a mutation affect cytoplasmic transport of tRNA. Authors should study this aspect using an imaging experiment.

      Our analysis of tRNA populations in this study employed total cellular RNA and thus reflects the abundance of mature tRNA from all cellular compartments. We have not assessed whether the reduction in tRNA abundance caused by the Polr3a mutation alters the dynamics of tRNA transport from the nucleus to the cytoplasm. However, we consider it highly unlikely that the Polr3a mutation would have a significant effect on cytoplasmic transport of tRNA. Imaging experiments along these lines are beyond the scope of the current study.

      (10) Does alteration of the cytoplasmic level of tRNA affect translation? Authors should perform a translation assay using bio-orthogonal amino acid (AHA) labelling.

      It is not known whether the reduced tRNA levels affect translation globally in the Polr3a mutant, but we predict that this may not be the case. Since tissues (heart and kidney) and brain regions (cerebrum and cerebellum) that share a decrease in tRNA abundance do not share activation of the Integrated Stress Response (a reporter of aberrant translation), we anticipate that effects on translation may be limited to specific regions or cell populations and to specific mRNAs within these cells. The current study provides the foundation for further work to address these questions.

      Reviewer #1 (Recommendations For The Authors):

      Below are a few comments, mostly regarding typographical errors, presentation, and clarity, that we believe would enhance this manuscript:

      On the heatmaps generated, it would be ideal to place "WT" before "KI," with "WT" on the left. This will maintain consistency with the rest of the manuscript, where "WT" conditions precede "KI" conditions, as observed in the bar graphs and dot plots.

      All heatmaps have been remade with WT on the left and KI on the right to maintain consistency throughout the manuscript. 

      Authors mentioned in several instances (Discussion Pg 19 Line 2, for instance) the analysis of changes in the "Pol II transcriptome." Is this a typographical error?

      The reference to the Pol II transcriptome is not a typographical error (Discussion Pg 19 Line 2). Here and elsewhere in the manuscript, we are distinguishing between changes to the Pol III transcriptome and the timing of subsequent changes to the Pol II transcriptome. The text has been edited to clarify this relationship in several places.

      (1) Introduction, Page 4, last paragraph.

      Analysis of the Pol III transcriptome reveals a common decrease in pre-tRNA and mature tRNA populations and few if any changes among other Pol III transcripts across multiple tissues. Analysis of the Pol II transcriptome reveals activation of the integrated stress response in cerebra but not in other surveyed tissues.

      (2) Results, page 8, 2nd paragraph

      To investigate the molecular changes to Pol III transcript levels caused by the Polr3a mutation and any secondary effects on the Pol II transcriptome, we initially focused on the cerebra of adult mice at P75.

      (3) Discussion, Page 19, second paragraph

      Pol III dysfunction and the reduction in the cerebral tRNA population at P42 coincides with behavioral deficits and precedes substantial downstream alterations in the Pol II transcriptome, which include induction of an innate immune response (IR) and an ISR, and indicators of neurodegeneration (i.e., activation of cell death pathways and loss of mitochondrial DNA). These findings suggest a causal role for the lower tRNA abundance and/or altered tRNA profile in disease progression.

      In supplementary figure 1, authors validated the expression of their systems using flow cytometry and observed a high level of recombination frequency in different tissue types. Can the flow cytometry data distinguish between cell types within the cerebrum (neurons/microglia/astrocytes)?

      The flow cytometry experiments reported in Supplementary Fig. S1 used a dual tdTomato-EGFP reporter to assess recombination. The cerebral and cerebellar samples were gated on fluorescence from endogenous expression of tdTomato (red), EGFP (green) and DAPI (blue) staining. In principle, flow cytometry could be used to distinguish between cell types within the cerebrum (neurons/microglia/astrocytes). However, this would require (i) an antibody to a cell surface marker on the cell type of interest and (ii) a fluorescent probe conjugated to the primary antibody or a fluorescent secondary antibody that is spectrally well resolved from the emission spectra of tdTomato, EGFP and DAPI.

      Results section 1: Is there any particular reason why P28 was chosen as the commencement of tamoxifen injection?

      P28 was chosen so that any effect of the Polr3a mutation on development and differentiation would be limited in the tissues we examined. 

      Fig 1C: The number of asterisks does not match between the graph and the figure legend.

      Fig. 1C has been corrected to match the number of asterisks in the graph and figure legend.

      Results section 3:

      This section seemed a little brief, especially when compared to the depth of the succeeding sections. Authors can state in greater detail which behaviors were quantified. In S3A-C, my understanding is that the animals were placed in an open-field test. This procedure can be briefly mentioned in the methods, as well as in the main manuscript text.

      In the legends of S3, a bracket is missing for "(D-F)" on line 5. Additionally, the alignment of legends for each bar graph could be consistent for all graphs except under the condition of spatial constraint.

      Detailed methods pertaining to the measurement and calculation of home cage-like behaviors reported by the behavioral spectrometer have been added to the Methods section on Mouse Behavior. 

      In the Results, Figs. S3A-C show anxiety-like behaviors, which measure the number and duration of visits and the distance traveled in a 15 cm² central area of the arena. Figs. 2A-C show locomotor behaviors including Track length, Run Sum and Still Sum. The open field-like behavior is reported as total Track length in the behavioral spectrometer, i.e. the total distance traveled in the arena. This is now more clearly described in the main manuscript and the Methods section. The revised text reads: “overall locomotor activity was decreased in Polr3a-tamKI mice as indicated by the reduced track length at P42, P49, P56 and P63 (Fig. 2A).”

      The legend of S3 now has the missing bracket "(D-F)" on line 5.

      The legends within each bar graph are now consistent and aligned as much as spatial constraints allow.

      Results section 4:

      Similar to our earlier questions for S1, is it possible to distinguish samples derived from different cell types (neurons/glia)? In figure 4, this is mainly done post-hoc, based on the known gene expression. Maybe the authors could discuss this small limitation? In Fig S4C, the color contrast for the heatmap legend needs to be corrected.

      It is not possible to accurately distinguish different neural cell sub-types, such as different types of neurons or different types of oligodendrocytes, in bulk RNAseq. Hence, we have reported only high-confidence correlations based on known gene expression signatures (Fig. 4). We discuss only the data for which we can draw confident conclusions. The heatmap and legend in Fig. S4C have been amended.

      Results section 5:

      In figure S5A, the alignment of asterisk significance markers could be adjusted.

      Asterisks have been realigned in Fig. S5A.

      Reviewer #2 (Recommendations For The Authors):

      Methods Section should include detailed procedure.

      A detailed description of the methods pertaining to the measurement and calculation of behaviors using the behavioral spectrometer has been added to the Methods section.

      Statistical tests should have detailed information

      Statistical tests are detailed in the Methods section “Statistical Analysis”. Additional details pertaining to calculations of behavioral data have been added to the “Mouse behavior” section of the Methods.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors): 

      Figures 1 and 2. How do the authors know that the lysine mutations are specific to constitutive activity and not because it is causing the channel to be now voltage sensitive? 

      As shown in the revised Figs. 1b, S2a, and 3b, TMEM16F I521K/M522K, TMEM16F I521E, and TMEM16A I546K/I547K spontaneously expose PS, respectively. Neither membrane depolarization nor calcium stimulation was introduced under these conditions, and the cells were grown in calcium-free media after transfection to limit calcium-dependent activation. Our new experiments further demonstrate that TMEM16F T526K (Fig. 1b) and TMEM16A E551K (Fig. 3b), which are further away from the activation gate, either exhibit strongly attenuated spontaneous lipid scrambling activity or lack it entirely. According to these results, the gain-of-function mutants (TMEM16F I521K/M522K and I521E, and TMEM16A I546K/I547K) are indeed constitutively active. This constitutive scramblase activity is not due to a gain of voltage sensitivity, as ion channel activity is also minimal around the resting membrane potential of a HEK cell (Fig. 1d, e and Fig. 3d, e).

      The authors see very large currents of 5-10 nA in their electrophysiology experiments in Figures 2D and 3D. I understand that Figure 2D are whole-cell recordings, but are the authors confident that the currents that they are recording from the mutants are indeed specific to TMEM16A? More importantly, in Figure 3D they see 3-5 nA currents in inside-out patches, which is huge. They have no added divalent in their bath solution, which could lead to larger single-channel amplitudes, but 3-5 nA seems excessive. Some control to demonstrate that these are indeed OSCA1.2 currents is important.

      TMEM16A and TMEM16F are well-known for their high cell surface expression. Therefore, the current amplitude is usually huge even in excised inside-out or outside-out patches—please see our previous publications for details: 1) 10.1016/j.cell.2012.07.036, 2) 10.7554/eLife.02772, 3) 10.1038/s41467-019-11784-8, 4) 10.1038/s41467-019-09778-7, 5) 10.1016/j.celrep.2020.108570, 6) 10.1085/jgp.202012704, and 7) 10.1085/jgp.202313460. 

      HEK293 cells do not have endogenous TMEM16A (https://doi.org/10.1038/nature07313, 10.1016/j.cell.2008.09.003, 10.1126/science.1163518). The HEK293 line is therefore widely used for studying TMEM16A biophysics. As overexpressing the WT control barely elicited any obvious current in 0 Ca2+ (Fig. 3d), there is no doubt that the large outward-rectifying current (hallmark of CaCC) in the revised Fig. 3d (previous Fig. 2D) was elicited from the mutant TMEM16A channels. The strong outward rectification also rules out the possibility of this being leak current.

      Regarding Fig. 4d (previous Fig. 3D), OSCA1.2 has excellent surface expression as shown in Fig. 4b. OSCA1.2 also has much higher single channel conductance (121.8 ± 3.4 pS, 10.7554/eLife.41844) than TMEM16A (~3-8 pS) and TMEM16F (<1 pS). Therefore, recording nA-scale OSCA1.2 current from excised patches is expected, particularly because OSCA1.2 current is larger at depolarized voltages than at hyperpolarized voltages (please see our explanation in the next response). As the reviewer pointed out, lack of divalent ions in our experimental conditions may also partially contribute to the large conductance. To further verify, we conducted mock transfection recordings (please see Author response image 1 below). WT- but not mock (GFP)-transfected cells gave rise to large current, further supporting that the recorded current was indeed through OSCA1.2.
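
      As a rough plausibility check on the conductance argument above, the following sketch estimates how many open channels would be needed to carry a given patch current. It assumes an ohmic unitary current (i = g × driving force), a driving force of 100 mV and an open probability of 1; these values are illustrative assumptions rather than figures from the manuscript, and only the single channel conductances are taken from the text.

```python
def single_channel_current_pA(conductance_pS, driving_force_mV):
    """Unitary current from Ohm's law.

    pS * mV = 1e-12 S * 1e-3 V = 1e-15 A, i.e. 1e-3 pA.
    """
    return conductance_pS * driving_force_mV * 1e-3

def channels_needed(total_current_nA, conductance_pS, driving_force_mV,
                    open_probability=1.0):
    """Number of channels required to carry total_current_nA."""
    unitary_pA = single_channel_current_pA(conductance_pS, driving_force_mV)
    return (total_current_nA * 1e3) / (unitary_pA * open_probability)

# Assumed conditions for illustration only
DRIVING_FORCE_MV = 100.0
PATCH_CURRENT_NA = 3.0

osca12 = channels_needed(PATCH_CURRENT_NA, 121.8, DRIVING_FORCE_MV)
tmem16a = channels_needed(PATCH_CURRENT_NA, 8.0, DRIVING_FORCE_MV)
print(f"OSCA1.2 channels needed: {osca12:.0f}")
print(f"TMEM16A channels needed (upper conductance estimate): {tmem16a:.0f}")
```

      Under these assumptions, a few hundred OSCA1.2 channels would suffice for a 3 nA patch current, whereas the same current through TMEM16A would require thousands of channels. A lower open probability scales both counts up proportionally.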

      Author response image 1.

      Representative inside-out currents for mock (GFP)- and OSCA1.2 WT-transfected cells. OSCA1.2 is responsible for nA currents elicited by the pressure and voltage protocols shown.

      Figure 3D and 5D. Most of the traces and current quantification is done at positive potentials and is outward current. Do the authors observe inward currents? It is difficult to judge by the figures since currents are so large. OSCA/TMEM63s are cationic channels and all published data on these channels have demonstrated robust inward currents at negative, physiologically relevant potentials. The lack of inward currents but only large outward currents suggests that these mutations could be doing something else to the channel. 

      Yes. We indeed observe inward current at negative holding potentials under pressure clamp (Author response image 2). However, mechanosensitive OSCA and TMEM63A channels are also voltage dependent. Their outward current is an order of magnitude larger at depolarized voltages (e.g., Author response image 2, also 10.7554/eLife.41844, see Fig. 1H). 

      Author response image 2.

      Voltage-dependent rectification of OSCA1.2 current. a. Representative OSCA1.2 trace (bottom) elicited by a voltage-ramp under -50 mmHg (top). b. The difference in inward and outward current amplitudes. 

      We found that quantifying the OSCA1.2 outward current has advantages over the inward current. Usually, using the gold standard pressure clamp protocol at negative holding voltages, peak inward current amplitude is quantified. However, OSCA inward current quickly inactivates (10.7554/eLife.41844, see Fig. 1C). This makes robust quantification and comparison with mutant channels difficult. Holding the membrane at a constant pressure and measuring OSCA1.2 G-V overcomes these issues associated with the classical inward current measurements. The large depolarization-driven outward current does not inactivate, and robust tail current (Response Fig. 1, 2) allows us to construct G-V relationships. We found quantifying mutants’ voltage dependence at constant pressure is more consistent than quantifying pressure dependence at constant voltage. These advantages make our new protocol preferable to the commonly used gold standard pressure clamp protocol for characterizing and comparing the gating mutations identified in this manuscript. 
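
      To illustrate how a G-V relationship yields a half-activation voltage (V50), here is a minimal sketch that fits a two-state Boltzmann function to normalized conductance values. The Boltzmann form, the linearized (logit) least-squares fit and the synthetic data are assumptions for illustration; the response does not specify the fitting procedure used in the manuscript.

```python
import math

def boltzmann(v_mV, v50_mV, slope_mV):
    """Normalized conductance G/Gmax for a two-state Boltzmann model."""
    return 1.0 / (1.0 + math.exp(-(v_mV - v50_mV) / slope_mV))

def fit_boltzmann(voltages_mV, g_norm):
    """Estimate (V50, slope factor) by linear regression on logit(G/Gmax).

    logit(G/Gmax) = (V - V50) / k is a straight line in V with slope 1/k
    and intercept -V50/k. Points at exactly 0 or 1 are skipped because
    their logit is undefined.
    """
    points = [(v, math.log(g / (1.0 - g)))
              for v, g in zip(voltages_mV, g_norm) if 0.0 < g < 1.0]
    if len(points) < 2:
        raise ValueError("need at least two points with 0 < G/Gmax < 1")
    n = len(points)
    mean_v = sum(v for v, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((v - mean_v) ** 2 for v, _ in points)
    sxy = sum((v - mean_v) * (y - mean_y) for v, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_v
    return -intercept / slope, 1.0 / slope

# Synthetic, noise-free G-V curve (hypothetical V50 = 100 mV, k = 25 mV)
voltages = [float(v) for v in range(-40, 201, 20)]
g_values = [boltzmann(v, 100.0, 25.0) for v in voltages]

v50_fit, k_fit = fit_boltzmann(voltages, g_values)
print(f"V50 = {v50_fit:.1f} mV, slope factor = {k_fit:.1f} mV")
```

      The logit linearization keeps the sketch dependency-free, but it amplifies noise near G/Gmax = 0 and 1, so a nonlinear least-squares fit would usually be preferable for real recordings. Comparing fitted V50 values between wild-type and mutant channels is one way to express the shift in voltage dependence.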

      Figure 3 and 5. Why are mechanically activated currents being recorded at random pressure stimuli (-50 mmHg for OSCA) and (-80 mmHg for Tmem63a)? The gold standard in the field is to run an entire pressure response curve. Given that only outward currents are observed at membrane potentials +120mV and above at 0mmHg, this questions whether they are indeed constitutively active. 

      As we explained in the previous response, both voltage and membrane stretch activate OSCA/TMEM63A channels. We found measuring voltage dependence under constant pressure provided more consistent quantification than the gold standard pressure response protocol. This may be due to the variability of applied membrane tension under repeated stretches versus the more consistent applied voltage. Additionally, we chose -50 mmHg and -80 mmHg to reflect the reported differences in half-maximal pressures between OSCA1.2 and TMEM63A (e.g., P50 ~55 mmHg for 1.2 and ~61 mmHg for 63A in 10.7554/eLife.41844 versus ~86 mmHg for 1.2 and -123 mmHg for 63A in 10.1016/j.neuron.2023.07.006).

      We also used higher pressure in cell-attached mode to increase TMEM63A current amplitudes, which are usually tiny. We have updated our method section (Lines 329-334) to further clarify why we used these protocols.

      Please note that in TMEM16 proteins, ions and lipids might not always be co-transported. This means that under certain conditions, only one type of substrate may go through. For instance, in WT TMEM16F, Ca2+ stimulation can easily trigger PS exposure at resting membrane potential. No ionic currents are elicited until strong depolarization is applied. Similarly, the TMEM16F GOF mutations spontaneously transport lipids, leading to loss of lipid asymmetry (Fig. 1b, c). However, in 0 Ca2+, these TMEM16F mutant channels still need strong depolarization for ion conduction (Fig. 1d, e). Although the detailed mechanism still needs to be further investigated, the OSCA1.2 and TMEM63A GOF mutations share similar features with TMEM16 proteins, exhibiting ion conduction under high pressures and depolarizing voltages, yet constitutively active scrambling.

      Some clarity is needed for their choice of residues. I understand that a lot of this is also informed by the structures of these ion channels. According to the alignment shown in Supplementary Figure 1, they chose LA for OSCA1.2, which is in line with the IM (TMEM16F) and II(TMEM16A) residues but for Tmem63a they chose the hydrophobic gate residue W and S. Was the A476 tested? Also, OSCA1.2 already has a K in the hydrophobic gating residue region. How do the authors reconcile this with their model? 

      We appreciate this critical comment. We have included the characterization of TMEM63A A476K (Fig. 6, corresponding to M522 in 16F, I547 in 16A, and A439 in OSCA1.2). Interestingly, A476K transfected cells did not show obvious spontaneous PS exposure yet exhibited a modest shift in V50 comparable to W472K and S475K. These differences may reflect the high-tension activated nature of the TMEM63 proteins (10.1016/j.neuron.2023.07.006) as compared to OSCA1.2, where the corresponding mutation (A439K, Fig. 4b, c) showed very little spontaneous activity and required hypotonic stimulation to promote more robust PS exposure (Fig. 5). 

      Furthermore, as we showed in Figs. 1b-c and 3b-c, there is a lower limit (towards the C-terminus) of the TM 4 lysine mutation effect, which becomes insufficient to cause a constitutively open pore for spontaneous lipid scrambling. It is possible that TMEM63A A476K represents the lower limit of TM 4 mutations that can convert TMEM63A into a spontaneous lipid scramblase.

      Regarding OSCA1.2 K435 and TMEM63A W472, these sites correspond to the hydrophobic gate residues on TM 4 in TMEM16F (F518, Fig. 1a) and TMEM16A (L543, Fig. 3a), so it is unsurprising to us that a lysine mutation at this site causes constitutive scramblase activity in TMEM63A (Fig. 6b, c). For OSCA1.2, it is more intriguing since this residue is already a lysine (K435). In Supplementary Fig. 5, our new experiments show that neutralizing K435 with leucine (K435L) in the background of L438K significantly attenuates spontaneous PS exposure from ~63% PS positive for L438K alone (two lysine residues) to ~31% for K435L/L438K (one lysine). On the other hand, the K435L mutation by itself is also insufficient to induce PS exposure. Therefore, the endogenous lysine at residue 435 has an additive effect on the spontaneous scramblase activity of L438K. We believe the explanation for this result lies in experiments conducted in model transmembrane helices, which have shown that stacking hydrophilic side chains within the membrane interior promotes trans-bilayer lipid flipping (see 10.1248/cpb.c22-00133).

      These same studies also support our observation (10.1038/s41467-019-09778-7) that highly hydrophilic side chains (such as lysine or glutamic acid) accelerate trans-bilayer lipid flipping more effectively than hydrophobic side chains such as isoleucine or alanine (Author response image 3, see also 10.1021/acs.jpcb.8b00298).

      Author response image 3.

      Trans-bilayer lipid flipping rates (kflip) accelerate with increasing side chain hydropathy for a residue placed in the center of a model transmembrane helical peptide

      How do the authors know that osmotic shock is indeed activating OSCA1.2 and TMEM63A? If they can record from the channels then electrophysiology data that confirms activation of the channel in the presence of hypoosmotic shock will strengthen the osmolarity active scramblase activity demonstrated in Figure 4. So far, there is conclusive data showing that they are mechanically activated but conclusive electrophysiological data for OSCA/TMEM63 osmolarity activation is not described yet, including the reference (38) they indicate in line 132. Although osmotic shock can perturb mechanical properties of the membrane it can also activate volume-regulated anion channels, which are also present in HEK cells. 

      Thank you for raising this important question. While reference 38 (now reference 39) shows direct electrophysiological evidence of hypertonicity-induced current (e.g., Fig. 4 f, g, i, and j in 10.1038/nature13593), direct electrophysiological evidence that OSCA/TMEM63 can be activated by hypotonic stimulation is still missing. To address this question, we conducted whole-cell patch clamp experiments on mock-transfected and OSCA1.2 WT-transfected cells stimulated with 120 mOsm/kg hypotonic solution, under conditions comparable to those used for the hypotonicity-induced scrambling shown in Fig. 5. As shown in Supplementary Fig. 6, our whole-cell recording detected a slowly evolving yet robust outward-rectifying current in OSCA1.2-transfected cells, which was not observed in mock-transfected cells.

      To avoid the contamination from endogenous SWELL osmo-/volume-regulated chloride channels, our new experiment used 140 mM Na gluconate to replace NaCl in both the pipette and the bath solution. Because SWELL/VRAC channels are minimally permeable to gluconate anions (e.g., 10.1007/BF00374290), we conclude that hypotonic stimulation can indeed activate OSCA1.2 albeit with perhaps lower efficiency compared to mechanical stimulation.  

      Minor comments 

      What is the timeline for the scramblase assay for all the experiments (except Figure 4)? How long is the AnnexinV incubated before imaging? 

      Thank you for pointing out that we had not provided sufficient detail here. Cells were imaged in the scramblase assay (including in Fig. 4, now revised Fig. 5) in AnnexinV-containing buffer immediately and without a formal incubation period because AnnexinV binding to exposed PS proceeds rapidly. We have included additional detail in the methods section to eliminate any confusion (Lines 310-312).

      In some places of the document, it says OSCA/TMEM63, and in other places, it is denoted as TMEM63/OSCA. The literature so far has always called the family OSCA/TMEM63- please stay consistent with the field. 

      Thank you for pointing this out. We have corrected these instances to be consistent with the field.

      Reviewer #2 (Recommendations For The Authors): 

      (1) The authors' statement that the channel/scramblase family members have a relatively low "energetic barrier for scramblase" activity needs further support. While mutating the hydrophobic channel gate certainly could destabilize ion conduction to cause a GOF effect on channel activity, it is still not clear why scramblase activity, which is tantamount to altered permeation, happens in the mutant channels. Are permeation and channel gating (opening) coupled in these channels? If so, what is the basis for the coupling? Is scramblase activity only observed when the gating is destabilized or are they separable? 

      We appreciate these great questions. For the question about the ‘energetic barrier’ statement, please see our response to point (3) where we have carried out MD simulations of the OSCA1.2 WT and L438K mutant to provide insight into how the permeation pathway is altered by these mutations. 

      Regarding why TMEM16A can be converted into a scramblase, we use the extensively studied TMEM16 proteins as examples to improve our current understanding of OSCA/TMEM63 proteins. For further details please see our original paper (10.1038/s41467-019-09778-7) and our review (10.3389/fphys.2021.787773), which are summarized as follows: 

      (1) The “neck region”, consisting of the exofacial halves of TMs 3-6, forms the pore-gate region for both ion and lipid permeation (Author response image 4B). In the closed state, the neck region is constricted and TMs 4 and 6 interact with each other, preventing substrate permeation. The hydrophobic inner activation gate that we identified (10.1038/s41467-019-09778-7) resides right underneath the inner mouth of the neck region, controlling both ion permeation and lipid scrambling.

      (2) Based on our functional observations and the available scramblase structures of TMEM16 proteins in multiple conformations, we proposed a clamshell-like gating model to describe TMEM16 lipid scrambling (Author response image 4D). According to this model, Ca2+-induced conformational changes weaken the TM 4/6 interface. This promotes the separation of the two transmembrane segments, analogous to the opening of a clam shell, allowing a membrane-spanning groove to facilitate permeation of the lipid headgroup.

      (3) For the CaCC, TMEM16A, Ca2+ binding dilates the pore. However, the binding energy likely cannot open the TM 4/6 interface at the neck region so, in the absence of groove formation, only Cl- ions but not lipids can permeate (pore-dilation model, Author response image 4C).

      (4) Introducing charged residues near the inner activation gate disrupts the neck region, potentially by weakening the hydrophobic interactions between TMs 4 and 6. This mutational effect results in constitutively active TMEM16F scramblases and enables spontaneous lipid permeation in the TMEM16A CaCC. 

      (5) In our revision, we tested additional mutations with different side chain properties (Supplementary Fig. 2), validating previous findings by us (10.1038/s41467-019-09778-7) and others (10.1038/s41467-022-34497-x) that gate disruption increases with the side chain hydropathy of the mutation.

      (6) We further extended lysine mutations to two helical turns below the inner activation gate on TM 4 and identified a lower limit for mutation-induced spontaneous scramblase activity in TMEM16F and TMEM16A (Figs. 1b, c and 3b, c, respectively). Together, all these points lend additional support to our proposed gating models for TMEM16 proteins, which we postulate may also relate to the OSCA/TMEM63 family based on the evidence provided in our manuscript.

      Author response image 4

      Model of gating (and regulatory) mechanisms in the TMEM16 family. (B) overall architecture and proposed modules, (C) pore-dilation gating model for CaCCs, (D) Clamshell gating model for CaPLSases.

      Regarding the relationship between ion and lipid permeation through TMEM16 scramblases, the following is the summary of our current understanding: 

      (1) Functionally, ion and lipid permeation are not necessarily obligatory to each other. This is evidenced by our previous biophysical characterizations of TMEM16F ion channel and lipid scramblase activities. Ca2+ can trigger TMEM16F lipid scrambling at resting membrane potentials, however, Ca2+ alone is insufficient to record TMEM16F current. Strong membrane depolarization synergistically with elevated intracellular Ca2+ is required to activate ion permeation. Based on these observations, we postulate that ions and lipids may have different extracellular gates, despite sharing an inner activation gate (10.1038/s41467-019-09778-7). Ca2+ alone may sufficiently open the inner gate (and extracellular gate) for lipids, whereas depolarization is likely required to open the extracellular gate and allow ion flux. Further structure-function studies are needed to test this hypothesis. 

      (2) Structurally, the open conformations of TMEM16 scramblases such as the fungal orthologs and human TMEM16K (Supplementary Fig. 1 b-d) are wide open, which allows lipid and ion co-transport. Ion and lipid co-transport has also been demonstrated in various MD simulations (e.g., 10.7554/eLife.28671, 10.3389/fmolb.2022.903972, and 10.1038/s41467-021-22724-w).

      (3) Functionally, we (10.1085/jgp.202012704) and others (10.7554/eLife.06901.001) have performed dual recordings of channel and scramblase activities, also demonstrating that ions and lipids are co-transported simultaneously when the proteins are fully activated.

      (4) In this manuscript, we also provide multiple examples (TMEM16F in Fig. 1, TMEM16A in Fig. 3, OSCA1.2 in Fig. 4, and TMEM63A in Fig. 6) of mutations showing spontaneous phospholipid scramblase activities, yet their channel activities require strong depolarization or, in the case of TMEM63A, high pressures to be elicited.

      Together, this new evidence further supports our hypothesis that there might be multiple gates for ion and lipid permeation, in addition to the shared inner gate we previously identified. We hope these detailed explanations help convey the intricacy of these intriguing questions. Of course, future studies are needed to test our hypothesis and elucidate the complex relationship between ion and lipid permeation of these proteins. 

      (2) One weakness in the experimental approach is the very limited number of substitutions used to infer the conclusion regarding the energetic barrier and other conclusions relating to scramblase activity. Additional substitutions of charged and polar amino acids at the hydrophobic gate would be helpful in illuminating the molecular determinants of the GOF phenotype and also reveal varying patterns of lipid permeation which could be enormously informative. These additional mutations for analysis of TMEM16F and OSCA should be added to the study. 

      We appreciate these great suggestions which were shared by multiple reviewers. We have included our duplicated response below.

      “Response to reviewers 2 & 3: In our 2019 paper (10.1038/s41467-019-09778-7), we systematically tested how the side chain properties at the inner activation gate of TMEM16F affect lipid scrambling activity (Response Fig. 6) and, since then, these results have been supplemented by others as well (10.1038/s41467-022-34497-x). In summary, mutating the inner activation gate residues to polar or charged residues generally results in constitutively activated scramblases without requiring Ca2+ (Fig 5a in 10.1038/s41467-019-09778-7). Because these residues form a hydrophobic gate, introducing smaller side chains via alanine substitution also causes gain-of-function, with the Y563A mutant as well as the F518A/Y563A/I612A variant being constitutively active (Fig. 3a in 10.1038/s41467-019-09778-7). Meanwhile, mutating these gate residues to hydrophobic amino acids causes no change for I612W, a slight gain-of-function for F518W, a slight loss-of-function for F518L, and complete loss-of-function for Y563W (Fig. 4b in 10.1038/s41467-019-09778-7). These findings clearly demonstrate that the side-chain properties are critical for regulating the gate opening. Charged mutations including lysine and glutamic acid are the most effective to promote gate opening (Fig 5a in 10.1038/s41467-019-09778-7).

      Similarly, others have observed that side chain hydropathy at the F518 site in TMEM16F correlates with shifts in the Ca2+ EC50 (Fig. 2 of 10.1038/s41467-022-34497-x). Note that this publication resolved the structure of the TMEM16F F518H mutant, revealing a previously unseen conformation that we have highlighted in Supplementary Fig. 1e and discussed in lines 235-238. Please also see our response to Reviewer #1 above, where we discuss discoveries in model transmembrane helical peptide systems showing that transbilayer lipid flipping rates correlate with side chain hydropathy (Author response image 3), distance between stacked hydropathic residues (schematic in 10.1248/cpb.c22-00133), and even helical angle between stacked side chains (not shown).

      Following the reviewers’ suggestions, we have tested additional mutations in alternative locations and with different side chains.  

      (1) We have added data for TMEM16F I521A and I521E to demonstrate a similar effect of alternative side chains to what has previously been reported by us and others. We found that I521A failed to show spontaneous scrambling activity (Supplementary Fig. 2), yet I521E (Supplementary Fig. 2) is a constitutively active lipid scramblase, similar to I521K (Fig. 1). This further demonstrates that gate disruption correlates with the side chain hydropathy and that this site lines a critical gating interface.

      (2) We also added lysine mutations two helical turns below the conserved inner activation gate for TMEM16F T526 (Fig. 1) and TMEM16A E551 (Fig. 3). We found that there is indeed a lower limit for the observed effect in TMEM16, where lysine mutations no longer induce spontaneous lipid scrambling activity. This indicates that where the TM 4/6 interaction is weaker toward the intracellular side (Figs. 1a, 3a), the TM 4 lysine mutation loses the ability to promote lipid scrambling by disrupting the TM 4/6 interface to enable clamshell-like opening of the permeation pathway.

      (3) We added a TMEM16F lysine mutation on TM 6 at residue I611 (Fig. 2). Similar to I612K (Response Fig. 6), I611K also leads to spontaneous lipid scrambling and enhanced channel activity in the absence of calcium (Fig. 2). This shows that charged mutations along TM 6 can also promote lipid scrambling, strengthening our model that hydrophobic interactions along the TM 4/6 interface are critical for gating and lipid permeation.”

      (3) Related to the above point, it would be enormously useful to perform even limited computational modelling to support the "energetic barrier" statement. Specifically, can the authors model waters in the putative pore to examine water occupancy in the WT and mutant channels to better understand how the barrier for ions and lipids is altered in the TMEM16? 

      We appreciate this suggestion and have now conducted atomistic MD simulations of OSCA1.2 WT and the L438K mutant for ~1 μs (Supplementary Fig. 4). The simulations revealed elevated water occupancy in the pore region of the L438K mutant, likely due to a widening at the TM 4/6 interface. Conversely, the WT interface remained constricted, largely disallowing water occupancy. These computational results support our previously proposed clamshell-like gating model for TMEM16 scramblases and provide strong support that the L438K mutation disrupts the interaction at the TM 4/6 interface, in turn reducing the energetic barrier for both ion and lipid permeation.

      (4) I am puzzled about the ability of OSCA and the TMEM63 proteins which are cation channels to conduct negatively charged lipids. How can the pore be selective for cations and yet permeate negatively charged molecules when lipids are presented? 

      This is a great question. TMEM16 scramblases (as well as other known scramblases, such as the Xkr and Opsin families) are surprisingly non-selective among phospholipids (they transport all major phospholipid species, not just anionic lipids like PS). It is still debated whether lipid headgroups indeed insert into an open pore or hydrophilic groove (Response Fig. 5), or if they may traverse the bilayer by the so-called ‘out-of-groove’ model. Regardless of the model, the consensus is that Ca2+-induced conformational changes catalyze lipid permeation and the mutations we have introduced are designed to mimic these conformational changes by separating the TM 4/6 interface.

      Additionally, TMEM16F channel activity was first characterized as cation non-selective (10.1016/j.cell.2012.07.036), similar to OSCA/TMEM63s, which may even exhibit some chloride permeability (10.7554/eLife.41844.001). Thus, it appears as though scramblase activity is agnostic to headgroup charge and compatible with both a mutant anion channel (TMEM16A) and mutant cation channels (TMEM16F, OSCA1.2, and TMEM63A). However, more detailed structural, functional, and computational studies are needed to further clarify ion and lipid co-transport mechanisms.

      (5) Do pore blockers like Gd3+ which block permeation also inhibit the scramblase activity of the mutant channels? This should be tested for the mutant channels. 

      While extracellular Gd3+ has been previously reported as an inhibitor of OSCA1.2 (10.7554/eLife.41844.001), we did not observe this effect (Author response image 5), but instead saw inhibition by intracellular Gd3+ (Author response image 6). Given this discrepancy, we did not test Gd3+ inhibition of the OSCA1.2 scramblases, but instead tested Ani9, a paralog-specific inhibitor of TMEM16A, on the TMEM16A I546K gain-of-function mutant and found that it attenuated both ion channel and phospholipid scramblase activities (Supplementary Fig. 3).

      Author response image 5.

      200 µM Gd3+ext fails to inhibit OSCA1.2 currents in cell-attached patches. Pressure-elicited peak currents (n=6 each). Statistical test is an unpaired Student’s t-test.

      Author response image 6.

      200 µM Gd3+int completely inhibits OSCA1.2 currents in inside-out patches. (a) Representative traces before (black), during (red), and after (blue) Gd3+ application. (b) Representative application timecourse. (c) Quantification of peak currents (n=8 each). Statistical test is one-way ANOVA.

      Minor: 

      - Some of the current amplitudes shown in Figures 2 and 3 are enormous. Is liquid junction potential corrected in these experiments? If not, it would be preferable to correct this to avoid voltage errors. 

      Thanks for the question. The large current amplitude is due to 1) high surface expression of the proteins, 2) the large single-channel conductance of OSCA channels, and 3) the much larger current at positive voltages for OSCA channels. Our control experiment showed that WT TMEM16A at 0 Ca2+ did not give rise to any current (Fig. 3d), further demonstrating that the large current was not due to liquid junction potential. For the OSCA recordings, we also did not observe current in mock-transfected cells, further excluding possible interference from liquid junction potential (Response Fig. 1).

      - Related, authors could consider adding some evidence using selective pharmacology to support the conclusions that the observed currents arise from TMEM or OSCA channels. 

      Thanks for the suggestion. As mentioned above, we have added experiments with Ani9, a specific inhibitor of TMEM16A, in Supplementary Fig. 3. We found that Ani9 robustly attenuated both ion channel and phospholipid scramblase activities for the TMEM16A I546K gain-of-function mutant. This is also consistent with our previous publication (10.1038/s41467-019-09778-7), where Ani9 efficiently inhibited the TMEM16A L534K mutant scramblases. Additionally, we have provided mock controls (Response Fig. 1, Fig. 6d, e) to show that the observed currents are indeed attributable to OSCA1.2 and TMEM63A.

      Reviewer #3 (Recommendations For The Authors): 

      Given that the authors postulate that the introduction of a positive charge via the lysine side chain is essential to the constitutive activity of these proteins, additional mutation controls for side chain size (e.g. glutamine/methionine) or negative charge (e.g. glutamic acid), or a different positive charge (i.e. arginine) would have strengthened their argument. To more comprehensively understand the TM4/TM6 interface, mutations at locations one turn above and one turn below could be studied until there is no phenotype. In addition, the equivalent mutations on the TM6 side should be explored to rule out the effects of conformational changes that arise from mutating TM4 and to increase the strength of evidence for the importance of side-chain interactions at the TM6 interface. 

      We appreciate these great suggestions which were shared by multiple reviewers. We have included our previous responses below.

      “Response to reviewers 2 & 3: In our 2019 paper (10.1038/s41467-019-09778-7), we systematically tested how the side chain properties at the inner activation gate of TMEM16F affect lipid scrambling activity (Response Fig. 6) and, since then, these results have been supplemented by others as well (10.1038/s41467-022-34497-x). In summary, mutating the inner activation gate residues to polar or charged residues generally results in constitutively activated scramblases without requiring Ca2+ (Fig 5a in 10.1038/s41467-019-09778-7). Because these residues form a hydrophobic gate, introducing smaller side chains via alanine substitution also causes gain-of-function, with the Y563A mutant as well as the F518A/Y563A/I612A variant being constitutively active (Fig. 3a in 10.1038/s41467-019-09778-7). Meanwhile, mutating these gate residues to hydrophobic amino acids causes no change for I612W, a slight gain-of-function for F518W, a slight loss-of-function for F518L, and complete loss-of-function for Y563W (Fig. 4b in 10.1038/s41467-019-09778-7). These findings clearly demonstrate that the side-chain properties are critical for regulating the gate opening. Charged mutations including lysine and glutamic acid are the most effective to promote gate opening (Fig 5a in 10.1038/s41467-019-09778-7).

      Similarly, others have observed that side chain hydropathy at the F518 site in TMEM16F correlates with shifts in the Ca2+ EC50 (Fig. 2 of 10.1038/s41467-022-34497-x). Note that this publication resolved the structure of the TMEM16F F518H mutant, revealing a previously unseen conformation that we have highlighted in Supplementary Fig. 1e and discussed in lines 235-238. Please also see our response to Reviewer #1 above, where we discuss discoveries in model transmembrane helical peptide systems showing that transbilayer lipid flipping rates correlate with side chain hydropathy (Author response image 3), distance between stacked hydropathic residues (schematic in 10.1248/cpb.c22-00133), and even helical angle between stacked side chains (not shown).

      Following the reviewers’ suggestions, we have tested additional mutations in alternative locations and with different side chains.  

      (1) We have added data for TMEM16F I521A and I521E to demonstrate a similar effect of alternative side chains to what has previously been reported by us and others. We found that I521A failed to show spontaneous scrambling activity (Supplementary Fig. 2), yet I521E (Supplementary Fig. 2) is a constitutively active lipid scramblase, similar to I521K (Fig. 1). This further demonstrates that gate disruption correlates with the side chain hydropathy and that this site lines a critical gating interface.

      (2) We also added lysine mutations two helical turns below the conserved inner activation gate for TMEM16F T526 (Fig. 1) and TMEM16A E551 (Fig. 3). We found that there is indeed a lower limit for the observed effect in TMEM16, where lysine mutations no longer induce spontaneous lipid scrambling activity. This indicates that where the TM 4/6 interaction is weaker toward the intracellular side (Figs. 1a, 3a), the TM 4 lysine mutation loses the ability to promote lipid scrambling by disrupting the TM 4/6 interface to enable clamshell-like opening of the permeation pathway.

      (3) We added a TMEM16F lysine mutation on TM 6 at residue I611 (Fig. 2). Similar to I612K (Response Fig. 6), I611K also leads to spontaneous lipid scrambling and enhanced channel activity in the absence of calcium (Fig. 2). This shows that charged mutations along TM 6 can also promote lipid scrambling, strengthening our model that hydrophobic interactions along the TM 4/6 interface are critical for gating and lipid permeation.”

      The experiments for OSCA1.2 osmolarity effects on gating and scramblase in Figure 4 could be improved by adding different levels of osmolarity in addition to time in the hypotonic solution.

      We thank the reviewer for this excellent suggestion. We extensively tested this idea and found evidence (Response Fig. 10) that intermediate osmolarity (220 and 180 mOsm/kg) can also enhance the scramblase activity of the A439K mutant, albeit to a milder extent than 120 mOsm/kg stimulation. This suggests that swelling-induced membrane stretch may proportionally induce A439K activation and lipid scrambling. Due to the relatively mild sensitivity of OSCA to osmolarity and the variations induced by the experimental conditions, we believe it is better not to include these data, to avoid overclaiming. We hope the reviewer will agree.

      Author response image 7.

      AnV intensities of WT- and A439K-transfected cells after 10 minutes of hypotonic stimulation at the listed osmolarities.

      Some confocal images appear to be rotated relative to each other (e.g. Figures 2b and 3b).

      Thank you for identifying these errors; they have been corrected in the revision.

    2. eLife Assessment

      This important study advances our understanding of the mechanisms controlling lipid flux and ion permeation in the TMEM16 and OSCA/TMEM63 family channels. The study provides compelling new evidence indicating that side chains along the TM4/6 interface play a key role in gating lipid and ion fluxes in these channels. The authors suggest that the transmembrane channel/scramblase family proteins may have originally functioned as scramblases but lost this capacity over evolution.

    3. Reviewer #1 (Public review):

      Summary:

      TMEM16, OSCA/TMEM63, and TMC belong to a large superfamily of ion channels where TMEM16 members are calcium-activated lipid scramblases and chloride channels, whereas OSCA/TMEM63 and TMCs are mechanically activated ion channels. In the TMEM16 family, TMEM16F is a well-characterized calcium-activated lipid scramblase that plays an important role in processes like blood coagulation, cell death signaling, and phagocytosis. In a previous study the group demonstrated that a lysine mutation in TM4 of TMEM16A can enable the calcium-activated chloride channel to permeate phospholipids too. Based on this they hypothesize that the energy barrier for lipid scrambling in these ion channels is low, and that modification of the hydrophobic gate region by introducing a charged side chain at the TM4/6 interface in the TMEM16 and OSCA/TMEM63 families can allow lipid scrambling. In this manuscript, using scramblase activity measured via Annexin V binding to phosphatidylserine, and electrophysiology, the authors demonstrate that lysine mutations in TM4 of TMEM16F and TMEM16A can cause constitutive lipid scramblase activity. The authors then go on to show that analogous mutations in OSCA1.2 and TMEM63A can lead to scramblase activity. The revised version provides a thorough characterization of residues that form the hydrophobic gate region in TM4/6 of this superfamily of channels. Their results indicate that disrupting the TM4/6 interaction can reduce the energy barrier for these channels to scramble lipids.

      Strengths:

      Overall, the authors introduce an interesting concept that this large superfamily can permeate ions and lipids.

      Weaknesses:

      none noted in the revised version.

    4. Reviewer #2 (Public review):

      This focused study by Lowry and colleagues identifies a key molecular motif that controls ion permeation vs combined ion permeation and lipid transport in three families of channel/scramblase proteins: TMEM16 channels, the plant-expressed and stress-gated cation channel OSCA, and the mammalian homolog and mechanosensitive cation channel TMEM63. Between them, these three channels share low sequence similarity and have seemingly differing functions, as anion (TMEM16 channels), or stress-activated cation channels (OSCA/TMEM63). The study finds that in all three families, mutating a single hydrophobic residue in the ion permeation pathway of the channels confers lipid transport through the pores of the channels, indicating that TMEM16 and related OSCA and TMEM63 channels have a conserved potential for both ion and lipid permeation. The authors interpret the findings as revealing that these channel/scramblase proteins have a relatively low "energetic barrier for scramblase" activity. The experiments are done with a high level of rigor and the revised paper is very well written and addresses the previous concerns.

    5. Reviewer #3 (Public review):

      This study was focused on the conserved mechanisms across the Transmembrane Channel/Scramblase superfamily, which includes members of the TMEM16, TMEM63/OSCA, and TMC families. In previous work, the authors have studied the role of the inner activation gate of these proteins. Here, the authors show that the introduction of mutations at the TM4-TM6 interface, which are close to the inactivation gate, can disrupt gating and confer scramblase activity to non-scramblase proteins.

      Overall, the confocal imaging experiments, patch clamping experiments, and data analysis are performed well and in line with standard methods. The molecular dynamics simulation work is focused but adds supportive evidence to their findings. Although there could have been more extensive molecular analysis to bolster the authors' arguments on the role of the TM4-TM6 interface (e.g. evaluate effects of size/hydrophobicity, double mutants, cross-linking, more in-depth simulation data), there is adequate evidence to conclude that certain residues at this interface are critical to ion conduction and phospholipid scramblase activity. The data presented only add incremental depth of knowledge for each individual channel, but together, they show this to be true for conserved TM4 residues across TMEM16F, TMEM16A, OSCA1.2, and TMEM63A proteins. This breadth of data is a major strength of this paper, and provides strong evidence for a coupled pathway for ion conduction and phospholipid transport, though the underlying biophysical mechanism is still speculative and remains to be elucidated.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #2 (Public review):

      Weaknesses:

      The authors have clarified that the first features available for each patient have been used. However, they have not shown that these features did not occur before the time of post-stroke epilepsy. Explicit clarification of this should be performed.

      The data utilized in our analysis were collected during the first examination or test conducted after the patients' admission. We specifically excluded any patients with a history of epilepsy, ensuring that all cases of epilepsy identified in our study occurred after admission. Therefore, the features we analyzed were collected after the patients' admission but prior to the onset of post-stroke epilepsy.
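
      The timing requirement described in this reply can be expressed as a simple per-patient check. The sketch below is an illustration only: the function name, the record layout, and the dates are hypothetical and are not taken from the study's code or data. It flags a feature as usable when it was measured on or after admission and strictly before any post-stroke seizure.

```python
from datetime import date

def feature_is_leak_free(admission, feature_date, pse_onset):
    """Return True when a feature carries no temporal leakage.

    The feature must be measured on or after the stroke admission and,
    if the patient later developed PSE, strictly before seizure onset.
    """
    if feature_date < admission:
        return False  # measured before the index stroke admission
    if pse_onset is None:
        return True   # patient never developed PSE, so nothing can leak
    return feature_date < pse_onset

# Hypothetical records: (admission, first examination, PSE onset or None)
records = [
    (date(2020, 1, 5), date(2020, 1, 6), date(2020, 4, 1)),   # usable
    (date(2020, 2, 1), date(2020, 2, 2), None),               # usable, no PSE
    (date(2020, 3, 1), date(2020, 6, 1), date(2020, 5, 15)),  # leakage
]
flags = [feature_is_leak_free(*record) for record in records]
print(flags)
```

      Reporting how many patients fail such a check (ideally zero) would give the explicit clarification the reviewer asks for.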

      Reviewer #3 (Public review):

      Weaknesses:

      The writing of the article may be significantly improved.

      Although the external validation is appreciated, cross-validation to check robustness of the models would also be welcome.

      Thank you for your helpful advice. Performing n-fold cross-validation is a crucial step to ensure the reliability and robustness of the reported results, especially for datasets of limited size. We revised our code and implemented a 5-fold cross-validation version. It yielded little improvement, because our model had already reached an AUC of 0.99. Considering that we have more than 20,000 records, we think that splitting the dataset 7:3 and training the model on the larger portion is sufficient. We have uploaded the 5-fold cross-validation code and the plotted 5-fold test ROC curves to GitHub at https://github.com/conanan/lasso-ml/lasso_ml_cross_validation.ipynb as an external resource. We trained the 5-fold average model and plotted the 5-fold test ROC curves; the results show some improvement, but it is not substantial, because the best models are still tree models in the end.

      External validation results may be biased/overoptimistic, since the authors stated that "The external validation cohort focused more on collecting positive cases to examine the model's ability to identify positive samples", which may result in overoptimistic PPV and Sensitivity estimations. The specificity for the external validation set has not been disclosed.

      Thank you for your valuable feedback regarding the external validation results. We appreciate your concerns about potential bias and overoptimism in our estimations of positive predictive value (PPV) and sensitivity.

      To clarify, we have uploaded the code for external validation on GitHub at https://github.com/conanan/lasso-ml. The results indicate that the PPV is 0.95 and the specificity is 0.98.

      While we focused on collecting more positive cases due to their lower occurrence rate, this approach allows us to better evaluate the model's ability to predict positive samples, which is crucial in clinical settings. We believe that emphasizing positive cases enhances the model's utility for practical applications (so a little overoptimism is acceptable).
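
      The reviewer's concern can be made concrete with Bayes' rule: sensitivity and specificity do not depend on the proportion of positive cases in a cohort, but PPV does. The sketch below is a worked illustration only. The specificity of 0.98 is the value quoted above; the sensitivity of 0.90 and both prevalence values are hypothetical assumptions, not results from the study.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule.

    PPV = TP / (TP + FP), with TP and FP expressed as fractions of the
    whole cohort.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

SENSITIVITY = 0.90   # hypothetical value, for illustration only
SPECIFICITY = 0.98   # value quoted in the response above

ppv_population = ppv(SENSITIVITY, SPECIFICITY, prevalence=0.05)  # hypothetical 5% PSE rate
ppv_enriched = ppv(SENSITIVITY, SPECIFICITY, prevalence=0.50)    # hypothetical enriched cohort

print(f"PPV at 5% prevalence:  {ppv_population:.3f}")
print(f"PPV at 50% prevalence: {ppv_enriched:.3f}")
```

      Under these assumptions the same classifier shows a markedly higher PPV in the enriched cohort, which is why a PPV measured on a cohort enriched for positive cases should be re-expressed at the prevalence expected in clinical use.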


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses 1:

      The methodology needs further consideration. The Discussion needs extensive rewriting.

      Thank you for your advice; we have revised the Discussion.

      Reviewer #2 (Public Review):

      Weaknesses 2:

      There are many typos and unclear statements throughout the paper.

      There are some issues with SHAP interpretation. SHAP in its default form, does not provide robust statistical guarantees of effect size. There is a claim that "SHAP analysis showed that white blood cell count had the greatest impact among the routine blood test parameters". This is a difficult claim to make.

      Thank you for your suggestion. We agree that SHAP analysis is only a means of interpreting the model. In our research, we compared the SHAP analysis with traditional statistical methods, such as regression analysis. We found the SHAP results to be consistent with the regression results for variables such as white blood cell count (see Table 1). This agreement leads us to believe that the SHAP analysis provides reliable insights in this context.

      The Data Collection section is very poorly written, and the methodology is not clear.

      Thanks for your advice, we have revised the Data Collection section.

      There is no information about hyperparameter selection for models or whether a hyperparameter search was performed. Given this, it is difficult to conclude whether one machine learning model performs better than others on this task.

      Thank you for the advice on hyperparameter selection. We used the sklearn, xgboost, and lightgbm packages in Python 3.10 to construct the models and previously did not change the default settings. This was not appropriate and may have led to less certain conclusions. We now carry out a grid search to select and optimize the hyperparameters, which improves the models. The best model is still RF.
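
      A grid search of the kind mentioned in this reply can be sketched with scikit-learn's GridSearchCV. This is a minimal illustration on synthetic data, not the authors' code: the parameter grid, the data set size, and the class imbalance are all assumptions chosen to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the clinical data (assumption).
X, y = make_classification(
    n_samples=400, n_features=20, weights=[0.9, 0.1], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Small illustrative grid; a real search would cover more values.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="roc_auc",  # select hyperparameters by cross-validated AUC
    cv=3,
)
search.fit(X_train, y_train)  # tuning uses the training split only

best_params = search.best_params_
test_auc = search.score(X_test, y_test)  # AUC of the refitted best model
print(best_params, round(test_auc, 3))
```

      Tuning on the training split and scoring once on the held-out split keeps the reported test AUC free of selection bias; the same pattern applies to the other model families being compared.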

      The inclusion and exclusion criteria are unclear - how many patients were excluded and for what reasons?

      The selection procedure is shown in Figure 1. In total there were 42079 records in the stroke database; 24733 patients were diagnosed with new-onset ischemic stroke or lacunar stroke. Records were excluded for hemorrhagic stroke (4565), history of stroke (2154), TIA (3570), stroke of unclear cause (561), and missing important data (6496). We then excluded patients whose seizures might be attributed to other potential causes (brain tumor, intracranial vascular malformation, traumatic brain injury, etc.) (865). We then excluded patients who had a seizure history (152) or died in hospital (1444). We then excluded patients who were lost to follow-up (had no outpatient records and could not be contacted by phone) or died within 3 months of the stroke incident (813). Finally, 21459 cases were included in this research.
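
      The counts in this reply can be checked with a short tally. The sketch below uses only the numbers quoted above; the reading that 24733 is the count remaining after the first five exclusions is inferred from the arithmetic and is not stated explicitly in the response.

```python
# Exclusion steps and counts exactly as quoted in the reply above.
TOTAL_RECORDS = 42079

exclusions = [
    ("hemorrhagic stroke", 4565),
    ("history of stroke", 2154),
    ("TIA", 3570),
    ("stroke of unclear cause", 561),
    ("missing important data", 6496),
    ("seizure attributable to other causes", 865),
    ("seizure history", 152),
    ("died in hospital", 1444),
    ("lost to follow-up or died within 3 months", 813),
]

remaining = TOTAL_RECORDS
running = []
for reason, count in exclusions:
    remaining -= count
    running.append((reason, remaining))
    print(f"after excluding {reason:<42} {remaining:>6}")

after_first_five = running[4][1]   # matches the 24733 figure in the reply
final_cohort = remaining           # matches the 21459 figure in the reply
```

      The tally reproduces both intermediate and final figures, so the reported flow is internally consistent.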

      There is no sensitivity analysis of the SMOTE methodology: How many synthetic data points were created, and how does the number of synthetic data points affect classification accuracy?

      Thank you for the reminder. We have accepted this advice and replaced SMOTE with SMOTEENN (Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbours) to resample the imbalanced dataset for machine learning. The code is:

      from imblearn.combine import SMOTEENN
      smoteenn = SMOTEENN(sampling_strategy='auto', random_state=42)

      The SMOTEENN class comes from the imblearn library. The sampling_strategy='auto' parameter tells the algorithm to determine the appropriate sampling strategy automatically from the class distribution. The random_state=42 parameter sets a seed for the random number generator, ensuring reproducibility of the results.

      Did the authors achieve their aims? Do the results support their conclusions?

      Yes, we have achieved some of our aims in predicting PSE, although some problems remain.

      The paper does not clarify the features' temporal origins. If some features were not recorded on admission to the hospital but were recorded after PSE occurred, there would be temporal leakage.

      The data used in our analysis is from the first examination or test conducted after the patients' admission, retrieved from a PostgreSQL database. First, we extracted the initial admission date for patients admitted due to stroke. Then, we identified the nearest subsequent examination data for each of those patients.

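      The SQL query is as follows:

      -- Earliest cerebral or lacunar infarction diagnosis for one patient ({} is filled with the patient ID).
      -- '梗死' and '梗塞' both mean "infarction"; '脑' means "brain"; '腔隙' means "lacunar".
      SELECT TO_DATE(condition_start_date, 'DD-MM-YYYY') AS DATE
      FROM diagnosis
      WHERE person_id = {} AND (condition_name LIKE '%梗死%' OR condition_name LIKE '%梗塞%') AND (condition_name LIKE '%脑%' OR condition_name LIKE '%腔隙%')
      ORDER BY DATE
      LIMIT 1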

      The authors claim that their models can predict PSE. To believe this claim, seeing more information on out-of-distribution generalisation performance would be helpful. There is limited reporting on the external validation cohort relative to the reporting on train and test data.

      Thank you for the advice. The external validation is certainly very important, but there have been some difficulties in reaching a perfect solution.  We have tried using open-source databases like the MIMIC database, but the data there does not fit our needs as closely as the records from our own hospital.  The MIMIC database lacks some of the key features we require, and also lacks the detailed patient follow-up information that is crucial for our analysis.   Given these limitations, we have decided to collect newer records from the same hospitals here in Chongqing.  We believe this will allow us to build a more comprehensive dataset to support robust external validation.  While it may not be a perfect solution, gathering this additional data from our local healthcare system is a pragmatic step forward.   Looking ahead, we plan to continue expanding this Chongqing-based dataset and report on the results of the greater external validation in the future.  We are committed to overcoming the challenges around data availability to strengthen the validity and generalizability of our research findings.

      For greater certainty on all reported results, it would be most appropriate to perform n-fold cross-validation, and report mean scores and confidence intervals across the cross-validation splits

      Thank you for your helpful advice. Performing n-fold cross-validation is a crucial step to ensure the reliability and robustness of the reported results, especially for datasets of limited size. Because we have more than 20,000 records, we think that splitting the dataset 7:3 and training the model is sufficient. We revised our code and implemented a 5-fold cross-validation version; it yielded little improvement (because our model had already reached an AUC of 0.99). We may use this technique in our next study if there are not enough cases.
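
      The reporting format the reviewer requests (mean score with a confidence interval across folds) can be sketched as follows. The per-fold AUC values are placeholders for illustration, not results from the study; in practice they could come from, for example, scikit-learn's cross_val_score. The interval is a Student-t interval, which is only approximate because cross-validation folds share training data.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom.
T_CRIT_95 = {2: 4.303, 3: 3.182, 4: 2.776, 9: 2.262}

def summarize_folds(scores):
    """Return (mean, (lower, upper)) for a list of per-fold scores."""
    n = len(scores)
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)  # standard error of the mean
    half_width = T_CRIT_95[n - 1] * sem
    return mean, (mean - half_width, mean + half_width)

# Placeholder per-fold AUCs for a 5-fold run (hypothetical values).
fold_auc = [0.97, 0.98, 0.99, 0.98, 0.98]

mean_auc, (ci_low, ci_high) = summarize_folds(fold_auc)
print(f"AUC = {mean_auc:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

      Reporting the spread across folds in this way shows how stable a score is, which a single 7:3 split cannot.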

      Additional context that might help readers

      The authors show force plots and decision plots from SHAP values. These plots are non-trivial to interpret, and the authors should include an explanation of how to interpret them.

      Thank you for your helpful advice; it is a great improvement to our draft. We have added an explanation: the force plot for the first patient shows the influence of that patient's individual features. A long APTT contributes most strongly toward PSE, followed by the AST level and others, whereas a low NIHSS score contributes in the opposite direction to the final result. The decision plot is a collection of model decisions that shows how complex models arrive at their predictions.

      Reviewer #3 (Public Review):

      Weaknesses 3:

      There are issues with the readability of the paper. Many abbreviations are not introduced properly and sometimes are written inconsistently. A lot of relevant references are omitted. The methodological descriptions are extremely brief and, sometimes, incomplete.

      Thank you for your advice; we have corrected these flaws.

      The dataset is not disclosed, and neither is the code (although the code is made available upon request). For the sake of reproducibility, unless any bioethical concerns impede it, it would be good to have these data disclosed.

      Thank you for your recommendations. We have made the code available on GitHub at https://github.com/conanan/lasso-ml. The data, however, are private and belong to the hospital. Access can be requested by contacting the corresponding author, who can apply to the hospitals, and by specifying the purpose of the inquiry.

      Although the external validation is appreciated, cross-validation to check the robustness of the models would also be welcome.

      Thank you for your valuable advice. Performing n-fold cross-validation is crucial for ensuring the reliability and robustness of results, especially with limited datasets. However, since we have over 20,000 records, we believe that a 70:30 split for training and testing is sufficient.

      We revised our code and implemented 5-fold cross-validation, which provided minimal improvement, as our model has already achieved an AUC of 0.99. We plan to use this technique in future studies if we encounter fewer cases.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      My comments include two parts:

      (1) Methodology

      a-This study was based on multiple clinical indicators to construct a model for predicting the occurrence of PSE. It involved various multi-class indicators such as the affected cortical regions, locations of vascular occlusion, NIHSS scores, etc. Only using the SHAP index to explain the impact of multi-class variables on the dependent variable seems slightly insufficient. It might be worth considering the use of dummy variables to improve the model's accuracy.

      Thank you for the detailed feedback on the study methodology. The SHAP analysis is simply a means of interpreting the model. We compared the SHAP results with traditional statistics, so we consider the SHAP analysis reliable in this research. We have used dummy variables, especially for the affected cortical regions and the locations of vascular occlusion; for example, if the frontal region is involved, the variable is 1. However, they have less impact in the machine learning model.
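
      The dummy-variable encoding described in this response can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `one_hot`, the field name `cortical_regions`, and the patient records are all hypothetical.

```python
def one_hot(records, field, categories):
    """Expand a multi-valued categorical field into 0/1 dummy columns."""
    encoded = []
    for record in records:
        involved = set(record[field])
        # Copy every other field unchanged, then add one column per category.
        row = {k: v for k, v in record.items() if k != field}
        for category in categories:
            row[f"{field}_{category}"] = 1 if category in involved else 0
        encoded.append(row)
    return encoded


# Hypothetical patients; the region names are illustrative only.
patients = [
    {"id": 1, "cortical_regions": ["frontal", "parietal"]},
    {"id": 2, "cortical_regions": ["temporal"]},
]
regions = ["frontal", "parietal", "temporal", "occipital"]
encoded = one_hot(patients, "cortical_regions", regions)
```

      Each region becomes its own binary column, so a patient with several affected regions is represented without imposing an artificial order on the categories.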

      b-The study used Lasso regression to select 20 features to build the model. How was the optimal number of 20 features determined?

      Lasso regression is a commonly used feature-screening method. Because we extracted information from the database and tried to include as many features as possible, the cross-validation curve of the lasso regression was optimal with 78 features, but that would make the model too complex. We experimented with models built on 10, 15, 20, 25, and 30 features. With 20 features, the model performance was good and the model remained relatively concise. Increasing the number of features contributed little to model performance, whereas decreasing it degraded the model; for example, the AUC of the model with 15 features dropped below 0.95. We therefore selected 20 features.
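
      The selection step described above (rank features by the absolute value of their LASSO coefficient, then compare several candidate feature counts) might look roughly like the sketch below. The coefficient values, feature names, and the `evaluate` callback are assumptions made for illustration; in practice `evaluate` would train a model on the chosen features and return its AUC.

```python
def top_k_features(coefficients, k):
    """Return up to k feature names ranked by absolute coefficient, skipping zeros."""
    ranked = sorted(coefficients.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [name for name, coef in ranked[:k] if coef != 0.0]


def compare_feature_counts(coefficients, candidate_ks, evaluate):
    """Score each candidate feature count with a user-supplied evaluate() callback."""
    return {k: evaluate(top_k_features(coefficients, k)) for k in candidate_ks}


# Hypothetical LASSO coefficients, for illustration only.
lasso_coefficients = {
    "aptt": 0.42,
    "ast": -0.31,
    "nihss": 0.27,
    "wbc": 0.0,
    "d_dimer": 0.18,
}
selected = top_k_features(lasso_coefficients, 3)

# Placeholder evaluate(): returns the number of features instead of a real AUC.
scores = compare_feature_counts(lasso_coefficients, [2, 3, 5], evaluate=len)
```

      Features whose coefficient was shrunk to exactly zero are dropped even when k is larger than the number of non-zero coefficients.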

      c-The study indicated that the incidence rate of PSE in the enrolled patients is 4.3%, showing a highly imbalanced dataset. If singly using the SMOTE method for oversampling, could this lead to overfitting?

      Thank you for the reminder; using the SMOTE method alone for oversampling is improper. We have made this improvement and replaced SMOTE with the SMOTEENN (Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors) technique to resample the imbalanced dataset for machine learning: first oversampling with SMOTE, then undersampling with ENN to remove possible noise and duplicate samples. The code is

      from imblearn.combine import SMOTEENN
      smoteenn = SMOTEENN(sampling_strategy='auto', random_state=42)

      The SMOTEENN class comes from the imblearn library. The sampling_strategy='auto' parameter tells the algorithm to automatically determine the appropriate sampling strategy based on the class distribution. The random_state=42 parameter sets a seed for the random number generator, ensuring reproducibility of the results.
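
      To make the two stages concrete, the sketch below re-implements the idea in plain Python: SMOTE creates synthetic minority points by interpolating between a minority sample and one of its nearest minority neighbours, and ENN then drops samples whose label disagrees with their neighbourhood. It is a simplified illustration (majority-vote ENN, toy 2-D points, hypothetical function names), not the imblearn implementation and not the authors' pipeline.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, candidates, k):
    """Return the k candidates closest to point, excluding point itself."""
    others = [c for c in candidates if c is not point]
    return sorted(others, key=lambda c: squared_distance(point, c))[:k]


def smote(minority, n_new, k=2, seed=42):
    """Create n_new synthetic points on segments between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbour = rng.choice(nearest(base, minority, k))
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def edited_nearest_neighbours(samples, k=3):
    """Drop (point, label) samples that most of their k nearest neighbours contradict."""
    kept = []
    for sample in samples:
        point, label = sample
        others = [s for s in samples if s is not sample]
        neighbours = sorted(others, key=lambda s: squared_distance(point, s[0]))[:k]
        agree = sum(1 for _, lab in neighbours if lab == label)
        if agree * 2 > len(neighbours):
            kept.append(sample)
    return kept


# Toy data: label 1 is the minority class; one majority point sits inside it.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
majority = [(5.0, 5.0), (5.1, 4.9), (4.8, 5.2), (5.2, 5.1), (4.9, 4.8), (1.1, 1.0)]
synthetic = smote(minority, n_new=3)
labelled = [(p, 1) for p in minority + synthetic] + [(p, 0) for p in majority]
cleaned = edited_nearest_neighbours(labelled)
```

      In this toy example the majority point that lies inside the minority cluster is removed by the ENN step, which is the kind of noise cleaning the response refers to.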

      (2) Clinical aspects:

      Line 8, history of ischemic stroke, this is misexpression, could be: diagnosis of ischemic stroke.

      Line 8, several hospitals, should be more exact; how many?

      Line 74 indicates that the data are from a single centre, this should be clarified.

      Line 4 data collection: The criteria read unclear; please clarify further.

      Thank you for the reminder. We have revised the draft and corrected these errors.

      Line 110, lab parameters: Why is there no blood glucose?

      Because many patients' blood glucose fluctuates greatly and is easily affected by drugs or diet, we consulted experts and chose HbA1c, which is more stable, as the reference index.

      Line 295, The author indicated that data lost; this should be clarified in the results part, and further, the treatment of missing data should be clarified in the method part.

      Thank you for the reminder. We have revised the draft and corrected these errors.

      I hope to see a table of the cohort's baseline characteristics. The discussion needs extensive rewriting; the author seems to be swinging between the stroke outcome and the seizure, sometimes losing the target.

      Figure 1 shows the patient selection procedure. Table 1 contains the cohort's baseline characteristics.

      Regarding the swinging between the stroke outcome and the seizure: few articles predict epilepsy directly from relevant indicators, whereas more articles address prognosis. We could therefore only treat epilepsy as an important factor in prognosis and discuss it comprehensively; otherwise we could not find enough articles to discuss.

      Reviewer #2 (Recommendations For The Authors):

      There are typos and examples of text that are not clear, including:

      "About the nihss score, the higher the nihss score, the more likely to be PSE, nihss score has a third effect just below white blood cell count and D-dimer."

      "and only 8 people made incorrect predictions, demonstratijmng a good predictive ability of the model."

      "female were prone to PSE"

      " Waafi's research"

      "One-heat' (should be one-hot)

      Thank you for the reminder. We have revised the draft and corrected these errors.

      The Data Collection section is poorly written, and the methodology is not clear. It would be much more appropriate to include a table of all features used and an explanation of what these features involve. It would also be useful to see the mean values of these features to assess whether the feature values are reasonable for the dataset.

      Thank you for the reminder. All data are from the first examination or test after admission and were retrieved from a PostgreSQL database. We first extracted the first date on which each patient was admitted for stroke, and then extracted information from the examination nearest to that admission. We extracted the data programmatically with SQL rather than manually, so we obtained as much data as possible instead of only the features that had been reported before. The table of all features used and their mean ± SD is Table 1.

      The paper does not clarify the features' temporal origins. If some features were not recorded on admission to the hospital but were recorded after PSE occurred, there would be temporal leakage. I would need this clarified before believing the authors achieved their claims of building a predictive model.

      All relevant indicator results were from the first examination after admission, and the mean and standard deviation are listed in Table 1 in the statistical analysis section.

      The authors claim that their models can predict PSE. To believe this claim, seeing more information on out-of-distribution generalisation performance would be helpful. There is limited reporting on the external validation cohort relative to the reporting on train and test data.

      Thank you for the advice. External validation is very important, but a perfect one is difficult to achieve. We tried open-source databases such as MIMIC, but these data do not meet our needs because they contain fewer features than our hospital data and lack follow-up of the relevant patients. In the end, we collected newer records from the same hospitals in Chongqing; we will collect more and report a larger external validation in the future.

      For greater certainty on all reported results, it would be most appropriate to perform n-fold cross-validation, and report mean scores and confidence intervals across the cross-validation splits.

      Thank you for your helpful advice. Performing n-fold cross-validation is a crucial step for ensuring the reliability and robustness of the reported results, especially when a dataset is small. Because we have more than 20,000 records, we consider a 7:3 split of the dataset for training and testing to be sufficient. We revised our code and implemented a 5-fold cross-validation version, which brought little improvement. We will use this technique in our next study.
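
      As a rough illustration of what the reviewer requests (mean scores and confidence intervals across cross-validation splits), the sketch below splits sample indices into k folds and summarizes per-fold scores. The per-fold AUC values are hypothetical placeholders, not results from the study, and the interval uses a normal approximation; with only five folds a t-based interval would be somewhat wider.

```python
import random
import statistics


def k_fold_indices(n_samples, k=5, seed=42):
    """Shuffle sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def summarize_scores(scores):
    """Return the mean and an approximate 95% CI (normal approximation)."""
    mean = statistics.mean(scores)
    half_width = 1.96 * statistics.stdev(scores) / len(scores) ** 0.5
    return mean, (mean - half_width, mean + half_width)


folds = k_fold_indices(n_samples=20, k=5)
for test_idx in folds:
    train_idx = [j for fold in folds if fold is not test_idx for j in fold]
    # A real run would fit the model on train_idx and score it on test_idx here.
    assert not set(train_idx) & set(test_idx)

# Hypothetical per-fold AUC values, for illustration only.
mean_auc, ci = summarize_scores([0.98, 0.99, 0.99, 0.97, 0.99])
```

      Reporting the interval alongside the mean shows how much the score varies between splits, which a single 7:3 split cannot reveal.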

      The authors show force plots and decision plots from SHAP values. These plots are non-trivial to interpret, and the authors should include an explanation of how to interpret them.

      It is a great improvement to our draft. We have added an explanation: we use the force plot of the first patient to show the influence of that patient's individual features. It shows that a long APTT contributes most to PSE, followed by the AST level and other features, while the NIHSS score may be low and contribute less to the final result. The decision plot is a collection of model decisions that shows how complex models arrive at their predictions.

      Reviewer #3 (Recommendations For The Authors):

      Abbreviations should not be defined in the abstract (or only in the abstract).

      Please make explicit the purposes of the studies you are referring to in "Currently, most studies utilize clinical data to establish statistical models, survival analysis and cox regression."

      Authors affirm: "there is still a relative scarcity of research on PSE prediction, with most studies focusing on the analysis of specific or certain risk factors." This statement is especially curious since the current study uses risk factors as predictors.

      It is not clear to me what the authors mean by "No study has proposed or established a more comprehensive and scientifically accurate prediction model." The authors do not summarize the statistical parameters of previously reported models, or other relevant data to assess coverage or validity (maybe including a Table summarizing such information would be appropriate). In any case, I would try to omit statements that imply, to some extent, discrediting previous studies without sufficient foundation.

      "antiepileptic drugs" is an outdated name. Please use "antiseizure medications"

      Thank you for the reminder. We have revised the draft and corrected these errors.

      The authors say regarding missing data that they "filled the data of the remaining indicators with missing values of more than 1000 cases by random forest algorithm". Please clarify what you mean by "of more than 1000 cases." Also, provide details on the RF model used to fill in missing data.

      Thank you for the reminder. "Of more than 1000 cases" was an erroneous phrase, and we have corrected it. The procedure is as follows. We first collected the values of all laboratory indicators from the first test after stroke admission (everyone admitted for stroke undergoes routine blood tests, liver and kidney function tests, and so on), excluded indicators with more than 10% missing values, and filled the missing values of the remaining indicators using a random forest algorithm with default parameters. We go through all the features, starting with the one with the fewest missing values (since filling in the feature with the fewest missing values requires the least accurate information). When filling in a feature, the missing values of the other features are replaced with 0. Each time a regression prediction is completed, the predicted values are placed in the original feature matrix, and the next feature is filled in. After all the features have been processed, the imputation is complete.
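
      The column-by-column filling loop described in this response can be sketched as below. To keep the example self-contained, a simple nearest-neighbour regressor stands in for the random forest regressor the authors mention; that substitution, the function names, and the toy matrix are all assumptions made for illustration.

```python
def impute_iteratively(matrix, fit_predict):
    """Fill None entries column by column, fewest-missing column first.

    matrix: list of rows (lists) with None marking missing values; edited in place.
    fit_predict(X_train, y_train, X_missing) returns predictions for X_missing.
    """
    n_cols = len(matrix[0])
    missing = [sum(row[c] is None for row in matrix) for c in range(n_cols)]
    order = sorted((c for c in range(n_cols) if missing[c]), key=lambda c: missing[c])
    for target in order:
        def predictors(row):
            # Every other column, with still-missing values temporarily set to 0.
            return [0.0 if v is None else v for c, v in enumerate(row) if c != target]

        known = [row for row in matrix if row[target] is not None]
        unknown = [row for row in matrix if row[target] is None]
        predictions = fit_predict(
            [predictors(r) for r in known],
            [r[target] for r in known],
            [predictors(r) for r in unknown],
        )
        # Write predictions back so later columns can use the filled values.
        for row, value in zip(unknown, predictions):
            row[target] = value
    return matrix


def nearest_neighbour_regressor(X_train, y_train, X_missing):
    """Stand-in for a random forest regressor: copy the closest known target."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        y_train[min(range(len(X_train)), key=lambda i: dist(X_train[i], x))]
        for x in X_missing
    ]


# Toy laboratory matrix with missing entries, for illustration only.
data = [
    [1.0, 10.0, 100.0],
    [2.0, None, 200.0],
    [3.0, 30.0, None],
    [4.0, 40.0, None],
]
filled = impute_iteratively(data, nearest_neighbour_regressor)
```

      Any regressor with the same fit-then-predict shape could be passed in place of the stand-in, including a random forest regressor from a machine learning library.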

      Please specify what you mean by negative group and positive group. Avoid tacit assumptions.

      Thank you for the reminder. We have revised the draft and corrected these errors.

      Please provide more details (and references) on the smote oversampling method. Indicate any relevant parameters/hyperparameters.

      Thank you for the reminder. We have accepted this advice and replaced SMOTE with the SMOTEENN (Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors) technique to resample the imbalanced dataset for machine learning. The code is

      from imblearn.combine import SMOTEENN
      smoteenn = SMOTEENN(sampling_strategy='auto', random_state=42)

      The SMOTEENN class comes from the imblearn library. The sampling_strategy='auto' parameter tells the algorithm to automatically determine the appropriate sampling strategy based on the class distribution. The random_state=42 parameter sets a seed for the random number generator, ensuring reproducibility of the results.

      The methodology is presented in an extremely succinct and non-organic manner (e.g., "(Model building) Select the 20 features with the largest absolute value of LASSO."). Please try to improve the narrative.

      Lasso regression is a commonly used feature-screening method. Because we extracted information from the database and tried to include as many features as possible, the cross-validation curve of the lasso regression was optimal with 78 features, but that would make the model too complex. We experimented with models built on 10, 15, 20, 25, and 30 features. With 20 features, the model performance was good and the model remained relatively concise. Increasing the number of features contributed little to model performance, whereas decreasing it degraded the model; for example, the AUC of the model with 15 features dropped below 0.95. We therefore selected 20 features.

      Many passages of the text need references. For example, those that refer to Levene test, Welch's t-test, Brier score, Youden index, and many others (e.g., NIHSS score). Please revise carefully.

      Thank you for the reminder. We have revised the draft and corrected these errors.

      "Statistical details of the clinical characteristics of the patients are provided in the table." Which table? Number?

      Thank you for the reminder. We have revised the draft and corrected these errors; the table in question is Table 1.

      Many abbreviations are not properly presented and defined in the text, e.g., wbc count, hba1c, crp, tg, ast, alt, bilirubin, bua, aptt, tt, d_dimer, ck. Whereas I can guess the meaning, do not assume everyone will. Avoid assumptions.

      ROC is sometimes written "ROC" and others, "roc." The same happens for PPV/ppv, and many other words (SMOTE; NIHSS score, etc.).

      Please rephrase "ppv value of random forest is the highest, reaching 0.977, which is more accurate for the identification of positive patients(the most important function of our models).". PPV always refer to positive predictions that are corroborated, so the sentences seem redundant.

      Thank you for the reminder. We have revised the draft and corrected these errors.

      What do you mean by "Complex algorithms". Please try to be as explicit as possible. The text looks rather cryptic or vague in many passages.

      Thank you for the reminder. "Complex algorithms" has been replaced with "machine learning".

      The text needs a thorough English language-focused revision, since the sense of some sentences is really misleading. For instance "only 8 people made incorrect predictions,". I guess the authors try to say that the best algorithm only mispredicted 8 cases since no people are making predictions here. Also, regarding that quote... Are the authors still speaking of the results of the random forest model, which was said to be one of the best performances?

      Thank you for the reminder. We have revised the draft and corrected these errors.

      The authors say that they used, as predictors "comprehensive clinical data, imaging data, laboratory test data, and other data from stroke patients". However, the total pool of predictors is not clear to me at this point. Please make it explicit and avoid abbreviations.

      Thank you for the reminder. We have revised the draft and corrected these errors.

      Although the authors say that their code is available upon request, I think it would be better to have it published in an appropriate repository.

      Thank you for the reminder. We have published our code at https://github.com/conanan/lasso-ml.

    2. eLife Assessment

      This valuable study reports machine learning models derived from large-scale data to predict the risk of post-stroke epilepsy. The evidence supporting the conclusions is convincing, although there are some validation issues (lack of cross-validation, possible bias in external validation results). The study may be of interest in the field of clinical neurology.

    3. Reviewer #1 (Public review):

      Summary:

      This is a large cohort of ischemic stroke patients from a single centre. The author successfully set up predictive models for PTS.

      Strengths:

      The design and implementation of the trial are acceptable, with the credibility of the results. It may provide evidence of seizure prevention in the field of stroke treatment.

      Weaknesses:

      My concerns are well responded to.

    4. Reviewer #2 (Public review):

      Summary

      The authors present multiple machine-learning methodologies to predict post-stroke epilepsy (PSE) from admission clinical data.

      Strengths

      The Statistical Approach section is very well written. The approaches used in this section are very sensible for the data in question.

      Typos have now been addressed and improved interpretability has been added to the paper, which is appreciated.

      Weaknesses

      The authors have clarified that the first features available for each patient have been used. However, they have not shown that these features did not occur before the time of post-stroke epilepsy. Explicit clarification of this should be performed.

      The likely impact of the work on the field

      If this model works as claimed, it will be useful for predicting PSE. This has some direct clinical utility.

      Analysis of features contributing to PSE may provide clinical researchers with ideas for further research on the underlying aetiology of PSE.

    5. Reviewer #3 (Public review):

      Summary:

      The authors report the performance of a series of machine learning models inferred from a large-scale dataset and externally validated with an independent cohort of patients, to predict the risk of post-stroke epilepsy. Some of the reported models have very good explicative performance, and seem to have very good predictive ability.

      Strengths:

      The models have been derived from real-world large-scale data.

      Performances of the best-performing models seem to be very good according to the external validation results.

      Early prediction of risk of post-stroke epilepsy would be of high interest to implement early therapeutic interventions that could improve prognosis.

      Code is publicly available. The authors also stated that the datasets used are available on request.

      Weaknesses:

      The writing of the article may be significantly improved.

      Although the external validation is appreciated, cross-validation to check robustness of the models would also be welcome.

      External validation results may be biased/overoptimistic, since the authors informed that "The external validation cohort focused more on collecting positive cases to examine the model's ability to identify positive samples", which may result in overoptimistic PPV and Sensitivity estimations. The specificity for the external validation set has not been disclosed.

    1. eLife Assessment

      This study aims to analyse the effect of polymorphism on meiotic recombination in subspecies of Saccharomyces. The detection of reciprocal and non-reciprocal events is based on sequencing the haploid products of meiosis, and frequencies are compared between strains having introgressed genomic segments and strains lacking such segments. Unfortunately, the methods used are inadequate for quantifying the non-reciprocal events.

    2. Reviewer #1 (Public review):

      Summary:

      The authors explored how the presence of interspecific introgressions in the genome affects the recombination landscape. This research aims to shed light on the genetic phenomena influencing the evolution of introgressed regions. However, it is important to note that the study is based on examining only one generation, which limits the scope for making broad evolutionary conclusions. In this study, yeast hybrids with large introgressions (ranging from several to several dozen percent of the chromosome length) from another yeast species were crossed. The products of meiosis were then isolated and sequenced to examine the genome-wide distribution of both crossovers (COs) and noncrossovers (NCOs). The authors found a significant reduction in the frequency of COs within the introgressed regions, which is a phenomenon well-documented in various systems. They also report that introgressed regions exhibit an increased frequency of NCOs. Unfortunately, this conclusion seems flawed, as there is no accurate method for correcting the detection level of NCOs when the compared regions (introgressed and non-introgressed) differ drastically in SNP density. The authors further confirmed that introgressions significantly limit the local shuffling of genetic information, and while NCOs contribute slightly to this shuffling, they do not compensate for the loss of CO recombination. This is a widely known fact.

      In summary, the study makes a limited contribution to the understanding of how polymorphism impacts meiotic recombination. The conclusion regarding the increase in NCO frequency in polymorphic regions is likely incorrect.

    3. Reviewer #3 (Public review):

      When members of two related but diverged species mate, the resulting hybrids can produce offspring where parts of one species' genome replace those of the other. These "introgressions" often create regions with a much greater density of sequence differences than are normally found between members of the same species. Previous studies have shown that increased sequence differences, when heterozygous, can reduce recombination during meiosis specifically in the region of increased difference. However, most of these studies have focused on crossover recombination, and have not measured noncrossovers. The current study uses a pair of Saccharomyces uvarum crosses: one between two natural isolates that, while exhibiting some divergence, do not contain introgressions; the other is between two fermentation strains that, when combined, are heterozygous for 9 large regions of introgression that have much greater divergence than the rest of the genome. The authors wished to determine if introgressions differently affected crossovers and noncrossovers, and, if so, what impact that would have on the gene shuffling that occurs during meiosis.

      While both crossovers and noncrossovers were measured, assessing the true impact of increased heterology (inherent in heterozygous introgressions) is complicated by the fact that the increased marker density in heterozygous introgressions also increases the ability to detect noncrossovers. The authors now use a revised correction aimed at compensating for this difference, and based on that correction, conclude that, while as expected crossovers are decreased by increased sequence heterology, noncrossovers neither increase nor decrease substantially. They then show that genetic shuffling overall is substantially reduced in regions of heterozygous introgression, which is not surprising given that one type of event is reduced and the other remains at similar levels. However, the correction currently used remains poorly justified, and tests of its validity are not presented. Thus, the only possibly novel conclusion, that noncrossovers are less affected by heterology than crossovers, remains to be adequately tested.

      In conclusion, of the three main conclusions as stated in the abstract, one (that crossovers go down) has been shown in many systems, one (that noncrossovers increase) is wrong, and the third (that allele shuffling is reduced) is obvious. Given this, the impact of this work on the field will be minimal at best, and negative to the extent that readers are led astray.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors investigated how the presence of interspecific introgressions in the genome affects the recombination landscape. This research was intended to inform about genetic phenomena influencing the evolution of introgressed regions, although it should be noted that the research itself is based on examining only one generation, which limits the possibility of drawing far-reaching evolutionary conclusions. In this work, yeast hybrids with large (from several to several dozen percent of the chromosome length) introgressions from another yeast species were crossed. Then, the products of meiosis were isolated and sequenced, and on this basis, the genome-wide distribution of both crossovers (COs) and noncrossovers (NCOs) was examined. Carrying out the analysis at different levels of resolution, it was found that in the regions of introgression, there is a very significant reduction in the frequency of COs and a simultaneous increase in the frequency of NCOs. Moreover, it was confirmed that introgressions significantly limit the local shuffling of genetic information, and NCOs are only able to slightly contribute to the shuffling, thus they do not compensate for the loss of CO recombination.

      Strengths:

      - Previously, experiments examining the impact of SNP polymorphism on meiotic recombination were conducted either on the scale of single hotspots or the entire hybrid genome, but the impact of large introgressed regions from another species was not examined. Therefore, the strength of this work is its interesting research setup, which allows for providing data from a different perspective.

      - Good quality genome-wide data on the distribution of CO and NCO were obtained, which could be related to local changes in the level of polymorphism.

      Weaknesses:

      (1)  The research is based on examining only one generation, which limits the possibility of drawing far-reaching evolutionary conclusions. Moreover, meiosis is stimulated in hybrids in which introgressions occur in a heterozygous state, which is a very unlikely situation in nature. Therefore, I see the main value of the work in providing information on the CO/NCO decision in regions with high sequence diversification, but not in the context of evolution.

      While we are indeed only examining recombination in a single generation, we respectfully disagree that our results aren't relevant to evolutionary processes. The broad goals of our study are to compare recombination landscapes between closely related strains, and we highlight dramatic differences between recombination landscapes. These results add to a body of literature that seeks to understand the existence of variation in traits like recombination rate, and how recombination rate can evolve between populations and species. We show here that the presence of introgression can contribute to changes in recombination rate measured in different individuals or populations, which has not been previously appreciated. We furthermore show that introgression can reduce shuffling between alleles on a chromosome, which is recognized as one of the most important determinants for the existence and persistence of sexual reproduction across all organisms. As we describe in our introduction and conclusion, we see our experimental exploration of the impacts of introgression on the recombination landscape as complementary to studies inferring recombination and introgression from population sequencing data and simulations. There are benefits and challenges to each approach, but both can help us better understand these processes. In regards to the utility of exploring heterozygous introgression, we point out that introgression is often found in a heterozygous state (including in modern humans with Neanderthal and/or Denisovan ancestry). Introgression will always be heterozygous immediately after hybridization, and depending on the frequency of gene flow into the population, the level of inbreeding, selection against introgression, etc., introgression will typically be found as heterozygous.

      - The work requires greater care in preparing informative figures and, more importantly, re-analysis of some of the data (see comments below).

      More specific comments:

      (1) The authors themselves admit that the detection of NCO, due to the short size of conversion tracts, depends on the density of SNPs in a given region. Consequently, more NCOs will be detected in introgressed regions with a high density of polymorphisms compared to the rest of the genome. To investigate what impact this has on the analysis, the authors should demonstrate that the efficiency of detecting NCOs in introgressed regions is not significantly higher than the efficiency of detecting NCOs in the rest of the genome. If it turns out that this impact is significant, analyses should be presented proving that it does not entirely explain the increase in the frequency of NCOs in introgressed regions.

      We conducted a deeper exploration of the effect of marker resolution on NCO detection by randomly removing different proportions of markers from introgressed regions of the fermentation cross in order to simulate different marker resolutions from non-introgressed regions. We chose proportions of markers that would simulate different quantiles of the resolution of non-introgressed regions and repeated our standard pipeline in order to compare our NCO detection at the chosen marker densities. More details of this analysis have been added to the manuscript (lines 188-199, 525-538). We confirmed the effect of marker resolution on NCO detection (as reported in the updated manuscript and new supplementary figures S2-S10, new Table S10) and decided to repeat our analyses on the original data with a more stringent correction. For this we chose our observed average tract size for NCOs in introgressed regions (550bp), which leads to a far more conservative estimate of NCO counts (As seen in the updated Figure 2 and Table 2). This better accounts for the increased resolution in introgressed regions, and while it's possible to be more stringent with our corrections, we believe that further stringency would be unreasonable. We also see promising signs that the correction is sufficient when counting our CO and NCO events in both crosses, as described in our response to comment 39 (response to reviewer #3).
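
      The marker-thinning idea in this response can be illustrated with a small simulation: randomly drop a proportion of markers, then count how many conversion tracts still overlap at least one remaining marker. This is only a sketch under stated assumptions; the marker spacing, tract coordinates, detection rule (`min_markers`), and function names are hypothetical and do not reproduce the authors' pipeline.

```python
import random


def thin_markers(markers, keep_fraction, seed=42):
    """Randomly keep a fraction of marker positions to mimic a lower resolution."""
    rng = random.Random(seed)
    n_keep = round(len(markers) * keep_fraction)
    return sorted(rng.sample(markers, n_keep))


def count_detectable(tracts, markers, min_markers=1):
    """Count tracts (start, end) that span at least min_markers marker positions."""
    return sum(
        1
        for start, end in tracts
        if sum(start <= m <= end for m in markers) >= min_markers
    )


# Hypothetical data: one marker every 50 bp and three NCO tracts.
markers = list(range(0, 10000, 50))
tracts = [(1000, 1550), (4020, 4040), (7000, 7550)]

detected_full = count_detectable(tracts, markers)
detected_thinned = count_detectable(tracts, thin_markers(markers, 0.25))
```

      Repeating the thinning with different seeds and keep fractions would show how strongly the number of detected events depends on marker density alone.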

      (2) CO and NCO analyses performed separately for individual regions rarely show statistical significance (Figures 3 and 4). I think that the authors, after dividing the introgressed regions into non-overlapping windows of 100 bp (I suggest also trying 200 bp, 500 bp, and 1kb windows), should combine the data for all regions and perform correlations to SNP density in each window for the whole set of data. Such an analysis has a greater chance of demonstrating statistically significant relationships. This could replace the analysis presented in Figure 3 (which can be moved to Supplement). Moreover, the analysis should also take into account indels.

      We're uncertain of what is being requested here. If the comment refers to the effect of marker density on NCO detection, we hope the response to comment 2 will help resolve this comment as well. Otherwise, we ask for some clarification so that we may correct or revise as appropriate.

      (3) In Arabidopsis, it has been shown that crossover is stimulated in heterozygous regions that are adjacent to homozygous regions on the same chromosome (http://dx.doi.org/10.7554/eLife.03708.001, https://doi.org/10.1038/s41467-022-35722-3).

      This effect applies only to class I crossovers, and is reversed for class II crossovers (https://doi.org/10.15252/embj.2020104858, https://doi.org/10.1038/s41467-023-42511-z). This research system is very similar to the system used by the authors, although it likely differs in the level of DNA sequence divergence. The authors could discuss their work in this context.

      We thank the reviewer for sharing these references. We have added a discussion of our work in the context of these findings in the Discussion, lines 367-376.

      Reviewer #2 (Public Review):

      Summary:

      Schwartzkopf et al characterized the meiotic recombination impact of highly heterozygous introgressed regions within the budding yeast Saccharomyces uvarum, a close relative of the canonical model Saccharomyces cerevisiae. To do so, they took advantage of the naturally occurring Saccharomyces bayanus introgressions specifically within fermentation isolates of S. uvarum and compared their behavior to the syntenic regions of a cross between natural isolates that do not contain such introgressions. Analysis of crossover (CO) and noncrossover (NCO) recombination events shows both a depletion in CO frequency within highly heterozygous introgressed regions and an increase in NCO frequency. These results strongly support the hypothesis that DNA sequence polymorphism inhibits CO formation, and has no or much weaker effects on NCO formation. Eventually, the authors show that the presence of introgressions negatively impacts "r", the parameter that reflects the probability that a randomly chosen pair of loci shuffles their alleles in a gamete.

      The authors chose a sound experimental setup that allowed them to directly compare recombination properties of orthologous syntenic regions in an otherwise intra-specific genetic background. The way the analyses have been performed looks right, although this reviewer is unable to judge the relevance of the statistical tests used. Eventually, most of their results which are elegant and of interest to the community are present in Figure 2.

      Strengths:

      Analysis of crossover (CO) and noncrossover (NCO) recombination events is compelling in showing both a depletion in CO frequency within highly heterozygous introgressed regions and an increase in NCO frequency.

      Weaknesses:

      The main weaknesses refer to a few text issues and a lack of discussion about the mechanistic implications of the present findings.

      - Introduction

      (1) The introduction is rather long. I suggest specifically referring to "meiotic" recombination (line 71) and to "meiotic" DSBs (line 73) since recombination can occur outside of meiosis (ie somatic cells).

      We agree and have condensed the introduction to be more focused. We also made the suggested edits to include “meiotic” when referring to recombination and DSBs.

      (2) From lines 79 to 87: the description of recombination is unnecessarily complex and confusing. I suggest the authors simply remind that DSB repair through homologous recombination is inherently associated with a gene conversion tract (primarily as a result of the repair of heteroduplex DNA by the mismatch repair (MMR) machinery) that can be associated or not to a crossover. The former recombination product is a crossover (CO), the latter product is a noncrossover (NCO) or gene conversion. Limited markers may prevent the detection of gene conversions, which erase NCO but do not affect CO detection.

      We changed the language in this section to reflect the reviewer’s suggestions.

      (3) In addition, "resolution" in the recombination field refers to the processing of a double Holliday junction containing intermediates by structure-specific nucleases. To avoid any confusion, I suggest avoiding using "resolution" and simply sticking with "DSB repair" all along the text.

      We made the suggested correction throughout the paper.

      (4) Note that there are several studies about S. cerevisiae meiotic recombination landscapes using different hybrids that show different CO counts. In the introduction, the authors refer to Mancera et al 2008, a reference paper in the field. In this paper, the hybrid used showed ca. 90 CO per meiosis, while their reference to Liu et al 2018 in Figure 2 shows less than 80 COs per meiosis for S. cerevisiae. This shows that it is not easy to come up with a definitive CO count per meiosis in a given species. This needs to be taken into account for the result section line 315-321.

      This is an excellent point. We added this context in the results (lines 180-187).

      (5) In line 104, the authors refer to S. paradoxus and mention that its recombination rate is significantly different from that of S. cerevisiae. This is inaccurate since this paper claims that the CO landscape is even more conserved than the DSB landscape between these two species, and they even identify a strong role played by the subtelomeric regions. So, the discussion about this paper cannot stand as it is.

      We agree with the reviewer's point. We also found that the entire paragraph was unnecessary, so it and the sentence in question have been removed.

      (6) Line 150, when the authors refer to the anti-recombinogenic activity of the MMR, I suggest referring to the published work from Martini et al 2011 rather than the not-yet-published work from Copper et al 2021, or both, if needed.

      Added the suggested citation.

      Results

      (7) The clear depletion in CO and the concomitant increase in NCO within the introgressed regions strongly suggest that DNA sequence polymorphism triggers CO inhibition but does not affect NCO or to a much lower extent. Because most CO likely arises from the ZMM pathway (CO interference pathway mainly relying on Zip1, 2, 3, 4, Spo16, Msh4, 5, and Mer3) in S. uvarum as in S. cerevisiae, and because the effect of sequence polymorphism is likely mediated by the MMR machinery, this would imply that MMR specifically inhibits the ZMM pathway at some point in S. uvarum. The weak effect or potential absence of the effect of sequence polymorphism on NCO formation suggests that heteroduplex DNA tracts, at least the way they form during NCO formation, escape the anti-recombinogenic effect of MMR in S. uvarum. A few comments about this could be added.

      We have added discussion and citations regarding the biased repair of DSB to NCO in introgression, lines 380-386.

      (8) The same applies to the fact that the CO number is lower in the natural cross compared to the fermentation cross, while the NCO number is the same. This suggests that under similar initiating Spo11-DSB numbers in both crosses, the decrease in CO is likely compensated by a similar increase in inter-sister recombination.

      Thank you to the reviewer for this observation. We agree that this could explain some differences between the crosses.

      (9) Introgressions represent only 10% of the genome, while the decrease in CO is at least 20%. This is a bit surprising especially in light of CO regulation mechanisms such as CO homeostasis that tends to keep CO constant. Could the authors comment on that?

      We interpret these results to reflect two underlying mechanisms. First, the presence of heterozygous introgression does reduce the number of COs. Second, we believe the difference in COs reflects variation in recombination rate between strains. We note that CO homeostasis need not apply across different genetic backgrounds. Indeed, recombination rate is appreciated to significantly differ between strains of S. cerevisiae (Raffoux et al. 2018), and recombination rate variation has been observed between strains/lines/populations in many different species including Drosophila, mice, humans, Arabidopsis, maize, etc. We reference S. cerevisiae strain variability in the Introduction lines 128-130, and have added context in the Results lines 180-187, and Discussion lines 343-350.

      (10) Finally, the frequency of NCOs in introgressed regions is about twice the frequency of CO in non-introgressed regions. Both CO and NCO result from Spo11-initiating DSBs.

      This suggests that more Spo11-DSBs are formed within introgressed regions and that such DSBs specifically give rise to NCO. Could this be related to the lack of homolog engagement which in turn shuts down Spo11-DSB formation as observed in ZMM mutants by the Keeney lab? Could this simply result from better detection of NCO in introgressed regions related to the increased marker density, although the authors claim that NCO counts are corrected for marker resolution?

      The effect noted by the reviewer remains despite the more conservative correction for marker density applied to NCO counts (as described in the response to Reviewer 1, comment #2). Given that CO+NCO counts in introgressed regions are not statistically different between crosses, it is likely that these regions are simply predisposed to a higher rate of DSBs than the rest of the genome. This is an interesting observation, however, and one that we would like to further explore in future work.

      (11) What could be the explanation for chromosome 12 to have more shuffling in the natural cross compared to the fermentation cross which is deprived of the introgressed region?

      We added this text to the Results, lines 323-327, "While it is unclear what potential mechanism is mediating the difference in shuffling on chromosome 12, we note that the rDNA locus on chromosome 12 is known to differ dramatically in repeat content across strains of S. cerevisiae (22–227 copies) (Sharma et al. 2022), and we speculate that differences in rDNA copy number between strains in our crosses could impact shuffling."

      Technical points:

      (12) In line 248, the authors removed NCO with fewer than three associated markers.

      What is the rationale for this? Is the genotyping strategy not reliable enough to consider events with only one or two markers? NCO events can be rather small and even escape detection due to low local marker density.

      We trust the genotyping strategy we used, but chose to be conservative in our detection of NCOs to account for potential sequencing biases.

      (13) Line 270: The way homology is calculated looks odd to this reviewer, especially the meaning of 0.5 homology. A site is either identical (1 homology) or not (0 homology).

      We've changed the language to better reflect what we are calculating (diploid sequence similarity; see comment #28). Essentially, the metric is a probability that two randomly selected chromatids--one from each parent--will share the same nucleotide at a given locus (akin to calculating the probability of homozygous offspring at a single locus). We average it along a segment of the genome to establish an expected sequence similarity if/when recombination occurs in that segment.
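
      To make the metric concrete, here is a minimal sketch of the calculation as it is described in words above; it is an illustration, not the authors' code. Representing each parent at a site as a string of its alleles (for example "AG" for a heterozygous site) and the function names are assumptions made for the example.

```python
from collections import Counter


def site_match_probability(parent_a_alleles, parent_b_alleles):
    """Probability that one allele drawn at random from each parent matches."""
    counts_a = Counter(parent_a_alleles)
    counts_b = Counter(parent_b_alleles)
    total_pairs = len(parent_a_alleles) * len(parent_b_alleles)
    # Count the allele pairs (one from each parent) that carry the same base.
    matching_pairs = sum(counts_a[base] * counts_b[base] for base in counts_a)
    return matching_pairs / total_pairs


def mean_similarity(sites):
    """Average the per-site match probability along a segment."""
    return sum(site_match_probability(a, b) for a, b in sites) / len(sites)


# Three toy sites: identical, half-matching, and fully different.
segment = [("AA", "AA"), ("AG", "AA"), ("AA", "GG")]
print([site_match_probability(a, b) for a, b in segment])  # [1.0, 0.5, 0.0]
print(mean_similarity(segment))  # 0.5
```

      Under this reading, a per-site value of 0.5 arises when one parent carries two different alleles at a site and only one of them matches the other parent, which is how a value between 0 and 1 can occur.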

      (14) Line 365: beware that the estimates are for mitotic mismatch repair (MMR). Meiotic MMR may work differently.

      We removed the citation that refers exclusively to mitotic recombination. The statement regarding meiotic recombination is otherwise still reflective of results from Chen & Jinks-Robertson.

      (15) Figure 1: there is no mention of potential 4:0 segregations. Did the authors find no such pattern? If not, how did they consider them?

      The program we used to call COs and NCOs (ReCombine's CrossOver program) can detect such patterns, but none were detected in our data.

      Reviewer #3 (Public Review):

      When members of two related but diverged species mate, the resulting hybrids can produce offspring where parts of one species' genome replace those of the other. These "introgressions" often create regions with a much greater density of sequence differences than are normally found between members of the same species. Previous studies have shown that increased sequence differences, when heterozygous, can reduce recombination during meiosis specifically in the region of increased difference. However, most of these studies have focused on crossover recombination, and have not measured noncrossovers. The current study uses a pair of Saccharomyces uvarum crosses: one between two natural isolates that, while exhibiting some divergence, do not contain introgressions; the other is between two fermentation strains that, when combined, are heterozygous for 9 large regions of introgression that have much greater divergence than the rest of the genome. The authors wished to determine if introgressions differently affected crossovers and noncrossovers, and, if so, what impact that would have on the gene shuffling that occurs during meiosis.

      (1) While both crossovers and noncrossovers were measured, assessing the true impact of increased heterology (inherent in heterozygous introgressions) is complicated by the fact that the increased marker density in heterozygous introgressions also increases the ability to detect noncrossovers. The authors used a relatively simple correction aimed at compensating for this difference, and based on that correction, conclude that, while as expected crossovers are decreased by increased sequence heterology, counter to expectations noncrossovers are substantially increased. They then show that, despite this, genetic shuffling overall is substantially reduced in regions of heterozygous introgression. However, it is likely that the correction used to compensate for the effect of increased sequence density is defective, and has not fully compensated for the ascertainment bias due to greater marker density. The simplest indication of this potential artifact is that, when crossover frequencies and "corrected" noncrossover frequencies are taken together, regions of introgression often appear to have greater levels of total recombination than flanking regions with much lower levels of heterology. This concern seriously undercuts virtually all of the novel conclusions of the study. Until this methodological concern is addressed, the work will not be a useful contribution to the field.

      We appreciate this concern. Please see response to comments #2 and #38. We further note that our results depicted in Figures 3 and 4 are not reliant on any correction or comparison with non-introgressed regions, and thus we consider our results regarding sequence similarity and its effect on the repair of DSBs, and the amount of genetic shuffling with/without introgression, to be novel and important observations for the field.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Line 149 - this sentence refers to a mixture of papers reporting somatic or meiotic recombination and as these processes are based on different crossover pathways, this should not be mixed. For example, it is known that in Arabidopsis MSH2 has a pro-crossover function during meiotic recombination.

      Corrected.

      (2) What is unclear to me is how the crosses are planned. Line 308 shows that there were only two crosses (one "natural" and one "fermentation"), but I understand that this is a shorthand and in fact several (four?) different strains were used for the "fermentation cross". At least that's what I concluded from Fig. 1B and its figure caption. This needs to be further explained. Were different strains used for each fermentation cross, or was one strain repeated in several crosses? In Figure 1, it would be worth showing, next to the panel showing "fermentation cross", a diagram of how "natural cross" was performed, because as I understand it, panel A illustrates the procedure common to both types of crosses, and not for "natural cross".

      We thank the reviewer for drawing our attention to confusion about how our crosses were created. We performed two crosses, as depicted in Figure 1A. The fermentation cross is a single cross from two strains isolated from fermentation environments. The natural cross is a single cross from two strains isolated from a tree and insect. Table S1 and the methods section "Strain and library construction" describe the strains used in more detail. We modified Figure 1 and the figure legend to help clarify this. See also response to comment #37.

      (3) The authors should provide a more detailed characterization of the genetic differences between chromosomes in their hybrids. What is the level of polymorphism along the S. uvarum chromosomes used in the experiments? Is this polymorphism evenly distributed? What are the differences in the level of polymorphism for individual introgressions? Theoretically, this data should be visible in Figure 2D, but this figure is practically illegible in the present form (see next comment).

      As suggested, we remade Figure 2D to only include chromosomes with an introgression present, and moved the remaining chromosomes to the supplements (Figure S11). The patterns of markers (which are fixed differences between the strains in the focal cross) should be more clear now. As we detail in the Methods line 507-508, we utilized a total of 24,574 markers for the natural cross and 74,619 markers for the fermentation cross (the higher number in the fermentation cross being due to more fixed differences in regions of introgression).

      (4) Figure 2D should be prepared more clearly, I would suggest stretching the chromosomes, otherwise, it is difficult to see what is happening in the introgression regions for CO and NCO (data for SNPs are more readable). Maybe leave only the chromosomes with introgressions and transfer the rest to the supplement?

      See previous comment.

      (5) How are the Y scales defined for Figure 2D?

      Figure 2D now includes units for the y-axis.

      (6) Are increases in CO levels in fermentation cross-observed at the border with introgressions? This would indicate local compensation for recombination loss in the introgressed regions, similar to that often observed for chromosomal inversions.

      We see no evidence of an increase in CO levels at the borders of introgressions, either through visual inspection or by comparing the average CO rate in all fermentation windows to that of windows at the edges of introgressions. This is included in the Discussion lines 360-366, "While we are limited in our interpretations by only comparing two crosses (one cross with heterozygous introgression and one without introgression), these results are in line with findings in inversions, where heterozygotes show sharp decreases in COs, but the presence of NCOs in the inverted region (Crown et al., 2018; Korunes & Noor, 2019). However, unlike heterozygous inversions where an increase in COs is observed on freely recombining chromosomes (the inter-chromosomal effect), we do not see an increase in COs on the borders flanking introgression or on chromosomes without introgression."

      (7) Line 336 - "We find positive correlations between CO counts..." - you should indicate here that between fermentation and natural crosses, it was quite hard for me to understand what you calculated.

      We corrected the language as suggested.

      (8) The term "homology" usually means "having a common evolutionary origin" and does not specify the level of similarity between sequences, thus it cannot be measured. It is used incorrectly throughout the manuscript (also in the intro). I would use the term "similarity" to indicate the degree of similarity between two sequences.

      We corrected the language as suggested throughout the document.

      (9) Paragraph 360 and Figure 3 - was the "sliding window" overlapping or non-overlapping?

      We added clarifying language to the text in both places. We use a 101bp sliding window with 50bp overlaps.
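
      For readers who want to see what such a window scheme looks like, the sketch below enumerates 101 bp windows whose neighbours share 50 bp, so each window starts 51 bp after the previous one. This reading of "50bp overlaps", the 0-based half-open coordinates, and the function name are assumptions made for the example rather than details confirmed by the manuscript.

```python
def sliding_windows(seq_length, window=101, overlap=50):
    """Return (start, end) pairs, 0-based and half-open, for overlapping windows."""
    step = window - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the window size")
    # Only windows that fit entirely inside the sequence are returned.
    return [(start, start + window)
            for start in range(0, seq_length - window + 1, step)]


print(sliding_windows(203))  # [(0, 101), (51, 152), (102, 203)]
```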

      (10) Line 369 - what is "...the proportion of bases that are expected to match between the two parent strains..."?

      We clarified the language in this location, and hopefully changes associated with the comment about sequence similarity will make the comment even clearer in context.

      (11) Line 378 - should it refer to Figure S1 and not Figure 4?

      Corrected.

      (12) Line 399 - should refer to Figure 4, not Figure 5.

      Corrected.

      (13) Line 444-449 - the analysis of loss of shuffling in the context of the location of introgression on the chromosome should be presented in the result section.

      We shifted the core of the analysis to the results, while leaving a brief summary in the discussion.

      (14) The authors should also take into account the presence of indels in their analyses, and they should be marked in the figures, if possible.

      We filtered out indels in our variant calling. However, we did analyze our crosses for the presence of large insertions and deletions (Table S2), which can obscure true recombination rates, and found that they were not an issue in our dataset.

      Reviewer #2 (Recommendations For The Authors):

      This reviewer suggests that the authors address the different points raised in the public review.

      (1) This reviewer would like to challenge the relevance of the r-parameter in light of chromosome 12 which has no introgression and still a strong depletion in r in the fermentation cross.

      We added this text to the Results, lines 377-381, "While it is unclear what potential mechanism is mediating the difference in shuffling on chromosome 12, we note that the rDNA locus on chromosome 12 is known to differ dramatically in repeat content across strains of S. cerevisiae (22–227 copies) (Sharma et al. 2022), and we speculate that differences in rDNA copy number between strains in our crosses could impact shuffling."

      (2) This reviewer insists on making sure that NCO detection is unaffected by the marker density, notably in the highly polymorphic regions, to unambiguously support Figure 1C.

      We've changed our correction for resolution to be more aggressive (see response to comment #2), and believe we have now adequately adjusted for marker density (see response to comment #38).

      Reviewer #3 (Recommendations For The Authors):

      I regret using such harsh language in the public review, but in my opinion, there has been a serious error in how marker densities are corrected for, and, since the manuscript is now public, it seems important to make it clear in public that I think that the conclusions of the paper are likely to be incorrect. I regret the distress that the public airing of this may cause. Below are my major concerns:

      (1) The paper is written in a way that makes it difficult to figure out just what the sequence differences are within the crosses. Part of this is, to be frank, the unusual way that the crosses were done, between more than one segregant each from two diploids in both natural and fermentation cases. I gather, from the homology calculations description, that each of these four diploids, while largely homozygous, contained a substantial number of heterozygosities, so individual diploids had different patterns of heterology. Is this correct? And if so, why was this strategy chosen? Why not start with a single diploid where all of the heterologies are known? Why choose to insert this additional complication into the mix? It seems to me that this strategy might have the perverse effect of having the heterology due to the polymorphisms present in one diploid affect (by correction) the impact of a noncrossover that occurs in a diploid that lacks the additional heterology. If polymorphic markers are a small fraction of total markers, then this isn't such a great concern, but I could not find the information anywhere in the manuscript. As a courtesy to the reader, please consider providing at the beginning some basic details about the starting strains-what is the average level of heterology between natural A and natural B, and what fraction of markers are polymorphic; what is the average level of heterology between fermentation A and fermentation B in non-introgressed regions, in introgressed regions, and what fraction of markers are polymorphic? How do these levels of heterology compare to what has been examined before in whole-genome hybrid strains? It also might be worth looking at some of the old literature describing S. cerevisiae/S. carlsbergensis hybrids.

      We thank the reviewer for drawing our attention to confusion about the cross construction. These crosses were conducted as is typical for yeast genetic crosses: we crossed 2 genetically distinct haploid parents to create a heterozygous diploid, then collected the haploid products of meiosis from the same F1 diploid. Because the crosses were made with haploid parents, it is not possible for other genetic differences to be segregating in the crosses. We have revised Figure 1 and its caption to clarify this. Further details regarding the crosses are in the Methods section "Strain and library construction" and in Supplemental Table S1. We only utilized genetic markers that are fixed differences between our parental strains to call CO and NCO. As we detail in the Methods line 507-508, we utilized a total of 24,574 markers for the natural cross and 74,619 markers for the fermentation cross (the higher number in the fermentation cross being due to more fixed differences in regions of introgression). We additionally revised Figure 2D (and Figure S11) to help readers better visualize differences between the crosses.

      (2) There are serious concerns about the methods used to identify noncrossovers and to normalize their levels, which are probably resulting in an artifactually high level of calculated crossovers in Figure 2. As a primary indication of this, it appears in Figure 2 that the total frequency of events (crossovers + noncrossovers) in heterozygous introgressed regions are substantially greater than those in the same region in non-introgressed strains, while just shifting of crossovers to noncrossovers would result in no net increase. The simplest explanation for this is that noncrossovers are being undercounted in non-introgressed relative to introgressed heterozygous regions. There are two possible reasons for this: i. The exclusion of all noncrossover events spanning less than three markers means that many more noncrossovers in introgressed heterozygous regions than in non-introgressed. Assuming that average non-homology is 5% in the former and 1% in the latter, the average 3-marker event will be 60 nt in introgressed regions and 300 nt in non-introgressed regions - so many more noncrossovers will be counted in introgressed regions. A way to check on this - look at the number of crossover-associated markers that undergo gene conversion; use the fraction that involves < 3 markers to adjust noncrossover levels (this is the strategy used by Mancera et al.). ii. The distance used for noncrossover level adjustment (2kb) is considerably greater than the measured average noncrossover lengths in other studies. The effect of using a too-long distance is to differentially under-correct for noncrossovers in non-introgressed regions, while virtually all noncrossovers in heterozygous introgressed regions will be detected. This can be illustrated by simulations that reduce the density of scored markers in heterozygous introgressed regions to the density seen in non-introgressed regions. 
      Because these concerns go to the heart of the conclusions of the paper, they must be addressed quantitatively - if not, the main conclusions of the paper are invalid.
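
      The arithmetic behind point (i) can be reproduced in a few lines. The sketch follows the reviewer's own approximation, in which a tract must be about (number of markers) / (divergence) nucleotides long to cover the required markers; the function name is hypothetical, and the 5% and 1% divergence values are the reviewer's assumed figures, not measurements.

```python
def min_event_span_nt(divergence, min_markers=3):
    """Approximate tract length (nt) needed to cover min_markers markers.

    Assumes markers are evenly spaced at one per 1/divergence nucleotides.
    """
    if not 0.0 < divergence <= 1.0:
        raise ValueError("divergence must be in (0, 1]")
    return round(min_markers / divergence)


print(min_event_span_nt(0.05))  # 60  (introgressed regions, assumed 5%)
print(min_event_span_nt(0.01))  # 300 (non-introgressed regions, assumed 1%)
```

      Under these assumptions, a three-marker minimum excludes tracts shorter than roughly 300 nt in non-introgressed regions but only those shorter than roughly 60 nt in introgressed regions, which is the asymmetry the reviewer is pointing to.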

      We adjusted the correction factor (See also response to comment #2) and compared the average number of CO and NCO events in introgressed and non-introgressed regions between crosses (two comparisons: introgression CO+NCO in natural cross vs introgression CO+NCO in fermentation cross; non-introgression CO+NCO in natural cross vs non-introgression CO+NCO in fermentation cross). We found no significant differences between the crosses in either of the comparisons. This indicates that the distribution of total events is replicated in both crosses once we correct for resolution.

      (3) It is important to distinguish the landscape of double-strand breaks from the landscape of recombination frequencies. Double-strand breaks, as measured by uncalibrated levels of Spo11-linked oligos, is a relative number - not an absolute frequency. So it is possible that two species could have a similar break landscape in terms of topography but have absolute levels higher in one species than in the other.

      We agree with this statement, however, we have removed the relevant text to streamline our introduction.

      (4) Lines 123-125. Just meiosis will produce mosaic genomes in the progeny of the F1; further backcrossing will reduce mosaicism to the level of isolated regions of introgression.

      Adjusted the language to be more specific.

      (5) Please provide actual units for the Y axes in Figure 2D.

      We have corrected the units on the axes.

      (6) Tables (general). Are the significance measures corrected for multiple comparisons?

      In Table 3, the cutoff was chosen to be more conservative than a Bonferroni-corrected alpha of 0.01 with 9 comparisons (0.0011). In the text, any result referred to as significant has an associated hypothesis test with a p-value below the Bonferroni-corrected threshold for an alpha of 0.05. This has been clarified in the caption for Table 3 and in the text where relevant.
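
      For reference, the Bonferroni threshold quoted above is simply the family-wise alpha divided by the number of comparisons. The short sketch below reproduces the 0.0011 figure; the function names and the example p-values are illustrative and not taken from the manuscript.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


def is_significant(p_value, family_alpha, n_tests):
    """True if p_value falls below the Bonferroni-corrected threshold."""
    return p_value < bonferroni_alpha(family_alpha, n_tests)


print(round(bonferroni_alpha(0.01, 9), 4))  # 0.0011
print(is_significant(0.0005, 0.01, 9))      # True
print(is_significant(0.004, 0.01, 9))       # False
```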

    1. eLife Assessment

      This study presents a valuable finding in understanding the role of plectin in the development and progression of hepatocellular carcinoma (HCC). The evidence supporting the conclusions is convincing because multiple orthogonal ways were used to demonstrate the requirement of this target in liver cancer models. However, the study is incomplete as the downstream molecular activities of plectin that mediate the cancer phenotypes were not fully evaluated.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigated the role of plectin, a cytoskeletal crosslinker protein, in liver cancer formation and progression. Using the liver-specific Plectin knockout mouse model, the authors convincingly showed that PLECTIN is critical for hepatocarcinogenesis, as functional inhibition of plectin suppressed tumor formation in several models. They also provided evidence to show that inhibition of plectin inhibited HCC cell invasion and reduced metastatic outgrowth in the lung. Mechanistically, they suggested that plectin inhibition attenuated FAK, MAPK/ERK, and PI3K/AKT signaling.

      Strengths:

      The authors generated a liver-specific plectin knockout mouse model. By using DEN and sgP53/MYC models, the authors convincingly demonstrated an oncogenic role of PLECTIN in HCC development. Plecstatin-1 (PST), as a plectin inhibitor, showed promising efficacy in inhibiting HCC growth, which provides a basis for potentially treating HCC using PST.

      The MRI images for tracking tumor growth in animal models were compelling. The high-quality confocal images and related quantifications convincingly showed the impact of plectin functional inhibition on contractility and adhesions in HCC cells.

      Weaknesses:

      The conclusions of this paper are primarily well supported by data. However, some claims were not fully supported by the data presented.

      The authors suggest that plectin controls oncogenic FAK, MAPK/Erk, and PI3K/Akt signaling in HCC cells, representing the mechanisms by which plectin promotes HCC formation and progression. However, the effect of plectin inactivation on these signaling pathways was inconsistent between Huh7 and SNU-475 cells (Figure 3D), despite similar cell growth inhibition in both cell lines (Figure 2G). For example, pAKT and pERK were reduced by plectin inhibition only in SNU-475 cells, not in Huh7 cells. In addition, pFAK was not changed by plectin inhibition in either cell line, and the ratio of pFAK/FAK was increased in both. Thus, it is hard to be convinced that plectin promotes HCC formation and progression by regulating these signaling pathways. Overall, the mechanistic studies in this manuscript lack sufficient depth.

      The authors claimed that plectin inactivation inhibits HCC invasion and metastasis using in vitro and in vivo models. However, the results from in vivo models were not as compelling as the in vitro data. The lung colonization assay is not an ideal in vivo model for studying HCC metastasis and invasion, especially when plectin inhibition suppresses HCC cell growth and survival. Using an orthotopic model that can metastasize into the lung or spleen could be much more convincing for an essential claim. Also, for Figure 6H, histology images of lungs from this experiment need to be shown to better understand plectin's effect on metastasis. In Figure 6G, it is unclear how many mice were used for this experiment. Did these mice die due to the tumor burdens in the lungs?

      The whole paper used inhibition strategies to understand the function of plectin. However, the expression of plectin in Huh7 cells is low (Figure 1D). It might be more appropriate to overexpress plectin in this cell line or others with low plectin expression to examine the effect on HCC cell growth and migration.

    3. Reviewer #2 (Public review):

      Summary:

      Plectin is a cytolinker that associates with all three main components of the cytoskeleton and intercellular junctions and is essential for epithelial tissue integrity. Previous reports showed that PLEC regulates tumor growth and metastasis in different cancers. In this manuscript, the authors described PLEC as a target in the initiation and growth of HCC. They showed that inhibiting PLEC reduced tumorigenesis in different in vitro and in vivo HCC models, including in a xenograft model, DEN model, oncogene-induced HCC model, and a lung metastasis model. Mechanistically, the authors showed that inhibiting PLEC results in a disorganized cytoskeleton, deficiency in cell migration, and changes in relevant signaling pathways.

      Strengths:

      In general, the data are shown in multiple ways and support the main conclusion of the manuscript. The results add to the field by highlighting the importance of cellular mechanics in cancer progression.

      Weaknesses:

      (1) The annotation of mouse numbers is confusing. In Figures 2A B D E F, it should be the same experiment, but the N numbers in A are 6 and 5. In E and F they are 8 and 3. Similarly, in Figure 2H, in the tumor size curve, the N values are 4,4,5,6. In the table, N values are 8,8,10,11 (the authors showed 8,7,8,7 tumors that formed in the picture).

      (2) In Figure 3D and Figure S3C, the changes in most of the proteins/phosphorylation sites are not convincing/consistent. These data are not essential for the conclusion of the paper and WB is semi-quantitative. Maybe including more plots of the proteins from proteomic data could strengthen their detailed conclusions about the link between Plectin and the FAK, MAPK/Erk, PI3K/Akt pathways as shown in 3E.

      (3) In Figure S7A and B, the pictures do not show any tumor, which is different from Figure 7A and B (and from the quantification in S7A lower right). Is it just because male mice were used in Figure 7 and female mice were used in Figure S7? Is there literature supporting the sex difference for the Myc-sgP53 model?

      (4) In Figures 2F and S2A, PleΔAlb mice more frequently formed larger tumors, as reflected by the overall increase in tumor size. The authors' interpretation is "possibly implying reduced migration or increased cohesion of plectin-depleted cells". This suggestion is quite arbitrary in the absence of substantial data or literature to support it.

      (5) Mutation or knockout of PLEC has been shown to cause severe diseases in humans and mice, including skin blistering, muscular dystrophy, and progressive familial intrahepatic cholestasis. Please elaborate on the potential side effects of targeting plectin to treat HCC.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Outla Z et al. described the analysis of plectin in HCC pathogenesis. Specifically, they found that elevated plectin levels in liver tumors correlated with poor prognosis for HCC patients. Mechanistically, they showed that plectin-dependent disruption of cytoskeletal networks leads to the attenuation of oncogenic FAK, MAPK/Erk, and PI3K/AKT signals. Finally, the authors showed that the plectin inhibitor plecstatin-1 (PST) is well-tolerated and capable of overcoming therapy resistance in HCC.

      Strengths:

      The studies of plectin are not entirely novel (Pubmed: 36613521). Nevertheless, the current manuscript provides a much more detailed mechanistic study and the results have translational implications. Additional strengths include convincing cell biology data, such as the evidence that plectin regulates cytoskeletal networks and HCC migration/invasion.

      Weaknesses:

      Multiple major issues are noted, and the conclusion is not well supported by the data presented.

      (1) The rationale for using Huh7 cells in the manuscript is not well explained as it has the lowest plectin expression levels.

      (2) The KO cell experiments should be supplemented with overexpression experiments.

      (3) There is significant concern that while ablation of Ple led to reduced tumor number, these mice had larger tumors. The data indicate that plectin may have distinct roles in HCC initiation versus progression. The data are not well explained and do not fully support that plectin promotes hepatocarcinogenesis.

      (4) Figure 3 showed that plectin does not regulate p-FAK/FAK expression. Therefore, the statement that plectin regulates the FAK pathway is not valid. Furthermore, there are too many variables in terms of p-AKT and p-ERK expression, making the conclusion not well supported.

      (5) The studies of plecstatin-1 in HCC should be expanded to a panel of human HCC cells with various plectin expression levels in terms of cell growth and cell migration. The IC50 values should be determined and correlated with plectin expression.

      (6) One of the major issues is the mechanistic studies focusing on plectin regulating HCC migration/metastasis, whereas the in vivo mouse studies focus on HCC formation (Figures 3 and 7). These are distinct processes and should not be mixed.

      (7) Figure 7B showed that Ple KO mice were treated with PST, but the data are not presented in the manuscript. Tumor cell proliferation and apoptosis rates should be analyzed as well.

      (8) The status of FAK, AKT, and ERK pathway activation was not analyzed in mouse liver samples. In Figure 7D, most of the adjusted p-values are not significant.

      (9) There is no evidence to support that PST is capable of overcoming therapy resistance in HCC. For example, no comparison with the current standard of care was provided in the preclinical studies.

    1. eLife Assessment

      This study presents a valuable examination of the prevalence of interactions between amino acids from different periods of Earth's history and coenzymes. While the premise of this work is well founded and the analysis is solid, with more data, the interpretation could change. This manuscript would be of interest to evolutionary biologists and biophysicists.

    1. eLife Assessment

      This important study addresses two questions: (i) how danger signaling is altered for people with childhood adversities, and (ii) how this differs across different operationalizations of adversity. The latter is of particularly broad interest to multiple fields, given that childhood adversity is operationalized very differently across the literature. The study provides compelling evidence using a large sample size and rigorous statistical methods. These data will be of interest to scientists and clinicians interested in early life adversity, statistical approaches for quantifying stress exposure, or aversive learning.

    2. Author response:

      Joint Public Reviews:

      Here, the authors compare how different operationalizations of adverse childhood experience exposure related to patterns of skin conductance response during a fear conditioning task. They use a large dataset to definitively understand a phenomenon that, to date, has been addressed using a range of different definitions and methods, typically with insufficient statistical power. Specifically, the authors compared the following operationalizations: dichotomization of the sample into "exposed" and "non-exposed" categories, cumulative adversity exposure, specificity of adversity exposure, and dimensional (threat versus deprivation) adversity exposure. The paper is thoughtfully framed and provides clear descriptions and rationale for procedures, as well as package version information and code. The authors' overall aim of translating theoretical models of adversity into statistical models, and comparing the explanatory power of each model, respectively, is an important and helpful addition to the literature. However, the analysis would be strengthened by employing more sophisticated modelling techniques that account for between-subjects covariates and the presentation of the data needs to be streamlined to make it clearer for the broad audience for which it is intended.

      Strengths

      Several outstanding strengths of this paper are the large sample size and its primary aim of statistically comparing leading theoretical models of adversity exposure in the context of skin conductance response. This paper also helpfully reports Cohen's d effect sizes, which aid in interpreting the magnitude of the findings. The methods and results are generally thorough.

      Weaknesses

      Weakness 1: The largest concern is that the paper primarily relies on ANOVAs and pairwise testing for its analyses and does not include between-subjects covariates. Employing mixed-effects models instead of ANOVAs would allow more sophisticated control over sources of random variance in the sample (especially important for samples from multi-site studies such as the present study), and further allow the inclusion of potentially relevant between-subjects covariates such as age (e.g. Eisenstein et al., 1990) and gender identity or sex assigned at birth (e.g. Kopacz II & Smith, 1971) (perhaps especially relevant due to possible gender or sex-related differences in ACE exposure; e.g. Kendler et al., 2001). Also, proxies for socioeconomic status (e.g. income, education) can be linked with ACE exposure (e.g. Maholmes & King, 2012) and warrant consideration as covariates, especially if they differ across adversity-exposed and unexposed groups.

      We appreciate the reviewer's suggestion and recognize the value of using (more) sophisticated statistical methods. However, we think that the choice of method should not be guided by perceived complexity alone, and that the chosen ANOVA-based approach provides reliable and valid data. In our revision, we address the reviewer's suggestion by demonstrating that employing mixed models leaves the reported results unchanged (a). We would also like to refer the reviewer to the robustness analyses provided in the initial supplementary material (b).

      a) Re-running analyses using mixed models

      Based on the reviewers' suggestion, we repeated our main analyses (association between exposure to childhood adversity and SCRs, arousal, valence, and contingency ratings during fear acquisition and generalization) using linear mixed models, including age, sex, educational attainment, and childhood adversity as fixed effects, and site as a random effect. These analyses produced results similar to those in our manuscript, demonstrating a significant effect of childhood adversity on SCRs, as assessed by CS discrimination during both acquisition training and the generalization phase, and on general reactivity, but not on linear deviation scores (LDS). For the different rating types, we did not observe any significant effects of childhood adversity.
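
      For orientation, the model described in this paragraph can be written out as follows. This is only a sketch of one plausible specification: the response names the fixed effects and states that site is a random effect, but a random intercept per site (rather than random slopes), the normality assumptions, and the symbol names are assumptions made here, not details confirmed by the authors. Here $y_{ij}$ stands for the outcome (e.g., CS discrimination in SCRs) of participant $i$ at site $j$.

```latex
\[
y_{ij} = \beta_0
       + \beta_1\,\mathrm{adversity}_{ij}
       + \beta_2\,\mathrm{age}_{ij}
       + \beta_3\,\mathrm{sex}_{ij}
       + \beta_4\,\mathrm{education}_{ij}
       + u_j + \varepsilon_{ij},
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\]
```

      In this form, the reported "effect of childhood adversity" corresponds to a test of $\beta_1$, while $u_j$ absorbs mean differences between recruitment sites.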

      We would prefer to retain our main analyses as they are and report the linear mixed model results as additional results in the supplement. However, if the reviewer and editor have strong preferences otherwise, we are open to presenting the mixed models in the main manuscript and moving our previous analyses to the supplement.

      We added the following paragraph to the main manuscript (page 25-26):

      “At the request of a reviewer, we repeated our main analyses by using linear mixed models including age, sex, school degree (i.e., to approximate socioeconomic status), and exposure to childhood adversity as fixed effects as well as site as random effect. These analyses yielded comparable results demonstrating a significant effect of childhood adversity on CS discrimination during acquisition training and the generalization phase as well as on general reactivity, but not on the generalization gradients in SCRs (see Supplementary Table 2 A). Consistent with the results of the main analyses reported in our manuscript, we did not observe any significant effects of childhood adversity on the different types of ratings when using mixed models (see Supplementary Table 2 B-D). Some of the mixed model analyses showed significantly lower CS discrimination during acquisition training and generalization, and lower general reactivity in males compared to females (see Supplementary Table 2 for details).”

      b) Additional robustness tests for the main analyses (already provided in the initial submission as supplementary material)

      We would also like to refer the reviewer to the robustness analyses in the initial supplement to account for possible site effects. Adding site to the analyses affected the p-value in only one instance: entering site as covariate in analyses of CS discrimination during acquisition training attenuated the p-value of the ACQ exposure effect from p = 0.020 to p = 0.089.

      Further robustness checks involved repeating our main analyses while excluding (a) physiological non-responders (participants with only SCRs = 0) and (b) extreme outliers (data points ± 3 SDs from the mean) to ensure generalizable results. These repetitions of the analyses did not lead to any changes in the results.
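
      The two exclusion rules named above can be illustrated with a short sketch. The data, participant IDs, and function names below are hypothetical, and the single-pass application of the 3 SD rule using the sample standard deviation is an assumption; the response does not describe how the authors implemented these checks.

```python
from statistics import mean, stdev


def drop_non_responders(scr_by_participant):
    """Keep only participants with at least one non-zero SCR."""
    return {
        pid: scrs
        for pid, scrs in scr_by_participant.items()
        if any(value != 0 for value in scrs)
    }


def drop_extreme_outliers(values, n_sd=3.0):
    """Drop data points lying more than n_sd sample SDs from the mean.

    Assumption: the rule is applied once, not iteratively.
    """
    centre = mean(values)
    spread = stdev(values)
    if spread == 0:
        return list(values)
    return [v for v in values if abs(v - centre) <= n_sd * spread]


# Hypothetical example data, not taken from the study.
scr = {"p01": [0.0, 0.0, 0.0], "p02": [0.12, 0.0, 0.31]}
responders = drop_non_responders(scr)

scores = [1.0] * 10 + [1.2] * 9 + [50.0]
trimmed = drop_extreme_outliers(scores)

print(sorted(responders))
print(len(scores), len(trimmed))
```

      Rerunning the main analysis on the filtered data and comparing it with the unfiltered result is what makes this a robustness check rather than a change to the primary analysis.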

      We did not include age in our primary analyses due to the homogeneity of our sample and the lack of related hypotheses. Additionally, socio-economic status was assessed only crudely via the highest education level attained, rendering it of limited use.

      Weakness 2: On a related methodological note, the authors mention that scores representing threat and deprivation were not problematically collinear due to VIFs being <10; however, some sources indicate that VIFs should be <5 (e.g. Akinwande et al., 2015).

      We thank the reviewer for bringing different cut-offs to our attention. We have revised this section to highlight the arbitrary nature of their interpretation (page 33):

      “Within the dimensional model framework, the issue of multicollinearity among predictors (i.e., different childhood adversity types) is frequently discussed (McLaughlin et al., 2021; Smith & Pollak, 2021). If we apply the rule of thumb of a variance inflation factor (VIF) > 10, which is often used in the literature to indicate concerning multicollinearity (e.g., Hair, Anderson, Tatham, & Black, 1995; Mason, Gunst, & Hess, 1989; Neter, Wasserman, & Kutner, 1989), we can assume that multicollinearity was not a concern in our study (abuse: VIF = 8.64; neglect: VIF = 7.93). However, some authors state that VIFs should not exceed a value of 5 (e.g., Akinwande, Dikko, and Samson (2015)), while others suggest that these rules of thumb are rather arbitrary (O’Brien, 2007).”
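
      To make the two rules of thumb concrete, the sketch below uses the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one predictor on the remaining predictors. Only the two VIF values are taken from the text; the function names are illustrative, and the implied R² values are back-calculated here rather than reported by the authors.

```python
def vif_from_r_squared(r_squared):
    """VIF of a predictor, given the R^2 of regressing it on the other predictors."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def implied_r_squared(vif):
    """Invert the VIF definition: share of a predictor's variance explained by the others."""
    if vif < 1.0:
        raise ValueError("VIF cannot be smaller than 1")
    return 1.0 - 1.0 / vif


# VIF values as reported in the quoted paragraph.
reported = {"abuse": 8.64, "neglect": 7.93}

flags = {
    name: {"exceeds_5": vif > 5, "exceeds_10": vif > 10}
    for name, vif in reported.items()
}

for name, vif in reported.items():
    print(name, round(implied_r_squared(vif), 3), flags[name])
```

      Both reported values fall between the two thresholds, which is why the conclusion about multicollinearity depends on which rule of thumb is applied.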

      Weakness 3: Additionally, the paper reports that higher trait anxiety and depression symptoms were observed in individuals exposed to ACEs, but it would be helpful to report whether patterns of SCR were in turn associated with these symptom measures and whether the different operationalizations of ACE exposure displayed differential associations with symptoms.

      We thank the reviewer for highlighting these relevant points. We have included additional analyses in the supplementary material in response to this comment. Figures and the corresponding text are also copied below for your convenience.

      We added the following paragraphs to the main manuscript: Methods (page 21):

      “Analyses of trait anxiety and depression symptoms

      To further characterize our sample, we compared individuals being unexposed compared to exposed to childhood adversity on trait anxiety and depression scores by using Welch tests due to unequal variances.

      On the request of a reviewer, we additionally investigated the association of childhood adversity as operationalized by the different models used in our explanatory analyses (i.e., cumulative risk, specificity, and dimensional model) and trait anxiety as well as depression scores (see Supplementary Figure 7). By using STAI-T and ADS-K scores as independent variable, we calculated a) a comparison of conditioned responding of the four severity groups (i.e., no, low, moderate, severe exposure to childhood adversity) using one-way ANOVAs and the association with the number of sub-scales exceeding an at least moderate cut-off in simple linear regression models for the implementation of the cumulative risk model, and b) the association with the CTQ abuse and neglect composite scores in separate linear regression models for the implementation of the specificity/dimensional models. On request of the reviewer, we also calculated the Pearson correlation between trait anxiety (i.e., STAI-T scores), depression scores (i.e., ADS-K scores) and conditioned responding in SCRs (see Supplementary Table 8).”
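
      The Welch test mentioned at the start of this Methods passage differs from the classic t-test in that it does not pool the group variances. A minimal sketch of the statistic and its Welch-Satterthwaite degrees of freedom follows; the two samples are hypothetical and the function name is illustrative. A p-value would additionally require the t distribution, which is left out here to keep the example to the standard library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean, using unpooled sample variances.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df


# Hypothetical scores for an unexposed and an exposed group.
unexposed = [1.0, 2.0, 3.0, 4.0, 5.0]
exposed = [2.0, 4.0, 6.0, 8.0, 10.0]

t_stat, df = welch_t(unexposed, exposed)
print(t_stat, df)
```

      With unequal variances the degrees of freedom fall below the pooled value of n_a + n_b - 2, which makes the test more conservative than the pooled-variance version.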

      Results (page 38):

      “Analyses of trait anxiety and depression symptoms

      As expected, participants exposed to childhood adversity reported significantly higher trait anxiety and depression levels than unexposed participants (all p’s < 0.001; see Table 1 and Supplementary Figure 6). This pattern remained unchanged when childhood adversity was operationalized differently - following the cumulative risk approach, the specificity, and dimensional model (see methods). These additional analyses all indicated a significant positive relationship between exposure to childhood adversity and trait anxiety as well as depression scores irrespective of the specific operationalization of “exposure” (see Supplementary Figure 7).

      CS discrimination during acquisition training and the generalization phase, generalization gradients, and general reactivity in SCRs were unrelated to trait anxiety and depression scores in this sample with the exception of a significant association between depression scores and CS discrimination during fear acquisition training (see Supplementary Table 8). More precisely, a very small but significant negative correlation was observed indicating that high levels of depression were associated with reduced levels of CS discrimination (r = -0.057, p = 0.033). The correlation between trait anxiety levels and CS discrimination during fear acquisition training was not statistically significant but on a descriptive level, high anxiety scores were also linked to lower CS discrimination scores (r = -0.05, p = 0.06) although we highlight that this should not be overinterpreted in light of the large sample. However, both correlations (i.e., CS-discrimination during fear acquisition training and trait anxiety as well as depression, respectively) did not statistically differ from each other (z = 0.303, p = 0.762, Dunn & Clark, 1969). Interestingly, and consistent with our results showing that the relationship between childhood adversity and CS discrimination was mainly driven by significantly lower CS+ responses in exposed individuals, trait anxiety and depression scores were significantly associated with SCRs to the CS+, but not to the CS- during acquisition training (see Supplementary Table 8).”
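
      The caution about overinterpreting very small correlations in a large sample can be illustrated numerically. The sketch below uses the Fisher z normal approximation for testing a correlation against zero; this is an approximation chosen for illustration, not necessarily the test the authors ran. The sample sizes are hypothetical, since the n behind each reported correlation is not stated in this excerpt.

```python
from math import atanh, erfc, sqrt


def correlation_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, using the Fisher z normal approximation."""
    if n <= 3 or not -1.0 < r < 1.0:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = atanh(r) * sqrt(n - 3)
    return erfc(abs(z) / sqrt(2.0))


r = -0.057  # correlation reported in the text
variance_explained = r ** 2  # share of variance, well below 1%

# Hypothetical sample sizes, used only to show how p depends on n.
p_by_n = {n: correlation_p_value(r, n) for n in (100, 500, 1400)}

for n, p in p_by_n.items():
    print(n, round(p, 3))
```

      The same r moves from clearly non-significant to significant purely as n grows, while the share of variance it explains stays fixed, so the effect size rather than the p-value is the more informative quantity here.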

      Weakness 4: Given the paper's framing of SCR as a potential mechanistic link between adversity and mental health problems, reporting these associations would be a helpful addition. These results could also have implications for the resilience interpretation in the discussion (lines 481-485), which is a particularly important and interesting interpretation.

      We have added a paragraph on this to the discussion (page 41):

      “Interestingly, in our study, trait anxiety and depression scores were mostly unrelated to SCRs, defined by CS discrimination and generalization gradients based on SCRs as well as general SCR reactivity, with the exception of a significant - albeit minute - relationship between CS discrimination during acquisition training and depression scores (see above). Although reported associations in the literature are heterogeneous (Lonsdorf et al., 2017), we may speculate that they may be mediated by childhood adversity. We conducted additional mediation analyses (data not shown) which, however, did not support this hypothesis. As the potential links between reduced CS discrimination in individuals exposed to childhood adversity and the developmental trajectories of psychopathological symptoms are still not fully understood, future work should investigate these further in - ideally - prospective studies.”

      Weakness 5: Given that the manuscript criticizes the different operationalizations of childhood adversity, there should be greater justification of the rationale for choosing the model for the main analyses. Why not the 'cumulative risk' or 'specificity' model? Related to this, there should also be a stronger justification for selecting the 'moderate' approach for the main analysis. Why choose to cut off at moderate? Why not severe, or low? Related to this, why did they choose to cut off at all? Surely one could address this with the continuous variable, as they criticize cut-offs in Table 2.

      We thank the reviewers and editors for bringing to our attention that our reasoning for choosing the main model was not clear. As outlined in the manuscript, we chose the approach for the main analyses from the literature as a recent review on this topic (Ruge et al., 2023) has shown the moderate CTQ cut-off to be the most abundantly employed in the field of research on associations between childhood adversity and threat learning. We have made this rationale more explicit in our revised manuscript (page 15/21):

      “Operationalization of "exposure"

      We implemented different approaches to operationalize exposure to childhood adversity in the main analyses and exploratory analyses (see Table 2). In the main analyses, we followed the approach most commonly employed in the field of research on childhood adversity and threat learning - using the moderate exposure cut-off of the CTQ (for a recent review see Ruge et al. (2024)). In addition, the heterogeneous operationalizations of classifying individuals into exposed and unexposed to childhood adversity in the literature (Koppold, Kastrinogiannis, Kuhn, & Lonsdorf, 2023; Ruge et al., 2024) hampers comparison across studies and hence cumulative knowledge generation. Therefore, we also provide exploratory analyses (see below) in which we employ different operationalizations of childhood adversity exposure.”

      “Exploratory analyses

      Additionally, the different ways of classifying individuals as exposed or unexposed to childhood adversity in the literature (Koppold et al., 2023; for discussion see Ruge et al., 2024) hinder comparison across studies and hence cumulative knowledge generation. Therefore, we also conducted exploratory analyses using different approaches to operationalize exposure to childhood adversity (see Table 2 for details).”

      Furthermore, as correctly noted, we fully agree that employing the moderate cut-off (or any cut-off in fact) is in principle an arbitrary decision - despite being guided by and derived from the literature in the field. However, we would like to draw the reviewers’ attention to Figure 5 in the initial submission (please see also below): Although the differences in SCR between severity groups were not significant, the overall pattern suggests at a descriptive level that the decline in CS discrimination, LDS and general reactivity in SCR occurs mainly when childhood adversity exceeds a moderate level. Thus, while we used the moderate cut-off as it was recently shown to be the most widely used approach in the literature (see Ruge et al., 2023), our exploratory analyses also seem to suggest on a descriptive level, that this cut-off may indeed “make sense”. We also refer to this in the results section (page 31-32) and discussion (page 43-44):

      Results:

      “However, on a descriptive level (see Figure 5), it seems that indeed exposure to at least a moderate cut-off level may induce behavioral and physiological changes (see main analysis, Bernstein & Fink, 1998). This might suggest that the cut-off for exposure commonly applied in the literature (see Ruge et al., 2024) may indeed represent a reasonable approach.”

      Discussion:

      “It is noteworthy, however, that this cut-off appears to map rather well onto psychophysiological response patterns observed here (see Figure 5). More precisely, our exploratory results of applying different exposure cut-offs (low, moderate, severe, no exposure) seem to indicate that indeed a moderate exposure level is “required” for the manifestation of physiological differences, suggesting that childhood adversity exposure may not have a linear or cumulative effect.”

      Weakness 6: In the Introduction, the authors predict less discrimination between signals of danger (CS+) and safety (CS-) in trauma-exposed individuals driven by reduced responses to the CS+. Given the potential impact of their findings for a larger audience, it is important to give greater theoretical context as to why CS discrimination is relevant here, and especially what a reduction in response specifically to danger cues would mean (e.g. in comparison to anxiety, where safety learning is impacted).

      We thank the reviewer for highlighting that this was not sufficiently clear. We revised the paragraph in the introduction as follows (page 7-8):

      “Fear acquisition as well as extinction are considered as experimental models of the development and exposure-based treatment of anxiety- and stress-related disorders. Fear generalization is in principle adaptive in ensuring survival (“better safe than sorry”), but broad overgeneralization can become burdensome for patients. Accordingly, maintaining the ability to distinguish between signals of danger (i.e., CS+) and safety (i.e., CS-) under aversive circumstances is crucial, as it is assumed to be beneficial for healthy functioning (Hölzel et al., 2016) and predicts resilience to life stress (Craske et al., 2012), while reduced discrimination between the CS+ and CS- has been linked to pathological anxiety (Duits et al., 2015; Lissek et al., 2005): Meta-analyses suggest that patients suffering from anxiety- and stress-related disorders show enhanced responding to the safe CS- during fear acquisition (Duits et al., 2015). During extinction, patients exhibit stronger defensive responses to the CS+ and a trend toward increased discrimination between the CS+ and CS- compared to controls, which may indicate delayed and/or reduced extinction (Duits et al., 2015). Furthermore, meta-analytic evidence also suggests stronger generalization to cues similar to the CS+ in patients and more linear generalization gradients (Cooper, van Dis, et al., 2022; Dymond, Dunsmoor, Vervliet, Roche, & Hermans, 2015; Fraunfelter, Gerdes, & Alpers, 2022). Hence, aberrant fear acquisition, extinction, and generalization processes may provide clear and potentially modifiable targets for intervention and prevention programs for stress-related psychopathology (McLaughlin & Sheridan, 2016).”

      Recommendations for the authors:

      Abstract:

      Comment 1:

      (a) It does not succinctly describe the background rationale well (i.e. it tries to say too much). It should be streamlined. There is a lot of 'jargon', which muddies the results, and too many concepts are introduced at each part and assume knowledge from the reader. 

      We thank the reviewer for providing constructive guidance for revisions. We have revised our abstract according to these suggestions.

      (b) Multiple terms for childhood trauma are used: ACEs, early adversity, childhood trauma, and childhood maltreatment. Choose one term and stick to it to enhance clarity. Why not just use childhood adversity, as in the title? Related to this, the use of ACEs sets up an expectation that ACE questionnaire was used, so readers are then surprised to find they used the childhood trauma questionnaire.

      We thank the reviewer for bringing this to our attention. As suggested by the reviewer, we use the term “childhood adversity” in our revised manuscript.

      Introduction:

      Comment 2:

      The phrasing seems to 'exaggerate' the trauma problem and is too broad in the first paragraph - e.g., "two-thirds of people experience one or more traumatic events..." It is important to clarify that not all of these people will go on to develop behavioral, somatic, and psychopathological conditions. Could break this down more into how many people have low, moderate, or severe for clarity, as 1 childhood adversity is different to 5+, and the type.

      We thank the reviewer for bringing this to our attention and have revised the first paragraph accordingly (page 6). Please note, however, that in the literature typically a specific cut-off (e.g. moderate) is used and the number of individuals that would meet different cut-offs (e.g., low and high) are not specifically reported.

      “Exposure to childhood adversity is rather common, with nearly two thirds of individuals experiencing one or more traumatic events prior to their 18th birthday (McLaughlin et al., 2013). While not all trauma-exposed individuals develop psychopathological conditions, there is some evidence of a dose-response relationship (Danese et al., 2009; Smith & Pollak, 2021; Young et al., 2019). As this potential relationship is not yet fully clear, understanding the mechanisms by which childhood adversity becomes biologically embedded and contributes to the pathogenesis of stress-related somatic and mental disorders is central to the development of targeted intervention and prevention programmes.”

      Comment 3:

      The published cut-offs for exposed/unexposed should be indicated here.

      We have included the published cut-offs as suggested (page 10):

      We operationalize childhood adversity exposure through different approaches: Our main analyses employ the approach adopted by most publications in the field (see Ruge et al., 2024 for a review) - dichotomization of the sample into exposed vs. unexposed based on published cut-offs for the Childhood Trauma Questionnaire [CTQ; Bernstein et al. (2003); Wingenfeld et al. (2010)]. Individuals were classified as exposed to childhood adversity if at least one CTQ subscale met the published cut-off (Bernstein & Fink, 1998; Häuser, Schmutzer, & Glaesmer, 2011) for at least moderate exposure (i.e., emotional abuse ≥ 13, physical abuse ≥ 10, sexual abuse ≥ 8, emotional neglect ≥ 15, physical neglect ≥ 10).
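
      The dichotomization rule described here is simple enough to state as code. The sketch below uses the five cut-off values listed in the passage and assumes each is a lower bound (a subscale score at or above the value counts as at least moderate exposure). The dictionary keys, function names, and participant scores are illustrative and not taken from the authors' analysis code.

```python
# Moderate cut-offs per CTQ subscale, as listed in the text (assumed to be lower bounds).
MODERATE_CUTOFFS = {
    "emotional_abuse": 13,
    "physical_abuse": 10,
    "sexual_abuse": 8,
    "emotional_neglect": 15,
    "physical_neglect": 10,
}


def subscales_at_or_above_cutoff(scores, cutoffs=MODERATE_CUTOFFS):
    """Return the names of all subscales whose score meets its cut-off."""
    missing = set(cutoffs) - set(scores)
    if missing:
        raise KeyError(f"missing subscale scores: {sorted(missing)}")
    return [name for name, cutoff in cutoffs.items() if scores[name] >= cutoff]


def is_exposed(scores, cutoffs=MODERATE_CUTOFFS):
    """Exposed if at least one subscale meets its cut-off."""
    return len(subscales_at_or_above_cutoff(scores, cutoffs)) >= 1


# Hypothetical participants.
participant_a = {
    "emotional_abuse": 7, "physical_abuse": 5, "sexual_abuse": 5,
    "emotional_neglect": 9, "physical_neglect": 6,
}
participant_b = {
    "emotional_abuse": 13, "physical_abuse": 6, "sexual_abuse": 5,
    "emotional_neglect": 16, "physical_neglect": 7,
}

print(is_exposed(participant_a), is_exposed(participant_b))
print(subscales_at_or_above_cutoff(participant_b))
```

      Counting the flagged subscales instead of only checking for at least one gives the kind of count that a cumulative-risk operationalization could use.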

      Comment 4:

      Please check for overly complex sentences, and reduce the complexity. For example: "In addition, we provide exploratory analyses that attempt to translate dominant (verbal) theoretical accounts (McLaughlin et al., 2021; Pollak & Smith, 2021) on the impact of exposure to ACEs into statistical tests while acknowledging that such a translation is not unambiguous and these exploratory analyses should be considered as showcasing a set of plausible solutions."

      We have revised this section and carefully proofread our manuscript by paying attention to this (page 10):

      “In addition, we provide exploratory analyses that attempt to translate dominant (verbal) theoretical accounts (McLaughlin et al., 2021; Pollak & Smith, 2021) on the impact of exposure to childhood adversity into statistical tests. At the same time, we acknowledge that such a translation is not unambiguous and these exploratory analyses should be considered as showcasing a set of plausible solutions”

      Here is another example of reducing the complexity of our sentences (page 6):

      “Learning is a core mechanism through which environmental inputs shape emotional and cognitive processes and ultimately behavior. Thus, learning mechanisms are key candidates potentially underlying the biological embedding of exposure to childhood adversity and their impact on development and risk for psychopathology (McLaughlin & Sheridan, 2016).”

      Methods:

      Comment 5:

      Is this study part of a larger project? These outcomes were probably not the primary outcomes of this multicenter project. The readers need to understand how this (cross-sectional?) analysis was nested in this larger trial.

      We thank the reviewers and editor for bringing to our attention that this was not sufficiently clear. In the main manuscript, we state that participants were recruited for a large multicentric study, and we now point to additional information in the supplement (page 11):

      “In total, 1678 healthy participants (age_M_ = 25.26 years, age_SD_ = 5.58 years, female = 60.10%, male = 39.30%) were recruited in a multi-centric study at the Universities of Münster, Würzburg, and Hamburg, Germany (SFB TRR58). Data from parts of the Würzburg sample have been reported previously (Herzog et al., 2021; Imholze et al., 2023; Schiele, Reinhard, et al., 2016; Schiele, Ziegler, et al., 2016; Stegmann et al., 2019). These previous reports, also those focusing on experimental fear conditioning (Schiele, Reinhard, et al., 2016; Stegmann et al., 2019), addressed, however, research questions different from the ones investigated here (see also Supplementary Material for details).”

      Moreover, we have included additional information on the larger trial in our revised supplement (page 2):

      “Participants of this study were recruited in a multi-centric collaborative research center “Fear, anxiety, anxiety disorders” joining forces between the Universities of Hamburg, Würzburg, and Münster, Germany (SFB TRR58). During the second funding period (2013-2016), all three sites recruited a large sample (N ~500) in the context of the Z project. All participants underwent the cross-sectional experimental paradigm reported here and were additionally extensively characterized to allow specific subprojects to recruit target subpopulations serving different aims with a focus on molecular genetic, epigenetic, or other research questions (see Herzog et al. (2021); Imholze et al. (2023); Schiele, Reinhard, et al. (2016); Schiele, Ziegler, et al. (2016); Stegmann et al. (2019)). The question on the association of exposure to childhood adversity and recent adversity was part of the primary research question of one subproject led by the senior author of this work (B07, TBL) and was hence a research question of primary interest also for this multicentric project.”

      Comment 6:

      Table 1 does not include percentages (a reader must calculate them: for example, 15% exposed?). These numbers belong in the results (i.e., it is confusing to read about the exposed/non-exposed before we know how it has been calculated).

      We have added the percentages as suggested and have included information on how exposed and unexposed was calculated as a table caption. We have considered moving the table to the results section but find it more suitable here. 

      Comment 7:

      A procedure figure could be useful.

      We thank the reviewer for this advice and have included a procedure figure in the supplementary material.

      Comment 8:

      Physiological data recordings and processing paragraph: The reasoning as to why the authors chose log transformation over square root transformation, or an approach that does not require transformation is not clear.

      We thank the reviewer for notifying us that we did not make this point clear enough. We opted for a log-transformation and range-correction of the SCR data because we use these transformations consistently in our laboratory (e.g., Ehlers et al., 2020; Kuhn et al., 2016; Scharfenort & Lonsdorf, 2016; Sjouwerman et al., 2015; Sjouwerman et al., 2020). In addition, log-transformed and range-corrected data are assumed to be closer to a normal distribution, to have a lower error variance resulting in larger effect sizes (Lykken & Venables, 1971; Lykken, 1972; Sjouwerman et al., 2022), and appear to have - at least descriptively - higher reliability compared to raw data (Klingelhöfer-Jens et al., 2022). We added a sentence on this to the methods section (page 14):

      Note that previous work using this sample (Schiele, Reinhard, et al., 2016; Stegmann et al., 2019) had used square-root transformations but we decided to employ a log-transformation and range-correction (i.e., dividing each SCR by the maximum SCR per participant). We used log-transformation and range-correction for SCR data because these transformations are standard practice in our laboratory and we strive for methodological consistency across different projects (e.g., Ehlers, Nold, Kuhn, Klingelhöfer-Jens, & Lonsdorf, 2020; Kuhn, Mertens, & Lonsdorf, 2016; Scharfenort, Menz, & Lonsdorf, 2016; Sjouwerman & Lonsdorf, 2020; Sjouwerman, Niehaus, & Lonsdorf, 2015). Additionally, log-transformed and range-corrected data are generally assumed to approximate a normal distribution more closely and exhibit lower error variance, which leads to larger effect sizes (Lykken, 1972; Lykken & Venables, 1971; Sjouwerman, Illius, Kuhn, & Lonsdorf, 2022). Moreover, on a descriptive level, this combination of transformations appears to offer greater reliability than raw data (Klingelhöfer-Jens, Ehlers, Kuhn, Keyaniyan, & Lonsdorf, 2022).

      Ehlers, M. R., Nold, J., Kuhn, M., Klingelhöfer-Jens, M., & Lonsdorf, T. B. (2020). Revisiting potential associations between brain morphology, fear acquisition and extinction through new data and a literature review. Scientific Reports, 10(1), 19894. https://doi.org/10.1038/s41598-020-76683-1

      Kuhn, M., Mertens, G., & Lonsdorf, T. B. (2016). State anxiety modulates the return of fear. International Journal of Psychophysiology: Official Journal of the International Organization of Psychophysiology, 110, 194–199. https://doi.org/10.1016/j.ijpsycho.2016.08.001

      Scharfenort, R., & Lonsdorf, T. B. (2016). Neural correlates of and processes underlying generalized and differential return of fear. Social Cognitive and Affective Neuroscience, 11(4), 612–620. https://doi.org/10.1093/scan/nsv142

      Sjouwerman, R., Niehaus, J., & Lonsdorf, T. B. (2015). Contextual Change After Fear Acquisition Affects Conditioned Responding and the Time Course of Extinction Learning—Implications for Renewal Research. Frontiers in Behavioral Neuroscience, 9. https://doi.org/10.3389/fnbeh.2015.00337

      Sjouwerman, R., Scharfenort, R., & Lonsdorf, T. B. (2020). Individual differences in fear acquisition: Multivariate analyses of different emotional negativity scales, physiological responding, subjective measures, and neural activation. Scientific Reports, 10(1), 15283. https://doi.org/10.1038/s41598-020-72007-5
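
      For readers unfamiliar with the procedure discussed under Comment 8, a log-transformation followed by range-correction might look roughly like the Python sketch below. This is not the authors' code: the function name is hypothetical, and the use of the natural logarithm of (1 + amplitude), chosen here so that zero responses stay at zero, is an assumption that the response does not specify. Only the division by each participant's maximum SCR is taken from the text.

```python
import math


def log_range_correct(scr_amplitudes):
    """Log-transform one participant's SCR amplitudes, then range-correct.

    Assumptions (not stated in the source): amplitudes are non-negative,
    and the log-transform is ln(1 + amplitude).
    """
    if not scr_amplitudes:
        return []
    logged = [math.log(1.0 + a) for a in scr_amplitudes]
    max_logged = max(logged)
    if max_logged == 0:
        # Participant showed no response on any trial; avoid dividing by zero.
        return [0.0 for _ in logged]
    # Range-correction: divide each value by the participant's maximum.
    return [value / max_logged for value in logged]
```

      After this step, every participant's values lie between 0 and 1, so between-participant differences in overall responsiveness no longer dominate the comparison.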

      Comment 9:

      There are 24 lines of text of R packages. I do not think this is necessary for the manuscript document and could be moved to the Supplement.

      We thank the reviewer for this comment and understand that it may take a considerable amount of space to list all the references of the R packages. However, we think it is important to prominently credit the respective authors of the R packages. Yet, if this is an important concern of the reviewer and editor, we will reconsider this point.

      Comment 10:

      It is not clear why the authors chose to analyze summary scores across trials rather than including a time factor for the acquisition phase.

      We would like to thank the reviewer for highlighting that the factor time may be interesting as well. However, we think that in our case the time factor is less interesting, as the acquisition effect itself is rather strong. Nevertheless, we have included a figure in the supplement that shows the time course of the SCR by displaying trial-by-trial data across the acquisition and generalization phase for transparency. This figure (Supplementary figure 4) shows that the trajectories appear to barely differ between individuals who were unexposed vs. exposed to moderate childhood adversity. Hence, we think that the analysis approach we have chosen is unlikely to overshadow central time-dependent effects. However, if the reviewer and editor have strong feelings about this point, we will consider integrating additional analyses including the time factor in the supplement.

      Results:

      Comment 11:

      The caption of Figure 3 does not match the figure. Please check this.

      We thank the reviewers and editor for attentive reading and have revised this part.

      References:

      Comment 12:

      The Ruge et al paper that is cited many times throughout does not have a valid DOI in the References section. Additionally, the author list on the preprint server is substantially different from that listed in the manuscript. Please correct this reference.

      We thank the reviewers and editor for attentive reading and have corrected this reference. The DOI provided was functioning at our end, and we hope that this now also applies to the reviewers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, the authors continue their investigations on the key role of glycosylation to modulate the function of a therapeutic antibody. As a follow-up to their previous demonstration on how ADCC was heavily affected by the glycans at the Fc gamma receptor (FcγR)IIIa, they now dissect the contributions of the different glycans that decorate the diverse glycosylation sites. Using a well-designed mutation strategy, accompanied by exhaustive biophysical measurements, with extensive use of NMR, using both standard and newly developed methodologies, they demonstrate that there is one specific locus, N162, which is heavily involved in the stabilization of (FcγR)IIIa and that the concomitant NK function is regulated by the glycan at this site.

      Strengths:

      The methodological aspects are carried out at the maximum level.

      Weaknesses:

      The exact (or the best possible assessment) of the glycan composition at the N162 site is not defined.

      We revised the Introduction to include previous findings from our laboratory regarding processing on YTS cells:

      “YTS cells, a key cytotoxic human NK cell line used for these studies, express FcγRIIIa with extensive glycan processing, including the N162 site with predominantly hybrid and complex-type glycoforms {Patel 2021}.” 

      Reviewer #2 (Public Review):

      Summary:

      The authors set out to demonstrate a mechanistic link between Fcgamma receptor (IIIA) glycosylation and IgG binding affinity and signaling - resulting in antibody-dependent cellular cytotoxicity - ADCC. The work builds off prior findings from this group about the general impact of glycosylation on FcR (Fc receptor)-IgG binding.

      Strengths:

      The structural data (NMR) is highly compelling and very significant to the field. A demonstration of how IgG interacts with FcgRIIIA in a manner sensitive to glycosylation of both the IgG and the FcR fills a critical knowledge gap. The approach to demonstrate the selective impact of glycosylation at N162 is also excellent and convincing. The manuscript/study is, overall, very strong.

      Weaknesses:

      There are a number of minor weaknesses that should be addressed.

      (1) Since S164A is the only mutant in Figure 1 that seems to improve affinity, even if minimally, it would be a nice reference to highlight that residue in the structural model in panel B.

      We revised Figure 1B to include the S164 site.

      (2) It is confusing why some of the mutants in the study are not represented in Figure 1 panel A. Those affinities and mutants should be incorporated into panel A so the reader can easily see where they all fall on the scale.

      We thank the reviewer for this comment. We restructured the Results section to highlight that a primary outcome of the experiment referenced was to map the contribution of interface residues to antibody binding affinity. These data were not previously available, highlighting hotspots at the interface. Figure 1A and B report these results.

      We then used a subset of mutations from this experiment, as well as a subset of mutations from an additional library containing mutations proximal to the interface, to build a small library for evaluation using ADCC. The complete binding data for all variants, binding to two different IgG1 Fc glycoforms, is presented in Supplemental Table 1. 

      T167Y in particular needs to be shown, as it is one of few mutants that fall between what seems to be ADCC+ and ADCC- lines. Also, that mutant seems to have a stronger affinity compared to wt (judged by panel D), yet less ADCC than wt. This would imply that the relationship between affinity and activity is not as clean as stated, though it is clearly important. Comments about this would strengthen the overall manuscript.

      We thank the reviewer for this particular insight. We agree that the lack of a clean correlation between ADCC potency and affinity implies additional factors that could have affected these experimental results. We added the following sentence to the discussion. 

      “Notably, the ADCC potency for those high-affinity variants does not fall cleanly on a line, indicating that other factors affect our observations, which may include organization at the cell surface, changes to glycan composition, or receptor trafficking.”

      (3) This statement feels out of place: "In summary, this result demonstrates that the sensitivity to antibody fucosylation may be eliminated through FcγRIIIa engineering while preserving antibody-binding affinity." In Figure 2, the authors do indeed show that mutations in FcgRIIIa can alter the impact of IgG core fucosylation, but implying that receptor engineering is somehow translatable or as impactful therapeutically as engineering the antibody itself deflates the real basic science/biochemical impact of understanding these interactions in molecular detail. Not everything has to be immediately translatable to be important. 

      We agree and removed the highlighted sentence.   

      (4) The findings reported in Figure 2, panel C are exciting. Controls for the quality of digestion at each step should be shown (perhaps in supplementary data).

      We agree. We added an example of the digestions as Figure S2.  

      (5) Figure 3 is confusing (mislabeled?) and does not show what is described in the Results. First, there is a F158V variant in the graph but a V158F variant in the text.

      Please correct this. 

      Thank you for identifying this typo. We corrected Figure 3.

      Second, this variant (V158F/F158V) does not show the 2-fold increase in ADCC with kifunensine as stated.

      Thank you for drawing our attention to this rounding error. We revised the text to report a statistically significant 1.4-fold increase.

      Finally, there are no statistical evaluations between the groups (+/- kif; +/- fucose). 

      We provide the p values for +/-fuc and +/- Kifunensine for each YTS cell line in the figure. We did not provide a global comparison of p values that included all cell lines due to some cell lines experiencing a significant change and others not. However, we added the raw data as Supplemental Table 2 should readers wish to perform these analyses.

      The differences stated are not clearly statistically significant given the wide spread of the data. This is true even for the wt variant.

      We agree that there are points that overlap in this figure between the different treatments. However, our use of Student's t-test (two-tailed) using three experiments collected on three different days (each with three technical replicates) provides enough resolution to determine the significance of the difference of the means for the different treatments. This is, by our estimation, a highly rigorous manner to collect and analyze the data.

      (6) The kifunensine impact is somewhat confusing. They report a major change in ADCC, yet similar large changes with trimming only occur once most of the glycan is nearly gone (Figure 2). Kifunensine will tend to generate high mannose and possibly a few hybrid glycans. It is difficult to understand what glycoforms are truly important outside of stating that multi-branched complex-type N-glycans decrease affinity.

      Note that Figure 2 does not evaluate the kifunensine-treated glycan, which is mostly Man8 and Man9 structures. In our previous work, these structures likewise provide increased binding affinity (see PubMed ID 30016589). We believe the most important message is that composition of the N162 glycan (removed with the S164A mutation) regulates NK cell ADCC. On cells, we are not able to modulate N162 glycan composition without affecting potentially every other N-glycan on the surface, so we do not have an ADCC experiment that is directly comparable to Figure 2. Thus, this increased ADCC resulting from kifunensine treatment is consistent with previously observed increases in binding affinity measurements.

      (7) This is outside of the immediate scope, but I feel that the impact would be increased if differences in NK cell (and thus FcgRIIIA) glycosylation are known to occur during disease, inflammation, age, or some other factor - and then to demonstrate those specific changes impact ADCC activity via this mechanism.

      We agree completely. As mentioned in the Introduction, we know that N162 glycan composition varies substantially from donor to donor based on previous work from our lab. Curiously, little variability appeared between donors at the other four N-glycosylation sites. Thus, there is the potential that different NK cell N162 glycan compositions are coincident with different indications. This is an area we are quite interested in pursuing.

    2. eLife Assessment

      This study explores the mechanistic link between glycosylation at the N162 site of the Fc gamma receptor FcγRIIIa and the modulation of NK cell-mediated antibody-dependent cytotoxicity. Using innovative isotope labeling strategies and advanced NMR spectroscopy techniques, the authors provide compelling evidence of how glycan composition influences receptor stability and immune function. These findings offer fundamental insights that may contribute to the development of more effective therapeutic antibodies. The manuscript will be of significant interest to immunologists and researchers focused on therapeutic antibody design.

    3. Reviewer #1 (Public review):

      Summary:

      In this work, the authors continue their investigations on the key role of glycosylation to modulate the function of a therapeutic antibody. As a follow-up to their previous demonstration on how ADCC was heavily affected by the glycans at the Fc gamma receptor (FcγR)IIIa, they now dissect the contributions of the different glycans that decorate the diverse glycosylation sites. Using a well-designed mutation strategy, accompanied by exhaustive biophysical measurements, with extensive use of NMR, using both standard and newly developed methodologies, they demonstrate that there is one specific locus, N162, which is heavily involved in the stabilization of (FcγR)IIIa and that the concomitant NK function is regulated by the glycan at this site.

      Strengths:

      The methodological aspects are carried out at the maximum level.

      Weaknesses:

      The exact (or the best possible assessment) of the glycan composition at the N162 site is not defined.

    4. Reviewer #2 (Public review):

      Summary:

      The authors set out to demonstrate a mechanistic link between Fcgamma receptor (IIIA) glycosylation and IgG binding affinity and signaling - resulting in antibody-dependent cellular cytotoxicity - ADCC. The work builds off prior findings from this group about the general impact of glycosylation on FcR (Fc receptor)-IgG binding.

      Strengths:

      The structural data (NMR) is highly compelling and very significant to the field. A demonstration of how IgG interacts with FcgRIIIA in a manner sensitive to glycosylation of both the IgG and the FcR fills a critical knowledge gap. The approach to demonstrate the selective impact of glycosylation at N162 is also excellent and convincing. The manuscript/study is, overall, very strong.

      Weaknesses:

      After revision, which I feel addressed the minor concerns well, the last comment about significance in the long-term is all that remains. Essentially, it will be important in downstream research to determine whether changes in N162 glycan composition ever occur naturally as a result of some factor(s) that include various disease states, inflammation, age, and so on. The answer (either way) does not diminish the importance of understanding molecular details governing antibody-receptor interactions, but it would be very interesting to know if those glycans are regulated in a way that modulates ADCC activity.

    1. eLife Assessment

      This valuable study focuses on gene regulatory mechanisms essential for hindbrain development. Through molecular genetics and biochemistry, the authors propose a new mechanism for the control of Hox genes, which encode highly conserved transcription factors essential for hindbrain development. The strength of evidence is solid, as most claims are supported by the data. This work will be of interest to developmental biologists.

    2. Reviewer #1 (Public review):

      The manuscript by Wang et al. investigates the role of Rnf220 in hindbrain development and Hox expression. The authors suggest that Rnf220 controls Hox expression in the hindbrain through regulating WDR5 levels. The authors combine in vivo experiments with experiments in P19 cells to demonstrate this mechanism. However, the in vivo data does not provide strong support for the claims the authors make and the role of Rnf in Hox maintenance and pons development is unclear.

      While the authors partially addressed some of the issues raised in the first round of reviews, and the in vitro data showing a relationship between Rnf220 and WDR5 is convincing, some issues still remain about the experimental evidence supporting their claims and the relationship of this work with previous studies demonstrating the role of Hox proteins in pontine nuclei in vivo.

      The authors say they were unable to detect Hox levels via in situ hybridization at late embryonic stages, stating that the levels are likely too low to be detected, yet they are presumably high enough to cause ectopic targeting of pontine neurons. Work from the Rijli group, which the authors cite, shows that Hox3-5 paralogs can be clearly detected both by in situ and by staining with commercially available antibodies. Since a major claim of this paper is the upregulation of Hox genes in Rnf220+/- mice through WDR5 regulation, the authors need to show this more convincingly. The inability to detect Hox upregulation, and subsequent rescue, by means other than qPCR in vivo remains a major weakness of the paper. The authors also do not discuss how broad upregulation of all Hox paralogs leads to the changes in PN targeting in the context of previous work.

      The links between Wdr5 expression, epigenetic modifications, Hox expression and axon mistargeting in vivo remain somewhat tenuous. For example, the authors show epigenetic modification changes in some Hox genes, but not Hox5 paralogs, and only show the rescue by Wdr5 KO in vitro. Similarly, they do not attempt to show rescue of axon targeting in vivo after presumably restoring Hox levels by Wdr5 inhibition or knockdown.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      (1) A major issue throughout the paper is that Hox expression analysis is done exclusively through quantitative PCR, with values ranging from 2-fold to several thousand-fold upregulation, with no antibody validation for any Hox protein (presumably they are all upregulated).

      Thank you for your comment.

      We tried to verify the stimulated Hox expression pattern by in situ hybridization. Although we could clearly detect Hox expression patterns (i.e., Hox8 and Hox9 in Author response image 1) in the neural tube of early embryos (E9.5) by whole-mount in situ hybridization, we failed to detect a clear pattern in the brainstem at E18.5, either in whole-mount tissue or on sections. That is one reason we turned to single-nucleus RNA-seq instead.

      This is likely due to their low expression levels at late developmental stages, which need to be detected by more sensitive methods. However, we estimate that the stimulated expression levels of the representative Hox genes are at least comparable to the physiological levels in the posterior spinal cord, and thus sufficient to evoke a functional effect.

      Author response image 1.

      Hox8 and Hox9 expression patterns in E9.5 embryos.

      (2) In Figure 1, massive upregulation of most Hox genes in the brainstem is shown after e16.5 but the paper quickly focuses on analysis of PN nuclei. What are the other consequences of this broad upregulation of Hox genes in the brainstem? There is no discussion of the overall phenotype of the mice, the structure of the brainstem, the migration of neurons, etc. The very narrow focus on motor cortex projections to PN nuclei seems bizarre without broad characterization of the mice, and the brainstem in particular. There is only a mention of "severe motor deficits" from previous studies, but given the broad expression of Rnf220, the fact that is a global knockout, and the effects on spinal cord populations shown previously the justification for focusing on PN nuclei does not seem strong.

      Thank you for your comment.

      Although RNF220 is important for the dorsal-ventral patterning of the spinal cord as well as the hindbrain during embryonic development, the earlier neural patterning and differentiation are normal in the Rnf220+/- mice (Wang et al., 2022). However, these mice showed reduced survival and motility to varying degrees postnatally (Ma et al., 2019; Ma et al., 2021), likely suggesting a dosage-dependent role of RNF220 in maintaining late neural development. As our microarray assay showed the deregulation of the Hox genes in the brain, we followed this direction in this study and narrowed down the affected region to the pons. Our single-nucleus RNA-seq (snRNA-seq) data further show that the Hox deregulation mainly occurred in 3 clusters of neurons. However, the pons is complex and contains tens of nuclei, and the current resolution of our data does not allow us to assign a clear identity to each of them. Although it is clear that more nuclei are likely affected, the PN (cluster 7) is the only cluster we can identify and follow in the current study.

      As to the general effect of RNF220 haploinsufficiency on the brainstem, we carried out Nissl staining assays and found no clear difference in neuronal cell organization between WT and Rnf220+/- pons (revised Figure 2-figure supplement 2).

      (3) It is stated that cluster 7 in scRNA-seq corresponds to the PN nuclei. The modest effect shown on Hox3-5 expression in that data in Figure 1 is inconsistent with the larger effect shown in Figure 2.

      Thank you for your comment.

      Due to the low efficiency of snRNA-seq and the limited sequencing depth, the quantification of Hox expression based on the snRNA-seq data is likely less accurate than that based on qRT-PCR. In addition, only mRNAs in the nucleus could be captured by snRNA-seq, while mRNAs in both the nucleus and cytoplasm were reverse-transcribed and examined in the qRT-PCR assays in Figure 2A.

      (4) Presumably, Hox genes are not the only targets of Rnf220 as shown in the microarray/RNA-sequencing data. There is no definitive evidence that any phenotypes observed (which are also not clear) are specifically due to Hox upregulation. The only assay the authors use to look at a Hox-dependent phenotype in the brainstem is the targeting of PN nuclei by motor cortex axons. This is only done in 2 animals and there are no details as to how the data was analyzed and quantified. The only 2 images shown are not convincing of a strong phenotype, they could be taken at slightly different levels or angles. At the very least, serial sections should be shown and the experiment repeated in more animals. There is also no discussion of how these phenotypes, if real, would relate to previous work by the Rijli group which showed very precise mechanisms of synaptic specificity in this system.

      Thank you for your comments and suggestions.

      The deregulation of Hox genes is the most obvious phenomenon observed in the RNA-seq data, and we tried to assign its specific phenotypic effect in this study. As the role of Hox genes in PN patterning and circuit formation is well established, we focused on the PN in the following study. Based on the literature, we carried out the circuit analysis to examine the targeting of PN neurons by the motor cortex axons. An additional cohort of animals of both genotypes (n=10 for WT and n=9 for Rnf220+/-) was used to repeat the experiment, and we reached the same conclusion. More detailed information on data analysis and serial images was included in the revised manuscript and figure legends.

      (5) The temporal aspect of this regulation in vivo is not clear. The authors show some expression changes begin at e16.5 but are also present at 2 months. Is the presumed effect on neural circuits a result of developmental upregulation at late embryonic stages or does the continuous overexpression in adult mice have additional influence? Are any of the Hox genes upregulated normally expressed in the brainstem, or PN specifically, at 2 months? Why perform single-cell sequencing experiments at 2 months if this is thought to be mostly a developmental effect? Similarly, the significance of the upregulated WDR5 in the pons and pontine nuclei at 2 months in Figure 3 is not clear.

      Thank you for your comment.

      The spatial and temporal expression pattern of Hox genes is established at early embryonic stages and then maintained throughout development in mammals. As we have shown, the de-repression of Hox genes is a long-lasting defect in Rnf220+/- mice beginning at late embryonic stages. Since the neuronal circuit is established after birth in mice, we speculated that the neuronal circuit defects from motor cortex to PN neurons were due to the long-lasting up-regulation of Hox genes in PN neurons. We could not distinguish whether the effect on the neural circuit results from developmental upregulation of Hox genes or from their continuous overexpression in adult mice. An inducible knockout mouse model may help to answer this question in the future. The discussion on this point was included in the revised manuscript.

      We carried out snRNA-seq analysis using pons tissues from adult mice aiming to identify the specific cell population with Hox up-regulation, which we failed to specify by in situ hybridization.

      We repeated the related experiments in the original Figure 3 and some of the blot images were replaced and quantified.

      (6) In Figure 3C, the levels of RNF220 in wt and het don't seem to be that different.

      We repeated the experiments and changed the related image in the revised Figure 3C.

      (7) Based on the single-cell experiments, and the PN nuclei focus, the rescue experiments are confusing. If the Rnf220 deletion has a sustained effect for up to 2 months, why do the injections in utero? If the focus is the PN nuclei why look at Hox9 expression and not Hox3-5 which are the only Hox genes upregulated in PN based on sc-sequencing? No rescue of behavior or any phenotype other than Hox expression by qPCR is shown and it is unclear whether upregulation of Hox9 paralogs leads to any defects in the first place. The switch to the Nes-cre driver is not explained. Also, it seems that wdr5 mRNA levels are not so relevant and protein levels should be shown instead (same for rescue experiments in P19 cells).

      Thank you for your comments.

      Since our data suggest that the upregulation of Hox gene expression is a long-lasting effect beginning at the late embryonic stage of E16.5, we conducted the rescue experiments by in utero injection of WDR5 inhibitor at E15.5 and examined the expression of Hox genes at E18.5. Although it is also necessary to examine whether the rescue effect of WDR5 inhibitor injection lasts into adult stages, it is difficult to identify the injected embryos among the pups after birth. As a supplement, rescue assays with genetic ablation of the Wdr5 gene were conducted, and the results showed that genetic ablation of a single copy of the Wdr5 allele could reverse the upregulation of Hox genes caused by RNF220 haploinsufficiency in the hindbrains at P15.

      Most of the upregulated Hox genes including both Hox9 and Hox3-5 were examined in our rescue experiments. Since this study focuses on the PN nuclei, the results of Hox3-5 genes were shown in the revised main Figure 6.

      We conducted rescue experiments by deleting Wdr5 in neural tissue using Nestin-Cre mice because Wdr5+/- mice are embryonic lethal. The up-regulation of Hox genes could also be observed in the hindbrains of Rnf220fl/wt; Nestin-Cre mice. Although Rnf220fl/wt; Wdr5fl/wt; Nestin-Cre mice are viable and could survive to adult stages, developmental defects in the forebrains, including cerebral cortex and hippocampus, were observed in Rnf220fl/wt;Wdr5fl/wt;Nestin-Cre mice. Therefore, no behavioral rescue tests were conducted in this study. We believe that it is out of the scope of this study to discuss the role of WDR5 in the development of forebrains.

      The potential defects due to the up-regulation of Hox9 paralogs await further investigation.

      Wdr5 mRNA levels were firstly examined to confirm the genetic deletion or siRNA mediated knockdown of Wdr5 genes. We have carried out western blot to examine the WDR5 protein levels and the results were included in the revised Figure 3.

      (8) What is the relationship between Retinoic acid and WRD5? In Figure 3E there is no change in WRD5 levels without RA treatment in Rnf KO but an increase in expression with RA treatment and Rnf KO. However, the levels of WRD5 do not seem to change with RA treatment alone. Does Rnf220 only mediate WDR5 degradation in the presence of RA? This does not seem to be the case in experiments in 293 cells in Figure 4.

      Thank you for your comment.

      We believe that the regulation of WDR5 and Hox expression by RNF220 is context dependent and precisely controlled in vivo, depending on the molecular and epigenetic status of the cell, which is fulfilled by RA treatment in P19 cells. In Figure 4, the experiment is based on exogenous overexpression assays, which might not fully reflect the situation in vivo.

      (9) Why are the levels of Hox upregulation after RA treatment so different in Figure 5 and Figure Supplement 5?

      In Figure 5C, the Hox expression levels were normalized against the control group in the presence of RA, while in Figure Supplement 5 they were normalized to the control group without RA treatment.
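
      The difference in apparent fold change follows from the choice of reference group alone. A minimal sketch of this point, in which the expression values, group labels and the `normalize` helper are all hypothetical and are not data or code from the study:

```python
# Hypothetical relative expression values (arbitrary units); NOT data from the study.
expression = {
    ("control", "-RA"): 1.0,
    ("control", "+RA"): 50.0,
    ("Rnf220 KD", "-RA"): 1.1,
    ("Rnf220 KD", "+RA"): 150.0,
}


def normalize(values, reference):
    """Express every value as a fold change relative to the chosen reference group."""
    ref = values[reference]
    return {group: value / ref for group, value in values.items()}


# Reference = control with RA (the normalization described for Figure 5C).
vs_control_with_ra = normalize(expression, ("control", "+RA"))

# Reference = control without RA (the normalization described for Figure Supplement 5).
vs_control_without_ra = normalize(expression, ("control", "-RA"))

print(vs_control_with_ra[("Rnf220 KD", "+RA")])     # 3.0
print(vs_control_without_ra[("Rnf220 KD", "+RA")])  # 150.0
```

      With these illustrative numbers the same measurement reads as a 3-fold or a 150-fold change, depending only on which control group serves as the denominator.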

      (10) In Figures 4B+C which lanes are input and which are IP? There is no quantitation of Figure 4D, from the blot it does look that there is a reduction in the last 2 columns as well. The band in the WT flag lane seems to have a bubble. Need to quantitate band intensities. Same for E, the effect does not seem to be completely reversed with MG132.

      Thanks for pointing this out. The labels were included in the revised Figure 4B and 4C.

      We repeated the experiments for Figure 4D and 4E. Some of the blot images were replaced and quantified in the revised Figure 4D and 4E.

      Reviewer 2:

      (1) Figure 1E shows that Rnf220 knockdown alone could not induce an increase in Hox expression without RA, which indicates that Rnf220 might endogenously upregulate Retinoic acid signaling. The authors should test if RA signaling is downstream of Rnf220 by looking at differences in the expression of Retinaldehyde dehydrogenase genes (as a proxy for RA synthesis) upon Rnf220 knockdown.

      Thank you for your comment and suggestion.

      Two sequential reactions are required for RA synthesis from retinol, which are catalyzed by alcohol dehydrogenases (ADHs)/retinol dehydrogenases (RDHs) and retinaldehyde dehydrogenases (RALDHs, also known as ALDHs), respectively. When RA is no longer needed, it is catabolized by cytochrome enzymes (CYP26 enzymes) (Niederreither et al., 2008; Kedishvili et al., 2016). Here, we tested ADHs, ALDHs and CYP26 enzymes in E16.5 WT and Rnf220-/- embryos.

      The results are as follows. ADH7 and ADH10 are slightly upregulated. ALDH1 and ALDH3 are upregulated and downregulated in Rnf220-/- embryos, respectively, but there is no significant change in the expression of ALDH2, which plays a key role in RA synthesis during embryonic development (Niederreither et al., 2008). Furthermore, Cyp26a1, which is responsible for RA catabolism, was upregulated in Rnf220-/- embryos. Collectively, these data do not support a clear effect on RA signaling by RNF220.

      Author response image 2.

      The effect of Rnf220 on RA synthesis and degradation pathways

      (2) In Figure 2C-D further explanation is required to describe what criteria were used to segment the tissue into Rostral, middle, and caudal regions. Additionally, it is unclear whether the observed change in axonal projection pattern is caused due to physical deformation and rearrangement of the entire Pons tissue or due to disruption of Hox3-5 expression levels. Labeling of the tissue with DAPI or brightfield image to show the structural differences and similarities between the brain regions of WT and Rnf220 +/- will be helpful.

      Thank you for your comment and suggestion.

      More information on the quantification of the results shown in Figure 2C-D was included in our revised manuscript. We carried out Nissl staining assays using coronal sections of the brainstem and found that there is no significant difference in neuronal cell organization between WT and Rnf220+/- (revised Figure 2-figure supplement 2).

      (3) Line 192-195. These roles of PcG and trxG complexes are inconsistent with their initial descriptions in the text - lines 73-74.

      We are sorry for the mistake. We have carefully revised the related descriptions to avoid such mistakes. Thank you.

      (4) In Figure 4D, the band in the gel seems unclear and erased. Please provide a different one. These data show that neither Rnf220 nor wdr5 directly regulates Hox gene expressions. The effect of double knockdown in the presence of RA suggests that they work together to suppress Hox gene expression via a different downstream target. This point should be addressed in the text and discussion section of the paper. example for the same data which shows a full band with lower intensity.

      Thank you for your suggestion.

      We repeated the experiment of Figure 4D and some of the blot images were replaced in the revised Figure 4D.

      Indeed, in the presence of RA, knockdown of Rnf220 alone can upregulate the expression of Hox genes (Figure 5C). Knockdown of Wdr5 could reverse the upregulation of Hox genes in Rnf220 knockdown cells, suggesting that Rnf220 regulates Hox gene expression in a Wdr5-dependent manner. However, in the absence of RA, none of Rnf220 knockdown, Wdr5 knockdown, or Rnf220 and Wdr5 double knockdown had a significant effect on the expression of Hox genes in P19 cells. It seems that RA signaling plays a crucial role in the regulation of WDR5 by RNF220 in P19 cells, and discussion on this point was included in the revised manuscript.

      (5) In Figure 4G the authors could provide some form of quantitation for changes in ubiquitination levels to make it easier for the reader. They should also describe the experimental procedures and conditions used for each of the pull-down and ubiquitination assays in greater detail in the methods section.

      Thank you for your suggestion.

      The quantitation and statistics for the original Figure 4G were included in the revised Figure 4. More information on the biochemical assays was included in the “Methods and Materials” section of our revised manuscript.

      (6) Figure 5 shows that neither Rnf220 nor wdr5 directly regulate Hox gene expressions. The effect of double knockdown in the presence of RA suggests that they work together to suppress Hox gene expression via a different downstream target.

      Thank you for your comment.

      In fact, knockdown of Rnf220 alone can upregulate the expression of Hox genes in the presence of RA (Figure 5C). Furthermore, knockdown of Wdr5 could reverse the upregulation of Hox genes in Rnf220 knockdown cells, which suggests that Rnf220 regulates Hox gene expression in a Wdr5-dependent manner. However, in the absence of RA, none of Rnf220 knockdown, Wdr5 knockdown, or Rnf220 and Wdr5 double knockdown had a significant effect on the expression of Hox genes in P19 cells. It seems that RA signaling plays a crucial role in the regulation of WDR5 by RNF220 in P19 cells, and discussion on this point was included in the revised manuscript.

      (7) In Figure 6, while the reversal of changes in Hox gene expression upon concurrent Rnf220; Wdr5 inhibition highlights the importance of Wdr5 in this regulatory process, the mechanistic role of wdr5 and its functional consequences are unclear. To answer these questions, the authors need to: (i) Assay for activated and repressive epigenetic modifications upon double knockdown of Rnf220 and Wdr5 similar to that shown in Figure 3- supplement 1. This will reveal if wdr5 functions according to its intended role as part of the TrxG complex. (ii) The authors need to assay for changes in axon projection patterns in the double knockdown condition to see if Wdr5 inhibition rescues the neural circuit defects in Rnf220 +/- mice.

      Thank you for your suggestion.

      Although it is also necessary to examine whether the rescue effect of in utero WDR5 inhibitor injection on the neuronal circuit lasts into adult stages, it is difficult to identify the injected embryos among the pups after birth. Although Rnf220fl/wt;Wdr5fl/wt;Nestin-Cre mice are viable and could survive to adult stages, developmental defects in the forebrains, including cerebral cortex and hippocampus, were observed in Rnf220fl/wt;Wdr5fl/wt;Nestin-Cre mice. Therefore, no rescue of the behavioral or neuronal circuit defects was examined in this study. A PN-specific inducible Cre mouse line could help in this direction in the future.

      We carried out ChIP-qPCR and tested activated and repressive epigenetic modifications upon double knockdown of Rnf220 and Wdr5 in the P19 cell line and found that Rnf220 and Wdr5 double knockdown rescued Hox epigenetic modifications to a certain degree (Figure 6-figure supplement 1).

      References

      Kedishvili, N.Y. 2016. Retinoic acid synthesis and degradation. Subcell Biochem, 81:127-161. DOI: 10.1007/978-94-024-0945-1_5, PMID: 2783050

      Ma, P., Li, Y., Wang, H., Mao, B., Luo, Z.-G. 2021. Haploinsufficiency of the TDP43 ubiquitin E3 ligase RNF220 leads to ALS-like motor neuron defects in the mouse. Journal of Molecular Cell Biology, 13: 374-382. DOI: 10.1093/jmcb/mjaa072, PMID: 33386850

      Ma, P., Song, N.-N., Li, Y., Zhang, Q., Zhang, L., Zhang, L., Kong, Q., Ma, L., Yang, X., Ren, B., Li, C., Zhao, X., Li, Y., Xu, Y., Gao, X., Ding, Y.-Q., Mao, B. 2019. Fine-Tuning of Shh/Gli Signaling Gradient by Non-proteolytic Ubiquitination during Neural Patterning. Cell Rep, 28: 541-553.e544. DOI: 10.1016/j.celrep.2019.06.017, PMID: 31291587

      Niederreither, K., Dollé, P. 2008. Retinoic acid in development: towards an integrated view. Nat Rev Genet, 9: 541-53. DOI: 10.1038/nrg2340, PMID: 18542081

      Wang, Y.-B., Song, N.-N., Zhang, L., Ma, P., Chen, J.-Y., Huang, Y., Hu, L., Mao, B., Ding, Y.-Q. 2022. Rnf220 is Implicated in the Dorsoventral Patterning of the Hindbrain Neural Tube in Mice. Front Cell Dev Biol, 10. DOI: 10.3389/fcell.2022.831365, PMID: 35399523

    1. eLife Assessment

      This work reports an important new method for activity-dependent neuronal labeling in Drosophila using in situ hybridization, with the potential to establish a new standard in the field. The authors demonstrate the method's applicability by generating compelling evidence of the function of male-specific neurons in both aggression and courtship behaviors. These results and the new method will be of great interest to the neuroscience community.

    2. Reviewer #1 (Public review):

      Summary:

      The authors have nicely demonstrated the efficiency of the HCR v.3.0 using hr38 mRNA expression as a marker of neuronal activity. This is very important in the Drosophila neuroscience field as in situ hybridization in adult Drosophila brains has so far been very challenging to do and replicate. The HCR v.3.0 has been described before [Choi et al., (2018)] and is now the property of the non-profit organization Molecular Technologies, who are the ones responsible for designing the probes. Here, taking advantage of this new FISH method, the authors have demonstrated the use of the FISH to identify neurons activated by a specific behavioral task using hr38 mRNA as a marker of neuronal activation. They named their method HI-FISH.

      In addition, based on the catFISH method [Guzowski et al., 1999], the authors were able to distinguish between newly activated neurons (nascent nuclear mRNA) and mature hr38 mRNA showing an earlier activation. They describe this method as HI-catFISH.

      Finally, to test which neurons are activated downstream of their neuronal group of interest, the authors combined the HI-FISH method with optogenetics using Chrimson. They named this method opto-HI-FISH.

      Using these three new methods, the authors have addressed the following biological question: are love and aggressiveness neuronally the same in Drosophila?

      Here, the authors focused on the male-specific P1a neurons, which are activated by both an aggressive context (male-male encounter) and a sexual context (male-female encounter).

      Strengths:

      The demonstration of the efficiency of the method is very convincing and well-performed. It encourages the reader to apply the method to their own subject.

      Weaknesses:

      The more neurons are present, the more difficult it is to identify neurons. This is something to take into account when applying these methods.

    3. Reviewer #2 (Public review):

      Summary:

      Watanabe et al. introduce a novel approach for activity-dependent labeling of neural circuits in Drosophila at single-cell resolution, based on detecting the expression of the immediate early gene Hr38 using in situ hybridization. While activity mapping of neurons during specific behaviors is well-established in rodent models, its application in Drosophila has been limited, primarily due to technical constraints. By overcoming these challenges, this study tackles an important and timely issue, providing a foundational tool that will serve as a key reference in the field of circuit neuroscience.

      Strengths:

      The principal strength of this method lies in its versatility and high sensitivity. It can be applied to a broad range of biological questions and enables the investigation of dynamic transcriptional regulation across an unlimited number of genes with a strong signal-to-noise ratio. As such, it holds great potential for widespread use across research labs.

      Weaknesses:

      No major weaknesses; all concerns have been adequately addressed.

    4. Author response:

      Reviewer #1:

      Response to Public Review

      We thank the reviewer for taking the time to carefully read our paper and to provide helpful comments and suggestions, most of which we have incorporated in our revised manuscript. One of this reviewer’s (and reviewer #2’s) main concerns was that the confocal images provided in some cases did not appear to reflect the quantitative data in the bar graphs. These images were provided only for illustrative purposes, to give the reader a sense of what the primary data look like. The reviewer may not have appreciated that the quantitative data reflect counts of RNA smFISH signals (dots) in hundreds of cells collected through z-stacks comprising multiple optical sections in multiple flies for each condition. For example, in the P1a control condition (in Figure 2A), we analyzed 135 neurons from 8 individuals. There, the number of z-planes ranged from 3 to 8 per hemisphere. It is generally not possible to find a single confocal section that encompasses quantitatively the statistics that are presented in the graphs. Presenting the data as an MIP (Maximum Intensity Projection, i.e., collapsed z-stack) in a single panel would generate an image that is too cluttered to see any detail. We have now included, for the reader’s benefit, additional example confocal sections in both a z-stack and from the opposite hemisphere, in Supplemental Figure S4D. We have also inserted clarifying statements in the text on p. 7 (lines 154-156).

      Another suggestion from Reviewer #1 is that "it would be more informative to separate in the quantification between the GAL4-expressing neurons and the non-expressing ones" based on the presented pictures where more non-P1a neurons (that the reviewer speculates may be pC1-type neurons) are activated by a male-male encounter than by a male-female encounter, while the P1a-positive neurons seem to be more responsive during courtship behavior. In this paper, we were not looking at pC1 neurons and did not try to answer which neuronal population(s) outside of the P1a population is/are responsible for aggression and/or courtship. Rather, we focused on P1a neurons and addressed whether P1a neurons that induce both aggression and courtship behavior when they are artificially activated (Hoopfer et al. 2015) are also naturally activated during spontaneous performance of these two social behaviors. That earlier result did not exclude the possibility that P1a neurons are inactive during naturalistic courtship or aggression. Our data in the current manuscript provide further experimental evidence in support of the idea that P1a neurons as a population play a role in both of these behaviors. Moreover, we provided data identifying P1a neurons activated only during aggression or during courtship (or both). However, this does not exclude that pC1 or other neighboring populations are activated during aggression as well (see also the response to 'Recommendations For The Authors' and text lines 151-154).

      In Figure 3, we used opto-HI-FISH to identify candidate downstream targets (direct or indirect) of P1a neurons. We used 50 Hz Chrimson stimulation to activate P1a neurons to induce expression of Hr38 and identified Kenyon cells in the mushroom body (MB) and PAM neurons (as well as pCd neurons) as potential downstream targets of P1a cells. In Figure 3 – supplement, we performed calcium imaging of KCs and PAM neurons in response to P1a optogenetic stimulation to confirm independently our results from the Hr38 labeling experiments. Providing that control was the purpose of the supplemental experiment.

      Based on those imaging data, the reviewer asked the further question of which [natural] behavioral context induces Hr38 expression in these populations (i.e., mating or aggression). This question is reasonable because our calcium imaging data (Figure 3-supplement) showed that both Kenyon cells and PAM neurons are active only during photo-stimulation of P1a neurons. Our previous behavioral studies (Inagaki et al., 2014; Hoopfer et al., 2015) showed that 50 Hz photo-stimulation of P1a neurons in freely moving flies induced unilateral wing extension during stimulation, while aggression was observed only after the offset of the stimulation (Hoopfer et al., 2015). Based on the comparison of those behavioral data to the imaging results in this paper, the reviewer suggested that Kenyon cells and PAM neurons are activated during courtship rather than during aggression. This is certainly a possible interpretation. However, it is difficult to extrapolate from behavioral experiments in freely moving animals to calcium imaging results in head-fixed flies, particularly with respect to neural dynamics. Furthermore, Hr38 expression, like that of other IEGs (e.g., c-fos), may reflect persistently activated 2nd messenger pathways (e.g., cAMP, IP3) in Kenyon cells and PAM neurons that are not detected by calcium imaging, but that nevertheless play a role in mediating its behavioral effects. We still do not understand the mechanisms of how optogenetic stimulation of P1a neurons in freely behaving flies induces aggression vs. courtship behavior. Although 50 Hz stimulation of P1a neurons does not induce aggressive behavior during photo-stimulation, it is possible that this manipulation activates both aggression and courtship circuits, but that the courtship circuit might inhibit aggressive behavior at a site downstream of the MB (e.g., in the VNC). Once stimulation is terminated and courtship stops, the fly would show aggressive behavior, due to release of that downstream inhibition (see Models in Anderson (2016) Fig 2d, e). In that case, there would be no apparent inconsistency between the imaging data and behavioral data. We agree that the reviewer's question is interesting and important, but we feel that answering this question with decisive experiments is beyond the scope of this manuscript.

      Finally, Reviewer #1 suggested a method to evaluate the Hr38 signals in the catFISH experiment of Figure 4. We appreciate their suggestions, but the way that we evaluated the Hr38 signals was basically the same as the way the reviewer suggested. We apologize for the confusion caused by the lack of detailed descriptions in the original manuscript. We have now revised the methods section to explain more clearly how we define the cells as positive based on Hr38EXN and Hr38INT signals.

      Response to Recommendations for the authors:

      “To strengthen the author's argumentation, I would distinguish in their quantification between gal4+ from the other [classes of neighboring neurons] (Fig. 2 and 4).”

      Our focus in this paper was to ask simply whether P1a neurons are active or not active during natural occurrences of the social behaviors they can evoke when artificially activated. We did not claim that they are the only cells in the region that control the behaviors.  It is not possible to compare their activation to that of 'other' cells neighboring P1a neurons without a separate marker to identify those cells driven by a different reporter system (e.g., LexA). This in turn would require repeating all of the experiments in Figs 2 and 4 from scratch with new genotypes permitting dual-labeling of the two populations by different XFPs, and quantifying the data using 4-color labeling. We respectfully submit that such curiosity-driven experiments, while in principle interesting, are beyond the scope of the present manuscript.  However, we have inserted text to acknowledge the possibility that the aggression-activated Hr38 signals in P1a- cells neighboring P1a+ cells may correspond to other classes of P1 neurons (of which there are 70 in total) or to pC1 cells. Changes:  Text lines 151-154.

      “if the magenta dot is outside of the nuclei I would not count this as positive also the size of the dot seems to be a good marker of the reality of the signal). I would measure the intensity of the hr38EXN. A high Hr38EXN level associated with the presence of hr38INT would indicate that the cell has been activated during both encounters, while a lower hr38EXN with no hr38INT would suggest only an activation during the 1st behavioural context. Finally, a lower hr38EXN associated with the presence of hr38INT would suggest the opposite, an activation only during the 2nd behaviour.”

      We agree that there are some tiny dot signals with the hr38 INT probe that are more likely background signals. We only counted the INT probe signals as positive when the cells had a clearly visible dot that also co-localized with the exonic probe's signal, as primary (un-spliced) Hr38 transcripts in the nucleus should be positive for both EXN and INT probes. Regarding the reviewer’s latter comments, we agree with their interpretation of the catFISH results and that is how we interpreted them originally. We measured the intensity of hr38EXN expression and defined hr38EXN-labeled cells as “positive” when the relative intensity was more than 3σ above the average, a stringent criterion. In the revised manuscript, we added more detailed information in the methods section regarding our criteria for defining cell types as positive.
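
      The scoring criteria described in this response can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names, the choice of a reference (background) population for computing the average and σ, the output labels and the example intensities are all hypothetical, and the sketch does not attempt to separate cells activated in both contexts from cells activated only in the second one. The procedure actually used is the one described in the revised methods section.

```python
from statistics import mean, stdev


def exn_threshold(reference_intensities):
    """Mean + 3 SD of a reference intensity distribution.

    Assumption: the 'average' and sigma come from a reference (e.g. background)
    population; the response does not specify which population was used.
    """
    return mean(reference_intensities) + 3 * stdev(reference_intensities)


def score_cell(exn_intensity, int_dot_visible, int_dot_on_exn_signal, threshold):
    """Score one cell from its exonic (EXN) and intronic (INT) probe signals."""
    exn_positive = exn_intensity > threshold
    # An INT dot counts only if it is clearly visible AND co-localizes with EXN signal.
    int_positive = int_dot_visible and int_dot_on_exn_signal
    if int_positive:
        return "recently activated (nascent transcript)"
    if exn_positive:
        return "activated earlier (mature transcript only)"
    return "negative"


# Hypothetical reference intensities (arbitrary units).
reference = [10.0, 12.0, 11.0, 9.0, 13.0]
threshold = exn_threshold(reference)

print(score_cell(30.0, True, True, threshold))
print(score_cell(30.0, False, False, threshold))
print(score_cell(12.0, True, False, threshold))  # small dot not on EXN signal: treated as background
```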

      “Knowing that the P1a neurons (using the split-gal4) can trigger only wing extension when activated by optogenetic 50Hz, I would test to which behavioral context the MB neurons and the PAM neurons positively respond to.”

      As we answered in 'Response to Public Review,' our opto-HI-FISH experiments identified Kenyon cells in the mushroom body (MB) and PAM neurons (as well as pCd neurons) as potential downstream targets of P1a cells, using Hr38 labeling. The purpose of the calcium imaging experiment in Figure 3 – supplement was to confirm the P1a-dependent activation of KCs and PAM neurons using an independent method. In that respect, this control experiment succeeded in providing the methodological confirmation. The reviewer raised an interesting question about how our calcium imaging experiments relate to our behavioral experiments, in terms of the dynamics of KC and PAM activation. A recent publication (Shen et al., 2023) revealed that courtship behavior has a positive valence and that activation of P1 neurons mimics a courtship-reward state via activation of PAM dopaminergic neurons. Therefore, it is reasonable to think that PAM neurons (and Kenyon cells as downstream of PAM neurons) are activated during female exposure. However, those data do not exclude the possibility that inter-male aggression is also rewarding in Drosophila males, as it has been shown to be in mice. This is an interesting curiosity-driven question that has yet to be resolved. Therefore, as mentioned in the 'Response to Public Review,' we feel that the additional experiment the reviewer suggests is beyond the scope of our manuscript.

      Changes: None.

      Minor comments:

      “Please provide different pictures from main fig2 and sup2 for the three common conditions (control, aggression, and courtship).” 

      The data set for Figure 2 and Figure 2 supplement are from the same experiment. Because of the limited space, we just presented the selected key conditions ('Control', 'Aggression', and 'Courtship') in the main figure and put the complete data set (including these three key conditions) in the supplemental figure.

      Changes: None

      “Please, provide scale bars for the images.”

      Also, Reviewer #2 commented, 'Scale bars are missing on all the images throughout the main and supplementary figures.'

      We have now added scale bars for each figure. 

      “Fig. 1: Is the chrimsonTdtom images from endogenous fluorescence? It is not said in the legend and anti-dsred is not provided in the material and method while anti-GFP is.”

      We are sorry for the confusion and thank the reviewer for raising that question. The signals were native fluorescence, and we have now added that information to the figure legend.

      P7: "As an initial proof-of-concept application of HI-FISH, we asked whether neuronal subsets initially identified in functional screens for aggression-promoting neurons (Asahina et al., 2014; Hoopfer et al., 2015; Watanabe et al., 2017) were actually active during natural aggressive behavior. These included P1a, Tachykinin-FruM+ (TkFruM), and aSP2 neurons". Please put the references to the corresponding group of neurons listed. For example: "These included P1a neurons [Hoopfer et al., 2015]". 

      We have now added these references.

      P9: "Optogenetic and thermogenetic stimulation experiments have shown that that P1a interneurons can promote both male-directed aggression and male- or female-directed courtship" typo

      We appreciate the reviewer for catching this error and have corrected the text.

      P10: "To validate this approach, we first asked whether we could detect Hr38 induction in pCd neurons, which were previously shown by calcium imaging to be (indirect) targets of P1a neurons". Reference [Jung et al., 2020]

      We have now added this reference.

      Fig. 4A: Put the time scale on the diagram (3h adaptation-20min-30min rest-20min-10min rest-collect) 

      We have now added the time scale in Figure 4A.

      Reviewer #2: 

      Response to Public Review: 

      We thank the reviewer for their helpful comments and suggestions. We have addressed most of them in our revised manuscript. The main concern of Reviewer #2 was the temporal resolution of the HI-catFISH experiment shown in Figure 4 and Figure 4-Supplement. Our original manuscript illustrated temporal patterns of Hr38EXN and Hr38INT signals concomitant with different behavioral paradigms (Figure 4B). The reviewer pointed out that the illustrated experimental design does not reflect the actual data shown in Figure 4-Supplement A-C. We believe this issue was raised because we drew the temporal pattern of Hr38EXN signals in Figure 4B based on the intensity of Hr38EXN signals (Figure 4-Supplement B) rather than based on the % number of positive cells (Figure 4-Supplement C). We have now revised the schematic time course of Hr38EXN signals in Figure 4B using the % of positive cells. We believe this change will help readers to better understand the experimental design, since we used the % of positive cells to identify patterns of P1a neuron activation during male-male vs. male-female social interactions in Figure 4D. Another suggestion from Reviewer #2 was to add additional controls, such as the quantification of the intronic and exonic Hr38 probes after either only the first or second social context exposure. In response, we have now added the data from only the first social context (Figure 4C, and 4D, right column). These new data provide evidence that there are essentially no detectable Hr38INT signals 60 minutes later without a second behavioral context, while Hr38EXN signals are still present at the time of the analysis. Unfortunately, we are not able to provide the converse dataset with the second behavioral context only to show that Hr38INT signals are detected. On this point, we call the reviewer’s attention to Figure 4-supplement-S4A-C, which show that the INT probe signals are detectable at 15 and 30 minutes following stimulation, but not at 60 minutes. In the experiment of Fig. 4B, flies are fixed and labeled for Hr38 30 minutes after the beginning of the second behavior, conditions under which we should obtain robust INT signals (as observed). EXN signals are also expected at 30 minutes because the primary (non-spliced) RNA transcript detected by the INT probe also contains exonic sequences.

      Response to Recommendations for the authors:

      Given that the development of in situ HCR for the adult fly brain is so central to the present manuscript, I think that the methods section describing the HCR protocol can be significantly improved. In particular, the authors should fully describe the in situ HCR protocol including the 'minor modifications' they refer to, and define how they calculate the 'relative intensity to the background'.

      We appreciate the reviewer’s suggestion. We have now revised the methods section to describe the procedure in more detail. Also, we will submit a separate document describing the HI-FISH protocol.

      Note: The authors refer to a recently published paper by Takayanagi-Kiya et al (2023) describing activity-based neuronal labeling using a different immediate early gene, stripe/egr-1. The authors state the following: 'That study used a GAL4 driver for the stripe/egr-1 gene to label and functionally manipulate activated neurons. In contrast, our approach is based purely on detecting expression of the IEG mRNA using..'. Takayanagi-Kiya et al. (2023) also use in situ mRNA detection of the IEG stripe/egr-1 and not only a GAL4 driver system. This claim should be modified and the paper should be cited in the introduction of the present paper.

      We have now cited the paper in the Introduction and have modified and moved the description originally in 'Note' section to Discussion (text lines: 392-404) as the reviewer requested. We have emphasized the difference between the two approaches for comparing neuronal activities during two different behaviors within the same animal. Takayanagi-Kiya used GAL4/UAS and stripe protein expression with immunohistochemistry to analyze neuronal activities during two different behaviors, while we exclusively analyzed Hr38 mRNA expression for this purpose, using intronic and exonic Hr38 probes. This approach made it possible to perform catFISH with higher temporal resolution and also allows extension of our approach to other IEGs for which antibodies are not available.

      Please specify the nature of the iron filings in the methods section.

      We added a detailed description in the methods section, including the catalog number.

      In Figure 1B, the authors may add a dashed outline to the regions magnified in 1C so that readers can more easily follow the figures. Moreover, it would be informative to see a more detailed quantification of the number of Hr38-positive cells in different brain regions marked by Fru-GAL4.

      We have now added the whole brain images for each condition in Figure 1C and also quantitative data in Figure 1-Supplement C, as the reviewer suggested.

      In the middle right aggression panel of Figure 2A, it looks as if one P1a neuron is not outlined.

      We have carefully examined other z-planes through this region and based on those data have concluded that the signals mentioned by the reviewer are neurites from neurons labeled in other z-planes.

      Changes: None.

      The images in Figure 2A can be again found in Figure Supplement 2A, yet the number of neurons analyzed suggests the quantification was performed from different samples. The images in Figure Supplement 2A should be either changed or it should be explained as to why the images are the same yet the numbers in the legend are different.

      We apologize for the confusion. Figure 2 and Figure 2-Supplement are from the same experiment. To avoid clutter we illustrated three key conditions ('Control,' 'Aggression,' and 'Courtship') in the main figure. The reason why the numbers in the legend are different is that the purpose of presenting Figure 2-Supplement B-D was to determine whether there were differences in the intensity of Hr38 FISH signals in the neurons considered as 'positive' in different conditions. Therefore, the numbers described in Figure 2-Supplement legend are derived only from those neurons that were considered Hr38-positive, while the numbers in Figure 2 include all neurons analyzed. We have now added notes to explain this in the Figure 2 – supplement legend.

      The panels of the quantification of the Hr38 relative intensity in Figure 2B/C/D are very difficult to read, ideally, they should be plotted as in Figure Supplement 2B/C/D.

      The graphs in Figure 2B-D (upper) show data from all GFP-labeled cells scored, including cells defined as 'negative' or 'borderline.' In contrast, the graphs in Figure 2-supplement show the relative Hr38 signal intensity in those GFP neurons defined as positive based on the analysis in Fig. 2B. If we were to plot the data in Fig. 2B (upper) as box plots (like that in Figure-2-supplement), we would see either a skewed (only negative cells) or a bimodal distribution (one around the negative population and the other around the positive population); the shapes of these distributions would likely be hidden in the box-whisker plots format. Therefore, we prefer to plot all of the data points as we did in the original manuscript. However, we agree that the data points in the original manuscript were hard to read. We therefore changed the format of the datapoints from blurry dots to open circles with clear solid lines.

      In Figure 2B/C/D, please specify in the figure legend what 'grouped in categories according to character' means. 

      We used letters to mark statistically significant differences (or lack thereof) between conditions. Bars sharing at least one common letter are not significantly different. If they do not share any letter, they are significantly different. For example, Aggression: bc vs. Dead: bc means no difference. Aggression: bc vs. No Food: b, or Aggression: bc vs. Courtship: c, also means no difference between Aggression and each of the two other conditions. However, 'No Food: b' and 'Courtship: c' have no common letter, meaning they are different. This is a standard method for showing statistical comparisons among multiple bars without many asterisks and horizontal bars cluttering the figure, and we have revised the legend to clarify what each letter means. We have also removed the color shading in Figure 2 B-D as it may have been confusing.
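      The letter convention described above can be sketched in a few lines of Python. This is only an illustration of how to read the labels, using the four example labels quoted in the response; the function name is hypothetical, and the sketch does not reproduce the statistical test that produced the letters.

```python
def significantly_different(letters_a: str, letters_b: str) -> bool:
    """Two conditions differ only if their letter labels share no letter."""
    return set(letters_a).isdisjoint(letters_b)


# Labels taken from the example in the response above.
labels = {"Aggression": "bc", "Dead": "bc", "No Food": "b", "Courtship": "c"}

for first, second in [("Aggression", "Dead"),
                      ("Aggression", "No Food"),
                      ("Aggression", "Courtship"),
                      ("No Food", "Courtship")]:
    differ = significantly_different(labels[first], labels[second])
    print(f"{first} vs. {second}: {'different' if differ else 'not different'}")
```

      Only the last pair, 'No Food' vs. 'Courtship', has no letter in common, which matches the reading given in the response.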

      A quantification of the number of Hr38-positive neurons and Hr38 relative intensity during the entire time course would be informative in Figure 3D. 

      Although the data set for this figure is different from that for Figure 4-Supplement A-C, the main claim is the same. Therefore, Figure 4 - Supplement essentially provides the information that the reviewer suggested. However, we also reanalyzed the data set used for the original Figure 3D and evaluated % positive cells at the 30-minute time point and have now added that number in the figure legend.

      In the legend of Figure 3D, it says '..The expression level reaches its peak at 30-60min', yet I don't see timepoints beyond 60min. Please rephrase or add additional timepoints. 

      We apologize for the error. We have rephrased the text.

      Figure Supplement 3A/D: please add an outline or a schematic figure to better understand where the imaging is performed.

      We added illustrated schemas next to the title of each experiment (P1->PAM neurons (bundle) and P1 -> Kenyon cells (bundle)).

      Figure Supplement 3C/F: please add information about the statistical test to the corresponding figure legend.

      We have added a phrase to describe the test used.

      Figure Supplement 3G/H/I/J: motion artifacts can potentially strongly affect the performed analysis given that cell bodies are very small and highly subjected to motion. Can the authors comment on how they corrected for motion?

      We have now described how we corrected for motion artifacts in the Methods section.

      Figure 4C/D: It seems as if the representative images don't reflect the quantification, e.g., in the male -> female panel, close to 100% of the neurons are positive for the exonic probe as opposed to approx. 40% in the bar graph.

      Please see our response to this issue in the 'Response to Public Review (Reviewer #1)'.

      Additional controls should be included in Figure 4C in order to assess the temporal resolution of HI-CatFISH more in detail (see 'Weaknesses').

      We have also answered this in the 'Response to Public Review'.

      The authors should adjust the scheme in the main Figure 4B to reflect the data presented in Figure S4A and C. For instance, the peak for the intronic version is observed at 15 minutes, while at 30 minutes, both the exonic and intronic signals show an equal level of signal.

      We have addressed this issue in the 'Response to Public Review'.

      We thank the reviewers again for their helpful comments and hope that with these changes, the manuscript will now be acceptable for official publication in eLife.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors study the variability of patient response of NSCLC patients on immune checkpoint inhibitors using single-cell RNA sequencing in a cohort of 26 patients and 33 samples (primary and metastatic sites), mainly focusing on 11 patients and 14 samples for association analyses, to understand the variability of patient response based on immune cell fractions and tumor cell expression patterns. The authors find immune cell fraction, clonal expansion differences, and tumor expression differences between responders and non-responders. Integrating immune and tumor sources of signal the authors claim to improve prediction of response markedly, albeit in a small cohort.

      Strengths:

      The problem of studying the tumor microenvironment, as well as the interplay between tumor and immune features is important and interesting and needed to explain the heterogeneity of patient response and be able to predict it.

      Extensive analysis of the scRNAseq data with respect to immune and tumor features on different axes of hypothesis relating to immune response and tumor immune evasion using state-of-the-art methods.

      The authors provide an interesting scRNAseq data set linked to outcomes data.

      Integration of TCRseq to confirm subtype of T-cell annotation and clonality analysis.

      Interesting analysis of cell programs/states of the (predicted) tumor cells and characterization thereof.

      Weaknesses:

      Generally, a very heterogeneous and small cohort where adjustments for confounding are hard. Additionally, there are many tests for association with outcome, where necessary multiple testing adjustments would negate signal and confirmation bias likely, so biological takeaways have to be questioned.

      Thank you for your comment. We made multiple testing adjustments as suggested in “Recommendations for Authors.”

      RNAseq is heavily influenced by the tissue of origin (both cell type and expression), so the association with the outcome can be confounded. The authors try to argue that lymph node T-cell and NK content are similar, but a quantitative test on that would be helpful.

      Following the reviewer’s suggestion, we performed principal component analysis (PCA) to assess the influence of tissue of origin on immune and stromal cell populations. In the revised Figure S1g, we quantified the similarity using Euclidean distances of centroids between sample groups based on their tissue of origin in the PC1 and PC3 plot.
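      A minimal sketch of the kind of computation described here, assuming a samples-by-cell-type matrix of proportions as input: project samples with PCA, then measure Euclidean distances between the group centroids on PC1 and PC3. The function names and the toy matrix are hypothetical; the response does not state which software or preprocessing was used for Figure S1g.

```python
from itertools import combinations

import numpy as np


def pca_scores(X, n_components=3):
    """Project the rows of X onto the leading principal components (via SVD)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def centroid_distances(scores, groups, pcs=(0, 2)):
    """Euclidean distance between group centroids on the chosen PCs.

    pcs=(0, 2) selects PC1 and PC3 (zero-based column indices).
    """
    labels = np.array([str(g) for g in groups])
    names = sorted({str(g) for g in groups})
    centroids = {
        name: scores[labels == name][:, list(pcs)].mean(axis=0) for name in names
    }
    return {
        (a, b): float(np.linalg.norm(centroids[a] - centroids[b]))
        for a, b in combinations(names, 2)
    }


# Hypothetical toy input: 6 samples x 4 cell-type proportions.
toy = np.array([
    [0.50, 0.20, 0.20, 0.10],
    [0.45, 0.25, 0.20, 0.10],
    [0.30, 0.30, 0.25, 0.15],
    [0.35, 0.30, 0.20, 0.15],
    [0.10, 0.40, 0.30, 0.20],
    [0.15, 0.35, 0.30, 0.20],
])
tissue = ["mLN", "mLN", "tLung", "tLung", "tL/B", "tL/B"]
print(centroid_distances(pca_scores(toy), tissue))
```

      A smaller distance between two centroids indicates that the corresponding tissue groups have more similar cell-population profiles along those two components.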

      The authors claim a very high "accuracy" performance, however, given the small cohort and lack of information on the exact evaluation it is not clear if this just amounts to overfitting the data.

      We acknowledge the concern about the high “accuracy” potentially indicating overfitting. To address this, we revised the manuscript to clarify the use of 'accuracy,' 'AUC,' and 'performance' with clearer expressions in the following sections: Abstract (Line 57), Results (Line 264), Discussion (Lines 320-321), Methods (Lines 546-547), Legends for Figure 5c and Figure S8b.

      Especially for tumor cell program/state analysis the specificity to the setting of ICIs is not clear and could be prognostic.

      Thank you for your comments. As outlined in Table 2 of the revised manuscript, we conducted a multivariate survival analysis of tumor signature candidates using the TCGA lung adenocarcinoma (LUAD, n = 533) and squamous cell carcinoma (LUSC, n = 502) cohorts to evaluate their prognostic potential. No tumor cell programs or states were found to be associated with overall survival in either LUAD or LUSC. We added descriptions related to Table 2 in the Results (Lines 249-251) and Methods (Lines 530-542) sections.

      Due to the small cohort with a lot of variability, more external validation is needed to be convincingly reproducible, especially when talking about AUC/accuracy of a predictor.

      Expanding the cohort size was difficult due to limited resources. We recognize the challenges posed by the small and heterogeneous cohort. We have acknowledged these limitations and applied statistical corrections to address them.

      Reviewer #2 (Public Review):

      Summary:

      The authors have utilised deep profiling methods to generate deeper insights into the features of the TME that drive responsiveness to PD-1 therapy in NSCLC.

      Strengths:

      The main strengths of this work lie in the methodology of integrating single-cell sequencing, genetic data, and TCRseq data to generate hypotheses regarding determinants of IO responsiveness.

      Some of the findings in this study are not surprising and well precedented eg. association of Treg, STAT3, and NFkB with ICI resistance and CD8+ activation in ICI responders and thus act as an additional dataset to add weight to this prior body of evidence. Whilst the role of Th17 in PD-1 resistance has been previously reported (eg. Cancer Immunol Immunother 2023 Apr;72(4):1047-1058, Cancer Immunol Immunother 2024 Feb 13;73(3):47, Nat Commun. 2021; 12: 2606 ) these studies have used non-clinical models or peripheral blood readouts. Here the authors have supplemented current knowledge by characterization of the TME of the tumor itself.

      Weaknesses:

      Unfortunately, the study is hampered by the small sample size and heterogeneous population and whilst the authors have attempted to bring in an additional dataset to demonstrate the robustness of their approach, the small sample size has limited their ability to draw statistically supported conclusions. There is also limited validation of signatures/methods in independent cohorts, no functional characterization of the findings, and the discussion section does not include discussion around the relevance/interpretation of key findings that were highlighted in the abstract (eg. role of Th17, TRM, STAT3, and NFKb). Because of these factors, this work (as it stands) does have value to the field but will likely have a relatively low overall impact.

      We acknowledge the challenges posed by the small and heterogeneous cohort. To address this, we tempered our claims related to accuracy by applying statistical testing corrections. We also appreciate the feedback on functional characterization and have expanded the discussion in the revised manuscript to include an overview of specific cell populations and genes.

      Related to the absence of discussion around prior TRM findings, the association between TRM involvement in response to IO therapy in this manuscript is counter to what has been previously demonstrated (Cell Rep Med. 2020;1(7):100127, Nat Immunol. 2017;18(8):940-950., J Immunol. 2015;194(7):3475-3486.). However, it should be noted that the authors in this manuscript chose to employ alternative markers of TRM characterisation when defining their clusters and this could indicate a potential rationale for differences in these findings. TRM population is generally characterised through the inclusion of the classical TRM markers CD69 (tissue retention marker) and CD103 (TCR experienced integrin that supports epithelial adhesion), which are both absent from the TRM definition in this study. Additional markers often used are CD44, CXCR6, and CD49a, of which only CXCR6 has been included by the authors. Conversely, the majority of markers used by the authors in the cell type clustering are not specific to TRM (eg. CD6, which is included in the TRM cluster but is expressed at its lowest in cluster 3 which the authors have highlighted as the CD8+ TRM population). Therefore, whilst there is an interesting finding of this particular cell cluster being associated with resistance to ICI, its annotation as a TRM cluster should be interpreted with caution.

      Single-cell RNA sequencing (scRNA-seq) can sometimes fail to detect the expression of classical cell type markers due to incomplete capture of a cell’s transcriptome. To determine cell identity, we utilized cell type markers established in previous scRNA-seq studies. In response to your comments, we have added the expression levels of classical TRM markers, including CD69, CD103 (ITGAE), CD44, CXCR6, and CD49a (ITGA1), in the revised Figure 2c. Although these markers were not exclusively expressed in TRM clusters, TRM clusters exhibited relatively high levels of these genes while lacking other clusters’ specific marker genes.

      Reviewer #1 (Recommendations For The Authors):

      General suggestions:

      When analyzing the association of cell type proportions with outcomes, some adjustment for multiple testing should be considered (either sampling-based, e.g. permutation test, or adjustment based on assumptions of independence of tests, e.g. Bonferroni).

      Thank you for your comments. As suggested, we calculated adjusted p-values using the False Discovery Rate for the association of cell type proportions with outcomes in Figure 3a. The heatmap in Reviewer's ONLY Figure 1, which uses the adjusted p-values, consistently showed the expected grouping of cell types and outcomes. However, the significance did not meet the conventional statistical cutoff criteria. We acknowledge this limitation, which results from statistical testing based on ratio values.

      Author response image 1.

      Heat map with unsupervised hierarchical clustering of proportional changes in cell subtypes within total immune cells. Proportional changes were compared across multiple ICI response groups. The color represents -log(adjusted p-value), with p-values adjusted using the False Discovery Rate.
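      For readers unfamiliar with False Discovery Rate adjustment, the following sketch shows the Benjamini-Hochberg procedure, the most common FDR method. That the authors used Benjamini-Hochberg specifically is an assumption; the response says only "False Discovery Rate". The input p-values are hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


# Hypothetical raw p-values from four separate association tests.
raw = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(raw))
```

      Adjusted values are never smaller than the raw ones, which is why associations that look significant in isolation can fall short of a conventional cutoff after adjustment, as the response reports.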

      A formal test of clonotype differences (normalized to cell type fraction) would be great as the shown plot 2e could be confounded by cell number and type differences between responders and non-responders.

      Thank you for your suggestion. We have revised Figure 2e to display the relative clonotype differences versus CD4+ and CD8+ T cell fractions in each sample. The relative clone size of each cell was calculated by dividing the size of each clone by the total number of CD4+ or CD8+ T cells, respectively.
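      The normalization described above can be sketched as follows, assuming the input is one clonotype identifier per T cell, restricted to a single lineage (CD4+ or CD8+) within one sample. The function name and the clonotype identifiers are hypothetical.

```python
from collections import Counter


def relative_clone_sizes(clonotypes):
    """Map each clonotype to (cells in clone) / (all T cells of that lineage).

    `clonotypes` holds one clonotype ID per cell, for CD4+ or CD8+ T cells
    of a single sample.
    """
    total = len(clonotypes)
    if total == 0:
        return {}
    return {clone: n / total for clone, n in Counter(clonotypes).items()}


# Hypothetical CD8+ T cells from one sample: clone "A" is expanded.
cd8_cells = ["A", "A", "B", "C"]
print(relative_clone_sizes(cd8_cells))
```

      Dividing by the lineage total makes clone sizes comparable between samples that differ in the number of T cells captured, which is the confound the reviewer raised.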

      It could be made a bit more clear when the core group of patients was used (only when associating with outcomes?) and when all other patients were used as well (only cell type annotation?).

      As the reviewer correctly noted, we performed scRNA-seq analysis on all specimens, but only the core group of patients was used for the comparative analysis between the responder and non-responder groups. This information has been detailed in the manuscript (Lines 103-105).

      For immune cells, it would be interesting to look at expression patterns (NMF, scINSIGHT) as well, not just immune cell fractions and expansion.

      In contrast to tumor signatures, immune cell programs are more directly tied to their functional characteristics. Therefore, we focused on annotating immune cells based on their functional properties and conducted comparative analyses between responders and non-responders.

      Multiple testing is necessary for the univariate association analysis. Some adjustments for confounders in a multivariate model (despite the size) could be informative.

      As shown in ‘Reviewer's ONLY Table 1’, we conducted a multivariate regression analysis of immune and tumor signatures for ICI response, adjusting for clinical variables such as tissue origin, cancer subtype, pathological stage, and smoking status. However, the results were not significant, likely due to the heterogeneity and small size of the cohort.

      Author response table 1.

      P-values from univariate and multivariate regression analysis of immune and tumor signatures for ICI response.

      It is not clear from the manuscript how "accuracy" is measured. The terms "accuracy" and "AUC", as well as "performance" are used interchangeably, a section in the methods with the precise definition is needed.

      We have revised the manuscript to clarify the terms 'accuracy,' 'AUC,' and 'performance' by using clearer expressions in the following sections: Abstract (Line 57), Results (Line 264), Discussion (Lines 320-321), Methods (Lines 546-547), Legends for Figure 5c and Figure S8b.
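      Because the reviewer asked for precise definitions, the sketch below shows the standard textbook definitions of the two metrics, which are not interchangeable: accuracy depends on a decision threshold, whereas AUC is the probability that a randomly chosen responder scores higher than a randomly chosen non-responder. The labels and scores are hypothetical, and the sketch does not claim to reproduce how the metrics were computed in the manuscript.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples classified correctly at a fixed score threshold."""
    predictions = [int(s >= threshold) for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def auc(labels, scores):
    """Area under the ROC curve, computed from all positive/negative pairs.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 1 = responder, 0 = non-responder.
y_true = [1, 1, 0, 0]
y_score = [0.9, 0.4, 0.6, 0.1]
print(accuracy(y_true, y_score), auc(y_true, y_score))
```

      Both numbers describe in-sample performance when they are computed on the same samples used to build the classifier, so neither one by itself addresses the overfitting concern.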

      Furthermore, it has to be clear if this is in-sample performance or if there was some train/test split or cross-validation used. Given the small cohort size and wealth of features finding some combination of predictors that could overfit on responders/non-responders would not be surprising.

      As the reviewer has noted, we acknowledge the statistical limitations due to the small cohort size. We have revised the sentence on Lines 545-547 to read: “Classification models of responders and non-responders for PC signatures and combinatorial indexes between tumor and/or immune cells were generated based on in-sample performance…”.

      Suggestions to improve readability:

      Line 84: The sentence should be reformulated to improve understanding.

      We have revised sentences in lines 81-93.

      Line 86: missing a "the".

      We have revised the sentences in lines 81-93.

      Reviewer #2 (Recommendations For The Authors):

      "Tumor-infiltrating PD-1 positive T cells have higher capacity of tumor recognition than PD-1 negative T cells" Please look to rephrase this sentence as this is not entirely accurate: PD-1 is upregulated in tumor-experienced T cells as a consequence of antigen recognition ie those cells that recognise tumor will increase PD-1, whereas the sentence as it's currently written indicates that PD1+ cells have an intrinsically increased capacity to kill tumors, which is incorrect.

      In lines 86-88 of the revised manuscript, we have replaced the sentence “Tumor-infiltrating PD-1 positive T cells have higher capacity of tumor recognition than PD-1 negative T cells” with “More specifically, PD-1 expression is upregulated upon antigen recognition (PMID29296515), indicating that certain T cells in the tumor microenvironment are actively engaged as tumor-specific T cells.”

      Cancer subtype abbreviations (eg. SQ, ADC, NUT) are used in figures in the main article and so should be defined in the main text (they are currently only explained in the legend for the supplementary table).

      As per the reviewer’s suggestion, the manuscript has been revised to include definitions of cancer type abbreviations in lines 108-110.

      Figure S1d-f does not appear to corroborate the statement that "Although there were differences in tissue-specific resident populations, we found that the immune cell profiles, especially T/NK cells of mLN were similar to those of primary tumor tissues indicating the activation of immune responses were consistently observed at metastatic sites (Figure S1d-f)." The diagrams are complex (please explain all abbreviations) and it is not clear how the authors have come to this conclusion. Additionally, cell quantity does not indicate that the 'activation of immune responses' is consistently observed at metastatic sites as these cells could be dysfunctional/bystander.

      In the revision, we have quantified the diagrams (Figure S1f) to more clearly highlight the differences in tissue-specific resident populations. We performed principal component analysis (PCA) to evaluate the impact of tissue origin on immune and stromal cell populations. In the revised Figure S1g, we illustrated the quantitative similarity between sample groups using Euclidean distances in the PC plot based on their tissue of origin. Additionally, the legends for Figures S1d and S1e have been updated to include definitions for all abbreviations.

      We agree with the reviewer's comment that cell quantity alone may not fully reflect activation of antigen-specific immune responses, even though we annotated the functional T cell subtypes. To better focus on the comparisons of cellular profiles between metastatic sites (mLN) and primary tumors (tLung and tL/B), we removed the sentence “…indicating the activation of immune responses were consistently observed at metastatic sites (Fig. S1d-f).” from the revised manuscript.

      In Figure 2c, classical markers for TRM (CD103, CD69) should be included in the description for the definition of the TRM clusters, or their exclusion appropriately explained. The findings regarding the negative correlation between follicular B cells and ICI response are surprising. Figure S3, the cluster identified as Follicular B cells contains MS4A1 (CD20) and HLA-DRA. Classical markers are CD20 (pan-B cell), CD21 (CR2), CD23, and IgD/IgM (double positive), and as such it is not clear if the authors have appropriately annotated this cluster as representing follicular B cells. These classical markers should be included in the interpretation of the cell clustering or their exclusion appropriately explained.

      We appreciate your comments. In response, we have added the expression levels of classical TRM markers such as CD69, CD103 (ITGAE), CD44, CXCR6, and CD49a (ITGA1), in the revised Figure 2c. Additionally, we revised the dot plot showing the mean expression of marker genes in each cell cluster for B/Plasma cells (revised Figure S3b) by incorporating classical markers for Follicular B cells, such as CD21 (CR2), CD23 (FCER2), IgD (IGHD), IgM (IGHM).

      Figure 2f is rather confusing for the reader. I would recommend changing to an alternative plot that shows logP and response in a different way. If keeping to this plot type please clarify why plotting response vs PD, and whether the lower left quadrant indicates patients with progressive disease and the top right indicates responders as the interpretation is not clear currently.

      Thank you for your feedback. To address the concerns raised, we have updated the figure legend for Figure 2f to clarify the interpretation of the quadrants: “The lower left quadrant shows cell types overrepresented in the poor responder groups, while the upper right quadrant indicates cell types overrepresented in the better responder groups”. This clarification aims to help readers understand that the lower left quadrant reflects cell types associated with worse treatment outcomes, while the upper right quadrant reflects cell types associated with improved therapeutic responses.

      The terms "PC7.neg, INT.down, and UNION.down" are included in the results with no explanation to the reader of what they are or how to interpret them. The methods description "We constructed DEGs with intersections (INT) and union (UNION) of up- or down-regulated genes for comparisons" does not sufficiently describe how they were generated/calculated and, therefore, this is difficult for the reader to interpret in the final results section. Please add an additional explanation for the reader in the final section of the results/Figure 5 and in the methods.

      Following the reviewer’s suggestion, we added additional explanation in the Results section (lines 258-261): “PC7.neg denotes genes negatively correlated with PC7, a principal component extracted from PCA that distinguishes tumor cells in poor response groups. INT.down and UNION.down represent the intersection and union of down-regulated genes in the responder group, respectively.” We also explained the details in the Methods section (lines 489-495): “We reconstructed DEGs as four groups: INT.up, INT.down, UNION.up, and UNION.down, based on the intersection (INT) and union (UNION) of up- or down-regulated genes for pairwise comparisons between responder versus non-responder, PR versus PD, and PR versus SD. INT.up and INT.down represent the intersection of up- and down-regulated genes in the responder group, respectively. UNION.up and UNION.down represent the union of up- and down-regulated genes in the responder group, respectively.”
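      The construction of the four gene groups can be sketched with plain set operations. The three pairwise comparisons are those named in the response; the gene names, the function name, and the input layout are hypothetical placeholders, and the sketch assumes each comparison has already been reduced to sets of up- and down-regulated genes in the responder group.

```python
def combine_degs(comparisons):
    """Build INT/UNION gene groups from per-comparison DEG sets.

    `comparisons` maps a comparison name to {"up": set, "down": set},
    where up/down are relative to the responder group.
    """
    ups = [c["up"] for c in comparisons.values()]
    downs = [c["down"] for c in comparisons.values()]
    return {
        "INT.up": set.intersection(*ups),
        "INT.down": set.intersection(*downs),
        "UNION.up": set.union(*ups),
        "UNION.down": set.union(*downs),
    }


# Hypothetical placeholder genes for the three comparisons in the response.
degs = {
    "responder_vs_nonresponder": {"up": {"geneA", "geneB"}, "down": {"geneX", "geneY"}},
    "PR_vs_PD": {"up": {"geneA", "geneC"}, "down": {"geneX"}},
    "PR_vs_SD": {"up": {"geneA"}, "down": {"geneX", "geneZ"}},
}
for name, genes in combine_degs(degs).items():
    print(name, sorted(genes))
```

      The intersection groups keep only genes that change in the same direction in every comparison, so they are the stricter signatures; the union groups are more inclusive.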

      The TRM and Th17+ T cell populations are highlighted in the abstract as being related to ICI resistance, but these populations of cells are not even mentioned in the discussion. Likewise, STAT3 and NFkb pathways are also highlighted in the abstract but absent in the discussion section. Please discuss the relevance of these findings, particularly given the prior studies demonstrating the opposite impact of TRM populations in NSCLC.

      We have expanded the discussion in the revised manuscript (Lines 295-313) to address the roles of TRM and Th17+ T cells, as well as the STAT3 and NF-κB pathways, in association with ICI resistance in NSCLC.

      “The identification of an abundance of CD4+ TRM cells as a negative predictor of ICI response is an unexpected finding, considering that higher frequencies of TRM cells in lung tumor tissues are generally associated with better clinical outcomes in NSCLC (PMID28628092). This is largely due to their role in sustaining high densities of tumor-infiltrating lymphocytes and promoting anti-tumor responses. Additionally, previous studies have demonstrated that TRM cell subsets coexpressing PD-1 and TIM-3 are relatively enriched in patients who respond to PD-1 inhibitors (PMID31227543). However, recent findings suggest that pre-existing TRM-like cells in lung cancer may promote immune evasion mechanisms, contributing to resistance to immune checkpoint blockade therapies (PMID37086716). These observations suggest that the roles of TRM subsets in tumor immunity are highly context-dependent.

      Similarly, CD4+ Th17 cells, which were overrepresented in the non-responder groups, exhibit context-dependent roles in tumor immunity and may be associated with both unfavorable and favorable outcomes (PMID34733609; PMID30941641). In exploring tumor cell signatures linked to ICI response, non-responder attributes were regulated by STAT3 and NFKB1. The STAT3 and NF-κB pathways are crucial for Th17 cell differentiation and T cell activation (PMID24605076; PMID32697822). Notably, STAT3 activation in lung cancer orchestrates immunosuppressive characteristics by inhibiting T-cell mediated cytotoxicity (PMID31848193). The combined influence of the Th17/STAT3 axis and TRM cell activity in predicting ICI response underscores the complexity of these pathways and suggests that their roles in tumor immunity and therapy response warrant further investigation.”

    2. eLife Assessment

      The authors utilized single-cell RNA-seq profiling of non-small cell lung cancer (NSCLC) patient tumor samples to generate useful insights into the determinants of immune checkpoint inhibitor (ICI) responsiveness in NSCLC patients. While some of the findings add weight to the current literature, the analysis is incomplete due to the small cohort size and heterogeneous population, which has limited their ability to draw statistically supported conclusions after adjusting for multiple hypothesis testing, as well as the lack of functional characterization of the findings. This study would benefit from external cohorts to both validate the findings and justify the statistical analysis undertaken.

    3. Reviewer #1 (Public review):

      Summary:

      The authors study the variability of response to immune checkpoint inhibitors in NSCLC patients using single-cell RNA sequencing in a cohort of 26 patients and 33 samples (primary and metastatic sites), mainly focusing on 11 patients and 14 samples for association analyses, to relate response to immune cell fractions and tumor cell expression patterns. The authors find immune cell fraction and clonal expansion differences, as well as tumor expression differences between responders and non-responders, partly validating previous hypotheses, and partly suggesting new markers for ICI response. Integrating immune and tumor sources of signal, the authors claim to improve prediction of response markedly, albeit in a small cohort and using in-sample metrics.

      Strengths:

      - The problem of studying the tumor microenvironment, as well as the interplay between tumor and immune features, is important and interesting and needed to explain heterogeneity of patient response and be able to predict it.
      - Extensive analysis of the scRNAseq data with respect to immune and tumor features on different axes of hypothesis relating to immune response and tumor immune evasion using state-of-the-art methods.
      - The authors provide an interesting scRNAseq data set with well-curated cell types linked to outcomes data, which is valuable.
      - High-quality immune cell type annotation including annotations based on additional ADT data.
      - Integration of TCRseq to confirm subtype of T-cell annotation and clonality analysis.
      - Interesting analysis of cell programs/states of the (predicted) tumor cells and characterization thereof.

      Weaknesses:

      - Generally a very heterogeneous and small cohort, in which adjusting for confounding is hard. Additionally, there are many tests for association with outcome; the necessary multiple-testing adjustments negate the signal and confirmation bias is likely, so the biological take-aways have to be questioned.
      - The authors claim a very high "accuracy" performance; however, given the small cohort and possible overfitting due to the in-sample ROC, the generalization of this to other cohorts is questionable.
      - Due to the small cohort with a lot of variability, more external validation is needed for the results to be convincingly reproducible, especially when talking about the AUC/accuracy of a predictor.
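
      To illustrate the multiple-testing concern raised above, the following is a minimal sketch of a Benjamini-Hochberg adjustment. The p-values are hypothetical and are not taken from the study; the function name is likewise illustrative, and the authors' actual correction procedure is not specified here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four association tests.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
# Count how many tests remain below 0.05 before and after adjustment.
n_raw = sum(p < 0.05 for p in raw)
n_adj = sum(p < 0.05 for p in adj)
print(adj, n_raw, n_adj)
```

      In this toy case, three tests are nominally below 0.05 before adjustment and fewer remain afterwards, which is the kind of signal loss the reviewer describes for a small cohort with many tests.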

    4. Reviewer #2 (Public review):

      Summary:

      The authors have utilised deep profiling methods to generate deeper insights into the features of the TME that drive responsiveness to PD-1 therapy in NSCLC.

      Strengths:

      The main strengths of this work lie in the methodology of integrating single cell sequencing, genetic data and TCRseq data to generate hypotheses regarding determinants of IO responsiveness.

      Some of the findings in this study are not surprising and are well precedented, e.g. the association of Treg, STAT3 and NFkB with ICI resistance and of CD8+ activation with ICI response, and thus act as an additional dataset to add weight to this prior body of evidence. Whilst the role of Th17 in PD-1 resistance has been previously reported (e.g. Cancer Immunol Immunother 2023 Apr;72(4):1047-1058, Cancer Immunol Immunother 2024 Feb 13;73(3):47, Nat Commun. 2021; 12: 2606), these studies have used non-clinical models or peripheral blood readouts. Here the authors have supplemented current knowledge by characterization of the TME of the tumor itself.

      Weaknesses:

      Unfortunately, the study is hampered by the small sample size and heterogeneous population and whilst the authors have attempted to bring in an additional dataset to demonstrate robustness of their approach, the small sample size has limited their ability to draw statistically supported conclusions. There is also limited validation of signatures/methods in independent cohorts and no functional characterisation of the findings. Because of these factors, this work (as it stands) does have value to the field but will likely have a relatively low overall impact.

    1. eLife Assessment

      The manuscript by Kolbeck and co-workers is an important contribution to understanding the physical mechanism that controls a key step in the retroviral infectious cycle. The authors employ a wide range of experimental techniques, complemented with Monte Carlo simulations, that result in convincing evidence of compaction of HIV DNA by the viral integrase. This manuscript would benefit from in-depth discussion and analysis of the biophysical implications of the results.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigate the compaction of HIV DNA by the viral enzyme integrase (IN) in vitro.

      Strengths:

      The authors employ robust techniques, including single-molecule force microscopy and spectroscopy, to investigate the impact of IN-DNA interactions on DNA conformation. Additionally, they interpret their experimental findings using coarse-grained Monte Carlo simulations.

      Weaknesses:

      The authors could provide a more in-depth discussion of the biophysical reasons behind their experimental observations. Currently, there is insufficient analysis to explain why certain behaviors are observed experimentally.

    3. Reviewer #2 (Public review):

      Summary:

      This is a high-quality biophysical study providing valuable new in vitro information on the modes of HIV-1 integrase protein (IN) interaction with the double stranded (ds)DNA.

      Strengths:

      Both main experimental approaches used in this study: magnetic tweezers (MT) and atomic force microscopy (AFM) are used at the state-of-the-art level.

      Weaknesses:

      (1) The findings of Fig. 1 suggest a modest preference of IN oligomers for the processed DNA ends typical of the viral dsDNA in the intasome, and for DNA with blunt ends, relative to IN-oligomer binding to random internal sites on DNA. This is an impressive result. Is it completely new? What was known about it? Can IN oligomers bind and unbind on the timescale of the experiment? Is it an equilibrium preference? Was the effect of Mg2+ on that binding known?

      (2) Regarding the AFM-observed IN-induced DNA bending and looping: how well defined is the DNA crossover angle in the looped state? How many IN molecules typically hold it together? What density of IN per DNA length is needed to observe formation of IN oligomers and their induced DNA bends and loops? It looks like more information on the two dsDNA crossover points held together by IN oligomers can be obtained from AFM images similar to the ones in Fig. S22. In particular, the preferred crossover angle (similar to the bending angle of one DNA) and the total number of IN proteins within the oligomer holding this crossover point together can be extracted from the AFM data at higher resolution.

      (3) Similarly, questions for Fig. 3. What is the typical binding density (i.e. IN per DNA unit length) required for the IN-induced rosette formation? For the IN-induced 3D condensation? I understand that AFM is not a good method to estimate the protein:DNA stoichiometry, as the mica surface and its treatment affect the protein/DNA interactions compared to the bulk solution. But still, in combination with the MT data there should be at least an approximate estimate of the degree of DNA saturation with IN oligomers that causes these sharp cooperative structural transitions of the complex. The fact that higher salt increases the critical concentration of IN for these transitions is consistent with the critical levels of DNA saturation with IN required for each transition. Also, the fact that rosette formation is not observed on shorter 3 kbp DNA but is observed on longer 4.8 kbp and 9 kbp DNA comes from the lower probability of looping in the shorter DNA and can be discussed/interpreted. Maybe the persistence length of the DNA/IN complex at this level of its saturation can be estimated from these data. This persistence length should be shorter than for the bare DNA, as the IN binding induces DNA bending.

      (4) In the section describing the simulations of the IN-induced dsDNA compaction, the authors introduce a very simple model in which the IN tetramer is represented as a bead of ~12 bp in size, similar to the binding site size of a single IN on DNA, with four binding sites for DNA. It would be useful to discuss the published experimental structural data available on IN-DNA complexes to better rationalize this choice of model. In general, more overview of the available information on IN-DNA complexes and discussion of how the present results fit into the general story and add to it would be useful. The authors fit their modeling results to their experimental data to obtain an individual monomeric IN-DNA interaction strength of 5 kBT. What is the geometry of these four DNA binding sites on the IN tetramer? Is it important for the complex structure? Also, the authors mention that additional IN-IN interactions are required to reproduce their AFM results. What is the geometry and the strength of these interactions? It should matter for the structure of the IN-DNA aggregate. For example, if the IN molecules or DNA-bound oligomers were only interacting head-to-tail on the DNA that they bind to, it would lead to filament formation rather than a 3D condensate. What density of IN oligomers on DNA leads to each of the two AFM-observed transitions: (i) the "rosette formation" and (ii) the denser 3D aggregate formation? It may be possible to answer these important questions based on the AFM images. Is higher-resolution AFM, measuring the oligomer sizes and their densities on the DNA, possible?

      (5) Regarding the elastic and viscoelastic properties of the IN-DNA complexes studied in Fig. 4. These are very interesting observations that could take more interpretation. For example, why does the rosette center in Fig. 4C have lower stiffness than the loop area? Is it because in the loops the stiffness is more that of the background, and bare DNA is felt? Does the stiffness of the fully compacted complex in Fig. 4D follow the density of the globule?

      (6) Also, more interpretation of the observed dwell times and velocity distributions of the complex unfolding vs force can be provided, and what it tells us about the interactions that hold this complex together.

      (7) The effect of ALINIs on the structure of the rosette and the denser condensate is interesting. Based on published knowledge of where ALINIs bind to IN and what kind of interactions they prevent, can these results be better interpreted? Maybe the IN-IN interactions that hold the rosette together are the same as the ones that hold the dense aggregate together, but just at higher [IN]? And because fewer IN interactions have to hold large DNA loops in the rosette, they are weaker interactions that are easier to disrupt via the same ALINI-IN interactions?

      (8) Finally, in the discussion it would be quite valuable if the authors could comment on the conclusions based on their findings for the in vivo IN-DNA interactions inside the mature capsid. As there are 100-150 IN molecules per capsid within the very small capsid volume, do all of these IN bunch up together on the dsDNA being synthesized? By the end of the reverse transcription when the vDNA ends are synthesized and processed, can this IN oligomer be re-bound to form the synapse of the vDNA ends?

    4. Reviewer #3 (Public review):

      Summary:

      In this work, the authors' aims and efforts point towards evaluating the interaction mechanisms between viral protein integrase (IN) and viral DNA. They develop a multifaceted approach to probe the effect that IN has on the formation and structure of IN-DNA complexes under different environmental conditions to determine the role of IN in early stages of infection. HIV infection is considered a global pandemic with huge challenges in both treatment and prevention. This work presents a step towards understanding the mechanisms in early infection and thus prevention.

      The experimental work is carried out using single molecule imaging and force spectroscopy, alongside computational verification using Monte-Carlo simulations. The authors use a range of well-established methods to quantitatively evaluate this, pushing forward the current state of the art.

      The paper shows that in the presence of IN, DNA is compacted into a condensate in a biphasic manner, first forming a 'semi-compact' rosette condensate followed by a fully compacted condensate. As HIV DNA must be fully compacted to enter the cell nucleus for infection, this work describes the importance of the role of IN and the conditions required for it to reach a full condensate, and hence provides a new understanding of the early role of IN in infection. Furthermore, the authors show that the semi-compact rosette condensate (i.e. the first phase) is susceptible to IN inhibitors, whereas the second compaction phase is not. This work suggests that infection may be prevented by using inhibitors in the early stages of IN-DNA interaction.

      Strengths:

      The authors present a strong piece of work, using current experimental and computational methods to investigate IN-DNA interactions and to convincingly describe their experimental observations. Firstly the data and analysis shown from AFM and MT experiments convincingly show a two-phase compaction of DNA upon interaction with IN. The authors use Monte-Carlo simulations to model DNA-IN interactions, specifically showing that their experimental results of a two-phase compaction can only be observed via simulations if IN-IN attraction is included.

      The authors' aim of showing the effect of IN on the compaction of DNA was achieved successfully using AFM and MT. Furthermore, the work clearly shows the susceptibility of the partially compacted DNA-IN core to inhibitors. Overall, the conclusions in this paper are well supported by the experimental data, and it is likely that this paper will be used not only as a model for future experimental work exploring other retroviral nucleoprotein condensation but also to develop a deeper understanding of the role of IN inhibitors in infection prevention.

      Finally, the article is written very coherently and is well supported by critical analysis of their findings and appropriate referencing to supplementary figures.

      Overall, this article is very worthy and through extensive and detailed work the authors probe difficult questions regarding HIV infection, which currently poses a huge global risk. The work completed by the authors substantially advances our understanding of HIV infection and can be used by those in the future to probe this question further.

      Weaknesses:

      Important aspects of the methodologies in this paper are not described in detail. For example, force volume curves have been used to evaluate the mechanical properties of the DNA-IN complex. Force-volume measurements are prone to a number of errors, particularly relating to data acquisition and analysis. The methodology presented is not clear on how the data is acquired, whether statically or in amplitude modulation, which affects analysis and interpretation. Although the authors do recognise some of the difficulties with force curve analysis, a more rigorous study could have been provided with citations to additional relevant literature (particularly taking note of the methods).

      A minor point is that it is not clear that the AFM imaging is performed in air, in contrast to AFM force spectroscopy in liquid, which could affect the interpretation of the data and therefore comparisons which are drawn between the two. This is made more challenging as the methodology for the compaction measurements is not described in the methods, and the code is not provided. The source code should be made open-access and available to enable the work to be better understood and reproduced.

    1. eLife Assessment

      This valuable paper reports image analysis pipelines for the automated segmentation of micronuclei and the detection and sorting of micronuclei-containing cells, which could be powerful tools for researchers studying micronuclei. While the development of the pipelines is solid, a proof-of-principle experiment is not entirely conclusive and leaves open the possibility that additional refinements are required, which would be facilitated by a more detailed explanation of the methods used.

    2. Reviewer #1 (Public review):

      DiPeso et al. develop two tools to (i) classify micronucleated (MN) cells, which they call VCS MN, and (ii) segment micronuclei and nuclei with MNFinder. They then use these tools to identify transcriptional changes in MN cells.

      The strengths of this study are:

      (1) Developing highly specialized tools to speed up the analysis of specific cellular phenomena such as MN formation and rupture is likely valuable to the community and neglected by developers of more generalist methods.

      (2) A lot of work and ideas have gone into this manuscript. It is clearly a valuable contribution.

      (3) Combining automated analysis, single-cell labeling, and cell sorting is an exciting approach to enrich phenotypes of interest, which the authors demonstrate here.

      Weaknesses:

      (1) Images and ground truth labels are not shared for others to develop potentially better analysis methods.

      (2) Evaluations of the methods are often not fully explained in the text.

      (3) To my mind, the various metrics used to evaluate VCS MN reveal it not to be terribly reliable. Recall and PPV hover in the 70-80% range except for the PPV for MN+. It is what it is - but do the authors think one has to spend time manually correcting the output or do they suggest one uses it as is?
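
      For readers less familiar with the metrics discussed in point (3), the sketch below shows how recall and positive predictive value (PPV) follow from confusion counts. The counts are hypothetical, chosen only to land in the 70-80% range the reviewer mentions; they are not the authors' data, and the function name is illustrative.

```python
def recall_and_ppv(tp, fp, fn):
    """Compute recall and PPV from true-positive, false-positive and false-negative counts."""
    # Recall: fraction of real positives that the classifier found.
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    # PPV (precision): fraction of predicted positives that are real.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return recall, ppv


# Hypothetical counts for one class of cells.
recall, ppv = recall_and_ppv(tp=75, fp=25, fn=25)
print(f"recall={recall:.2f}, PPV={ppv:.2f}")
```

      With these assumed counts, a quarter of the cells called positive would be false calls, which is why the question of manual correction matters for downstream use.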

    3. Reviewer #2 (Public review):

      Summary:

      Micronuclei are aberrant nuclear structures frequently seen following the missegregation of chromosomes. The authors present two image analysis methods, one robust and another rapid, to identify micronuclei (MN) bearing cells. The authors induce chromosome missegregation using an MPS1 inhibitor to check their software outcomes. In missegregation-induced cells, the authors do not distinguish cells that have MN from those that have MN with additional segregation defects. The authors use RNAseq to assess the outcomes of their MN-identifying methods: they do not observe a transcriptomic signature specific to MN but find changes that correlate with aneuploidy status. Overall, this work offers new tools to identify MN-presenting cells, and it sets the stage with clear benchmarks for further software development.

      Strengths:

      Currently, there are no robust MN classifiers with a clear quantification of their efficiency across cell lines (mIoU score). The software presented here tries to address this gap. GitHub material (tools, protocols, etc) provided is a great asset to naive and experienced computational biologists. The method has been tested in more than one cell line. This method can help integrate cell biology and 'omics' studies.
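
      As a reminder of what the mIoU score mentioned above measures, here is a minimal sketch for binary segmentation masks. It assumes mIoU is averaged over images; other conventions (for example, averaging over classes) exist, and the convention the authors use is not stated here. The masks and function names are hypothetical.

```python
def iou(pred, truth):
    """Intersection over union of two flat binary masks of equal length."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    # Two empty masks agree perfectly, so treat that case as IoU = 1.
    return inter / union if union else 1.0


def mean_iou(pairs):
    """Average the IoU over a list of (prediction, ground truth) mask pairs."""
    return sum(iou(p, t) for p, t in pairs) / len(pairs)


# Hypothetical flattened masks: 1 marks a micronucleus pixel.
pairs = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),  # one extra predicted pixel
    ([0, 1, 1, 0], [0, 1, 1, 0]),  # perfect overlap
]
score = mean_iou(pairs)
print(score)
```

      Because micronuclei occupy few pixels, a single misplaced pixel shifts the IoU substantially, which is one reason scores for small objects tend to look lower than those for whole nuclei.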

      Weaknesses:

      Although the classifier outperforms available tools for MN segmentation as judged by mIoU, it is not yet at a point where it can be reliably applied to functional genomics assays where we expect a range of phenotypic penetrance.

      Spindle checkpoint loss (e.g., MPS1 inhibition) is expected to cause a variety of nuclear atypia: misshapen, multinucleated, and micronucleated cells. It may be difficult to obtain a pure MN population following MPS1 inhibitor treatment, as many cells are likely to present MN among multinucleated or misshapen nuclear compartments. Given this situation, the transcriptomic impact of MN is unlikely to be retrieved using this experimental design, but this does not negate the significance of the work. The discussion will have to consider the nature, origin, and proportion of MN/rupture-only states - for example, lagging chromatids and unaligned chromosomes can result in different states of micronuclei and also distinct cell fates.

    4. Reviewer #3 (Public review):

      Summary:

      The authors develop a method to visually analyze micronuclei using automated methods. The authors then use these methods to isolate MN post-photoactivation and analyze transcriptional changes in cells with and without micronuclei of RPE-1 cells. The authors observe in RPE-1 cells that MN-containing cells show similar transcriptomic changes as aneuploidy, and that MN rupture does not lead to vast changes in the transcriptome.

      Strengths:

      The authors develop a method that allows for automating measurements and analysis of micronuclei. This has been something that the field has been missing for a long time. Using such a method has the potential to advance micronuclei biology. The authors also develop a method to identify cells with micronuclei in real time and mark them using photoconversion and then isolate them via FACS. The authors use this method to study the transcriptome. This method is very powerful as it allows for the sorting of a heterogenous population and subsequent analysis with a much higher sample number than could be previously done.

      Weaknesses:

      The major weakness of this paper is that the results from the RNA-seq analysis are difficult to interpret as very few changes are found to begin with between cells with MN and cells without. The authors have to use a 1.5-fold cut-off to detect any changes in general. This is most likely due to the sequencing read depth used by the authors. Moreover, there are large variances between replicates in experiments looking at cells with ruptured versus intact micronuclei. This limits our ability to assess if the lack of changes is due to truly not having changes between these populations or experimental limitations. Moreover, the authors use RPE-1 cells which lack cGAS, which may contribute to the lack of changes observed. Thus, it is possible that these results are not consistent with what would occur in primary tissues or just in general in cells with a proficient cGAS/STING pathway.

    1. eLife Assessment

      This manuscript provides valuable mechanistic insight into NSCLC progression, both in terms of tumour metastasis and the development of chemoresistance. The authors draw upon a range of techniques and assays and although the evidence shown is solid, suggestions by the two reviewers will strengthen the message. The work presented will be of interest to cancer biologists and more broadly to those interested in NSCLC translational studies.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript entitled "Phosphodiesterase 1A Physically Interacts with YTHDF2 and Reinforces the Progression of Non-Small Cell Lung Cancer" explores the role of PDE1A in promoting NSCLC progression by binding to the m6A reader YTHDF2 and regulating the mRNA stability of several novel target genes, consequently activating the STAT3 pathway and leading to metastasis and drug resistance.

      Strengths:

      The study addresses a novel mechanism involving PDE1A and YTHDF2 interaction in NSCLC, contributing to our understanding of cancer progression.

      Weaknesses:

      The following issues should be addressed:

      (1) The body weight changes and/or survival times of each group in the in vivo metastasis studies should be provided.

      (2) In Figure 7, the direct binding between YTHDF2 and the potential target genes should be further validated by silencing YTHDF2 to observe the half-life of the mRNA levels of target genes, in addition to silencing PDE1A.

      (3) In Figure 7, the potential methylation sites of "A" on the target genes such as SOCS2 should be verified by mutation analysis, followed by m6A IP or reporter assays.

      (4) In Figure 6G, the correlation between the mRNA levels of STAT3 and YTHDF2 needs clarification. According to the authors' mechanism, the STAT3 pathway is activated, rather than upregulation of mRNA levels (or protein levels, as shown in Figure 6F). Figure 7 does not provide evidence that STAT3 is a bona fide target gene regulated by YTHDF2.

      (5) The final figure, which discusses sensitization to cisplatin by PDE1A suppression, does not appear to be closely related to the interaction or regulation of PDE1A/YTHDF2. If the authors claim this is an m6A-associated event, additional evidence is needed. Otherwise, this part could be removed from the manuscript.

    3. Reviewer #2 (Public review):

      This manuscript aims to investigate the biological impact and mechanisms of phosphodiesterase 1A (PDE1A) in promoting non-small cell lung cancer (NSCLC) progression. The authors first analyzed several databases and used three established NSCLC cell lines and a normal cell line to demonstrate that PDE1A is overexpressed in lung cancer and that its expression correlates negatively with patient outcomes. Based on these data, they suggested PDE1A could be considered a novel prognostic predictor in lung cancer treatment and progression. To study the biological function of PDE1A in NSCLC, they focused on testing the effect of genetic and pharmacological inhibition of PDE1A on cell proliferation, migration, and invasion in vitro. They also used an experimental metastasis model via tail vein injection of H1299 cells to test whether PDE1A promoted metastasis. Based on database analysis, they also decided to investigate whether PDE1A promoted angiogenesis by co-culturing NSCLC cells with HUVECs as well as assessing the tumors from the subcutaneous xenograft model. However, in this model, whether PDE1A modulation impacted tumor metastasis was not examined. To address the mechanism of how PDE1A promotes metastasis, the authors again performed a bioinformatic and GSEA enrichment analysis and confirmed that PDE1A indeed activated STAT3 signaling to promote migration. Using IP followed by mass spectrometry, they found that PDE1A is a partner of YTHDF2; the cooperation of PDE1A and YTHDF2 negatively regulated SOCS2 mRNA, as demonstrated by RIP assay, and ultimately activated STAT3 signaling. Finally, the authors shifted direction from metastasis to chemoresistance; specifically, they found that PDE1A inhibition sensitized NSCLC cells to cisplatin through MET and NRF2 signaling.

      Strength:

      Overall, the manuscript was well written and the majority of the data supported the conclusions. The authors used a series of methods including cell lines, animal models, and database analysis to demonstrate the novel roles and mechanism of how PDE1A promotes NSCLC invasion and metastasis as well as cisplatin sensitivity. Given that PDE1A inhibitors have been pursued for clinical use, this study provided valuable findings that have translational potential for NSCLC treatment.

      Weaknesses:

      The role of YTHDF2 in PDE1A-promoted tumor metastasis was not investigated. To make the findings more clinically and physiologically relevant, it would be interesting to test whether inhibition of PDE1A impacts metastasis using lung cancer orthotopic and patient-derived xenograft models. It is also important to use a cisplatin-resistant NSCLC cell line to test whether a PDE1A inhibitor has the potential to sensitize cells to cisplatin in vitro and in vivo. Furthermore, this study relied heavily on different database analyses; although these provided novel and compelling data that were followed up and confirmed in the paper, it is critical to have a detailed statistical description section on data acquisition throughout the manuscript.

    1. eLife Assessment

      This manuscript reports on the crystal structures of two glycosaminoglycan (GAG) lyases from the PL35 family, along with in vitro enzyme activity assays and comprehensive structure-guided mutagenesis. While the study provides structural insights into the broad substrate specificity of these enzymes, the incomplete structural models, lack of key data such as Mn²⁺ binding confirmation, and reliance on basic docking methods diminish the overall impact. Although the work is useful for specialists in carbohydrate-active enzymes, additional data and more rigorous analysis are required to present a complete study.

    2. Reviewer #1 (Public review):

      Summary:

      This study aims to uncover molecular and structural details underlying the broad substrate specificity of glycosaminoglycan lyases belonging to a specific family (PL35). They determined the crystal structures of two such enzymes, conducted in vitro enzyme activity assays, and a thorough structure-guided mutagenesis campaign to interrogate the role of specific residues. They made progress towards achieving their aims but I see significant holes in data that need to be determined and in the authors' analyses.

      Impact on the field:

      I expect this work will have a limited impact on the field, although, with additional experimental work and better analysis, this paper will be able to stand on its own as a solid piece of structure-function analysis.

      Strengths:

      The major strengths of the study were the combination of structure and enzyme activity assays, comprehensive structural analysis, as well as a thorough structure-guided mutagenesis campaign.

      Weaknesses:

      There were several weaknesses, particularly:

      (1) The authors claim to have done an ICP-MS experiment to show Mn2+ binds to their enzyme but did not present the data. The authors could have used the anomalous scattering properties of Mn2+ at the synchrotron to determine the presence and location of this cation (i.e. fluorescence spectra, and/or anomalous data collection at the Mn2+ absorption peak).

      (2) The authors have an over-reliance on molecular docking for understanding the position of substrates bound to the enzyme. The docking analysis performed was cursory at best; Autodock Vina is a fine program, but more rigorous software could have been chosen, as well as molecular dynamics simulations. The authors also do not use any substrate/product-bound structures from the broader PL enzyme family to guide the placement of the substrates in the GAGases and to interpret the molecular docking models.

      (3) The conclusion that the structures of GAGase II and VII are most similar to the structures of alginate lyases (Table 2 data), and the authors' reliance on DALI, are both questioned. DALI uses a global alignment algorithm, which when used for multi-domain enzymes such as these tends to result in sub-optimal alignment of active site residues, particularly if the active site is formed between the two domains as is the case here. The authors should evaluate local alignment methods focused on the optimization of the superposition of a single domain; these methods may result in a more appropriate alignment of the active site residues and different alignment statistics. This may influence the overall conclusion of the evolutionary history of these PL35 enzymes.

      (4) The data on the GAGase III residue His188 is not well interpreted; substitution of this residue clearly impacts HA and HS hydrolysis as well. The data on the impact on alginate hydrolysis is weak, which could be due to the fact that the WT enzyme has poor activity against alginate to start with.

      (5) The authors did not use the words "homology", "homologous", or "homolog" correctly (these terms mean the subjects have an evolutionary relationship, which may or may not be known in the contexts in which the authors used these terms); the words "similarity" and "similar" are recommended instead.

      (6) The authors discuss a "shorter" cavity in GAGases, which does not make sense and is not supported by any figure or analysis. I recommend a figure with a surface representation of the various enzymes of interest, with dimensions of the cavity labeled (as a supplemental figure). The authors also do not specifically define what subsites are in the context of this family of enzymes, nor do they specifically label or indicate the location of the subsites on the figures of the GAGase II and VII enzyme structures.

    3. Reviewer #2 (Public review):

      Summary:

      Wei et al. present the X-ray crystallographic structures of two PL35 family glycosaminoglycan (GAG) lyases that display a broad substrate specificity. The structural data show that there is a high degree of structural homology between these enzymes and GAGases that have previously been structurally characterized. Central to this are the N-terminal (α/α)7 toroid domain and the C-terminal two-layered β-sheet domain. Structural alignment of these novel PL35 lyases with previously deposited structures shows a highly conserved triplet of residues at the heart of the active sites. Docking studies identified potentially important residues for substrate binding and turnover, and subsequent site-directed mutagenesis paired with enzymatic assays confirmed the importance of many of these residues. A third PL35 GAGase that is able to turn over alginate was not crystallized, but a predicted model showed a conserved active site Asn was mutated to a His, which could potentially explain its ability to act on alginate. Mutation of the His into either Ala or Asn abrogated its activity on alginate, providing supporting evidence for the importance of the His. Finally, a catalytic mechanism is proposed for the activity of the PL35 lyases. Overall, the authors used an appropriate set of methods to investigate their claims, and the data largely support their conclusions. These results will likely provide a platform for further studies into the broad substrate specificity of PL35 lyases, as well as for studies into the evolutionary origins of these unique enzymes.

      Strengths:

      The crystallographic data are of very high quality, and the use of modern structural prediction tools to allow for comparison of GAGase III to GAGase II/GAGase VII was nice to see. The authors were comprehensive in their comparison of the PL35 lyases to those in other families. The use of molecular docking to identify key residues and the use of site-directed mutagenesis to investigate substrate specificity was good, especially going the extra distance to mutate the conserved Asn to His in GAGase II and GAGase VII.

      Weaknesses:

      The structural models simply are not complete. A cursory look at the electron density and the models shows that there are many positive density peaks that have not had anything modelled into them. The electron density also does not support the placement of a Mn2+ in the model. The authors indicate that ICP-MS was done to identify the metal, but no ICP-MS data are presented in the main text or supplementary material. I believe the authors put too much emphasis on the possibility of GAGase III representing an evolutionary intermediate between GAG lyases and alginate lyases based on a single Asn to His mutation in the active site, and I don't believe that enough time was spent discussing how this "more open and shorter" catalytic cavity would necessarily mean that the enzyme could accommodate a broader set of substrates. Finally, the proposed mechanism does not bring the enzyme back to its starting state.

    1. eLife Assessment

      In this potentially important study, the authors conducted extensive atomistic and coarse-grained simulations as well as a lattice Monte Carlo analysis to probe the driving force and functional impact of supercomplex formation in the inner mitochondrial membrane. The study highlighted the importance of membrane mechanics to the supercomplex formation and revealed differences in structural and dynamical features of the protein components upon complex formation. In its current form, the analysis is considered incomplete, especially concerning the contributions of membrane mechanics and allosteric coupling of key regions.

    2. Reviewer #1 (Public review):

      This paper by Poverlein et al reports the substantial membrane deformation around the oxidative phosphorylation super complex, proposing that this deformation is a key part of super complex formation. I found the paper interesting and well-written but identified a number of technical issues that I suggest should be addressed:

      (1) Neither the acyl chain chemical makeup nor the protonation state of CDL are specified. The acyl chain is likely 18:2/18:2/18:2/18:2, but the choice of the protonation state is not straightforward.

      (2) The analysis of the bilayer deformation lacks membrane mechanical expertise. Here I am not ridiculing the authors - the presentation is very conservative: they find a deformed bilayer, do not say what the energy is, but rather try a range of energies in their Monte Carlo model - a good strategy for a group that focuses on protein simulations. The bending modulus and area compressibility modulus are part of the standard model for quantifying the energy of a deformed membrane. I suppose in theory these might be computed by looking at the per-lipid distribution in thickness fluctuations, but this route is extremely perilous on a per-molecule basis. Instead, the fluctuation in the projected area of a lipid patch is used to imply the modulus [see Venable et al "Mechanical properties of lipid bilayers from molecular dynamics simulation" 2015 and citations within]. Variations in the local thickness of the membrane imply local variations of the leaflet normal vector (the vector perpendicular to the leaflet surface), which is curvature. With curvature and thickness, the deformation energy is analyzed.

      See the two papers "Gramicidin A Channel Formation Induces Local Lipid Redistribution" by Olaf Andersen and colleagues. Here the formation of a short peptide dimer is experimentally linked to hydrophobic mismatch. The presence of a short lipid reduces the influence of the mismatch. See below regarding their model cardiolipin, which they claim is shorter than the surrounding lipid matrix.

      Also, see: Faraldo-Gomez lab "Membrane transporter dimerization driven by differential lipid solvation energetics of dissociated and associated states", 2021. Mondal et al "Membrane Driven Spatial Organization of GPCRs" 2013 and many citations within these papers.

      While I strongly recommend putting the membrane deformation into standard model terms, I believe the authors should retain the basic conservative approach that the membrane is strongly deformed around the proteins and that making the SC reduces the deformation, then exploring the consequences with their discrete model.
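
      As an illustration of the standard-model route mentioned in point (2), the sketch below estimates an area compressibility modulus from fluctuations in the projected area of a lipid patch, using the commonly quoted fluctuation relation K_A = k_B T <A> / var(A). This is a minimal sketch and not the authors' analysis: the function name, the default temperature, and the sample area values are all assumptions made for illustration.

```python
import statistics

K_B = 1.380649e-23  # Boltzmann constant in J/K


def area_compressibility_modulus(areas_nm2, temperature_k=310.0):
    """Estimate K_A in N/m from a time series of projected patch areas (nm^2).

    Applies the fluctuation relation K_A = k_B * T * <A> / var(A).
    The temperature default is an assumption, not a value from the manuscript.
    """
    mean_area_m2 = statistics.fmean(areas_nm2) * 1e-18      # nm^2 -> m^2
    var_area_m4 = statistics.pvariance(areas_nm2) * 1e-36   # nm^4 -> m^4
    return K_B * temperature_k * mean_area_m2 / var_area_m4


# Hypothetical projected areas sampled from a trajectory (nm^2).
sample_areas = [100.0, 101.0, 99.0, 100.5, 99.5]
k_a = area_compressibility_modulus(sample_areas)
print(f"K_A estimate: {k_a:.3f} N/m")
```

      In practice the area series would come from a long, equilibrated, tension-free simulation, and the estimate would need error bars from block averaging.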

      (1) If CDL matches the hydrophobic thickness of the protein it would disrupt SC formation, not favor it. The authors' hypothesis is that the SC stabilizes the deformed membrane around the separated elements. Lipids that are compatible with the monomer deformed region stabilize the monomer, similarly to a surfactant. That is, if CDL prefers the interface because the interface is thin and their CDL is thin, CDL should prevent SC formation. A simpler hypothesis is that CDL's unique electrostatics are part of the glue.

      (2) Error bars for lipid and Q* enrichments should be computed averaging over multi-lipid regions of the protein interface, e.g., dividing the protein-lipid interface into six to ten domains, in particular functionally relevant regions. Anionic lipids may have long, >500 ns residence times, which makes lipid enrichment large and characterization of error bars challenging in short simulations. Smaller regions will be noisy. The plots depicted in, for example, Figure S2 are noisy.

      (3) The membrane deformation is repeatedly referred to as "entropic" without justification. The bilayer has significant entropic and enthalpic terms just like any biomolecule, why are the authors singling out entropy? The standard "Helfrich" energetic Hamiltonian is a free energy model in that it implicitly integrates over many lipid degrees of freedom.

      (4) Figure S7 shows the surface area per lipid and leaflet height. This appears to show a result that is central to the interpretation of SC formation but which makes very little sense. One simply does not increase both the height and area of a lipid. This is a change in the lipid volume! The bulk compressibility of most anything is much higher than its Young's modulus [similar to area compressibility]. Instead, something else has happened. My guess is that there is *bilayer* curvature around these proteins and that it has been misinterpreted as area/thickness changes with opposite signs of the two leaflets. If a leaflet gets thin, its area expands. If the manuscript had more details regarding how they computed thickness I could help more. Perhaps they measured the height of a specific atom of the lipid above the average mid-plane normal? The mid-plane of a highly curved membrane would deflect from zero locally and could be misinterpreted as a thickness change.

      (5) The authors write expertly about how conformational changes are interpreted in terms of function but the language is repeatedly suggestive. Can they put their findings into a more quantitative form with statistical analysis? "The EDA thus suggests that the dynamics of CI and CIII2 are allosterically coupled."

      (6) The authors write "We find that an increase in the lipid tail length decreases the relative stability of the SC (Figure S5C)" This is a critical point but I could not interpret Figure S5C consistently with this sentence. Can the authors explain this?

      (7) The authors use a 6x6 and 15x15 lattice to analyze SC formation. The SC assembly has 6 units of E_strain favoring its assembly, which they take up to 4 kT. At 3 kT, the SC should be favored by 18 kT, or a Boltzmann factor of 10^8. With only 225 sites, specific and non-specific complex formation should be robust. Can the authors please check their numbers or provide a qualitative guide to the data that would make clear what I'm missing?
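
      The arithmetic in point (7) can be checked with a few lines. The sketch assumes, as the comment does, that six units of E_strain favor assembly and that energies are expressed in units of kT; the function name is hypothetical.

```python
import math


def boltzmann_factor(n_units, e_strain_kt):
    """Return exp(n_units * e_strain_kt), with energies given in units of kT."""
    return math.exp(n_units * e_strain_kt)


# Six strain units at 3 kT each give a stabilization of 18 kT.
factor = boltzmann_factor(6, 3.0)
print(f"Boltzmann factor: {factor:.3e} (log10 = {math.log10(factor):.2f})")
```

      The result is of order 10^8, which is far larger than the 225 sites of the 15x15 lattice, consistent with the expectation that complex formation should be robust at this strain energy.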

      In summary, the qualitative data presented are interesting (especially the combination of molecular modeling with simpler Monte Carlo modeling aiding broader interpretation of the results) ... but confusing in terms of the non-standard presentation of membrane mechanics and the difficulty of this reviewer to interpret some of the underlying figures: especially, the thickness of the leaflets around the protein and the relative thickness of cardiolipin. Resolving the quantitative interpretation of the bilayer deformation would greatly enhance the significance of their Monte Carlo model of SC formation.

    3. Reviewer #2 (Public review):

      Summary:

      The authors have used large-scale atomistic and coarse-grained molecular dynamics simulations on the respiratory chain complex and investigated the effect of the complex on the inner mitochondrial membrane. They have also used a simple phenomenological model to establish that the super complex (SC) assembly and stabilisation are driven by the interplay between the "entropic" forces due to strain energy and the enthalpic forces (specific and non-specific) between lipid and protein domains. The authors also show that the SC in the membrane leads to thinning and that there is preferential localisation of certain lipids (cardiolipin) in the annular region of the complex. The data indicate that the SC assembly has an effect on the conformational dynamics of the individual proteins making up the assembled complex and that they undergo "allosteric crosstalk" to maintain the stable functional complex. From their conformational analyses of the proteins (individually and while in the complex) and membrane "structural" properties (such as thinning, lateral organization, etc.), as well as from the output of their phenomenological lattice model, the authors have provided possible implications and molecular origins for the function of the complex in terms of aspects such as charge currents in the inner mitochondrial membrane, proton transport activity, and ATP synthesis.

      Strengths:

      The work is bold in terms of undertaking modelling and simulation of such a large complex that requires simulations of about a million atoms for long time scales. This requires technical acumen and resources. Also, the effort to make connections to experimental readouts has to be appreciated (though it is difficult to connect functional pathways with limited (additive forcefield) simulations).

      Weakness:

      There are several weaknesses in the paper (please see the list below). Claims such as "entropic effect", "membrane strain energy" and "allosteric cross talks" are not properly supported by evidence and seem far-fetched at times.

      (i) Membrane "strain energy" has been loosely used and no effort is made to explain what the authors mean by the term and how they would quantify it. If the membrane is simulated in stress-free conditions, where are strains building up from?

      (ii) In result #1 (Protein membrane interaction modulates the lipid dynamics ....), I strongly feel that the readouts from simulations are overinterpreted. Membrane lateral organization in terms of lipids having preferential localisation is not new (see doi: 10.1021/acscentsci.8b00143) nor membrane thinning and implications to function (https://doi.org/10.1091/mbc.E20-12-0794). The distortions that are visible could be due to a mismatch in the number of lipids that need to be there between the upper and lower leaflets after the protein complex is incorporated. Also, the physiological membrane will have several chemically different lipids that will minimise such distortions as well as would be asymmetric across the leaflets - none of which has been considered. Connecting chain length to strain energy is also not well supported - are the authors trying to correlate membrane order (Lo vs Ld) with strain energy?

      (iii) Entropic effect: What is the evidence towards the entropic effect? If strain energy is entropic, the authors first need to establish that. They discuss enthalpy-entropy compensation but there is no clear data or evidence to support that argument. The lipids will rearrange themselves or have a preference to be close to certain regions of the protein and that generally arises because of enthalpic reasons (see the body of work done by Carol Robinson with mass spectrometry where certain lipids prefer proteins in the gas phase; certainly there is no entropy at play there). I find the claims of entropic effects very unconvincing.

      (iv) The changes in conformational dynamics are subtle as reported by the authors and the allosteric arguments are made based on normal mode analyses. In the complex, there are large overlapping regions between the CI, CIII2, and SCI/III2. I am not sure how the allosteric crosstalk claim is established in this work - some more analyses and data would be useful. Normal mode analyses (EDA) suggest that the motions are coupled and correlated - I am not convinced that it suggests that there is allosteric cross-talk.

      (v) The lattice model should be described better and the rationale for choosing the equation needs to be established. Specific interactions look unfavourable in the equation as compared to non-specific interactions.

    4. Reviewer #3 (Public review):

      Summary:

      In this contribution, the authors report atomistic, coarse-grained, and lattice simulations to analyze the mechanism of supercomplex (SC) formation in mitochondria. The results highlight the importance of membrane deformation as one of the major driving forces for SC formation, which is not entirely surprising given prior work on membrane protein assembly, but certainly of major mechanistic significance for the specific systems of interest.

      Strengths:

      The combination of complementary approaches, including an interesting (re)analysis of cryo-EM data, is particularly powerful and might be applicable to the analysis of related systems. The calculations also revealed that SC formation has interesting impacts on the structural and dynamical (motional correlation) properties of the individual protein components, suggesting further functional relevance of SC formation. Overall, the study is rather thorough and highly creative, and the impact on the field is expected to be significant.

      Weaknesses:

      In general, I don't think the work contains any obvious weaknesses, although I was left with some questions.

    1. eLife Assessment

      This manuscript presents a valuable finding on the impact of FRMD8 loss on tumor progression and the resistance to tamoxifen therapy. The author conducted systematic experiments to explore the role of FRMD8 in breast cancer and its potential regulatory mechanisms, confirming that FRMD8 affects tamoxifen resistance and convincingly validating this hypothesis through a series of experiments.

    2. Reviewer #1 (Public review):

      Summary:

      Tamoxifen resistance is a common problem in partially ER-positive patients undergoing endocrine therapy, and this manuscript has important research significance as it is based on clinical practical issues. The manuscript discovered that the absence of FRMD8 in breast epithelial cells can promote the progression of breast cancer, thus proposing the hypothesis that FRMD8 affects tamoxifen resistance and validating this hypothesis through a series of experiments. The manuscript has a certain theoretical reference value.

      Strengths:

      At present, research on the role of FRMD8 in breast cancer is very limited. This manuscript leverages the MMTV-Cre+;Frmd8fl/fl;PyMT mouse model to study the role of FRMD8 in tamoxifen resistance, and uses single-cell sequencing to identify the interaction between FRMD8 and ESR1. At the mechanistic level, this manuscript has demonstrated two ways in which FRMD8 affects ERα, providing some new insights into the development of ER-positive breast cancer in patients who are resistant to tamoxifen.

      Weaknesses:

      This manuscript repeatedly emphasizes the role of FRMD8/FOXO3A in tamoxifen resistance in ER-positive breast cancer, but the specific mechanisms have not yet been fully elucidated. Whether FRMD8 can become a biomarker should be verified in large clinical samples or clinical data.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript presents a valuable finding on the impact of FRMD8 loss on tumor progression and the resistance to tamoxifen therapy. The author conducted systematic experiments to explore the role of FRMD8 in breast cancer and its potential regulatory mechanisms, confirming that FRMD8 could serve as a potential target to reverse tamoxifen resistance.

      Strengths:

      The majority of the research is logically clear, smooth, and persuasive.

      Weaknesses:

      Some research in the article lacks depth and some sentences are poorly organized.

    1. eLife Assessment

      In this valuable manuscript, the authors propose that the lysosomal protein LAPTM4B plays a role in suppressing the TGF-β/SMAD signaling pathway and suggest that enhancing LAPTM4B function could be a potential therapeutic strategy for alleviating bleomycin-induced lung fibrosis. The findings will be of interest to the lung disease field, and the data presented to support the authors' conclusions is solid. However, it remains unclear whether the suppressive effect of LAPTM4b on idiopathic pulmonary fibrosis is mediated by Nedd4l.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors propose that LAPTM4B plays a role in suppressing the TGF-β/SMAD signaling pathway and suggest that enhancing LAPTM4B function could be a potential therapeutic strategy for alleviating BLM-induced lung fibrosis. Their data show that LAPTM4B knockdown exacerbates fibrosis phenotypes, both in vivo and in vitro, while LAPTM4B overexpression mitigates these effects by recruiting NEDD4L to destabilize SMAD proteins.

      Strengths:

      The findings are significant for the lung disease field, and the data presented support the authors' conclusions. This work would be of even higher interest after sufficiently addressing the weaknesses listed below.

      Weaknesses:

      Several issues need to be addressed. First, it is unclear why the authors chose to focus on LAPTM4B specifically, rather than other members of the LAPTM family, such as LAPTM4A or LAPTM5. Additionally, the manuscript does not address whether lysosomes are involved in the degradation of ubiquitinated LAPTM4B.

    3. Reviewer #2 (Public review):

      Summary:

      It was previously documented that lysosomal localization of the lysosomal transmembrane proteins LAPTM4 or 5 (including LAPTM4b) is regulated by Nedd4 family ubiquitin ligases, and independently, that Nedd4l regulates IPF (Idiopathic Pulmonary Fibrosis) in mouse lungs via regulation of the TGFb pathway (i.e., Nedd4l lung-specific KO mice develop IPF due to reduced ability to suppress the TGFb pathway; PMID: 32332792). Here, Xu et al investigated the role of LAPTM4b in IPF and suggested that the suppression of IPF by LAPTM4b, which they discovered here, is mediated via its interaction with Nedd4L, which normally suppresses TGFb signaling.

      Strengths:

      Overall, this is an interesting paper that identified for the first time a suppressive role of LAPTM4b in IPF, using both in vivo mouse models and cell culture studies.

      Weaknesses:

      (1) The most obvious shortcoming of this study is the lack of experimental evidence that the suppressive effect of LAPTM4b on IPF is mediated by Nedd4l.

      (2) Along the same lines, despite the authors' claim, overexpression of Nedd4L in cells does not increase SMAD3 ubiquitination (Fig 6D), which is a marker of TGFbR activation. Likewise, in Fig 5E, SMAD2 seems to be ubiquitinated similarly in the presence or absence of LAPTM4b (despite claims that LAPTM4b promotes ubiquitination of SMAD2). Same for K48 ubiquitination of TGFbR (Figure 5H).

      (3) How does LAPTM4b interact with SMAD2 or 3, or TGFbR?

      (4) All immunofluorescence (IF) studies depict 1 or 2 cells, with no quantification or statistics.

      (5) Some of the Western blots (WB) are also not quantified, so any claims of an effect cannot be evaluated without such quantification and statistics.

      (6) In the IF studies showing lung tissue (eg Figure 1B), why is LAPTM4b (wildtype) localized to the plasma membrane instead of lysosomes/endosomes?

    1. eLife Assessment

      This valuable study describes how a single effector of the Type Six Secretion System (T6SS) has two distinct enzymatic functions that together may contribute to bacterial survival and dynamics in a community and provide potential for developing new antimicrobial compounds. The authors have deployed a range of methods in biochemistry, microbiology, and microscopy, generating solid data that support the main assertions. While the manuscript could benefit from additional clarifying experiments and a more detailed discussion of the methods, it will appeal to those studying T6SS, particularly those interested in effectors and bacterial enzymes.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript performs a comprehensive biochemical, structural, and bioinformatic analysis of TseP, a type 6 secretion system effector from Aeromonas dhakensis that includes the identification of a domain required for secretion and residues conferring target organism specificity. Through targeted mutations, they have expanded the target range of a T6SS effector to include a gram-positive species, which is not typically susceptible to T6SS attack.

      Strengths:

      All of the experiments presented in the study are well-motivated and the conclusions are generally sound.

      Weaknesses:

      There are some issues with the clarity of figures. For example, the microscopy figures could have been more clearly presented as cell counts/quantification rather than representative images. Similarly, loading controls for the secreted proteins for the westerns probably should be shown.

      Also, some of the minor/secondary conclusions reached regarding the "independence" of the N and C term domains of the TseP are a bit overreaching.

    3. Reviewer #2 (Public review):

      Summary:

      Wang et al. investigate the role of TseP, a Type VI secretion system (T6SS) effector molecule, revealing its dual enzymatic activities as both an amidase and a lysozyme. This discovery significantly enhances the understanding of T6SS effectors, which are known for their roles in interbacterial competition and survival in polymicrobial environments. TseP's dual function is proposed to play a crucial role in bacterial survival strategies, particularly in hostile environments where competition between bacterial species is prevalent.

      Strengths:

      (1) The dual enzymatic function of TseP is a significant contribution, expanding the understanding of T6SS effectors.

      (2) The study provides important insights into bacterial survival strategies, particularly in interbacterial competition.

      (3) The findings have implications for antimicrobial research and understanding bacterial interactions in complex environments.

      Weaknesses:

      (1) The manuscript assumes familiarity with previous work, making it difficult to follow. Mutants and strains need clearer definitions and references.

      (2) Figures lack proper controls, quantification, and clarity in some areas, notably in Figures 1A and 1C.

      (3) The Materials and Methods section is poorly organized, hindering reproducibility. Biophysical validation of Zn²⁺ interaction and structural integrity of proteins need to be addressed.

      (4) Discrepancies in protein degradation patterns and activities across different figures raise concerns about data reliability.

    4. Reviewer #3 (Public review):

      Summary:

      Type VI secretion systems (T6SS) are employed by bacteria to inject competitor cells with numerous effector proteins. These effectors can kill injected cells via an array of enzymatic activities. A common class of T6SS effector are peptidoglycan (PG) lysing enzymes. In this manuscript, the authors characterize a PG-lysing effector, TseP, from the pathogen Aeromonas dhakensis. While the C-terminal domain of TseP was known to have lysozyme activity, the N-terminal domain was uncharacterized. Here, the authors functionally characterize TsePN as a zinc-dependent amidase. This discovery is somewhat novel because it is rare for PG-lysing effectors to have amidase and lysozyme activity.

      In the second half of the manuscript, the authors utilize a crystal structure of the lysozyme TsePC domain to inform the engineering of this domain to lyse gram-positive peptidoglycan.

      Strengths:

      The two halves of the manuscript considered together provide a nice characterization of a unique T6SS effector and reveal potentially general principles for lysozyme engineering.

      Weaknesses:

      The advantage of fusing amidase and lysozyme domains in a single effector is not discussed but would appear to be a pertinent question. Labeling of the figures could be improved to help readers understand the data.

    1. eLife Assessment

      This work provides a potentially valuable framework for understanding the primary causes of disease. However, the evidence supporting the utility of the approach is incomplete given the reliance on strong assumptions about the underlying causal mechanisms.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript seeks to estimate the causal effect of genes on disease. To do so, the authors introduce a novel algorithm, termed the Root Causal Strength using Perturbations (RCSP) algorithm. RCSP uses perturb-seq to first estimate the gene regulatory network structure among genes, and then uses bulk RNA-seq with phenotype data on the samples to estimate causal effects of genes on the phenotype conditional on the learned network structure. The authors assess the performance of RCSP in comparison to other methods via simulation. Next, they apply RCSP to two real human datasets: 513 individuals with age-related macular degeneration and 137 individuals with multiple sclerosis.

      Strengths:

      The authors tackle an important and ambitious problem - the identification of causal contributors to disease in the context of a causal inference framework. As the authors point out, observational RNA-seq data is insufficient for this kind of causal discovery, since it is very challenging to recover the true underlying graph from observational data; interventional data are needed. However, little perturb-seq data has been generated with annotated phenotype data, and much bulk RNA-seq data has already been generated, so it is useful to propose an algorithm to integrate the two as the authors have done.

      The authors also offer substantial theoretical exposition for their work, bringing to bear both the literature on causal discovery as well as literature on the genetic architecture of complex traits.

      Weaknesses:

      The notion of a "root" causal gene - which the authors define based on a graph theoretic notion of topologically sorting graphs - requires a graph that is directed and acyclic. It is the latter that constitutes an important weakness here - it simply is a large simplification of human biology to draw out a DAG including hundreds of genes and a phenotype Y and to claim that the true graph contains no cycles. This is briefly touched upon in the discussion, but given the fundamental nature of this choice - the manuscript should devote at least some of the main results to exploring the consequence of mischaracterizing true cyclic graphs as DAGs in this framework. For example - consider the authors' analysis of T cell infiltration in multiple sclerosis (MS). CD4+ effector T cells have the interesting property that they are stimulated by IL2 as a growth factor; yet IL2 also stimulates the activation of (suppressive) regulatory T cells. What does it mean to analyze CD4+ regulation in disease with a graph that does not consider IL2 (or other cytokine) mediated feedback loops/cycles?
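
      To make the acyclicity requirement concrete, the sketch below uses Python's standard-library `graphlib` to show that a topological ordering exists for a small DAG but not for a graph with a feedback loop. The node names and edges are toy assumptions chosen for illustration (including the effector-to-IL2 feedback edge); they are not taken from the manuscript or from RCSP itself.

```python
from graphlib import CycleError, TopologicalSorter


def topological_order_or_none(predecessors):
    """Return a topological order of the graph, or None if it contains a cycle.

    `predecessors` maps each node to the set of nodes with edges into it.
    """
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None


# Toy DAG: A -> B -> Y, so A is the only node without parents.
acyclic = {"B": {"A"}, "Y": {"B"}}

# Toy feedback loop (an assumption for illustration): IL2 -> Teff, IL2 -> Treg, Teff -> IL2.
cyclic = {"Teff": {"IL2"}, "Treg": {"IL2"}, "IL2": {"Teff"}}

print(topological_order_or_none(acyclic))
print(topological_order_or_none(cyclic))
```

      With a cycle present, no node in the loop can be placed first, so a definition of "root" that depends on topological order is undefined for those nodes.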

      I also encourage the authors to consider more carefully when graph structure learned from perturb-seq can be ported over to bulk RNA-seq. Consider again the MS CD4+ example - the authors first start with a large perturb-seq experiment (Replogle et al., 2022) performed in K562 cells. To what extent are K562 cells, which are derived from a leukemia cell line, suitable for learning the regulatory structure of CD4+ cells from individuals with an MS diagnosis? Presumably this structure is not exactly correct - to what extent is the RCSP algorithm sensitive to false edges in this graph? This leap - from cell line to primary human cells - is also not modeled in the simulation. Although challenging - it would be ideal for the RCSP to model or reflect the challenges in correctly identifying the regulatory structure.

      It should also be noted that in most perturb-seq experiments, the entire genome is not perturbed, and frequently important TFs (that presumably are very far "upstream" and thus candidate "root" causal genes) are not expressed highly enough to be detected with scRNA-seq. In that context - perhaps slightly modifying the language regarding RCSP's capabilities might be helpful for the manuscript - perhaps it would be better to describe it as an algorithm for causal discovery among a set of genes that were perturbed and measured, rather than a truly complete search for causal factors. Perhaps more broadly - it would also benefit the manuscript to devote slightly more text to describing the kinds of scenarios where RCSP (and similar ideas) would be most appropriately applied - perhaps a well-powered, phenotype annotated perturb-seq dataset performed in a disease relevant primary cell.

    3. Reviewer #2 (Public review):

      Summary:

      This paper presents a very interesting use of a causal graph framework to identify the "root genes" of a disease phenotype. Root genes are the genes that cause a cascade of events that ultimately leads to the disease phenotype, assuming the disease progression is linear.

      Strengths:

      - The methodology has a solid theoretical background.
      - This is a novel use of the causal graph framework to infer root causes in a graph.

      Weaknesses:

      (1) General Comments

      First, I have some general comments. I would argue that the main premise of the study might be inaccurate or incomplete. There are three major attributes of real biological systems that are not considered in this work.

      One is that the process from health-to-disease is not linear most of the time with many checks along the way that aim to prevent the disease phenotype. This leads to a non-deterministic nature of the path from health-to-disease. In other words, with the same root gene perturbations, and depending on other factors outside of gene expression, someone may develop a phenotype in a year, another in 10 years and someone else never. Claiming that this information is included in the error terms might not be sufficient to address this issue. The authors should discuss this limitation.

      Two, the paper assumes that the network connectivity will remain the same after perturbation. This is not always true due to backup mechanisms in the cells. For example, suppose that a cell wants to create product P and it can do it through two alternative paths:

      Path #1: A -> B -> P
      Path #2: A -> C -> P

      Now suppose that path #1 is more efficient, so when B can be produced, path #2 is inactive. Once the perturbation blocks element B from being produced, the graph connectivity changes by activation of path #2. I did not see the authors taking this into consideration, which seems to be a major limitation in using perturb-seq results to infer connectivities.
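
      The rewiring concern can be sketched with a toy rule, assuming (hypothetically) that path #2 is switched on only when B cannot be produced. The function name active_edges and the switching rule are illustrative assumptions, not a model taken from the manuscript.

```python
def active_edges(knocked_out=frozenset()):
    """Return the edges in use for producing P, given a set of knocked-out nodes."""
    if "B" not in knocked_out:
        # Efficient route available: the backup route through C stays silent.
        return {("A", "B"), ("B", "P")}
    # B is blocked: the cell falls back to the alternative route through C.
    return {("A", "C"), ("C", "P")}

unperturbed = active_edges()
b_knockout = active_edges({"B"})

# Edges that become observable only because of the perturbation itself.
perturbation_only = b_knockout - unperturbed
```

      In this toy case the edges seen under B knock-out do not overlap with those active in the unperturbed cell, which is the situation in which a graph learned from perturbation data would not transfer to observational data.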

      Three, there is substantial system heterogeneity that may cause the same phenotype. This goes beyond the authors' claim that although the initial gene causes of a disease may differ from person to person, at some point they will all converge to changes in the same set of "root genes". This is not true for many diseases, which are defined based on symptoms and lab tests at the patient level. You may have two completely different molecular pathologies that lead to the development of the same symptoms and test results. Breast cancer with its subtypes is a prime example of that. In theory, this issue could be addressed with an infinite sample size. However, this assumption is largely violated in all existing biological datasets.

      All the above limit the usefulness of this method for most chronic diseases, although it might still lead to interesting discoveries in cancer (in which the association between genes' dysregulation and development of cancer is more direct and occurs over a shorter time).

      With these in mind, the theoretical and algorithmic advances this paper offers are interesting. And the theoretical proofs are solid.

      (2) Specific comments.

      I am curious how the simulated data were generated and processed. Specifically, were the values of the synthetic variables Z-scored? If not, then I would expect that the variance of every variable will increase from the roots of the graph to the leaves. That will give an advantage to any algorithm aiming to identify causal relations based on error terms. For fairness and completeness, the authors should Z-score the values in the synthetic data and compare the results.
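
      The variance argument can be illustrated with a minimal sketch, assuming a linear chain SEM X_k = weight * X_(k-1) + E_k with independent noise terms. This is an assumption for illustration, not the authors' simulation design, and the function names are hypothetical.

```python
from statistics import fmean, pstdev

def chain_variances(n_vars, weight=1.0, noise_var=1.0):
    """Marginal variances of X_1..X_n in the chain X_k = weight * X_(k-1) + E_k.

    Because E_k is independent of X_(k-1),
    Var(X_k) = weight**2 * Var(X_(k-1)) + noise_var.
    """
    variances, var = [], 0.0
    for _ in range(n_vars):
        var = weight ** 2 * var + noise_var
        variances.append(var)
    return variances

def zscore(values):
    """Standardize a sample to mean 0 and population standard deviation 1."""
    mu, sd = fmean(values), pstdev(values)
    return [(v - mu) / sd for v in values]

raw = chain_variances(4)            # variance grows from root to leaf
standardized = zscore([2.0, 4.0, 6.0])
```

      With unit weights and unit noise the variances along the chain are 1, 2, 3, 4, so ranking variables by variance alone would recover the causal order; after Z-scoring every variable has unit variance and that shortcut disappears.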

      The algorithm seems to require both RNA-seq and Perturb-seq data (Algorithm 1, page 14). Can it function with RNA-seq data only? What will be different in this case?

      (3) Additional comments:

      Although the manuscript is generally written clearly, some parts are not clear and others have missing details that make the narrative difficult to follow. Some specific examples:

      - Synthetic data generation: how many different graphs (SEMs) did they start from? (30?) How many samples per graph? Did they test different sample sizes?
      - The presentation of comparative results (Suppl fig 4 and 7) is not clear. No details are given on how these results were generated. (What does "The first column denotes the standard deviation of the outputs for each algorithm" mean?) Why do all other methods have higher SD differences than RCSP? Is it a matter of scaling? Shouldn't they have at least some values near zero since the authors "added the minimum value so that all histograms begin at zero"? Also, why do the RCSP results look more like a negative binomial distribution while every other method looks roughly normal?
      - What is the significance of genes changing expression "from left to right" in a UMAP plot? (e.g., Fig. 3h and 3g)

      The authors somewhat overstate the novelty of their algorithm. Representation of GRNs as causal graphs dates back to 2000 with the work of Nir Friedman in yeast. Other methods were developed more recently that look at regulatory network changes at the single sample level, of which the authors do not seem to be aware (e.g., Ellington et al, NeurIPS 2023 workshop GenBio and Bushur et al, 2019, Bioinformatics are two such examples). The methods they mention are for single cell data and they are not designed to connect single sample-level changes to a person's phenotype. The RCS method needs to be put in the right background context in order to bring out what is really novel about it.

    4. Reviewer #3 (Public review):

      Summary:

      The authors provide an interesting and novel approach, RCSP, to determining what they call the "root causal genes" for a disease, i.e. the most upstream, initial causes of disease. RCSP leverages perturbation (e.g. Perturb-seq) and observational RNA-seq data, the latter from patients. They show using both theory and simulations that if their assumptions hold then the method performs remarkably well, compared to both simple and available state-of-the-art baselines. Whether the required assumptions hold for real diseases is questionable. They show superficially reasonable results on AMD and MS.

      Strengths:

      The idea of integrating perturbation and observational RNA-seq datasets to better understand the causal basis of disease is powerful and timely. We are just beginning to see genome-wide perturbation assays, albeit in limited cell-types currently. For many diseases, research cohorts have at least bulk observational RNA-seq from a/the disease-relevant tissue(s). Given this, RCSP's strategy of learning the required causal structure from perturbations and applying this knowledge in the observational context is pragmatic and will likely become widely applicable as Perturb-seq data in more cell-types/contexts becomes available.

      The causal inference reasoning is another strength. A more obvious approach would be to attempt to learn the causal network structure from the perturbation data and leverage this in the observational data. However, structure learning in high-dimensions is notoriously difficult, despite recent innovations such as differentiable approaches. The authors notice that to estimate the root causal effect for a gene X, one only needs access to a (superset of) the causal ancestors of X: much easier relationships to detect than the full network.

      The applications are also reasonably well chosen, being some of the few cases where genome-scale perturb-seq is available in a roughly appropriate (see below) cell-type, and observational RNA-seq is available at a reasonable sample size.

      Weaknesses:

      Several assumptions of the method are problematic. The most concerning is that the observational expression changes are all causally upstream of disease. There is work using Mendelian randomization (MR) showing that the _opposite_ is more likely to be true: most differential expression in disease cohorts is a consequence rather than a cause of disease (https://www.nature.com/articles/s41467-021-25805-y). Indeed, the oxidative stress of AMD has known cellular responses including the upregulation of p53. The authors need to think carefully about how this impacts their framework. Can the theory say anything in this light? Simulations could also be designed to address robustness.

      A closely related issue is the DAG assumption of no cycles. This assumption is brought to bear because it is required for much classical causal machinery, but is unrealistic in biology where feedback is pervasive. How robust is RCSP to (mild) violations of this assumption? Simulations would be a straightforward way to address this.

      The authors spend considerable effort arguing that technical sampling noise in X can effectively be ignored (at least in bulk). While the mathematical arguments here are reasonable, they miss the bigger picture point that the measured gene expression X can only ever be a noisy/biased proxy for the expression changes that caused disease: 1) Those events happened before the disease manifested, possibly early in development for some conditions like neurodevelopmental disorders. 2) bulk RNA-seq gives only an average across cell-types, whereas specific cell-types are likely "causal". 3) only a small sample, at a single time point, is typically available. Expression in other parts of the tissue and at different times will be variable.

      My remaining concerns are more minor.

      While there are connections to the omnigenic model, the latter is somewhat misrepresented. 1) The authors refer to the "core genes" of the omnigenic model as being at the end (longitudinally) of pathogenesis. The omnigenic model makes no statements about temporal ordering: in causal inference terminology the core genes are simply the direct cause of disease. 2) "Complex diseases often have an overwhelming number of causes, but the root causal genes may only represent a small subset implicating a more omnigenic than polygenic model" A key observation underlying the omnigenic model is that genetic heritability is spread throughout the genome (and somewhat concentrated near genes expressed in disease relevant cell types). This implies that (almost) all expressed genes, or their associated (e)SNPs, are "root causes".

      The claim that root causal genes would be good therapeutic targets feels unfounded. If these are highly variable across individuals then the choice of treatment becomes challenging. By contrast the causal effects may converge on core genes before impacting disease, so that intervening on the core genes might be preferable. The jury is still out on these questions, so the claim should at least be made hypothetical.

      The closest thing to a gold standard I believe we have for "root causal genes" is integration of molecular QTLs and GWAS, specifically coloc/MR. Here the "E" of RCSP are explicitly represented as SNPs. I don't know if there is good data for AMD but there certainly is for MS. The authors should assess the overlap with their results. Another orthogonal avenue would be to check whether the root causal genes change early in disease progression.

      The available perturb-seq datasets have limitations beyond the control of the authors. 1) The set of genes that are perturbed. The authors address this by simply sub-setting their analysis to the intersection of genes represented in the perturbation and observational data. However, this may mean that a true ancestor of X is not modeled/perturbed, limiting the formal claims that can be made. Additionally, some proportion of genes that are nominally perturbed show little to no actual perturbation effect (for example, due to poor guide RNA choice) which will also lead to missing ancestors.

      The authors provide no mechanism for statistical inference/significance for their results at either the individual or aggregated level. While I am a proponent of using effect sizes more than p-values, there is still value in understanding how much signal is present relative to a reasonable null.

      I agree with the authors that age coming out as a "root cause" is potentially encouraging. However, it is also quite different in nature to expression, including being "measured" exactly. Will RCSP be biased towards variables that have lower measurement error?

      Finally, it's a stretch to call K562 cells "lymphoblasts". They are more myeloid than lymphoid.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Many thanks to the editors for reviewing the revised manuscript.

      We are very grateful to the Reviewers for their time and for the appreciation of the revision.

      We thank Reviewer 3 for acknowledging the use of sulforhodamine B (SRB) fluorescence as a real-time readout of astrocyte volume dynamics. Experimental data in brain slices were provided to validate this approach.

      The incomplete matching of our observations with earlier data reported in cultured astrocytes (e.g., Solenov et al., AJP-Cell, 2004) might reflect properties of cultured astrocytes that differ from their slice/in vivo counterparts, as discussed in the manuscript.

      The study (T.R. Murphy et al., Front Cell Neurosci., 2017) showed that AQP4 knockout increased the extent of astrocyte swelling in response to hypoosmotic solution in brain slices (Fig 9), and discussed '... AQP4 can provide an efficient efflux pathway for water to leave astrocytes.' Correspondingly, our data suggest that AQP4 mediates astrocyte water efflux in basal conditions.

      We have discussed the study (Igarashi et al., NeuroReport 2013); our current data would help to understand the cellular mechanisms underlying the finding of Igarashi et al.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Pham and colleagues provide an illuminating investigation of aquaporin-4 water flux in the brain utilizing ex vivo and in vivo techniques. The authors first show in acute brain slices, and in vivo with fiber photometry, SRB-loaded astrocytes swell after inhibition of AQP4 with TGN-020, indicative of tonic water efflux from astrocytes in physiological conditions. Excitingly, they find that TGN-020 increases the ADC in DW-MRI in a region-specific manner, potentially due to AQP4 density. The resolution of the DW-MRI cannot distinguish between intracellular or extracellular compartments, but the data point to an overall accumulation of water in the brain with AQP4 inhibition. These results provide further clarity on water movement through AQP4 in health and disease.

      Overall, the data support the main conclusions of the article, with some room for more detailed treatment of the data to extend the findings.

      Strengths:

      The authors have a thorough investigation of AQP4 inhibition in acute brain slices. The demonstration of tonic water efflux through AQP4 at baseline is novel and important in and of itself. Their further testing of TGN-020 in hyper- and hypo-osmotic solutions shows the expected reduction of swelling/shrinking with AQP4 blockade.

      Their experiment with cortical spreading depression further highlights the importance of water efflux from astrocytes via AQP4 and transient water fluxes as a result of osmotic gradients. Inhibition of AQP4 increases the speed of tissue swelling, pointing to a role in the efflux of water from the brain.

      The use of DW-MRI provides a non-invasive measure of water flux after TGN-020 treatment.

      We thank the reviewer for the insightful comments.

      Weaknesses:

      The authors specifically use GCaMP6 and light sheet microscopy to image their brain sections in order to identify astrocytic microdomains. However, their presentation of the data neglects a more detailed treatment of the calcium signaling. It would be quite interesting to see whether these calcium events are differentially affected by AQP4 inhibition based on their cellular localization (ie. processes vs. soma vs. vascular end feet which all have different AQP4 expressions).

      Following the suggestion, we provide new data on the effect of AQP4 inhibition on spontaneous calcium signals in perivascular astrocyte end-feet. As shown now in Fig.S2, acute application of TGN020 induced Ca2+ oscillations in astrocyte end-feet regions where the GCaMP6 labeling lines the profile of the blood vessel. It is noted that on average, the strength of basal Ca2+ signals in the end-feet is higher than that observed across global astrocyte territories (4.65 ± 0.55 vs. 1.45 ± 0.79, p < 0.01), as does the effect of TGN (8.4 ± 0.62 vs. 6.35 ± 0.97, p < 0.05; Fig S2 vs. Fig 2B). This likely reflects the enrichment of AQP4 in astrocyte end-feet. We describe the data in Fig.S2, and on page 8, line 20 – 23.  

      We now use the transgenic line GLAST-GCaMP6 for cytosolic GCaMP6 expression in astrocytes. Spontaneous calcium signals, reflected by transient fluorescence rises, occur in discrete micro-domains whereas the basal GCaMP6 fluorescence in the soma is weak. In the present condition, it is difficult to unambiguously discriminate astrocyte soma from the highly intermingled processes. 

      The authors show that inhibition of AQP4 with TGN-020 shortens the onset time of the swelling associated with cortical spreading depression in brain slices. However, they do not show quantification for many of the other features of CSD swelling (i.e., the duration of swelling, speed of swelling, recovery from swelling).

      Regarding the features of the CSD swelling, we have performed a new analysis to quantify the duration of swelling, speed of swelling and the recovery time from swelling in the control condition and in the presence of TGN-020. The new analysis is now summarized in Fig. S5. Blocking AQP4 with TGN-020 increases the swelling speed, prolongs the duration of swelling and slows down the recovery from swelling, confirming our observation that acute inhibition of AQP4 water efflux facilitates astrocyte swelling while restraining shrinking. We describe the result on page 11, line 19-21.

      Significance:

      AQP4 is a bidirectional water channel that is constitutively open, thus water flux through it is always regulated by local osmotic gradients. Still, characterizing this water flux has been challenging, as the AQP4 channel is incredibly water-selective. The authors here present important data showing that the application of TGN-020 alone causes astrocytic swelling, indicating that there is constant efflux of water from astrocytes via AQP4 in basal conditions. This has been suggested before, as the authors rightfully highlight in their discussion, but the evidence had previously come from electron microscopy data from genetic knockout mice.

      AQP4 expression has been linked with the glymphatic circulation of cerebrospinal fluid through perivascular spaces since its rediscovery in 2012 [1]. Further studies of aging[2], genetic models[3], and physiological circadian variation[4] have revealed it is not simply AQP4 expression but AQP4 polarization to astrocytic vascular endfeet that is imperative for facilitating glymphatic flow. Still, a lingering question in the field is how AQP4 facilitates fluid circulation. This study represents an important step in our understanding of AQP4's function, as the basal efflux of water via AQP4 might promote clearance of interstitial fluid to allow an influx of cerebrospinal fluid into the brain. Beyond glymphatic fluid circulation, clearly, AQP4-dependent volume changes will differentially alter astrocytic calcium signaling and, in turn, neuronal activity.

      (1) Iliff, J.J., et al., A Paravascular Pathway Facilitates CSF Flow Through the Brain Parenchyma and the Clearance of Interstitial Solutes, Including Amyloid β. Sci Transl Med, 2012. 4(147): p. 147ra111.

      (2) Kress, B.T., et al., Impairment of paravascular clearance pathways in the aging brain. Ann Neurol, 2014. 76(6): p. 845-61.

      (3) Mestre, H., et al., Aquaporin-4-dependent Glymphatic Solute Transport in the Rodent Brain. eLife, 2018. 7.

      (4) Hablitz, L., et al., Circadian control of brain glymphatic and lymphatic fluid flow. Nature Communications, 2020. 11(1).

      We thank the reviewer for acknowledging the significance of our study and its functional implications for the brain glymphatic system. We have now highlighted the mentioned studies as well as the potential implications for glymphatic fluid circulation (page 4, line 9-10; page 5, line 1-3; and page 19, line 3-10).

      Reviewer #2 (Public Review):

      Summary:

      The paper investigates the role of the astrocyte-specific aquaporin-4 (AQP4) water channel in mediating water transport within the mouse brain and the impact of the channel on astrocyte and neuron signaling. Through various experiments including epifluorescence and light sheet microscopy in mouse brain slices, and fiber photometry or diffusion-weighted MRI in vivo, the researchers observe that acute inhibition of AQP4 leads to intracellular water accumulation and swelling in astrocytes. This swelling alters astrocyte calcium signaling and affects neighboring neuron populations. Furthermore, the study demonstrates that AQP4 regulates astrocyte volume, influencing mainly the dynamics of water efflux in response to osmotic challenges or associated with cortical spreading depolarization. The findings suggest that AQP4-mediated water efflux plays a crucial role in maintaining brain homeostasis, and indicate a main role for AQP4 in this mechanism. However, although the authors highlight that the report sheds light on the mechanisms by which astrocyte aquaporin contributes to the water environment in the brain parenchyma, the mechanism underlying these effects remains unclear and was not investigated. The manuscript requires revision.

      Strengths:

      The paper elucidates the role of the astrocytic aquaporin-4 (AQP4) channel in brain water transport, its impact on water homeostasis, and signaling in the brain parenchyma. Conceptually, the paper follows a set of complementary experiments combining various ex vivo and in vivo techniques from microscopy to magnetic resonance imaging. The research is valuable, confirms previous findings, and provides novel insights into the effect of acute blockage of the AQP4 channel using TGN-020.

      We thank the reviewer for the constructive comments.

      Weaknesses:

      Despite the interdisciplinary approach, the quality of the manuscript raises doubts regarding the significance of the findings and undermines the novelty claimed by the authors. The paper lacks a comprehensive exploration or mention of the underlying molecular mechanisms driving the observed effects of astrocytic aquaporin-4 (AQP4) channel inhibition on brain water transport and brain signaling dynamics. The scientific background is not well developed in the introduction and discussion sections. Important or recent reports from the field are missing, incompletely cited, or misinterpreted. Several citations to original works are missing that would clarify certain conclusions. This applies especially to the basis of the glymphatic system concept and to recently published reports of similar content. The use of TGN-020, instead of, e.g., the available AQP4 blocker AER-270(271), is not explained. While employing various experimental techniques adds depth to the findings, some of the reasoning behind the employed techniques - especially regarding MRI - is unclear or seemingly inaccurate. In most cases the number of subjects examined is missing or mentioned only roughly in the figure captions, and statistical tests are missing or wrongly applied, which limits assessment and reproducibility of the results. In some cases, it seems that two different statistical tests were used for the same or linked types of data, so the results are contradictory, which appears unlikely based on the figures. Addressing these limitations could strengthen the paper's impact and utility within the field of neuroscience; however, it also seems that supplementary experiments are required to improve the report.

      The current data hint at a tonic water efflux through astrocyte AQP4 under physiological conditions, which helps to understand brain water homeostasis and the functional implications for the glymphatic system. The underlying molecular and cellular mechanisms appear multifaceted and functionally interconnected, as discussed (page 14, line 8 – page 15, line 3). We agree that a comprehensive exploration will further advance our understanding.

      The introduction and discussion are now strengthened by incorporating the important advances in glymphatic system while highlighting the relevant studies. 

      The use of TGN-020 was based on its validation by a wide range of ex vivo and in vivo studies, including the use of heterologous expression systems and AQP4 KO mice. The validation of AER-270(271, the water-soluble prodrug) using AQP4 KO mice was reported recently (Giannetto et al., 2024). AER-271 was noted to impact brain water ADC (apparent diffusion coefficient evaluated by diffusion-weighted MRI) in AQP4 KO mice ~75 min after the drug application (Giannetto et al., 2024). This likely reflects that AER-270(271) is also an inhibitor of κΒ nuclear factor (NF-κΒ), whose inhibition could reduce CNS water content independent of AQP4 targeting (Salman et al., 2022). In addition, the inhibition efficiency of AER-270(271) seems lower than that of TGN-020 (Farr et al., 2019; Giannetto et al., 2024; Huber et al., 2009; Salman et al., 2022). We have now supplemented this information in the manuscript (page 7, line 1-6 and page 15, line 7-17).

      The description on the DW-MRI is now updated (page 4, line 10-14). 

      We also performed new experiments and data analysis as described in a point-to-point manner below in the section ‘Recommendations For The Authors’.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors propose that astrocytic water channel AQP4 represents the dominant pathway for tonic water efflux without which astrocytes undergo cell swelling. The authors measure changes in astrocytic sulforhodamine fluorescence as the proxy for cell volume dynamics. Using this approach, they perform a technically elegant series of ex vivo and in vivo experiments exploring changes in astrocytic volume in response to AQP4 inhibitor TGN-020 and/or neuronal stimulation. The key finding is that TGN-020 produces an apparent swelling of astrocytes and modifies astrocytic cell volume regulation after spreading depolarizations. Additionally, systemic application of TGN-020 produced changes in diffusion-weighted MRI signal, which the authors interpret as cellular swelling. This study is perceived as potentially significant. However, several technical caveats should be strongly considered and perhaps addressed through additional experiments.

      Strengths:

      (1) This is a technically elegant study, in which the authors employed a number of complementary ex vivo and in vivo techniques to explore functional outcomes of aquaporin inhibition. The presented data are potentially highly significant (but see below for caveats and questions related to data interpretation).

      (2) The authors go beyond measuring cell volume homeostasis and probe for the functional significance of AQP4 inhibition by monitoring Ca2+ signaling in neurons and astrocytes (GCaMP6 assay).

      (3) Spreading depolarizations represent a physiologically relevant model of cellular swelling. The authors use ChR2 optogenetics to trigger spreading depolarizations. This is a highly appropriate and much-appreciated approach.

      We thank the reviewer for the effort in evaluating our work.

      Weaknesses:

      (1) The main weakness of this study is that all major conclusions are based on the use of one pharmacological compound. In the opinion of this reviewer, the effects of TGN-020 are not consistent with the current knowledge on water permeability in astrocytes and the relative contribution of AQP4 to this process.

      Specifically: Genetic deletion of AQP4 in astrocytes reduces plasmalemmal water permeability by ~two- to three-fold (when measured at 37 °C, Solenov et al., AJP-Cell, 2004). This is a significant difference, but it is thought to have limited/no impact on water distribution. Astrocytic volume and the degree of anisosmotic swelling/shrinkage are unchanged because the water permeability of the AQP4-null astrocytes remains high. This has been discussed at length in many publications (e.g., MacAulay et al., Neuroscience, 2004; MacAulay, Nat Rev Neurosci, 2021) and is acknowledged by Solenov and Verkman (2004).

      Keeping this limitation in mind, it is important to validate astrocytic cell volume changes using an independent method of cell volume reconstruction (diameter of sulforhodamine-labeled cell bodies? 3D reconstruction of EGFP-tagged cells? Else?)

      Solenov and colleagues used the calcein quenching assay and KO mice to demonstrate that AQP4 is a functional water channel in cultured astrocytes (Solenov et al., 2004). AQP4 deletion reduced both astrocyte water permeability and the absolute amplitude of swelling over comparable time, and also slowed down cell shrinking, which overall parallels our results from acute AQP4 blocking. Yet in Solenov's study, the time to swelling plateau was prolonged in AQP4 KO astrocytes, differing from our data from acute pharmacological blocking. This discrepancy may be due to compensatory mechanisms in chronic AQP4 KO, or reflect the different volume responses in cultured astrocytes compared with brain slices or in vivo, as suggested previously (Risher et al., 2009).

      Soma diameter might be an indicator of cell volume change, yet it is challenging with our current fluorescence imaging method that is diffraction-limited and insufficient to clearly resolve the border of the soma in situ. In addition, the lateral diameter of cell bodies may not faithfully reflect the volume changes that can occur in all three dimensions. Rapid 3D imaging of astrocyte volume dynamics with sufficient high Z-axis resolution appears difficult with our present tools. 

      We have now updated the discussion accordingly, citing the relevant literature (page 17 line 14 – page 18, line 3).

      (2) TGN-020 produces many effects on the brain, with some but not all of the observed phenomena sensitive to the genetic deletion of AQP4. In the context of this work, it is important to note that TGN-020 does not completely inhibit AQP4 (70% maximal inhibition in the original oocyte study by Huber et al., Bioorg Med Chem, 2009). Thus, besides not knowing TGN-020 levels inside the brain, even "maximal" AQP4 inhibition would not be expected to dramatically affect water permeability in astrocytes.

      This caveat may be addressed through experiments using local delivery of structurally unrelated AQP4 blockers, or, preferably, AQP4 KO mice.

      It is an important point that TGN-020 partially blocks AQP4, implying the actual functional impact of AQP4 per se might be stronger than what we observed. TGN-020 provides a means to acutely probe AQP4 function in situ; still, we agree that its limitation needs to be acknowledged. We mention this now on page 15, line 7-9 and 14-17.

      We agree that local delivery of an alternative blocker will provide additional information. Meanwhile, local delivery requires the stereotaxic implantation of a cannula, which would cause inflammation in surrounding astrocytes (and neurons). The recently introduced AQP4 blocker AER-270(271) has drawn attention because it influences brain water dynamics (ADC in DW-MRI) in AQP4 KO mice (Giannetto et al., 2024), recalling that AER-270(271) is also an inhibitor of κΒ nuclear factor (NF-κΒ). This pathway can potentially perturb CNS water content and influence brain fluid circulation in an AQP4-independent manner (Salman et al., 2022). The inhibition efficiency of AER-270 on mouse AQP4 (~20%, Farr et al., 2019; Salman et al., 2022) appears lower than that of TGN-020 (~70%, Huber et al., 2009).

      We chose to use the pharmacological compound to achieve acute blocking of AQP4 thereby avoiding the chronic genetics-caused alterations in brain structural, functional and water homeostasis. Multiple lines of evidence including the recent study (Gomolka et al., 2023), have shown that AQP4 KO mice alters brain water content, extracellular space and cellular structures, which raises concerns to use the transgenic mouse to pinpoint the physiological functions of the AQP4 water channel. 

      We have now mentioned the concerns on AQP4 pharmacology by supplementing additional literatures in the field (page 15, line 8-18). 

      (3) This reviewer thinks that the ADC signal changes in Figure 5 may be unrelated to cellular swelling. Instead, they may be a result of the previously reported TGN-020-induced hyperemia (e.g., H. Igarashi et al., NeuroReport, 2013) and/or changes in water fluxes across the pia mater, which is highly enriched in AQP4. To amplify this concern, AQP4 KO brains have increased water mobility due to enlarged interstitial spaces, rather than swollen astrocytes (RS Gomolka, eLife, 2023). Overall, the caveats of interpreting the DW-MRI signal deserve strong consideration.

      A previous observation shows that TGN-020 increases regional cerebral blood flow in wild-type mice but not in AQP4 KO mice (Igarashi et al., 2013). Our current data provide a possible mechanistic explanation: TGN-020 blocking of astrocyte AQP4 causes calcium rises that may lead to vasodilation, as suggested previously (Cauli and Hamel, 2018). We have now updated the discussion on page 15, line 3-7.

      We agree with the reviewer regarding the structural deviations observed in AQP4 KO mice (Gomolka et al., 2023), now mentioned on page 19, line 3-5. Following the Reviewer’s suggestion, we have also updated the interpretation of the DW-MRI signal and point out that, in addition to being related to astrocyte swelling, the ADC signal changes may also be caused by indirect mechanisms, such as the transient upregulation of other water-permeable pathways compensating for AQP4 blocking. We now describe this alternative interpretation and the caveats of the DW-MRI signals (page 20, line 1-8). 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Private recommendations

      My more broad experimental suggestions are in the "weaknesses" section. Some minor points that would improve the manuscript are included below:

      (1) A more detailed explanation for why SRB fluorescence reflects the astrocyte volume changes, whereas typical intracellular GFP does not.

      As an engineered fluorescent protein, GFP has been used to tag specific cell types. However, as a relatively large protein (MW, 26.9 kDa), EGFP is expected to diffuse much more slowly than SRB, a small chemical dye (MW, 558.7 Da). In addition, the IP injection of SRB enables labeling of brain astrocytes without genetic manipulation, thereby avoiding the influence of protein overexpression on astrocyte volume and water transport responses. We have now stated this point in the manuscript (page 13, line 21 – page 14, line 4).

      (2) Figure 1 panel B should have clear labels on the figure and a description in the legend to delineate which part of the panel refers to hyper- or hypo-osmotic treatment.

      We have now updated the figure and the legend.  

      (3) For Figure 2, what is the rationale for analyzing the calcium signaling data between the cell types differently?

      We analyzed calcium micro-domains for astrocytes because their spontaneous signals occur mainly in discrete micro-domains (Shigetomi et al., 2013). For neurons, we performed a global analysis by calculating the mean fluorescence of the imaging field of view, because calcium signal changes were only observed at the global level rather than in micro-domains. This information is now included (page 24, line 18-20).

      (4) For Figure 3, the authors mention that TGN-020 likely caused swelling prior to the hypotonic solution administration. Do they have any measurements from these experiments prior to the TGN-020 application to use as a "true baseline" volume?

      The current method detects the relative changes in astrocyte volume (i.e., transmembrane water transport), which nevertheless is blind to the absolute volume value. We have no readout on baseline volumes.  

      (5) For Figures 3 and 4, did the authors see any evidence for regulatory volume decrease? And is this impaired by TGN-020? It is a well-characterized phenomenon that astrocytes will open mechanosensitive channels to extrude ions during hypo-osmotic induced swelling. This process is dependent on AQP4 and calcium signaling [5]

      Mola and colleagues provided important results demonstrating the role of AQP4 in astrocyte volume regulation (Mola et al., 2016). In the present study in acute brain slices, when we applied hypotonic solution to induce astrocyte swelling, our protocol did not reveal a rapid regulatory volume decrease (e.g., Fig. 3D). When we followed the volume changes of SRB-labeled astrocytes during optogenetically induced CSD, we observed a phase of volume decrease following the transient swelling (Fig. 4F), where the peak amplitude and the degree of recovery were both reduced by inhibiting AQP4 with TGN-020. These data imply that regulatory astrocyte volume decrease may occur in specific conditions, although it has intriguingly been suggested to be absent in brain slices and in vivo (e.g., Risher et al., 2009). We have not specifically investigated this phenomenon, and now briefly discuss this point on page 18, line 6-14.

      (6) Figure 5 box plots do not show all data points, could the authors modify to make these plots show all the animals, or edit the legend to clarify what is plotted?

      We have now updated the plot and the legend. This plot is from all animals (n = 7 per condition).

      (7) pg. 9 line 6, there is a sentence that seems incomplete or otherwise unfinished. "We first followed the evoked water efflux and shrinking induced by hypertonic solution while."

      Fixed (now, page 9 line 17-18). 

      (8)  During the discussion on pg 13 line 11, it may be more clear to describe this as the cotransport of water into the cells with ions/metabolites as reviewed by Macaulay 2021 [6].

      We agree; the text is modified following this suggestion (now page 14, line 12-13).  

      (1) Iliff, J.J., et al., A Paravascular Pathway Facilitates CSF Flow Through the Brain Parenchyma and the Clearance of Interstitial Solutes, Including Amyloid β. Sci Transl Med, 2012. 4(147): p. 147ra111.

      (2) Kress, B.T., et al., Impairment of paravascular clearance pathways in the aging brain. Ann Neurol, 2014. 76(6): p. 845-61.

      (3) Mestre, H., et al., Aquaporin-4-dependent Glymphatic Solute Transport in the Rodent Brain. eLife, 2018. 7.

      (4) Hablitz, L., et al., Circadian control of brain glymphatic and lymphatic fluid flow. Nature Communications, 2020. 11(1).

      (5) Mola, M., et al., The speed of swelling kinetics modulates cell volume regulation and calcium signaling in astrocytes: A different point of view on the role of aquaporins. Glia, 2016. 64(1).

      (6) MacAulay, N., Molecular mechanisms of brain water transport. Nat Rev Neurosci, 2021. 22(6): p. 326-344.

      We thank the reviewer. These important references have now been added to the manuscript together with the corresponding revisions.

      Reviewer #2 (Recommendations For The Authors):

      In its concept, the paper is interesting and provides additional value - however, it requires revision.

      Below, I provide the following remarks for the following sections/ pages/lines:

      ABSTRACT/page 2 (remarks here refer to the rest of the manuscript, where these sentences are repeated):

      - It seems that the 'homeostasis' provides not only physical protection, but also determines the diffusion of chemical molecules...' Please correct the sentence as it is grammatically incorrect.

      It is now corrected (page 2, line 1).

      - The term 'tonic water' is not clear. I understand, after reading the paper, that it is about tonicity of the solutes injected into the mouse.

      We use the term ‘tonic’ to indicate that in basal conditions, a constant water efflux occurs through the AQP4 channel.

      - 'tonic aquaporin water efflux maintains volume equilibrium' - I believe it is about maintaining volume and osmotic equilibrium?

      This description is now refined (now page 2, line 10).

      - It is not clear whether the tonic water outflow refers to the cellular level or outflow from the brain parenchyma (i.e., glymphatic efflux)

      It refers to the cellular level. 

      INTRODUCTION/page 3:

      - 'clearance of waste molecules from the brain as described in the glymphatic system' - The original papers describing the phenomena are not cited: Iliff et al. 2012, 2013, Mestre et al. 2018, as well as reviews by Nedergaard et al.

      Indeed. We have now cited these key references (now page 4, line 10).

      - 'brain water diffusion is the basis for diffusion-weighted magnetic resonance imaging (DW-MRI)' - The statement is wrong. it is the mobility of the water protons that DWI is based on, but not the diffusion of molecules in the brain. This should be clarified and based on the DW-MRI principle and the original works by Le Bihan from 1986, 1988, or 2015.

      This sentence is now updated (page 4, line 10-14).

      - Similarly, I suggest correcting or removing the citations and the sentence part regarding the clinical use of DWI, as it has no value here. Instead, it would be worth mentioning what actually ADC reflects as a computational score, and what were the results from previous studies assessing glymphatic systems using DWI. This is especially important when considering the mislocalization of the AQP4 channel.

      We now describe recent studies using DW-MRI to evaluate the glymphatic system (page 4, line 16-17).  

      - 'In the brain, AQP4 is predominantly expressed in astrocytes'-please review the citations. I suggest reading the work by Nielsen 1997, Nagelhus 2013, Wolburg 2011, and Li and Wang from 2017. To my best knowledge, in the brain AQP4 is exclusively expressed in astrocytes.

      We thank the reviewer. It has been described that, while enriched in astrocytes, AQP4 is also expressed in ependymal cells lining the ventricles (e.g., Mayo et al., 2023; Verkman et al., 2006). ‘predominantly’ is now removed (page 4, line 21).

      - The conclusion: ' Our finding suggests that aquaporin acts as a water export route in astrocytes in physiological conditions, so as to counterbalance the constitutive intracellular water accumulation caused by constant transmitter and ion uptake, as well as the cytoplasmic metabolism processes. This mechanism hence plays a necessary role in maintaining water equilibrium in astrocytes, thereby brain water homeostasis' seems to be slightly beyond the actual findings in the paper. I suggest clarifying according to the described phenomena.

      We have now refined the conclusion to adhere to the experimental observations (page 5, line 16-18).

      - The introduction lacks important information on existing AQP4 blockers and their effects, pros and cons on why to use TGN-020. Among others, I would refer to recent work by Giannetto et al 2024, as well as previous work of Mestre et al. 2018 and Gomolka et al. 2023.

      We initiated the study using TGN-020 as an AQP4 blocker because it has been validated by a wide range of ex vivo and in vivo studies, as documented in the text (page 7, line 1-6). We have also updated the discussion of recent advances in validating the AQP4 blocker AER-270(271), citing the relevant studies (page 15, line 7-17).  

      RESULTS:

      - Page 5, lines 19-20: '...transport, we performed fluorescence intensity translated (FIT) imaging.' - this term was never introduced in the methods so it is difficult for the reader to understand it at first sight.

      - 'To this end,' - it is not clear which action 'this' refers to (is it about previous works or the moment that the brain samples were ready for imaging?). Please clarify, as it only starts to be clear after fully reading the methods.

      We have now refined the description to give the principle of our imaging method first and then explain the technical steps. To avoid ambiguity, the term ‘To this end’ is removed. The updated text is now on page 6, line 1-3.  

      - From page 6 onwards - all references to Figures lack information to which part of the figure subpanel the information refers (top/middle bottom or left/middle/right).

      We apologize. The complementary indication is now added for figure citations when applicable.  

      - 'whereas water export and astrocyte shrinking upon hyperosmotic manipulation increased astrocyte fluorescence (Figure 1B). Hence, FIT imaging enables real-time recording of astrocyte transmembrane water transport and volume dynamics.' - this part seems to be undescribed or not clear in the methods.

      We have now refined this description (page 6, line 19-20).

      - Page 6, lines 17-22: TGN-020. In addition to the above, I suggest familiarizing also with the following works by Igarashi 2011. doi: 10.1007/s10072-010-0431-1, and by Sun 2022. doi: 10.3389/fimmu.2022.870029.

      These studies are now cited (page 7, line 3-4).

      - Page 7: ' AQP4 is a bidirectional channel facilitating... ' - AQP4 water channel is known as the path of least resistance for water transfer, please see Manley, Nature Medicine, 2000 and Papadopoulos, Faseb J, 2004.

      This sentence is now updated (page 7, line 12-13).

      - ' astrocyte AQP4 by TGN-020 caused a gradual decrease in SRB fluorescence intensity, indicating an intracellular water accumulation' - tissue slice experiment is a very valuable method. However it seems right, the experiment does not comment on the cell swelling that may occur just due to or as a superposition of tissue deterioration and the effect of TGN-020. The AQP4 channel is blocked, and the influx of water into astrocytes should be also blocked. Thus, can swelling be also a part of another mechanism, as it was also observed in the control group? I suggest this should be addressed thoroughly.

      We performed this experiment in acute brain slices to control the pharmacological environment and gain spatio-temporal information. After slicing, the brain slices recovered for > 1 hr prior to recording, so that the slices were in a stable state before TGN-020 application, as evidenced by the stable baseline. The constant decrease in the control trace is due to photobleaching, and its slope did not change in response to vehicle. TGN-020, in contrast, caused a downward change, suggesting intracellular water accumulation and swelling. 

      The experiment was performed in basal conditions without active water influx; a decrease in SRB fluorescence hints at a buildup of intracellular water in astrocytes. This result shows that in basal conditions, astrocyte aquaporin mediates a constant (i.e., tonic) water efflux; blocking it causes intracellular water accumulation and swelling. 

      We have accordingly updated the description of this part (page 7, line 15-20).

      - From the Figure 1 legend: Only 4 mice were subjected to the experiment, and only 1 mouse as a control. I suggest expanding the experiment and performing statistics including two-way ANOVA for data in panels B, C, and D, as no results of statistical tests confirm the significance of the findings provided.

      Panel B confirms that cytosolic SRB fluorescence tends to increase upon water efflux and volume shrinking, and vice versa. For panel C, the number of mice is now indicated. In addition, the downward change in SRB fluorescence was calculated separately for the phases before and after TGN-020 (or vehicle) application, and the panel is updated accordingly. TGN-020 induced a decline in astrocyte SRB fluorescence, which is validated by a t-test performed in MATLAB. To clarify, we now add connecting lines to indicate statistical significance between the corresponding groups (Fig 1C, middle). For panel D, we calculated the SRB fluorescence change (decrease) relative to the photobleaching tendency illustrated by the dotted line. The significance was also validated by a t-test performed in MATLAB.  
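
      As an illustration only, the comparison against the photobleaching tendency described for panel D can be sketched as follows. This is not the authors' MATLAB analysis: the function name, the use of a straight-line fit for the photobleaching trend and the `application_time` parameter are all assumptions made for the example.

```python
def deviation_from_bleaching_trend(times, fluorescence, application_time):
    """Deviation of the observed trace from an extrapolated bleaching trend.

    A straight line is fitted (least squares) to the samples recorded before
    `application_time` (hypothetical parameter) and extrapolated over the
    later samples. Linear bleaching is an assumption of this sketch.
    Returns a list of (time, observed - extrapolated) pairs.
    """
    pre = [(t, f) for t, f in zip(times, fluorescence) if t < application_time]
    if len(pre) < 2:
        raise ValueError("need at least two pre-application samples")

    n = len(pre)
    mean_t = sum(t for t, _ in pre) / n
    mean_f = sum(f for _, f in pre) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pre)
    if sxx == 0:
        raise ValueError("pre-application time points must not all be equal")

    # Least-squares slope and intercept of the pre-application trend.
    slope = sum((t - mean_t) * (f - mean_f) for t, f in pre) / sxx
    intercept = mean_f - slope * mean_t

    # Negative values mean the trace fell below the bleaching trend.
    return [
        (t, f - (slope * t + intercept))
        for t, f in zip(times, fluorescence)
        if t >= application_time
    ]
```

      A negative deviation after drug application would correspond to a fluorescence decrease beyond what bleaching alone predicts under these assumptions.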

      - Figure 1: Please correct the figure - pictures in panel A are low quality and do not support the specificity of SRB for astrocytes. Panels B-D are easier to understand if plotted as normal X/Y charts with associated statistical findings. Some drawings are cut or not aligned.

      In GFAP-EGFP transgenic mice, astrocytes are labeled by EGFP. SRB labeling (red fluorescence) colocalizes with EGFP-positive astrocytes, although not all EGFP-positive astrocytes are labeled by SRB. The PDF conversion during submission may also have compromised image quality. We have updated and aligned the figure panels.  

      - Page 12: ' TGN-020 increased basal water diffusion within multiple regions including the cortex, hippocampus and the striatum in a heterogeneous manner (Figure 5C).'

      This sentence is updated now (page 12, line 12 – page 13, line 2). It reads ‘The representative images reveal the enough image quality to calculate the ADC, which allow us to examine the effect of TGN-020 on water diffusion rate in multiple regions (Fig. 5C).’

      - The expression of AQP4 within the brain parenchyma is known to be heterogenous. Please familiarize yourself with works by Hubbard 2015, Mestre 2018, and Gomolka 2023. A correlation between ADC score and AQP4 expression ROI-wise would be useful, but it is not substantial to conduct this experiment.

      We thank the reviewer. This point is stressed on page 19, line 12-14.

      DISCUSSION:

      - Most of the issues are commented on above, so I suggest following the changes applied earlier.

      - Page 16: 'We show by DW-MRI that water transport by astrocyte aquaporin is critical for brain water homeostasis.' This statement is not clear and does not refer to the actual impact of the findings. DWI is allowed only to verify the changes of ADC after the application of TGN-020. I suggest commenting on the recent report by Giannetto 2024 here.

      This sentence is now refined (page 19, line 1-2), followed by the updates commenting on the recent studies employing DW-MRI to evaluate brain fluid transport, including the work of (Giannetto et al., 2024) (page 19, line 3-10). 

      METHODS:

      - Page 18: no total number of mice included in all experiments is provided, as well as no clearly stated number of mice used in each experiment. Please correct.

      We have now double-checked the number of mice for the data presented and updated the figure legends accordingly (e.g., updates in the legends of Fig. 1, Fig. 5, etc.).

      -  Page 18, line 7: 'Axscience' is not a producer of Isoflurane, but a company offering help with scientific manuscript writing. If this company's help was used, it should be stated in the acknowledgments section. Reference to ISOVET should be moved from line 15 to line 7.

      We apologize. We did not use external writing help, and have now removed ‘Axscience’. The isoflurane was sold under the brand ‘ISOVET’ by ‘Piramal’. This information is now moved up (page 21, line 11). 

      - Page 18, line 9: ' modified artificial cerebrospinal fluid (aCSF)'. Additional information on the reason for the modified aCSF would be useful for the reader.

      In this modified solution, the concentration of depolarizing ions (Na+, Ca2+) was reduced to lower the potential excitotoxicity during tissue dissection (i.e., injury to the brain) when preparing the brain slices. Extra sucrose was added to balance the solution osmolarity. This solution has been used previously for the dissection and slicing steps in adult mice (Jiang et al., 2016). We now add this justification in the text and cite the relevant reference (page 21, line 14-16). 

      - Page 19, line 6: a reasoning for using Tamoxifen would be helpful for the reader.

      The Glast-CreERT2 is an inducible conditional mouse line that expresses Cre recombinase selectively in astrocytes upon tamoxifen injection. We now add this information in the text (page 22, line 10-11). 

      - Line 8 - 'Sigma'

      Fixed.

      - Line 7/8: It is not clear if ethanol is of 10% solution or if proportions of ethanol+tamoxifen to oil were of 1:9. The reasoning for each performed step is missing.

      We have now clarified the procedure (page 22, line 11-15).

      - Line 10: '/' means 'or'?

      Here, we mean the bigenic mice resulting from the crossing of the heterozygous Cre-dependent GCaMP6f and Glast-CreERT2 mouse lines. We have now changed it to ‘Glast-CreERT2::Ai95GCaMP6f//WT’, consistent with the presentation of other mouse lines in our manuscript (page 22, line 16).

      - Lines 22-23: being in-line with legislation was already stated at the beginning of the Methods so I suggest combining for clearance.

      Done. 

      - Page 21, line 4: it is good to mention which printer was used, but it would be worth mentioning the material the chamber was printed from - was it ABS?

      Yes. We add this info in the text now (page 24, line 5).

      - Line 9 -'PI' requires spelling out.

      It is ‘Physik Instrumente’, now added (page 24, line 10).

      - Line 11-12: What is the reason for background subtraction - clearer delineation of astrocytes/ increasing SNR in post-processing, or because SRB signal was also visible and changing in the background over time? Was the background removed in each frame independently (how many frames)? How long was the time-lapse and was the F0 frame considered as the first frame acquired? The background signal should be also measured and plotted alongside the astrocytic signal, as a reference (Figure 1). This should be clarified so that steps are to be followed easily.

      We sought to follow the temporal changes in the SRB fluorescence signal. The acquired fluorescence images contain not only the SRB signal but also background signals arising from, for instance, tissue autofluorescence, digital camera background noise and stray light from the environment. The background signal was estimated as the mean fluorescence of peripheral cell-free subregions (15 × 15 µm²) and subtracted from all frames of the time-lapse image stack. The traces shown in the figures reflect the full lengths of the time-lapse recordings. F0 was defined as the mean value of the 10 data points immediately preceding the detected fluorescence changes. The text is now updated (page 24, line 21 - page 25, line 5).
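
      The background subtraction and F0 normalization described in this reply can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The function name and the `onset_index` argument are hypothetical, the onset of the fluorescence change is assumed to be already known, and a per-frame background estimate is assumed (the reply does not say whether one value or one value per frame was subtracted).

```python
from statistics import mean


def dff_trace(roi_trace, background_trace, onset_index, n_baseline=10):
    """Background-subtract a fluorescence trace and express it as dF/F0.

    roi_trace        : mean fluorescence of the astrocyte ROI, one value per frame
    background_trace : mean fluorescence of a cell-free subregion, per frame
                       (per-frame estimate is an assumption of this sketch)
    onset_index      : frame where the fluorescence change starts (assumed known)
    n_baseline       : number of points before the onset averaged to give F0
    """
    if len(roi_trace) != len(background_trace):
        raise ValueError("traces must have the same number of frames")
    if onset_index < n_baseline:
        raise ValueError("not enough frames before onset to estimate F0")

    # Remove the background estimate from every frame.
    corrected = [f - b for f, b in zip(roi_trace, background_trace)]

    # F0: mean of the n_baseline points immediately preceding the onset.
    f0 = mean(corrected[onset_index - n_baseline:onset_index])

    # Relative fluorescence change with respect to F0.
    return [(f - f0) / f0 for f in corrected]
```

      With this normalization, a negative value after the onset would indicate a fluorescence decrease relative to baseline, which in the FIT imaging logic of the manuscript corresponds to water accumulation.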

      - Line 15: Was astrocyte image delineation performed manually or automatically? Where was the center of the region considered in the reference to the astrocyte image? It would be good to see the regions delineated for reference.

      Astrocytes labeled by SRB were delineated manually with the soma taken as the center of the region of interest. We now exemplify the delineated region in Fig 1A, bottom.

      - Page 22, line 2: 'x4 objective'.

      Added (now, page 25, line 16). 

      - Line 3: 'barrels' - reference to publication or the explanation missing.

      The relevant reference is now added on barrel cortex (Erzurumlu and Gaspar, 2020) (page 25, line 19-20). 

      - Line 19: were the coordinates referred to = bregma?

      Yes. This info is now added (page 26, line 12). 

      - Line 20: was the habituation performed directly at the acquisition date? It is rather difficult to say that it was a habituation, but rather acute imaging. I suggest correcting, that mice were allowed to familiarize themselves with the setup for 30 minutes prior to the imaging start.

      In this context, although it is a very nice idea and experiment, the influence of acute stress in animals familiar with the setup only from the day of acquisition is difficult to avoid. It is a major concern, especially when considering norepinephrine as a master driver of neuronal and vascular activity through the brain, and strong activation of the hypothalamic-adrenal axis in response to acute stress. It is well known that the response of monoamines is reduced in animals subjected to chronic vs. acute stress, but is still larger than if the stressor is absent.

      Major remark: The animals should, preferably, be imaged at least after 3 days of habituation based on existing knowledge. I suggest exploring the topic of the importance of habituation. It is difficult though, to objectively review these findings without considering stress and associated changes in vascular dynamics.

      We thank the reviewer for helping us clarify this information. The text is updated accordingly to describe the experiment (now page 26, line 14). 

      - Page 23, line 17: number of animals included in experiments missing.

      The number of animals is added in Methods (page 27, line 12) and indicated in the legend of Figure 5. 

      - Line 18/19: were the respiratory effects observed after injection of saline or TGN-020? Since DWI was performed, the exclusion of perfusive flow on ADC is impossible.

      I suggest an additional experiment in n=3 animals per group, verifying the HR (and if possible BP) response after injection of TGN-020 and saline in mice.

      The respiratory rate has been recorded. We added the averaged respiratory rate before and after injection of TGN-020 or saline (now, Fig. S6; page 13, line 5-6).

      - Line 22: Please, provide the model of the scanner, the model of the cryoprobe, as well as the model of the gradient coil used, otherwise it is difficult to assess or repeat these experiments.

      We have now added the information on the MRI system in the Methods section (page 27, line 17-21).

      - Page 24: line 3/4: although the achieved spatial resolution of DWI was good and slightly lower than desired and achievable due to limitations of the method itself as well as cryoprobe, it is acceptable for EPI in mice.

      Still, there is no direct explanation provided on the reasoning for using surface instead of volumetric coil, as well as on assuming an anisotropic environment (6 diffusion directions) for DWI measurements. This is especially doubtful if such a long echo-time was used alongside lower-than-possible spatial resolution. Longer echo time would lower the SNR of the depicted signal but also would favor the depiction of signal from slow-moving protons and larger water pools. On the other hand, only 3 b-values were used, which is the minimum for ADC measurements, while a good research protocol could encompass at least 5 to increase the accuracy of ADC estimation and avoid undersampling between 250 and 1800 b-values. What was the reason for choosing this particular set of b-values and not 50, 600, and 2000? Besides, gradient duration time was optimally chosen, however, I have concerns about the decision for such a long gradient separation times.

      If the protocol could have been better optimized, the assessment could have been also performed in respiratory-gated mode, allowing minimization of the effects of one of the glymphatic system driving forces.

      Thus, I suggest commenting on these issues.

      We chose the cryoprobe to increase the signal-to-noise ratio (SNR) in DW-MRI with a long echo time and high b-value. A volume coil gives a more homogeneous SNR across the whole brain than the cryoprobe, but its SNR is lower. We confirmed that, even in the ventral part of the brain, the quality of the DW-MRI images acquired with the cryoprobe was sufficient to investigate the ADC (Fig. 5B-C). This is now mentioned in Methods (page 27, line 17-21).

      We performed DW-MRI scanning for 5 min at each time point, using anisotropic resolution and three b-values, to investigate the time course of the ADC change following the injection of TGN-020. Because the effect of TGN-020 appears about a dozen minutes after the injection (Igarashi et al., 2011), fast DW-MRI scanning is required. If isotropic DW-MRI with a shorter echo time and more directions were used, a longer scan time would be required at each time point, possibly more than 1 h. We agree that three b-values are the minimum for calculating the ADC and that more b-values would increase the accuracy. However, to achieve the temporal resolution needed to capture the change in water diffusion, we decided to use the minimum number of b-values. A previous study also supports the adequate accuracy of DW-MRI with three b-values (Ashoor et al., 2019). Furthermore, a previous study that used a long diffusion time (> 20 ms) and a long echo time (40 ms) reported good mean diffusivity estimates (Aggarwal et al., 2020), supporting that our protocol is adequate for investigating the ADC. We have now updated the description (page 28, line 5-9). We chose b = 250 and 1800 s/mm² because 2000 s/mm² seems too high to obtain good image quality. In a previous study, we established that the ADC is measurable with b = 0, 250, and 1800 s/mm² (Debacker et al., 2020). 

      - Page 24, line 7: What was the post-processing applied for images acquired over 70 minutes? Did it consider motion-correction, co-registration, or drift-correction crucial to avoid pitfalls and mismatches in concluding data?

      The motion correction and co-registration were explained in Methods (page 28, line 12-14).

      Also, were these trace-weighted images or magnitude images acquired since DTI software was used for processing - while ADC fitting could be reliably done in Matlab, Python, or other software. Thus, was DSI software considering all 3 b-values or just used 0 and 1800 for the calculation of mean diffusivity for tractography (as ADC). The details should be explained.

      DSIstudio was used with all three b values (b = 0, 250, and 1800 s/mm²) to calculate the ADC. We added the description in Methods (page 28, line 16-18).
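
      For readers unfamiliar with ADC estimation, the sketch below shows one generic way to obtain an ADC from signals acquired at several b-values, assuming the mono-exponential model S(b) = S0·exp(−b·ADC) and a log-linear least-squares fit. It is not a description of how DSI Studio computes the ADC internally; the function name and the synthetic signal values in the usage note are assumptions for illustration.

```python
import math


def fit_adc(b_values, signals):
    """Estimate the ADC from a log-linear least-squares fit.

    Model (assumed): S(b) = S0 * exp(-b * ADC), so ln S is linear in b
    with slope -ADC. With b in s/mm^2 the result is in mm^2/s.
    """
    if len(b_values) != len(signals) or len(b_values) < 2:
        raise ValueError("need at least two matching (b, signal) pairs")
    if any(s <= 0 for s in signals):
        raise ValueError("signals must be positive to take the logarithm")

    log_s = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_log = sum(log_s) / n

    sxx = sum((b - mean_b) ** 2 for b in b_values)
    if sxx == 0:
        raise ValueError("b-values must not all be identical")
    sxy = sum((b - mean_b) * (y - mean_log) for b, y in zip(b_values, log_s))

    # The slope of ln S versus b is -ADC.
    return -sxy / sxx
```

      Called with the three b-values mentioned in this reply (0, 250 and 1800 s/mm²) and noise-free synthetic signals generated from a chosen ADC, the function recovers that ADC; with real, noisy data, additional b-values would tighten the estimate, which is the trade-off against scan time discussed in this reply.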

      To make sure that the results are not affected by the MR hardware, I suggest performing 3 control measurements in a standard water phantom, and presenting the results alongside the main findings.

      We thank the reviewer for this suggestion. We have performed new experiments and now added control measurements with three phantoms, namely water, undecane, and dodecane. These new data are summarized in Fig. S7, showing the stability of the ADC throughout the 70 min of scanning. We have updated the description in the Methods (page 28, line 9-11) and in the Results (page 13, line 6-8).  

      - Line 13: were the ROI defined manually or just depicted from previously co-registered Allen Brain atlas?

      The ROIs of the cortex, the hippocampus, and the striatum were depicted with reference to Allen mouse brain atlas (https://scalablebrainatlas.incf.org/mouse/ABA12). This is explained in Methods (page 28, line 14-16).

      - Line 10: why the average from 1st and 2nd ADC was not considered, since it would reduce the influence of noise on the estimation of baseline ADC?

      We apologize; this was a typo. The baseline was the average of the 1st and 2nd ADC. We corrected the description (page 28, line 20).
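
      A small sketch of this baseline definition follows. The averaging of the first two ADC values comes from the reply; expressing later time points as a percentage change from that baseline is an assumption of this example, as is the function name.

```python
def adc_relative_change(adc_series, n_baseline=2):
    """Percentage change of each ADC value relative to the baseline.

    The baseline is the mean of the first `n_baseline` measurements
    (two pre-injection scans in the reply). Percentage normalization
    is an assumption of this sketch.
    """
    if len(adc_series) < n_baseline or n_baseline < 1:
        raise ValueError("not enough measurements to compute the baseline")

    baseline = sum(adc_series[:n_baseline]) / n_baseline
    return [100.0 * (adc - baseline) / baseline for adc in adc_series]
```

      Averaging two scans rather than taking a single one reduces the influence of noise on the baseline, which is the point raised by the reviewer.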

      STATISTIC:

      Which type of t-test (paired, unpaired, or two-sample) was used and why? Mann-Whitney U-tests are used as a substitute for parametric t-tests when the data are non-parametric or when a normal distribution cannot be assumed. In which case was the Bonferroni-Holm correction used? I couldn't find any mention of a multiple-group analysis followed by multiple comparisons. Each section of the manuscript should describe how the quantitative data were treated and to what end. I suggest carefully correcting all figures accordingly, following the remarks given for Figure 1.

      We used unpaired t-tests for data obtained from samples of different conditions. Indeed, the Mann-Whitney U-test is used when the data are non-parametric, deviating from a normal distribution. Bonferroni-Holm correction was used for multiple comparisons (e.g., Fig. 4D-E).
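For readers unfamiliar with the Bonferroni-Holm procedure mentioned above, the following is a minimal sketch of the standard step-down method. The p-values are made up for illustration and are not taken from the manuscript; the function name `holm_bonferroni` is hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Step-down procedure: sort p-values in ascending order and compare the
    k-th smallest (k = 0, 1, ...) against alpha / (m - k). Stop at the first
    p-value that fails; it and all larger p-values are not rejected.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all remaining (larger) p-values are retained
    return reject


# Illustrative p-values only (not from the manuscript).
example_p = [0.012, 0.030, 0.004, 0.200]
decisions = holm_bonferroni(example_p)
```

In this made-up example the thresholds are 0.0125, 0.0167, 0.025, and 0.05 for the sorted p-values, so the two smallest are rejected and the procedure stops at 0.030.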

      Reviewer #3 (Recommendations For The Authors):

      I think that the following statement is insufficient: "The authors commit to share data, documentation, and code used in analysis". My understanding is that eLife expects all key data to be provided in a supplement.

      We thank the reviewer; we follow the publication guidelines of eLife. 

      References

      Aggarwal, M., Smith, M.D., and Calabresi, P.A. (2020). Diffusion-time dependence of diffusional kurtosis in the mouse brain. Magn Reson Med 84, 1564-1578.

      Ashoor, M., Khorshidi, A., and Sarkhosh, L. (2019). Estimation of microvascular capillary physical parameters using MRI assuming a pseudo liquid drop as model of fluid exchange on the cellular level. Rep Pract Oncol Radiother 24, 3-11.

      Cauli, B., and Hamel, E. (2018). Brain Perfusion and Astrocytes. Trends in neurosciences 41, 409-413.

      Debacker, C., Djemai, B., Ciobanu, L., Tsurugizawa, T., and Le Bihan, D. (2020). Diffusion MRI reveals in vivo and non-invasively changes in astrocyte function induced by an aquaporin-4 inhibitor. PLoS One 15, e0229702.

      Erzurumlu, R.S., and Gaspar, P. (2020). How the Barrel Cortex Became a Working Model for Developmental Plasticity: A Historical Perspective. J Neurosci 40, 6460-6473.

      Farr, G.W., Hall, C.H., Farr, S.M., Wade, R., Detzel, J.M., Adams, A.G., Buch, J.M., Beahm, D.L., Flask, C.A., Xu, K., et al. (2019). Functionalized Phenylbenzamides Inhibit Aquaporin-4 Reducing Cerebral Edema and Improving Outcome in Two Models of CNS Injury. Neuroscience 404, 484-498.

      Giannetto, M.J., Gomolka, R.S., Gahn-Martinez, D., Newbold, E.J., Bork, P.A.R., Chang, E., Gresser, M., Thompson, T., Mori, Y., and Nedergaard, M. (2024). Glymphatic fluid transport is suppressed by the aquaporin-4 inhibitor AER-271. Glia.

      Gomolka, R.S., Hablitz, L.M., Mestre, H., Giannetto, M., Du, T., Hauglund, N.L., Xie, L., Peng, W., Martinez, P.M., Nedergaard, M., et al. (2023). Loss of aquaporin-4 results in glymphatic system dysfunction via brain-wide interstitial fluid stagnation. eLife 12.

      Huber, V.J., Tsujita, M., and Nakada, T. (2009). Identification of aquaporin 4 inhibitors using in vitro and in silico methods. Bioorg Med Chem 17, 411-417.

      Igarashi, H., Huber, V.J., Tsujita, M., and Nakada, T. (2011). Pretreatment with a novel aquaporin 4 inhibitor, TGN-020, significantly reduces ischemic cerebral edema. Neurol Sci 32, 113-116.

      Igarashi, H., Tsujita, M., Suzuki, Y., Kwee, I.L., and Nakada, T. (2013). Inhibition of aquaporin-4 significantly increases regional cerebral blood flow. Neuroreport 24, 324-328.

      Jiang, R., Diaz-Castro, B., Looger, L.L., and Khakh, B.S. (2016). Dysfunctional Calcium and Glutamate Signaling in Striatal Astrocytes from Huntington's Disease Model Mice. J Neurosci 36, 3453-3470.

      Mayo, F., Gonzalez-Vinceiro, L., Hiraldo-Gonzalez, L., Calle-Castillejo, C., Morales-Alvarez, S., Ramirez-Lorca, R., and Echevarria, M. (2023). Aquaporin-4 Expression Switches from White to Gray Matter Regions during Postnatal Development of the Central Nervous System. Int J Mol Sci 24.

      Mola, M.G., Sparaneo, A., Gargano, C.D., Spray, D.C., Svelto, M., Frigeri, A., Scemes, E., and Nicchia, G.P. (2016). The speed of swelling kinetics modulates cell volume regulation and calcium signaling in astrocytes: A different point of view on the role of aquaporins. Glia 64, 139-154.

      Risher, W.C., Andrew, R.D., and Kirov, S.A. (2009). Real-time passive volume responses of astrocytes to acute osmotic and ischemic stress in cortical slices and in vivo revealed by two-photon microscopy. Glia 57, 207-221.

      Salman, M.M., Kitchen, P., Yool, A.J., and Bill, R.M. (2022). Recent breakthroughs and future directions in drugging aquaporins. Trends Pharmacol Sci 43, 30-42.

      Shigetomi, E., Bushong, E.A., Haustein, M.D., Tong, X., Jackson-Weaver, O., Kracun, S., Xu, J., Sofroniew, M.V., Ellisman, M.H., and Khakh, B.S. (2013). Imaging calcium microdomains within entire astrocyte territories and endfeet with GCaMPs expressed using adeno-associated viruses. J Gen Physiol 141, 633-647.

      Solenov, E., Watanabe, H., Manley, G.T., and Verkman, A.S. (2004). Sevenfold-reduced osmotic water permeability in primary astrocyte cultures from AQP-4-deficient mice, measured by a fluorescence quenching method. Am J Physiol Cell Physiol 286, C426-432.

      Verkman, A.S., Binder, D.K., Bloch, O., Auguste, K., and Papadopoulos, M.C. (2006). Three distinct roles of aquaporin-4 in brain function revealed by knockout mice. Biochim Biophys Acta 1758, 1085-1093.

    2. eLife Assessment

      In this work, the authors propose that astrocytic aquaporin 4 (AQP4) is the main pathway for tonic water efflux, without which astrocytes undergo cell swelling. These findings are important, because they shed light on key molecular mechanisms implicated with the regulation of brain water homeostasis. The authors use a broad set of experimental tools (e.g., acute brain slices, in vivo recording, and diffusion-weighted MRI) but the evidence remains incomplete without ruling out non-specific effects of TGN-020, and without evidence that changes in sulforhodamine B fluorescence can be used as reliable readouts of cell volume dynamics.

    3. Reviewer #1 (Public review):

      Summary:

      Pham and colleagues provide an illuminating investigation of aquaporin-4 water flux in the brain utilizing ex vivo and in vivo techniques. The authors first show in acute brain slices, and in vivo with fiber photometry, SRB loaded astrocytes swell after inhibition of AQP4 with TGN-020, indicative of tonic water efflux from astrocytes in physiological conditions. Excitingly, they find that TGN-020 increased the ADC in DW-MRI in a region-specific manner, potentially due to AQP4 density. The resolution of the DW-MRI cannot distinguish between intracellular or extracellular compartments, but the data point to an overall accumulation of water in the brain with AQP4 inhibition. These results provide further clarity on water movement through AQP4 in health and disease.

      Overall, the data support the main conclusions of the article, with some room for more detailed treatment of the data to extend the findings.

      Strengths:

      The authors have a thorough investigation of AQP4 inhibition in acute brain slices. The demonstration of tonic water efflux through AQP4 at baseline is novel and important in and of itself. Their further testing of TGN-020 in hyper- and hypo-osmotic solutions shows the expected reduction of swelling/shrinking with AQP4 blockade.

      Their experiment with cortical spreading depression further highlights the importance of water efflux from astrocytes via AQP4 and transient water fluxes as a result of osmotic gradients. Inhibition of AQP4 increases the speed of tissue swelling, pointing to a role in efflux of water from the brain.

      The use of DW-MRI provides a non-invasive measure of water flux after TGN-020 treatment.

      Weaknesses:

      The authors specifically use GCaMP6 and light sheet microscopy to image their brain sections in order to identify astrocytic microdomains. However, their presentation of the data neglects a more detailed treatment of the calcium signaling. It would be quite interesting to see whether these calcium events are differentially affected by AQP4 inhibition based on their cellular localization (i.e., processes vs. soma vs. vascular endfeet, which all have different AQP4 expression).

      The authors show that inhibition of AQP4 with TGN-020 shortens the onset time of the swelling associated with cortical spreading depression in brain slices. However, they do not show quantification for many of the other features of the CSD swelling (i.e., the duration of swelling, speed of swelling, and recovery from swelling).

      Comments on revised version:

      The authors have addressed these suggestions as additional supplementary figures. Notably they find increased calcium signaling and stronger inhibition of calcium signaling by TGN-020 in astrocytic endfeet, where AQP4 is enriched.

      Significance:

      AQP4 is a bidirectional water channel that is constitutively open, thus water flux through it is always regulated by local osmotic gradients. Still, characterizing this water flux has been challenging, as the AQP4 channel is incredibly water selective. The authors here present important data showing that application of TGN-020 alone causes astrocytic swelling, indicating that there is constant efflux of water from astrocytes via AQP4 in basal conditions. This has been suggested before, as the authors rightfully highlight in their discussion, but the evidence had previously come from electron microscopy data from genetic knockout mice.

      AQP4 expression has been linked with glymphatic circulation of cerebrospinal fluid through perivascular spaces since its rediscovery in 2012 [1]. Further studies of aging[2], genetic models[3], and physiological circadian variation[4], have revealed it is not simply AQP4 expression but AQP4 polarization to astrocytic vascular endfeet that is imperative for facilitating glymphatic flow. Still a lingering question in the field is how AQP4 facilitates fluid circulation. This study represents an important step in our understanding of AQP4's function, as basal efflux of water via AQP4 might promote clearance of interstitial fluid to allow influx of cerebrospinal fluid into the brain. Beyond glymphatic fluid circulation, clearly AQP4 dependent volume changes will differentially alter astrocytic calcium signaling and, in turn, neuronal activity.

      (1) Iliff, J.J., et al., A Paravascular Pathway Facilitates CSF Flow Through the Brain Parenchyma and the Clearance of Interstitial Solutes, Including Amyloid β. Sci Transl Med, 2012. 4(147): p. 147ra111.
      (2) Kress, B.T., et al., Impairment of paravascular clearance pathways in the aging brain. Ann Neurol, 2014. 76(6): p. 845-61.
      (3) Mestre, H., et al., Aquaporin-4-dependent Glymphatic Solute Transport in the Rodent Brain. eLife, 2018. 7.
      (4) Hablitz, L., et al., Circadian control of brain glymphatic and lymphatic fluid flow. Nature communications, 2020. 11(1).

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, the Authors propose that astrocytic water channel AQP4 represents the dominant pathway for tonic water efflux without which astrocytes undergo cell swelling. The authors measure changes in astrocytic sulforhodamine B fluorescence as the proxy for cell volume dynamics. Using this approach, they have performed a technically elegant series of ex vivo and in vivo experiments exploring changes in astrocytic volume "signal" in response to the AQP4 inhibitor TGN-020 and/or neuronal stimulation. The key findings are that TGN-020 produces an apparent swelling of astrocytes and modifies astrocytic cell volume dynamics after spreading depolarizations. This study is perceived as potentially highly significant. However, several technical caveats could be considered better and perhaps addressed through additional experiments.

      Strengths:

      (1) This is a technically sound study, in which the Authors employed a number of complementary ex vivo and in vivo techniques. The presented results are of interest to the field and potentially highly significant.

      (2) The innovative use of sulforhodamine B for in situ measurements of astrocyte cell volume dynamics is thoroughly validated in brain slices by quantifying changes in sulforhodamine fluorescence in response to hypoosmotic and hyperosmotic media.

      (3) The combination of cell volume measurements with registering functional outcomes in both astrocytes and neurons (cell-specific GCaMP6 signaling) is appropriate and adds to the significance of the work.

      (4) The use of ChR2 optogenetics for producing spreading depolarization avoids many complications of chemical manipulations and is much appreciated.

      Remaining limitations:

      (1) In the opinion of this reviewer, the effects of TGN-020 are not entirely consistent with the current knowledge on water permeability in astrocytes and the relative contribution of AQP4 to this process.

      Specifically, genetic deletion of AQP4 reduces plasmalemmal water permeability in astrocytes by ~two-three-fold (when measured at 37 °C, E. Solenov et al., AJP-Cell, 2004). This difference is significant but thought to have limited impact on steady-state water distribution. To the best of this reviewer's knowledge, cultured AQP4-null astrocytes do not show changes in degree of hypoosmotic swelling or hyperosmotic shrinkage. Thus, the findings of Solenov et al. are not (entirely) congruent with the conclusions of the current manuscript.

      Also, as noted by the Authors, the AQP4 knockout does not modify astrocytes swelling induced by hypoosmotic solution in brain slices (T.R. Murphy et al., Front Neurosci., 2017), further suggesting that AQP4 is not a significant rate-limiting factor for water movement across astrocyte membranes.

      The Authors do discuss the above-mentioned discrepancies and explain them by the context-dependent changes in water fluxes. Nevertheless, with these caveats in mind, it would be highly desirable to utilize an independent method measuring astrocytic volume and extracellular volume fraction.

      (2) As noted by this reviewer and now discussed by the Authors, changes in ADC signal (presented in Fig. 5) may be confounded by the previously reported TGN-020-induced hyperemia (e.g., H. Igarashi et al., NeuroReport, 2013) and/or changes in water fluxes across the pia mater, which is highly enriched in AQP4. If this is the case, the proposed brain water accumulation may be explained by factors other than astrocytic water homeostasis. This caveat certainly deserves further experimental exploration.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:  

      Reviewer #1 (Public Review):  

      Summary:  

      The authors have presented data showing that there is a greater amount of spontaneous differentiation in human pluripotent cells cultured in suspension vs static and have used PKCβ and Wnt signaling pathway inhibitors to decrease the amount of differentiation in suspension culture.  

      Strengths:  

      This is a very comprehensive study that uses a number of different reactor designs and scales in addition to a number of unbiased outcomes to determine how suspension impacts the behaviour of the cells and in turn how the addition of inhibitors counteracts this effect. Furthermore, the authors were also able to derive new hiPSC lines in suspension with this adapted protocol.  

      Weaknesses:  

      The main weakness of this study is the lack of optimization with each bioreactor change. It has been shown multiple times in the literature that the expansion and behaviour of pluripotent cells can be dramatically impacted by impeller shape, RPM, reactor design, and multiple other factors. It remains unclear to me how much of the results the authors observed (e.g. increased spontaneous differentiation) was due to not having an optimized bioreactor protocol in place (per bioreactor vessel type). For instance - was the starting seeding density, RPM, impeller shape, feeding schedule, and/or any other aspect optimized for any of the reactors used in the study, and if not, how were the values used in the study determined?  

      Thank you for your thoughtful comments. Following your comments, we have performed several experiments to optimize the bioreactor conditions in the revised manuscript. We tested several cell seeding densities and several stirring speeds with or without WNT/PKCβ inhibitors (Figure 6—figure supplement 1). We found that seeding densities of 1-2 × 10^5 cells/mL and stirring speeds of 50-150 rpm supported the proliferation of these cells. Also, PKCβ and Wnt inhibitors suppressed spontaneous differentiation in bioreactor conditions regardless of stirring speed. As for the impeller shape and reactor design, we used the commonly used ABLE bioreactor for the 30 mL scale and Eppendorf bioreactors for the 320 mL scale, which had been designed and used for human pluripotent stem cell culture in previous studies, respectively (Matsumoto et al., 2022 (doi: 10.3390/bioengineering9110613); Kropp et al., 2016 (doi: 10.5966/sctm.2015-0253)). We cited these previous studies in the Results and Materials and Methods sections. We believe that these additional data and explanations are sufficient to address your concerns about the optimization of the bioreactor experiments.
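To make the quoted seeding range concrete, the short sketch below converts a density of 1-2 × 10^5 cells/mL into total cell numbers for the 30 mL and 320 mL scales named in the response. It is simple arithmetic only and assumes the full nominal volume is seeded at the stated density; the function and variable names are hypothetical.

```python
def cells_to_seed(density_per_ml, volume_ml):
    """Total number of cells needed to reach a given density in a given volume."""
    return density_per_ml * volume_ml


# Lower and upper bounds of the seeding density quoted in the response.
densities = (100_000, 200_000)  # cells/mL

# Working volumes named in the response (assumed fully filled at seeding).
volumes = {"30 mL vessel": 30, "320 mL vessel": 320}

totals = {
    name: tuple(cells_to_seed(d, v) for d in densities)
    for name, v in volumes.items()
}

for name, (low, high) in totals.items():
    print(f"{name}: {low:.1e} to {high:.1e} cells")
```

Under these assumptions the 30 mL scale needs roughly 3-6 × 10^6 cells at seeding and the 320 mL scale roughly 3.2-6.4 × 10^7 cells.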

      Reviewer #2 (Public Review):  

      This study by Matsuo-Takasaki et al. reported the development of a novel suspension culture system for hiPSC maintenance using Wnt/PKC inhibitors. The authors showed elegantly that inhibition of the Wnt and PKC signaling pathways would repress spontaneous differentiation into neuroectoderm and mesendoderm in hiPSCs, thereby maintaining cell pluripotency in suspension culture. This is a solid study with substantial data to demonstrate the quality of the hiPSC maintained in the suspension culture system, including long-term maintenance in >10 passages, robust effect in multiple hiPSC lines, and a panel of conventional hiPSC QC assays. Notably, large-scale expansion of a clinical grade hiPSC using a bioreactor was also demonstrated, which highlighted the translational value of the findings here. In addition, the author demonstrated a wide range of applications for the IWR1+LY suspension culture system, including support for freezing/thawing and PBMC-iPSC generation in suspension culture format. The novel suspension culture system reported here is exciting, with significant implications in simplifying the current culture method of iPSC and upscaling iPSC manufacturing.  

      Another potential advantage that perhaps wasn't well discussed in the manuscript is the reported suspension culture system does not require additional ECM to provide biophysical support for iPSC, which differentiates from previous studies using hydrogel and this should further simplify the hiPSC culture protocol.  

      Interestingly, although several hiPSC suspension media are currently available commercially, the content of these suspension media remained proprietary, as such the signaling that represses differentiation/maintains pluripotency in hiPSC suspension culture remained unclear. This study provided clear evidence that inhibition of the Wnt/PKC pathways is critical to repress spontaneous differentiation in hiPSC suspension culture.  

      I have several concerns that the authors should address, in particular, it is important to benchmark the reported suspension system with the current conventional culture system (eg adherent feeder-free culture), which will be important to evaluate the usefulness of the reported suspension system.  

      Thank you for this insightful suggestion. In this revised manuscript, we have performed additional experiments using a conventional medium, mTeSR1 (Stem Cell Technologies, Vancouver, Canada), in comparison with the adherent feeder-free culture system in four different hiPSC lines simultaneously. Compared to the adherent conditions, the suspension conditions without chemical treatment decreased the expression of self-renewal marker genes/proteins and increased the expression levels of SOX17, T, and PAX6 (Figure 4 - figure supplement 2). Importantly, the treatment with LY333531 and IWR-1-endo in mTeSR1 medium reversed the decreased expression of these undifferentiated markers and suppressed the increased expression of differentiation markers in suspension culture conditions, reaching levels comparable to those of the adherent culture conditions. These results indicate that these chemical treatments in suspension culture are beneficial even when using a conventional culture medium.

      Also, the manuscript lacks a clear description of a consistent robust effect in hiPSC maintenance across multiple cell lines.  

      Thank you for this insightful suggestion. We have performed additional experiments on hiPSC maintenance across 5 hiPSC lines in suspension culture using StemFit AK02N medium simultaneously (Figure 3C - E). Overall, the treatment of LY333531 and IWR-1-endo in the StemFit AK02N medium reversed the decreased expression of these undifferentiated markers and suppressed the increased expression of differentiation markers in suspension culture conditions. Also as above, we have added results using conventional media, mTeSR1, in comparison to the adherent feeder-free culture system in four different hiPSC lines simultaneously. These results show that this chemical treatment consistently produced robust effects in hiPSC maintenance across multiple cell lines using multiple conventional media.

      There are also several minor comments that should be addressed to improve readability, including some modifications to the wording to better reflect the results and conclusions.  

      In the revised manuscript, we have added and corrected the descriptions to improve readability, including some modifications to the wording to better reflect the results and conclusions. 

      Reviewer #3 (Public Review):  

      In the current manuscript, Matsuo-Takasaki et al. have demonstrated that the addition of PKCβ and WNT signaling pathway inhibitors to the suspension cultures of iPSCs suppresses spontaneous differentiation. These conditions are suitable for large-scale expansion of iPSCs. The authors have shown that they can perform single-cell cloning, direct cryopreservation, and iPSC derivation from PBMCs in these conditions. Moreover, the authors have performed a thorough characterization of iPSCs cultured in these conditions, including an assessment of undifferentiated stem cell markers and genetic stability. The authors have elegantly shown that iPSCs cultured in these conditions can be differentiated into derivatives of three germ layers. By differentiating iPSCs into dopaminergic neural progenitors, cardiomyocytes, and hepatocytes they have shown that differentiation is comparable to adherent cultures.

      This new method of expanding iPSCs will benefit the clinical applications of iPSCs.  

      Recently, multiple protocols have been optimized for culturing human pluripotent stem cells in suspension conditions and their expansion. Additionally, a variety of commercially available media for suspension cultures are also accessible. However, the authors have not adequately justified why their conditions are superior to previously published protocols (indicated in Table 1) and commercially available media. They have not conducted direct comparisons.  

      Thank you for this careful suggestion. In this revised manuscript, we have added results using a conventional medium, mTeSR1 (Stem Cell Technologies), which has been used for suspension culture in several studies. Compared to the adherent conditions using mTeSR1 medium, the suspension conditions with the same medium decreased the ratio of TRA1-60/SSEA4-positive cells and OCT4-positive cells and the expression levels of OCT4 and NANOG, and increased the expression levels of SOX17, T, and PAX6 in 4 different hiPSC lines simultaneously (Figure 4 - Supplement 2). Importantly, the treatment with LY333531 and IWR-1-endo in the mTeSR1 medium reversed the decreased expression of these undifferentiated markers. With these direct comparisons, we were able to justify why our conditions are superior to previously published protocols using commercially available media.

      Additionally, the authors have not adequately addressed the observed variability among iPSC lines. While they claim in the Materials and Methods section to have tested multiple pluripotent stem cell lines, they do not clarify in the Results section which line they used for specific experiments and the rationale behind their choices. There is a lack of comparison among the different cell lines. It would also be beneficial to include testing with human embryonic stem cell lines.  

      Thank you for this insightful suggestion. In this revised manuscript, we have added results on 5 different hiPSC lines tested at the same time (Figure 3 C-E). Unfortunately, it is difficult for us to use human embryonic stem cell lines in this study because of ethical restrictions under Japanese governmental regulations. In general, the treatment with LY333531 and IWR-1-endo increased the expression of self-renewal marker genes/proteins and decreased the expression levels of SOX17, T, and PAX6 in these hiPSC lines. These results indicate that these chemical treatments in suspension culture are robust in general, which addresses the observed variability among iPSC lines.

      Additionally, there is a lack of information regarding the specific role of the two small molecules in these conditions.  

      In this revised manuscript, we have added data and discussion regarding the specific role of the two small molecules in these conditions in the Results and Discussion sections. Regarding the WNT signaling inhibitor, we hypothesized that adding Wnt signaling inhibitors may inhibit the spontaneous differentiation of hiPSCs into mesendoderm, because exogenous Wnt signaling induces the differentiation of human pluripotent stem cells into mesendoderm lineages (Nakanishi et al, 2009; Sumi et al, 2008; Tran et al, 2009; Vijayaragavan et al, 2009; Woll et al, 2008). Also, endogenous expression and activation of Wnt signaling in pluripotent stem cells are involved in the regulation of mesendoderm differentiation potentials (Dziedzicka et al, 2021). Regarding the PKC inhibitors, we state: "To identify molecules with inhibitory activity on neuroectodermal differentiation, hiPSCs were treated with candidate molecules in suspension conditions. We selected these candidate molecules based on previous studies related to signaling pathways or epigenetic regulations in neuroectodermal development (reviewed in (Giacoman-Lozano et al, 2022; Imaizumi & Okano, 2021; Sasai et al, 2021; Stern, 2024) ) or in pluripotency safeguards (reviewed in (Hackett & Surani, 2014; Li & Belmonte, 2017; Takahashi & Yamanaka, 2016; Yagi et al, 2017))." 

      We also found that the expression of naïve pluripotency markers, KLF2, KLF4, KLF5, and DPPA3, were up-regulated in the suspension conditions treated with LY333531 and IWR-1-endo while the expression of OCT4 and NANOG was at the same levels (Figure 5—figure supplement 2). Combined with RT-qPCR analysis data on 5 different hiPSC lines (Figure 3E), these results suggest that IWRLY conditions may drive hiPSCs in suspension conditions to shift toward naïve pluripotent states.

      The authors have not attempted to elucidate the underlying mechanism other than RNA expression analysis.  

      Regarding the underlying mechanisms, we have added results and discussion in the revised manuscript. For Wnt activation in human pluripotent stem cells, several studies reported that some WNT agonists were expressed in undifferentiated human pluripotent stem cells (Dziedzicka et al., 2021; Jiang et al, 2013; Konze et al, 2014). In suspension culture, cell aggregation causes tight cell-cell interaction. The paracrine effect of WNT agonists within the cell aggregates may strongly affect neighboring cells and induce spontaneous differentiation into mesendodermal cells. Thus, we think that the inhibition of WNT signaling is effective in suppressing the spontaneous differentiation into mesendodermal lineages in suspension culture.

      For PKC beta activation in human pluripotent stem cells, we have shown by western blotting that phosphorylated PKC beta protein expression is higher in suspension culture than in adherent culture (Figure 3 - figure supplement 1). Treatment with a PKCβ inhibitor is effective in suppressing spontaneous differentiation into neuroectodermal lineages. For future perspectives, it would be interesting to examine (1) how and why PKCβ is activated (or phosphorylated), especially in suspension culture conditions, and (2) how and why PKCβ inhibition can suppress the neuroectodermal differentiation. Conversely, it would also be interesting to examine how and why PKCβ activation is related to neuroectodermal differentiation.

      For these reasons some aspects of the manuscript need to be extended:  

      (1) It is crucial for authors to specify the culture media used for suspension cultures. In the Materials and Methods section, the authors mentioned that cells in suspension were cultured in either StemFit AK02N medium, StemFit AK03N (Cat# AK03N, Ajinomoto, Co., Ltd., Tokyo, Japan), or StemScale PSC suspension medium (A4965001, Thermo Fisher Scientific, MA, USA). The authors should clarify in the text which medium was used for suspension cultures and whether they observed any differences among these media.  

      Sorry for this confusion. In this study, we mainly use StemFit AK02N medium (Figure 1-5, 7-9). For bioreactor experiments (Figure 6), we use StemFit AK03N medium, which is free of human and animal-derived components and is GMP grade. To confirm the effect of IWRLY chemical treatment, we use StemScale suspension medium (Figure 4 - figure supplement 1) and mTeSR1 medium (Figure 4 - figure supplement 2 and Figure 8 - figure supplement 1). In the revised manuscript, we clarified which medium was used for suspension cultures in the Results and Materials and Methods sections.

      Although we have not directly compared these media in suspension culture (which is outside the primary focus of this study), we have observed some differences in maintaining self-renewal characteristics, preventing spontaneous differentiation (including tendencies to differentiate into specific lineages), and stability or variation among different experimental times in suspension culture conditions. Overcoming this heterogeneity caused by different media, the IWRLY chemical treatment stably maintains hiPSC self-renewal in general. We have added this issue in the Discussion section.

      (2) In the Materials and Methods section, the authors mentioned that they used multiple cell lines for this study. However, it is not clear in the text which cell lines were used for various experiments. Since there is considerable variation among iPSC lines, I suggest that the authors simultaneously compare 2 to 3 pluripotent stem cell lines for expansion, differentiation, etc.  

      Thank you for this careful suggestion. We have added more results on the simultaneous comparison using StemFit AK02N medium in 5 different hiPSC lines (Figure 3 C-E) and using mTeSR1 medium in 4 different hiPSC lines (Figure 4 - figure supplement 2). From both results, we have shown that the treatment of LY333531 and IWR-1-endo was beneficial in maintaining the self-renewal of hiPSCs while suppressing spontaneous differentiation.

      (3) Single-cell sorting can be confusing. Can iPSCs grown in suspensions be single-cell sorted?

      Additionally, what was the cloning efficiency? The cloning efficiency should be compared with adherent cultures.  

      Sorry for this confusion. With our method, iPSCs grown in IWRLY suspension conditions can be single-cell sorted. We have improved the clarity of the schematics (Figure 7A). Also, we added data on the cloning efficiency, which is compared with adherent cultures (Figure 7B). The cloning efficiency of adherent cultures was around 30%. While the cloning efficiency of suspension cultures without any chemical treatment was less than 10%, the IWR-1-endo treatment in the suspension cultures increased the efficiency to more than 20%. However, the treatment with LY333531 decreased the efficiency. These results indicate that the IWR-1-endo treatment is beneficial for single-cell cloning in suspension culture.

      (4) The authors have not addressed the naïve pluripotent state in their suspension cultures, even though PKC inhibition has been shown to drive cells toward this state. I suggest the authors measure the expression of a few naïve pluripotent state markers and compare them with adherent cultures  

      Thank you for this insightful comment. In the revised manuscript, we have added RT-qPCR data from 5 different hiPSC lines and specific gene expression data from RNA-seq on naïve pluripotent state markers (Figure 3E and Figure 5 - figure supplement 2, respectively). Interestingly, the expression of KLF2, KLF4, KLF5, and DPPA3 is significantly up-regulated in IWRLY conditions. These results suggest that IWRLY suspension conditions drive hiPSCs toward a naïve pluripotent state.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):  

      Overall, I feel that this study is very interesting and comprehensive, but has significant weaknesses in the bioprocessing aspects. More optimization data is required for the suspension culture to truly show that the differentiation they are observing is not an artifact of a non-optimized protocol.  

      Thank you for your thoughtful comments. Following your comments, we have performed several experiments to optimize the bioreactor conditions in the revised manuscript. We tested several cell seeding densities and several stirring speeds with or without WNT/PKCβ inhibitors (Figure 6—figure supplement 1). From these optimization experiments, we found that seeding densities of 1 - 2 x 10^5 cells/mL and stirring speeds of 50 - 150 rpm were suitable for the proliferation of these cells. Also, PKCβ and Wnt inhibitors suppressed spontaneous differentiation in bioreactor conditions regardless of the stirring speed within this acceptable range. As for the impeller shape and reactor design, we simply used the commonly used ABLE bioreactor for the 30 mL scale and Eppendorf bioreactors for the 320 mL scale, which had been designed and used for human pluripotent stem cell culture in previous studies (Matsumoto et al., 2022 (doi: 10.3390/bioengineering9110613) and Kropp et al., 2016 (doi:10.5966/sctm.2015-0253), respectively). We cited these previous studies in the Results section. We believe that these additional data and explanations are sufficient to address your concerns about the optimization of the bioreactor experiments.

      Reviewer #2 (Recommendations For The Authors):  

      The following comments should be addressed by the authors to improve the manuscript:  

      (1) Abstract: '...a scalable culture system that can precisely control the cell status for hiPSCs is not developed yet.' There were previous reports for a scalable iPSC culture system so I would suggest toning down/rephrasing this point: eg that improvement in a scalable iPSC culture system is needed.  

      Thank you for this careful suggestion. Following this suggestion, we have changed the sentence to "the improvement in a scalable culture system that can precisely control the cell status for hiPSCs is needed."

      (2) Line 71: please specify what media was used as a 'conventional medium' for suspension culture, was it Stemscale?  

      As suggested, we specified the media as StemFit AK02N used for this experiment. 

      (3) Fig 1E: It's not easy to see gating in the FACS plots as the threshold line is very faint, please fix this issue.  

      As suggested, we used thicker lines for the gating in the FACS plots (Figure 1E).

      (4) Fig 1G-J, Fig 2D-H: The RNAseq figures appeared pixelated and the resolution of these figures should be improved. The x-axis label for Fig 1H is missing.  

      We have improved the resolution and clarity of these figures. Also, we have added the x-axis label "enrichment distribution" for gene set enrichment analysis (GSEA) in Figure 1H, Figure 5F, and Figure 5 - figure supplement 1B.

      (5) Line 103-107: 'Since Wnt signaling induces the differentiation of human pluripotent stem cells into mesendoderm lineages, and is endogenously involved in the regulation of mesendoderm differentiation of pluripotent stem cells.....'. The two points seem the same and should be clarified.  

      Sorry for this unclear description. We have changed this description to "Exogenous Wnt signaling induces the differentiation of human pluripotent stem cells into mesendoderm lineages (Nakanishi et al, 2009; Sumi et al, 2008; Tran et al, 2009; Vijayaragavan et al, 2009; Woll et al, 2008). Also, endogenous expression and activation of WNT signaling in pluripotent stem cells are involved in the regulation of mesendoderm differentiation potentials (Dziedzicka et al, 2021; Jiang et al, 2013)." We hope that this description clarifies the difference between the two points.

      (6) Line 113: 'In samples treated with inhibitors' should be 'In samples treated with Wnt inhibitors'.  

      Thank you for this careful suggestion. We have corrected this. 

      (7) Line 115: '....there was no reduction in PAX6 expression.' That's not entirely correct, there was a reduction in PAX6 in IWR-1 endo treatment compared to control suspension culture (is this significant?), but not consistently for IWP-2 treatment. Please rephrase to more accurately describe the results.  

      Sorry for this inaccurate description. We have corrected this phrase to "there was only a small reduction in PAX6 expression in the IWR-1-endo-treated condition and no reduction in the IWP2-treated condition" as recommended.

      (8) It's critical to show that the effect of the suspension culture system developed here can maintain an undifferentiated state for multiple hiPSC lines. I think the author did test this in multiple cell lines, but the results are scattered and not easy to extract. I would recommend adding info for the hiPSC line used for the results in the legend, eg WTC11 line was used for Figure 3, 201B7 line was used for Figure 2. I would suggest compiling a figure that confirms the developed suspension system (IWR-1 +LY) can support the maintenance of multiple hiPSC lines.  

      Thank you for this insightful suggestion. We have added data on hiPSC maintenance across 5 hiPSC lines cultured simultaneously in suspension using StemFit AK02N medium (Figure 3C - E) and across 4 hiPSC lines cultured simultaneously in suspension using mTeSR1 medium (Figure 4 - figure supplement 2). In both media, treatment with LY333531 and IWR-1-endo reversed the decreased expression of undifferentiated markers and suppressed the increased expression of differentiation markers seen in suspension culture conditions. These results show that this chemical treatment produced a consistent, robust effect on hiPSC maintenance across multiple cell lines.

      (9) Line 166: Please use the correct gene nomenclature format for a human gene (italicised uppercase) throughout the manuscript. Also, list the full gene name rather than PAX2,3,5.  

      Sorry for the incorrectness of the gene names. We have corrected them.

      (10) Please improve the resolution for Figure 4D.  

      We have provided clearer images of Figure 4D.

      (11) In the first part of the study, the control condition was referred to as 'suspension culture' with spontaneous differentiation, but in the later parts sometimes the term 'suspension culture' was used to describe the IWR1+LY condition (ie lines 271-272). I would suggest the authors carefully go through the manuscript to avoid misinterpretation on this issue.  

      Thank you for this careful suggestion. To avoid misinterpretation, we now use 'suspension culture' only for the conventional culture medium and 'LYIWR suspension culture' for the culture medium supplemented with LY333531 and IWR-1-endo in this manuscript.

      (12) Figure 5: It is impressive to demonstrate that the IWR1+LY suspension culture enables large-scale expansion of a clinical-grade hiPSC line using a bioreactor, yielding 300 vials/passage. Can the author add some information regarding cell yield using a conventional adherent culture system in this cell line? This will provide a comparison of the performance of the IWR1+LY suspension culture system to the conventional method.  

      Thank you for this valuable suggestion. We have provided information regarding cell yield using a conventional adherent culture system in this cell line in the Results as "Since the population doubling time (PDT) of this hiPSC line in adherent culture conditions is 21.8 - 32.9 hours at its production (https://www.cira-foundation.or.jp/e/assets/file/provision-of-ips-cells/QHJI14s04_en.pdf), this proliferation rate in this large scale suspension culture is comparable to adherent culture conditions."
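The comparison above rests on the population doubling time. As a minimal sketch, assuming simple exponential growth between two cell counts, PDT can be computed as t · ln 2 / ln(N_end / N_start). The function name and the example counts below are hypothetical and are not taken from the manuscript; only the 21.8 - 32.9 hour adherent range comes from the response.

```python
import math


def population_doubling_time(n_start: float, n_end: float, hours: float) -> float:
    """Doubling time in hours, assuming exponential growth over the interval."""
    if n_start <= 0 or hours <= 0:
        raise ValueError("n_start and hours must be positive")
    if n_end <= n_start:
        raise ValueError("n_end must exceed n_start for a finite doubling time")
    return hours * math.log(2) / math.log(n_end / n_start)


# Hypothetical example: an 8-fold expansion over 72 hours is three doublings.
pdt = population_doubling_time(n_start=1e5, n_end=8e5, hours=72)

# Adherent-culture range quoted in the response (hours).
ADHERENT_PDT_RANGE = (21.8, 32.9)
comparable = ADHERENT_PDT_RANGE[0] <= pdt <= ADHERENT_PDT_RANGE[1]
print(f"PDT = {pdt:.1f} h, within adherent range: {comparable}")
```

A suspension-culture PDT that falls inside the quoted adherent range is what "comparable" would mean under this reading.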

      (13) Line 273: For testing the feasibility of using IWR1+LY media to support the freeze and thaw process, the author described the cell number and TRA160+/OCT4+ cell %. How is this compared to conventional media (eg E8)? It would be nice to see a head-to-head comparison with conventional media, quantification of cell count or survival would be helpful to determine this.  

      For this issue, we attempted a direct freeze and thaw process using conventional media, StemFit AK02N in the 201B7 line (Figure 8) or mTeSR1 in 4 different hiPSC lines (Figure 8 - figure supplement 1), with or without IWR1+LY. However, since the hiPSCs cultured in suspension culture conditions without IWR1+LY quickly lost their self-renewal ability, these frozen cells could be neither recovered nor counted in these conditions. Our results indicate that the addition of IWR1+LY in the thawing process supports successful recovery in suspension conditions.

      (14) More details of the passaging method should be added in the method section. Do you do cell count following accutase dissociation and replate a defined density (eg 1x10^5/ml)?  

      Yes. We counted the cells in every passage in suspension culture conditions. We have added more explanation in the Materials and Methods as below.

      "The dissociated cells were counted with an automatic cell counter (Model R1, Olympus) with Trypan Blue staining to detect live/dead cells. The cell-containing medium was spun down at 200 rpm for 3 minutes, and the supernatant was aspirated. The cell pellet was re-suspended with a new culture medium at an appropriate cell concentration and used for the next suspension culture."
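The re-suspension step quoted above amounts to a simple dilution calculation. The following is a minimal sketch, assuming the live-cell count from Trypan Blue exclusion is used and that the target is one of the seeding densities mentioned elsewhere in this response (for example 1 x 10^5 cells/mL). The function name and the example numbers are hypothetical.

```python
def resuspension_volume_ml(total_live_cells: float, target_cells_per_ml: float) -> float:
    """Volume of fresh medium (mL) needed to reach the target seeding density."""
    if total_live_cells <= 0 or target_cells_per_ml <= 0:
        raise ValueError("cell number and target density must be positive")
    return total_live_cells / target_cells_per_ml


def live_cells(total_cells: float, viability_fraction: float) -> float:
    """Live-cell number from a total count and a Trypan Blue viability fraction."""
    if not 0.0 <= viability_fraction <= 1.0:
        raise ValueError("viability_fraction must be between 0 and 1")
    return total_cells * viability_fraction


# Hypothetical example: 4e5 live cells seeded at 1e5 cells/mL need 4 mL of medium.
volume = resuspension_volume_ml(total_live_cells=4e5, target_cells_per_ml=1e5)
print(f"Resuspend in {volume:.1f} mL")
```

In practice only part of the counted suspension would be carried forward, so the same function can be applied to the number of cells actually replated.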

      (15) The IWR1+LY suspension culture system requires passage every 3-5 days. Is there still spontaneous differentiation if the hiPSC aggregate grows too big?  

      Thank you for this insightful question.

      Yes. As previous studies have shown, the size of hiPSC aggregates is critical for maintaining self-renewal in our method. Stirring speed is key to producing hiPSC aggregates of the proper size in suspension culture. The culture period between passages is also key to keeping the aggregates from exceeding the proper size. Thus, we basically keep the stirring speed at 90 rpm (135 rpm for bioreactor conditions) and passage every 3 - 5 days in suspension culture conditions.

      (16) Several previous studies have described the development of hiPSC suspension culture system using hydrogel encapsulation to provide biophysical modulation (reviewed in PMID: 32117992). In comparison, it seems that the IWR1+LY suspension system described here does not require ECM addition which further simplifies the culture system for iPSC. It would be good to add more discussion on this topic in the manuscript, such as the potential role of the E-cadherin in mediating this effect - as RNAseq results indicated that CDH1 was upregulated in the IWR1+LY condition).  

      Thank you for this valuable suggestion. We have added more discussion on this topic in the Discussion section as below.

      "Thus, our findings show that suspension culture conditions with Wnt and PKCβ inhibitors (IWRLY suspension conditions) can precisely control cell conditions and are comparable to conventional adhesion cultures regarding cellular function and proliferation. Many previous 3D culture methods intended for mass expansion used hydrogel-based encapsulation or microcarrier-based methods to provide scaffolds and biophysical modulation (Chan et al, 2020). These methods are useful in that they enable mass culture while maintaining scaffold dependence. However, the need for special materials and equipment and the labor and cost involved are concerns toward industrial mass culture. On the other hand, our IWRLY suspension conditions do not require special materials such as hydrogels, microcarriers, or dialysis bags, and have the advantage that common bioreactors can be used. "

      "On the other hand, it is interesting to see whether and how the properties of hiPSCs cultured in IWRLY suspension culture conditions are altered from the adherent conditions. Our transcriptome results in comparison to adherent conditions show that gene expression associated with cell-to-cell attachment, including E-cadherin (CDH1), is more activated. This may be due to the status that these hiPSCs are more dependent on cell-to-cell adhesion where there is no exogenous cell-to-substrate attachment in the three-dimensional culture. Previous studies have shown that cell-to-cell adhesion by E-cadherin positively regulates the survival, proliferation, and self-renewal of human pluripotent stem cells (Aban et al, 2021; Li et al, 2012; Ohgushi et al, 2010). Furthermore, studies have shown that human pluripotent stem cells can be cultured using an artificial substrate consisting of recombinant E-cadherin protein alone without any ECM proteins (Nagaoka et al, 2010). Also, cell-to-cell adhesion through gap junctions regulates the survival and proliferation of human pluripotent stem cells (Wong et al, 2006; Wong et al, 2004). These findings raise the possibility that the cell-to-cell adhesion, such as E-cadherin and gap junctions, are compensatory activated and support hiPSC self-renewal in situations where there are no exogenous ECM components and its downstream integrin and focal adhesion signals are not forcedly activated in suspension culture conditions. It will be interesting to elucidate these molecular mechanisms related to E-cadherin in the hiPSC survival and self-renewal in IWRLY suspension conditions in the future."

      Reviewer #3 (Recommendations For The Authors):  

      (1) I am a bit confused about the passage of adherent cultures. The authors claim that they used EDTA for passaging and plated cells at a density of 2500 cells/cm2. My understanding is that EDTA is typically used for clump passaging rather than single-cell passaging.  

      Sorry about this confusion. We routinely use an automatic cell counter (model R1, Olympus) which can even count small clumpy cells accurately. Thus, we show the cell numbers in the passaging of adherent hiPSCs.  

      (2) Figure 2D- The authors have not directly compared IWR-1-endo with IWR-1-endo+Go6983 for the expression of T and SOX17, a simultaneous comparison would be an interesting data.  

      As recommended, we have added data that directly compare IWR-1-endo with IWR-1-endo+Go6983 for the expression of T and SOX17 in Figure 2D. The addition of IWR-1-endo alone decreased the expression of T and SOX17, but not PAX6, which was similar to the data in Figure 2C.

      (3) Oxygen levels play a crucial role in pluripotency maintenance. Could the authors please specify the oxygen levels used for culturing cells in suspension?  

      Sorry for not mentioning oxygen levels in this study. We basically use normal oxygen levels (i.e., 21% O2) in suspension culture conditions. We have explained this in the Materials and Methods section.

      (4) Figure supplement 1 (G and H): In the images, it is difficult to determine whether the green (PAX6 and SOX17) overlaps with tdT tomato. For better visualization, I suggest that the authors provide separate images for the green and red colors, as well as an overlay.  

      Sorry for these unclear images. We have provided separate images for the green and red colors, as well as an overlay in Figure 1- figure supplement 1 G and H.

      (5) The authors have only compared quantitatively the expression of TRA-1-60 for most of the figures. I suggest that the authors quantitatively measure the expression of other markers of undifferentiated stem cells, such as NANOG, OCT4, SSEA4, TRA-1-81, etc.  

      We have added the quantitative data of the expression of markers of undifferentiated hiPSCs including NANOG, OCT4, SSEA4, and TRA-1-60 on 5 different hiPSC lines in Figure 3 C-E.

      (6) In Figure 2D, the authors have tested various small molecules but the rationale behind testing those molecules is missing in the text.  

      These molecules were chosen as putatively affecting neuroectodermal induction from the pluripotent state.

      We have added the rationale with appropriate references in the Results section as below.

      "We have chosen these candidate molecules based on previous studies related to signaling pathways or epigenetic regulations in neuroectodermal development (reviewed in (Giacoman-Lozano et al, 2022; Imaizumi & Okano, 2021; Sasai et al, 2021; Stern, 2024) ) or in pluripotency safeguards (reviewed in (Hackett & Surani, 2014; Li & Belmonte, 2017; Takahashi & Yamanaka, 2016; Yagi et al, 2017)) (Figure 2A; listed in Supplementary Table 1). "

      (7) In the beginning authors used Go6983 but later they switched to LY333531, the reasoning behind the switch is not explained well.  

      To explain clearly the reasons for switching from Go6983 to LY333531, we reorganized the order of results and figures. In short, we found that the suppression of PAX6 expression in hiPSCs cultured in suspension conditions was observed with many PKC inhibitors, all of which possessed PKCβ inhibition activity (Figure 2—figure supplement 2B-D). Also, elevated expression of PKCβ in suspension-cultured hiPSCs could affect the spontaneous differentiation (Figure 3—figure supplement 1A-C). To further explore the possibility that the inhibition of PKCβ is critical for the maintenance of self-renewal of hiPSCs in suspension culture, we evaluated the effect of LY333531, a PKCβ-specific inhibitor. The maintenance of suspension-cultured hiPSCs is specifically facilitated by the combination of PKCβ and Wnt signaling inhibition (Figure 3A and B; Figure 2—figure supplement 1). Last, we performed long-term culture for 10 passages in suspension conditions and compared hiPSC growth in the presence of LY333531 or Go6983. LY333531 was superior in the proliferation rate and in maintaining OCT4 protein expression in the long-term culture (Figure 4). Thus, we used IWR-1-endo and LY333531 for the rest of this study.

      (8) I suggest the authors measure cell death after the treatment with LY+IWR-1-endo.  

      Thank you for this valuable suggestion. We have measured cell death after the treatment with LY+IWR-1-endo and found that the chemical combination had little or no effect on cell death. We have added the data in Figure 3—figure supplement 2 and the following description in the Results section: "We also examined whether the combination of PKCβ and Wnt signaling inhibition affects the cell survival in suspension conditions. In this experiment, we used another PKC inhibitor, Staurosporine (Omura et al, 1977), which has a strong cytotoxic effect as a positive control of cell death in suspension conditions. The addition of IWR-1-endo and LY333531 for 10 days had no effects on the apoptosis while the addition of Staurosporine for 2 hours induced Annexin-V-positive apoptotic cells (Figure 3—figure supplement 2). These results indicate that the combination of PKCβ and Wnt signaling inhibition has no or little effects on the cell survival in suspension conditions."

      (9) The authors have performed reprogramming using episomal vectors and using Sendai viruses. In both the protocols authors have added small molecules at different time points, for episomal vector protocol at day 3 and Sendai virus protocol at day 23. Why is this different?  

      Thank you for this insightful question. These differences were intended to reflect the degree of expression from these reprogramming vectors. The expression of reprogramming factors from these vectors should suppress spontaneous differentiation in reprogramming cells, and expression from Sendai viral vectors should last longer than that from episomal plasmid vectors. Thus, we added these chemical inhibitors from the early phase of reprogramming for episomal plasmid vector conditions and from the late phase of reprogramming for Sendai viral vector conditions. In future work, we may need to further optimize the timing of adding these molecules.

      (10) The protocol for three germ layer differentiation using a specific differentiation medium requires further elaboration. For instance, the authors mentioned that suspension cultures were transferred to differentiation media but did not emphasize the cell number and culture conditions before moving the cultures to the differentiation media.  

      Sorry for this unclear description. We have added the explanation on the cell number and culture conditions before moving the cultures to the differentiation media in the Materials and Methods section as below.

      "As in the maintenance conditions, 4 × 10^5 hiPSC were seeded in one well of a low-attachment 6-well plate with 4 mL of StemFit AK02N medium supplemented with 10 µM Y-27632. This plate was placed onto the plate shaker in the CO2 incubator. Next day, the medium was changed to the germ layer specific differentiation medium."

    2. eLife Assessment

      This comprehensive and compelling study presents a robust, cost-effective method for expanding pluripotent stem cells. The authors have identified a media condition that maintains iPSCs in suspension cultures by inhibiting the PKCβ and Wnt signaling pathways. The manuscript is important for the pluripotent stem cell field as it seeks robust and economical approaches to expand iPSCs at scale for high throughput screens and preclinical studies. While the authors have tested their media and protocol on a few lines, given the variability of iPSCs, further testing across more cell lines and in different laboratory settings will be crucial to evaluate its reproducibility.

    3. Reviewer #1 (Public review):

      Summary:

      The authors have presented data showing that there is a greater amount of spontaneous differentiation in human pluripotent cells cultured in suspension vs static and have used PKCβ and Wnt signaling pathway inhibitors to decrease the amount of differentiation in suspension culture.

      Strengths:

      This is a very comprehensive study that uses a number of different reactor designs and scales in addition to a number of unbiased outcomes to determine how suspension impacts the behaviour of the cells and in turn how the addition of inhibitors counteracts this effect. Furthermore, the authors were also able to derive new hiPSC lines in suspension with this adapted protocol.

      Weaknesses:

      The main weakness of this study is the lack of optimization with each bioreactor change. It has been shown multiple times in the literature that the expansion and behaviour of pluripotent cells can be dramatically impacted by impeller shape, RPM, reactor design and multiple other factors. It remains unclear to me how much of the results the authors observed (e.g. increased spontaneous differentiation) was due to not having an optimized bioreactor protocol in place (per bioreactor vessel type). For instance, was the starting seeding density, RPM, impeller shape, feeding schedule, and/or any other aspect optimized for any of the reactors used in the study and, if not, how were the values used in the study determined?

      Post-revision:

      The authors did a commendable job in responding and addressing my comments and concerns in addition to those of the other reviewers. I think this study will be of interest to the field and will add to our collective knowledge on how PSCs react to being cultured in suspension conditions.

    4. Reviewer #2 (Public review):

      This study by Matsuo-Takasaki et al. reported the development of a novel suspension culture system for hiPSC maintenance using Wnt/PKC inhibitors. The authors showed elegantly that inhibition of the Wnt and PKC signaling pathways would repress spontaneous differentiation into neuroectoderm and mesendoderm in hiPSCs, thereby maintaining cell pluripotency in suspension culture. This is a solid study with substantial data to demonstrate the quality of the hiPSC maintained in the suspension culture system, including long-term maintenance in >10 passages, robust effect in multiple hiPSC lines, and a panel of conventional hiPSC QC assays. Notably, large-scale expansion of a clinical grade hiPSC using a bioreactor was also demonstrated, which highlighted the translational value of the findings here. In addition, the author demonstrated a wide range of applications for the IWR1+LY suspension culture system, including support for freezing/thawing and PBMC-iPSC generation in suspension culture format. The novel suspension culture system reported here is exciting, with significant implications in simplifying the current culture method of iPSC and upscaling iPSC manufacturing.

      Review for second submission:

      In this revised manuscript, the authors provided new data to further support that suspension culture with Wnt/PKC inhibitors can be used for long-term hiPSC maintenance across multiple cell lines, as well as comparison with current benchmark culture system. New discussion sections were also added to put the findings into perspective of current development and the need for hiPSC maintenance culture system, and the figures were updated to improve readability. Overall, the authors have addressed all my concerns in this revised manuscript. Congratulations to the authors on this very interesting study.

    5. Reviewer #3 (Public review):

      In the current manuscript, Matsuo-Takasaki et al. demonstrate that the addition of PKCβ and WNT signaling pathway inhibitors to suspension cultures of iPSCs effectively suppresses spontaneous differentiation. These conditions are well-suited for the large-scale expansion of iPSCs. The authors have shown that, under these conditions, they can successfully perform single-cell cloning, direct cryopreservation, and iPSC derivation from PBMCs. Furthermore, they provide a comprehensive characterization of iPSCs grown in these conditions, including assessments of undifferentiated stem cell markers and genetic stability.

      They have elegantly demonstrated that iPSCs cultured in these conditions can differentiate into derivatives of all three germ layers. By differentiating iPSCs into dopaminergic neural progenitors, cardiomyocytes, and hepatocytes, the authors show that differentiation is comparable to that of adherent cultures. This new method of expanding iPSCs has significant potential for clinical applications. The authors also tested these conditions in multiple cell lines and observed consistent results.

      Although the authors have elaborated on the mechanism to some extent, suggesting that PKCβ and WNT signaling pathway inhibition suppresses differentiation and shifts cells toward a naïve pluripotency state in suspension cultures, further research is needed to fully understand this process. Nevertheless, their findings are promising and will be beneficial for producing scalable amounts of iPSCs in controlled conditions.

    1. eLife Assessment

      This important study provides interesting insights into the mechanisms of action of adjuvants. It shows that adjuvants, MPLA and CpG especially, modulate the peptide repertoires presented on the surface of antigen presenting cells, and surprisingly, adjuvant favored the presentation of low-stability peptides rather than high-stability peptides by antigen presenting cells. As a result, the low stability peptide presented in adjuvant groups elicits T cell response effectively. Evidence in support of these conclusions is solid, and this paper would be of interest to vaccinologists and immunologists.

    2. Reviewer #1 (Public Review):

      Summary:

      Li et al investigated how adjuvants such as MPLA and CpG influence antigen presentation at the level of the antigen presenting cell and the MHCII:peptide interaction. They found that use of MPLA or CpG influences the exogenous peptide repertoire presented by MHC II molecules. Additionally, their observations included the finding that peptides with low-stability peptide:MHC interactions yielded more robust CD4+ T cell responses in mice. These phenomena were illustrated specifically for 2 pattern recognition receptor activating adjuvants. This work represents a step forward in understanding how adjuvants program CD4+ Th responses and provides further evidence regarding expected mechanisms of PRR adjuvants in enhancing CD4+ T cell responses in the setting of vaccination.

      Strengths:

      The authors use a variety of systems to analyze this question. Initial observations were collected in an H. pylori model of vaccination with a demonstration of immunodominance differences simply by adjuvant type, followed by analysis of MHC:peptide as well as proteomic analysis with comparison by adjuvant group. Their analysis returns to peptide immunization and analysis of the strength of relative CD4+ T cell responses, through calculation of IC50 values and strength of binding. This is a comprehensive work. The logical sequence of experiments makes sense and follows an unexpected observation through to trying to understand that process further with peptide immunization and its impact on Th responses. This work will serve as a premise for further studies into the mechanisms of adjuvants on T cells.

      Weaknesses:

      While MDP has a different manner of interaction as an adjuvant compared to CpG and MPLA, it is unclear why MDP has a different impact on peptide presentation and it should be further investigated, or at minimum highlighted in the discussion as an area that requires further investigation.

      The authors allude that TLR-activating adjuvants mediate selective, low-affinity, exogenous peptide binding onto MHC class II molecules. However, this was not demonstrated to be related specifically to TLR binding. I wonder whether some work with TLR-deficient mice (TLR4 KO, for example) could evaluate this phenomenon more specifically.

      Lastly, it is unclear if the peptide immunization experiment reveals a clear pattern related to high and low stability peptides among the peptides analyzed.

    3. Reviewer #2 (Public Review):

      Adjuvants boost antigen-specific immune responses to vaccines. However, whether adjuvants modulate the epitope immunodominance and the mechanisms involved in adjuvant's effect on antigen processing and presentation are not fully characterized. In this manuscript, Li et al report that immunodominant epitopes recognized by antigen-specific T cells are altered by adjuvants.

      Using MPLA, CpG, and MDP adjuvants and H. pylori antigens, the authors screened the dominant epitopes of Th1 responses in mice post-vaccination with different adjuvants and found that adjuvants altered antigen-specific CD4+ T cell immunodominant epitope hierarchy. They show that adjuvants, MPLA and CpG especially, modulate the peptide repertoires presented on the surface of APCs. Surprisingly, adjuvant favored the presentation of low-stability peptides rather than high-stability peptides by APCs. As a result, the low stability peptide presented in adjuvant groups elicits T cell response effectively.